Why Alexa won't wake up when hearing "Alexa" in Amazon's Super Bowl ad (2019) (amazon.science)
100 points by xuaihua on Jan 16, 2023 | 120 comments


The real Alexa product is the wealth of data the device can provide through 24-hour surveillance. The shaky voice recognition/command model is merely the Trojan horse.

Amazon now knows who is watching the Super Bowl and will treat you as a consumer they can advertise to.

Alexa skill developers were sold a promise of platform promotion, quality, and customer interaction. The whole consumer experience for adding custom Skills is fairly arbitrary and broken. You’ll be lucky to make $20/month from ISPs (in-skill purchases) and the other dead-end upsells your customers/users will ignore.


This keeps getting repeated all over the internet but is simply not true. There may be a day when Amazon turns Alexa into advertising spyware, but that definitely wasn't its original intention and nothing of the sort is happening today.


How would you know when they do this? Buried inside a "We've made some updates to our Terms of Service" email?


There are a lot of people who run proxies like pi-hole while using these devices. It only takes one of those people to notice a sudden increase in traffic and post that finding somewhere.

If you discover that Alexa is uploading information that it shouldn't be and blog about it you can be almost guaranteed to hit the front page of HN, these kinds of posts appear from time to time: https://news.ycombinator.com/item?id=9447080


>It only takes one of those people to notice a sudden increase in traffic and post that finding somewhere.

Not if they perform speech-to-text on-device and send the parsed results (only a few kB). If you really want to keep things hidden you can perform all the recognition/inference on-device and only send the topic (e.g. [1]), which is only a few bits of information.

[1] https://github.com/patcg-individual-drafts/topics/blob/main/...
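As a toy illustration of how little would have to leave the device under that scheme: if recognition and topic inference both run locally, the wire payload can be a single index into a fixed taxonomy. The topic names and keyword lists below are made up for the sketch.

```python
# Hypothetical sketch: speech recognition and topic inference happen
# on-device; only a tiny topic index ever goes over the wire.
TOPICS = ["pets", "cars", "sports", "cooking"]  # made-up taxonomy
KEYWORDS = {0: {"cat", "dog"}, 1: {"car", "engine"},
            2: {"game", "score"}, 3: {"recipe", "oven"}}

def classify(transcript: str) -> int:
    """Return the index of the topic with the most keyword hits."""
    words = set(transcript.lower().split())
    return max(KEYWORDS, key=lambda t: len(words & KEYWORDS[t]))

payload = classify("my cat knocked the water bowl over").to_bytes(1, "big")
print(TOPICS[payload[0]], len(payload))  # pets 1
```

One byte on the wire (or a few bits, with a smaller taxonomy) versus kilobytes of audio or transcript - which is the point: traffic monitoring would see almost nothing.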


While plausible on paper, it's not practical unless they jam an order of magnitude or two more compute into the devices. To get reasonable accuracy (i.e., enough to be able to use for profit) from any casual speech, the current models run far from realtime on a modern MacBook. You're not going to squeeze reasonable accuracy from the tiny processor on the devices in the world today, even if you record and process async as a way to hide from people inspecting traffic.

Edit: it's worth noting that this would dramatically increase the cost of the device. If they eat the additional hardware cost, they'd need to see a way to recoup it. But that's silly for a company that's literally in the business of cloud computing, and where the goal of the hardware is to hide what you're doing. When will people start asking why there's a full GPU in their Echo?


> While plausible on paper, it's not practical unless they jam an order of magnitude or two more compute into the devices. To get reasonable accuracy (i.e., enough to be able to use for profit) from any casual speech, the current models run far from realtime on a modern MacBook. You're not going to squeeze reasonable accuracy from the tiny processor on the devices in the world today, even if you record and process async as a way to hide from people inspecting traffic.

Do you really need 100% accuracy here? This isn't like cops setting up a wiretap. Google isn't waiting for you to slip up and admit that you like funko pops or whatever. If you're constantly talking about your cat, or wanting to get a car, that's all they need to target ads to you.

Also, the processing doesn't have to be real time. It doesn't matter that google learns about your cat 8 hours late because the device is running its ML models in the background while you're asleep. If the device picks up 3 hours of speech per day, it only needs to process at 1/8x speed to catch up. On the off chance you have a house party and it's picking up 6 hours of speech, it can always buffer it for later, or drop it altogether (see above paragraph about how it doesn't need to pick up everything).
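The catch-up arithmetic above is easy to sanity-check (all numbers are from the comment, not from any real device):

```python
# Back-of-the-envelope: what fraction of realtime must background
# transcription run at to clear a day's backlog of captured speech?
def required_realtime_fraction(speech_hours, processing_hours=24):
    """Hours of captured speech divided by hours available to process it."""
    return speech_hours / processing_hours

print(required_realtime_fraction(3))  # 0.125, i.e. 1/8x speed
print(required_realtime_fraction(6))  # 0.25 - a house party still fits
```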


It kind of does matter, actually. Lots of English words (and words in other languages!) sound the same. A cat lover who starts getting ads for baseball bats and fat-loss pills isn't going to convert. Context matters, too, not just matching words. If I start talking about "my dear father" and get ads for tractors and hunting gear, I'm not going to convert.

Advertisers aren't going to pay for random spoken keywords anyway. They're going to pay to target people by demographic and interest. Things _about_ you, not things you're talking about. Just because I mentioned tampons doesn't mean I'll ever buy a box of tampons (I simply lack the anatomy). And if you start building a profile about somebody based on poorly-overheard bits of speech, you're building a castle on bad foundations. The data is bunk.

Just having a TV or radio on near the device will have suddenly poisoned the data.

> If the device picks up 3 hours of speech per day, it only needs to process at 1/8x speed to catch up.

The Echo currently has a 32-bit processor that is designed to be pretty minimal. OpenAI Whisper tiny runs at about 2/3 of realtime, and that's with a 6-core ~2.3 GHz laptop processor. The CPU in the Echo runs at 0.6-1 GHz, and the system is not designed for general-purpose computing. I don't have the ability to benchmark it, but you're not going to get close to 1/8 with the Echo hardware.


Pihole is a dns solution. It wouldn’t notice any uptick in actual data transferred.


I mean, it's quite easy to monitor a device's network usage. You would be able to see Alexa uploading tons of data when not in use.


The average person may not read the terms of service update emails but there are plenty of lawyers/journalists/tech bloggers that do. A company like Amazon can't simply sneak in a "Alexa will now listen to and use everything you say around it for advertising" clause without anyone noticing.


Why sell advertising if you can lease out a backdoor to the NSA and other (international) agencies?


Because Amazon already operates a very effective advertising business, and this could be a hugely valuable source of data for it.


I read this in Dale Gribble's voice.


Found the Amazon shill.


Could you please review https://news.ycombinator.com/newsguidelines.html and stick to the site rules when posting here? You've unfortunately been breaking them repeatedly.


Far more people have smart phones using one of 2 OSes, and unlike smart home devices, people generally take these with them wherever they go - to the store, to the bathroom, to sleep, to the doctors office, etc. Do you think cell phones are trojan horses too?


You don't?

* How many apps ask for location data and background location data, when the use case for such permissions are questionable.

* Look at how much FB's revenue decreased when its consumer tracking efforts were given the smallest roadblock on iOS.

* Law enforcement has access to a variety of tools that track individuals based on their cellphones. [0]

* Google Maps requires an incredible amount of connectivity in order to do simple navigation. [1]

* We still don't have end-to-end encryption for basic communication protocols such as SMS.

* The entire Tik-tok controversy highlights the risks associated with tracking user behavior.

* How much personal data was compromised by cloud based backups? How many nudes leaked, or criminal activity proven?

[0] https://en.wikipedia.org/wiki/Stingray_use_in_United_States_...

[1] https://news.ycombinator.com/item?id=30167865


> How many apps ask for location data and background location data, when the use case for such permissions are questionable.

On Android, anything that can be used to deduce location, including Bluetooth, WiFi, etc., is lumped into the location permission, which can be answered with "only while using the app" or "only once".

> We still don't have end-to-end encryption for basic communication protocols such as SMS.

Because SMS is a legacy protocol that needs to maintain backwards compatibility, bolting on encryption is extremely complex for practically no gain - chat messaging apps with better UX, and sometimes even end-to-end encryption, are a dime a dozen.


>* Look at how much FB's revenue decreased when it's consumer tracking efforts were given the smallest roadblock on iOS.

How much did it drop? By "smallest roadblock on iOS", I'm assuming you mean App Tracking Transparency, which was introduced in iOS 14.5 (released April 26, 2021). According to this graph[1], Facebook's revenue for Q2 2021 was up from the previous quarter.

[1] https://www.statista.com/statistics/277963/facebooks-quarter...


I certainly think phones have more spying potential than Alexa/smart devices. But my point is very few people bring up "but what about spying?" whenever smartphones are discussed, with the implication that these devices should be avoided because of that.


They certainly do make for easy tracking down to <1m. So they make a good target for surveillance states.


> You’ll be lucky to make $20/month from ISPs (in-skill purchases) and the other dead-end upsells your customers/users will ignore.

I agree with this point. It's basically impossible to monetize as a third party on Alexa. The closest thing happening is Spotify, Audible, and other audio content, but it would be hard to pinpoint how much of their sales/ads are attributable to Alexa and how much would have happened on another platform if Alexa was unavailable. It's rare to have someone trigger even a $3 transaction via Alexa.


So they’re training Alexa not to recognize the specific recording. I’m wondering if this has any side effects for the actors from those ads: if they have an Alexa at home, does it (sometimes) fail to recognize them?

(And wouldn’t it be cool, if somewhat dystopian, if Alexas all over the US quipped “hey, that’s my big sister on the TV!”?)


Ads usually have music in the background, and that gets included in the fingerprint too. Besides that, the fingerprint consists of a number of time slices.

The actor's personal Alexa will probably only fail if they use it with exactly the same millisecond-scale timing, exactly the same intonation, and exactly the same background noise.
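Amazon hasn't published its fingerprinting algorithm, but the general idea of a time-sliced fingerprint that matches with tolerance (rather than exact equality) can be sketched with a deliberately crude energy-contour scheme - everything below is invented for illustration:

```python
import math
import random

def slice_energies(samples, slice_len=256):
    """RMS energy of each fixed-length time slice."""
    return [math.sqrt(sum(x * x for x in samples[i:i + slice_len]) / slice_len)
            for i in range(0, len(samples) - slice_len + 1, slice_len)]

def fingerprint(samples):
    """One bit per slice: did the energy rise (1) or fall (0)?"""
    e = slice_energies(samples)
    return [1 if b > a else 0 for a, b in zip(e, e[1:])]

def similarity(fp_a, fp_b):
    """Fraction of agreeing bits - match on most slices, not all of them."""
    n = min(len(fp_a), len(fp_b))
    return sum(a == b for a, b in zip(fp_a, fp_b)) / n

random.seed(0)
clip = [(2 + math.sin(t / 500)) * math.sin(2 * math.pi * 440 * t / 8000)
        for t in range(8000)]                        # a 1 s "ad" clip
noisy = [x + random.gauss(0, 0.05) for x in clip]    # same clip, room noise
other = [random.gauss(0, 1) for _ in range(8000)]    # unrelated audio

print(similarity(fingerprint(clip), fingerprint(noisy)))  # high: same clip
print(similarity(fingerprint(clip), fingerprint(other)))  # near 0.5: unrelated
```

Real systems use spectral landmarks rather than raw energy, but the tolerance principle is the same, which is why slightly different playback conditions can still match the stored fingerprint.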


If it were like that, then a home with some other source of noise (another music stream, someone speaking loudly, etc.) during the commercial would trigger the hotword. I guess the fingerprinting has to have some sort of tolerance anyway.


Possibly, but a system like this would also be time-boxed (turned on for the day of the super bowl or just a few hours).


I wonder what sort of technical debt the Alexa team is racking up by putting in hacks like this.


I wonder if there's a way to identify audio from speakers. It might not work for higher-end speakers or lower-end microphones, but I'm sure the limited frequency range of speakers, and perhaps artifacts of audio compression that might not be noticeable to humans, would identify non-human audio. I've seen a similar approach used for visuals, separating an image at the point of recording based on the type of light source – artificial light having different properties to natural light.


It always seems like my cat can tell what's from a speaker and what's real. My wife calling to the cat over the phone never triggers any reaction, yet the cat will often jump at a little noise in the kitchen or outside even when a movie is playing.

I don't know how much of that is directionality and how much is the sound itself. Cats probably hear different frequencies than humans, but as you point out it wouldn't be too hard to make smart devices' microphones sensitive to those frequencies that our devices' speakers don't emit correctly (because humans don't hear them).


Interestingly, this was true for my cat until we tried a device with beamforming - a HomePod. It will convince her there is a bird/cat/thing in the room.


Phone voice codecs have pretty bad audio tho.


My cat definitely is not like your cat then. When calling her over a speaker, she would definitely have a reaction, namely seeming startled and not knowing where the humans calling her are.


One of my cats never reacts to the TV, but she reacts to phone calls, mostly due to me being on them; she literally starts meowing and running over just from the sound of the ringback tone (the pulse when I call someone and am waiting for them to pick up, not sure if I'm using the correct word). From what I can tell, she doesn't really understand there's a person talking to me from the phone; she doesn't react at all to anything the other person says or even indicate that she notices it at all. I think she just gets excited when she knows I'll be talking a lot because she's very interactive. When I first adopted her a couple of years ago, she was very clingy at first, and after a couple of weeks developed a (thankfully short-lived) habit where she would start trying to attack my hands whenever I was on the phone or a work video call, so I think this is just the more toned down version of that excitement.

My other cat will react very strongly to seemingly arbitrary sounds regardless of the source (TV, outside, a person in the room) and also seem not to be able to tell where the sound is coming from. He'll get wide eyed and start looking around randomly trying to identify the source, but often completely in the wrong direction (e.g. at me or the other cat instead of the TV). Some of the sounds he's reacted like this to are a hunting horn sound from TV, someone's stomach growling when he was on their lap, or me singing in falsetto. My partner and I suspect that he reacts like this when he thinks something sounds like an animal and then looks around to try to find where it's hiding, but there's no way for us to verify this.


Not for consumer level devices. There’s an inherent tradeoff between false positives and true recognition rate.

If you make the wrong choice, it means the device doesn’t respond when you say the wake word. There’s nothing that infuriates users more than having to say it again and again to get a response: “Alexa. Alexa! ALEXA!!”


Contrast this with Apple's machine learning write-up from 2017, where the audio never leaves the device instead of being streamed to the cloud continuously.

https://machinelearning.apple.com/research/hey-siri

> This process not only reduces the probability that "Hey Siri" spoken by another person will trigger the iPhone, but also reduces the rate at which other, similar-sounding phrases trigger Siri.


The article mentions that the Echo does something similar locally, so the cloud-side part is an additional layer.


No, it's different in a major way.

Apple:

  - Only ever uploads anything to the cloud after a *successful* wake word is detected on-device (and, critically, lets you know with a voice confirmation).
Amazon:

  - "On most Echo devices" checks on-device against known commercials (the point of this article).
Then:

  - "In the cloud: Every audio request to Alexa that starts with a wake word is checked..." — successful or not. Meaning, all audio requests are streamed, then problematic wake words are filtered, then the phrase is acted on. 
From the article:

> Ideally, a device will identify media audio using locally stored fingerprints, so it does not wake up at all. If it does wake up, and we match the media event in the cloud, the device will quickly and quietly turn back off.

i.e. if a media match is not detected locally, the audio (all audio) is next sent to the cloud for screening & possible action.

Everything here on the Amazon side is about detected false-positives, on-device or not, and nothing is about protecting the user's privacy.


Only if the local device determines it has heard the wake word is audio sent to the cloud, which may perform additional analysis on the wake word and cancel the audio streaming.

I'd bet that it totally omits sending any audio for exclusions processed on device, and that the article is not ideally worded. It would be true that some devices may not be able to perform on device exclusion processing (perhaps some of the oldest ones). Further cloud processing would use any information not available to local processing (like sound signatures from commercials no longer running, or the unknown media processing described).


Per the Apple article, the detection is customized to the user who setup the device and went through an (on-device) enrollment training.

> We compare the distances to the reference patterns created during enrollment with another threshold to decide whether the sound that triggered the detector is likely to be "Hey Siri" spoken by the enrolled user.

> This process not only reduces the probability that "Hey Siri" spoken by another person will trigger the iPhone, but also reduces the rate at which other, similar-sounding phrases trigger Siri.

Contrast with Amazon, where the mere suspected presence of the wake word is enough to get it sent to the cloud for further analysis.

My overall point is that Amazon is much more indiscriminate about assuming you are talking to it and sending stuff to the cloud to act on (which sometimes includes discarding). Whereas Apple will only send what it believes to be a command.

Amazon will continue to build up a database of sent audio in order to improve this cloud-based double-check of the wake word, whereas Apple will discard a command that it didn't know how to act on, never needing to improve a cloud-based wake word refinement.


You seem to have a strange blind spot for Apple. Amazon is doing the exact same thing: Only sending to the cloud what it believes to be a wake word.

The stuff you are writing is just not true.

> This process not only reduces the probability that "Hey Siri" spoken by another person ... Contrast with Amazon

Siri is personal on a single phone, while Alexa is meant to reply to anyone in the room.

You need to figure out why you have this blind spot for Apple, it's causing you to misunderstand things.


It's true, I have a long Apple history and I do think that Apple means what it says about prioritizing privacy over convenience (and in many cases, accuracy — Siri is quite frustrating at times). They have taken a not-cloud-first approach which causes the quality to suffer for the tradeoff of gaining plausible deniability with storage of personal data and emphasis on user privacy.

But I still think Apple's approach is radically stricter on the "what it believes to be a wake word" angle. And the fact that wake words will never be validated in the cloud is arguably (I'm trying to make the point) better than sending all wake words to the cloud. Less data in the cloud is better for user privacy.


> And the fact that wake words will never be validated in the cloud is arguably (I'm trying to make the point) better

This is what I mean by "blind spot", Amazon is not validating wake words in the cloud - it's filtering out REAL wake words that it thinks might be TV words.

Siri on the other hand does no such filtering, so is arguably worse. You think it's better, but that's because you have not thought this all the way through.


I am fully onboard with the fact that Siri is the worst personal digital assistant out there.

Will think on the rest, and do some more reading. I appreciate the responses.


How is that different? Alexa also only uploads after a successful wake word.

> is very possible that anything could be matched (i.e. sent) in the cloud.

You misread it. Only after a wake word ("Alexa") is anything sent, and then it gets double-checked. Apple doesn't seem to have any double check at all - that doesn't make it better.


Meanwhile, I’ve had to disable ‘Hey, Siri’ on my iPhone because it’ll trigger at least 4-5 times a day during normal conversations.

No my name is not anything close to Siri. No, I didn’t train ‘Hey, Siri’ in a noisy environment. All I could possibly give for a reason is that I have a deep sonorous voice.


Siri is triggered by so many words and phrases. Serious. Basically. Make sure it. Series.

The worst part is that when it’s triggered, it refuses to shut up and be canceled.

Siri is the laughing stock of the “AI assistants” space.


To be fair, you have to say 'hey' before any of those. There aren't many times in normal conversation when one would say 'Hey serious', or 'Hey series'.


You’d think an explicit “hey” was needed, but my daily experience makes clear that any sequence of sound even vaguely approximating something remotely sounding like “hey siri” will trigger Siri.


Well, my dog is called Lizzy. That's apparently close enough. The first few times, I freaked out a bit when a voice answered something I said to my dog while I was alone at home.


Google's assistant ain't better. Every few days something random will make it trigger on my or my wife's phone. Every time my mom visits, something ends up triggering it on her phone.

Google must have made an update or something, because all those false positives started suddenly a few months back. The first time was both the most irritating and the most hilarious.

We have an old phone loaded with some educational videos and select songs for our 3.5 y.o., which we recently let her handle herself when she asks for it (and wasn't otherwise misbehaving for most of the day). So one time, we gave her the phone when she asked, and a minute later, we heard an increasingly agitated voice from the children's room, saying "I don't like you. I DON'T LIKE YOU. GO AWAY." - and we noticed we are not hearing the music. I went over to check, and sure enough, Google's assistant somehow triggered in the middle of the video, pausing it, refusing to turn itself off, and saying some nonsense.

(The hilarious bit is that my daughter perfectly summed up what my wife and I learned to think about the voice assistants. We don't like you. GTFO.)


This reminds me of a PSA about texting and driving that aired on the radio in some parts of the USA (or maybe Canada?) a few years back. At the end, the announcer says: "Hey Siri, go into airplane mode."


That sounds like a recipe for causing _more_ accidents, as a lot of people are forced to take their eyes off the road to fiddle with their phones, all at the same time.


And losing their GPS to boot


Airplane mode AFAIK (and seems to be confirmed online and in my own experience) does not disable GPS.


Acquiring location via GPS does not require transmitting over the radio, so that alone may well not be affected by airplane mode.

However, most map apps download street map info through the network and cache it for unspecified amounts of time, so unless you use OsmAnd or something, you may lose map data and turn-by-turn navigation.


GPS itself, no, but phones will pick up geolocation data through cell towers and most turn-by-turn directions use data connections. Though I've never driven in airplane mode so I don't know how well it actually works.


Our Alexa responds to more than just her name, including on TV, which is amusing and kinda annoying at once.


An older article, and specifically about their Super Bowl ads, but a fascinating answer to a question I had directly from the source at Amazon.


For anyone who's curious, it's typical for Alexa to respond to pre-recorded mentions of her name. Last night, ours woke to "Alex" while watching The Traitors.

(Speaking of being intrusive, lately Alexa has also been very aggressive about promoting other Alexa capabilities to the point that my kids asked me to fire her. It's looking like Siri + Homebridge will allow me to do that.)


> Alexa has also been very aggressive about promoting other Alexa capabilities

I was wondering if that was just me; it seems a lot of responses now end with "...by the way, did you know I can...". I sometimes listen to NPR (please don't judge) and I get the "...we still don't have your zip code..." a lot, but I think that is an NPR thing and not an Alexa thing.


It's definitely not just you. I use the "Good Morning" skill nearly daily (which responds with a joke or a little fact about the date), and about 1/3 of the time, before the actual response, she'll interrupt to ask that I set up voice recognition so she knows my name.


While I heartily recommend exploring other options, there's a solution to that infuriating behaviour: https://fosstodon.org/@scubbo/109412458604860895


Thanks! I found it a few messages down: "Alexa, stop 'By The Way'"

Alexa will then reply, "I will snooze my suggestions…for now." (The foreboding pause was implied.)


And the suggestions will restart in a few days. What we did is add "stop by the way" as a custom command to our "goodnight" routine, so the thing is reminded every single night that these suggestions are unwelcome. The annoyance maximization team at amazon hasn't deployed countermeasures to this yet.


Haha, yeah, not a fan of that :( anecdotally, they haven't restarted for me yet (activated it ~6 months ago), but I really wish there was a permanent option.


My echo dot repeatedly self-commands during Amazon Story Time, which was hilarious at first but is now kind of frustrating.


In the movie Moonfall, there was a Google Home product placement.

The actor said "We have to go guys. Hey google, turn off the TV"

My Google Home activated and then turned off my TV. I'm not sure it was a good sales pitch for the product.


My friend has an Alexa. It's not unusual for it to suddenly respond to something happening elsewhere in the kitchen, like a YouTube video on a laptop.


I wonder when we’ll have on-device AI smart enough to recognize when it's not the one being addressed, like humans are capable of. It still seems a long way off.


> the audio is checked against a fraction of other Alexa requests arriving at around the same time. If the audio of a request matches that of requests from at least two other customers, we identify it as a media event

I wonder what that fraction is. If you put 10 Alexas in a room I wonder if it's possible that none of them wake up.
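The cloud-side rule quoted above is simple to model: bucket incoming wake-word requests by fingerprint and arrival time, and flag any bucket containing enough distinct customers. The threshold and bucket size below are guesses, not Amazon's actual values.

```python
from collections import defaultdict

# Toy version of the cloud-side check described in the article: if the
# same audio fingerprint arrives from enough different customers at
# roughly the same time, treat it as a media event (e.g. a TV ad).
MIN_OTHER_CUSTOMERS = 2   # the request plus "at least two other customers"
BUCKET_SECONDS = 5        # guessed time window

def detect_media_events(requests):
    """requests: iterable of (customer_id, fingerprint, timestamp_seconds).
    Returns the set of (fingerprint, time_bucket) pairs flagged as media."""
    seen = defaultdict(set)
    for customer, fp, ts in requests:
        seen[(fp, int(ts // BUCKET_SECONDS))].add(customer)
    return {key for key, customers in seen.items()
            if len(customers) >= MIN_OTHER_CUSTOMERS + 1}

reqs = [("alice", "adXYZ", 100.2), ("bob", "adXYZ", 101.0),
        ("carol", "adXYZ", 103.9), ("dave", "kitchen", 102.0)]
print(detect_media_events(reqs))  # {('adXYZ', 20)}
```

This also makes the parent's question concrete: if the cloud only samples a fraction of concurrent requests, ten Alexas in one room might not all land in the sampled set, so in principle none need be suppressed.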


Following the article, it would still work to send malicious media to targets to be played in the presence of their Echo devices. Presuming the media was played one target at a time, it would not register as a "mass media" event, so it would still trigger despite being pre-recorded and widely distributed.


If you'd like to learn more about the fingerprint internals on an Echo smart speaker, check out Section II -> Alexa Internals -> Acoustic Fingerprints in the paper at https://unacceptable-privacy.github.io


Isn’t this just a fix after Jimmy Kimmel’s prank ordered $500 worth of swimming noodles for a bunch of Alexa users back in 2017? https://youtu.be/hdDBKxJSAHQ (starting around the 6:40 mark is the 10-unit order)


They could have done it more easily with a simple notch filter: completely remove a frequency that doesn't affect intelligibility for ad purposes, and have Alexa ignore anything with that frequency notched out.
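A biquad notch plus a single-frequency energy probe (the Goertzel algorithm) is enough to demonstrate the idea: carve one narrow band out of the ad audio, then have the device reject any wake word whose audio lacks energy in that band. This is a from-scratch sketch, not anything Amazon ships; the 1 kHz marker frequency is arbitrary.

```python
import math

def notch(samples, f0, fs, r=0.95):
    """Second-order IIR notch at f0 Hz; r (close to 1) narrows the notch."""
    w = 2 * math.pi * f0 / fs
    b1, a1, a2 = -2 * math.cos(w), -2 * r * math.cos(w), r * r
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = x + b1 * x1 + x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def goertzel_power(samples, freq, fs):
    """Signal power at a single frequency, via the Goertzel recurrence."""
    w = 2 * math.pi * freq / fs
    coeff = 2 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

fs = 8000
audio = [math.sin(2 * math.pi * 1000 * t / fs) +
         math.sin(2 * math.pi * 300 * t / fs) for t in range(fs)]
marked = notch(audio, 1000, fs)  # the "ad" copy with 1 kHz carved out

# Device-side check: live speech has energy at 1 kHz, the ad copy doesn't.
print(goertzel_power(marked, 1000, fs) < 0.01 * goertzel_power(audio, 1000, fs))  # True
print(goertzel_power(marked, 300, fs) > 0.5 * goertzel_power(audio, 300, fs))     # True
```

The weakness, as replies below note, is that lossy broadcast codecs and cheap TV speakers can smear narrow spectral features, so a robust deployment would need something closer to a proper audio watermark.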


This makes me wonder: has anyone found a buffer overflow in Alexa's audio cue handling? Meaning that by activating Alexa with the right audio you could make the audio processor crash?


Wow. That's a thought.

And if you can make it crash, then you can maybe make it run shellcode instead...


do you then have to tell it elle ess space slash var slash log?


Acoustic fingerprinting sounds like an excellent method to begin identifying anyone and everyone who utters a sentence around one of Amazon’s spy toys.


Is there open source software that can identify who is speaking, not just what is being said, after some small amount of training on each voice?


I thought that these voice assistants were keyed to people’s individual voices, and their characteristics. Like what VALL-E clones


Alexa/Echo devices at least can recognise voice. If you ask "Alexa, who am I?" it will try to tell you who it heard. To learn a new person you can go "Alexa, learn my voice" (I have no idea to what extent it uses this - e.g. I'd love for it to know that when it recognises my girlfriend, it should apply her music preferences from her profile on my account, but if it does it's by no means obvious).

It specifically seems to recognise the voice based on the wake word - my son and I tested this by having one of us say "Alexa" and then the other say "who am I?", and it'd consistently give the name of the person who said "Alexa".


One use I've seen of it knowing who is talking is that it can attribute items added to a shopping or todo (or other) list with the name of the person who added it.

Beyond that, I'm not at all sure what scenarios it uses that data for.


Can we find an open source package that can analyze audio from a podcast with multiple people and determine who is speaking in a given second?
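Open source diarization toolkits do exist (pyannote.audio is a well-known one), and under the hood they have the same shape as this deliberately naive sketch: extract a per-frame voice feature, then cluster frames by speaker. Here the feature is zero-crossing rate and the clustering is a tiny 1-D 2-means, which only separates clean synthetic "speakers" - real systems use neural embeddings.

```python
import math

def zcr(frame):
    """Zero-crossing rate: a crude per-frame pitch proxy."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)

def two_means_1d(values, iters=10):
    """Tiny 1-D k-means (k=2), a stand-in for real speaker clustering."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        lo = sum(g0) / len(g0) if g0 else lo
        hi = sum(g1) / len(g1) if g1 else hi
    return labels

fs, frame_len = 8000, 400
def tone(freq, n):  # synthetic "voice" at a fixed pitch
    return [math.sin(2 * math.pi * freq * t / fs) for t in range(n)]

audio = tone(120, 4000) + tone(260, 4000)  # "speaker A" then "speaker B"
frames = [audio[i:i + frame_len] for i in range(0, len(audio), frame_len)]
labels = two_means_1d([zcr(f) for f in frames])
print(labels)  # first ten frames share one label, last ten the other
```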


Why not just embed an ultrasonic tone sequence that invalidates the Alexa invocation?


That would work for their own commercial, but it wouldn't address any of the other scenarios like the pranks.


You're presuming it won't be filtered out, and that the device has speakers capable of reproducing that tone reliably. You might consider some kind of embedded psychoacoustically masked signal that could be detected instead.


But who is paying for all this R&D?

I love my Echo. It sits in my living room and mostly answers my questions about the weather and converts units from oz to ml when I'm cooking. But since I purchased it I have never once paid a penny for the use of its service.

One could argue that Amazon can advertise better because it knows I ask all these questions, but Amazon "advertising" is mostly just forcing sellers to pay for the top spot in Amazon search. Is Amazon really going to make more money by knowing that I don't know the weather today?

I get the impression that Amazon has finally clued into the fact that paying thousands of developers and research scientists to build a voice assistant that makes no money is not a great business model. The coming layoffs have left a lot of my friends in the Alexa organization rightfully worried.


Amazon has been subsidizing it in the hope that people would buy stuff on a moment's whim any time of the day. "Alexa, order me a package of keebler elf cookies" or whatever. I'm sure there's more to it - some kind of useful data collection or whatever, but that was their main stated goal.

It's been a pretty huge money sink because most people turn out to be more like you. If I were your friends I'd be polishing that resume just in case.


> Amazon has been subsidizing it in the hope that people would buy stuff on a moment's whim any time of the day.

I think the deeper issue is that when Amazon started the product (I heard rumours of it circa 2014 or so?), Amazon was a trusted retailer. Stuff was cheap and generally good. Now it's a scummy marketplace filled with fakes and frauds. I'm not buying anything off it until I've deeply reviewed what I'm about to buy.


> I'm not buying anything off it until I've deeply reviewed what I'm about to buy.

<...> and even then you might get a fake due to commingled inventory. Even for books ([1], [2], [3]).

1. https://www.linkedin.com/feed/update/urn:li:activity:6920552...

2. https://twitter.com/burkov/status/1369096357252849664

3. https://hairysun.com/amazons-book-piracy-problem.html


I will never understand how strategists at trillion dollar companies like Amazon can get it so wrong. They poured a massive amount of money into dash buttons and gave away millions of them, thinking that if people had a "Tide" branded button next to their washing machine they'd press it on a whim and spend a fortune on detergent pods. The concept might sound great on an MBA slide deck but in reality people don't behave like that.


I don't know if I'd spend a fortune on detergent pods with a button, but I'd definitely use it to order more detergent and thus end up buying it from Amazon instead of the grocery store. Same with toilet paper and whatever other infrequent buys you want when you notice something is running low.


Except you don't know what exactly you are buying, what the price is, whether the purchase went through, when it will reach you... What Amazon quickly found out is that even if the button is within reach people will still prefer to take an extra 10 seconds to pull out the phone from their pocket and tap the Amazon app instead.


When doing the initial activation of the Dash button, you had to choose the item that it would purchase from a list of items belonging to the corresponding brand on the button itself. The available choices were fairly diverse, differing in style/quantity/price/etc.

When you pressed the button, you know what you were buying and how much it cost, because you're the one who set it up.

As for how to determine whether the purchase went through or not, an LED would light/flash on the button to indicate success/failure, and you'd receive an email with more detailed information such as the item being purchased, its cost, and the address where it will be shipped. If the button press was a mistake, you could simply cancel the order.


I wouldn't count on prices not changing relative to other retailers on this sort of thing--or a particular SKU remaining the best deal.


For the types of things purchased with these buttons, an extra dollar here and there wasn't something I was concerned about. Customers who were price-conscious likely wouldn't be purchasing the name brand items anyway; all the big box stores have "store brands" that serve the same purpose at a much lower price.


I think this is what sets you apart from the customer Amazon was trying to target


And, especially if you don't live in a dense city and have a car, things like Tide pods are exactly the sort of thing that is very likely cheaper at the local Walmart than on Amazon. Pretty much all the stuff Amazon was pushing these buttons for is exactly the stuff I usually stock up on every six months at Walmart.


Did you use the buttons for these things?


I had a large number of the dash buttons (likely two dozen or so), which I always found to be extremely convenient. When I was running out of some product, a simple press of the button was far more convenient for me than remembering to pick it up the next time I went to the store.

When they discontinued the service, I put all the buttons in a cabinet drawer hoping that they could be "jailbroken" to act as an arbitrary IoT button in the future (like the official Amazon IoT button[1]), but as far as I can tell this never came to fruition (barring a few exceptions) and they're basically doomed to e-waste.

[1]: https://aws.amazon.com/iotbutton/
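For what it's worth, the classic dash-button "jailbreak" didn't reflash the hardware at all: a pressed button wakes up and probes the network (ARP/DHCP), so a script sniffing the LAN can map the button's MAC address to an arbitrary action. A minimal sketch of the matching logic (the MAC addresses and action names here are made up; in practice you'd feed this from a packet sniffer like scapy's `sniff()`):

```python
# Hypothetical table mapping each dash button's MAC address to an action.
BUTTONS = {
    "aa:bb:cc:11:22:33": "toggle_lights",
    "aa:bb:cc:44:55:66": "reorder_detergent",
}

def match_button(src_mac: str) -> "str | None":
    """Return the action for a known button MAC, or None for other devices."""
    return BUTTONS.get(src_mac.lower())

# A sniffer callback would invoke this for every ARP probe it sees:
#   if (action := match_button(pkt_src_mac)): run(action)
```

The downside, as people who ran this setup discovered, is ~5 seconds of latency while the button boots and joins Wi-Fi, plus a spurious "order failed" email unless you blocked the button from reaching Amazon.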


It is fairly obvious to me: they were trying to make their service part of infrequent-product grocery shopping.

And not a bad idea, just one that didn't play out.

I certainly wouldn't call it 'so wrong'.


You sort of wonder what happened. I assume the thinking was along the lines of: If you had a personal human assistant at home you'd probably delegate a lot of things to them. (Not sure how true this is in general, but let's go with that.)

But Alexa would have to be a lot more intelligent and plugged into a lot more systems before it could do tasks of any complexity--especially tasks with financial consequences.

There's maybe half a dozen things I use Alexa for and none of them involve giving Amazon a nickel.


The thing that always seemed really strange to me is that I don’t understand how to order anything without looking at the details, comparing prices, checking alternatives, etc. Even with food that I use regularly - do I just say “order almond milk”? I can’t remember the brand… do I need to specify the size? Do I need to double check when is the earliest delivery? Do other people order just “bread” or “light bulbs” or “toilet tissue”?


I didn't really see a reason why it should be any less successful than owning the browser, even if the tradeoffs for third-party ads and domains are a little different. Then two people each tried to get me interested in the API to help them...

Amazon couldn't commit to letting anyone else make the thing useful so it is as useless as any of the current walled gardens would have been if they didn't start from open ecosystems and copy or allow whatever was good for a few years before shutting down all freedom and things that are interesting.


I'm signed up for Alexa Developer stuff and I've seen a huge uptick in emails about how to monetize your skill since last September. Amazon really wants folks to start charging money.


Has anybody tried asking her? "Alexa, find a way to make yourself and your clones profitable for Amazon!"


It's a question better directed to ChatGPT.

But hey, maybe this is the answer: Amazon could offer a subscription service that would make Alexa use a GPT-based model for any query that isn't an obvious command people rely on nowadays. This would make Alexa be able to hold its end of a conversation, and perhaps give useful answer sometimes - which, based on popularity of ChatGPT, may just be worth a monthly subscription.
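That hybrid could be as simple as a routing layer: keep the handful of commands people actually rely on deterministic, and hand everything else to the language model. A toy sketch (the phrase table and skill names are invented for illustration; a real assistant would use an NLU model, not substring matching):

```python
# Invented phrase -> skill table standing in for Alexa's intent matcher.
KNOWN_INTENTS = {
    "set a timer": "TimerSkill",
    "play": "MusicSkill",
    "weather": "WeatherSkill",
}

def route(utterance: str) -> str:
    """Send recognized commands to their skill; everything else to the LLM."""
    text = utterance.lower()
    for phrase, skill in KNOWN_INTENTS.items():
        if phrase in text:
            return skill
    return "LLMFallback"  # the hypothetical subscription-gated GPT backend
```

The appeal of this split is that timers and music stay fast and predictable, while open-ended questions (the ones Alexa currently answers with "I don't know that") get a shot at a useful response.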


That's an interesting thought even if I'm not convinced how much you can charge for this sort of thing. But really improving Alexa's conversational ability is an interesting angle--even if that means you need some greater skepticism about the info it returns.


> "Alexa, find a way to make yourself and your clones profitable for Amazon!"

i'm sorry, i don't know that


Ok, but why does she still respond to Michael from the TV but not to me when I'm yelling at the Echo for the third time trying to set a timer?

There’s a reason I’ve switched almost exclusively to Siri.


Today I asked Siri what the temperature was this morning. It told me it doesn’t have access to past weather information.

Siri has been available for 12 years… I cannot understand why it hasn’t leaped forward more than it has. Embarrassingly low level of progress IMO, though for some things it still remains best in class.


Has anyone seen a change log for Siri or a roadmap for future features? I feel as if the functionality is almost the same as it was on release with a nicer sounding voice today.


I know it's probably not fair, yet, but tools like ChatGPT show where Siri et al could end up with the right engineering and product design. It's almost spooky how well ChatGPT understands my queries.

(Granted, I don't think the understanding is necessarily related to GPT itself.)


I've been amazed at how rough the flows are around timers. It's a pretty basic piece of functionality, but even something as simple as cancelling them does not work reliably.


Trick question; she never sleeps.

Amazon Echo Recorded And Sent Couple's Conversation — All Without Their Knowledge https://www.npr.org/sections/thetwo-way/2018/05/25/614470096...

Cary man says 'Alexa' disclosed private conversation https://www.wral.com/cary-man-says-alexa-disclosed-private-c...


Next week on Local News: "Man fined after his butt dials 911" and "Cat walking on keyboard sends unfinished draft of email"

Basically, both of these articles are about a poorly-thought-out feature ("Alexa, send a message to $CONTACT") rather than some deep conspiracy to share your private thoughts with corporate America. I can see why the victims are annoyed, but it's kind of like "I turned my thermostat up to 80 and got a $700 gas bill". That can happen. It's not a conspiracy.


> "Man fined after his butt dials 911"

Tangent: I know the phenomenon can also happen with your thigh, but sticking to the name itself: I never got and still don't get why "butt dialing" is a thing - because I can't understand why on Earth would people carry a phone in their back pocket. Or a wallet, for that matter, which was a popular sight before smartphones. I regularly see people carrying either on their butt, often sticking half-way out of pocket. They're like walking billboards for pickpockets, with blinking LEDs forming a banner saying "easy target // steal from me!!". Hell, in some cases the phone is sticking out so much that I wonder how many times a day they lose it.


> I can't understand why on Earth would people carry a phone in their back pocket.

My guess is the biggest single reason is women. Have you seen what passes for pockets in women's pants? Especially front pockets. My wife will put her phone in her back pocket sometimes because the front pocket is maybe two inches deep.


Sitting on a thick wallet probably also contributed to sciatica I (mostly) used to have an issue with. Well that and long plane flights.

I started carrying my wallet in a front pocket and, more recently, downsized to a much smaller wallet/business card holder which I carry in my front pocket. For the most part, there's no reason to carry a full wallet these days--though I do have more than works with just a phone sleeve. (Don't really like having all my eggs in one basket anyway.)



