Why Alexa won't wake up when hearing "Alexa" in Amazon's Super Bowl ad (2019) (amazon.science)
100 points by xuaihua on Jan 16, 2023 | 120 comments


The real Alexa product is the wealth of data the device can provide through 24-hour surveillance. The shaky voice recognition/command model is merely the Trojan horse.

Amazon now knows who is watching the Super Bowl and will treat you as a consumer they can advertise to.

Alexa skill developers were sold a promise of platform promotion, quality, and customer interaction. The whole consumer experience for adding custom Skills is fairly arbitrary and broken. You’ll be lucky to make $20/month from ISPs (in-skill purchases) and the other dead-end upsells your customers/users will ignore.


This keeps getting repeated all over the internet but is simply not true. There may be a day when Amazon turns Alexa into advertising spyware, but that definitely wasn't its original intention and nothing of the sort is happening today.


How would you know when they do this? Buried inside a "We've made some updates to our Terms of Service" email?


There are a lot of people who run proxies like pi-hole while using these devices. It only takes one of those people to notice a sudden increase in traffic and post that finding somewhere.

If you discover that Alexa is uploading information that it shouldn't be and blog about it you can be almost guaranteed to hit the front page of HN, these kinds of posts appear from time to time: https://news.ycombinator.com/item?id=9447080


>It only takes one of those people to notice a sudden increase in traffic and post that finding somewhere.

Not if they perform speech-to-text on-device and send the parsed results (only a few kB). If you really want to keep things hidden you can perform all the recognition/inference on-device and only send the topic (e.g. [1]), which is only a few bits of information.

[1] https://github.com/patcg-individual-drafts/topics/blob/main/...
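As a toy illustration of how little would have to leave the device under that scheme: if recognition and topic inference both run locally, the wire payload can be a single index into a fixed taxonomy. The topic names and keyword lists below are made up for the sketch.

```python
# Hypothetical sketch: speech recognition and topic inference happen
# on-device; only a tiny topic index ever goes over the wire.
TOPICS = ["pets", "cars", "sports", "cooking"]  # made-up taxonomy
KEYWORDS = {0: {"cat", "dog"}, 1: {"car", "engine"},
            2: {"game", "score"}, 3: {"recipe", "oven"}}

def classify(transcript: str) -> int:
    """Return the index of the topic with the most keyword hits."""
    words = set(transcript.lower().split())
    return max(KEYWORDS, key=lambda t: len(words & KEYWORDS[t]))

payload = classify("my cat knocked the water bowl over").to_bytes(1, "big")
print(TOPICS[payload[0]], len(payload))  # pets 1
```

One byte on the wire (or a few bits, with a smaller taxonomy) versus kilobytes of audio or transcript - which is the point: traffic monitoring would see almost nothing.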


While plausible on paper, it's not practical unless they jam an order of magnitude or two more compute into the devices. To get reasonable accuracy (i.e., enough to be able to use for profit) from any casual speech, the current models run far from realtime on a modern MacBook. You're not going to squeeze reasonable accuracy from the tiny processor on the devices in the world today, even if you record and process async as a way to hide from people inspecting traffic.

Edit: it's worth noting that this would dramatically increase the cost of the device. If they eat the additional hardware cost, they'd need to see a way to recoup it. But that's silly for a company that's literally in the business of cloud computing, and where the goal of the hardware is to hide what you're doing. When will people start asking why there's a full GPU in their Echo?


> While plausible on paper, it's not practical unless they jam an order of magnitude or two more compute into the devices. To get reasonable accuracy (i.e., enough to be able to use for profit) from any casual speech, the current models run far from realtime on a modern MacBook. You're not going to squeeze reasonable accuracy from the tiny processor on the devices in the world today, even if you record and process async as a way to hide from people inspecting traffic.

Do you really need 100% accuracy here? This isn't like cops setting up a wiretap. Google isn't waiting for you to slip up and admit that you like funko pops or whatever. If you're constantly talking about your cat, or wanting to get a car, that's all they need to target ads to you.

Also, the processing doesn't have to be real time. It doesn't matter that google learns about your cat 8 hours late because the device is running its ML models in the background while you're asleep. If the device picks up 3 hours of speech per day, it only needs to process at 1/8x speed to catch up. On the off chance you have a house party and it's picking up 6 hours of speech, it can always buffer it for later, or drop it altogether (see above paragraph about how it doesn't need to pick up everything).
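The catch-up arithmetic above is easy to sanity-check (all numbers are from the comment, not from any real device):

```python
# Back-of-the-envelope: what fraction of realtime must background
# transcription run at to clear a day's backlog of captured speech?
def required_realtime_fraction(speech_hours, processing_hours=24):
    """Hours of captured speech divided by hours available to process it."""
    return speech_hours / processing_hours

print(required_realtime_fraction(3))  # 0.125, i.e. 1/8x speed
print(required_realtime_fraction(6))  # 0.25 - a house party still fits
```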


It kind of does matter, actually. Lots of English words (and words in other languages!) sound the same. A cat lover who starts getting ads for baseball bats and fat-loss pills isn't going to convert. Context matters, too, not just matching words. If I start talking about "my dear father" and get ads for tractors and hunting gear, I'm not going to convert.

Advertisers aren't going to pay for random spoken keywords anyway. They're going to pay to target people by demographic and interest. Things _about_ you, not things you're talking about. Just because I mentioned tampons doesn't mean I'll ever buy a box of tampons (I simply lack the anatomy). And if you start building a profile about somebody based on poorly-overheard bits of speech, you're building a castle on bad foundations. The data is bunk.

Just having a TV or radio on near the device will have suddenly poisoned the data.

> If the device picks up 3 hours of speech per day, it only needs to process at 1/8x speed to catch up.

The Echo currently has a 32-bit processor that is designed to be pretty minimal. OpenAI Whisper tiny runs at about 2/3 of realtime, and that's with a 6-core ~2.3 GHz laptop processor. The CPU in the Echo runs at 0.6-1 GHz, and the system is not designed for general-purpose computing. I don't have the ability to benchmark it, but you're not going to get close to 1/8 with the Echo hardware.


Pihole is a dns solution. It wouldn’t notice any uptick in actual data transferred.


I mean, it's quite easy to monitor a device's network usage. You would be able to see Alexa uploading tons of data when not in use.


The average person may not read the terms of service update emails but there are plenty of lawyers/journalists/tech bloggers that do. A company like Amazon can't simply sneak in a "Alexa will now listen to and use everything you say around it for advertising" clause without anyone noticing.


Why sell advertising if you can lease out a backdoor to the NSA and other (international) agencies?


Because Amazon already operates a very effective advertising business, and this could be a hugely valuable source of data for it.


I read this in Dale Gribble's voice.


Found the Amazon shill.


Could you please review https://news.ycombinator.com/newsguidelines.html and stick to the site rules when posting here? You've unfortunately been breaking them repeatedly.


Far more people have smart phones using one of 2 OSes, and unlike smart home devices, people generally take these with them wherever they go - to the store, to the bathroom, to sleep, to the doctors office, etc. Do you think cell phones are trojan horses too?


You don't?

* How many apps ask for location data and background location data, when the use case for such permissions are questionable.

* Look at how much FB's revenue decreased when its consumer tracking efforts were given the smallest roadblock on iOS.

* Law enforcement has access to a variety of tools that track individuals based on their cellphones. [0]

* Google Maps requires an incredible amount of connectivity in order to do simple navigation. [1]

* We still don't have end-to-end encryption for basic communication protocols such as SMS.

* The entire Tik-tok controversy highlights the risks associated with tracking user behavior.

* How much personal data was compromised by cloud based backups? How many nudes leaked, or criminal activity proven?

[0] https://en.wikipedia.org/wiki/Stingray_use_in_United_States_...

[1] https://news.ycombinator.com/item?id=30167865


> How many apps ask for location data and background location data, when the use case for such permissions are questionable.

On Android, anything that can be used to deduce location, including Bluetooth, WiFi, etc., is lumped into the location permission, which can be answered with "only while using the app" or "only once".

> We still don't have end-to-end encryption for basic communication protocols such as SMS.

Because SMS is a legacy protocol that needs to maintain backwards compatibility, bolting on encryption is extremely complex for practically no gain - chat messaging apps with better UX, and sometimes even end-to-end encryption, are a dime a dozen.


>* Look at how much FB's revenue decreased when it's consumer tracking efforts were given the smallest roadblock on iOS.

How much did it drop? By "smallest roadblock on iOS", I'm assuming you mean App Tracking Transparency, which was introduced in iOS 14.5 (released April 26, 2021). According to this graph[1], Facebook's revenue for Q2 2021 was up from the previous quarter.

[1] https://www.statista.com/statistics/277963/facebooks-quarter...


I certainly think phones have more spying potential than Alexa/smart devices. But my point is very few people bring up "but what about spying?" whenever smartphones are discussed, with the implication that these devices should be avoided because of that.


They certainly do make for easy tracking down to <1m. So they make a good target for surveillance states.


> You’ll be lucky to make $20/month from ISPs (in-skill purchases) and the other dead-end upsells your customers/users will ignore.

I agree with this point. It's basically impossible to monetize as a third party on Alexa. The closest thing happening is Spotify, Audible, and other audio content, but it would be hard to pinpoint how much of their sales/ads are attributable to Alexa and how much would have happened on another platform if Alexa was unavailable. It's rare to have someone trigger even a $3 transaction via Alexa.


So they’re training Alexa not to recognize the specific recording. I’m wondering if this has any side effects for the actors from those ads: if they have an Alexa at home, does it (sometimes) fail to recognize them?

(And wouldn’t it be cool, if somewhat dystopian, if Alexas all over the US quipped “hey, that’s my big sister on the TV!”?)


Ads usually have music in the background, and that gets included in the fingerprint too. Besides that, the fingerprint consists of a number of time slices.

The actor's personal Alexa will probably only fail if they use it with exactly the same millisecond-scale timing, exactly the same intonation, and exactly the same background noise.
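Amazon hasn't published its fingerprinting algorithm, but the general idea of a time-sliced fingerprint that matches with tolerance (rather than exact equality) can be sketched with a deliberately crude energy-contour scheme - everything below is invented for illustration:

```python
import math
import random

def slice_energies(samples, slice_len=256):
    """RMS energy of each fixed-length time slice."""
    return [math.sqrt(sum(x * x for x in samples[i:i + slice_len]) / slice_len)
            for i in range(0, len(samples) - slice_len + 1, slice_len)]

def fingerprint(samples):
    """One bit per slice: did the energy rise (1) or fall (0)?"""
    e = slice_energies(samples)
    return [1 if b > a else 0 for a, b in zip(e, e[1:])]

def similarity(fp_a, fp_b):
    """Fraction of agreeing bits - match on most slices, not all of them."""
    n = min(len(fp_a), len(fp_b))
    return sum(a == b for a, b in zip(fp_a, fp_b)) / n

random.seed(0)
clip = [(2 + math.sin(t / 500)) * math.sin(2 * math.pi * 440 * t / 8000)
        for t in range(8000)]                        # a 1 s "ad" clip
noisy = [x + random.gauss(0, 0.05) for x in clip]    # same clip, room noise
other = [random.gauss(0, 1) for _ in range(8000)]    # unrelated audio

print(similarity(fingerprint(clip), fingerprint(noisy)))  # high: same clip
print(similarity(fingerprint(clip), fingerprint(other)))  # near 0.5: unrelated
```

Real systems use spectral landmarks rather than raw energy, but the tolerance principle is the same, which is why slightly different playback conditions can still match the stored fingerprint.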


If it were like that, then a home with some other source of noise (another music stream, someone speaking loudly, etc.) during the commercial would trigger the hotword. I guess the fingerprinting has to have some sort of tolerance anyway.


Possibly, but a system like this would also be time-boxed (turned on for the day of the super bowl or just a few hours).


I wonder what sort of technical debt the Alexa team is racking up by putting in hacks like this.


I wonder if there's a way to identify audio from speakers. It might not work for higher-end speakers or lower-end microphones, but I'm sure the limited frequency range of speakers, and perhaps artifacts of audio compression that might not be noticeable to humans, would identify non-human audio. I've seen a similar approach used for visuals, separating an image at the point of recording based on the type of light source – artificial light having different properties to natural light.


It always seems like my cat can tell what's from a speaker and what's real. My wife calling to the cat over the phone never triggers any reaction, yet the cat will often jump at a little noise in the kitchen or outside even when a movie is playing.

I don't know how much of that is directionality and how much is the sound itself. Cats probably hear different frequencies than humans, but as you point out it wouldn't be too hard to make smart devices' microphones sensitive to those frequencies that our devices' speakers don't emit correctly (because humans don't hear them).


Interestingly, this was true for my cat until we tried a device with beamforming - a HomePod. It will convince her there is a bird/cat/thing in the room.


Phone voice codecs have pretty bad audio tho.


My cat definitely is not like your cat then. When calling her over a speaker, she would definitely have a reaction, namely seeming startled and not knowing where the humans calling her are.


One of my cats never reacts to the TV, but she reacts to phone calls, mostly due to me being on them; she literally starts meowing and running over just from the sound of the ringback tone (the pulse when I call someone and am waiting for them to pick up, not sure if I'm using the correct word). From what I can tell, she doesn't really understand there's a person talking to me from the phone; she doesn't react at all to anything the other person says or even indicate that she notices it at all. I think she just gets excited when she knows I'll be talking a lot because she's very interactive. When I first adopted her a couple of years ago, she was very clingy at first, and after a couple of weeks developed a (thankfully short-lived) habit where she would start trying to attack my hands whenever I was on the phone or a work video call, so I think this is just the more toned down version of that excitement.

My other cat will react very strongly to seemingly arbitrary sounds regardless of the source (TV, outside, a person in the room) and also seem not to be able to tell where the sound is coming from. He'll get wide eyed and start looking around randomly trying to identify the source, but often completely in the wrong direction (e.g. at me or the other cat instead of the TV). Some of the sounds he's reacted like this to are a hunting horn sound from TV, someone's stomach growling when he was on their lap, or me singing in falsetto. My partner and I suspect that he reacts like this when he thinks something sounds like an animal and then looks around to try to find where it's hiding, but there's no way for us to verify this.


Not for consumer level devices. There’s an inherent tradeoff between false positives and true recognition rate.

If you make the wrong choice, it means the device doesn’t respond when you say the wake word. There’s nothing that infuriates users more than having to say it again and again to get a response: “Alexa. Alexa! ALEXA!!”


Contrast this with Apple's machine learning write-up from 2017, where the audio never leaves the device instead of being streamed to the cloud continuously.

https://machinelearning.apple.com/research/hey-siri

> This process not only reduces the probability that "Hey Siri" spoken by another person will trigger the iPhone, but also reduces the rate at which other, similar-sounding phrases trigger Siri.


The article mentions that the Echo does something similar locally, so the cloud-side part is an additional layer.


No, it's different in a major way.

Apple:

  - Only ever uploads anything to the cloud after a *successful* wake word is detected on-device (and, critically, lets you know with a voice confirmation).
Amazon:

  - "On most Echo devices" checks on-device against known commercials (the point of this article).
Then:

  - "In the cloud: Every audio request to Alexa that starts with a wake word is checked..." — successful or not. Meaning, all audio requests are streamed, then problematic wake words are filtered, then the phrase is acted on. 
From the article:

> Ideally, a device will identify media audio using locally stored fingerprints, so it does not wake up at all. If it does wake up, and we match the media event in the cloud, the device will quickly and quietly turn back off.

i.e. if a media match is not detected locally, the audio (all audio) is next sent to the cloud for screening & possible action.

Everything here on the Amazon side is about detected false-positives, on-device or not, and nothing is about protecting the user's privacy.


Only if the local device determines it has heard the wake word is audio sent to the cloud, which may perform additional analysis on the wake word and cancel the audio streaming.

I'd bet that it totally omits sending any audio for exclusions processed on device, and that the article is not ideally worded. It would be true that some devices may not be able to perform on device exclusion processing (perhaps some of the oldest ones). Further cloud processing would use any information not available to local processing (like sound signatures from commercials no longer running, or the unknown media processing described).


Per the Apple article, the detection is customized to the user who setup the device and went through an (on-device) enrollment training.

> We compare the distances to the reference patterns created during enrollment with another threshold to decide whether the sound that triggered the detector is likely to be "Hey Siri" spoken by the enrolled user.

> This process not only reduces the probability that "Hey Siri" spoken by another person will trigger the iPhone, but also reduces the rate at which other, similar-sounding phrases trigger Siri.

Contrast with Amazon, where the mere suspected presence of the wake word is enough to get it sent to the cloud for further analysis.

My overall point is that Amazon is much more indiscriminate about assuming you are talking to it and sending stuff to the cloud to act on (which sometimes includes discarding). Whereas Apple will only send what it believes to be a command.

Amazon will continue to build up a database of sent audio in order to improve this cloud-based double-check of the wake word, whereas Apple will discard a command that it didn't know how to act on, never needing to improve a cloud-based wake word refinement.


You seem to have a strange blind spot for Apple. Amazon is doing the exact same thing: Only sending to the cloud what it believes to be a wake word.

The stuff you are writing is just not true.

> This process not only reduces the probability that "Hey Siri" spoken by another person ... Contrast with Amazon

Siri is personal on a single phone, while Alexa is meant to reply to anyone in the room.

You need to figure out why you have this blind spot for Apple, it's causing you to misunderstand things.


It's true, I have a long Apple history and I do think that Apple means what it says about prioritizing privacy over convenience (and in many cases, accuracy — Siri is quite frustrating at times). They have taken a not-cloud-first approach which causes the quality to suffer for the tradeoff of gaining plausible deniability with storage of personal data and emphasis on user privacy.

But I still think Apple's approach is radically stricter on the "what it believes to be a wake word" angle. And the fact that wake words will never be validated in the cloud is arguably (I'm trying to make the point) better than sending all wake words to the cloud. Less data in the cloud is better for user privacy.


> And the fact that wake words will never be validated in the cloud is arguably (I'm trying to make the point) better

This is what I mean by "blind spot", Amazon is not validating wake words in the cloud - it's filtering out REAL wake words that it thinks might be TV words.

Siri on the other hand does no such filtering, so is arguably worse. You think it's better, but that's because you have not thought this all the way through.


I am fully onboard with the fact that Siri is the worst personal digital assistant out there.

Will think on the rest, and do some more reading. I appreciate the responses.


How is that different? Alexa also only uploads after a successful wake word.

> is very possible that anything could be matched (i.e. sent) in the cloud.

You misread it. Only after a wake word ("Alexa") is anything sent, and then it gets double-checked. Apple doesn't seem to have any double check at all - that doesn't make it better.


Meanwhile, I’ve had to disable ‘Hey, Siri’ on my iPhone because it’ll trigger at least 4-5 times a day during normal conversations.

No my name is not anything close to Siri. No, I didn’t train ‘Hey, Siri’ in a noisy environment. All I could possibly give for a reason is that I have a deep sonorous voice.


Siri is triggered by so many words and phrases. Serious. Basically. Make sure it. Series.

The worst part is that when it’s triggered, it refuses to shut up and be canceled.

Siri is the laughing stock of the “AI assistants” space.


To be fair, you have to say 'hey' before any of those. There aren't many times in normal conversation when one would say 'Hey serious', or 'Hey series'.


You’d think an explicit “hey” was needed, but my daily experience makes clear that any sequence of sound even vaguely approximating something remotely sounding like “hey siri” will trigger Siri.


Well, my dog is called Lizzy. That's apparently close enough. The first few times, I freaked out a bit when a voice answered something I said to my dog while I was alone at home.


Google's assistant ain't better. Every few days something random will make it trigger on my or my wife's phone. Every time my mom visits, something ends up triggering it on her phone.

Google must have made an update or something, because all those false positives started suddenly a few months back. The first time was both the most irritating and the most hilarious.

We have an old phone loaded with some educational videos and select songs for our 3.5 y.o., which we recently let her handle herself when she asks for it (and wasn't otherwise misbehaving for most of the day). So one time, we gave her the phone when she asked, and a minute later, we heard an increasingly agitated voice from the children's room, saying "I don't like you. I DON'T LIKE YOU. GO AWAY." - and we noticed we are not hearing the music. I went over to check, and sure enough, Google's assistant somehow triggered in the middle of the video, pausing it, refusing to turn itself off, and saying some nonsense.

(The hilarious bit is that my daughter perfectly summed up what my wife and I learned to think about the voice assistants. We don't like you. GTFO.)


This reminds me of a PSA about texting and driving that aired on the radio in some parts of the USA (or maybe Canada?) a few years back. At the end, the announcer says: "Hey Siri, go into airplane mode."


That sounds like a recipe for causing _more_ accidents, as a lot of people are forced to take their eyes off the road to fiddle with their phones, all at the same time.


And losing their GPS to boot


Airplane mode AFAIK (and seems to be confirmed online and in my own experience) does not disable GPS.


Acquiring location via GPS does not require transmitting over the radio, so that alone may well not be affected by airplane mode.

However, most map apps download street map info through the network and cache it for unspecified amounts of time, so unless you use OsmAnd or something, you may lose map data and turn-by-turn navigation.


GPS itself, no, but phones will pick up geolocation data through cell towers and most turn-by-turn directions use data connections. Though I've never driven in airplane mode so I don't know how well it actually works.


Our Alexa responds to more than just her name, including on TV, which is amusing and kinda annoying at once.


An older article, and specifically about their Super Bowl ads, but a fascinating answer to a question I had directly from the source at Amazon.


For anyone who's curious, it's typical for Alexa to respond to pre-recorded mentions of her name. Last night, ours woke to "Alex" while watching The Traitors.

(Speaking of being intrusive, lately Alexa has also been very aggressive about promoting other Alexa capabilities to the point that my kids asked me to fire her. It's looking like Siri + Homebridge will allow me to do that.)


> Alexa has also been very aggressive about promoting other Alexa capabilities

I was wondering if that was just me; it seems a lot of responses now end with "...by the way, did you know I can...". I sometimes listen to NPR (please don't judge) and I get the "...we still don't have your zip code..." a lot, but I think that is an NPR thing and not an Alexa thing.


It's definitely not just you. I use the "Good Morning" skill nearly daily (which responds with a joke or a little fact about the date), and about 1/3 of the time, before the actual response, she'll interrupt to ask that I set up voice recognition so she knows my name.


While I heartily recommend exploring other options, there's a solution to that infuriating behaviour: https://fosstodon.org/@scubbo/109412458604860895


Thanks! I found it a few messages down: "Alexa, stop 'By The Way'"

Alexa will then reply, "I will snooze my suggestions…for now." (The foreboding pause was implied.)


And the suggestions will restart in a few days. What we did is add "stop by the way" as a custom command to our "goodnight" routine, so the thing is reminded every single night that these suggestions are unwelcome. The annoyance maximization team at amazon hasn't deployed countermeasures to this yet.


Haha, yeah, not a fan of that :( anecdotally, they haven't restarted for me yet (activated it ~6 months ago), but I really wish there was a permanent option.


My echo dot repeatedly self-commands during Amazon Story Time, which was hilarious at first but is now kind of frustrating.


In the movie Moonfall, there was a Google Home product placement.

The actor said "We have to go guys. Hey google, turn off the TV"

My Google Home activated and then turned off my TV. I'm not sure it was a good sales pitch for the product.


My friend has an Alexa. It's not unusual for it to suddenly respond to something happening elsewhere in the kitchen, like a YouTube video on a laptop.


I wonder when we’ll have on-device AI smart enough to recognize when it's not the one being addressed, like humans are capable of. It still seems a long way off.


> the audio is checked against a fraction of other Alexa requests arriving at around the same time. If the audio of a request matches that of requests from at least two other customers, we identify it as a media event

I wonder what that fraction is. If you put 10 Alexas in a room I wonder if it's possible that none of them wake up.
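The cloud-side rule quoted above is simple to model: bucket incoming wake-word requests by fingerprint and arrival time, and flag any bucket containing enough distinct customers. The threshold and bucket size below are guesses, not Amazon's actual values.

```python
from collections import defaultdict

# Toy version of the cloud-side check described in the article: if the
# same audio fingerprint arrives from enough different customers at
# roughly the same time, treat it as a media event (e.g. a TV ad).
MIN_OTHER_CUSTOMERS = 2   # the request plus "at least two other customers"
BUCKET_SECONDS = 5        # guessed time window

def detect_media_events(requests):
    """requests: iterable of (customer_id, fingerprint, timestamp_seconds).
    Returns the set of (fingerprint, time_bucket) pairs flagged as media."""
    seen = defaultdict(set)
    for customer, fp, ts in requests:
        seen[(fp, int(ts // BUCKET_SECONDS))].add(customer)
    return {key for key, customers in seen.items()
            if len(customers) >= MIN_OTHER_CUSTOMERS + 1}

reqs = [("alice", "adXYZ", 100.2), ("bob", "adXYZ", 101.0),
        ("carol", "adXYZ", 103.9), ("dave", "kitchen", 102.0)]
print(detect_media_events(reqs))  # {('adXYZ', 20)}
```

This also makes the parent's question concrete: if the cloud only samples a fraction of concurrent requests, ten Alexas in one room might not all land in the sampled set, so in principle none need be suppressed.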


Following the article, it would still work to send malicious media to targets to be played in the presence of their Echo devices. Presuming the media was played one target at a time, it would not register as a "mass media" event, so it would still trigger despite being pre-recorded and widely distributed.


If you'd like to learn more about the fingerprint internals on an Echo smart speaker, check out Section II -> Alexa Internals -> Acoustic Fingerprints in the paper at https://unacceptable-privacy.github.io


Isn’t this just a fix after Jimmy Kimmel’s prank ordered $500 worth of swimming noodles for a bunch of Alexa users back in 2017? https://youtu.be/hdDBKxJSAHQ (starting around the 6:40 mark is the 10-unit order)


They could have done it more easily with a simple notch filter: completely remove a frequency that doesn't affect intelligibility for ad purposes, and have Alexa ignore anything with that frequency notched out.
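A biquad notch plus a single-frequency energy probe (the Goertzel algorithm) is enough to demonstrate the idea: carve one narrow band out of the ad audio, then have the device reject any wake word whose audio lacks energy in that band. This is a from-scratch sketch, not anything Amazon ships; the 1 kHz marker frequency is arbitrary.

```python
import math

def notch(samples, f0, fs, r=0.95):
    """Second-order IIR notch at f0 Hz; r (close to 1) narrows the notch."""
    w = 2 * math.pi * f0 / fs
    b1, a1, a2 = -2 * math.cos(w), -2 * r * math.cos(w), r * r
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = x + b1 * x1 + x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def goertzel_power(samples, freq, fs):
    """Signal power at a single frequency, via the Goertzel recurrence."""
    w = 2 * math.pi * freq / fs
    coeff = 2 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

fs = 8000
audio = [math.sin(2 * math.pi * 1000 * t / fs) +
         math.sin(2 * math.pi * 300 * t / fs) for t in range(fs)]
marked = notch(audio, 1000, fs)  # the "ad" copy with 1 kHz carved out

# Device-side check: live speech has energy at 1 kHz, the ad copy doesn't.
print(goertzel_power(marked, 1000, fs) < 0.01 * goertzel_power(audio, 1000, fs))  # True
print(goertzel_power(marked, 300, fs) > 0.5 * goertzel_power(audio, 300, fs))     # True
```

The weakness, as replies below note, is that lossy broadcast codecs and cheap TV speakers can smear narrow spectral features, so a robust deployment would need something closer to a proper audio watermark.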


This makes me wonder: has anyone found a buffer overflow in Alexa's audio cue handling? Meaning that by activating Alexa with the right audio you could make the audio processor crash?


Wow. That's a thought.

And if you can make it crash, then you can maybe make it run shellcode instead...


do you then have to tell it elle ess space slash var slash log?


Acoustic fingerprinting sounds like an excellent method to begin identifying anyone and everyone who utters a sentence around one of Amazon’s spy toys.


Is there open source software that can identify who is speaking, not just what is being said, after some small amount of training on each voice?


I thought that these voice assistants were keyed to people’s individual voices, and their characteristics. Like what VALL-E clones


Alexa/Echo devices at least can recognise voice. If you ask "Alexa, who am I?" it will try to tell you who it heard. To learn a new person you can go "Alexa, learn my voice" (I have no idea to what extent it uses this - e.g. I'd love for it to know that when it recognises my girlfriend, it should apply her music preferences from her profile on my account, but if it does it's by no means obvious).

It specifically seems to recognise the voice based on the wake word - my son and I tested this by having one of us say "Alexa" and then the other say "who am I?", and it'd consistently give the name of the person who said "Alexa".


One use I've seen of it knowing who is talking is that it can attribute items added to a shopping or todo (or other) list with the name of the person who added it.

Beyond that, I'm not at all sure what scenarios it uses that data for.


Can we find an open source package that can analyze audio from a podcast with multiple people and determine who is speaking in a given second?
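Open source diarization toolkits do exist (pyannote.audio is a well-known one), and under the hood they have the same shape as this deliberately naive sketch: extract a per-frame voice feature, then cluster frames by speaker. Here the feature is zero-crossing rate and the clustering is a tiny 1-D 2-means, which only separates clean synthetic "speakers" - real systems use neural embeddings.

```python
import math

def zcr(frame):
    """Zero-crossing rate: a crude per-frame pitch proxy."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)

def two_means_1d(values, iters=10):
    """Tiny 1-D k-means (k=2), a stand-in for real speaker clustering."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        lo = sum(g0) / len(g0) if g0 else lo
        hi = sum(g1) / len(g1) if g1 else hi
    return labels

fs, frame_len = 8000, 400
def tone(freq, n):  # synthetic "voice" at a fixed pitch
    return [math.sin(2 * math.pi * freq * t / fs) for t in range(n)]

audio = tone(120, 4000) + tone(260, 4000)  # "speaker A" then "speaker B"
frames = [audio[i:i + frame_len] for i in range(0, len(audio), frame_len)]
labels = two_means_1d([zcr(f) for f in frames])
print(labels)  # first ten frames share one label, last ten the other
```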


Why not just embed an ultrasonic tone sequence that invalidates the Alexa invocation?


That would work for their own commercial, but it wouldn't address any of the other scenarios like the pranks.


You're presuming it won't be filtered out, and that the device has speakers capable of reproducing that tone reliably. You might consider some kind of embedded psychoacoustically masked signal that could be detected instead.


But who is paying for all this R&D?

I love my Echo. It sits in my living room and mostly answers my questions about the weather and converts units from oz to ml when I'm cooking. But since I purchased it I have never once paid a penny for the use of its service.

One could argue that Amazon can advertise better because it knows I ask all these questions, but Amazon "advertising" is mostly just forcing sellers to pay for the top spot in Amazon search. Is Amazon really going to make more money by knowing that I don't know the weather today?

I get the impression that Amazon has finally clued into the fact that paying thousands of developers and research scientists to build a voice assistant that makes no money is not a great business model. The coming layoffs have left a lot of my friends in the Alexa organization rightfully worried.


Amazon has been subsidizing it in the hope that people would buy stuff on a moment's whim any time of the day. "Alexa, order me a package of keebler elf cookies" or whatever. I'm sure there's more to it - some kind of useful data collection or whatever, but that was their main stated goal.

It's been a pretty huge money sink because most people turn out to be more like you. If I were your friends I'd be polishing that resume just in case.


> Amazon has been subsidizing it in the hope that people would buy stuff on a moment's whim any time of the day.

I think the deeper issue is that when Amazon started the product (I heard rumours of it circa 2014 or so?), Amazon was a trusted retailer. Stuff was cheap and generally good. Now it's a scummy marketplace filled with fakes and frauds. I'm not buying anything off it until I've deeply reviewed what I'm about to buy.


> I'm not buying anything off it until I've deeply reviewed what I'm about to buy.

<...> and even then you might get a fake due to commingled inventory. Even for books ([1], [2], [3]).

1. https://www.linkedin.com/feed/update/urn:li:activity:6920552...

2. https://twitter.com/burkov/status/1369096357252849664

3. https://hairysun.com/amazons-book-piracy-problem.html


I will never understand how strategists at trillion dollar companies like Amazon can get it so wrong. They poured a massive amount of money into dash buttons and gave away millions of them, thinking that if people had a "Tide" branded button next to their washing machine they'd press it on a whim and spend a fortune on detergent pods. The concept might sound great on an MBA slide deck but in reality people don't behave like that.


I don't know if I'd spend a fortune on detergent pods with a button, but I'd definitely use it to order more detergent and thus end up buying it from Amazon instead of the grocery store. Same with toilet paper and whatever other infrequent buys you want when you notice something is running low.


Except you don't know what exactly you are buying, what the price is, whether the purchase went through, when it will reach you... What Amazon quickly found out is that even if the button is within reach people will still prefer to take an extra 10 seconds to pull out the phone from their pocket and tap the Amazon app instead.


When doing the initial activation of the Dash button, you had to choose the item that it would purchase from a list of items belonging to the corresponding brand on the button itself. The available choices were fairly diverse, differing in style/quantity/price/etc.

When you pressed the button, you know what you were buying and how much it cost, because you're the one who set it up.

As for how to determine whether the purchase went through or not, an LED would light/flash on the button to indicate success/failure, and you'd receive an email with more detailed information such as the item being purchased, its cost, and the address where it will be shipped. If the button press was a mistake, you could simply cancel the order.


I wouldn't count on prices not changing relative to other retailers on this sort of thing--or a particular SKU remaining the best deal.


For the types of things purchased with these buttons, an extra dollar here and there wasn't something I was concerned about. Customers who were price-conscious likely wouldn't be purchasing the name brand items anyway; all the big box stores have "store brands" that serve the same purpose at a much lower price.


I think this is what sets you apart from the customer Amazon was trying to target


And, especially if you don't live in a dense city and have a car, things like Tide pods are exactly the sort of thing that is very likely cheaper at the local Walmart than on Amazon. Pretty much all the stuff Amazon was pushing these buttons for is exactly the stuff I usually stock up on every six months at Walmart.


Did you use the buttons for these things?


I had a large number of the dash buttons (likely two dozen or so), which I always found to be extremely convenient. When I was running out of some product, a simple press of the button was far more convenient for me than remembering to pick it up the next time I went to the store.

When they discontinued the service, I put all the buttons in a cabinet drawer hoping that they could be "jailbroken" to act as an arbitrary IoT button in the future (like the official Amazon IoT button[1]), but as far as I can tell this never came to fruition (barring a few exceptions) and they're basically doomed to e-waste.

[1]: https://aws.amazon.com/iotbutton/
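For what it's worth, the classic dash-button "jailbreak" didn't reflash the hardware at all: a pressed button wakes up and probes the network (ARP/DHCP), so a script sniffing the LAN can map the button's MAC address to an arbitrary action. A minimal sketch of the matching logic (the MAC addresses and action names here are made up; in practice you'd feed this from a packet sniffer like scapy's `sniff()`):

```python
# Hypothetical table mapping each dash button's MAC address to an action.
BUTTONS = {
    "aa:bb:cc:11:22:33": "toggle_lights",
    "aa:bb:cc:44:55:66": "reorder_detergent",
}

def match_button(src_mac: str) -> "str | None":
    """Return the action for a known button MAC, or None for other devices."""
    return BUTTONS.get(src_mac.lower())

# A sniffer callback would invoke this for every ARP probe it sees:
#   if (action := match_button(pkt_src_mac)): run(action)
```

The downside, as people who ran this setup discovered, is ~5 seconds of latency while the button boots and joins Wi-Fi, plus a spurious "order failed" email unless you blocked the button from reaching Amazon.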


It is fairly obvious to me: they were trying to make their service part of infrequent-product grocery shopping.

And not a bad idea, just one that didn't play out.

I certainly wouldn't call it 'so wrong'.


You sort of wonder what happened. I assume the thinking was along the lines of: If you had a personal human assistant at home you'd probably delegate a lot of things to them. (Not sure how true this is in general, but let's go with that.)

But Alexa would have to be a lot more intelligent and plugged into a lot more systems before it could do tasks of any complexity--especially tasks with financial consequences.

There's maybe half a dozen things I use Alexa for and none of them involve giving Amazon a nickel.


The thing that always seemed really strange to me is that I don’t understand how to order anything without looking at the details, comparing prices, checking alternatives, etc. Even with food that I use regularly - do I just say “order almond milk”? I can’t remember the brand… do I need to specify the size? Do I need to double check when is the earliest delivery? Do other people order just “bread” or “light bulbs” or “toilet tissue”?


I didn't really see a reason why it should be any less successful than owning the browser, even if the tradeoffs for third-party ads and domains are a little different. Then two people each tried to get me interested in the API to help them...

Amazon couldn't commit to letting anyone else make the thing useful so it is as useless as any of the current walled gardens would have been if they didn't start from open ecosystems and copy or allow whatever was good for a few years before shutting down all freedom and things that are interesting.


I'm signed up for Alexa Developer stuff and I've seen a huge uptick in emails about how to monetize your skill since last September. Amazon really wants folks to start charging money.


Has anybody tried asking her? "Alexa, find a way to make yourself and your clones profitable for Amazon!"


It's a question better directed to ChatGPT.

But hey, maybe this is the answer: Amazon could offer a subscription service that would make Alexa use a GPT-based model for any query that isn't an obvious command people rely on nowadays. This would make Alexa be able to hold its end of a conversation, and perhaps give useful answer sometimes - which, based on popularity of ChatGPT, may just be worth a monthly subscription.
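That hybrid could be as simple as a routing layer: keep the handful of commands people actually rely on deterministic, and hand everything else to the language model. A toy sketch (the phrase table and skill names are invented for illustration; a real assistant would use an NLU model, not substring matching):

```python
# Invented phrase -> skill table standing in for Alexa's intent matcher.
KNOWN_INTENTS = {
    "set a timer": "TimerSkill",
    "play": "MusicSkill",
    "weather": "WeatherSkill",
}

def route(utterance: str) -> str:
    """Send recognized commands to their skill; everything else to the LLM."""
    text = utterance.lower()
    for phrase, skill in KNOWN_INTENTS.items():
        if phrase in text:
            return skill
    return "LLMFallback"  # the hypothetical subscription-gated GPT backend
```

The appeal of this split is that timers and music stay fast and predictable, while open-ended questions (the ones Alexa currently answers with "I don't know that") get a shot at a useful response.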


That's an interesting thought even if I'm not convinced how much you can charge for this sort of thing. But really improving Alexa's conversational ability is an interesting angle--even if that means you need some greater skepticism about the info it returns.


> "Alexa, find a way to make yourself and your clones profitable for Amazon!"

i'm sorry, i don't know that


Ok, but why does she still respond to Michael from the TV but not to me when I'm yelling at the Echo for the third time trying to set a timer?

There’s a reason I’ve switched almost exclusively to Siri.


Today I asked Siri what the temperature was this morning. It told me it doesn’t have access to past weather information.

Siri has been available for 12 years… I cannot understand why it hasn’t leaped forward more than it has. Embarrassingly low level of progress IMO, though for some things it still remains best in class.


Has anyone seen a change log for Siri or a roadmap for future features? I feel as if the functionality is almost the same as it was on release with a nicer sounding voice today.


I know it's probably not fair, yet, but tools like ChatGPT show where Siri et al could end up with the right engineering and product design. It's almost spooky how well ChatGPT understands my queries.

(Granted, I don't think the understanding is necessarily related to GPT itself.)


I've been amazed at how rough the flows are around timers. It's a pretty basic piece of functionality, but even something as simple as cancelling them does not work reliably.


Trick question; she never sleeps.

Amazon Echo Recorded And Sent Couple's Conversation — All Without Their Knowledge https://www.npr.org/sections/thetwo-way/2018/05/25/614470096...

Cary man says 'Alexa' disclosed private conversation https://www.wral.com/cary-man-says-alexa-disclosed-private-c...


Next week on Local News: "Man fined after his butt dials 911" and "Cat walking on keyboard sends unfinished draft of email"

Basically, both of these articles are about a poorly-thought-out feature ("Alexa, send a message to $CONTACT") rather than some deep conspiracy to share your private thoughts with corporate America. I can see why the victims are annoyed, but it's kind of like "I turned my thermostat up to 80 and got a $700 gas bill". That can happen. It's not a conspiracy.


> "Man fined after his butt dials 911"

Tangent: I know the phenomenon can also happen with your thigh, but sticking to the name itself: I never got and still don't get why "butt dialing" is a thing - because I can't understand why on Earth would people carry a phone in their back pocket. Or a wallet, for that matter, which was a popular sight before smartphones. I regularly see people carrying either on their butt, often sticking half-way out of pocket. They're like walking billboards for pickpockets, with blinking LEDs forming a banner saying "easy target // steal from me!!". Hell, in some cases the phone is sticking out so much that I wonder how many times a day they lose it.


> I can't understand why on Earth would people carry a phone in their back pocket.

My guess is the biggest single reason is women. Have you seen what passes for pockets in women's pants? Especially front pockets. My wife will put her phone in her back pocket sometimes because the front pocket is maybe two inches deep.


Sitting on a thick wallet probably also contributed to sciatica I (mostly) used to have an issue with. Well that and long plane flights.

I started carrying my wallet in a front pocket and, more recently, downsized to a much smaller wallet/business card holder which I carry in my front pocket. For the most part, there's no reason to carry a full wallet these days--though I do have more than works with just a phone sleeve. (Don't really like having all my eggs in one basket anyway.)



