Smart Speakers and the definition of "Always On"

FFS people! Your Echo, Google Home and Siri are NOT ALWAYS LISTENING. Or rather, they are, just not in the way that you think they are.

I will repeat this again. If these companies were processing a constant server-side stream of every noise these speakers heard, the internet would be brought to a halt.
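To put a rough shape on that claim, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: 16 kHz, 16-bit mono audio sent raw (real devices would almost certainly compress it), and a made-up fleet of one million speakers. The function names are hypothetical too.

```python
SECONDS_PER_DAY = 86_400


def raw_audio_bitrate_bps(sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Bits per second for uncompressed PCM audio (assumed format, not a vendor spec)."""
    return sample_rate_hz * bits_per_sample * channels


def fleet_upload(devices, bitrate_bps):
    """Return (aggregate bits per second, bytes per device per day) for nonstop streaming."""
    total_bps = devices * bitrate_bps
    bytes_per_device_per_day = bitrate_bps // 8 * SECONDS_PER_DAY
    return total_bps, bytes_per_device_per_day


if __name__ == "__main__":
    bitrate = raw_audio_bitrate_bps()
    # One million devices is a hypothetical fleet size, not a real figure.
    total_bps, per_device_bytes = fleet_upload(1_000_000, bitrate)
    print(f"per device: {bitrate} bit/s, {per_device_bytes / 1e9:.2f} GB/day")
    print(f"fleet total: {total_bps / 1e9:.0f} Gbit/s")
```

Plug in your own guesses for fleet size and audio format. The point is only that nonstop streaming scales linearly with every device sold, all day, every day.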

Allow me to explain what these devices do. They are technically always "on". They are hooked to a power source and they consume a low amount of current to power a microphone. The unit itself is always listening for the activation phrase; that "Alexa" or "Hey Google". When they are in this mode however, they are NOT streaming a damn thing back to anyone. The voice model they are trained to recognize... is processed locally.
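If it helps, the wake-word gate can be sketched like this. It is a toy model, not anyone's actual firmware: the "audio frames" are just strings, `local_wake_detector` stands in for the on-device voice model, and `command_window` is a hypothetical fixed listening length.

```python
WAKE_WORDS = {"alexa", "hey google"}  # stand-ins for the on-device wake model


def local_wake_detector(frame: str) -> bool:
    """Runs entirely on the device; nothing leaves the speaker here."""
    return frame.strip().lower() in WAKE_WORDS


def run_device(frames, command_window=3):
    """Return only the frames heard while the device is awake.

    Everything heard while asleep is checked locally and then dropped.
    """
    captured = []
    remaining = 0  # frames left in the current listening window
    for frame in frames:
        if remaining > 0:
            captured.append(frame)
            remaining -= 1
        elif local_wake_detector(frame):
            remaining = command_window  # wake up; the wake word itself is not kept
    return captured


if __name__ == "__main__":
    heard = ["tv noise", "alexa", "what", "time", "is it", "private chat"]
    print(run_device(heard))  # ['what', 'time', 'is it']
```

Notice that "tv noise" and "private chat" never make it into `captured`. In this model, the only thing that can ever be passed along is what follows the wake word.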

Once the device is "woken" up, it listens either for a predefined length of time or until the end of a perceived command. These devices likely try to process the command purely client side, which they undoubtedly support for their baked-in commands, provided they are received clearly enough by the device to process them. This saves Google, Amazon, etc... on bandwidth. If the device cannot recognize the command, then it will pass the audio feed, in one form or another, securely to the host's system to process. At that point the host will process the command and stream the appropriate response to the speaker.
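That local-first, cloud-as-fallback flow looks roughly like the sketch below. Again, this is my guess at the shape of it, not a documented design: `LOCAL_COMMANDS`, `handle_command` and `fake_cloud` are all hypothetical names, and text stands in for the audio.

```python
# Hypothetical table of baked-in commands the device can answer on its own.
LOCAL_COMMANDS = {
    "stop": "Stopping.",
    "volume up": "Volume raised.",
}


def fake_cloud(text: str) -> str:
    """Stand-in for the vendor's servers; a real device would send audio over TLS."""
    return f"(cloud response to: {text})"


def handle_command(text: str, cloud_handler=fake_cloud):
    """Try the command on-device first; only fall back to the cloud if that fails."""
    key = text.strip().lower()
    if key in LOCAL_COMMANDS:
        return LOCAL_COMMANDS[key], "local"
    return cloud_handler(text), "cloud"


if __name__ == "__main__":
    print(handle_command("Stop"))              # handled on the device
    print(handle_command("weather tomorrow"))  # passed to the cloud stand-in
```

Every command that hits the local table is one the vendor never has to receive, which is exactly where the bandwidth saving comes from.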

In other words, yes, the physical device is always listening. But the company that sold you the device is only "listening" when the device is actively in use. And depending on how well designed it is, even then it may only be doing that some of the time. And that company... would much rather either avoid persisting data altogether or only hold onto it as long as necessary, for the exact same reasons you're paranoid about it. The best way to not worry about what will happen to a user's data you're holding onto... is to not hold on to it. And again, even if we're just talking active commands, we're still talking about thousands, if not millions, of hours of audio per day.

Which brings us to security concerns. Rather than tell you why this is silly, I'm going to tell you what would be required to compromise these devices.

Firstly, a would-be hacker would have to find a way to turn a device on (wake it up) without the user's knowledge. Right now, all of the smart speakers I've seen have visual indicators when they are on. This is actually how a journalist discovered a design flaw in the Google Home Mini. So, basically, the hacker would need to push a firmware update to the device which disabled the LEDs while the hackers (and only the hackers) were listening, as well as either hacking in a backdoor or finding a vulnerability to allow them to wake the device.

But then, next problem, these devices are all basically hardcoded to talk to a specific set of servers. And incidentally, these wouldn't be the ones the hackers would want the data to go to. The device would be sending the data to Google or Amazon or Apple or whoever.

Furthermore, they'd likely need to pull such a hack off quickly, if not the first time around. Otherwise, they'd run the risk of someone noticing the failed attempts against their servers.

BUT WAIT! There's more. So far, the way we've been talking, we're kind of implying remote attacks. Some dark underground or state-sponsored agency. But there is another problem. Remember what I said at the beginning about how much data would potentially be processed? Well, these remote entities would most likely be unable to push this sort of hack in a targeted fashion. At the end of the day, these devices primarily communicate using a unique device ID and IP address. The hackers would need to get WAY deeper into the target servers to map device IDs to particular users. If they were that deep in the system, they wouldn't need to hack your device... they could just take your data directly from those servers. And trust me... it is much more profitable to just steal your emails than go through the mess of HOPING to hear something useful over a smart speaker.

So, we have firmware pushed to EVERY smart device. Well, it won't be long until the manufacturer becomes aware... because guess what? They have these devices too and means of monitoring how they function. But, in that theoretical window of time between the attack and having it shut down, EVERY. SINGLE. SPEAKER. IS. STREAMING. EVERY. SINGLE. SECOND. OF. EVERY. SINGLE. USER. ON. EVERY. APPLICABLE. DEVICE.

Calculated risk. And the calculated risk here is insanely low.

I fall back to my usual defense if you want to take this to a local level. If you're being targeted specifically, we're still talking a highly sophisticated skill set. And we're still talking about needing to be basically on your property to do this. And there needs to be a vulnerability to exploit to allow this. And I hate to break it to you, but there are cheaper and easier surveillance means out there if you want to accomplish this that aren't gimped by the problems these devices have.

Is it possible? Sure. Is it even remotely plausible? NO.
