The Amazon Echo is an unlikely hit. After all, the world’s largest online retailer hasn’t always won its bets on hardware. (RIP, Fire Phone.) And a gadget that relies solely on voice? Let’s just say Siri hasn’t inspired confidence.

Yet Amazon has by one estimate sold some 3 million of the squat cylinders since the Echo launched in November 2014. The company doesn’t share sales data, but it did say Alexa, the voice-activated software that powers the Echo, is active in millions of places, including smartphone apps and other Amazon gadgets. And this month the Echo surpassed 1,000 “skills,” or apps, after the company opened up the software development kit last spring. This third-party enthusiasm could create a virtuous cycle in which the more the Echo does, the more it sells, much as the iPhone took off after Apple opened the App Store.

Right now, tech’s biggest companies are all working on products and services that, like Alexa, aren’t tethered to a single device. For Facebook, it’s Messenger. For Apple, it’s Siri. Google has Allo for messaging and Home, a speaker suspiciously reminiscent of the Echo. Each company is building its own version of the post-hardware, post-app future.

But Amazon has an advantage. Not only has the Echo been on sale for 19 months, giving Amazon time to expand what the Echo can do and what it can understand, but the company also has the massive Amazon Web Services cloud infrastructure, which underpins everyone from Netflix to the CIA. “The great thing about Alexa is it’s based in the cloud, so we can improve it every minute,” says Dave Limp, Amazon’s head of devices. “Alexa is constantly getting smarter.”

Trying to Understand

The Echo runs only a small amount of code on the device itself—just enough to listen for the “Alexa” that wakes it up. But even that isn’t simple—“far-field recognition” isolates the sound of your voice, a trick that depends upon machine learning and deep neural networks to distinguish that word from all the others. Once Alexa “hears” it, the Echo streams everything you say to Amazon’s cloud.
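In rough terms, that split looks something like the sketch below, where `wake_word_score` is a hypothetical stand-in for the real keyword-spotting neural network (this is an illustration of the division of labor the article describes, not Amazon’s actual code):

```python
# Illustrative sketch of the on-device loop: everything before the wake
# word stays local; everything after it is streamed to the cloud.

def wake_word_score(frame: bytes) -> float:
    """Hypothetical detector: returns confidence that 'Alexa' was heard."""
    return 1.0 if b"alexa" in frame.lower() else 0.0

def on_device_loop(audio_frames, stream_to_cloud, threshold=0.9):
    """Discard audio until the wake word fires, then forward the rest."""
    awake = False
    for frame in audio_frames:
        if not awake and wake_word_score(frame) >= threshold:
            awake = True            # wake word detected; start streaming
            continue
        if awake:
            stream_to_cloud(frame)  # the cloud does the heavy ASR/NLP work

# Toy run: only speech after "Alexa" reaches the "cloud".
sent = []
on_device_loop([b"noise", b"Alexa", b"what time is it"], sent.append)
print(sent)  # [b'what time is it']
```

The point of the structure is the one Limp makes: the device stays dumb and cheap, while the hard recognition work, and every improvement to it, lives server-side.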

There, Alexa transforms your speech into text, which the system attempts to decipher using natural language processing, an AI discipline that parses grammar and syntax. Alexa tries its best to understand, then pings the right database to retrieve the relevant info. It will send “How tall is the Golden Gate Bridge?” to a knowledge database like Wikipedia and “Alexa, play me some Coldplay” to its vast jukebox. “Alexa, read me Moby Dick” draws on Amazon’s own trove of data on books.
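That dispatch step can be illustrated with a deliberately crude sketch. Real systems infer intent with statistical natural language processing rather than keyword matching, and these backend names are invented for the example:

```python
# Toy intent router: maps transcribed text to a backend, mimicking the
# dispatch described above. These cue lists are illustrative only.

BACKENDS = {
    "knowledge": ["how tall", "who is", "what is"],
    "music":     ["play"],
    "books":     ["read me"],
}

def route(utterance: str) -> str:
    """Pick the backend whose cue words appear in the utterance."""
    text = utterance.lower()
    for backend, cues in BACKENDS.items():
        if any(cue in text for cue in cues):
            return backend
    return "fallback"  # couldn't infer an intent

print(route("How tall is the Golden Gate Bridge?"))  # knowledge
print(route("Alexa, play me some Coldplay"))         # music
print(route("Alexa, read me Moby Dick"))             # books
```

A production system replaces the keyword lists with trained models, but the shape, classify the request and hand it to the right service, is the same.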

Finally, Alexa sends the response to your device, where a voice refined by still more machine learning reads it back to you. Amazon wants all this to happen in two seconds or less, Limp says.

Skilled Labor

As much as Limp resists comparing the Echo to a smartphone, he struggles to explain how it works without falling back on that metaphor. It’s difficult to compare the device to any other user interface, he argues. After all, how many other mass-market digital devices don’t have a screen? “An Alexa ‘skill’ is how the Echo extends itself, the way your smartphone extends itself through apps,” Limp says. But to run an app, you must download it, install it, and launch it. Skills run entirely in the cloud, and don’t take up memory on the device.

“If you really have an appetite, you could enable a thousand skills,” says Limp. Of course, without a screen, you have to remember them all.

Alexa also uses AWS Lambda, which Amazon markets as a way to run code in the cloud without managing servers. “What we externalize for other developers, we also use in our internal teams, because it works so well,” Limp says. More than 1,000 people at Amazon around the world are now working on Alexa.
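Cloud-side, a skill’s logic typically takes the shape of a small Lambda request handler. The sketch below follows the general JSON shape of an Alexa Skills Kit intent request and response, though the intent name is hypothetical and session handling is omitted:

```python
# Minimal Lambda-style handler in the shape of an Alexa skill
# request/response cycle (simplified; no error handling or sessions).

def speak(text: str) -> dict:
    """Wrap plain text in an Alexa-style response envelope."""
    return {
        "version": "1.0",
        "response": {"outputSpeech": {"type": "PlainText", "text": text}},
    }

def lambda_handler(event: dict, context=None) -> dict:
    request = event.get("request", {})
    if request.get("type") == "IntentRequest":
        intent = request["intent"]["name"]
        if intent == "BridgeHeightIntent":  # hypothetical skill intent
            return speak("The Golden Gate Bridge is 746 feet tall.")
    return speak("Sorry, I don't know that one.")

# Simulated invocation, as the Alexa service would call it:
event = {"request": {"type": "IntentRequest",
                     "intent": {"name": "BridgeHeightIntent"}}}
print(lambda_handler(event)["response"]["outputSpeech"]["text"])
```

Because the handler runs on demand in Lambda, the developer never provisions a machine, which is exactly the property Limp says Amazon’s internal teams rely on too.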

The Future of Internet Services

It’s an interesting play: make hands-free and screen-free the future of the Internet. “No one’s saying this is absolutely the way the world is going to evolve, but it feels very natural,” says Sameer Gandhi of the venture capital firm Accel. He expects tasks like finding movies, booking reservations, and retrieving basic information to catch on quickly. More complicated tasks, like shopping (say, for clothes, not consumables), will take longer.

Of course, app-less services will be limited by what the AI behind them can understand. Whether queries come by text, as in Messenger, or by voice, as with Siri, matters less than the process of trying to understand the request. “Certain questions involve much deeper levels of inference than simple ones, like finding information or looking things up,” says Noah Smith, a University of Washington computer scientist working on natural language processing. “Asking Alexa, ‘How long would it take to fly to Mumbai?’ is one thing,” Smith says. “It’s something else entirely to ask, ‘I don’t have a car right now, but I’m in a town where I can rent one—is that a better option than trying to take public transportation?’”

Amazon may have another advantage there. The company has been collecting data from users since the Echo arrived in late 2014. “The more its users are providing Amazon with interactive experiences and data, and the more cleverly and quickly the company turns that data into machine learning algorithms, the better off Amazon is going to be,” Smith says. Neural nets, after all, get better with more data. It’s easy to imagine Amazon deploying Alexa in areas that would allow it to gather even more data to further tune the system, like a customer service chatbot.

Limp wouldn’t speculate on Amazon’s future projects. But judging from tens of thousands of four- and five-star ratings, customers seem to think the Echo works pretty well. The next phase of the screenless future will depend on whether Amazon’s competitors can—yes, I’m going to say it—echo what the company has already accomplished.


The Amazon Echo Is Winning the Race to a Screenless Future