Articles: The Voice of the Future
“Open the pod bay doors, HAL.”
“I’m sorry, Dave. I’m afraid I can’t do that.”
While 2001: A Space Odyssey was set in a future that is now almost a decade-and-a-half old, 2015 may be the year that voice–user interface (VUI) finally stops being a novelty and matures into one of the shaping technologies of our time. VUI has been one of our oldest desires when envisioning how we interact with computers… we all saw the potential as far back as the original Star Trek, and in fact most science fiction ever written dispensed with the idea of a keyboard interface almost as soon as computers were invented. Unfortunately, VUI has so far offered only false starts and disappointing implementations. While I have written off every VUI product to date as simply one more example of that failure, the competition has quietly heated up, and what we’re seeing from VUI this year is making me wonder whether 2015 is the year we will one day point to as when “VUI happened.”
Long before there was an iPhone or Google, the Mac OS supported voice commands. In OS 9, there were a limited number of options that could be invoked with your voice, and all later versions supported the feature. With each new version, I would try the feature once again, but quickly disabled it each time after it became clear it still did not work well.
Siri launched on the iPhone 4s in 2011 and Google has offered spoken search for a few years now. Both showed potential at launch, but have generated only light user adoption. The fact that high-volume brands like Apple and Google offered any product at all compelled many to give VUI a chance. Siri fans tried a myriad of searches hoping for useful results, but network errors and misidentified commands dampened their enthusiasm. Most early adopters of Siri speak to her less today than they did when she was first released. Google Voice Search had far fewer of the network troubles that Siri had, but offered little user incentive, saving you only a modest amount of time compared with typing the question. The results Google returned were identical to those for typed questions. In later versions, Google offered a handful of spoken results that made for a nice demo but were extremely limited.
I had high hopes for Siri and the dictation feature when they were implemented in iOS, and was happy once again to see them replace the older VUI in OS X. But just as before, I tried VUI and gradually abandoned it, closing out 2014 still fighting the autocorrect on my small iPhone keyboard.
Replacing the standard methods of computer input is difficult. I’m sure there are studies, but most users find out for themselves that something that works well only 90% to 95% of the time might as well be deemed a failure. Even a 1% failure rate can prove frustrating because you might spend more time looking for the errors than creating new content. How many of us have abandoned speaking to our phones to create text because we spent more time correcting the mistakes than we saved by talking? Casual conversations might endure some of those typos; more important conversations can’t be left to that kind of chance. And so we keep typing.
Your keyboard works, the mouse works, and now touch screens work. Those user interface elements are pretty much 100% successful at meeting your expectations; the errors are solely yours. Until recently, VUI has failed us because it almost worked, but fell short of being reliable… enough so to be frustrating and untrustworthy.
However, an interaction I had recently with my wife gave me a glimpse into how and why VUI may finally move beyond the fringe into mainstream life. She and I were watching the TV show Empire via On Demand. This episode featured Courtney Love playing a -ahem- stretch role as a washed-up performer. My wife asked, “Where does Courtney Love get her money nowadays?” I told her it was all Nirvana royalties, but my wife was sure that money was long gone. “That was in the ’90s after all,” she argued. Wanting to prove that I was right, I went to my iPad to search the internet. This time, instead of typing in my search, I opened the Google Search app and said, “OK Google, how much is Courtney Love’s net worth?” Google responded in a natural voice, “Courtney Love is worth an estimated $150 million, thanks in part to the ownership stake in Nirvana that she inherited when husband Kurt Cobain died. After Cobain committed suicide in 1994, Love inherited his writing and publishing rights, which were valued at $130 and $115 million, respectively.” My wife responded with a simple acknowledgement that, this time anyway, she supposed I was right.
Google’s Search app provided a near-perfect response, spoken with the fluidity and authority of that college friend who never got over Kurt’s death and could speak on all things Nirvana. It was fast, accurate, and worked the first time. This showed me that Google has finally matured its Voice Search into a reliable product. It happened so gradually that I hadn’t seen it coming. I gave my iPad a kiss and knew that I had witnessed the defining shift in how we will likely treat VUI going forward.
I am not the only person in my house to come to this conclusion. My youngest daughter has found a friend in the Google Search app. She can read and spell well for her age, but typing is not coming to her naturally. Her hands are still small and the keyboard arrangement looks random to her. Instead of typing out her searches, she now searches for videos and pictures with VUI. “Show me pictures of horses!” can be heard shouted from her bedroom (I have Safe Search turned on for her account, but I still like hearing what she’s asking to see).
The conversational approach of Siri is good for making appointments, but the Google approach is best for search. Would you put up with a keyboard saying “OK, I will type that word for you now”? Both companies are differentiating their VUI with conversational or direct implementations; users can simply choose whichever one they like best.
Just as it has been with iOS vs. Android, VUI looked like it was going to be a two-horse race. While Microsoft has pushed further with its VUI assistant named Cortana, Amazon surprised us with its own take on VUI with its Echo device. Its specialized tube-shaped hardware is designed for VUI and responds in a style somewhere between Siri and Google Voice Search. Echo is dedicated, sound-optimized hardware that brings the use of your voice into the room in a way that lets you speak naturally from wherever you’re seated. You only need to be in the same room as the Echo. I will be shocked if neither Apple nor Google eventually tries to offer the same type of standalone voice device. Google owns Nest and could build this feature into those devices. The Nest products are already hooked up to the internet; why not settle an argument or two by answering some simple questions as well? Apple does have the Apple TV and a ton of phones everywhere, but none of those would work well as dedicated VUI hardware.
It’s all about ensuring the interaction is better than hitting keys with your fingers or clicking on buttons with your remote. Perhaps it’s all going to happen one device at a time… your TV, your car, then your thermostat and refrigerator.
All we can be sure of is that many smart and powerful tech companies see voice as a battleground. That battle could lead to a wonderful era where our keyboards become dusty and our conversations with computers are frustration-free… Anyway, we can continue to dream, can’t we?
But we don’t have to dream to see exciting times ahead, so take good care of your lungs and throat. It may not be long before those body parts provide the main interface between you and your computer.