
Hello Computer? Is Talking To AI Really The Future Of Human-Machine Interfacing?
The idea of talking with your computer has been a dream for decades. But is it really that convenient?

As you might know, OpenAI has been teasing an upcoming “hardware device” for a while. To show how serious they are about this mysterious new device, OpenAI acquired IO this year, the design firm of former Apple industrial design star Jony Ive, for around $6.5 billion.
Then, a few weeks ago, OpenAI CEO Sam Altman and Jony Ive, back in the spotlight, appeared on stage at Emerson Collective’s Demo Day for an interview with, appropriately, Laurene Powell Jobs. All we know is that it is a small, rectangular, wearable device without a screen that could arrive in “less than” two years.
That made me think: a (universal) device controlled purely by voice? Despite all the progress, this is still science fiction. But is it really something we would want? And if it is not even desirable, why would OpenAI want to build it?
The part technology struggled with for decades, translating voice into a signal the computer understands, is mostly solved. The question, however, is not whether voice control has its place in specific circumstances; the question is whether voice control alone is a good idea.
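To illustrate just how commoditized that solved part has become: a few lines of Python with the third-party SpeechRecognition package are enough to turn speech into text. This is a minimal sketch, assuming a working microphone and the optional PyAudio dependency; it has nothing to do with OpenAI’s device.

```python
# Minimal sketch of present-day transcription, using the third-party
# SpeechRecognition package (pip install SpeechRecognition).
# Assumes a working microphone and the optional PyAudio dependency.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something...")
    audio = recognizer.listen(source)

# Uses Google's free web speech API under the hood.
print(recognizer.recognize_google(audio))
# Getting the *words* is the easy part; what the user *means* is not.
```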
By “understanding” we need to look beyond speech recognition to the actual meaning of a command or question, and its context. If you look at interactions with chatbots, they mostly play it safe by responding with multiple or long-winded answers that attempt to cover a user’s question. This, however, would be a user’s nightmare if the response were given in spoken language.
Not another AI pin, please.
Now, we already had such a device: the Humane AI Pin, a device clipped to your clothes. It had a camera, too. You could ask it questions about your environment, generally things it could look up, and have it perform a limited number of agentic tasks.
The device was generally despised as unfinished and not worth the bother. The company was sold for scraps to HP, likely for patent reasons. Or maybe HP wants us to shout at its printers, which we all regularly do anyway, so they might as well listen.
The dream of the voice-only interface
The idea of interfacing with an all-knowing computer using only your voice has long been a staple of science fiction and has often been used to show true AI capabilities.
It is, if you like, the ultimate Turing test: a machine you interact with as you would with another human. It “understands” you. But I will come back to the question of “understanding”.
From HAL in 2001: A Space Odyssey to Star Trek, people in the future appear to casually use language to give computers instructions or receive information. Google even briefly named its virtual assistant “Majel”, after Majel Barrett-Roddenberry, the iconic voice of the Star Trek ship’s computer, though it later became simply “Google Assistant”.
Accuracy, context and response
Speaking of Star Trek, there is a scene in the movie “First Contact” in which the interaction with the computer plays out roughly like this:
Captain Picard tells the computer (in a hurry) that they need “21st century clothing”. Of course, the computer perfectly understands what he means. Now imagine telling the computer you want 20th century clothing. What exactly? 1904, with a top hat and a frock coat? Or 1990s skater clothing, with baggy pants? Are you going on a mountain trip or for a stroll along the beach?
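In programmer’s terms, such a request fills only one “slot”, and every slot it leaves open becomes a follow-up question the computer would have to ask. A toy sketch, with entirely hypothetical slot names:

```python
# Toy sketch of why "20th century clothing" is under-specified: the request
# fills only one slot, and every missing slot becomes a follow-up question.
# Slot names and questions are hypothetical.
FOLLOW_UPS = {
    "decade": "Which decade? 1904 or 1990?",
    "style": "Top hat and frock coat, or baggy skater pants?",
    "occasion": "A mountain trip or a stroll along the beach?",
}

def clarify(request: dict) -> list[str]:
    """Return the questions the computer would have to ask back."""
    return [q for slot, q in FOLLOW_UPS.items() if slot not in request]

for question in clarify({"period": "20th century"}):
    print(question)  # all three slots are missing, so all three come back
```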
While Scotty’s interaction with a Macintosh computer in “The Voyage Home” is the stuff of legends, it is worth noting that whenever it matters, Star Trek personnel revert to manual controls: large control panels and elaborate keyboard interfaces, such as the famous LCARS user interface.
Which brings me to the main flaws of voice control: accuracy and context.
When it comes to accuracy: some years ago I attended an event organized by IBM in a hotel in London (IBM’s events always had great catering). The topic was a preview of “voice-controlled virtual containers”. The presenter showed us how he issued commands to the servers, like “add container A” and “clone container C to D”, and so on. It worked. Well, it worked about 75% of the time without repetition.
The audience was baffled: sure, it works, but why do it that way? It was all fun to watch until someone in the audience shouted: “Shut down all containers.” That worked, too.
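For illustration, here is roughly what such a command loop looks like once a transcript arrives; this is a hypothetical sketch, not IBM’s actual system. The point is that matching commands is the trivial part, which is exactly why the demo had no defense against a shouted shutdown:

```python
# Hypothetical sketch of a voice-command dispatcher like the one demoed;
# not IBM's actual system. Once a transcript exists, matching commands is
# trivial -- so "authorization" extends to anyone within earshot.
import re

def dispatch(transcript: str) -> str:
    t = transcript.lower().strip()
    if m := re.fullmatch(r"add container (\w+)", t):
        return f"created container {m.group(1)}"
    if m := re.fullmatch(r"clone container (\w+) to (\w+)", t):
        return f"cloned {m.group(1)} to {m.group(2)}"
    if t == "shut down all containers":
        # The demo executed this immediately. A destructive voice command
        # should at least require confirmation; a microphone carries no
        # notion of who is speaking or whether they may do this.
        return "confirm: really stop ALL containers?"
    return "please repeat"  # the other ~25%

print(dispatch("clone container C to D"))
print(dispatch("Shut down all containers"))
```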
Of course, we do have voice-controlled devices. We have Siri on the iPhone, and we have Alexa. But for good reason, people use this functionality for very specific and limited tasks.
Show, don’t tell
Siri and Google Assistant on a phone, however, have an advantage that a screenless, voice-only device does not: they can give us visual feedback. Yes, “When was this church built?” can easily be answered by voice, but explaining “What is Gothic architecture?” with speech alone is a challenge.
And it is not how we communicate as humans, either. If you asked someone (knowledgeable) this question, they would surely use their hands to explain a pointed arch.
Not to mention that the most popular smartphone use cases are (in no particular order) video communication, social media, entertainment (video and audio) and gaming, all of which rely heavily on a screen.
A product designed in reverse, and ethical problems.
One is tempted to dismiss the mysterious AI companion as a futuristic but likely pointless product.
Seen that way, the voice-only interface reflects a naive understanding of “progress”, much like the graphical interface was often seen as “replacing” the command line, which in many circumstances still offers a speed and accuracy most graphical interfaces lack to this day, or like the craze for humanoid robots for the masses: something that keeps the media engaged but runs into a multitude of problems in real life.
However, there may be more to it.
Sam Altman appears to love the idea of “an AI assistant that is always with you”, an assistant that is “always on”. This blends perfectly with his recent theory that AI will improve by consuming everything you do and perceive, creating a model that “shares your memory”, thereby solving the problem of context and providing a better “understanding” when communicating with you.
Leaving aside that this is questionable from a theoretical point of view (our memory also includes our thoughts, dreams and smells), it shows that OpenAI sees “access to everything”, however private, as a precondition for better AI.
That the AI industry has been running out of training data to improve LLMs is no secret, so the next frontier appears to be letting users collect as much private data as possible. And what is better for that purpose than a device that is “always on”, filming, listening, recording?
I have written in the past that ethically designed products should honestly reflect their purpose. Ethical questions arise when a product or service was designed for a purpose the user does not desire (such as intrusion into privacy) while pretending to provide another.
Conclusion
For OpenAI, it may just be a way to differentiate themselves at a time when people are starting to question the usefulness of AI. But using a voice-only interface to replace your computer and your phone, and to make it a permanent “companion”, seems like a hammer-and-nail situation.
True voice-only interfaces are a long way away, and in the end, Scotty did just fine using the Macintosh’s keyboard.
References
Sam Altman Says Jony Ive’s Mysterious OpenAI Device Will Be Lickable. Futurism. Nov 2025
Jony Ive and Sam Altman say they finally have an AI hardware prototype. The Verge. Nov 2025
OpenAI buys iPhone architect’s startup for $6.4bn. The Guardian. May 2025
Sam Altman and Jony Ive’s secret device won’t be ‘your weird AI girlfriend’. The Verge. Oct 2025
Image: Collage by Wolfgang Hauptfleisch, based on a scene from “Star Trek — The Voyage Home”