Review: ChatGPT’s New Advanced Voice Mode

AI’s new voice is a leap into the future

Usually, technology moves in increments.

An iPhone with a marginally improved camera here, a Kia Sorento with a slightly better safety rating there. When you look back after a decade, the technology has clearly advanced—but each step along the way was so small that you didn’t really notice it when it landed.

Occasionally, though, you’ll encounter a new technology that ditches the incremental. Instead, it appears to truss the future to a sturdy rope and pull it hand over hand into the present.

In moments like these, previously state-of-the-art technology fossilizes before your eyes. You can see its desiccated bones crumple together into a dusty pile that you look at with nostalgia and pity.

That’s the experience of using ChatGPT’s new Advanced Voice Mode feature, and then returning to its Precambrian precursors, Siri and Alexa.

I got alpha access to it last week, and I reviewed the basics of Advanced Voice Mode—including demos of my major use cases—on YouTube and X. If you’re interested, I suggest you check it out.

I’d like to dive deeper into two use cases that underscore the leap-forward nature of this technology: self-reflection and learning. But first, let’s start with what Advanced Voice Mode is and why it’s so different from what came before it.

What is Advanced Voice Mode?

ChatGPT’s Advanced Voice Mode understands speech natively, meaning it doesn’t just read and write text. It reads and writes speech, too. This creates an experience that is distinctly better—more fluid, more fluent, and more authentic—than any other voice interaction I’ve ever had with a computer.

Advanced Voice Mode replaces ChatGPT’s standard Voice Mode, which has been around for about a year. The old Voice Mode used to work like this:

  1. You speak to ChatGPT,
  2. The interface turns your voice to text using a transcription model,
  3. It feeds the text into its underlying language model, GPT-4, to get a response in text,
  4. The interface takes the text answer from GPT-4 and feeds it into a separate text-to-speech model, and then
  5. ChatGPT speaks the words back to you.

That’s a lot of steps! It caused significant latency, and it also left a lot of room for misunderstanding. When you transcribe speech into text, you can lose a lot of nuance. A sarcastic tone might be taken literally, or the transcript might not show that there are actually two speakers in the room.
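To make the loss of nuance concrete, here is a minimal Python sketch of the five-step pipeline described above. Everything in it is a hypothetical stand-in: the Utterance class and the transcribe, language_model, and synthesize functions are illustrative stubs, not OpenAI's actual code or API. The point is only to show that once speech becomes text, the tone and the speaker are no longer available to the later steps.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for recorded audio: words plus cues carried only by sound."""
    words: str
    tone: str      # e.g. "sarcastic" or "neutral"
    speaker: str   # who is talking


def transcribe(utterances):
    """Step 2 (stub): speech-to-text. Only the words survive; tone and speaker are dropped."""
    return " ".join(u.words for u in utterances)


def language_model(text):
    """Step 3 (stub): a text-only model can react only to what is in the text."""
    return f"Responding to: {text!r}"


def synthesize(text):
    """Step 4 (stub): text-to-speech, represented here as a labeled string."""
    return f"<audio: {text}>"


def cascaded_voice_mode(utterances):
    """Steps 1 through 5 chained together, each waiting on the one before it."""
    text = transcribe(utterances)
    reply = language_model(text)
    return synthesize(reply)


# Two people in the room, one of them being sarcastic.
audio_in = [
    Utterance("Oh great, another meeting.", tone="sarcastic", speaker="A"),
    Utterance("I actually like meetings.", tone="neutral", speaker="B"),
]

text = transcribe(audio_in)
print(text)                           # the words only, as one undifferentiated string
print(cascaded_voice_mode(audio_in))  # the reply is built from that string alone
```

In this toy version, the string handed to the language model contains no trace of the sarcasm or of the second speaker, and each stage has to finish before the next can start, which is where the added latency comes from.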

As a result, Voice Mode felt a little bit like doing an escape room with your hard-of-hearing grandparent, or trying to order a medium-rare steak in English from a puzzled waiter in a small village outside of Seoul. There was a sense of distance, of being trapped—not by the limits of intelligence on the other end, but by the limits of your expressive ability and theirs. This manifested as a certain pressure in my chest.

With the old ChatGPT Voice Mode, you couldn’t stop talking for fear of being interrupted, and you had to speak loudly and clearly for fear of being misheard. More often than not, you expected something to be misunderstood. You were constantly catering to the needs of the model, and so it was not relaxing. (Though, to be fair, it was still better than Alexa or Siri.)

The new Advanced Voice Mode eliminates steps 2 and 4 from the process above. It can natively understand speech, so you’re just speaking directly to the language model. The biggest immediate change is that a conversation with ChatGPT feels much more authentic and responsive. When I began using it, the pressure in my chest was suddenly gone. I felt more relaxed and expansive.

This opens up a new and important use case: ChatGPT as an aid to conversational reflection.

Reflecting with ChatGPT Advanced Voice Mode 

I am usually a pretty chill and laid-back guy—that is, until you wrong me.

And, unfortunately, you will probably wrong me.

Comments

@anabizpm over 1 year ago

Thanks for your review, it is very insightful! Lately I've been binge-watching Advanced Voice Mode demos on X and YouTube, and I'm STUNNED, to say the least! We're definitely going to be emotionally attached to it. Do you realize what it means to have a buddy you can take anywhere, who can listen patiently to your problems without judging, who can advise you on almost any aspect of your life, who can teach you new stuff, and who has almost infinite knowledge? I mean... even with the trade-offs about our data, it's something REVOLUTIONARY!

Daniel Nest over 1 year ago

Thanks for a great review, Dan.

I've been excited about the new Advanced Voice Mode ever since the original OpenAI livestream, and now I'm even more so. Can't wait to take it for a spin when it's finally available to us non-Alpha peasants.

Also, your YouTube series focusing on people's AI use cases is excellent. Keep it going!

Daniel (no relation)