
Hello Everyone,

OpenAI held its Spring Update event the day before Google I/O begins — today, May 13th, 2024, as I write this. It comes at an awkward time for Google, which stands to lose ground now that Apple’s talks with OpenAI have reportedly gone well, presumably bringing GPT-4o’s Voice to the upcoming iOS 18 and the iPhones of 2025.

In this article I’m going to cover the event’s updates, highlight what I think is most relevant, speculate on the partnership with Apple, and basically tell it like I see it. At least the event got us talking about OpenAI for one more day!

Advent of ā€œGPT-4oā€

OpenAI on Monday introduced a new AI model and a desktop version of ChatGPT, which apparently still has 100+ million users.

The new model is called GPT-4o.

It’s not entirely clear whether GPT-4o is just the commercial name for GPT-4.5.

Accessibility upgrades highlighted the Spring 2024 update.

The latest update ā€œis much fasterā€ and improves ā€œcapabilities across text, vision, and audio,ā€ OpenAI CTO Mira Murati said in a livestream announcement on Monday.

Support and Dive Deeper

Support my work for as little as $2 a week. Get access to deep dives and AI related report summaries. šŸŽ“šŸ“ššŸ’”

Subscribe now

Am I supposed to be excited? GPT-4o can detect and understand videos, audio, and even the emotions in your voice.

I was hoping this event would be a bit more eventful. My readers mostly disagreed in our AI Chat. We live in an era where we have been manipulated into thinking OpenAI heralds the pioneering change we have all secretly been hoping for.

The live-stream lasted 26 minutes; you can re-watch it here.

Watch Spring Update

Or if you prefer reading:

OpenAI Spring Update Blog

This was clearly a product update to try to attract free users back to ChatGPT.

A Desktop App

A better, more immersive interface

Better Voice-App interface (with emotion recognition)

Making GPT Store free for all

Apparently coming to the iPhone, just ask Mark Gurman:

If GPT-4o comes to Apple products it would be very bad for Google.

Three days ago Bloomberg reported that Apple is indeed nearing a deal with OpenAI to put ChatGPT on the iPhone; we now know this really means GPT-4o.

In a blog post from the company, OpenAI says GPT-4o’s capabilities ā€œwill be rolled out iteratively,ā€ which amounts to pretty gradually.

Improved Accessibility, Immersion and Utility šŸ‘ļøā€šŸ—Øļø

GPT-4o will enable ChatGPT to interact using text, voice and so-called vision, meaning it can view screenshots, photos, documents or charts uploaded by users and have a conversation about them.

The desktop app includes keyboard shortcuts and is designed for living with ChatGPT throughout the day.

OpenAI is calling this its new ā€œflagship modelā€ that can reason across audio, vision, and text in real time. OpenAI CEO Sam Altman posted that the model is ā€œnatively multimodal,ā€ which means it can generate content or understand commands in voice, text, or images.

Cheaper, faster, better 🧠

The API is a significantly better deal:

In the API, GPT-4o is half the price AND twice as fast as GPT-4-turbo.

With 5x higher rate limits. So paid now means early access to some features and higher rate limits.
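
To make the ā€œhalf the priceā€ claim concrete, here is a minimal back-of-the-envelope cost comparison. The per-million-token prices below are the launch-day figures as I understand them, so treat them as assumptions rather than official numbers, and the workload sizes are made up for illustration.

```python
# Rough API cost comparison: GPT-4 Turbo vs GPT-4o.
# Prices are assumed launch-day USD rates per 1M tokens, not official figures.
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly API spend in USD for a given token volume."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 50M input tokens, 10M output tokens per month.
turbo = monthly_cost("gpt-4-turbo", 50_000_000, 10_000_000)
omni = monthly_cost("gpt-4o", 50_000_000, 10_000_000)
print(f"GPT-4 Turbo: ${turbo:,.2f}, GPT-4o: ${omni:,.2f}")
```

Under those assumed rates, the same workload costs exactly half on GPT-4o, before even counting the 2x speed and 5x rate-limit improvements.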

Towards an Omnichannel (modal) AI Interface

GPT-4o (ā€œoā€ for ā€œomniā€) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.

OpenAI is taking a more immersive approach to accessibility with this upgrade and trying to give desktop users, i.e. Enterprise folk more utility out of ChatGPT’s new capabilities.

For the record, the jargon name ā€œomnimodalā€ seems to be a marketing term they made up.

More Capable Voice-Interface

It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.

A Win For Developers

Developers who want to tinker with GPT-4o will have access to the API, which is half the price and twice as fast as GPT-4-turbo.

With more folk having access to the GPT Store, things could also get interesting on the modding front.

Ease of Use Upgrade

ā€œThis is the first time that we are really making a huge step forward when it comes to the ease of use,ā€ Mira Murati said.

Founded in 2015, OpenAI is under pressure to stay on top of the generative AI market while now carrying a valuation of over $100 billion. Meanwhile, xAI appears to have recently raised nearly $8 billion on a $20 billion valuation all of a sudden, so competitors are nipping at its heels, with Anthropic also releasing an app for Claude 3 that has apparently not done too well.

Prior to today’s GPT-4o launch, conflicting reports predicted that OpenAI was announcing a web search tool, GPT-4.5, GPT-5, or a voice bot reminiscent of the cult movie Her from 2013. GPT-4o certainly seems like a GPT-4.5 with a new skin. An accessibility wrapper of some kind.

The fresh take on ChatGPT Voice, with better voices and emotion recognition, does seem like perhaps the most substantial upgrade for a user like me. The API also seems significantly boosted.

The live demo for me went a bit off the rails after about the 9:11 mark, when two developers started the ā€œDemoā€ of the product, which was boring and unremarkable and even included real-time language translation, of all things. OpenAI has been pushing multi-modal AI for quite some time. I’m not sure real people are as enamored with it.

The desktop app did feel fast and sleek and seemed to have some good interface choices; the idea of keyboard shortcuts for various things made a lot of sense to me. Of course, OpenAI was sure to time this launch just ahead of Google I/O, and it was pretty lightweight of them to try to give GPT-4.5 a more mainstream spin. ChatGPT Voice definitely didn’t feel like the movie Her, as much as I wish it had.

Sorry, this Ain’t Her

Still, OpenAI is trying to appeal to the common person. The new audio capabilities are meant to let users speak to ChatGPT and obtain real-time responses with no delay, although the delay in the demo felt pretty noticeable and not at all like a human conversation. That really killed the immersion for me.

Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). So GPT-4o is at least a noticeable upgrade in that respect. ChatGPT’s iOS app is currently still #1 in Productivity.
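
Putting the quoted latency numbers side by side makes the jump obvious. This is just the arithmetic on the figures from OpenAI’s announcement, with the model labels as used in this article:

```python
# Voice-mode latency figures quoted in OpenAI's announcement, in milliseconds.
LATENCY_MS = {
    "GPT-3.5 Voice Mode": 2800,   # 2.8 s average
    "GPT-4 Voice Mode": 5400,     # 5.4 s average
    "GPT-4o": 320,                # 320 ms average
}

def speedup_vs_4o(model: str) -> float:
    """How many times slower a model's voice pipeline is than GPT-4o."""
    return LATENCY_MS[model] / LATENCY_MS["GPT-4o"]

for name in ("GPT-3.5 Voice Mode", "GPT-4 Voice Mode"):
    print(f"{name}: {speedup_vs_4o(name):.1f}x slower than GPT-4o")
```

Roughly a 9x improvement over the GPT-3.5 pipeline and nearly 17x over GPT-4 — large on paper, even if the on-stage demo still felt laggy to me.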

Does putting ChatGPT on desktop really make that big a difference to accessibility? Most mobile apps for Generative AI outside of this one haven’t done very well, as a16z has noted in its blog posts on the space.

Natively O

OpenAI made a big deal in its blog of how, with GPT-4o, it trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is its first model combining all of these modalities, the company believes it is still just scratching the surface of exploring what the model can do and its limitations.

This is supposed to be ā€œa faster model that’s free for all ChatGPT users.ā€ According to data from app intelligence firm Appfigures, the highest rank Claude achieved among Apple’s top free iPhone apps in the U.S. was No. 55 on May 4, a few days after its debut on the first of the month. At the time of writing, Claude is 59th in Productivity.

Anthropic’s Claude App Struggles in iOS App Launch

See the App

Voice-AI with Emotional Intelligence?

All of this might mean that the demand for ChatGPT and its peers just isn’t as big as OpenAI or Microsoft might have once expected. I won’t even speculate on how Microsoft Copilot and its various iterations of Copilots are doing. There have been mixed reports to say the least.

At one point, in a breath of fresh air, Mark Chen said during the live demo that the model has the capability to ā€œperceive your emotion,ā€ adding that the model can also handle users interrupting it. I do plan on interrupting AI a lot, so they had better get used to that. No more please or thank you for Alexa or Siri, which now obviously feel very dated, if they didn’t feel that way even three or four years ago.

We might have to wait for Apple’s own WWDC 2024 for a greater understanding of its AI strategy and how much it is willing to partner with OpenAI or Google, or more likely, with both. Chen demonstrated the model’s ability to tell a bedtime story and asked it to change the tone of its voice to be more dramatic or robotic. It was all pretty cringe. Don’t tell Mira Murati that though, she was great.

Why the Event Felt Disappointing – My Take

I really was expecting more than just a better interface, immersive UX or wrapper. GPT-4.5 could have had more distinctive features, but it didn’t, so they had to build this entire marketing narrative around bells and whistles that don’t even feel significant. As usual with OpenAI, the entire thing felt fairly contrived.

How does this event update even compare with what Google I/O will announce, given that Alphabet has been hitting AI products pretty hard over the last year? I don’t think OpenAI’s Spring Update will compare favorably after this week; Google I/O begins tomorrow.

OpenAI is making the majority of the revenue of all Generative AI players, but to call GPT-4o their ā€œnew flagship modelā€ just doesn’t feel notable or even quite accurate. They are stealing real revenue from their competitors mostly due to Microsoft’s reach and its inherent advantages. It’s really not a level playing field. ChatGPT’s glow is a slow burn, but it isn’t necessarily a winner that will be sustainable in the so-called quickly changing Generative AI world.

Sam Altman calling it magic didn’t help. OpenAI’s credibility and reputation are already eroding amid a long list of marketing gimmicks. Will OpenAI be the ones to get Voice-AI right? I highly doubt it.

This might sound harsh to some folk, but OpenAI keeps failing to build valuable products outside of ChatGPT and it shows. If you aren’t a fanboy already, you likely just don’t care.

Better Performance

Not just more immersive but smarter? Something I have heard about 30 times so far in 2024 and it’s only May.

GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities.

Watch the Full Demo

OpenAI said it would roll out its new AI model over the next few weeks. Its chief technology officer, Mira Murati, said at the event that the new GPT-4o model would be offered for free because it is more efficient than the company’s previous models. I’m not exactly clear on why that might be.

OpenAI pretends it’s doing us some kind of a favor here, when Open-weight (Open-Source) models have made incredible progress in 2024 already.

Share Your Honest Reactions to the Event with Others

Go to AI Chat

šŸŒ You Know Best

OpenAI’s new model can also function as a universal language translator, even in audio mode. And that was part of their pitch? Yep.

So that’s what we’ve come to. šŸ˜‚

OpenAI and Apple

What the heck is going on?

Apple is so far behind in Generative AI that it may be forced to adopt GPT-4o or Gemini, or a combination of both, just to seem relevant and to try to artificially accelerate its next iPhone buying cycle, since its business model is in major trouble, especially in China on a macro level.

The sad reality is that Apple in 2024 now relies on large stock buybacks to stay relevant for shareholders, never a great sign at this point in its life cycle. Its days as a hyperscale BigTech entity are likely numbered.

Mark Gurman reports that Apple Inc. has closed in on an agreement with OpenAI to use the startup’s technology on the iPhone, part of a broader push to bring artificial intelligence features to its devices.

The two sides have been finalizing terms for a pact to use ChatGPT features in Apple’s iOS 18, the next iPhone operating system. This could be very lucrative for OpenAI (and thus for Microsoft as well), hurting Google’s credibility in the space. OpenAI is also working on a web-search product for its omnimodal interface, which will in part rely on Microsoft Bing.

An OpenAI accord would let Apple offer a popular chatbot as part of a flurry of new AI features that it’s planning to announce next month in time for WWDC 2024. While Apple is making progress in its on-device language models and capabilities, its internal models do not seem to have the polish of GPT-4o (which I presume to be GPT-4.5) or Gemini, among others.

If this collaboration is finalized, it gives OpenAI a lot more credibility in the world as the frontrunner of the Generative AI era. Even as Apple will try to frantically upgrade Siri for the new era of picky consumers who are now used to half-decent Voice-AI experiences.

Apple plans to make a splash in the artificial intelligence world in June, when it holds its annual Worldwide Developers Conference. I expect it to partner with Gemini for some things and OpenAI for others, but OpenAI might gain the bigger contracts and crucially more exposure for GPT-4o and whatever GPT-5 will be.

OpenAI Trying to Go Mainstream

What’s clear with the Spring Update 2024 is that OpenAI has reached product-market fit with 100+ million users, but it needs another level of scale to compete against rivals with bigger budgets like Google Gemini, and even its benefactor Microsoft’s Copilot and its various specialized iterations.

ChatGPT may also have lost a lot of active users recently, and by making GPT-4o free OpenAI hopes to win back some of them. There was an element of desperation in the conversion optimization and accessibility push of this announcement, I fear. ChatGPT certainly isn’t growing like it used to. It’s not clear if GPT-4o and a desktop app can measurably help their products become more popular. But perhaps more sticky?

While ChatGPT’s app dominates the mobile market, and is easily the most used Generative AI chatbot, it’s not exactly beloved by most people on the street who trust AI even less after the last 18 months of hype and over-promising. AI didn’t end up taking our jobs, heck it didn’t even turn out to be useful to a large majority of us here in the real world!

Making the GPT Store free means it hasn’t really provided enough value or conversion for paid users. A freemium version of an AI heading toward Agentic AI feels like a last-ditch effort to win back lost momentum. And if ChatGPT is struggling to win more users, that’s a bad omen for the entire B2C angle of Generative AI. Behind the superficial hype and thin layer of sponsored optimism, the tactics tell a different story.

Onwards

I’ll be covering Google I/O very closely today because Alphabet has been putting a massive amount of effort into remaining the world leader in AI products. Google has to its credit Android, Google Cloud, Google Search, Google Workspace and any number of levers it can pull to drive adoption of Google Gemini and its various other AI products.

Read More in AI Supremacy