Hey Everyone,
It feels like we’ve been waiting for Google Gemini for years, but it’s really just months. This was supposed to be the product to compete with GPT-4 and ChatGPT. But does it even do that?
“Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives.”
The preview of Gemini is tentatively here.
Alphabet finally announced Gemini on December 6th, 2023, introducing its most advanced artificial intelligence model, a technology capable of crunching different forms of information such as video, audio and text.
As far as we know, it’s not at GPT-4 level. Of course Google claims Gemini beats GPT-4 in “30 of the 32 widely used academic benchmarks.” Oddly, though, executives said Tuesday that Gemini Pro outperformed OpenAI’s GPT-3.5 but dodged questions about how it stacked up against GPT-4. Which is odd because their blog and technical paper make things rather clear.
After more than eight months of research and development, Google has unveiled its most powerful A.I. so far. But only parts of it are widely available, and it’s not exactly what we were expecting!
If you value my work and topic choice, please consider getting all of my deep dives as a paid subscriber.
The large language model Gemini will include a suite of three different sizes: Gemini Ultra, its largest, most capable category; Gemini Pro, which scales across a wide range of tasks; and Gemini Nano, which it will use for specific tasks and mobile devices.
Three Flavors of GDM Gemini
Gemini, as it turns out, is actually a family of AI models — not just one. It comes in three flavors:
Gemini Ultra, the flagship Gemini model
Gemini Pro, a “lite” Gemini model
Gemini Nano, which is distilled to run on mobile devices like the Pixel 8 Pro
Gemini Ultra leads the pack with groundbreaking scores, surpassing human experts on the MMLU benchmark. It shines in complex problem-solving, showcasing an unmatched blend of world knowledge and analytical prowess.
Gemini excels in reasoning, successfully deciphering complex written and visual data, setting a new standard in AI’s ability to derive knowledge from vast information pools.
Gemini is multimodal. With its training in text, images, audio, and more, Gemini navigates intricate topics like math and physics, providing explanations with unprecedented clarity.
For coding tasks, Gemini is capable of understanding and generating high-quality code in several programming languages, pushing the boundaries of AI-assisted coding.
Some analysts online are claiming the advantage of Gemini is its ability to run efficiently on everything from data centers to mobile devices. This adaptability is a leap forward, hinting at AI’s boundless potential in diverse applications.
Gemini is Designed to Boost Google Cloud
The company is planning to license Gemini to customers through Google Cloud for them to use in their own applications.
It will also power consumer-facing Google AI apps like the Bard chatbot and Search Generative Experience.
As you can imagine, it’s also designed to improve customer experience of Google’s own products.
Gemini: Google’s newest and most capable AI model
Listen about it:
While OpenAI’s ChatGPT became a symbol, some would even call it a “worldwide phenomenon”, and one of the fastest-growing consumer products ever (caveat, only at the start), Google’s Bard was a major disappointment and embarrassment for Alphabet. As so many Google Brain researchers, engineers and scientists were poached by OpenAI and started their own startups, they literally had to merge the team with Google DeepMind (GDM). This did not bode well for Google’s chances.
Bard may have been a sign of things to come. We will have to see Gemini in practice and all the various A.I. products Google has announced and will announce in 2024 to try and make itself look good.
Gemini’s Full Release is Something More
It’s important to remember in all of this that this is just a preview announcement. Gemini Pro is essentially a lightweight offshoot of a more powerful, capable Gemini model set to arrive… sometime next year.
Google also announced AlphaCode 2. I would not be surprised if AlphaCode 2 is actually better at coding than ChatGPT.
Google announced Cloud TPU v5p as well.
Gemini Ultra will be available to select customers and developers for early experimentation and feedback before a broader rollout early next year. Also early next year, Google will launch Bard Advanced, powered by Gemini Ultra.
Gemini: ‘the most capable model we’ve ever built’
Time will tell how Google’s best Gemini models stack up to GPT-4. And I’m sure I’ll be delving more deeply into this as well soon.
Called Gemini, the Google owner’s highly anticipated AI model is capable of more sophisticated reasoning and understanding information with a greater degree of nuance than Google’s prior technology, the company said. But there is a sense that Gemini has a lot left up its sleeve that Google hasn’t yet announced.
AlphaCode 2 is in fact powered by Gemini, or at least some variant of it (Gemini Pro) fine-tuned on coding contest data. And it’s far more capable than its predecessor, Google says — at least on one benchmark.
Google Needs Gemini to Work for Google Cloud Growth
We know that Google DeepMind will rapidly improve their LLMs in line with their existing products and suites of tools.
The entire battle over the Cloud is at stake. If Gemini can exceed expectations, Google Cloud is a huge winner. And the Cloud market is hugely important to Alphabet’s future if it wants to diversify away from just Search advertising.
In LLM innovation it’s all about the profit incentive. You can see that with Microsoft’s adoption of GPT-4 or Nvidia’s flurry of investments into Generative A.I. startups in 2023. Those are two of the most notable events in the history of A.I. in 2023 for me.
Is Google Gemini Really Intelligent?
Pardon the annoying robotic voice!
Gemini’s Reasoning Capabilities
Gemini: Reasoning about user intent to generate bespoke experiences
I also like all the YouTube Gemini videos:
Gemini: Explaining reasoning in math and physics
What does it all Mean?
Gemini is natively multi-modal and its coding capabilities really stand out. It’s improved a lot from PaLM-2.
Never mind that “Bard” is a terrible name. Bard is now powered by Google’s new Gemini model, which Google says matches and even exceeds OpenAI’s tech in a number of ways. (Google says Gemini is coming to more languages and countries “in the near future.”)
The version of Bard with Gemini Pro will first become available in English in more than 170 countries.
Gemini’s Trinity of utility in Nano, Pro and Ultra is scalable and demonstrates a keen design architecture.
As a business, since the launch of OpenAI’s ChatGPT roughly a year ago, Google has been racing to produce AI software that rivals what the Microsoft-backed company has introduced. Mind you, so have Amazon, Apple, Alibaba, Baidu and dozens of other major companies.
Thus far in 2023, Nvidia and Microsoft have been winners and Google, Apple and Amazon have definitely been the losers. Meta, with Llama 2, has really proven its value to the ecosystem. Amazon is also catching up with more partnerships and utility. Apple-GPT is also coming in 2024.
Google Gemini’s more advanced features will only roll out in 2024 along with new A.I. products and updates. It’s still very much a work in progress.
Ultra is the biggest and slowest but the most capable, Nano is small and fast and meant for on-device tasks, and Pro sits right in the middle. It’s meant to be the Goldilocks version of the model, really: fast and efficient while still as capable as possible.
Moreover, this will allow Alphabet to monetize AI-as-a-Service better. Gemini allows for a better product-market fit of its AI advantage in terms of talent, budget and LLMs.
Before launching to the public, Gemini Pro was run through a series of industry standard benchmarks, and in six out of eight of those benchmarks, Gemini outperformed GPT-3.5. That’s to be expected, as GPT-3.5 is kind of old in A.I. time.
Gemini Pro Launch
Gemini Pro will also launch December 13 for enterprise customers using Vertex AI, Google’s fully managed machine learning platform, and then head to Google’s Generative AI Studio developer suite.
Gemini is for Cloud Customers
For now, the company is planning to license Gemini to customers through Google Cloud for them to use in their own applications.
Starting Dec. 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
Gemini will also be used to power Google products like its Bard chatbot and Search Generative Experience, which tries to answer search queries with conversational-style text (SGE is not widely available yet).
How about for B2C?
Everyone is asking how do I use Gemini, where do I find it? That is a funny question in a sense. The improvements will make Bard more capable in terms of things like understanding and summarizing content, reasoning, brainstorming, writing and planning, the company notes.
Gemini, in some ways, feels like a product upgrade rather than some flashy LLM that goes up against ChatGPT on its own terms.
Does Gemini have some sort of Native Multimodal Advantage?
Gemini is referred to as a natively multimodal model, meaning it can analyze text, audio, video, images, and code. While other multimodal offerings exist, Google says Gemini stands apart because the model was designed to take all of those mediums into account from the beginning.
Does this give it an advantage in how it will scale and perform across capabilities and tools in the future? I think it does, at least for Alphabet’s own needs. It will leverage Gemini primarily for commercial growth.
Gemini Ultra
Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities, the company said in a blog post.
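To make a headline number like “above 90% on MMLU” a little more concrete, here is a minimal sketch of how a multiple-choice benchmark spread over many subjects can be scored. It is not Google’s evaluation code; the function name `mmlu_style_scores` and the toy data are made up for illustration, and whether a report quotes the per-question average or the per-subject average is an assumption you would need to check against the paper in question.

```python
from collections import defaultdict


def mmlu_style_scores(results):
    """Score multiple-choice answers grouped by subject.

    `results` is an iterable of (subject, predicted_choice, correct_choice).
    Returns (per-subject accuracy, per-question average, per-subject average).
    Hypothetical helper for illustration only.
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in results:
        per_subject[subject][0] += int(predicted == correct)
        per_subject[subject][1] += 1

    subject_acc = {s: c / t for s, (c, t) in per_subject.items()}

    # Per-question ("micro") average: every question counts equally.
    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(t for _, t in per_subject.values())
    micro = total_correct / total

    # Per-subject ("macro") average: every subject counts equally.
    macro = sum(subject_acc.values()) / len(subject_acc)
    return subject_acc, micro, macro


# Toy, made-up answers: two subjects instead of the real benchmark's 57.
toy = [
    ("physics", "A", "A"),
    ("physics", "B", "C"),
    ("law", "D", "D"),
    ("law", "A", "A"),
    ("law", "C", "C"),
    ("law", "B", "A"),
]
subject_acc, micro, macro = mmlu_style_scores(toy)
print(subject_acc, round(micro, 3), round(macro, 3))
```

The two averages can differ when subjects have different numbers of questions, which is one reason small gaps between models on a leaderboard deserve a second look.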
I think Alphabet needs to release AlphaCode 2 as a stand-alone product, and I think in 2024 it will along with many many other tools.
The skeleton of Gemini is a decent architecture for product innovation around A.I., something clearly outside the scope of OpenAI’s capabilities or even Microsoft’s frankly.
However, Google’s ability to do the product marketing required to demonstrate the value of Gemini appears a bit lacklustre, as usual. They sort of suck at product launches. So many historical failures at Google can easily be attributed to this. They don’t connect the dots well for consumers or potential customers. Google remains a bunch of engineers, and it shows even with the Gemini soft launch announcement this week in December 2023.
Whatever the Gemini era leads to, it won’t be easy.
As Microsoft and Amazon continue to improve their LLM integration and Cloud AI offerings, we’ll know more in 2024 where Google and Google Cloud will fit in the big picture. The battle for the next era of search and advertising will also slowly begin. Google has the most riding on Gemini in that sense.
As part of the announcement, Google released a series of videos demonstrating Gemini’s capabilities. Some of them appear in this article, and you can easily find them on YouTube or may have seen them already on X or LinkedIn.
As for Gemini Ultra, there were some interesting tweets about it.
And then there are the usual academics, who noticeably don’t sound as in awe as they would about anything to do with their pet OpenAI.
“The big deal is that it appears to be the first model to beat GPT-4. The fascinating thing is that it does it by just a tiny bit.” – Ethan Mollick
I think GPT-5 will give us more information and then this race for A.I. Supremacy will happen all over again in yet another cycle. Of course with better A.I. chips, bigger models, better fine-tuning and improvements on RLHF we will eventually get somewhere. Maybe even somewhere impressive in the 2020s, who knows!
Sissie Hsiao, who runs Bard and Assistant at Google, said in a press briefing that Gemini represents the “biggest and best upgrade yet” for Bard. It’s really hard for me to get excited about Bard or any of these chatbots for that matter. It will likely take them years to get genuinely useful. Just give me a Google Assistant that is at Bard’s supposedly upgraded level.
Google Gemini feels hidden beneath a layer of Alphabet dust and veils, where it’s hard to even picture what it can actually do today, for me! If Gemini far exceeded GPT-4, Google wouldn’t have to wait so long and shroud this key product in mystery. Or force me to experience it on Bard, a product I’ve already found inferior. Going back to chatbots like BingAI or Bard is not something many consumers will do with much enthusiasm.
The Gemini announcement feels like a not so well thought-out preview in many ways. It’s troubling how Google boasts without showing the goods or even turning them into real products. I think the Tweets of Burkov best exemplify this reality. But it is what it is, Google’s AI talent is clearly not what it once was relative to OpenAI, Microsoft Research and all of these dozens of A.I. startups that are trying to innovate rapidly in the space. GDM may no longer be that place where the world’s top talent in AI goes. Like FAIR, it’s no longer world-class, per se. The pressure to commercialize isn’t what the top A.I. researchers necessarily want to do.
Gemini only Arrives Sometime in 2024 in Most Products
Gemini will arrive in the coming months in Google products like Duet AI, Chrome and Ads, as well as Search as a part of Google’s Search Generative Experience. So this preview of Gemini isn’t exactly ready for launch as of December 6th.
Google DeepMind appear to be working on so many fronts, it might prevent them from getting their core LLMs just right. It appears that 2023 at Alphabet has been a bit messy. Google’s stock is up only 46% in 2023 so far, a shadow of their BigTech peers and it’s a problem if Gemini does not live up to expectations.
So by the time Gemini gets fitted for real world products, OpenAI may have already announced GPT-5. Google has moved way too slow! Given the evolution of Generative A.I. as a whole, it’s difficult for Google to keep up on all fronts. Instead they will need to invest in Anthropic, Character.AI and others and hope for the best.
Google will likely lose its long-held dominance in A.I. in 2024, and that’s a major problem for the company given its Search Advertising monopoly. There’s no other way to explain what is going on.
Gemini becoming three is confusing for some customers and consumers as well. This announcement doesn’t make them hold their breath for Gemini in 2024. Alphabet unnecessarily adds friction to their product marketing and the product-market fit of these products, if some of them even become real products! It’s super juvenile from a marketing perspective.
That there are more GPT-5 memes trending on Twitter than discussions about Gemini the day after the announcement is not a good sign. And what are they even going to say? Bard is the only consumer touch-point we have about Gemini and that’s not something to be proud of.
This will also galvanize OpenAI to release GPT-5 sooner rather than later, which will make Gemini look incredibly clumsy, slow and late to the party.
Does Gemini Uplift LLM research?
I think what Gemini does is commercialize and make multi-modal LLMs more mainstream.
But is it a major step forwards? Google claims its Ultra version achieves “state-of-the-art performance” across 30 out of 32 academic benchmarks used in LLM (large language model) development.
Gemini Ultra is the first model to achieve human-expert performance on MMLU (Hendrycks et al., 2021a) — a prominent benchmark testing knowledge and reasoning via a suite of exams — with a score above 90% – GDM Technical Paper.
2024, the Year AI Surpasses Human Expertise Across a Wide Range of Tasks?
Furthermore, Google Gemini scores 90% on a massive multitask language understanding (MMLU) test, surpassing human expert performance, according to Google.
It’s important to go through Gemini’s technical paper to better understand its capabilities. Google Gemini may have marginally better reasoning capabilities than GPT-4.
According to Jeff Dean, the multimodal and reasoning capabilities of Gemini are quite strong.
Consider the image below. A teacher has drawn a physics problem of a skier going down a slope, and a student has worked through a solution to computing the speed of the skier at the bottom of the slope. Using Gemini’s multimodal reasoning capabilities, the model is able to read the messy handwriting, correctly understand the problem formulation, convert both the problem and solution to mathematical typesetting, identify the specific step of reasoning where the student went wrong in solving the problem, and then give a worked through correct solution to the problem. The possibilities in education alone are exciting, and these multimodal and reasoning capabilities of Gemini models could have dramatic applications across many fields.
Sample of Reasoning Capabilities of Gemini in late 2023
There are a number of short videos that discuss Gemini and demonstrate its capabilities:
Overview:
Gemini extracting relevant information from tens of thousands of scientific papers:
Highlights of the native multimodality of Gemini with audio and images:
A version of AlphaCode built on top of Gemini that performs in the top 15% of competitors in competitive programming:
Gemini helping a parent and student with their physics homework:
Gemini creating bespoke UIs that are contextual and relevant to an ongoing conversation:
A full set of demos is at: https://deepmind.google/gemini
Gemini can also code in popular programming languages including Python, Java, C++ and Go. Google has even leveraged a specialized version of Gemini to create AlphaCode 2, a successor to last year’s competition-winning generative AI. I have some faith that AlphaCode 2 will be a decent product.
Gemini Ultra — their largest and most capable model for highly complex tasks.
Gemini Pro — their best model for scaling across a wide range of tasks.
Gemini Nano — their most efficient model for on-device tasks.
I think Gemini can also evolve into 2025.
How does Google Gemini Ultra compare with GPT-4?
It’s important to note that GPT-4 is substantially better at common sense reasoning for everyday tasks, one of the key things in making a good chatbot. Here GPT-4 is not marginally better than Gemini Ultra, it’s by a significant margin.
But what about the soon to be released GPT-5?
In early 2024 Alphabet will release Bard Advanced, powered by the Gemini Ultra model, after they complete further trust and safety checks on the model and further refine it with additional RLHF tuning.
As part of this, they said they are making Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before making it more broadly accessible.
You might be impressed or disappointed or, like me, a lacklustre combination of both. Google had a lot of time to get Gemini right, and it sounds like it isn’t even finished.
Alphabet’s progress in AI in 2024
Google’s Gemini is a clear winner in pushing multimodality further.
It’s really in terms of coding where I’m not sure we have all the data yet.
Google is essentially claiming on their blog and in their technical report that Gemini > GPT-4.
Gemini appears to be fine-tuned to problem solving and basic math more than GPT-4. In certain kinds of coding and reasoning it is better.
The Real Launch of the API is December 13th, 2023
Google also claims to care about Responsible AI. Their investment in Anthropic makes this clear.
Gemini: Safety and responsibility at the core
Gemini is a Massive Engineering Feat
Gemini is a large-scale science and engineering effort, requiring all kinds of different expertise in ML, distributed systems, data, evaluation, RL, fine-tuning, and more (800+ authors on the report). The largest Gemini model was trained on a significant number of TPUv4 pods. It is built on top of JAX and the Pathways system (https://arxiv.org/abs/2203.12533), which enables us to orchestrate the large-scale training computation across a large number of TPUv4 pods across multiple data centers from a single Python process. – Jeff Dean
In some ways this is not the same Google of twenty years ago. They are locked in to familiar patterns of revenue generation and don’t want to diminish their monopoly in Search Advertising in any way. Google Cloud is years from catching up to AWS or Azure, if they ever do.
Google has no moat against open-source LLMs, and has lost every single researcher from the most important paper that started LLM innovation. This implies Google is no longer the golden ticket for A.I. researchers. This might not bode well for their future even if DeepMind is still a world-class research lab. Given the pressure to commercialize and how bloated Google is with nearly 200,000 employees all over the world, they just can’t move fast on major developments.
I think it’s unrealistic to expect Alphabet to remain a leader in Generative A.I., even if for a brief window, Gemini might be the best set of models. Let them have their moment of glory, because it may not last long. Just as Microsoft did when they first got early access to GPT-4, it’s only a brief moment in time.
While Google did not immediately share the number of parameters that Gemini can utilize, the company did tout the model’s operational flexibility and ability to work in form factors from large data centers to local mobile devices. Gemini is made for the real world, and in particular, Google’s own products.
Ultra promises to be an incredibly powerful model for further AI development, available to select customers sometime in early 2024.
Gemini Excels at Advanced Image Understanding
Here Gemini Ultra really outperforms GPT-4 by a significant margin.
Read the 60 page PDF for more intricate details.
Gemini is a family of highly capable multimodal models developed at Google. Their efficacy will be more clearly understood in early 2024. Gemini 1.0 might need a lot of tweaking.
Gemini Ultra looks pretty capable.
Gemini Ultra is the first model to achieve human-expert performance on MMLU across 57 subjects, with a score above 90%.
There are some bragging rights for sure, but it’s the implementation in the real world for consumers and enterprise customers that worries me the most.
I hope this gives a broad sense of what Google announced this week. I’ll make more details available as they become known.
Read More in AI Supremacy