Is Bigger Better for LLMs?
It’s only normal that a new AI architecture like the LLMs that Generative AI is built upon would hit a point of diminishing returns, and in 2025 we might learn a lot more about how frontier model builders have hit a scaling wall. For example, GPT-5 won’t feel as groundbreaking as GPT-4 did, and Gemini 2, Grok 3 and Claude 4 (Claude 3.5 Opus) models might disappoint.
In reality, 2025 might be the year when text-to-video innovation, open-source models and AI infrastructure scaling truly come alive, while the scaling of LLMs faces significant challenges, hurdles and product-market-fit growing pains.
Everything everywhere all at once, sure, but we need to have realistic expectations for Generative AI’s trajectory in the real world. In real GDP terms, the promise of AI boosting productivity might not be enough to offset aging demographics and lower fertility in the West on pure labor and economic terms in the years and decades ahead.
Eric Schmidt and the Venture Capital community pumping Generative AI hype will of course beg to differ about scaling laws slowing down or AI augmenting productivity, but that’s their job. Amid reports of slowing scaling laws deterring the development of advanced AI models, former Google CEO Eric Schmidt claims there is no evidence the process has been stunted.
Scaling laws are fundamental to understanding the development of advanced AI models. They describe how factors like model size, training data volume, and computational resources interact to influence AI performance. We might inevitably face another AI and deep learning plateau, as the first wave of Generative AI startups consolidates, with very few actual winners.
🕯️📖 Recent Articles you may have missed: 📚
Here are some of the titles of articles in recent days on the Newsletter:
The Chief AI Officer of America, Elon Musk
How Silicon Valley is prepping for War
Is Alibaba’s Qwen the Open-Source AI Winner?
State of AI Report 2024 Summary, Part I
AI is fueling a data center boom (The “GDB” as I coined it)
Disillusionment is Normal for AI
What might a trough of disillusionment look like in Generative AI? The history of technology shows it’s inevitable that we experience it. OpenAI’s Orion models are unlikely to be as significant as Sam Altman and OpenAI have attempted to claim. OpenAI’s revenue is scaling and Anthropic is building meaningful products, but those early magical days of Generative AI’s peak excitement are definitely over. OpenAI’s next model is showing a slower rate of improvement, and all indications are that it’s the same story over at Google, Anthropic and even at Microsoft, whose Copilot products have definitely hit a wall. The exodus of top talent from OpenAI itself is another signal that radically higher funding, access to state-of-the-art compute, promises of AGI and a viral product aren’t enough. Reversion to the mean, or scaling laws adhering to a less exponential trajectory, is only to be expected.
The Anti-Vibecession around Slowing Scaling Laws
Even the supposed enthusiasm over GPT-5 hasn’t really materialized. There’s an anti-Vibecession around the slower scaling of AI. There are hints in the actual myth of Orion that sound a lot like OpenAI’s story. The model under development is called “Orion”, named after the giant huntsman of ancient Greek mythology. When he grew up, Orion got a bit too big for his massive boots and, in a fit of pique, threatened to hunt down every living thing on earth. It’s like Sam Altman telling startups that we’re going to steamroll you. Orion’s myth reflects themes of hubris and retribution common in Greek mythology. If Elon Musk is Apollo and Anthropic is Artemis in the world of Generative AI, I totally get where this is likely heading for the Orion models. Frontier model builders have a lot of engineering and business problems.
OpenAI CEO Sam Altman said on X in February that “scaling laws are decided by god; the constants are determined by members of the technical staff.” The research firm Epoch AI predicted in June that firms could exhaust usable textual data by 2028. Companies are trying to overcome constraints by turning to synthetic data generated by AI itself, but B2C and enterprise AI product uptake also comes with its own limitations of scale, adoption and pricing power. In 2025 we’ll learn a lot more about the upper ceiling of these approaches, with current AI capabilities also depending on the performance of so-called AI agents. I think we can all agree that we should be prepared to have way lower expectations.
With xAI obtaining funding more aggressively than OpenAI and Anthropic combined in recent months, while building an early lead in AI infrastructure and compute capabilities, the promises of OpenAI certainly don’t look very grounded in reality as we move into the second half of the twenty twenties. No matter how viral ChatGPT has gone two years later, it’s obviously not just OpenAI that is hitting a wall as it continues to pour more computing power into its much-hyped large language models (LLMs). Competition is actually good for innovation, and yet the U.S. is trying to limit the scope of China’s capabilities in AI by restricting its access to AI chips, semiconductors, GPUs and even chip-making equipment like the lithography machines that are an essential component in chip manufacturing. Clearly there is more at work here than just a slowdown of AI scaling laws.
Hype Cycles are Brief Windows into Possible Futures in Technology
A depiction of the hype cycle of emerging technologies 2024 by Gartner.
After Scaling Laws and Hype Trends
If the Generative AI trend began in 2017 with the key academic paper (“Attention Is All You Need”) and only accelerated in late 2022 with ChatGPT, then seven years later it’s only normal that we see more of a slowdown, and 2025 is looking like a foundational year for AI Infrastructure and AI Agent frameworks. In the Autumn of 2024 there’s a growing consensus that AI “scaling law” optimism has indeed been replaced by fears, or pragmatism, that we may already be hitting a plateau in the capabilities of large language models trained with standard methods.
You don’t need a Reuters interview with Ilya Sutskever, the recently exited OpenAI cofounder who started SSI, to tell you this is happening, because frankly it’s been happening for quite some time. In a Reddit “ask me anything” thread last month, Sam Altman acknowledged that his company faced “a lot of limitations and hard decisions” about allocating its computing resources.
As Nathan Lambert points out, the recent Lex Fridman interview with Dario Amodei has more useful commentary on the topic. A large part of the training problem, according to experts and insiders cited in these and other pieces, is a lack of new, quality textual data for new LLMs to train on.
Consumer Saturation of Chatbot Culture
There’s also commercial saturation in chatbot culture, and mistrust among actual people on the street, consumers, professionals, SMEs, and bigger companies, due to things like hallucination rates and the proliferation of synthetic, AI-slop content all over the internet. This all creates fatigue and a mistrust of brute-force capex spend and Generative AI fads.
In the last six months it’s felt like Enterprise AI and now even Government contracts are the next frontier for the likes of OpenAI and Anthropic to find real customers and scale revenue. The ROI for actual people, companies and job productivity isn’t what it’s often claimed to be, scaling laws be damned. If LLM scaling laws themselves are slowing down, it turns out there are multiple bottlenecks for both the pace and ceiling of what Generative AI can accomplish, some technical, others cultural and some even geopolitically based around the capabilities of Generative AI and how quickly LLMs are impacting the world.
Research outfit Epoch AI tried to quantify this data scaling problem (i.e. the limited stock of human-generated public text) in a paper earlier this year, measuring the rate of increase in LLM training data sets against the “estimated stock of human-generated public text.”
An image showing projections of the total stock of public text and of data usage, plateauing in the late twenty twenties and hitting critical levels around the year 2028.
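To make the intuition behind that chart concrete, here is a minimal back-of-the-envelope sketch in Python. This is not Epoch AI’s actual methodology; the stock of usable tokens, the 2024 dataset size and the yearly growth factor below are all illustrative assumptions, chosen only to show how a fixed stock of public text gets overtaken by training sets that grow geometrically.

```python
# Illustrative back-of-the-envelope projection (assumed numbers, NOT Epoch AI's model):
# a fixed stock of usable public text versus training sets that grow by a constant
# factor each year. "Exhaustion" is simply the year the two curves cross.

STOCK_OF_PUBLIC_TEXT = 300e12   # assumed stock of usable human-written tokens (illustrative)
DATASET_SIZE_2024 = 15e12       # assumed tokens used to train a 2024 frontier model (illustrative)
ANNUAL_GROWTH = 2.5             # assumed yearly multiplier for training-set size (illustrative)

year, dataset = 2024, DATASET_SIZE_2024
while dataset < STOCK_OF_PUBLIC_TEXT:
    year += 1
    dataset *= ANNUAL_GROWTH
    print(f"{year}: ~{dataset / 1e12:.0f} trillion training tokens")

print(f"Under these illustrative assumptions, usable public text runs out around {year}.")
```

With these made-up inputs the crossover lands around 2028, in the same ballpark as the paper’s projection, but the point is the shape of the problem, not the exact numbers.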
(Garrison Lovely) of The Obsolete Newsletter recently cited the predictions of Gary Marcus, but this is where the whole AGI marketing narrative really starts to break down in the 2025 to 2030 period. “The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again,” said Ilya Sutskever, but startups with huge funding and no products (like SSI) don’t sound so wonderful to me. The mismatch between VC-friendly AI Accelerationists and scaling laws showing a significant slowdown makes for a dire environment for any significant Generative AI startup trying to belong and scale in the 2025 to 2028 period.
Even the founders of prominent venture capital firm Andreessen Horowitz are observing a plateau in artificial intelligence capabilities, citing data shortages and computational limitations as key barriers to continued rapid advancement. Outside of ChatGPT, it’s doubtful many of the current Generative AI products will even be around by 2030. Capex by BigTech in AI infrastructure is more a bet on AI’s future capabilities than on anything they can do today or will be able to do in the near future.
So has Generative AI reached a point of diminishing returns? I asked (Tobias Mark Jensen) of the Newsletter Futuristic Lawyer to look into this for us.
To get access to all of my work, consider becoming a premium reader.
🙏🏻 Thanks for reading AI Supremacy.
Has Generative AI Reached a Point of Diminishing Returns?
My most popular article to date on Futuristic Lawyer has the catchy title “AI Could Be Heading Towards the Trough of Disillusionment”. Now that the year is coming to an end, it deserves a Part 2, and as always, I am glad to have the opportunity to share my work here on AI Supremacy.
Recently, new evidence has emerged that the so-called “scaling paradigm” in AI is over. Practically, this means that more compute and more data will not continue to yield better results. We have in other words – perhaps – reached a point of diminishing returns for AI models.
The evidence includes testimonials from insiders at OpenAI, Google, and Anthropic shared with The Information and Bloomberg, statements from the scientific mastermind behind the early GPT models, Ilya Sutskever, and famous venture capitalist, Marc Andreessen.
On OpenAI’s new benchmark SimpleQA, the company’s latest model o1-preview had an accuracy rate of 42.7%, performing only slightly better than GPT-4o at 38.2%. OpenAI’s other new model, o1-mini, had a low accuracy score of 8.1%, slightly worse than the previously released and corresponding model, GPT-4o-mini, which scored 8.6%. Even though the o1 model series is customized for reasoning tasks in science, coding, and math, we are no longer seeing the same jumps in performance we saw from GPT-2 to GPT-3 and from GPT-3 to GPT-4.
Another relevant point, which I won’t get further into here, is whether benchmark tests are even useful as a measurement of AI models’ capabilities. After all, the goal of major AI labs is not to narrowly beat benchmark scores as if it were a 100-meter track race. Within a few years, Sam Altman and Dario Amodei believe, the current trajectory of AI can bring us to superintelligence that will solve climate change and cure all diseases. We can clearly infer from the companies’ market valuations that investors see the same potential.
The road to AGI is paved with good intentions
I wrote a post for AI Supremacy in September 2023 about the “AGI” concept and a study by Microsoft Research titled “Sparks of Artificial General Intelligence: Early experiments with GPT-4”. OpenAI, realizing that it could not market its new GPT-4 model as full-blown AGI, was begging for just a few sparks.
Anecdotally, I received a lot of positive feedback on the post, but a few readers, including a furious manager at Microsoft, made me aware that I had mistakenly cited OpenAI rather than Microsoft Research as the study’s author. This was an actual oversight on my end, but I think there is a larger and more important point to make: Microsoft and OpenAI’s commercial interest in GPT-4 is disguised by such a paper-thin margin that OpenAI might as well be a subsidiary of Microsoft.
Notably, the lead author of the “Sparks of AGI” paper, Sébastien Bubeck, now works at OpenAI. The whole purpose of the study was to claim that GPT-4 is not AGI but that, with sufficient scaling of OpenAI’s current training method, it potentially could be. The quasi-religious belief in scaling has to some extent incentivized the tech giants and major AI labs to ramp up computing facilities like there is no tomorrow and make huge investments in both green and black energy.
The so-called “scaling paradigm” was formulated in a prominent paper, “Scaling Laws for Neural Language Models” (January 2020), by researchers at OpenAI. It was published a few months before OpenAI’s breakthrough model, GPT-3, was released. The paper demonstrates how increasing the size of a model, using more training data and compute, predictably improves the AI model’s performance. Another very influential paper, “Training Compute-Optimal Large Language Models” by a team at Google DeepMind, showed – in layman’s terms – that the quantity of training data for an AI model is very important, more so than previous research had expected.
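For readers who want the gist in one formula, the DeepMind paper models pre-training loss L as a function of parameter count N and training tokens D as the sum of two power laws (the constants below are the paper’s reported fits, quoted approximately):

$$
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad E \approx 1.69,\; A \approx 406,\; B \approx 411,\; \alpha \approx 0.34,\; \beta \approx 0.28
$$

Because both power-law terms shrink at comparable rates, a fixed compute budget (roughly proportional to N times D) is best spent by growing parameters and training tokens together, which is exactly why high-quality data, rather than sheer model size, becomes the binding constraint discussed next.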
The major AI labs have scraped most of the internet’s data by now and are reaching a point where training data, particularly high-quality training data, is becoming an exhausted resource. This data exhaustion seems to be the main reason why the “scaling paradigm” is struggling. According to OpenAI employees and AI researchers who spoke with The Information, the approach of using synthetic data (not original, human-created data) as a supplement for training AI models is not having the desired results.
A case made by pundits is that scaling is now moving from training to inference. This means that instead of scaling AI models by increasing compute, data, and parameters during the training process, new AI models could spend more time “thinking” in response to prompts. Such a new paradigm was explored with OpenAI’s o1 model.
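As a rough illustration of what spending more compute at inference can mean, here is a minimal best-of-n sampling sketch in Python. It is a toy, not OpenAI’s method: the generate and score functions are hypothetical stand-ins for a model API and an answer verifier. The only point is that answer quality is bought with extra samples at answer time rather than with a bigger model.

```python
import random

# Toy stand-ins for a real model call and a real answer scorer (both hypothetical).
def generate(prompt: str) -> str:
    """Pretend model: returns a candidate answer of random quality."""
    quality = random.random()
    return f"candidate answer (quality={quality:.2f})"

def score(prompt: str, answer: str) -> float:
    """Pretend verifier: reads the toy quality value back out of the answer."""
    return float(answer.split("quality=")[1].rstrip(")"))

def best_of_n(prompt: str, n: int) -> str:
    """Test-time scaling in miniature: sample n candidates, keep the best one.
    More inference compute (a larger n) buys a better expected answer,
    with no change to the underlying model."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Why did scaling slow down?", n=1))
print(best_of_n("Why did scaling slow down?", n=16))
```

The trade-off the next paragraph raises follows directly from this: every extra candidate is another full model call, so the cost of an answer grows with n.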
I don’t have the knowledge background to assess whether inference scaling is the new paradigm that will be chased. However, costs seem to be a barrier. Not only is o1 expensive to use, it continues to make mistakes, not to mention that its energy consumption takes a large toll on the environment.
A New Phase of Discovery?
If we take the evidence at face value and presume that the scaling paradigm for AI models is over – at least for training – where does that leave us? According to Ilya Sutskever, a new phase of discovery and experimentation is now beginning.
“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing.”
That sounds great but perhaps less so for researchers inside of the major AI labs.
The problem is, when you have a company like OpenAI valued at $157 billion and Microsoft, BlackRock, Global Infrastructure Partners, and MGX spending $100 billion on new data centers for AI, there is not much time for discovery and experimentation. Results need to happen fast.
For instance, if the legal sector is to undergo a complete AI-driven reformation, hallucinations have to be quickly eliminated from AI solutions. As a lawyer, your reputation is the most important asset you have, and one bad mistake can ruin it completely. To keep the momentum of AI legal-tech companies from 2024 going, and to implement AI even deeper into the core business of many law firms, the AI solutions have to be trustworthy. Hallucinations should be eliminated to near-zero, preferably yesterday, and the time frame for this to happen cannot be two to three years if the investments are to keep flowing.
That is the business reality. AI models not only have to deliver consistent performance gains, they have to drastically remove hallucinations to the point where the models are safe from failure like airplanes are safe from crashing, if relied on in high-stakes applications. AI technology will surely improve over time, but time is – like high-quality data – not a resource in high stock for the major AI companies.
On the other hand, Sam Altman and Dario Amodei appear confident that OpenAI and Anthropic are inching closer toward superintelligence by the day. This kind of marketing (and it is marketing) is meant to stoke attention and excitement among investors and users but it could backfire if the rate of progress is not maintained. Unfortunately, “could”, “might”, “perhaps”, “potentially”, “one day” and “wait and see”, sell much better than cold facts, harsh truths, and actual products.
Editor’s Note:
Tobias Jensen from Denmark makes sense of the grey areas between law, IT and ethics on his Newsletter Futuristic Lawyer, where he writes about the knowledge gap between big tech companies and democratic institutions. He recently did a deep dive on whether AI regulation is holding the EU back in AI.
Homework: if this topic is of interest to you, listen to the Anthropic CEO’s long 5-hour interview or read the transcript.
If you want a gentle introduction for further insights into Dario’s interview, I’ve made an Audio Overview with NotebookLM here:
What follows is the Audio Overview based on the 5-hour-long YouTube interview between Lex and Dario; it is 20 minutes and 38 seconds long.
Who is Dario Amodei?
As a young researcher, Dario Amodei helped prove the so-called “scaling laws” of AI: the observation that neural networks perform reliably better given more data and computing power. The finding underpins almost everything about today’s AI boom. – Time.
You can also read his latest blog here.
Waiting for the next group of frontier models like Orion, GPT-5, Gemini 2 and Claude 3.5 Opus, we’ll get a better sense of how AI scaling laws have slowed down and what it means for Generative AI product development, scaling revenue and real applications. As Generative AI continues to consolidate, frontier models will become increasingly dependent on AI infrastructure that is highly likely to price out smaller players and even entire countries and global regions.
Furthermore, there aren’t yet AI agent frameworks that make much sense to scale in industry in any meaningful way. With scaling laws plateauing, 2025 is likely to be a year of AI product development and innovation in more defined niches of the ecosystem. I also expect Chinese AI startups, even on much smaller budgets, to make important innovations, such as in open-weight LLMs.
Generative AI needs its own equivalent of Moore’s law, but as the semiconductor industry becomes more important than ever, we are still just in the foundational years of what AI will ultimately become for human civilization. We have to admit that the frontier model makers of today might not be the eventual winners. Diminishing returns on the improvement of LLMs are just one aspect of the challenges ahead.
Read More in AI Supremacy