Hey Everyone,
While covering AI this week I noticed something peculiar. Open-source LLMs had a coming-out party. I'm not usually one to brag or exaggerate, but what we saw this week was unusual.
The Hugging Face folks were noticing it too, like Omar Sanseviero (X, here).
DBRX, by Databricks (MosaicML really)
Jamba, by AI21 Labs
Qwen1.5, by Alibaba Cloud
Samba-CoE v0.2 by SambaNova Systems
xAI's Grok-1.5
Mistral's 7B v0.2
Wild 1-bit and 2-bit quantization with HQQ+
Earlier in March, we saw SaulLM-7B for law.
If you find this a good overview of open-source innovation right now, support the author and share it.
Of course, many predicted that 2024 would be a great year for open-source AI, alongside other emerging trends like agentic AI and AI devices. But in my mind, Jamba, DBRX, and Samba-CoE are unusually specific launches of open-source LLMs, marking a pivotal moment in the diversification and proliferation of these accessible, decentralized models.
Billionaire Elon Musk said Friday his artificial intelligence company xAI's chatbot Grok-1.5 "should be available" to the public next week, after the chatbot became open-source and officially entered the rapidly growing AI chatbot market.
The current generative AI revolution wouldn't be possible without large language models (LLMs) being constantly optimized, worldwide, in a more or less decentralized manner. But it's open-source LLMs that are opening new possibilities: companies can train models on their own proprietary data and retain full control and customization.
Behind every AI tool or feature, there's a large language model (LLM) doing the heavy lifting, and many (by now, most) of them are open-source. When bursts like March 2024 occur, new possibilities emerge. The sheer density of quality open-source LLM launches this March is remarkable.
Gap Between Open-Source and Closed-Source is Narrowing
The performance of DBRX, Jamba, and Qwen1.5 shows that the gap between closed-source and open-source LLMs may be closing, as good as Claude 3 Opus and GPT-4 Turbo are today.
Comparing and benchmarking LLMs has become more complex than ever before.
Evals on open LLMs
Open LLM Leaderboard by Hugging Face
The Spectrum of Open-Source LLMs
What is an open-source LLM?
The easiest description of an open-source LLM is an LLM that's available for free and can be modified and customized by anyone.
With an open-source LLM, any person or business can use it for their own purposes without paying licensing fees. This includes deploying the LLM on their own infrastructure and fine-tuning it to fit their needs.
This is the opposite of a closed-source LLM, which is a proprietary model owned by a single person or organization that's unavailable to the public. The most famous example of this is OpenAI's GPT series of models.
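To make "deploy on your own infrastructure" concrete, here is a minimal sketch of pulling open weights onto your own hardware with the Hugging Face transformers library. The model choice (Mistral-7B-Instruct-v0.2) is just an illustrative open-weights pick, and you'd need enough GPU or CPU memory:

```python
# Minimal sketch: running an open-weights LLM locally with transformers.
# Assumes `transformers` and `accelerate` are installed and enough memory
# is available; the model ID is an illustrative choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # spread weights across available devices
    torch_dtype="auto",     # use the checkpoint's native precision
)

prompt = "Why do companies fine-tune open-source LLMs?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

No licensing fee changes hands here, and the same checkpoint can then be fine-tuned on proprietary data.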
However, as some have pointed out, there's a huge spectrum. With just the weights and no transparency into the training data, a model isn't really open-source. What's clear is that the evolutionary tree of open-source LLMs is branching out faster in 2024.
It's beyond the scope of this article to map the full spectrum of open-source LLMs, but the gradations are considerable. Andriy Burkov is an interesting thinker on this.
The motives corporations have for developing open-source models are also notable and worth bearing in mind. From Meta to Databricks to xAI, each has unique incentives.
Consider how Databricks used an in-house open-source LLM, DBRX, to get ahead of a rival in AI (in this case, Snowflake): roughly ten million dollars spent on a launch whose impact could be worth hundreds of millions in future revenue. I wrote about it this week.
The Untold Story of Open-Source LLMs in 2024
The untold story in the evolution of open-source LLMs is how much progress has been made in China, and how China is positioned to become a leader in this domain. As a China tech watcher, I've noticed Western tech news rarely covers this:
In early 2024, innovation with open-source LLMs in China is even more frantic than in other regions.
United States
France
China
United Kingdom
Israel
Canada
Germany
Singapore
Previous summaries of the top open-source LLMs are utterly out of date just a few months later:
Meanwhile, even smaller models are getting more efficient, powerful, and capable.
Open-source LLMs are evolving fast enough to be a moat killer, spawning new AI startups every week of 2024. It's hard for any one person to keep up with it all.
Chatbot Arena Leaders Keep Changing
Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters
TL;DR
14.3B parameters with 2.7B activated during generation
60 experts with 4 active in generation
Built on Qwen-1.8B using "upcycling" (fragment the FFN into 8 pieces, DeepSeek-MoE style → create the MoE by replicating the fragments 8×8 → continue pretraining); see the sketch after the links below
Base & Chat models with 32,768-token context
Chat model used DPO for preference training
75% reduction in training costs to match Qwen1.5-7B performance
Custom license, commercially usable
Models: https://huggingface.co/models?other=qwen2_moe
Blog: https://qwenlm.github.io/blog/qwen-moe/
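The "upcycling" step is the interesting part: instead of training experts from scratch, the dense FFN weights are sliced into fragments, and each fragment is replicated (with a little noise) to seed the experts, which then diverge during continued pretraining. Here is a hedged sketch of that idea; the shapes, noise scale, and variable names are illustrative assumptions, not the Qwen team's actual code:

```python
# Sketch of MoE "upcycling": seed many experts from one dense FFN matrix.
import torch

d_model, d_ff = 1024, 8192
n_fragments, n_replicas = 8, 8               # 8 fragments x 8 copies = 64 seeds

# Pretend this is the dense up-projection from a Qwen-1.8B-style FFN.
dense_up = torch.randn(d_ff, d_model)

# 1) Fragment the FFN into 8 thin slices (fine-grained, DeepSeek-MoE style).
fragments = dense_up.chunk(n_fragments, dim=0)   # each slice: (1024, 1024)

# 2) Replicate each fragment, adding small noise so copies can diverge later.
experts = [
    frag.clone() + 0.01 * torch.randn_like(frag)
    for frag in fragments
    for _ in range(n_replicas)
]

print(len(experts), experts[0].shape)  # 64 expert seeds, each (1024, 1024)
# 3) In the real recipe, pretraining then continues on the upcycled MoE.
```

Starting from a trained dense checkpoint rather than random initialization is where the claimed 75% training-cost reduction comes from.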
Mixture of Experts
A lot of the new open-source models are using MoE architectures, typically with 8 or 16 experts (Mixtral uses 8, DBRX uses 16).
The training cost of MoE models deviates significantly from that of their dense counterparts. Despite a larger parameter count, an MoE model's training expenses can be notably lower because only a few experts are active per token, as the sketch below illustrates.
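Here is a minimal, illustrative sparse MoE layer in PyTorch: every token is scored against all experts, but only the top-k experts actually run. The sizes and expert counts below are arbitrary choices for the sketch, not any particular model's configuration:

```python
# Illustrative sparse MoE layer: route each token to its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (num_tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():      # run each chosen expert once
                mask = idx[:, k] == e                  # tokens whose k-th pick is expert e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out                                     # only top_k of n_experts ran per token

x = torch.randn(10, 512)
print(SparseMoE()(x).shape)  # torch.Size([10, 512])
```

With 16 experts and top-4 routing, each token exercises only a quarter of the expert parameters, which is where the lower training and inference bill comes from.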
The era of widespread MoE arrived in early 2024. It means far more efficient models, and the gap between GPT-4 and the rest of the foundation models has narrowed considerably.
Conclusion: The Impact of Open-Source LLMs on Generative AI
We can no longer say open-source LLMs are years behind closed-source models, or that China is nine months to a few years behind the West. It's simply no longer true.
It's all "making MoEs go brrr" now; it's a new era. Some people don't understand that yet. What comes after MoEs? More on that further down.
How good is Grok 1.5?
Notably, the only benchmark where Grok-1.5 seemed to have an edge was HumanEval, where it outperformed all models except Claude 3 Opus.
Elon Musk is of course making grandiose claims about how good Grok-2.0 will be.
Grok-1.5 benefits from "improved reasoning," according to xAI, particularly where it concerns coding and math-related tasks.
Grok-1.5 will soon be available to early testers on X, likely next week, in early April 2024.
Apache 2.0 License Arena Leaderboard
As of March 29th, 2024. Source.
Top Orgs
Mistral
Together AI
OpenChat
NousResearch
RWKV
Nexusflow
How Good is SambaNova?
Samba-CoE v0.2
Have you heard of CoE?
CoE stands for "Composition of Experts".
Oddly, Samba's models are composed of more than 50 AI models that have been individually trained and then optimized to work together.
This includes models from SambaNova as well as open-source models that have been curated for specific enterprise tasks. Among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, DePlot, CLIP and Llava.
Composition of Experts, in this context, I think refers to keeping each expert model separately trained on its own secure dataset, so the security restrictions on the training data propagate to the expert model.
Access: https://coe-1.cloud.snova.ai/
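As a toy illustration of the routing idea (not SambaNova's implementation), a CoE dispatcher only has to decide which independently trained expert should answer a given request. The expert names and keyword rules below are invented for the sketch:

```python
# Toy Composition-of-Experts router: each expert is a separately trained,
# separately hosted model; the router just picks one per request.
# Names and routing rules are illustrative assumptions.
EXPERTS = {
    "code":    "deepseek-coder",   # hypothetical deployment names
    "general": "llama-2-chat",
    "vision":  "llava",
}

def route(prompt: str) -> str:
    """Pick an expert; a real CoE would use a learned router, not keywords."""
    text = prompt.lower()
    if any(word in text for word in ("def ", "bug", "function", "compile")):
        return EXPERTS["code"]
    if "image" in text or "chart" in text:
        return EXPERTS["vision"]
    return EXPERTS["general"]

print(route("Fix the bug in this function"))  # -> deepseek-coder
```

Because each expert is trained in isolation, its training data's access controls stay with that expert, which is the enterprise angle Samba-1 emphasizes.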
The CEO of SambaNova is Rodrigo Liang. SambaNova Systems was founded in Palo Alto, California in 2017 by three co-founders: Kunle Olukotun, Rodrigo Liang, and Christopher RĆ©.
Future Prospects of Open-source LLMs
With Llama 3 expected in June 2024, this truly is the year of the open-source LLM movement. The release of Meta's Llama model and the subsequent release of Llama 2 in 2023 kickstarted an explosion of open-source language models, with better and more innovative models being released on what seems like a daily basis.
The pace of new open-source models, and of startups forming around them paired with proprietary datasets, has accelerated. Closed-source offerings from OpenAI, Microsoft, or Google might age poorly given this momentum.
The big three of March 2024 (DBRX, Jamba, and Samba-CoE) really stand out to me as fascinating examples of the rise of MoE and CoE in open-source LLMs.
More on AI21 Labs' Jamba
Real-World Adoption by Businesses, Enterprise and SMEs
As actual enterprise adoption of generative AI takes place, companies will gradually begin to experiment extensively with open-source LLMs. It's only a matter of time before they deploy LLMs to improve their products, increase customer retention, and improve margins and productivity.
As such, demand for open-source LLMs and specialized models in various industries is becoming real.
There are now over 574k models available on Hugging Face.
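You can verify the count yourself; the Hub's Python client exposes the model index (the 574k figure is a late-March-2024 snapshot and climbs daily):

```python
# List a few models from the Hugging Face Hub; iterating the full index
# (hundreds of thousands of entries) is possible but slow.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(limit=5):   # lazy iterator over the model index
    print(model.id)
```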
Benefits of Using Open-Source LLMs
For a company, it's really a no-brainer for obvious reasons:
Lower cost: no licensing fees, significantly lowering initial and ongoing expenses; organizations can freely deploy these models, leading to direct cost reductions.
Increased control and privacy on proprietary data
Enhanced data security and privacy in general
Cost savings and reduced vendor dependency
Code transparency and language model customization
Active community support and fostering innovation
Added features of community contributions are key in fast-moving spaces
Addressing the environmental footprint of AI
Among many other factors relevant on a case-by-case basis.
Thanks for reading! Please support the author by liking, sharing and thinking about the topic deeply.
Read More in AI Supremacy