Hey Everyone,
While covering AI this week I noticed something peculiar. Open-source LLMs had a coming-out party. I'm not usually one to brag or exaggerate, but what we saw this week was unusual.
The Hugging Face folks were noticing it too, like Omar Sanseviero (X, here).
DBRX, by Databricks (MosaicML really)
Jamba, by AI21 Labs
Qwen1.5, by Alibaba Cloud
Samba-CoE v0.2 by SambaNova Systems
xAI's Grok-1.5
Mistral's 7B v0.2
Wild 1-bit and 2-bit quantization with HQQ+
Earlier in March, we saw SaulLM-7B for law.
If you find this a good overview of open-source innovation right now, support the author and share it.
Of course, many predicted that 2024 would be a great year for open-source AI, alongside other emerging trends like agentic AI and AI devices. But in my mind, Jamba, DBRX, and Samba-CoE are unusually specific launches of open-source LLMs, marking a pivotal moment in the diversification and proliferation of these accessible, decentralized models.
Billionaire Elon Musk said Friday his artificial intelligence company xAI's chatbot Grok-1.5 "should be available" to the public next week, after the chatbot became open-source and officially entered the rapidly growing AI chatbot market.
The current generative AI revolution wouldn't be possible without large language models (LLMs) being constantly optimized, worldwide, in a more or less decentralized manner. But it's open-source LLMs that are opening new possibilities: companies can train models on their own proprietary data and retain full control and customization.
Behind every AI tool or feature, there's a large language model (LLM) doing the heavy lifting, and many (by now, most) of them are open-source. When bursts like March 2024 occur, new possibilities emerge. The sheer density of quality open-source LLM launches this March is remarkable.
Gap Between Open-Source and Closed-Source is Narrowing
The performance of DBRX, Jamba, and Qwen1.5 shows that the gap between closed-source and open-source LLMs may be closing, as good as Claude 3 Opus and GPT-4 Turbo are today.
Comparing and benchmarking LLMs has become more complex than ever before.
Evals on open LLMs
Open LLM Leaderboard by Hugging Face
The Spectrum of Open-Source LLMs
What is an open-source LLM?
The easiest description of an open-source LLM is an LLM that's available for free and can be modified and customized by anyone.
With an open-source LLM, any person or business can use it for their own purposes without paying licensing fees. This includes deploying the LLM on their own infrastructure and fine-tuning it to fit their needs.
This is the opposite of a closed-source LLM, which is a proprietary model owned by a single person or organization that's unavailable to the public. The most famous example of this is OpenAI's GPT series of models.
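To make "deploy on your own infrastructure" concrete, here is a minimal sketch of pulling open weights onto your own hardware with the Hugging Face transformers library. The model choice (Mistral-7B-Instruct-v0.2) is just an illustrative open-weights pick, and you'd need enough GPU or CPU memory:

```python
# Minimal sketch: running an open-weights LLM locally with transformers.
# Assumes `transformers` and `accelerate` are installed and enough memory
# is available; the model ID is an illustrative choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # spread weights across available devices
    torch_dtype="auto",     # use the checkpoint's native precision
)

prompt = "Why do companies fine-tune open-source LLMs?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

No licensing fee changes hands here, and the same checkpoint can then be fine-tuned on proprietary data.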
However, as some have pointed out, there's a huge spectrum. With just the weights and no transparency into the training data, a model isn't really open-source. What's clear is that the evolutionary tree of open-source LLMs is branching out faster in 2024.
It's beyond the scope of this article to map the full spectrum of open-source LLMs, but the gradations are considerable. Andriy Burkov is an interesting thinker on this.
The motives corporations have for developing open-source models are also notable and worth bearing in mind. From Meta to Databricks to xAI, each has unique incentives.
Consider how Databricks used an in-house open-source LLM, DBRX, to get ahead of a rival in AI (in this case, Snowflake): roughly ten million dollars spent on a launch whose impact could be worth hundreds of millions in future revenue. I wrote about it this week.
The Untold Story of Open-Source LLMs in 2024
The untold story in the evolution of open-source LLMs is how much progress has been made in China, and how China is positioned to become a leader in this domain. As a China tech watcher, I've noticed Western tech news rarely covers this:
In early 2024, innovation with open-source LLMs in China is even more frantic than in other regions.
United States
France
China
United Kingdom
Israel
Canada
Germany
Singapore
Previous summaries of the top open-source LLMs are utterly out of date just a few months later:
Meanwhile, even smaller models are getting more efficient, powerful, and capable.
Open-source LLMs are evolving fast enough to be a moat killer, spawning new AI startups every week of 2024. It's hard for any one person to keep up with it all.
Chatbot Arena Leaders Keep Changing
Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters
TL;DR
14.3B parameters with 2.7B activated during generation
60 experts with 4 active in generation
Built on Qwen-1.8B using "upcycling" (fragment the FFN into 8 pieces, DeepSeek-MoE style → create the MoE by replicating the fragments 8×8 → continue pretraining); see the sketch after the links below
Base & Chat models with 32,768-token context
Chat model used DPO for preference training
75% reduction in training costs to match Qwen1.5-7B performance
Custom license, commercially usable
Models: https://huggingface.co/models?other=qwen2_moe
Blog: https://qwenlm.github.io/blog/qwen-moe/
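The "upcycling" step is the interesting part: instead of training experts from scratch, the dense FFN weights are sliced into fragments, and each fragment is replicated (with a little noise) to seed the experts, which then diverge during continued pretraining. Here is a hedged sketch of that idea; the shapes, noise scale, and variable names are illustrative assumptions, not the Qwen team's actual code:

```python
# Sketch of MoE "upcycling": seed many experts from one dense FFN matrix.
import torch

d_model, d_ff = 1024, 8192
n_fragments, n_replicas = 8, 8               # 8 fragments x 8 copies = 64 seeds

# Pretend this is the dense up-projection from a Qwen-1.8B-style FFN.
dense_up = torch.randn(d_ff, d_model)

# 1) Fragment the FFN into 8 thin slices (fine-grained, DeepSeek-MoE style).
fragments = dense_up.chunk(n_fragments, dim=0)   # each slice: (1024, 1024)

# 2) Replicate each fragment, adding small noise so copies can diverge later.
experts = [
    frag.clone() + 0.01 * torch.randn_like(frag)
    for frag in fragments
    for _ in range(n_replicas)
]

print(len(experts), experts[0].shape)  # 64 expert seeds, each (1024, 1024)
# 3) In the real recipe, pretraining then continues on the upcycled MoE.
```

Starting from a trained dense checkpoint rather than random initialization is where the claimed 75% training-cost reduction comes from.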
Mixture of Experts
A lot of the new open-source models are using MoE architectures, typically with 8 or 16 experts (Mixtral uses 8, DBRX uses 16).
The training cost of MoE models deviates significantly from that of their dense counterparts. Despite a larger parameter count, an MoE model's training expenses can be notably lower because only a few experts are active per token, as the sketch below illustrates.
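Here is a minimal, illustrative sparse MoE layer in PyTorch: every token is scored against all experts, but only the top-k experts actually run. The sizes and expert counts below are arbitrary choices for the sketch, not any particular model's configuration:

```python
# Illustrative sparse MoE layer: route each token to its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (num_tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():      # run each chosen expert once
                mask = idx[:, k] == e                  # tokens whose k-th pick is expert e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out                                     # only top_k of n_experts ran per token

x = torch.randn(10, 512)
print(SparseMoE()(x).shape)  # torch.Size([10, 512])
```

With 16 experts and top-4 routing, each token exercises only a quarter of the expert parameters, which is where the lower training and inference bill comes from.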
The era of widespread MoE arrived in early 2024. It means far more efficient models, and the gap between GPT-4 and the rest of the foundation models has narrowed considerably.
Conclusion: The Impact of Open-Source LLMs on Generative AI
We can no longer say open-source LLMs are years behind closed-source models, or that China is nine months to a few years behind the West. It's simply no longer true.
It's all "making MoEs go brrr" now; it's a new era. Some people don't understand that yet. What comes after MoEs? More on that further down.
How good is Grok 1.5?
Notably, the only benchmark where Grok-1.5 seemed to have an edge was HumanEval, where it outperformed all models except Claude 3 Opus.
Elon Musk is of course making grandiose claims about how good Grok-2.0 will be.
Grok-1.5 benefits from "improved reasoning," according to xAI, particularly where it concerns coding and math-related tasks.
Grok-1.5 will soon be available to early testers on X, likely next week, in early April 2024.
Apache 2.0 License Arena Leaderboard
As of March 29th, 2024. Source.
Top Orgs
Mistral
Together AI
OpenChat
NousResearch
RWKV
Nexusflow
How Good is SambaNova?
Samba-CoE v0.2
Have you heard of CoE?
CoE stands for "Composition of Experts".
Oddly, Samba's models are composed of more than 50 AI models that have been individually trained and then optimized to work together.
This includes models from SambaNova as well as open-source models that have been curated for specific enterprise tasks. Among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, DePlot, CLIP and Llava.
Composition of Experts, in this context, I think refers to keeping each expert model separately trained on its own secure dataset, so the security restrictions on the training data propagate to the expert model.
Access: https://coe-1.cloud.snova.ai/
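As a toy illustration of the routing idea (not SambaNova's implementation), a CoE dispatcher only has to decide which independently trained expert should answer a given request. The expert names and keyword rules below are invented for the sketch:

```python
# Toy Composition-of-Experts router: each expert is a separately trained,
# separately hosted model; the router just picks one per request.
# Names and routing rules are illustrative assumptions.
EXPERTS = {
    "code":    "deepseek-coder",   # hypothetical deployment names
    "general": "llama-2-chat",
    "vision":  "llava",
}

def route(prompt: str) -> str:
    """Pick an expert; a real CoE would use a learned router, not keywords."""
    text = prompt.lower()
    if any(word in text for word in ("def ", "bug", "function", "compile")):
        return EXPERTS["code"]
    if "image" in text or "chart" in text:
        return EXPERTS["vision"]
    return EXPERTS["general"]

print(route("Fix the bug in this function"))  # -> deepseek-coder
```

Because each expert is trained in isolation, its training data's access controls stay with that expert, which is the enterprise angle Samba-1 emphasizes.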
The CEO of SambaNova is Rodrigo Liang. SambaNova Systems was founded in Palo Alto, California in 2017 by three co-founders: Kunle Olukotun, Rodrigo Liang, and Christopher RĆ©.
Future Prospects of Open-source LLMs
With Llama 3 expected in June 2024, this truly is the year of the open-source LLM movement. The release of Meta's Llama model and the subsequent release of Llama 2 in 2023 kickstarted an explosion of open-source language models, with better and more innovative models being released on what seems like a daily basis.
The pace of new open-source models, and of startups forming around them paired with proprietary datasets, has accelerated. Closed-source offerings from OpenAI, Microsoft, or Google might age poorly given this momentum.
The big three of March 2024 (DBRX, Jamba, and Samba-CoE) really stand out to me as fascinating examples of the rise of MoE and CoE in open-source LLMs.
More on AI21 Labs' Jamba
Real-World Adoption by Businesses, Enterprise and SMEs
As actual enterprise adoption of generative AI takes place, companies will gradually begin to experiment extensively with open-source LLMs. It's only a matter of time before they deploy LLMs to improve their products, increase customer retention, and improve margins and productivity.
As such, demand for open-source LLMs and specialized models in various industries is becoming real.
There are now over 574k models available on Hugging Face.
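You can verify the count yourself; the Hub's Python client exposes the model index (the 574k figure is a late-March-2024 snapshot and climbs daily):

```python
# List a few models from the Hugging Face Hub; iterating the full index
# (hundreds of thousands of entries) is possible but slow.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(limit=5):   # lazy iterator over the model index
    print(model.id)
```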
Benefits of Using Open-Source LLMs
For a company, it's really a no-brainer for obvious reasons:
Lower cost: no licensing fees, significantly lowering initial and ongoing expenses; organizations can freely deploy these models, leading to direct cost reductions.
Increased control and privacy on proprietary data
Enhanced data security and privacy in general
Cost savings and reduced vendor dependency
Code transparency and language model customization
Active community support and fostering innovation
Added features of community contributions are key in fast-moving spaces
Addressing the environmental footprint of AI
Among many other factors relevant on a case-by-case basis.
Thanks for reading! Please support the author by liking, sharing and thinking about the topic deeply.
Read More in AI Supremacy