Hey Everyone,

I’ve been writing about AI since well before ChatGPT came out. To be honest, the moment in 2022 when I felt the most excitement and anticipation for the future came in August. So what happened then?

📶 From our sponsor: 📶

Who is Nebius AI?

AI-centric cloud platform Nebius AI offers GPUs from NVIDIA’s latest lineup — the L4, L40, and H100, along with the last-gen A100. There is a special offer for all our subscribers: sign up and receive a $1,000 USD trial for testing the platform.

Start Training

This post remains free thanks to the generosity of our monthly sponsor, Nebius AI.

August 22, 2022

This day might have been one of the most significant in AI history – certainly within the generative AI era.

On August 22, 2022, Stability AI co-released Stable Diffusion alongside talented researchers from LMU Munich and Runway ML. It was a “watershed moment” in AI that unlocked the power of text-to-image models. It showed us what was possible with diffusion models. It pushed the open-source movement of AI to the heights it is today.

The trajectory of open-source innovation in AI just keeps improving, and that might not have happened without Stability AI.

Since then, Stability AI has released a lot of interesting models, products, and updates. They are one of the pioneers and, to this day, one of the most exciting AI startups out of the UK. Charlie Guo of Artificial Ignorance is one of the clearest writers on AI on the planet, and his rundowns are some of the best summaries of AI news on Substack.

I asked Charlie to dig into the depth and breadth of the new products and updates Stability AI has been up to.

“Stability AI has already left an indelible mark on the world of AI. Its commitment to open-source has accelerated the pace of innovation and sparked a wave of creativity across the field.” – Charlie Guo

If you want to get our best deep dives and support my coverage of emerging tech across 10+ newsletters, you can subscribe below.

Subscribe now

Stability’s fast-growing list of models

Most generative AI enthusiasts have heard of Stable Diffusion, the open-source image generation model that helped kick off the current deluge of AI-generated images. But many people don’t know that the parent company, Stability AI, has released an entire array of models – pushing the boundaries of what’s possible in image, video, language, code, and 3D modeling.

In the past six months alone, Stability has released or announced an impressive roster of new AI models, giving the open-source community a chance to try and keep up with proprietary AI. If you’re a proponent of democratizing access to AI, it’s worth understanding what’s now available, and what model releases are coming in the near future.

Stable Diffusion 3: Open source, state-of-the-art images

The splashiest of the new releases is Stable Diffusion 3, the company’s most capable text-to-image model yet. There are a few key things to note about SD3: it’s supposed to handle multi-subject prompts and written text much better, with improved overall image quality. Stability CEO Emad Mostaque has also noted that Stable Diffusion 3 will be able to accept multimodal inputs, including video.

Stable Diffusion 3 is actually a family of models, ranging in size from 800M to 8B parameters. Unlike previous versions, it uses a new diffusion transformer architecture, which the company has detailed in a recent research paper. It also employs “flow matching” to smoothly transition from noise to image without simulating every intermediate step.
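
To make “flow matching” a little more concrete, here is a minimal, hypothetical sketch of the rectified-flow-style training objective the SD3 paper builds on. The model(x_t, t, text_emb) denoiser is a stand-in, not Stability’s actual code:

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0, text_emb):
    """Sketch of a flow-matching (rectified flow) training objective.

    A data sample x0 and Gaussian noise are joined by a straight line; the
    model is trained to predict that line's constant velocity, so sampling
    can move from noise toward an image without simulating every step.
    """
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device).view(-1, 1, 1, 1)

    x_t = (1 - t) * x0 + t * noise   # a point on the straight noise-to-data path
    target_velocity = noise - x0     # the path's velocity (constant in t)

    pred_velocity = model(x_t, t.flatten(), text_emb)
    return F.mse_loss(pred_velocity, target_velocity)
```

Because the model learns the straight-line velocity between noise and data, sampling can follow that line in far fewer steps than classic diffusion.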

Early samples suggest SD3 outputs are comparable to other state-of-the-art models like DALL-E 3 and Midjourney in terms of quality and prompt-following ability. But for now, the model is only available through an early preview waitlist.

Stable Cascade: A new, more efficient image generator

Stable Cascade was also announced last month, just before SD3. It’s also a text-to-image model, but has some key architectural differences from the company’s existing image generators. First, it’s based on the Würstchen architecture and is internally composed of three different models called “Stages”. Without getting too much into the technical details, the latter two stages will be available in different sizes, ranging from 700M to 3.6B parameters.

As a result, Stable Cascade is much more efficient than previous models, and it is far easier to train and fine-tune on consumer hardware – no massive GPU farm needed. Early testing has also indicated that the model produces much more aesthetically pleasing images right out of the box, which has historically been a drawback of Stable Diffusion compared to other tools like Midjourney.
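
For a rough sense of what running it locally looks like, here is a sketch based on my reading of the Hugging Face diffusers integration; the two pipelines mirror the prior (Stage C) and decoder (Stages A/B) split, but the exact class names and arguments may differ slightly from the official examples:

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# Stage C ("prior"): turn the text prompt into compact image embeddings.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")

# Stages A/B ("decoder"): expand those embeddings into the final image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prompt = "an astronaut riding a horse, watercolor"
prior_output = prior(prompt=prompt, height=1024, width=1024,
                     guidance_scale=4.0, num_inference_steps=20)

image = decoder(image_embeddings=prior_output.image_embeddings.to(torch.float16),
                prompt=prompt, guidance_scale=0.0,
                num_inference_steps=10).images[0]
image.save("cascade.png")
```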

The existence of Stable Cascade suggests that Stability is pursuing different types of image-generation models for different use cases. While Stable Diffusion 3 is clearly meant to be a flagship model, it likely requires a significant amount of compute to run at scale. With Stable Cascade (and SDXL Turbo, which we’ll talk about next), there may be an option for smaller teams or individuals to run the model locally, even if it isn’t the most advanced option available. The licensing of Stable Cascade seems to reflect that – it’s currently unavailable for commercial use.

SDXL Turbo: Real-time text-to-image creation

In the beginning, there was Stable Diffusion. The first version of Stability AI’s image model (version 1.4) was released in August 2022, with versions 1.5, 2.0, and 2.1 all following within just four months. But as the industry advanced, there was a need for bigger and better models, so Stability built SDXL, which was over three times larger than prior versions. However, the bigger size meant slower image generation – leading to SDXL Turbo.

Released last November, SDXL Turbo enables text-to-image generation in real time. Relative to other models like SD3, the quality is likely lower – but seeing how fast the model can run is very impressive. I’m used to waiting seconds if not minutes for images to generate, and seeing SDXL Turbo create new images as fast as I can type out a prompt is magical.

Part of what makes this possible is a new distillation technique (Adversarial Diffusion Distillation), which brought the number of “steps” required to produce a final output image down from 50 to as few as one. Adding more steps will still improve the quality of the final image, but the point of SDXL Turbo is clearly speed.
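
As an illustration, single-step generation with the diffusers library looks roughly like the sketch below – note the single inference step and the disabled guidance:

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the distilled SDXL Turbo weights from the Hugging Face Hub.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# One denoising step and no classifier-free guidance: this is what makes it fast.
image = pipe("a cinematic photo of a red fox in the snow",
             num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("fox.png")
```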

Like some of the other models we’ll discuss, commercial use of SDXL Turbo is available through Stability’s paid membership. For everyone else, it’s available as a research preview – the code and weights are freely accessible, but you aren’t allowed to use it to make money.

Stable Video Diffusion: AI-generated video clips

As the name suggests, Stable Video Diffusion brings Stable Diffusion’s powerful image synthesis capabilities to video. The models (a 14-frame version and a 25-frame version) were also announced last November, and brought with them a number of highly realistic looking (if very short) video clips. At the time of the announcement, the models were slightly ahead of competitors such as Runway and Pika Labs (though two months later, OpenAI would stun the world with its video model Sora).

The company has more plans for the model, as it has noted that Stable Video can be easily adapted to downstream tasks such as multi-view synthesis from a single image. But for now, commercial use of the model is limited to those with a paid membership, though there is a waitlist for anyone who wants to try it in the browser.

Stable LM 2 and Stable Code 3B: Language and coding models

Last year, Stability AI released its first language model: Stable LM. In January of this year, the company put out a sequel: Stable LM 2. The model is a 1.6B parameter SLM (“small” language model), trained on English, Spanish, German, Italian, French, Portuguese, and Dutch. At the time of release, the company compared Stable LM 2 to other small models, such as Phi-2 and Falcon 1B, noting Stable LM 2’s favorable benchmark performance.

Stable LM 2 is currently available for commercial use as part of the Stability AI membership. That said, there is an increasingly large number of open-source language models, and it’s difficult to know exactly what separates Stable LM from the others. So far, I can’t say I’ve seen much evidence that Stable LM is being used in the wild. Still, the company is planning more language model releases, and there’s certainly room to differentiate down the road.

One of those differentiated use cases for language models is code generation. And there’s a model for that: Stable Code 3B, which is fine-tuned for code and is competitive with Meta’s CodeLlama 7B.
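
As a quick illustration, prompting Stable Code 3B for a completion through Hugging Face transformers looks something like the sketch below. I’m assuming the stabilityai/stable-code-3b checkpoint on the Hub; whether trust_remote_code is still required depends on your transformers version:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "stabilityai/stable-code-3b"  # assumed Hub checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

# Give the model the start of a function and let it complete the body.
prompt = "def fibonacci(n: int) -> int:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```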

Stable Zero123 and TripoSR: Exploring the world of 3D modeling

The last major bucket of Stability AI releases is 3D models: Stable Zero123 and TripoSR.

Stable Zero123 is an AI model specialized in generating 3D objects from a given image. Two versions are available, one for research purposes only and another (Zero123C) for commercial applications. The model builds on existing research, but uses the same architecture as Stable Diffusion 1.5 while greatly improving the training dataset. However, Stable Zero123 requires some beefy hardware to run.

TripoSR is the most recent release, from the first week of March. Unlike the other models above, TripoSR was released in partnership with another company, Tripo AI. Like Zero123, TripoSR generates 3D objects from input images. Unlike Zero123, it requires far less compute (it’s theoretically possible to run it even without a GPU), and it’s available under an MIT license, meaning it can be used commercially with no fees.

Stability AI’s path forward

All these models were only released in the last six months – an indicator of how fast the AI space continues to move. What’s more, this isn’t even the full suite of Stability’s releases. There are models for audio (Stable Audio), LLMs (Stable Beluga), and chatbots (Stable LM Zephyr). But in all of these cases, the models are older than six months – an eternity in generative AI these days – and haven’t kept up with the competition.

However, Stability’s path forward is not without challenges. As the company grapples with the financial realities of sustaining open-source development (despite repeated denials from the CEO, the company may be facing financial and/or investor pressure), it may need to find new ways to monetize its technology without compromising its core values. 

The introduction of paid memberships for commercial use is a step in this direction, though some feel that this betrays Stability’s mission, as the models aren’t truly “open-source.” The company is also experimenting with paid hosting services, putting its models behind easily accessible APIs. Regardless of the specific business model, it will need to figure out some way to reliably make money – building datasets and training models is incredibly expensive.

Despite these challenges, one thing is clear: Stability AI has already left an indelible mark on the world of AI. Its commitment to open-source has accelerated the pace of innovation and sparked a wave of creativity across the field. As the company continues to push forward with new releases and partnerships, it’s poised to shape the future of AI in profound and exciting ways. I (and the rest of the AI community) will undoubtedly be watching closely to see what Stability comes up with next.

More by Charlie Guo:

March, 2024:

Why Claude 3 is a big upgrade

The subscriptionization of AI

AI Roundup 058: Devin and SIMA

Charlie has written for us at least twice before, including some of our most successful guest posts ever.

Read More in AI Supremacy