The Rapidly Decreasing Costs of AI and Why Databricks Bought Mosaic

Why training and inference costs are falling and what it means for startups building with AI.

Over the last year, MosaicML and others have made huge strides toward driving down the cost of training and inference for foundation models. Combined with the general trend of falling hardware costs, this is accelerating Mosaic’s vision of making it cost-effective for companies to train and fine-tune their own models. It’s not surprising that Databricks acquired them for $1.3B. The deal fits squarely with Databricks’ vision to help companies “rapidly adopt machine learning to outpace the competition.” By building their own models, companies may be able to do just that, and Databricks can capture more of the enterprise market that OpenAI initially dominated. How this market plays out will be determined in large part by how these custom enterprise models perform vs. state-of-the-art general-purpose models (early efforts like BloombergGPT are a promising sign for the former) and by their relative costs.

This post dives into a key part of what will determine this evolution: how much costs have come down, and what that means for startups building on foundation models as they weigh the cost of training and running their own Large Language Models (LLMs).

How much have training costs decreased?

Short answer: Training costs decreased roughly 10x in less than a year. It now costs $50k to train Stable Diffusion and $200k to train a high-quality LLM.

Long answer:

Training costs are dropping fast. This chart shows the cost to train Stable Diffusion from scratch over time. Stability announced that it cost them $600k to train Stable Diffusion, while MosaicML has published several papers showing they can now train it for $50k.
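The headline "10x" is a round number; the two Stable Diffusion figures quoted above actually imply a 12x drop. A trivial sketch of that arithmetic, using only the numbers from this post (`reduction_factor` is just an illustrative helper name, not from any library):

```python
# Reported costs to train Stable Diffusion from scratch (USD), as quoted above.
STABILITY_COST = 600_000  # Stability's reported cost
MOSAIC_COST = 50_000      # MosaicML's reported cost


def reduction_factor(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper new_cost is than old_cost."""
    if new_cost <= 0:
        raise ValueError("new_cost must be positive")
    return old_cost / new_cost


print(f"{reduction_factor(STABILITY_COST, MOSAIC_COST):.0f}x cheaper")  # 12x cheaper
```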

There are two drivers of this trend. First, companies like MosaicML are making large algorithmic improvements.

“Our goal is really to bring these model costs down to the 10’s of thousands range. When you get to 10’s of thousands you open up so many markets…Once you throw good engineering at a problem you can make things much more economical.” - Naveen Rao, CEO at MosaicML.

Second, GPU costs have fallen nearly 3x in about four years. In August 2019, one hour on an NVIDIA T4 cost $0.95 according to a Wayback Machine snapshot of GCP’s pricing page, while today the same GPU costs $0.35 per hour.
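To put the T4 prices on a per-year footing, the sketch below converts the overall drop into a constant annual rate of decline. It assumes a smooth exponential decline, which real cloud pricing does not follow, and since the exact elapsed time depends on when "today" is, it tries both a three- and a four-year span. `annualized_decline` is a hypothetical helper:

```python
# GCP on-demand price for one NVIDIA T4 GPU-hour (USD), as quoted above.
T4_PRICE_AUG_2019 = 0.95
T4_PRICE_TODAY = 0.35


def annualized_decline(old_price: float, new_price: float, years: float) -> float:
    """Constant yearly rate r such that old_price * (1 - r) ** years == new_price."""
    return 1 - (new_price / old_price) ** (1 / years)


print(f"overall: {T4_PRICE_AUG_2019 / T4_PRICE_TODAY:.1f}x cheaper")
# The exact elapsed time depends on when "today" is, so try a couple of spans.
for years in (3, 4):
    rate = annualized_decline(T4_PRICE_AUG_2019, T4_PRICE_TODAY, years)
    print(f"over {years} years: ~{rate:.0%} cheaper per year")
```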

It is important to note that there is huge demand for GPUs right now, including from companies like ByteDance, which ordered $1B worth of GPUs 🤯. The decline in GPU prices may slow (or even reverse) in the short term, but we expect the macro trend to continue over a longer time horizon.

If training costs drop another 10x in the next year, from $200k to $20k per model, training their own models will become that much more accessible for startups and the barriers to entry for new model providers will fall, underscoring the idea that there isn’t likely to be “one model to rule them all.”
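The "another 10x" scenario is a simple constant-factor extrapolation. The sketch below just makes that assumption explicit; the yearly factor is a hypothetical input, not a forecast, and real costs will not fall at a fixed rate. `project_costs` is an illustrative name:

```python
def project_costs(current_cost: float, yearly_factor: float, years: int) -> list[float]:
    """Costs for year 0..years if cost falls by the same factor every year."""
    if yearly_factor <= 0:
        raise ValueError("yearly_factor must be positive")
    return [current_cost / yearly_factor**year for year in range(years + 1)]


# Assumptions from the text: ~$200k to train an LLM today, another 10x drop next year.
for year, cost in enumerate(project_costs(200_000, 10, 1)):
    print(f"year {year}: ${cost:,.0f}")
```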

“What is really amazing, and we’ve seen this through every inflection point in history, is when everyone can use it. That is what we’re working towards…I think releasing these kinds of models and then enabling people to build off of and customize them, throw their data at it, that’s where the game is going to be fought. [Startups] are going to be fighting on good quality data, building the right product, and serving their customers. They will use LLMs as a piece of that.” - Naveen Rao, CEO at MosaicML.

How much have inference costs decreased?

Short answer: Inference costs dropped roughly 10x in 16 months, from $0.006 / 1k tokens for Curie generations to $0.0005 / 1k tokens for Curie-quality generations today.
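Another way to read the inference numbers is as a halving time. Assuming (loosely) that prices fell at a steady exponential rate over those 16 months, the sketch below backs out how often the price of a Curie-quality generation halved. `halving_time` is a hypothetical helper:

```python
import math

# Price per 1k tokens (USD) for Curie-quality generations, as quoted above.
PRICE_THEN = 0.006   # Curie, 16 months ago
PRICE_NOW = 0.0005   # Curie-quality generations today
MONTHS = 16


def halving_time(old_price: float, new_price: float, months: float) -> float:
    """Months for the price to halve, assuming a constant exponential decline."""
    monthly_log_drop = math.log(old_price / new_price) / months
    return math.log(2) / monthly_log_drop


print(f"{PRICE_THEN / PRICE_NOW:.0f}x cheaper over {MONTHS} months")
print(f"price halves roughly every {halving_time(PRICE_THEN, PRICE_NOW, MONTHS):.1f} months")
```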

Long answer:

It depends a lot on the use case and model.

Say we are a generative AI startup building an AI word processor, like an AI-native Notion (Notion already has Notion AI, so maybe Notion itself is the AI-native Notion…). We charge each user $10 / month. A user opens the product 4 out of every 5 workdays, which would be very good engagement, and consumes 8,000 tokens (prompt plus generation) on each day they use it. OpenAI charges $0.002 / 1k tokens for GPT-3.

(4 active days / week) * (8,000 tokens / active day) * (4 weeks / month) * ($0.002 / 1k tokens) = $0.256

In this example, OpenAI’s API costs 2.56% of revenue, so this feature alone lowers gross margin by 2.56 percentage points. If we used MPT-7B-Instruct at $0.0005 / 1k tokens (4x cheaper), inference would cost only 0.64% of revenue.
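The formula above generalizes to a small per-user cost model. The sketch below is one way to write it; `Usage` and `monthly_cost` are illustrative names, the 4-weeks-per-month convention follows the text, and the $0.0005 / 1k tokens price for MPT-7B-Instruct is an assumption back-solved from the 0.64% figure rather than a quoted price list:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    active_days_per_week: float
    tokens_per_active_day: float  # prompt + generation
    weeks_per_month: float = 4


def monthly_cost(usage: Usage, price_per_1k_tokens: float) -> float:
    """Inference cost (USD) for one user over one month."""
    tokens = usage.active_days_per_week * usage.tokens_per_active_day * usage.weeks_per_month
    return tokens / 1000 * price_per_1k_tokens


PRICE_PER_USER = 10.0  # USD per user per month
editor_user = Usage(active_days_per_week=4, tokens_per_active_day=8_000)

# USD per 1k tokens; the MPT-7B-Instruct price is an assumption back-solved from 0.64%.
for model, price in [("OpenAI API", 0.002), ("MPT-7B-Instruct", 0.0005)]:
    cost = monthly_cost(editor_user, price)
    print(f"{model}: ${cost:.3f} / month = {cost / PRICE_PER_USER:.2%} of revenue")
```

Keeping usage and price as separate inputs mirrors the two levers discussed below: the usage pattern and the choice of model.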

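Say instead we had an application that consumes far more tokens, like code generation. The generated code itself is fairly compact:

(80 characters / line) * (100 lines / day) / (4 characters / token) = 2,000 tokens / day

But each request also carries surrounding code as prompt context, so suppose total usage (prompt plus generation) is 16x that, or 32,000 tokens / day. We might then have: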

(4 active days / week) * (32,000 tokens / day) * (4 weeks / month) * ($0.002 / 1k tokens) = $1.02

or about 10% of revenue. Anecdotally, we’ve heard that app-layer AI companies spend 5%-10%+ of their revenue on LLM costs today.
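Holding price and engagement fixed, inference cost as a share of revenue scales linearly with daily token volume, which is why the code-generation case lands at roughly 4x the editor's share. The sketch below reproduces both made-up examples and also inverts the formula to ask how many tokens per day a given budget allows; the 5% budget is an arbitrary illustration, and the function names are hypothetical:

```python
PRICE_PER_1K = 0.002           # USD per 1k tokens, as in the examples above
PRICE_PER_USER = 10.0          # USD per user per month
ACTIVE_DAYS_PER_MONTH = 4 * 4  # 4 active days / week * 4 weeks / month


def revenue_share(tokens_per_day: float) -> float:
    """Fraction of a user's monthly revenue spent on inference."""
    monthly_cost = ACTIVE_DAYS_PER_MONTH * tokens_per_day / 1000 * PRICE_PER_1K
    return monthly_cost / PRICE_PER_USER


def max_tokens_per_day(budget_share: float) -> float:
    """Daily token budget that keeps inference at budget_share of revenue."""
    return budget_share * PRICE_PER_USER / (ACTIVE_DAYS_PER_MONTH * PRICE_PER_1K / 1000)


for label, tokens in [("AI editor", 8_000), ("code generation", 32_000)]:
    print(f"{label}: {revenue_share(tokens):.2%} of revenue")
# Hypothetical budget: keep LLM spend at 5% of revenue.
print(f"5% budget allows ~{max_tokens_per_day(0.05):,.0f} tokens / day")
```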

These are, of course, made-up examples, but they illustrate the main levers companies have for managing inference costs and margins: both the usage pattern and the choice of model can have large effects on the cost structure. This is a great technical breakdown of inference costs for model providers like OpenAI.

How does this compare to how much companies spend on their cloud infrastructure?

Most SaaS companies at scale target 80% gross margins, i.e., COGS of 20% of revenue. Software companies typically spend about 50% of their COGS on cloud costs, so total cloud cost should account for ~10% of revenue. This means some companies are spending roughly as much on generative AI features as they are on cloud infrastructure, BUT the AI costs come on top of their regular cloud costs.
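To see what this does to gross margin, the sketch below simply adds the AI inference share on top of a 20% COGS baseline. That additive treatment is an assumption (it ignores any offsetting savings), and the two AI cost shares are taken from the made-up examples in this post. Under those assumptions, the lighter editor example stays in the high 70s while the heavier code-generation example lands just under 70%. `gross_margin` is a hypothetical helper:

```python
def gross_margin(base_cogs_share: float, ai_cost_share: float) -> float:
    """Gross margin when AI inference is added on top of existing COGS (shares of revenue)."""
    return 1 - (base_cogs_share + ai_cost_share)


BASE_COGS = 0.20           # 80% gross-margin target
CLOUD_SHARE_OF_COGS = 0.5  # roughly half of COGS goes to cloud

print(f"cloud spend: {BASE_COGS * CLOUD_SHARE_OF_COGS:.0%} of revenue")
# AI cost shares from the two made-up examples in this post.
for label, ai_share in [("AI editor", 0.0256), ("code generation", 0.1024)]:
    print(f"{label}: gross margin {gross_margin(BASE_COGS, ai_share):.1%}")
```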

There is a tremendous amount of value to be unlocked for SaaS founders with AI features. Notion already showed this when it added tens of millions of dollars in ARR in a month with Notion AI. That said, companies will need to be thoughtful about their cost structure to keep gross margins above 70% at scale.

What does this mean for a startup building on foundation models?

We’re going to continue to see two trends:

Cost to train models going down - Driven by GPU prices and efficiency of algorithms. This will lead to more model providers and ultimately more competition at the model layer.
Cost of inference going down - Driven by GPU prices, efficiency of algorithms, and increased competition.

Both of these trends are great for people building applications or LLM tooling on top of LLMs, and they make those companies’ business models more attractive. They should also keep pricing pressure on closed-source model providers and may lead more companies to start with open-source models. In addition, use cases that are economically infeasible today may become attractive as costs continue to come down.

Author

Patrick Chase