Nov 26, 2024 6:35 AM

How Do You Get to Artificial General Intelligence? Think Lighter

Billions of dollars in hardware and exorbitant use costs are squashing AI innovation. LLMs need to get leaner and cheaper if progress is to be made.
Illustration: Igor Bastidas

In 2025, entrepreneurs will unleash a flood of AI-powered apps. Finally, generative AI will deliver on the hype with a new crop of affordable consumer and business apps. This is not the consensus view today. OpenAI, Google, and xAI are locked in an arms race to train the most powerful large language model (LLM) in pursuit of artificial general intelligence, known as AGI, and their gladiatorial battle dominates the mindshare and revenue share of the fledgling Gen AI ecosystem.

For example, Elon Musk raised $6 billion to launch the newcomer xAI and bought 100,000 Nvidia H100 GPUs, the costly chips used to process AI, spending north of $3 billion to train its model, Grok. At those prices, only techno-tycoons can afford to build these giant LLMs.

The incredible spending by companies such as OpenAI, Google, and xAI has created a lopsided ecosystem that’s bottom heavy and top light. The LLMs trained on these huge GPU farms are usually also very expensive for inference, the process of feeding a prompt to a model and generating a response, a step embedded in every AI-powered app. It’s as if everyone had 5G smartphones, but data was too expensive for anyone to watch a TikTok video or surf social media. As a result, the high inference costs of these excellent LLMs have made killer apps unaffordable to build.

This lopsided ecosystem of ultra-rich tech moguls battling each other has enriched Nvidia while forcing application developers into a catch-22: either use a low-cost, low-performance model bound to disappoint users, or pay exorbitant inference costs and risk going bankrupt.

In 2025, a new approach will emerge that can change all that. It will draw on lessons from previous technology revolutions, such as the PC era of Intel and Windows or the mobile era of Qualcomm and Android, when Moore’s law improved PCs and apps, and falling bandwidth costs improved mobile phones and apps, year after year.

But what about the high inference cost? A new law for AI inference is just around the corner. The cost of inference has fallen by a factor of 10 per year, pushed down by new AI algorithms, inference technologies, and better chips at lower prices.

As a reference point, if a third-party developer used OpenAI’s top-of-the-line models to build AI search, in May 2023 the cost would be about $10 per query, while Google’s non-Gen-AI search costs $0.01, a 1,000x difference. But by May 2024, the price of OpenAI’s top model came down to about $1 per query. At this unprecedented 10x-per-year price drop, application developers will be able to use ever higher-quality and lower-cost models, leading to a proliferation of AI apps in the next two years.
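The trajectory described above is simple compound decay, and it can be sketched directly. The snippet below is a minimal extrapolation using only the article's own figures (roughly $10 per query for a top OpenAI model in May 2023, a 10x annual price drop, and Google's $0.01-per-query non-Gen-AI search as the benchmark); it is an illustration of the arithmetic, not a forecast.

```python
# Extrapolate per-query inference cost under the article's "10x cheaper
# per year" trend, until it reaches parity with conventional search.

START_COST = 10.00   # $/query, top OpenAI model, May 2023 (article's figure)
ANNUAL_DROP = 10     # cost falls by a factor of 10 each year (article's figure)
TARGET = 0.01        # $/query, Google's non-Gen-AI search (article's figure)

cost = START_COST
year = 2023
while cost > TARGET:
    print(f"May {year}: ${cost:,.2f} per query")
    cost /= ANNUAL_DROP
    year += 1
print(f"May {year}: ${cost:,.2f} per query (parity with conventional search)")
```

Run as written, the projection matches the article's data points ($1 per query by May 2024) and reaches $0.01 parity around 2026, which is the arithmetic behind the claim that an app boom follows within two years.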

I believe this will drive a different way to build an LLM company. Rather than joining the AGI arms race, founders will focus on building models that are almost as good as the top LLMs but lightweight, and thus ultra-fast and ultra-cheap. These models and apps, purpose-built for commercial use with leaner architectures, will cost a fraction as much to train and achieve performance good enough for consumers and enterprises. This approach will not produce a Nobel Prize–winning AI, but it will be the catalyst for proliferating AI apps and a healthy AI ecosystem.

For instance, I’m backing a team that’s building a model, an inference engine, and an app all at the same time. Rhymes.ai, a Silicon Valley–based AI startup, trained a model almost as good as OpenAI’s best for $3 million, compared to the more than $100 million that Sam Altman said it cost to train OpenAI’s GPT-4. The inference cost of this model, applied to an AI search app such as BeaGo, is just $0.03 per query, about 3 percent of GPT-4’s price. The team also built and launched an AI search app with just five engineers working for two months.

How was that accomplished? Through deep vertical integration that optimized inference, model, and application development holistically.

On the path of AI’s progression, we have all witnessed the power of LLMs as a revolutionary technology. I am a firm believer that generative AI will disrupt the way we learn, work, live, and do business. The ecosystem must work together to get over the cost hurdle and adjust the formula, achieving an equilibrium that makes AI really work for our society.