The Path to AGI: 12 Predictions for the Large Language Model Industry

I do not believe in AGI (Artificial General Intelligence), because I do not believe that humanity was created to ultimately create AGI that would lead to our own destruction. However, I do believe that the pursuit of AGI can lead to the development of many technologies that will benefit humanity.

Over a year has passed since the introduction of ChatGPT, and it’s clear that the industry has moved into deeper waters. Surface-level applications are gradually disappearing, giving way to Agents. Just a year ago, spending $100 a month on OpenAI would have classified you as a heavy user; now we average $1,000 per day. Prompts are no longer a matter of filling in a few templates by hand; they are generated by complex machinery. Our average Prompt is 20,000 tokens long, and an Agent can easily handle jobs of several hundred thousand tokens, with the longest reaching up to 20 million.

I recently read an article titled “Interesting or Useful,” which had a significant impact on me. It’s the best piece I’ve read in the past six months that discusses the current AI landscape, and it resonated with me, perhaps because the author discussed the issues from an application perspective, which is closely related to my work developing Agents over the last half-year.

Coupled with frequent recent conversations with investors, this reading prompted some reflections, which I’ve recorded here for future contemplation.

The Barrier of Models

Why address this issue first? After the release of Claude 3, we tested the migration cost and found that we could switch an Agent from GPT-4 to Claude 3 seamlessly, without changing a single line of code or a single Prompt. That was quite a shock to me. In principle, the technologies behind large models are public, and as the model industry matures, every company’s models are converging toward GPT-4, with some, such as Claude 3, even surpassing it. Does it follow that large models have no technological barriers, only engineering ones? If so, once the industry’s growth slows, the differences between companies will quickly shrink or even disappear, and the capabilities provided by large models will resemble those of mobile operators: a commodity with near-zero migration cost for the layers above. From that perspective, large models are not a good business. And yet, could you really build a model that outperforms today’s commercial models, or one that costs less than the existing open-source ones? I doubt it. Ultimately, large models are bound to be a business with high capital and policy barriers.
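
To make the migration point concrete, here is a minimal sketch of the kind of thin wrapper that makes the switch a one-word change. The wrapper and model names are illustrative, not our production code; the two calls are the providers’ standard OpenAI and Anthropic chat SDKs.

```python
# Thin provider-agnostic wrapper: nothing above this function changes
# when the underlying model does. Model names are illustrative.
from openai import OpenAI
from anthropic import Anthropic

def complete(provider: str, system: str, user: str) -> str:
    if provider == "openai":
        resp = OpenAI().chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content
    else:  # "anthropic"
        resp = Anthropic().messages.create(
            model="claude-3-opus-20240229",
            max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        return resp.content[0].text

# Migrating the whole Agent from GPT-4 to Claude 3 is one argument:
print(complete("anthropic", "You are a careful analyst.", "Summarize our plan."))
```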

There are many rumors currently suggesting that GPT-5 will have built-in Agent capabilities. I boldly speculate that this statement is half correct:

Speculation 1: OpenAI will definitely provide System 2 capabilities, that is, Agent capabilities.

The current Chat interface is stateless and too simplistic: it is conducive neither to the development of complex Agents in upper-layer applications nor to the building of a moat for OpenAI. It is therefore all but certain that OpenAI will provide stateful interfaces, such as a Plan interface, to integrate the model deeply with applications.
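
To make the speculation concrete, here is a purely hypothetical sketch; no such Plan API exists today, and every name in it is invented. It only illustrates the difference between a stateless Chat call, which must re-send the entire context on every request, and a stateful interface where the provider tracks a multi-step plan server-side.

```python
# Purely hypothetical: a sketch of what a server-side "Plan" object might
# hold. With today's stateless Chat interface, the application has to
# serialize all of this into the Prompt on every single call.
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    status: str = "pending"   # pending | running | done | failed
    result: str | None = None

@dataclass
class Plan:
    goal: str
    steps: list[PlanStep] = field(default_factory=list)

    def next_step(self) -> PlanStep | None:
        return next((s for s in self.steps if s.status == "pending"), None)

# A stateful interface could hold this whole structure provider-side,
# so each call only carries the delta instead of the full history.
plan = Plan(goal="Build a Perplexity-style Q&A app", steps=[
    PlanStep("Design the search tool schema"),
    PlanStep("Wire retrieval results into the Prompt"),
    PlanStep("Evaluate answers against a test set"),
])
print(plan.next_step().description)
```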

Speculation 2: The System 2 capabilities provided by OpenAI will be B2B-oriented, and won’t delve into Agents for specific scenarios.

Every specific scenario involves industry know-how and private data; for OpenAI, chasing them would be guerrilla warfare, and pointless. It only needs to capture the largest scenario of all, evolving the personal super assistant ChatGPT into a ChatAgent.

Speculation 3: There are no so-called vertical models.

I firmly believe that intelligence is universal and that the only way to achieve it is through the scaling law. Practical results show that smaller models are simply not as smart as larger ones; currently, only GPT-4 and Claude 3 can satisfy super-complex reasoning scenarios like those a Babel Agent requires. OpenAI’s abandonment of its dedicated coding model, Codex, is very telling. I therefore think that training a vertical large model is most likely a fool’s errand, regardless of the field. Fine-Tuning (FT), however, may persist for a long time.
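
Fine-tuning, unlike vertical pre-training, is already productized. Below is a minimal sketch against OpenAI’s fine-tuning API; the file ID is a placeholder for a chat-formatted JSONL training file you would upload first, and the base model must be one the provider permits for fine-tuning.

```python
from openai import OpenAI

client = OpenAI()
# Placeholder file ID: upload JSONL chat data first via client.files.create.
job = client.fine_tuning.jobs.create(
    training_file="file-XXXXXXXX",
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```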

The Value of Infra

There are two kinds of infrastructure in the field of AI: one supports the construction of large models themselves, and the other supports the construction of applications on top of them (such as Agents). Let’s discuss them separately, starting with the infrastructure for training and inference of large models.

Speculation 4: The significant value of energy/data centers

Recently, a news story reported that a joint team from Microsoft and OpenAI caused a citywide power outage due to their GPU data center operations. Regardless of how much truth there is to this story, it at least indicates that energy is a crucial strategic resource in the age of AI. Therefore, any infrastructure that can provide value in terms of energy is extremely valuable. Data centers may be worth much more than many people think.

Speculation 5: Data services hold long-term value.

Whether you work on underlying models or higher-level applications, data is the core of the value. However, the publicly available data on the internet has largely been exhausted, and a growing share of new data is AI- or human-generated to order. Data is also naturally segmented by modality and by industry, so there can be no one-size-fits-all data annotation service. Moreover, as training data volumes grow, new challenges arise in how to store, transport, and analyze that data, as the sketch below illustrates.
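
As a small illustration of the storage-and-analysis side of the problem, here is a toy streaming dedup pass over sharded training data; the paths and shard layout are made up, but the constraint is real: at today’s volumes, only the hashes fit in memory, not the documents.

```python
# Toy exact-dedup over sharded training data. Even this simplest analysis
# pass must stream shard by shard, because the corpus no longer fits in RAM.
import glob
import hashlib
import json

def dedup_shards(pattern: str, out_path: str) -> int:
    seen: set[bytes] = set()          # hashes, not documents, stay in memory
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for shard in sorted(glob.glob(pattern)):
            with open(shard, encoding="utf-8") as f:
                for line in f:        # one JSON document per line
                    doc = json.loads(line)
                    h = hashlib.sha256(doc["text"].encode()).digest()
                    if h not in seen:
                        seen.add(h)
                        out.write(line)
                        kept += 1
    return kept

print(dedup_shards("corpus/shard-*.jsonl", "corpus/deduped.jsonl"))
```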

Speculation 6: Convergence in the LLMOps market

Ultimately, only a handful of companies will build underlying models, and those companies generally have their own tools and pipelines, which form part of their competitive edge. There is therefore no market for a standardized LLMOps product. That said, there is no reason the open-source model ecosystem won’t grow a shared piece of infrastructure akin to Kubernetes (K8s), which would benefit everyone in the industry; such infrastructure, however, is typically built by large corporations.

Now let’s discuss the infrastructure for developing large model applications.

The best-known player in this field is Langchain, which got an early start, quickly accumulated a user base by filling the knowledge gap around the new technology, and snowballed from there. As application development has matured, however, the landscape in this area has become relatively clear.

Speculation 7: Tools that simplify the development of large model applications will, over time, fall out of favor.

Whenever a new technology emerges, most people are initially in the dark, which creates demand for low-code products that accelerate developers’ understanding of the new concepts. Yet from the PC era through the internet and mobile-internet eras, such drag-and-drop products have never captured a large market. Another sign: more and more model APIs are aligning with OpenAI’s, often by adopting OpenAI’s SDK outright, which has made products that merely simplify API calls redundant, as the sketch below shows. OpenAI’s own GPTs are an extreme example of this category of tool, and their current state is plain for all to see. As an application development team, we have virtually no need for such tools. Langchain itself is already moving into deeper waters: not lowering the threshold for application development, but assisting in the development of complex applications.
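
A minimal sketch of that alignment: the base_url parameter is a real feature of OpenAI’s Python SDK, while the endpoint and model name below are placeholders for any OpenAI-compatible provider.

```python
# The OpenAI SDK has become the de facto standard client. Pointing it at
# another OpenAI-compatible provider is a two-argument change, which is
# what made "API simplification" wrapper products redundant.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # placeholder endpoint
    api_key="PROVIDER_KEY",
)
resp = client.chat.completions.create(
    model="some-open-model",   # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```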

Speculation 8: It’s too early for Agent frameworks.

Since the advent of AutoGPT, various frameworks have emerged to “teach people how to make Agents.” During the development of Babel Agent, we carefully evaluated almost all the frameworks on the market and ended up not adopting any of them. The reason is that the deeper you go, the less they meet your needs. Normally, frameworks are abstracted from mature applications, but now there isn’t even a single mature Agent application, which seems like putting the cart before the horse. I believe there will eventually be an Agent framework, but it is very likely not to come from any of the current framework-building teams, and as mentioned earlier, there’s a possibility that large models will come with their own Agent development frameworks.
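
Part of the reason adoption fails is that the core loop of an Agent is small enough to write yourself, while everything hard about it (tools, Prompts, recovery) is application-specific. Below is a minimal hand-rolled loop; call_llm and the tool registry are stand-ins for whatever your application actually uses.

```python
# A minimal hand-rolled Agent loop: ask the model, run the tool it picks,
# feed the result back, repeat until it answers or the step budget runs out.
import json

TOOLS = {
    "search": lambda q: f"(top results for {q!r})",  # stand-in tool
}

def call_llm(messages: list[dict]) -> dict:
    # Stand-in for a real model call: request a search once, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search", "arg": messages[0]["content"]}
    return {"answer": "done, based on " + messages[-1]["content"]}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["arg"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "gave up"

print(run_agent("find recent work on Agent frameworks"))
```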

Speculation 9: AgentOps is a blue ocean.

Here, AgentOps refers specifically to observability, or more precisely, Agent APM (Application Performance Management). As noted above, I don’t believe Agent development has converged on a unified approach yet, but observation is already possible. A Babel Agent job can involve dozens of consecutive calls to large models, each tens of thousands of tokens, arranged in a hierarchical call structure. Without observation tools, analyzing the logs with the naked eye would quickly drive you blind; this is a real and pressing need. Perhaps I’m out of the loop, but it seems that only LangSmith is working on this seriously. I think it is the most valuable product the LangChain team currently offers.
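
What the minimum viable Agent APM has to capture is hierarchy plus cost: every model call recorded as a span with a parent, so a job’s tree of dozens of calls can be reconstructed rather than eyeballed in raw logs. A rough sketch; all names and token figures are invented.

```python
# Minimal hierarchical span recorder for Agent runs: each span knows its
# parent and token counts, enough to rebuild the call tree after the fact.
import contextvars
import time
import uuid
from dataclasses import dataclass, field

_current = contextvars.ContextVar("span", default=None)

@dataclass
class Span:
    name: str
    parent: str | None
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    start: float = field(default_factory=time.time)
    tokens_in: int = 0
    tokens_out: int = 0

SPANS: list[Span] = []

class traced:
    def __init__(self, name: str):
        parent = _current.get()
        self.span = Span(name, parent.id if parent else None)
    def __enter__(self):
        self.token = _current.set(self.span)
        return self.span
    def __exit__(self, *exc):
        _current.reset(self.token)
        SPANS.append(self.span)

with traced("job") as job:
    with traced("plan") as plan:
        plan.tokens_in, plan.tokens_out = 18_000, 900
    with traced("execute-step-1") as step:
        step.tokens_in, step.tokens_out = 22_000, 1_400

for s in SPANS:
    print(s.name, "parent:", s.parent, "tokens:", s.tokens_in, "->", s.tokens_out)
```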

Application Trends

Looking ahead 10 years, shouldn’t the biggest market brought about by the large model revolution appear at the application layer?

Speculation 10: All Applications Become Agents.

The most fundamental value of the intelligence produced by large models is its partial substitution for human intelligence. If applications cannot solve problems the way humans do, their value will be extremely limited. Over the past year, there has been plenty of excitement inside the industry but hardly any awareness outside it. I therefore boldly speculate that a great many Agents will emerge in both the “interesting” and the “useful” directions, letting the general public actually feel the value of AI.

Speculation 11: The Core Barrier for Agents Comes from Data.

If there is any secret sauce behind an application-layer barrier, it is data: proprietary industry data, industry know-how, SOPs, synthetic data, and so on. Whether through RAG or FT, it is data that makes the same model perform differently, as the sketch below illustrates. Other engineering differences are currently unclear and are likely to be leveled by industry progress; how long that leveling will take, and what new barriers may form along the way, remains unknown.
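
Here is a minimal RAG sketch of that point: the embedding and chat calls are OpenAI’s real APIs, but the two “documents” and the in-memory search stand in for a private corpus that a competitor cannot copy.

```python
# Minimal RAG: embed a private corpus, retrieve the closest document,
# and inject it into the Prompt. The data, not the plumbing, is the moat.
from openai import OpenAI

client = OpenAI()
DOCS = ["Our refund SOP: ...", "Our industry pricing know-how: ..."]  # stand-ins

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def answer(question: str) -> str:
    doc_vecs, q_vec = embed(DOCS), embed([question])[0]
    # OpenAI embeddings are unit-normalized, so a dot product is cosine similarity.
    scores = [sum(a * b for a, b in zip(v, q_vec)) for v in doc_vecs]
    context = DOCS[scores.index(max(scores))]
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "system", "content": f"Answer using this context: {context}"},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("What is our refund SOP?"))
```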

Speculation 12: General Consumer-Facing Agents Have No Entrepreneurial Opportunity.

Any application that overlaps with ChatGPT’s scenarios, surface-level applications included, has no room for survival; to say nothing of companies like Apple, which control the gateways to the physical world and have yet to even enter the field. The biggest trap for these kinds of applications is that they may see growth in the early stages, but in reality they are just finding Product-Market Fit (PMF) on behalf of larger companies. This has happened many times in history, and the lesson needs no further elaboration. Finally, to show that the analysis above comes from practical experience, here is a demonstration video of Babel Agent building a question-and-answer application with a search function, similar to Perplexity:

We believe AI Developers can take over complex, long-cycle tasks from humans. As mentioned at the beginning, I do not believe in AGI, but I do believe in ACI, Artificial Conditional Intelligence. Agents are the embodiment of ACI, and time will tell.