OpenAI o1 & Recent Trend of LLMs
OpenAI’s o1 announcement might just be the event that has triggered the most dopamine for me recently. Over the past two days, I stayed up late into the night researching it and playing around with it myself. It might sound somewhat hyperbolic, but I can’t help but feel that I am witnessing one of the most pivotal moments in human history. With both excitement and fear, I’ll take a moment to organize my thoughts about o1.
1. The Significance of o1
I believe the release of o1 is the most impactful development since OpenAI released its very first chatbot, ChatGPT (then powered by GPT-3.5). GPT-4 improved performance by scaling up parameters, and 4o brought multimodality. Both are certainly important developments, but the direction o1 presents—exploring “reasoning ability”—is more fundamental and, in my view, far more crucial for approaching AGI than either.
Before explaining why o1 is fundamentally important, I’ll first explain why GPT-4 and 4o haven’t addressed the core problem.
Let’s first look at GPT-4. GPT-4’s direction is vertical scaling by increasing the number of parameters. To put it more simply, it’s the mindset of “if we throw more data and a bigger model at it, something good will come out!” And indeed, they achieved pretty good performance improvements with this approach. However, we are now hitting bottlenecks with this method.
Firstly, there’s the issue of the quantity and quality of data. The abilities of large models essentially come from their training data: the more the training data reflects a specific capability, the stronger that capability becomes. Language skills are naturally the strongest, since pre-training data is saturated with vocabulary and grammar. However, much of the textual data on the internet has already been exhausted, and training increasingly relies on synthetic data, which inevitably suffers from a decline in quality.
Secondly, beyond quantity and quality, there’s the issue of “saturation of world knowledge.” More data does contain more world knowledge, and scaling laws reflect how much world knowledge is embedded in the data. But as a model consumes more data, the proportion of genuinely new knowledge in each additional batch shrinks, because much of it has already appeared in earlier data. So as data size increases, the growth of world knowledge slows down.
Thirdly, there’s the practical issue. I won’t dive deep into OpenAI’s balance sheet, but the cost is significant. This is why Altman talks about multi-trillion-dollar investments. Building GPT-4 required astronomical amounts of money, and developing GPT-5 or 6 would incur costs that would be hard to sustain.
Now, let’s look at the direction of 4o, which is about integrating different modalities. However, this doesn’t seem to contribute much to the model’s intelligence: 4o still isn’t smart enough to handle complex tasks, and simply adding new modality data, like images or videos, won’t dramatically change that. While it enables more diverse multimodal applications, it mainly supplements the model’s “perception” of the external world rather than fundamentally improving its “cognitive” abilities.
To enhance a large model’s “cognitive” abilities, we ultimately need to improve its cognitive understanding of “language.” Language is the fundamental tool for understanding and thinking about the world (BTW, this is my personal philosophy about LLMs and language, and lots of people disagree with it). o1 introduces a new direction for moving toward AGI by improving reasoning abilities.
2. The Core Idea of o1
The methodology that OpenAI o1 presents is actually quite simple: it’s Chain of Thought (CoT). CoT is a divide-and-conquer approach that breaks a complex problem into several subtasks, helping large models tackle complex logical problems. This is already a well-researched area. However, o1 differs from traditional CoT in two key ways:
First, o1 automates CoT. Before o1, people mainly wrote CoT prompts by hand. With o1, the chain is drawn automatically using Monte Carlo Tree Search, the technique made famous by AlphaGo, guiding the model along the chain most likely to arrive at the correct answer (a toy sketch of the idea follows below). This eliminates the need for humans to craft complex prompts. Even the workflow itself becomes automated, which is quite a chilling thought when you really think about it.
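To make that concrete, here is a toy sketch of MCTS over chains of reasoning steps. This is only an illustration of the general technique, not OpenAI’s actual pipeline (which hasn’t been disclosed); `propose_steps` (candidate next thoughts) and `score` (a value estimate for a partial chain) are hypothetical stand-ins for model calls:

```python
import math
import random

def propose_steps(chain):
    # Hypothetical: an LLM would sample candidate next reasoning steps here.
    return [f"step-{len(chain)}-{i}" for i in range(3)]

def score(chain):
    # Hypothetical: a learned verifier/value model would rate the chain here.
    return random.random()

class Node:
    def __init__(self, chain, parent=None):
        self.chain, self.parent = chain, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # UCT: exploit high average value, explore rarely visited nodes.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def search(iterations=200, max_depth=5):
    root = Node([])
    for _ in range(iterations):
        node = root
        # Selection: walk down along the highest-UCB child.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: propose candidate next reasoning steps.
        if len(node.chain) < max_depth:
            node.children = [Node(node.chain + [s], node) for s in propose_steps(node.chain)]
            node = random.choice(node.children)
        # Evaluation + backpropagation.
        reward = score(node.chain)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The most-visited chain off the root is the one the search trusts most.
    return max(root.children, key=lambda n: n.visits).chain

print(search())
```

The point is that the search, not a human prompt-writer, decides which chain of thought to follow; a real system would replace `score` with a learned reward or value model.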
Second, it doesn’t rely on a multi-agent approach. While agentic workflows are conceptually hot in academia and industry, they haven’t been very practical. The issue is joint probability. Simple math: even if we break a complex task into 10 steps and each step has a 95% success rate, the overall success probability is still only around 60%. o1, however, (probably) performs inference using a single model through self-play (something like Self-RAG). With this direction, agents may even see a resurgence.
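The arithmetic behind that figure is worth seeing, because it shows how quickly reliability decays as pipelines get longer:

```python
# Success probabilities multiply along a pipeline: 10 steps at 95% each
# leave only about a 60% chance that the whole chain succeeds.
p_step, n_steps = 0.95, 10
print(f"{p_step ** n_steps:.3f}")  # -> 0.599
```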
3. What It Means for Machines to Reason
Previous LLMs generated responses autoregressively, predicting the next token from the tokens that came before it. For humans, this would be like typing without a backspace key. As a result, there were issues like hallucinations and the inability to revise.
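A minimal sketch of what that looks like, assuming a hypothetical `next_token_logits` function standing in for a real model’s forward pass:

```python
# Greedy autoregressive decoding: each token is committed the moment it is
# generated -- there is no "backspace" to revise earlier output.

def next_token_logits(tokens):
    raise NotImplementedError("hypothetical stand-in for a model forward pass")

def generate(prompt_tokens, max_new_tokens, eos_id):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        tokens.append(next_id)  # appended unconditionally: no going back
        if next_id == eos_id:
            break
    return tokens
```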
But o1 is different. It recognizes the problem, dissects it, divides it into several subtasks, verifies hypotheses, and then deduces the answer. This entire process closely resembles human thought. The o1-preview model currently available limits inference time to about 30 seconds, but if given more time, a larger context window, and even long- and short-term memory (like MemGPT), it might solve problems that humanity has never been able to crack. (Oh, but I have some other thoughts on whether recursive self-improvement is really possible. Is so-called “superintelligence,” a vertical leap, achievable with the current LLM paradigm, or would the horizontal expansion of human intelligence be its limit? I’ll talk about that in later posts.)
Additionally, the current trend in academia is to allocate more computing resources toward inference rather than training. For instance, this is the case with o1, and just this week alone, I came across two papers on this topic (link 1 and link 2). It has even been demonstrated that focusing on inference with smaller models can yield sufficiently good performance (though I might be a bit biased since this is the field I’m researching).
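The simplest version of this trade is self-consistency: sample several reasoning chains at temperature > 0 and majority-vote the final answers, spending inference compute to buy accuracy. A sketch, with `sample_answer` as a hypothetical stochastic model call:

```python
from collections import Counter

def sample_answer(question):
    # Hypothetical: one sampled model completion, reduced to its final answer.
    raise NotImplementedError("stand-in for a temperature > 0 model call")

def self_consistency(question, n_samples=16):
    # More samples = more inference compute = (usually) more reliable answers.
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Even this naive voting scheme illustrates the trade: the same model becomes more reliable the more you let it think, which is exactly why spending compute at inference time is attractive for smaller models.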
4. The Future of OpenAI
Discussing the fate of a single company might seem trivial in the face of such grand discussions, but as a self-proclaimed pseudo-economist, I feel obligated to leave this section here.
OpenAI has always played a pioneering role in the industry. There are many examples where it was the first to prove that a certain direction was possible—ChatGPT, DALL-E, GPT-4, Sora, GPT-4o, and now o1. Each time, followers quickly caught up, and sometimes even surpassed OpenAI. A prime example is Sora. If OpenAI hadn’t unveiled Sora to fend off competition, people might not have realized just how far that direction could go. Once the potential was recognized, others could surpass OpenAI by concentrating resources on that specific direction. Today, various video-generation models at home and abroad may already be producing better results than Sora, which itself is still under development. In my personal opinion, OpenAI is spreading itself too thin rather than concentrating its resources, and the company may be accumulating fatigue as a result.
o1 has once again shown people what is possible, and other competitors will catch up quickly. Although no concrete methodology has been shared yet, the overall direction is clear. Within a few months, several companies will likely understand and replicate the technology, and some may even surpass OpenAI (though Altman teases that something even bigger is coming in a few weeks). Moreover, this direction doesn’t seem very resource-intensive: it likely hinges more on algorithms and data than on raw compute, and the data scale doesn’t appear very large, so it shouldn’t be terribly expensive to pursue. Hence, I cautiously speculate that OpenAI may not build a significant technological moat this time either.
