AI companies like OpenAI are trying to overcome unexpected delays and challenges in their pursuit of ever-larger language models. Part of their approach is to develop training techniques that give algorithms more human-like ways to "think."
Many AI researchers, experts, and investors believe these techniques could reshape the AI arms race. They could also change demand for the resources AI companies consume in seemingly limitless quantities, such as chips and electricity.
AI Labs Facing Problems
OpenAI declined to comment for this story. After the ChatGPT chatbot went viral two years ago, tech companies publicly maintained that "scaling up" current models by adding more data and computing power would consistently produce better AI models.
However, some of the most prominent AI researchers are now speaking out about the limits of this "bigger is better" philosophy.
Researchers at major AI labs have run into delays and disappointing results in the race to release a large language model that outperforms OpenAI's GPT-4.
Ilya Sutskever's View
According to Ilya Sutskever, co-founder of OpenAI and of Safe Superintelligence (SSI), the gains from scaling up pre-training have plateaued. Pre-training is the phase in which an AI model learns language patterns and structures from a vast amount of unlabeled data.
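To make "learning from unlabeled data" concrete, here is a minimal sketch of how raw text can supply its own training targets: each word becomes the label for the words that precede it. The function name `next_token_pairs` and the whitespace tokenizer are illustrative assumptions; real pre-training pipelines use subword tokenizers and train a neural network on enormous numbers of such examples.

```python
from typing import List, Tuple


def next_token_pairs(text: str, context_size: int) -> List[Tuple[List[str], str]]:
    """Turn raw, unlabeled text into (context, target) training pairs.

    Hypothetical helper for illustration: the "label" for each pair is simply
    the next word in the text itself, so no human annotation is required.
    """
    tokens = text.split()  # whitespace split stands in for a real tokenizer
    pairs: List[Tuple[List[str], str]] = []
    for i in range(1, len(tokens)):
        # Use up to `context_size` preceding tokens as the input context.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


if __name__ == "__main__":
    for context, target in next_token_pairs("the cat sat on the mat", 2):
        print(context, "->", target)
```

A model trained to predict each target from its context picks up patterns in the language without anyone labeling the data.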
Sutskever stated:
"The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everybody is looking for the next thing." He added that scaling the right thing matters more now than ever.
Sutskever said SSI is working on an alternative approach to scaling up pre-training, but he declined to share details on how his team is addressing the problem.
Problems and a Possible Solution
So-called "training runs" for large models run hundreds of chips simultaneously and can cost tens of millions of dollars. Because the system is so complex, hardware-induced failures are likely. Researchers also may not know how a model will perform until the run ends, which can take months.
Another issue is that large language models require vast amounts of data, and AI models have already consumed most of the easily accessible data in the world.
Researchers are exploring "test-time compute" as a way around these problems. This technique improves existing AI models while they are being used, during the so-called "inference" phase. It lets models dedicate more processing power to difficult tasks that call for human-like reasoning and decision-making, such as math or coding problems and other complex operations. For example, rather than committing immediately to a single answer, a model can generate and evaluate several candidate answers before choosing the best one.
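One simple way to spend extra compute at inference time is best-of-N sampling: draw several candidate answers and keep the one a verifier scores highest. The sketch below illustrates that general idea only; it is not a description of how any particular lab's model works. The toy proposer (`noisy_guess`), verifier, and task (find x with x * x == 1764) are hypothetical stand-ins for a model and a checking procedure.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    propose: Callable[[random.Random], int],
    score: Callable[[int], float],
    n: int,
    seed: int = 0,
) -> Tuple[int, float]:
    """Sample n candidate answers and keep the one the verifier scores highest.

    Raising n spends more compute at inference time without retraining the model.
    """
    rng = random.Random(seed)
    candidates: List[int] = [propose(rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, score(best)


# Hypothetical toy task: find a positive integer x with x * x == 1764.
def noisy_guess(rng: random.Random) -> int:
    # Stand-in for a model that proposes plausible but often wrong answers.
    return rng.randint(30, 55)


def verifier(x: int) -> float:
    # Checking a candidate is cheap even when finding it is not; 0 is a perfect score.
    return -abs(x * x - 1764)


if __name__ == "__main__":
    for n in (1, 4, 64):
        answer, s = best_of_n(noisy_guess, verifier, n)
        print(f"n={n:>2}  answer={answer}  score={s}")
```

With a fixed seed, a larger n always evaluates a superset of the smaller run's candidates, so the best score can only stay the same or improve as more inference compute is spent.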
The o1 Model
OpenAI has applied this technique in its recently released model, "o1," previously known as Q*. The o1 model can "think" through problems in multiple steps, much as humans do. It also uses data and feedback curated from PhDs and industry experts.
Meanwhile, according to five people familiar with the efforts, researchers at other leading AI labs, including Google DeepMind, xAI, and Anthropic, have been developing their own versions of the technique.