Is the ARC-AGI Test Efficient for AGI Evaluation?

Read Time: 1 minute

A well-known test designed to track progress toward Artificial General Intelligence (AGI) is in its final stages of being solved. The highlight of the news is that the creators of the test are sceptical about its design, saying it may not accurately measure AI development.

What is the ARC-AGI Test?

In 2019, AI expert Francois Chollet introduced the ARC-AGI test (the "Abstraction and Reasoning Corpus for Artificial General Intelligence") for AI evaluation and development.

He described it as one of the only benchmarks that measures how efficiently an AI acquires skills on unfamiliar tasks, and thus its progress toward AGI.

The test checks whether an AI can learn new skills and solve problems beyond its training data. With ARC-AGI, Chollet wanted AI to handle tasks that humans solve with ease.

Test Results: Improved AI Problem Solving, Still Far From Goal

In June 2024, Chollet and Zapier co-founder Mike Knoop launched a $1 million competition to motivate open-source AI efforts to beat ARC-AGI. AI performed well, with the best entry solving 55.5% of the problems, 20% higher than the 2023 results. The results were still far from the goal.

Where Does AI Fall Short?

Chollet said AI is more focused on memorizing data than on actual reasoning. It learns patterns but cannot generalize to situations outside its training data.

AI “Brute-Forced” the Solutions

Across the 17,789 AI submissions to the ARC-AGI test, it was clear that AI didn't truly "understand" the problems. Chollet and Knoop used the term "brute-forced" to describe AI performance.

AI brute-forced solutions, using trial and error to arrive at answers rather than real reasoning. This suggests the ARC-AGI test may not accurately measure AI performance and problem-solving ability.

Why ARC-AGI May Not Be an Efficient Test

The ARC-AGI test consists of puzzle-like problems in which an AI generates answers by transforming a grid of coloured squares. The goal is to estimate how well AI can adapt to new, unseen challenges.
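For illustration only, here is a minimal sketch of what "brute-forcing" a grid-transformation puzzle could look like. This is not the actual ARC-AGI task format or any competition entry; the toy grids and candidate rules are invented. A solver like this tries a fixed menu of transformations until one fits the training examples, rather than reasoning about the puzzle:

```python
# Toy ARC-style puzzle: grids are lists of lists of colour codes.
# A brute-force "solver" tries candidate rules until one fits the
# training examples. All rules and data here are invented for demonstration.

def flip_horizontal(grid):
    return [row[::-1] for row in grid]

def flip_vertical(grid):
    return grid[::-1]

def rotate_180(grid):
    return [row[::-1] for row in grid[::-1]]

CANDIDATE_RULES = [flip_horizontal, flip_vertical, rotate_180]

def brute_force_solve(train_pairs, test_input):
    """Return the first candidate rule's output that fits every training pair."""
    for rule in CANDIDATE_RULES:
        if all(rule(inp) == out for inp, out in train_pairs):
            return rule(test_input)
    return None  # no candidate rule explains the examples

# One training pair; the hidden rule is a horizontal flip.
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(brute_force_solve(train, [[5, 6], [7, 8]]))  # [[6, 5], [8, 7]]
```

A search like this can score points without any understanding of the task, which is the heart of Chollet and Knoop's criticism.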

Criticism and Future of AGI

Knoop and Chollet faced criticism for calling ARC-AGI the only benchmark for testing AI progress in problem-solving. Knoop also acknowledged that the test has not changed since 2019 and does not capture the complexity of general intelligence.

AI has a long way to go in learning new skills outside of its training data. Chollet and Knoop plan to release an updated version of the ARC-AGI test soon. They aim to address the issues and continue advancing AI research.