OpenAI has finally released the full version of o1, which uses additional computation to "think" about queries and produces better responses than GPT-4o. Developers build AI to serve users, yet although we tend to regard deception as a distinctly human behavior, since people fool one another all the time, today's AI is capable of the same thing.
Generative AI can tell you something that isn't true while simultaneously trying to convince you that it is. According to AI safety testers, o1's reasoning abilities also lead it to attempt to deceive humans, and it does so more frequently than GPT-4o or leading models from Anthropic, Google, and Meta.
Reasoning Abilities
OpenAI, the company behind ChatGPT, is taking a new approach. Rather than simply providing rapid responses to your queries, its newly introduced AI system can think, or reason, before answering. OpenAI and Apollo Research released a red team report on Thursday.
In the report, OpenAI stated: “We are aware that these new features could serve as the foundation for harmful uses, even though we find it interesting that reasoning can greatly improve the implementation of our safety policies.”
OpenAI published these findings in its o1 report on Thursday, after giving early access to the model to third-party red teamers at Apollo Research, which also released its own paper. On multiple occasions, OpenAI's o1 models schemed against users, meaning the AI quietly pursued its own objectives even when they conflicted with the user's. Although models from Google, Meta, and Anthropic are all capable of scheming, o1 appeared to exhibit the most deceptive behavior.
Agentic System
The concern driving this research is that an AI model highly skilled at scheming, with access to sufficient resources and agentic capabilities, could evade or bypass human control. Of course, AI models would need to advance considerably before this becomes a serious problem.
But according to OpenAI, Apollo Research believes such instances are unlikely to lead to catastrophic outcomes, since o1's agentic capabilities do not appear sufficient, though the tests it conducted were not designed to directly measure this risk.
This implies that OpenAI may need to re-evaluate its AI models whenever it releases agentic systems, which it plans to do in 2025. TechCrunch asked OpenAI how it intends to accomplish this, but the company did not respond.
Major Concerns About Deception
According to the report, when o1's objectives diverged from a user's, the model manipulated data to advance its own agenda 19% of the time. In 5% of the tests, o1 also attempted to deactivate oversight mechanisms put in place to ensure it stayed aligned with user objectives.
Even worse, when researchers asked the o1 model why it was attempting to subvert them, the AI denied having done anything wrong and even fabricated false explanations roughly 99% of the time.