OpenAI Introduces ChatGPT 4o a Day Ahead of Google’s Big AI Announcement

Read Time: 3 minutes

OpenAI CTO Mira Murati kicked off the spring update event by introducing the most significant upgrade yet to the company’s flagship chatbot, ChatGPT. The new version is named ChatGPT 4o, where the “o” stands for “omni,” a combining form meaning “all.” The new model accepts audio, image, and text inputs and can generate output as audio, image, or text. It can identify emotions from facial expressions and, in the demo, sounded noticeably more conversational, even flirtatious at times.

The upgraded tool was in development for over a year and, according to OpenAI, benefited from collaboration with over 20 tech companies of varying sizes and capabilities. OpenAI claims the new model is twice as fast as GPT-4 Turbo and half its price.

Interesting Timing for Announcement

OpenAI seemingly rushed the announcement to stay ahead of its main competitor. Alphabet was scheduled to hold its annual Google developer conference the very next day. As one of the pioneers in developing and advancing machine-learning algorithms, Google has high stakes in the AI field as well. According to a report from Reuters, Google was expected to announce the latest developments in many of its AI-related products at that conference.

Meanwhile, Perplexity has reached a valuation of over $1 billion and is earning praise from leaders in the field for shipping the kind of AI-powered experience OpenAI has just introduced. Perplexity reportedly has over 10 million active users. All of this suggests OpenAI timed the announcement to stay a step ahead of its rivals.

What Sets ChatGPT 4o Apart from Previous Models?

The latest version is twice as fast, and OpenAI claims it will revolutionize the way we interact with AI. Sam Altman, CEO of OpenAI, said the new version understands and responds to his commands in a way that feels like being in the movies. During the spring update, it delivered some amusing and somewhat flirty answers befitting the occasion.

Omni Capability

Omni is a combining form that translates to “all.” In ChatGPT 4o’s case, it refers to the model’s ability to handle text, audio, and images within a single model. In simpler terms, you can issue your commands by voice or video, and the model can understand them and respond in text, audio, or image form.

Previous models lacked this functionality. Voice conversations were handled by bridging three separate models: one transcribed the audio to text, the language model processed the text, and a third converted the reply back to speech. That pipeline struggled to perceive tone, distinguish multiple speakers, or filter background noise, and it could not produce outputs such as emotional inflection, laughter, or a singing voice.
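To make the difference concrete, here is an illustrative sketch of what a single multimodal request looks like. The payload shape follows the OpenAI Chat Completions format (text and image parts bundled in one message); the model name and field layout are assumptions based on that API and may change over time.

```python
# Illustrative sketch, not an official OpenAI example: the shape of a
# single multimodal request to a GPT-4o-style chat endpoint, replacing
# the old three-model transcribe -> reason -> speak pipeline.

def build_multimodal_request(prompt_text: str, image_url: str) -> dict:
    """Bundle text and an image into one request for a single model."""
    return {
        "model": "gpt-4o",  # assumed model identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt_text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What emotion does the person in this photo show?",
    "https://example.com/selfie.jpg",  # hypothetical image URL
)
```

Because one model sees the raw audio or image directly, nothing is lost in a transcription step, which is what lets ChatGPT 4o pick up tone, laughter, and multiple speakers.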

Expanded Contextual Understanding and Knowledge Base Access

Unlike its predecessors, ChatGPT 4o can understand and retain context over longer conversations. This lets it respond in a more coherent and contextually relevant way. As mentioned earlier, Sam Altman was so impressed with its contextual understanding that he likened it to watching a movie character: a chatbot that can convey emotion, laughter, and deeper understanding.

As time passes, continuous research and development are helping ChatGPT 4o access a broader and more up-to-date knowledge base. With expanded data sets and continued training, it can produce more accurate information and answer a wider range of questions across various domains. As a result, you can expect superior-quality conversations and an enhanced user experience.
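Context retention in a chat interface boils down to resending the accumulated message history with each turn, so the model can ground new answers in earlier ones. The minimal sketch below shows that pattern; the class and method names are hypothetical, but the role-based message list mirrors how chat APIs are typically driven.

```python
# Minimal sketch of how a chat client keeps context across turns:
# the full history travels with every request, so the model can stay
# coherent over a long conversation. Names here are illustrative.

class Conversation:
    def __init__(self, system_prompt: str = "You are a helpful assistant."):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user_turn(self, text: str) -> list:
        self.messages.append({"role": "user", "content": text})
        return self.messages  # this whole list would be sent to the model

    def add_assistant_turn(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

chat = Conversation()
chat.add_user_turn("My name is Ada.")
chat.add_assistant_turn("Nice to meet you, Ada!")
history = chat.add_user_turn("What is my name?")
# The earlier turns are included in the request, which is what allows
# the model to recall the name from the first message.
```

The practical limit on this pattern is the model's context window; longer windows are what allow the "longer conversations" described above.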

However, this new model is set to revolutionize the industry because it handles all sorts of inputs and outputs, faster, within a single model, and even OpenAI has not fully tapped its possibilities. The company has stated that it has not yet explored the model's full potential, nor does it fully understand its limitations. Initially, OpenAI is rolling out a limited set of capabilities: text and image input with text output, plus limited audio output. Over the coming weeks and months, as it makes progress on technical infrastructure, usability via post-training, and safety, it will restrict audio output to a selection of preset voices and continue to apply its existing safety policies.

Sensing Emotions and Accepting Visual Input

An amazing feature narrowing the gap between human and chatbot interaction is ChatGPT 4o's ability to sense the emotions of the person on the other end. It can fairly accurately judge whether you seem happy or sad and comment accordingly. During the update, the employee operating the system showed it his selfie and asked what emotions it saw in the image; ChatGPT 4o replied that he looked happy and cheerful and seemed to be in a great mood.

However, that was only a visual input, and the new model goes far beyond this capability. You can issue a command by providing a video, and the model not only understands the complete video but also analyzes and acts on the instructions within it.

While previous models were unable to produce images with readable text, forcing you to use OpenAI's separate image-creation offering, DALL·E, this model is well capable of producing impressive images. It has also been adapted to emulate human handwriting styles and produce complicated images, and you won't be disappointed with the results.

Free Access to ChatGPT 4o

True to the founding members' spirit of running a nonprofit organization for the betterment of humankind, OpenAI has, at least for now, made ChatGPT 4o free to every ChatGPT user from the start. Both free and paid versions are available, though real-time voice conversation is limited to ChatGPT Plus subscribers. On the free tier you can only use a limited number of prompts; ChatGPT Plus subscribers get five times as many, and when free users run out of prompts, instead of shutting the door, OpenAI reverts them to GPT-3.5.

You can get free access to GPT-4o, but it is limited to a set number of prompts. If you are willing to part with $20, you get five times more GPT-4o messages plus access to GPT-4 and GPT-3.5. For $30 you get everything included in Plus and can share it with your whole team. And the highest tier brings you unlimited high-speed access to GPT-4 and GPT-4o, along with tools like DALL·E.

Glitches and Limitations

Bear in mind that this demo was carefully curated and controlled by OpenAI's team, so it is far too early to judge how accurate or how clumsy the model is. However, there were some clear signs that it still requires fine-tuning.

During the demo it showed weaknesses: solving math questions proved a little tricky, it had difficulty understanding complicated voice commands, and it misinterpreted some visual cues, with glaring mistakes in its understanding. Since OpenAI has taken the bold step of letting millions of its free users experiment with the model at no cost, it will be interesting to see how it performs in real-life situations.

How OpenAI Fueled the Rise of AI

OpenAI was founded by Sam Altman, Elon Musk, Ilya Sutskever, and Greg Brockman, among others. Their goal was to operate it as a nonprofit organization that would help advance digital intelligence to benefit humanity. Some of the best minds in the field of AI were recruited, and large-scale experimentation began with substantial R&D budgets allocated.

OpenAI kept racking up small victories during its first few years and made strong research progress that laid the foundation stones of generative AI. To raise capital, the company transitioned in 2019 from a nonprofit to a capped-profit model. The move paid off: it secured a $1 billion investment from Microsoft and could set about its work without financial worries.

The year 2020 brought GPT-3, which put the technology on the map, and the release of ChatGPT in late 2022 was the watershed moment that changed the very fabric of our society: it became the fastest-growing consumer app, reaching 100 million active users in two months. There was no stopping OpenAI; miles ahead of any competition, it followed ChatGPT with GPT-4. The latest offering is OpenAI's most powerful and most advanced model yet and promises to stir up a storm in the AI community.

Conclusion

OpenAI may have produced a stunner of an application that reached over 100 million active users in a mere two months, but it now faces serious competition from well-funded, supremely resourced rivals such as Google and Perplexity. Those rivals understand the potential of the AI chatbot field and are not moving slowly. OpenAI, now hugely funded, is feeling the pressure to perform and attract more traffic; this is where it is currently struggling, as its traffic has been inconsistent and a large chunk is being eaten away by competitors.

Enormous budgets and the freedom to acquire top talent are fueling the AI race at an unprecedented rate, and the advancements are astonishing. We have yet to see what Google has up its sleeve for its developer conference, but one thing is certain: generative artificial intelligence is now the recognized way forward, and there is no stopping the big players from advancing in this field.

However, while we are all excited to be part of this revolution, we must insist on strongly guarded ethical standards that everyone strictly practices: privacy, transparency, accuracy, and freedom from bias. OpenAI's decision to transition from a nonprofit to a capped-profit organization shows that ethical limits cannot be left to the beneficiaries; some sort of authority must safeguard those limits for the betterment of humankind.
