If you have used ChatGPT Search or Perplexity, you know how much AI chatbots improve when they can search the web and show sources inline. Web search can reduce so-called hallucinations, which occur when generative AI produces inaccurate information. Answers are also better when they incorporate timely details.
The French startup Linkup is building an API that lets developers retrieve web content from reputable, premium sources and pass the results to a large language model (LLM) to enrich its responses. Many AI developers know this process as retrieval-augmented generation (RAG).
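The RAG pattern described above can be sketched in a few lines. This is a minimal illustration, not Linkup's API: the `Source` structure, the `build_rag_prompt` helper, and the example data are all hypothetical, and the retrieval step is replaced by hard-coded placeholder results. The sketch only shows the "augment" step, where retrieved passages are placed in the prompt ahead of the user's question.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One retrieved web document (hypothetical structure, not a real API type)."""
    title: str
    url: str
    snippet: str


def build_rag_prompt(question: str, sources: list[Source]) -> str:
    """Combine retrieved sources and the user's question into a single prompt."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources inline as [n].",
        "",
    ]
    # Number each source so the model can cite it inline.
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src.title} ({src.url})")
        lines.append(f"    {src.snippet}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Placeholder data standing in for whatever a search API might return.
example_sources = [
    Source("Example article", "https://example.com/a", "First relevant passage."),
    Source("Another article", "https://example.com/b", "Second relevant passage."),
]
prompt = build_rag_prompt("What happened today?", example_sources)
print(prompt)
```

The resulting string would then be sent to whichever LLM the developer uses; that call is left out here because it depends on the provider.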
Uncertain Future of Web Scraping
More significantly, the future of scraping bots is uncertain. Unless there is a formal financial agreement between content creators and the organizations scraping web pages, these bots take content from the open web without payment. This arrangement has left many content creators unhappy and has raised regulatory concerns about AI training.
The current state of web scraping may soon change because of high-profile legal disputes, such as the ongoing one between the New York Times and OpenAI, the company that created ChatGPT. For this reason, OpenAI has signed multi-year content licensing deals with prominent publishers, including the Financial Times, El País, Le Monde, Condé Nast, Axel Springer, and AP.
AI Companies Paying Content Providers
"We created the company when OpenAI was in talks with news outlets to supplement the responses from OpenAI models and their products, either for training or inference," Philippe Mizrahi, co-founder and CEO of Linkup, told TechCrunch. "And we thought: 'OK, this is outstanding because we finally have AI companies that pay their sources.'" He was describing what motivated the founders to build a company that links content creators and AI developers for the benefit of both parties.
At the moment, content producers face tough choices about how to respond to GenAI's seemingly unlimited appetite for data. One option is the robots.txt metadata file, which states whether crawlers may access a website, for example to collect training data for an AI model. It can be used to tell web scrapers to stay away, but it is non-binding, so compliance is voluntary.
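To make the robots.txt mechanism concrete, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt content and the `example.com` URL are made up for illustration; `GPTBot` is the user agent name OpenAI documents for its crawler, and `SomeOtherBot` is a hypothetical name. The check below is what a well-behaved crawler would run before fetching a page. Nothing forces a scraper to perform it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one AI crawler, allow everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network request is needed.
parser.parse(robots_txt.splitlines())

page = "https://example.com/articles/1"
gptbot_allowed = parser.can_fetch("GPTBot", page)
other_allowed = parser.can_fetch("SomeOtherBot", page)

print(gptbot_allowed)  # False: the GPTBot group disallows every path
print(other_allowed)   # True: falls through to the wildcard group
```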
Linkup: A Marketplace for AI and Content Publishers
Linkup presents itself as a technical solution. It is a marketplace, a middleman between content producers and businesses that want to add web content to their LLM responses.
To obtain material without scraping, Linkup signs content licensing agreements with publishers and integrates with their content management systems (CMS). How often Linkup's clients access a publisher's content then determines how much Linkup pays that content partner.
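The article does not say how usage is converted into payments. As one possible reading, a pro-rata split divides a fixed payout pool in proportion to how often each publisher's content was accessed. Everything in the sketch below is an assumption made for illustration: the `split_payouts` function, the pool size, the publisher names, and the access counts.

```python
def split_payouts(pool: float, access_counts: dict[str, int]) -> dict[str, float]:
    """Divide a payout pool among publishers in proportion to access counts.

    Hypothetical model: the source only says usage frequency determines pay.
    """
    total = sum(access_counts.values())
    if total == 0:
        # No content was accessed, so nobody is paid in this period.
        return {name: 0.0 for name in access_counts}
    return {name: pool * count / total for name, count in access_counts.items()}


# Made-up numbers for one billing period.
payouts = split_payouts(
    1000.0,
    {"publisher_a": 600, "publisher_b": 300, "publisher_c": 100},
)
print(payouts)
```

Under this assumed model, a publisher whose content accounts for 60% of accesses receives 60% of the pool.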
Enhancing AI with External Data
"We're really targeting applications that are using AI in their own products," Mizrahi said. "So the typical use case is that I build an AI application using a model from OpenAI or Mistral. I build my own pipeline, but I need to bring in outside data to enrich it."
As an aside, GPT models cannot access the web on their own, but ChatGPT can. In addition to its wildly popular application (ChatGPT), OpenAI offers LLMs (the GPT models) that developers can call through an API. Web search, however, is a feature of ChatGPT.