OpenAI admits it's 'impossible' to create ChatGPT-like tools without using copyright material, amid court battles over intellectual property theft allegations

 ChatGPT and Microsoft Logo.

What you need to know

  • OpenAI has found itself in the corridors of justice after being slapped with multiple lawsuits over copyright infringement.

  • The company admits that it's impossible to create AI chatbots without using copyrighted material from the internet.

  • In its submission, it highlighted that copyright law doesn't forbid training.


While the OpenAI fiasco that led its board of directors to strip Sam Altman of his position as CEO is out of the way, the company can't catch a break as more trouble is seemingly brewing. As 2023 came to an end, The New York Times publicly announced its plans to sue Microsoft and OpenAI over AI unfairly using its copyrighted material, which it says has hurt the outlet financially.

Recently joining the fray, two non-fiction authors filed a class-action lawsuit against Microsoft and OpenAI for intellectual property theft, seeking $150,000 in damages. For those unaware, AI-powered chatbots like OpenAI's ChatGPT or Microsoft's Copilot (formerly Bing Chat) rely heavily on already existing information and resources from the internet (predominantly from websites) for training purposes.

The issue here is that the AI chatbots use the information to curate specific and detailed responses to queries, with "subtle" attribution to the source. What's more, no compensation is provided to content creators for using their work to train these models.

OpenAI recently admitted that it's literally "impossible" to create tools like ChatGPT without copyrighted material from the internet while submitting its defense to the House of Lords Communications and Digital Select Committee. For an AI chatbot to provide users with accurate information, it has to refer to vast resources already existing on the internet. However, the twist is that virtually everything on the internet right now is copyrighted.

Because copyright today covers virtually every sort of human expression – including blogposts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.

OpenAI indicated that limiting its training data set to copyright-free material would create AI chatbots that cannot meet the average user's minimum requirements. Per the company's submission and defense strategy, it's apparent that "fair use" of copyrighted content is its entire lifeline.

Fair use of copyrighted resources creates a gray area, ultimately presenting a scenario where chatbots can obtain and use copyrighted information without necessarily seeking permission from the owner first. "Legally, copyright law does not forbid training," OpenAI added.

There's no AI without copyrighted content

OpenAI and ChatGPT

OpenAI, one of the most sought-after companies when it comes to generative AI, has openly admitted that it's next to impossible to create AI-powered chatbots like ChatGPT without using copyrighted material to train the models. This is despite having unlimited access to Microsoft resources, on top of its initial multibillion-dollar investment in the technology.

In the past few months, ChatGPT has suffered several setbacks, including reports that it's getting dumber and a decline in its user base. This is amid speculation that OpenAI is running on fumes and on the verge of bankruptcy. Granted, it's quite a costly affair to run a chatbot daily. In figures, it's reportedly to the tune of $700,000 per day and one water bottle per query for cooling. A report highlighted that by 2027, generative AI could consume enough energy to power a small country for a year.

While the matter is still in court, it'll be interesting to see how things pan out. President Biden issued an Executive Order addressing safety and privacy concerns revolving around AI, but guardrails for the technology remain a major concern among most users.

AI chatbots have been spotted having lucid hallucinations, erroneously recommending a food bank as a tourist attraction, and even asking readers to take part in a poll to determine the cause of a woman's unfortunate passing. If this happened while the chatbots had access to copyrighted material, it raises a lot of concern about how much damage the technology would cause when restricted to copyright-free data. In the meantime, Google's Bard could potentially rise up the ranks, having unlimited access to the entire internet.

What are your thoughts on AI chatbots using copyrighted resources without compensation and sweeping the issue under the rug as "fair use"? Let us know in the comments.