Death of whistleblower opens can of worms over ChatGPT business model
Just about two months before his death by suicide, Suchir Balaji was featured in a New York Times article where he accused OpenAI of breaking the law.
A 26-year-old OpenAI whistleblower was found dead in his apartment in San Francisco on November 26, barely a few weeks after publicly accusing his then-employer OpenAI of unethical business practices.
Though police said they found no signs of foul play in the death of Suchir Balaji and ruled it a case of suicide, the incident has revived his allegations against OpenAI, the company which developed ChatGPT, the world’s most popular AI chatbot.
Indian-origin Balaji worked at OpenAI for four years, collecting and organising training data for ChatGPT before resigning in August over concerns that the company was breaking the law.
I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://t.co/xhiVyCk2Vk) about the nitty-gritty details of fair use and why I…
— Suchir Balaji (@suchirbalaji) October 23, 2024
In October, he publicly accused the AI giant of copyright infringement in an article published by the New York Times, where he said the company used copyrighted data to train its AI model.
“If you believe what I believe, you have to just leave the company,” he told the media outlet at the time.
OpenAI, Microsoft and other companies developing AI chatbots have consistently rejected claims of this nature, asserting that their use of internet data to train their artificial intelligence systems is permitted under the “fair use” doctrine.
Under the US “fair use” doctrine, the use of copyrighted material is assessed against certain criteria, including whether the use is transformative and what effect it has on the potential market value of the original work.
OpenAI argues that it passes those tests because its technology transforms the copyrighted works substantially and does not compete in the markets for the original works.
Balaji, however, disagreed, stating that “the outputs aren’t exact copies of the inputs, but they are also not fundamentally novel.”
About two months before his death, Balaji published an analysis of ChatGPT’s use of its training data on his personal website to support this claim.
“While generative models rarely produce outputs that are substantially similar to any of their training inputs, the process of training a generative model involves making copies of copyrighted data,” he wrote, explaining that this is what could legitimise copyright infringement claims.
Balaji was not the only one to accuse OpenAI of using copyrighted data to train its chatbots.
Numerous news publishers from the United States and Canada, including The New York Times, have initiated legal action against the company, alleging that it utilised millions of articles from these outlets to develop chatbots that now rival traditional news organisations as sources of credible information.
His death came about a month after a court filing named him as a figure whose professional records OpenAI would examine in response to a lawsuit filed by The New York Times and several authors.
If the courts rule in favour of The New York Times or other news organisations and authors involved in the lawsuit, the decision could impose significant financial burdens on OpenAI and further restrict the limited data available for training models.
While the lawsuit does not specify an exact amount, it claims that OpenAI and Microsoft are liable for damages amounting to "billions of dollars".