Suchir Balaji, a 26-year-old Indian American researcher who spent four years at OpenAI, was found dead in his San Francisco apartment. Balaji, who had recently left the company and raised concerns about how it handles copyrighted data, left behind a final social media post that offers a rare insider’s perspective on the ethical challenges facing generative AI.
Before his death, Balaji had become increasingly vocal about the legal and ethical implications of how artificial intelligence companies handle copyrighted data. His last post on X (formerly Twitter) critiques the reliance on fair use as a defense for AI training, drawing on his firsthand knowledge of how the technology works.
“I recently participated in a NYT story about fair use and generative AI, and why I’m skeptical ‘fair use’ would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://suchir.net/fair_use.html) about the nitty-gritty details of fair use and why I believe this. To give some context: I was at OpenAI for nearly 4 years and worked on ChatGPT for the last 1.5 of them.
I initially didn’t know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they’re trained on.
Obviously, I’m not a lawyer, but I still feel like it’s important for even non-lawyers to understand the law — both the letter of it, and also why it’s actually there in the first place. That being said, I don’t want this to read as a critique of ChatGPT or OpenAI per se, because fair use and generative AI is a much broader issue than any one product or company.
I highly encourage ML researchers to learn more about copyright — it’s a really important topic, and precedent that’s often cited like Google Books isn’t actually as supportive as it might seem. Feel free to get in touch if you’d like to chat about fair use, ML, or copyright — I think it’s a very interesting intersection. My email’s on my personal website.”
Addressing potential misunderstandings about his involvement with The New York Times, Balaji explicitly clarified: “The NYT didn’t reach out to me for this article; I reached out to them because I thought I had an interesting perspective, as someone who’s been working on these systems since before the current generative AI bubble. None of this is related to their lawsuit with OpenAI.”