Over the past few years, I’ve advised clients ranging from startups to Fortune 50 companies on how to use GenAI responsibly and in compliance with the law. One pattern that keeps emerging is a dual challenge: creators, whether businesses or individuals, often want to protect the outputs they generate with AI tools, whether for commercial purposes or myriad other reasons, while also ensuring their proprietary content isn’t used to train others’ models without their explicit consent.
But if more creators opt out of contributing their content to training datasets, how can GenAI models continue to improve without access to the high-quality data that makes them useful in the first place? Yes, GenAI developers can try to license every piece of copyrighted material used in training. But given the vast volume and diversity of data required to effectively train and fine-tune large language models (LLMs) over time, is that truly realistic or operationally feasible?
This is what I call the creator’s dilemma: under current US law and regulatory guidance, you generally can’t copyright GenAI-generated content, yet others may be able to use your copyrighted works to train their models. However, the law here remains unsettled and murky at best. Here’s what’s behind the conflict, and how companies can navigate it.