The News Media Alliance (NMA) has issued a report claiming that artificial intelligence (AI) developers rely extensively on unlawfully scraped copyrighted content from news publications and journalists to train their AI models. In a white paper and an accompanying submission to the U.S. Copyright Office, both released on October 30, the NMA contends that the datasets used to train AI models draw more heavily on news publishers' content than on any other source. As a result, the group argues, AI systems essentially "copy" and use publishers' content, which amounts to copyright infringement.
According to the NMA, many generative AI developers scrape publisher content without authorization and use it both to train models and to generate content in real time. The group says this not only infringes publishers' copyrights but also sets news outlets and AI developers against each other. The NMA argues that publishers make the investments and bear the risks, while AI developers reap substantial benefits, including user engagement, data, brand building, and advertising revenue.
The NMA's submission to the Copyright Office states that publishers face reduced revenue, fewer job opportunities, and strained relationships with their audiences because of these practices. To address these issues, the NMA proposes that the Copyright Office declare the use of published content to monetize AI systems to be detrimental to publishers' interests. The group also calls for various licensing models and transparency measures to restrict the use of copyrighted materials.
The NMA further recommends that the Copyright Office take measures to eliminate protected content from third-party websites. At the same time, the NMA acknowledges the potential benefits of generative AI, noting that publications and journalists can use it for tasks such as proofreading, idea generation, and search engine optimization.
Over the past year, AI chatbots such as OpenAI's ChatGPT, Google's Bard, and Anthropic's Claude have seen growing use. However, the methods used to train these models have drawn criticism and legal challenges, including several copyright infringement claims. Notably, comedian Sarah Silverman sued OpenAI and Meta in July, alleging that they used her copyrighted material without permission to train their AI systems.
Both OpenAI and Google have faced class-action lawsuits accusing them of unlawfully acquiring user information for generative AI purposes. Google has said it will assume legal responsibility when users of its generative AI products on Google Cloud and Workspace face copyright infringement allegations, promising to handle the potential legal risks. However, this pledge does not cover Google's Bard chatbot.