Having already assured readers that a significant body of copyrighted books was used without permission to train large language models as part of the corpus researchers call Books3, the Atlantic is now ready to collect your rage clicks with a searchable database of those titles. “Since my article appeared, I’ve heard from several authors wanting to know if their work is in Books3. In almost all cases, the answer has been yes.” They tabulated 206 works by Nora Roberts; 133 from Danielle Steel; and only 121 by James Patterson. Amusingly, the publication also wants us to know that a long […]
AI
Project Gutenberg Refines Automated Audiobook Creation
Project Gutenberg has converted approximately 5,000 book files into audiobooks using synthetic voice. Working in concert with MIT and Microsoft, the project aimed to refine the production of machine-generated audio conversions at scale. The researchers write: “Our system uses new advances in neural text-to-speech, emotion recognition, custom voice cloning, and distributed computing to create engaging and lifelike audiobooks…. We believe that this work has the potential to greatly improve the accessibility and availability of audiobooks.” In a paper on the project, they explain: “Different audiobooks require different reading styles. Nonfiction works benefit from a clear and neutral voice while fictional works with […]
Publishers Test AI Tools for In-House Processes
Recent news and lawsuits have discussed the use of artificial intelligence with regard to creative work: whether AI-produced work can be copyrighted (it can’t) and whether using books to train machine learning models is a copyright violation. Much of the conversation within the industry is focused on contract language, where agents and authors are hoping to limit the use of AI without permission and block any training on their material, while publishers are trying to retain flexibility for the future and not make promises they can’t fulfill. Quieter perhaps is how publishing companies are using the technology in their day-to-day operations, and […]
Still Another Lawsuit: Authors Guild Sues OpenAI In New York
Following three similar suits filed in San Francisco (and thus in California’s sometimes unpredictable Ninth Circuit), on Tuesday the Authors Guild and a roster of well-known authors filed suit in New York’s Southern District against OpenAI for the “flagrant and harmful infringements of plaintiffs’ registered copyrights in written works of fiction” in training its large language models. The Guild and named plaintiffs David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, Christina Baker Kline, Maya Shanbhag Lang, Victor LaValle, George R.R. Martin, Jodi Picoult, Douglas Preston, Roxana Robinson, George Saunders, Scott Turow and Rachel Vail […]
New KDP Guidelines Require Participants to Acknowledge AI-Generated Content
Amazon recently revised its KDP content guidelines to add a section that requires participants to indicate when they have used AI tools to create elements of any submitted book. The service distinguishes between “AI-assisted” work, which does not need to be disclosed, and “AI-generated content,” which does need to be acknowledged. The guidelines “define AI-generated content as text, images, or translations created by an AI-based tool. If you used an AI-based tool to create the actual content (whether text, images, or translations), it is considered ‘AI-generated,’ even if you applied substantial edits afterwards.” In contrast, AI-assisted work is […]
Your Books Trained Those Large Language Models
As is already being contested in a number of lawsuits seeking class action status, the core datasets on which all of the major large language models have been trained rely on stolen, copyrighted books. As a new Atlantic magazine article by Alex Reisner puts it, “Pirated books are being used as inputs for computer programs that are changing how we read, learn, and communicate. The future promised by AI is written with stolen words.” BookCorpus was stolen from Smashwords authors. Books3 is a body of between 150,000 and 190,000 books from established publishers and authors. The Atlantic piece extracts the […]