Following three similar suits filed in San Francisco (and thus within the sometimes unpredictable Ninth Circuit), on Tuesday the Authors Guild and a roster of well-known authors filed suit in New York’s Southern District against OpenAI for the “flagrant and harmful infringements of plaintiffs’ registered copyrights in written works of fiction” in training its large language models. The Guild and named plaintiffs David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, Christina Baker Kline, Maya Shanbhag Lang, Victor LaValle, George R.R. Martin, Jodi Picoult, Douglas Preston, Roxana Robinson, George Saunders, Scott Turow and Rachel Vail […]
AI
New KDP Guidelines Require Participants to Acknowledge AI-Generated Content
Amazon recently revised its KDP publishing content guidelines to add a section that requires participants to indicate when they have used AI tools to create elements of any submitted book. The service distinguishes between “AI-assisted” work, which does not need to be disclosed, and “AI-generated content,” which does need to be acknowledged. The guidelines “define AI-generated content as text, images, or translations created by an AI-based tool. If you used an AI-based tool to create the actual content (whether text, images, or translations), it is considered ‘AI-generated,’ even if you applied substantial edits afterwards.” In contrast, AI-assisted work is […]
Your Books Trained Those Large Language Models
As is already being contested in a number of lawsuits seeking class action status, the core datasets on which all of the major large language models have been trained rely on stolen, copyrighted books. As a new Atlantic magazine article by Alex Reisner puts it, “Pirated books are being used as inputs for computer programs that are changing how we read, learn, and communicate. The future promised by AI is written with stolen words.” BookCorpus was stolen from Smashwords authors. Books3 is a body of between 150,000 and 190,000 books from established publishers and authors. The Atlantic piece extracts the […]
Online Book Analyzer Taken Down After Backlash
Literature analysis site Prosecraft shut down on Monday after many authors expressed concern online that their books were included without their consent. Prosecraft was a product of Shaxpir, a cloud-based word processor. According to a blog post, Prosecraft launched in 2017 after founder Benji Smith estimated the word count of his favorite books while writing his memoir. Prosecraft automated that process, analyzing text for word count and number of adverbs, as well as “vividness” and “passive voice,” and sharing sample pages of the latter. The venture used “techniques [that] were originally developed by computational linguists at the UVM Computational […]
Open Road Launches Metadata Service
Open Road is launching an enterprise SaaS program, offering metadata-as-a-service to all publishers on an annual pay-per-title basis. The initiative is designed to “continually optimize metadata and maximize title discovery” by using machine learning and proprietary data on Open Road’s consumer audiences “to automatically and continually optimize metadata.” (The service is a good example of how machine learning can sometimes provide unambiguous benefit, without encroaching on anyone’s rights.) The service is also being provided at no additional cost to publishers who participate in Open Road’s ebook backlist marketing service. But the SaaS offering gives Open Road a product that […]
Authors Guild Adds AI Training to Model Book Contract
The Authors Guild has added a new clause to its model trade book contract that prohibits publishers from “using or sublicensing books under contract to train artificial intelligence technologies.” The clause is a response to recent concerns raised by authors and publishers regarding digital publishers like Findaway Voices and Bookwire, who have partnered with Apple for machine learning and Google Books for AI-narrated audiobooks, respectively. The AG writes that platforms and publishers have been “adding language to their terms that allows them to data mine books for use in training AI models that will inevitably compete with human-authored works.” The […]