Following revelations that OpenAI’s ChatGPT, along with Google’s BERT and other foundational large language models, was trained on a corpus of over 7,000 books scraped from Smashwords without permission, as well as on another, larger corpus of likely illegally obtained written material, the first class action suit seeking to enforce authors’ rights has been brought in federal court in San Francisco. Named plaintiffs and authors Paul Tremblay and Mona Awad seek class action status on behalf of a broad class of authors of copyrighted material, in a suit against OpenAI filed by the Joseph Saveri Law Firm and Matthew Butterick. They […]
Those Fancy Large Language Models Were Trained On eBooks From Smashwords, Without Permission
With the avalanche of interest in generative AI and the large language models that power it, stories have resurfaced showing the process’s dirty roots. As shown in a 2021 working paper authored by Jack Bandy and Nicholas Vincent, OpenAI’s models, Google’s BERT and its variants, and many other foundational LLMs all have “documentation debt” to BookCorpus, “a popular text dataset for training large language models.” Compiled in 2014 by researchers at the University of Toronto and MIT, BookCorpus should have been called Stolen from Smashwords. The researchers apparently scraped self-published ebooks posted on Smashwords that were being offered to read […]
Federal Government Warns GA Schools About Book Ban Discussions
The U.S. Department of Education’s Office for Civil Rights intervened in a Georgia county, finding that its handling of the removal of books about BIPOC and LGBTQ+ people “violated federal laws against race and sex discrimination,” the AP reports. The government’s objection was not to the book removals themselves, but to the way the books in question were discussed at school board meetings. “Communications at board meetings conveyed the impression that books were being screened to exclude diverse authors and characters, including people who are LGBTQI+ and authors who are not white, leading to increased fears and possibly harassment,” the department wrote. According to […]
PEN America, PRH Sue FL County Over Book Bans
PEN America, Penguin Random House, and a group of authors and local parents have filed a federal lawsuit against the Escambia County, FL school district. According to a press release, this is “a first-of-its-kind challenge to unlawful censorship” because it brings parents, authors, and a publisher together to fight escalating book bans. The lawsuit argues that the county’s removal and restriction of books from school libraries, the majority of which focus on “race, racism, and LGBTQ identities” and are by BIPOC and/or LGBTQ authors, violates the First Amendment and the Equal Protection Clause of the Constitution. The suit aims to have the district […]
Publishers and IA “Cautiously Optimistic” Negotiations Could End Soon
As AAP CEO Maria Pallante implied at the organization’s annual meeting, the publishers and the Internet Archive have asked federal judge John Koeltl for yet another postponement to file a proposal for the judgment to be entered following his ruling finding the IA liable for copyright infringement. But the parties see the possibility of a resolution soon, writing: “Since our last extension request, the parties have resolved all but a few substantive issues with respect to injunctive relief, and expect that further discussion will resolve all but potentially one. The parties have also had a further exchange of proposals regarding the resolution of […]
Proposals for Internet Archive Judgment “Will Take Some Time,” and More From the AAP Meeting
After multiple delays, the AAP publishers and the Internet Archive are due by later this week to submit proposals “for the appropriate procedure to determine the judgment to be entered” following the ruling that found the IA liable for mass copyright infringement. But during the AAP’s virtual annual meeting on Monday, CEO Maria Pallante implied that more time will be needed. “Both sides are discussing a possible stipulation as to the scope of injunctive relief, damages and attorneys fees at the request of the judge,” Pallante reported. But, “This process is extremely important and detailed, and it will take some time.” During a […]