Reckless acquiescence: Authors and piracy in the age of AI
Liam O’Brien
The public’s response to the 2022 release of OpenAI’s AI chatbot ChatGPT was immediate and enthusiastic. By January 2023, ChatGPT was the fastest-growing consumer application in history (Hu 2023), sending OpenAI’s rivals into frenzied development of their own AI language models.
In August 2023, The Atlantic reported that two such models, Meta's LLaMA and Bloomberg's BloombergGPT, had been trained using Books3, a dataset comprising over 170,000 pirated ebooks (Reisner 2023a). Nine months on from the initial furore over Books3, there is no clear pathway to restitution for the authors of those books.
This report explores the varied responses of authors to piracy and AI, while also examining the causes of the rapid growth in ebook piracy over the past decade, growth that culminated in the creation of Books3. By doing so, it aims to make sense of the unprecedented situation authors and their advocates now face: how it came about and what it suggests about the future of the profession.
Methodologies and limitations
I have drawn on academic sources and quantitative data where possible, but this report relies primarily on qualitative sources: correspondence with authors, published interviews and media coverage. Books3 is too recent to have been the subject of significant academic discussion, and the nature of online piracy makes quantitative data-gathering difficult if not impossible. The development of AI is inherently unpredictable, and the matters discussed here are ongoing. Consequently, a degree of speculation has been unavoidable.
Nine Australian authors responded via email to my questions about AI, piracy and Books3. Of that group, Matthew Lamb and Peter M. Ball are the subjects of extended discussion here. Attempts were made to contact Trent Dalton, the third author discussed at length, via his social media channels. Dalton had not responded at the time of writing; given his current prominence, this is unsurprising and entirely understandable. In the absence of direct personal communication, I have drawn from previously published interviews Dalton has given to local media.
This is not a report on diversity (or the lack thereof) in Australian publishing, but it nonetheless merits acknowledgement that Lamb, Ball and Dalton are all, like myself, white men. That acknowledgement notwithstanding, the issues discussed here are relevant to all professional and aspiring professional authors irrespective of gender or ethnicity, as this report will demonstrate.
Smashed steam looms and strip-mined souls: early responses to Books3
The use of Books3 to train Meta's and Bloomberg's AI language models was drawn to the public's attention in August 2023, in the first of a series of articles in The Atlantic by writer and programmer Alex Reisner. Reisner's reporting revealed Books3's compiler (independent developer Shawn Presser), its contents (over 170,000 unique ebooks) and its source (Bibliotik, a widely circulated collection of pirated ebooks) (Reisner 2023a). While it is unknown whether OpenAI used pirated ebooks to train ChatGPT, a lawsuit filed by several American authors against OpenAI in July 2023 alleged this to be the case (Davis 2023). Given that Presser compiled Books3 as part of a group effort to independently create datasets approximating those that had formed ChatGPT's training material (Knibbs 2023), it seems he was also operating on this assumption.
Reactions to Reisner’s reporting came swiftly from some of the biggest names in publishing. In The Atlantic, Stephen King wrote:
Would I forbid the teaching…of my stories to computers? Not even if I could. I might as well be…a Luddite trying to stop industrial progress by hammering a steam loom to pieces (King 2023).
In a statement to Guardian Australia, the Booker Prize-winning Australian novelist Richard Flanagan said, on learning that ten of his books were included in Books3: 'I felt as if my soul had been strip mined and I was powerless to stop it. This is the biggest act of copyright theft in history' (Burke 2023).
The Australian Society of Authors’ statement in response to Books3 echoed Flanagan’s indignation: ‘Authors appropriately feel outraged. The fact is this technology relies upon books, journals, essays, and scripts written by authors … yet permission was not sought nor compensation granted’ (Australian Society of Authors 2023).
Piracy: the lesser of two evils?
In the media coverage following Reisner's initial reporting, it was the use of books to train AI that appeared to account for most of the outrage from authors, rather than the fact that those books had been pirated.
Indeed, the authors contacted for this report mostly (though not unanimously) described themselves as unconcerned about piracy. The view of Alan Baxter, novelist and former president of the Australasian Horror Writers Association, was representative of many:
Piracy has been a part of publishing since the rise of ebooks twenty years ago. It sucks and it has an impact on our bottom line, but it’s nothing like the threat presented by AI. While we’re not being paid for pirated books, at least our work is being read. AI aims to eradicate us. Very different threats (personal communication, 24 April 2024).
But piracy made Books3 possible. It’s unlikely that Presser, who was unemployed at the time he compiled the dataset (Knibbs 2023), could have assembled a collection of similar scope by lawful means.
Lipton (2020) and Davis and Kazi (2021) identify three key factors accounting for the proliferation of ebook piracy since ebooks entered the market: unfavourable legislative settings, ease of access, and ideological disputes over the ethics of piracy. If we accept that piracy made Books3 possible, these factors merit further consideration.
The photocopier on trial: Matthew Lamb
Lipton (2020) demonstrates the difficulty of combatting ebook piracy through legal means, with individual authors often shouldering the burden of identifying, reporting and proving infringement. In the aftermath of Books3, this burden appears more onerous than ever. A lawsuit brought by a group of US authors against OpenAI had most of its claims dismissed in February 2024, with the presiding judge ruling that the authors had failed to demonstrate infringement of their work by ChatGPT (Creamer 2024).
Writer and editor Matthew Lamb, whose biography of the Australian author Frank Moorhouse was published in 2023, told me he is concerned—‘very concerned’, in fact (personal communication, 26 April 2024)—about ebook piracy, and has been so since long before ChatGPT.
Lamb’s research revealed that Moorhouse also held concerns over the protection of copyright that distinguished him from many of his peers. In 1974, Moorhouse and Angus & Robertson sued the University of New South Wales (UNSW) over duplicates of a Moorhouse short story made on a photocopier in the university’s library. In an article for the Sydney Morning Herald, Lamb (2024) writes that Moorhouse:
… understood publishing to be as much a manufacturing industry as a cultural industry, with the value of the latter dependent on the conditions created by the former. So he immediately understood the broader implications of the photocopy machine if the cultural feedback loops it introduced were not adequately managed.
If suing over photocopies aligns Moorhouse with the loom-smashing Luddite described by Stephen King, Lamb shows he was a Luddite with impressive foresight:
In 1967 … Moorhouse argued that ‘presumably reading and writing will be a more minority activity than it is even now’ … ‘most people will become almost continually hooked up electrically to the rest of the world—both visually and aurally’ (Lamb 2024).
Moorhouse's case against UNSW was successful, paving the way for reform of Australia's copyright laws (Cain 2022). But, as the lack of a clear legal remedy for Australian authors affected by Books3 demonstrates, legislation has since failed to keep pace with technology. Lamb doubts it will ever catch up, and instead argues:
Rather than think the solution should only be directed at mitigating piracy, a more systemic solution needs to be considered that addresses how and what authors … get paid for the public having access to their work. This means increasing royalties, but also increasing the budget for Public, Education, and Digital Lending Rights, as well as having limits on how much bookshops (usually online retailers) can discount books. That’s just for starters (personal communication, 26 April 2024).
Lamb informed me that within twenty-four hours of the publication of his SMH article about Moorhouse, a rewritten version of the article, credited to Aamir Sheikh, appeared on a website called Cryptopolitan. The biography accompanying the article tells us that, ‘A veteran in content production Amir [sic] is now an enthusiastic cryptocurrency proponent, analyst and writer’. Lamb believes his article was most likely rewritten by AI.
You wouldn’t steal a book ... right?: Peter M. Ball
Ebooks, with their small file sizes relative to music and video, are easily pirated. Ebook piracy was growing rapidly before COVID-19 (Davis and Kazi 2021, p. 21), and lockdowns in the early days of the pandemic exacerbated the problem (Yesberg 2022).
Authors' responses to my questions convey the impression of an industry powerless against the spread of digital piracy. Writer, editor and academic Angela Meyer said of her time in publishing, '[N]o matter how many times … pirated copies were reported, there was a feeling of helplessness as they popped up again. Digital channels and distribution can be so widespread' (personal communication, 29 April 2024).
Peter M. Ball is an author and independent publisher. For five years he worked for the Queensland Writers Centre, managing the Australian Writers Marketplace. He rates piracy as a minor concern, believing little can be done to combat it either legislatively (‘I suspect there will never be the political will to do it’ (personal communication, 24 April 2024)) or by public-facing appeals to the greater good:
I lived through the nineties and early 2000s, and anecdotally the ‘You Wouldn’t Steal a Car’ anti-piracy ads on film and DVD probably created a lot more pirates than it prevented in my computer-literate friends group (personal communication, 24 April 2024).
Brain Jar Press, founded by Ball in 2018, specialises in speculative fiction and has published authors including Angela Slatter, Kaaron Warren and Kim Wilkins, all of whom have won Australia’s Aurealis Award for Excellence in Speculative Fiction. Brain Jar’s website outlines its ‘digital first model—the bulk of our books are sold as ebooks or print-on-demand copies’ (Brain Jar Press 2024). Brain Jar’s books are sold primarily via its own website or through other online retailers. On piracy, Ball says:
For an author of my level—and for the vast majority of the authors I publish through Brain Jar Press—a lack of visibility and audience reach is a far greater issue than piracy (personal communication, 24 April 2024).
Given, though, that authors who have published with Brain Jar, including Slatter, Warren and Wilkins, have had books published elsewhere that ended up in Books3 (Reisner 2023b), it seems likely that greater visibility and audience reach would bring a corresponding increase in the risk of piracy and, by extension, of the pirated work being appropriated in the manner of Books3.
Ball says tech-based anti-piracy solutions generally ‘either complicate things for legitimate users (DRM technology) or create larger problems (DMCA takedown notices)’ (personal communication, 24 April 2024). If the only available alternative, though, is the kind of acquiescence towards the threat of infringement that Frank Moorhouse warned against, it is hard to see how the ASA’s goal of fair compensation for authors can ever be achieved.
Bot swallows novel: Trent Dalton
Davis and Kazi refer to the difficulty of combatting piracy 'in this age of the "copyleft" movements that promote the idea that authors should surrender some of their copyrights so that the public may freely benefit from their work' (2021, p. 20). In the ASA's 2023 statement on Books3, CEO Olivia Lanchester addressed this view:
I know the argument will be made that AI services are so valuable to the public that any means are justified. But turning a blind eye to the legitimate rights of copyright owners threatens to diminish already-precarious creative careers (Australian Society of Authors 2023).
It’s unlikely that Lanchester had Trent Dalton in mind. Boy Swallows Universe, Dalton’s first novel, was published in 2018. It became the fastest-selling debut novel in Australian history, with over 745,000 copies sold as of October 2023 (Ludlow 2023). A screen adaptation of the book premiered on Netflix in January 2024.
Boy Swallows Universe was, as Dalton has made clear in multiple interviews, based partly on his own upbringing in suburban Brisbane:
Dalton says the deeply personal story is still ‘incredibly awkward’ for his mother who, like the character in the novel, was in love with a heroin dealer and spent time in jail (Ludlow 2023).
Boy Swallows Universe is included in Books3 (Reisner 2023b). In a statement made to the Australian Associated Press, Dalton’s reaction to his work’s inclusion was reminiscent of Richard Flanagan’s (‘I felt as if my soul had been strip mined’ (Burke 2023)):
The story was my mum’s story so it’s not even potentially taking things from me, it’s taking things from my mum … And that really terrifies me; this sweet mum of mine who went through hell … For me, it’s really unsettling and I find it deeply invasive (Dudley-Nicholson 2023).
The intensely personal language used by both Flanagan and Dalton to describe the unauthorised use of their work stands in contrast to the utterly impersonal nature of that use. It is also oddly reminiscent of the language Presser used to describe his creation of Books3. In a September 2023 piece in Wired, Kate Knibbs writes:
Books3 started as a passion project by a Midwestern guy going through a weird time. ‘I poured my soul into the work’, he says. He saw it as aligned with the open source movement, a way to democratize access to the kind of data sets OpenAI was already using (Knibbs 2023).
Given Books3's use by Bloomberg and Meta, companies that saw respective revenues of US$12.5 billion and US$117.3 billion in 2023 (Forbes 2024a; Forbes 2024b), it is difficult to regard Presser's noble aim as having been achieved. It is unlikely that Meta or Bloomberg were aware of Presser's open source philosophy or his personal circumstances when they used Books3 to train their AI language models, any more than Presser would have been aware of Boy Swallows Universe or its author's sources of inspiration when he compiled the dataset. The scale of Books3 and the way it has been used render individual human intentions immaterial.
Prior to writing Boy Swallows Universe, Dalton spent over a decade working as a journalist for The Australian and Brisbane's Courier-Mail (Purdon 2018). We might reasonably question how much longer that kind of apprenticeship will be available to would-be bestselling novelists. In 2023, Michael Miller, executive chair of News Corp Australia, owner of both The Australian and the Courier-Mail, revealed that the company was producing 3000 articles a week with the aid of generative AI, mostly covering fluctuating local datapoints such as fuel prices and traffic conditions (Mediaweek 2023). If a novel or a book-length work of non-fiction with wide public appeal is currently beyond the capacities of AI, the ingestion of works like Boy Swallows Universe is aimed at ensuring that this does not remain the case indefinitely.
Automating the author
If Richard Flanagan's and Stephen King's reactions to Books3 differed markedly in tone, the two authors nevertheless reached the same conclusion. Flanagan was 'powerless to stop it' (Burke 2023). King would not prevent the use of his work to train AI, 'not even if I could' (King 2023). Both have achieved levels of success far beyond the hopes of most writers, yet both see themselves as helpless in the face of AI.
Financial gain is not the only goal for an author. For some it isn't a goal at all, at least not one stated publicly. Not all writers set out to be the next Trent Dalton, let alone the next Stephen King. Many writers write for no other reason than that they enjoy writing. The act of writing is not, or at least does not have to be, an inherently painful one for the writer. It does not necessarily require the raw confrontation with one's own past that Boy Swallows Universe apparently did. Kim Wilkins, a novelist and lecturer at the University of Queensland, contends:
Automation is coming for us all. All the moral panic now from journalists and writers … where were they when they got rid of the checkout chicks? In the fucking self-checkout line, that’s where. AI and piracy cannot take away the joy I feel from writing stories. That’s my bottom line (personal communication, 24 April 2024).
If all authors felt this way, though, this report would not exist. Books3 would not have elicited the fury from authors that it did. Lamb (2024) writes that, by 1968, Frank Moorhouse had come to regard ‘“writers as blacksmiths of this century”, a profession sliding towards obsolescence’. The situation for authors in 2024 is not quite as grim as that but, as author and publisher Joanne Anderton notes:
… I don’t think we value books as much as we used to. That being said, I’ve met so many people through workshops and uni classes who want to tell a story, and find myself wondering at the disconnect (personal communication, 29 April 2024).
Whether we value books as much as we used to is up for debate. But the unauthorised inclusion of over 170,000 books in a publicly available collection, and the subsequent profit-seeking use of that collection by multinational corporations, suggests a fairly damning valuation.
Conclusion
By illustrating the difficulties the ASA faces in securing appropriate compensation for authors whose work is used to train AI, I do not intend to pour scorn on its efforts. Its contentions are legitimate and the outcomes it seeks are appropriate. Writing is, as the ASA puts it, 'very real work'.
Nor is it my intention to suggest that authors who have historically been unconcerned about piracy should accept even partial responsibility for the current state of affairs. Half a century on from Frank Moorhouse's lawsuit against UNSW, it is unreasonable to expect groups like the ASA, let alone individual authors, to single-handedly win the battle for their right to fair payment and protection from infringement.
An ethical change of heart in the AI sector seems unlikely to materialise; as Kate Knibbs wrote in Wired in September 2023, 'All this increased scrutiny on data sets has made AI's big players shy away from transparency' (Knibbs 2023). It is also hard to see how publishers can confront the problem by any means other than technological anti-piracy tools. That approach has multiple inherent flaws, not least that it creates dependence on proprietary software and, by extension, on the owners of that software keeping it up to date and effective.
While it is hard to fault the general pessimism about the prospects of government providing a solution, it remains the case that this is where the onus for doing so rests. If Australia's literary culture is to continue, its supporters cannot afford to be apathetic about agitating for legislative reform. Modernisation of Australia's copyright laws, including provisions dealing specifically with the use of copyrighted material in the training of AI software and significant deterrent penalties for unauthorised use, should not be treated as an unachievable aim.
Even if wholesale legislative reform in Australia can be achieved, it will have negligible impact in the absence of a coordinated international approach. Nonetheless, there is no reason why Australia cannot or should not be a leading voice for change here. It will need to be, if Australian literature is to be anything more than a historical curiosity.
In its September 2023 statement on Books3, the ASA warned that 'authors and artists are being locked out of the AI boom. It's not too late to turn this around' (Australian Society of Authors 2023). To do so, authors and their advocates in Australia, the United States and elsewhere will need all the help they can get, from the highest levels down.
Liam O’Brien (he/him) is a writer originally from Meanjin/Brisbane and now based in Naarm/Melbourne. His favourite writers are James M. Cain, Shirley Jackson and Nathanael West. When he’s not writing, avoiding writing, or considering giving up writing for good this time, he likes to listen to the Grateful Dead.
References
Australian Society of Authors 2023, ASA response to use of Australian books to train AI [media release], 28 September, Australian Society of Authors website, accessed 6 May 2024, <https://www.asauthors.org.au/news/asa-response-to-use-of-australian-books-to-train-ai/>.
Brain Jar Press 2024, Submission Guidelines, Brain Jar Press website, accessed 7 May 2024, <https://www.brainjarpress.com/submission-guidelines/>.
Burke, K 2023, ‘“Biggest act of copyright theft in history”: thousands of Australian books allegedly used to train AI model’, Guardian Australia, 29 September, accessed 6 May 2024, <https://www.theguardian.com/australia-news/2023/sep/28/australian-books-training-ai-books3-stolen-pirated>.
Cain, S 2022, ‘Frank Moorhouse, Australian author and essayist, dies aged 83’, Guardian Australia, 27 June, accessed 6 May 2024, <https://www.theguardian.com/books/2022/jun/27/frank-moorhouse-australian-author-and-essayist-dies-aged-83>.
Creamer, E 2024, ‘Two OpenAI book lawsuits partially dismissed by California court’, Guardian, 15 February, accessed 6 May 2024, <https://www.theguardian.com/technology/2022/dec/05/what-is-ai-chatbot-phenomenon-chatgpt-and-could-it-replace-humans>.
Davis, C and Kazi, U 2021, ‘Piracy of Books in the Digital Age’, in Bogre M and Wolff N (eds) The Routledge Companion to Copyright and Creativity in the 21st Century, Routledge, <doi:10.4324/9781315658445>.
Davis, W 2023, ‘Sarah Silverman is suing OpenAI and Meta for copyright infringement’, The Verge, 10 July, accessed 6 May 2024, <https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai>.
Dudley-Nicholson, J 2023, ‘Why stolen Australian books are being used to train AI’, The Canberra Times, 8 October, accessed 8 May 2024, <https://www.canberratimes.com.au/story/8377940/why-stolen-australian-books-are-being-used-to-train-ai/>.
Forbes 2024a, Bloomberg, Forbes website, accessed 8 May 2024, <https://www.forbes.com/companies/bloomberg/?sh=134da51874a1>.
Forbes 2024b, Meta Platforms, Forbes website, accessed 8 May 2024, <https://www.forbes.com/companies/meta-platforms/?sh=264db63a4a5c>.
Hu, K 2023, ‘ChatGPT sets record for fastest-growing user base—analyst note’, Reuters, 3 February, accessed 9 May 2024, <https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/>.
King, S 2023, ‘Stephen King: My Books Were Used to Train AI’, The Atlantic, 23 August, accessed 6 May 2024, <https://www.theatlantic.com/books/archive/2023/08/stephen-king-books-ai-writing/675088/>.
Knibbs, K 2023, ‘The Battle Over Books3 Could Change AI Forever’, Wired, 4 September, accessed 8 May 2024, <https://www.wired.com/story/battle-over-books3/>.
Lamb, M 2024, ‘The legendary Australian author ahead of his time on AI’, Sydney Morning Herald, 8 January, accessed 5 May 2024, <https://www.smh.com.au/culture/books/the-legendary-australian-author-ahead-of-his-time-on-ai-20240104-p5ev81.html>.
Lipton, J 2020, ‘Mass digitization in the ebook market: copyright protections and exceptions’, in Aplin T (ed) Research Handbook on Intellectual Property and Digital Technologies, Edward Elgar Publishing, <doi:10.4337/9781785368349>.
Lock, S 2022, ‘What is AI chatbot phenomenon ChatGPT and could it replace humans?’, Guardian, 5 December, accessed 29 March 2024, <https://www.theguardian.com/technology/2022/dec/05/what-is-ai-chatbot-phenomenon-chatgpt-and-could-it-replace-humans>.
Ludlow, M 2023, ‘Fastest-selling debut novelist in Australia gets Netflix series’, Australian Financial Review, 6 October, accessed 8 May 2024, <https://www.afr.com/companies/media-and-marketing/trent-dalton-s-never-ending-love-story-with-brisbane-20230927-p5e7y9>.
Mediaweek 2023, 'Michael Miller tells publishers how News Corp Australia had best year in a decade', Mediaweek, 26 July, accessed 26 July 2023, <https://www.theage.com.au/politics/federal/economists-tip-august-interest-rate-hike-as-the-cost-of-living-rises-20220124-p59qoc.html>.
Purdon, F 2018, ‘“We just knew him as Slim … we didn’t know he escaped from Boggo Rd prison”’, The Courier-Mail, 29 June, accessed 8 May 2024, <https://www.couriermail.com.au/lifestyle/brisbanenews/we-just-knew-him-as-slim-we-didnt-know-he-escaped-from-boggo-rd-prison/news-story/7d6fdfd86880d579f882431ba728f4bf>.
Reisner, A 2023a, ‘Revealed: The Authors Whose Pirated Books Are Powering Generative AI’, The Atlantic, 19 August, accessed 6 May 2024, <https://www.theatlantic.com/technology/archive/2023/08/books3-ai-meta-llama-pirated-books/675063/>.
Reisner, A 2023b, ‘These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech’, The Atlantic, 25 September, accessed 6 May 2024, <https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/>.
Yesberg, H 2022, ‘Libraries, Piracy and the Grey Area In-Between: Free Digital Media during the COVID-19 Pandemic’, Reinvention: an International Journal of Undergraduate Research, 15:1, <doi:10.31273/reinvention.v15i1>.