Meta Stole Copyrighted Work from Millions of Authors

On December 9, 2024, I wrote about Meta’s new terms of service, effective January 1, 2025. This month, I’m even more disgusted by what I learned. An email from one of my publishers told me Meta stole 7.5 million books and 81 million research papers to train their new AI model, Llama 3.

For those who haven’t heard the news yet, Alex Reisner first broke the story in The Atlantic:

“When employees at Meta started developing their flagship AI model, Llama 3, they faced a simple ethical question. The program would need to be trained on a huge amount of high-quality writing to be competitive with products such as ChatGPT, and acquiring all of that text legally could take time. Should they just pirate it instead?”

Meta employees spoke with multiple companies about licensing books and research papers, but they nixed that idea, stating, “[This] seems unreasonably expensive.” A Llama-team senior manager also said it’d be an “incredibly slow” process. “They take like 4+ weeks to deliver data.”

Offended yet? Not only have Meta and others stolen copyrighted work, but they’ve reduced authors’ blood, sweat, and tears to nothing more than “data.”

“The problem is that people don’t realize that if we license one book, we won’t be able to lean into fair use strategy,” said the director of engineering at Meta in an internal memo.

The senior manager claimed that, if Meta got caught, the legal defense of “fair use” might work for using pirated books and research papers to train AI…

“[It is] really important for [Meta] to get books ASAP. Books are actually more important than web data.”

How did they solve this problem? Meta employees turned to LibGen (Library Genesis), a digital warehouse of stolen intellectual property, neatly stacked with pirated books, academic papers, and various works authors and publishers never approved.

As of March 2025, the LibGen library contained more than 7.5 million books and 81 million research papers. And Meta stole it all, with permission from “MZ” (a reference to CEO Mark Zuckerberg) to download and use the data set.

Internal correspondence was made public this month as part of a copyright-infringement lawsuit brought by Sarah Silverman and other celebs whose books LibGen pirated. If that’s not bad enough, the public also discovered OpenAI used LibGen for similar purposes. Microsoft owns a 49% equity stake in the for-profit subsidiary OpenAI LP. It is not yet known whose idea it was to download the LibGen library to train its AI model.

Does it matter? They still used copyrighted material without paying licensing fees or giving authors the option to opt out.

“Ask for forgiveness, not for permission,” said another Meta employee.

Even when a senior management employee at Meta raised concerns about lawsuits, they were convinced to download the libraries from LibGen and Anna’s Archive, another massive pirate site.

“To show the kind of work that has been used by Meta and OpenAI, I accessed a snapshot of LibGen’s metadata—revealing the contents of the library without downloading or distributing the books or research papers themselves—and used it to create an interactive database that you can search here:”

https://reisner-books-index.vercel.app

~ Alex Reisner, The Atlantic

Meta and OpenAI have both claimed the defense of “fair use” to train their generative-AI models on copyrighted work without a license, because LLMs (Large Language Models) “transform” the original material into new work. Work that could directly compete with the authors they stole from—by duplicating their writing voice and style!

This legal strategy could set a dangerous precedent: It’s okay to steal from authors. Who cares if they worked for months, even years, to write the pirated books and/or research papers?

The use of LibGen and Anna’s Archive also raises another issue.

Alex Reisner stated the following in one of The Atlantic articles:

“Bulk downloading is often done with BitTorrent, the file-sharing protocol popular with pirates for its anonymity, and downloading with BitTorrent typically involves uploading to other users simultaneously. Internal communications show employees saying that Meta did indeed torrent LibGen, which means that Meta could have not only accessed pirated material but also distributed it to others—well established as illegal under copyright law, regardless of what the courts determine about the use of copyrighted material to train generative AI.”

Not only have Meta and OpenAI stolen copyrighted material from authors, but Meta may also have distributed it to others.

By now, you must be wondering if your books are included in the LibGen library. I found six of mine, including my true crime/narrative nonfiction book, Pretty Evil New England, which took me a solid year to research, driving around six states to dig through archives, before I submitted the finished manuscript to the publisher by the deadline, never mind the weeks of edits afterward. Each one of my stolen thrillers (HACKED, Blessed Mayhem, Silent Mayhem, Unnatural Mayhem, and HALOED) also took months of hard work.


By stealing six books, they robbed me of years (years!) of pouring my soul onto the page to deliver the best experience I could. I’ll continue to put in the time for my readers, and I suspect you’ll do the same. But authors still need to eat and pay bills. It’s difficult to write if you’re homeless.

What message is Big Tech sending to the public?

If Meta and OpenAI prevail in the lawsuits, authors everywhere are at risk.

Quick side note about pirate sites: Sure, you can read books for free. Just know that many of these sites bundle Trojan horse malware with the pirated books, which can steal banking and other personal info from your devices. Every pirated book steals money from authors. If you want us to keep writing but can’t afford to buy books, get a library card. Or contact the author. Most will gift you a review copy.

Care to read Meta’s internal correspondence?

https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.449.4.pdf

https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.417.6.pdf

https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.391.24.pdf

And here’s a court document regarding OpenAI:

https://storage.courtlistener.com/recap/gov.uscourts.cand.414822/gov.uscourts.cand.414822.254.0.pdf

Disgraceful, right?

The Authors Guild is also reporting on the theft and closely monitoring the court cases.

If your work is included in the LibGen library, your name will automatically be included in the class action (there are many filed), unless you opt out. However, if you prefer to contact the attorneys handling the case against Meta, reach out to Saveri Law Firm.

Did you find any of your work in the pirated libraries?

#WritingCommunity: Updated Terms to Meta Platforms in 2025

Have you read Meta’s new terms of service (TOS)? Even if you don’t have an account on Facebook, Instagram, Threads, Messenger, or WhatsApp, you may still be bound by its disgraceful overreach.

Many of us—me included—forfeited our right to privacy when we joined social media. What’s the alternative? If authors want to sell books, they need to have an online presence. So, when social media giants like Meta update their TOS, we barely give them a glance.

This time, it’s a mistake to accept or click the box away without reading what rights you’re granting. By using any of Meta’s sites and/or products after Jan. 1, 2025, you will be bound by its new TOS.

Thank God for the writing community’s sharp eyes and willingness to share information. A couple of weeks ago, writer friends warned me of Meta’s update to their terms of service in our “super-secret” author group on Slack.

What is Slack?

If you’re not familiar, Slack is a fantastic app for collaboration (blogmates, writing teams, authors in the same story world or collection, etc.) away from the prying eyes of social media giants. When you post within your designated group, no one but the members has access to your shared information or discussions. Many companies and corporations use Slack to stay in touch with their employees. Using Slack for an author group also saves your email inbox from replies that don’t apply to you. Highly recommend.

Meta’s Overreach

One of the authors in my group brought up the update to Meta’s terms of service. As if Zuckerberg hadn’t collected enough information on us, these new terms violate any right to privacy we had left. And not just while using a Meta platform. Now, we are always bound by their ridiculous terms, on or off Meta, because we have an account on Facebook, Instagram, WhatsApp, or Threads.

Even if you’re not active on social media, you are still bound if you use one of Meta’s products, such as Messenger or Marketplace.

Private or Direct Messages (PMs or DMs) Are Not Private

No online messages are private. You know that, right? Be careful of what you discuss. Big Brother monitors and stores your conversations.

Meta’s new TOS reaches beyond other social media PMs. When you click “accept” to its updated terms, you will grant Meta the right to read your private messages (nothing new) and to use, share, copy, or sell them, in whole or in part, in any way it wants, including but not limited to training and developing its AI models.

Content

Any and all content you post to one of its platforms or products will carry an automatic license for Meta to use, distribute, share, copy, or sell it, in whole or in part, in any way it wants, including but not limited to generating AI content that may directly compete with you. Doesn’t matter if the content is your intellectual property. By using Meta after Jan. 1, 2025, you will automatically grant them free rein once you upload.

Want to share selfies with your new puppy or a family photo with friends and family? Meta will have the right to copy, share, sell, distribute, or use all your photos and videos, including your voice(!) and language, in whole or in part, including but not limited to training its AI models.

AI Features

Meta categorizes AI as a separate license, perhaps to make it more palatable, but is it? Not really. The moment you use any AI feature, like searching Facebook for a friend’s profile (the only search feature available now), you will automatically grant the same license, with no way to opt out. Sure, Meta says you can ask that your content not be used to develop or train AI, but it retains the right to deny your request. The only surefire way to opt out is to delete your content and/or account.

What if You Delete Your Meta Account?

Might not matter. Even if you don’t have an active Facebook, Instagram, Threads, or WhatsApp account, you could still be consenting to Meta’s new TOS if a friend or family member sends you a funny meme or Reel. Once you click that link to view Meta content, these new terms apply to you, effective Jan. 1, 2025.

Other Concerns

Meta admits to using AI but stops short of specifying how it plans to use our content to develop future AI models. This lack of transparency leaves creators vulnerable to their work being exploited.

Do not assume the omission works in your favor. The absence of clear disclosures about AI practices sets a dangerous precedent for big tech. You may think sharing selfies or photos of your children, significant other, or your home isn’t a big deal, but it is. The new AI license allows Meta to exploit you and your family.

Though you retain ownership over your content, Meta’s broad license to “use” it creates a gray area. What prevents Meta from repurposing your photo or video in marketing campaigns? Absolutely nothing.

By continuing to use a Meta platform, you agree to future terms. On Jan. 1, 2025, you will hand Meta a blank check to rewrite the rules at any time without needing to notify you or ask for your consent.

The more data Meta collects, the stronger its stranglehold on users. Nothing prevents Meta from selling your information to data brokers that will learn almost everything about you from your content, language, behavior, and so-called private messages. They, in turn, sell your data to advertising markets. Or worse, use it to train AI without compensation or your consent.

I wouldn’t dare post a novel excerpt in 2025. I used to create video excerpts of all my books, which worked great as a marketing strategy. Now, finding all that old content on Meta will be a near-impossible feat. Even though I posted the video excerpts prior to Jan. 1, 2025, the new terms will supersede the old.

What’s a writer to do? Suggestions welcome! 

Did you read Meta’s new TOS? Will you continue to use Facebook, Instagram, or Threads in 2025? Does anyone use WhatsApp? Can’t imagine it’d be helpful for authors. Please correct me if I’m wrong.

When you’ve worked for years to gain a following on one or more of Meta’s platforms, it is not an easy decision to delete your account. What alternatives do we have? Blogging, Substack, or Medium, I suppose.

Anyone use BlueSky?

I’ve heard mixed things about it. Most say it’s comparable to X-Twitter, not Facebook. BlueSky claims “it offers a more decentralized, user-controlled experience with fewer ads and a cleaner interface, making it ideal for those who prioritize privacy and community.” However, it still lags behind X-Twitter in terms of features and user base.

The mere thought of building another audience from scratch exhausts me. How ’bout you?