Germany: Use Of Copyrighted Images In AI Training Is Not Infringement, So Long As Not For Profit

On Friday, the Hamburg Regional Court dismissed a photographer’s lawsuit against the non-profit research network Laion over the use of a copyrighted image. Laion provides a publicly accessible database with nearly 6 billion image-text pairs that can be used to train AI systems. One of the images in this database belonged to the plaintiff, who sought a court order to prohibit its use. The issue presented to the court was whether the text and data mining exceptions in § 44b UrhG and § 60d UrhG justify using copyrighted works for AI training. The court seems to agree with Laion’s position (Ruling from September 27, 2024 – 310 O 227/23) and in the first instance, the photographer has now lost the case before the Hamburg Regional Court .

Nevertheless, the legal dispute is not about whether the image can generally be used for AI training, but whether Laion was allowed to download it to compare it with the image description for its database purposes. Downloading such an image constitutes a reproduction of a protected work, which requires the permission of the copyright holder. However, the Hamburg court considers this use to be justified by the text and data mining exception in § 60d UrhG. This provision permits the use of copyrighted works for scientific purposes, particularly for text and data mining, without infringing on the copyright holder’s rights. Text and data mining refer to converting unstructured data into structured formats to identify meaningful patterns and generate new insights, a process that relies on vast data collections.

The Hamburg court believes that Laion’s comparison of the image and its description falls under this exception. It views this process as an analysis to identify correlations between image content and its description, which is considered a privileged scientific purpose. The fact that the data was later used for AI training does not change this assessment, as the original purpose of data collection was for scientific research.

The court also touched on the pressing question of whether using such data for commercial purposes would be permissible under § 44b UrhG if the copyright holder includes a usage restriction in machine-readable language alongside their work. In this case, the photo agency from which Laion obtained the image had posted such a restriction in “natural language” on its website. The court hinted that such restrictions in natural language might be considered machine-readable if modern AI technologies can comprehend them.

California AG v Nudify and Other Deepfake AI

As usual, unbridled “free speech”, voluntary blindness, minimization of harm, and inexistent enforcement of laws against gender based violence invariably impact women and girls. Given that there has been little political or judicial will to stop intimate violence, it is hardly surprising to see generative AI being hijacked to produce ever more nonconsensual intimate images of women and girls, as is the case with the latest anti-social trend of “undress technology” being widely used in schools by teenage boys who undress their teachers and classmates for the purpose of causing long-lasting harm and inciting girls to commit suicide. While videos are harder to produce, the creation of images using “undress” or “nudify” websites and apps has become commonplace.

Big tech and investors are complicit and should be subjected to criminal investigations, but aren’t. An alarming report by 404 Media shows that violence through deepfake technology is intentionally promoted and knowingly encouraged by Big Tech platforms be it via targeted ads on social media or directly in app stores as it appears on the top of searches.

As if this weren’t enough, WIRED reports that Big Tech platforms further facilitate violence against women by allowing people to use their existing accounts to join the deepfake websites. For example, Google’s login system appeared on 16 such websites, Discord’s appeared on 13, and Apple’s on six. X’s button was on three websites, with Patreon and messaging service Line’s both appearing on the same two websites. The login systems have been used despite the tech companies terms and conditions that state developers cannot use their services in ways that would enable harm, harassment, or invade people’s privacy.

Sign-in APIs are tools of convenience. We should never be making sexual violence an act of convenience. We should be putting up walls around the access to these apps, and instead we’re giving people a drawbridge.”

“This is a continuation of a trend that normalizes sexual violence against women and girls by Big Tech,” says Adam Dodge, a lawyer and founder of EndTAB (Ending Technology-Enabled Abuse).

After being contacted by WIRED, spokespeople for Discord and Apple said they have removed the developer accounts connected to their websites. Google said it will take action against developers when it finds its terms have been violated. Patreon said it prohibits accounts that allow explicit imagery to be created, and Line confirmed it is investigating but said it could not comment on specific websites. X did not reply to a request for comment about the way its systems are being used.

The tech company logins are often presented when someone tries to sign up to the site or clicks on buttons to try generating images. It is unclear how many people will have used the login methods, and most websites also allow people to create accounts with just their email address. However, of the websites reviewed, the majority had implemented the sign-in APIs of more than one technology company, with Sign-In With Google being the most widely used. When this option is clicked, prompts from the Google system say the website will get people’s name, email addresses, language preferences, and profile picture.

“In order to use Sign in with Google, developers must agree to our Terms of Service, which prohibits the promotion of sexually explicit content as well as behavior or content that defames or harasses others,” says a Google spokesperson, adding that “appropriate action” will be taken if these terms are broken. Other tech companies that had sign-in systems being used said they have banned accounts after being contacted by WIRED.

“We must be clear that this is not innovation, this is sexual abuse. These websites are engaged in horrific exploitation of women and girls around the globe. These images are used to bully, humiliate, and threaten women and girls”, says David Chiu, San Francisco’s city attorney.

This fiasco has prompted San Francisco’s city attorney to file a lawsuit against undress and nudify websites and their creators. Chiu says the 16 websites his office’s lawsuit focuses on have had around 200 million visits in the first six months of this year alone. The lawsuit brought on behalf of the people of California alleges that the services broke numerous state laws against fraudulent business practices, nonconsensual pornography and the sexual abuse of children. But it can be hard to determine who runs the apps, which are unavailable in phone app stores but still easily found on the internet.

The undress websites operate as shadow for profit businesses and are mainly promoted through criminal platforms like Telegram who notoriously push child porn and human trafficking worldwide under the guise of “free speech”. The websites are under constant development: They frequently post about new features they are producing—with one claiming their AI can customize how women’s bodies look and allow “uploads from Instagram.” The websites generally charge people to generate images and can run affiliate schemes to encourage people to share them; some have pooled together into a collective to create their own cryptocurrency that could be used to pay for images.

As well as the login systems, several of the websites displayed the logos of Mastercard or Visa, implying that banks are entirely on board with deepfake technology although they claim otherwise. Visa did not respond to WIRED’s request for comment, while a Mastercard spokesperson says “purchases of nonconsensual deepfake content are not allowed on our network,” and that it takes action when it detects or is made aware of any instances.

On multiple occasions, the only time tech companies and payment providers intervene is when pressured by media reports and requests by journalists. If there is no pressure, it is business as usual in the realm of violence against women and girls. And we all know it is a lucrative one.

“What is concerning is that these are the most basic of security steps and moderation that are missing or not being enforced. It is wholly inadequate for companies to react when journalists or campaigners highlight how their rules are being easily dodged. It is evident that they simply do not care, despite their rhetoric. Otherwise they would have taken these most simple steps to reduce access.” Clare McGlynn, law prof at Durham University

No, they don’t care. We must ban speech altogether and start from scratch.

Udio Complaint Entirely Based On Industry Infringing Its Own Lyrics

I am reading the Udio complaint right now. It is a little more than a “nothingburger” as the majority of users and IP lawyers have overwhelmingly noted. It is also an example of how to make a mockery of the justice system, beginning with basing an entire claim on self-serving evidence, more precisely all the evidence is based on intentional infringement of industry-owned lyrics. The only thing the plaintiffs are capable of proving with this lawsuit is how they hypothetically infringed their own lyrics, forced AI to further infringe their copyright through very precise instructions, and obtained a copyright infringing result. Several times.

If copyright law has ever been clear about something since the 18th century is not to copy other people’s texts without their consent. If you give AI infringing lyrics, it will come up with an infringing output, how surprising is that.

This lawsuit is a coaxing manual. How about, we copied the actual chorus from Michael Jackson’s Billy Jean lyrics and directed Udio to sound like Michael Jackson in as much detail and likeness as possible, and Udio made a song that resembles Billy Jean!!! So, the plaintiffs entered into prompt the excerpt “Billy Jean is not my lover, she’s just a girl who claims I am the one”. One can’t make this up. This is monumental bad faith and a waste of time of judicial resources.

Moving on, the plaintiffs copied word for word lyrics excerpts from All I Want For Chrismas is You (disclaimer: I can’t stand this song), inserted the infringed lyrics into the prompt and the name Mariah Carey along with other personal and artistic characteristics of the artist and again, the platform gave them exactly what they wanted, a copyright infringing result.

The exact same thing happened to other very old songs My Girl, I Get Around (Beach Boys), Dancing Queen (solely based on “we can dance we can jive”), American Idiot (interesting choice of song), as well as other holiday songs.

On pages 27, 28 we have an interesting “artist resemblance” table I deemed useful to reproduce as an example of exactly how NOT to make music with AI. I doubt that the great majority of AI users have the same desperate clinging to has-beens as the plaintiffs imagine. Don’t these overexposed artists already have thousands of copycats who have never heard of AI? The market was already saturated with these styles before the advent of AI. Also, the table doesn’t specify what lyrics were used in the prompt, so it is safe to assume that from the outset the lyrics were infringed like in the previous examples.

I hope you read that. It was quite funny. I have a few favorites in there. You ask AI to recreate a famous song by a band that rhymes with the smeetles, and OMG, AI sounds like the Beatles. Do you seriously expect a music AI platform had never heard of the Beatles or did you force the AI to go out of its way to find out about “smeetles” and which famous band rhymes with… Smeetles?!? I looked it up. It is not a word.

Words are the most important thing for LLMs. This is why you can’t ask ChatGPT or Claude to answer your emails, because they see each word in the email they need to answer as a prompt and the result is guaranteed nonsense. Each word inside the prompt (even someone else’s email) is interpreted separately as a part of an instruction, you must think like an algorithm for a minute and understand how a model interprets words.

Unless the model, like the latest Udio, is specifically programmed to ignore the artists names and rhymes thereof (eyeroll really), it will always try to reproduce as accurately as possible the instructions contained in words a human provides. This is why it will always be human users who will bear liability for AI’s output.

The complaint goes on to say that Udio copied other people’s vocals. I agree that it is the case and I agree it is not cool, but that’s the courts fault. There is little will to grant copyright to vocal performers, even in jurisdictions like Canada where vocal performances are specifically protected by the Copyright Act.

I spent 4 years in court trying to stop a label from remixing and selling my own vocal samples, and the only reason I won is because the contested vocals were attached to my own original lyrics in a distant slavic language, so it became eminently clear that the only way to enforce music copyright is to own the lyrics, something that continues being true in the field of AI.

The rest of the complaint adresses the fair use test, so that’s for the jury to decide. On a first sight, the main grievance appears to be the notion of “competition”. The industry is obviously diverting the fair use doctrine in order to enforce an anti-competitive monopole on all the musical loops in the world and trying to use the justice system to prevent any new music being made, unless they own the rights. That in my opinion is another sign this is an abusive lawsuit.

One thing I’m hearing from everywhere on this issue is that if the courts side with the music industry, nothing is in place to stop Russia and China to keep infringing the industry’s IP with the same tools, fair use or not, and they will flood us with their own commercial versions of AI generated output and will charge us for it, while our unsustainable music industry keeps dying anyway. There comes a moment when you just can’t afford to stifle innovation as a court.

Rent Cartels By Algorithm Deepen Housing Crisis, Tenants Pay Millions of Dollars Above Fair Market Prices

Dozens of class actions filed since 2022 against the Texas based company RealPage, now consolidated into a single class action in Nashville, Tennessee, demonstrate the single most significant factor behind the last few years monumental rent increases and lack of affordable housing across the continent: widespread and unchecked anti-competitive rent price-fixing directed by shady algorithms.

Since the Propublica investigation in 2022 that put a spotlight on the issue, the situation has only worsened. Rent-fixing by algorithm has enabled and continues to enable landlords and real estate companies to do covertly and indirectly what they can’t do directly. As we speak, rents are being pushed into stratospheric heights, forcing many low earners into encampments.

RealPage’s software uses an algorithm to churn through a mountain of data during the night to suggest daily prices for available rental units. The software uses not only information about the apartment being priced and the property where it is located, but also private data on what nearby competitors are charging in rents. The software considers actual rents paid to those rivals—not just what they are advertising, the company told ProPublica.

Two district attorneys (Washington, Arizona) are suing Realpage and more than a dozen of the the largest apartment building landlords, accusing them of a scheme to artificially fix rental prices in violation of U.S. antitrust law, all while concealing their conspiracy from the public. RealPage has denied any wrongdoing in the earlier cases, and it said it would contest both cases.

Washington

Washington alleges that 14 landlords conspired to keep rental prices high using RealPage’s revenue management platform and seeks triple damages and other relief to restore competitive conditions. Landlords conspired to share information, limit supply, and drive up rents via RealPage’s software which forced tenants to pay millions of dollars above fair market prices.

“In a truly competitive market, one would expect competitors to keep their pricing strategies confidential — especially if they believe those strategies provide a competitive edge,” the lawsuit says.

In response, RealPage declared that there is no causal connection between revenue management software and increases in market-wide rents. The problem with denying causal connection, however, is a flagrant lack of algorithmic transparency and intentional concealment from the public. You can’t both have a secret algorithm and deny causation between the algorithm conduct and the obvious widespread result being artificial rent increase and illegal price-fixing. So that defense will fail.

Arizona

Arizona alleges that by providing highly detailed, sensitive, non-public leasing data with RealPage, the defendant landlords departed from normal competitive behavior and engaged in a price-fixing conspiracy. RealPage then used its revenue management algorithm to illegally set prices for all participants.

Moreover, RealPage’s conspiracy with the landlord co-defendants violate both the Arizona Uniform State Antitrust Act and the Arizona Consumer Fraud Act.

Arizona’s antitrust law prohibits conspiracies in restraint of trade and attempts to establish monopolies to control or fix prices. The State’s consumer fraud statute makes it unlawful for companies to engage in deceptive or unfair acts or practices or to conceal or suppress material facts in connection with a sale, in this case apartment leases.

The illegal practices of the defendants led to artificially inflated rental prices and caused Phoenix and Tucson-area residents to pay millions of dollars more in rent.  

Defendants conspired to enrich themselves during a period when inflation was at historic highs and Arizona renters struggled to keep up with massive rent increases.

The Class Actions

The private lawsuits by renter-plaintiffs accuse RealPage to collude with landlords to artificially inflate rents and limit the supply of housing, alleging that owners, operators and managers of large residential multifamily complexes used RealPage software to keep rental prices in many major U.S. cities above market rates and shared non-public, commercially sensitive information with RealPage as part of the conspiracy.

Two landlords have settled so far.

China Internet Court Attributes AI Generated Image Copyright To Human Prompt Creator

On Monday, the Beijing Internet Court held that a human plaintiff prompt is sufficient to invoke copyright protection in a Stable Diffusion generated image, so long as the output qualifies as an “original” work. Copyright is determined on a case by case basis, so this decision is not entirely inconsistent with other AI jurisprudence trends …

US FTC Memo To Copyright Office Warns Gen AI Causes Unfair Competition, Deceptive Practices, and Consumer Risk

The United States Federal Trade Commission (FTC) submitted a comment to the Copyright Office after conducting its own AI study last August. Although the FTC has no jurisdiction over copyright matters, it does have jurisdiction over consumer and competition violations and can indeed investigate and penalize companies for such violations independent of parallel copyright lawsuits, and in spite of …

Hanagami Wins Appeal Over 4 Count Choreo Infringement by Epic Games Fortnite Emotes

This is fantastic news. I am reading the 31 page decision by the 9th Circuit Court of Appeal right now and will learn the 4 counts along with the full emote dance for a split screen video. It is amazing that courts are finally starting to recognize that a 2 second sequence is not too …

Westlaw and Ross Intelligence Lawsuit Over Gen Ai Goes To Jury Trial

This lawsuit raises the overlooked issue concerning legal databases charging people for money for content that they don’t really own. Anyone who’s been to law school is trained to use Westlaw on a limited academic license, paid for with prohibitive students tuitions, in order to complete research that ultimately is in the public’s interest. Student access to Westlaw opens many doors for employment in the legal trade, because a significant number of law firms cannot afford to pay for Westlaw access, so they rely on interns’ academic access. These paid databases are hurting the public the most. There is no copyright per se on judgments or legislation, or citations. Filed proceedings, except under non-publication orders, are considered public information, which means they should be accessible. For free.

Legal information and trial data shouldn’t be kept behind a paywall

We contend that all legal databases should be free for the public, and by extension free to train large language models. I believe that in this day and age when legislation changes often, with numerous reforms undergoing in a whole brand new world to look forward to, the public should be equipped for free with all the tools traditionally at lawyers disposal, no less to figure out their rights and duties, and become better citizens.

Monetizing a large language model however, that sifts through all this data and answers your questions like a lawyer, is justified, because it will save us thousands of hours of research (if it works at all), will improve access to justice, and will root out frivolous lawsuits. Thomson Reuters is training their own LLM on data they don’t really own, but simply aggregate into databases, they call “proprietary”. And now they want to stop competing LLM’s from training on said databases. For example, CanLII in Canada and some of Soquij also have proprietary elements but remain free for the public for what matters most, namely jurisprudence and caselaw by provision. The parts that are not free yet, such as dockets access, shouldn’t be monetized either. Dockets info is completely free in states like California and in the UK.

It feels like Thomson Reuters wants to stall innovation and monopolize legal LLM’s

Indeed, Thomson Reuters is accusing Ross Intelligence of unlawfully copying content from its legal-research platform Westlaw to train a competing artificial intelligence-based platform. A decision by U.S. Circuit Judge Stephanos Bibas sending the case to a jury sets the stage for what could be one of the first trials related to the “unauthorized” use of data to train AI systems.

When you pay Westlaw a salty hourly fee to access its databases, nothing precludes you from copying this information at will for whatever purpose you need it for, which evidently includes training LLM’s. If anything, there should be more LLM’s training on Westlaw’s databases.

This is very different from tech companies such as Meta Platforms, Stability AI and Microsoft-backed OpenAI facing lawsuits from authors, visual artists and other copyright owners over the use of their work to train the companies’ generative AI software. Authors, artists and copyright owners actually own copyright over their works that have been used without their consent. The same cannot be said about Thomson and Reuters. Nobody gave them a license to make those databases, because a licence is not required in the first place. In theory, anyone or a bot can make such databases by compiling publicly available information.

The issue revolves mainly around the “headnotes” which summarize points of law in court opinions. These are citations extracted from the opinions themselves. They are something of an extremely detailed bullet point deconstruction of the legal analysis. Students do that every day. Another thing about the headnotes, handy as they are, you do need a bot to go through all of them, because they end up taking up more space than the entire judgment. I don’t agree that they are proprietary. I tend to agree with the defendant that they are fair use.

Ross said that the Headnotes material was used as a “means to locate judicial opinions,” and that the company did not compete in the market for the materials themselves. Thomson Reuters responded that Ross copied the materials to build a direct Westlaw competitor.

The court decided to leave up to the jury to decide fair use and other questions, including the extent of Thomson Reuters’ copyright protection in the headnotes. He noted that there were factors in the fair-use analysis that favored each side. The judge said he could not determine whether Ross “transformed” the Westlaw material into a “brand-new research platform that serves a different purpose,” which is often a key fair use question.

Yes, but it is not the only factor. Fair use analysis would apply if Westlaw had copyright over the headnotes to begin with. I think the headnotes are already fair use in a sense if we accept that judgments and papers are protected by copyright in theory, even though unenforceable in practice. I don’t see why you would need to prove transformative use when training models on someone else’s fair use material in a context where there is no economic right in the core content to begin with. It is indeed an interesting case.

“Here, we run into a hotly debated question,” judge Bibas said. “Is it in the public benefit to allow AI to be trained with copyrighted material?”

I would answer the question with a resounding: YES.

Instagram Not Liable For Embedded Images

It is about time users realize that everything they voluntarily post on public social media automatically comes with a license to embed. If you don’t want people to embed a post, take it down. Once it’s removed from the source, the license to embed is also revoked, the content will automatically disappear from all embeds. …