
First significant EU decision on data mining and creating datasets to train artificial intelligence | Orrick, Herrington & Sutcliffe LLP

A court in Hamburg, Germany has decided a copyright infringement case in a way that sheds light on how European courts may apply the text and data mining (TDM) exemption to developers of AI models.

The exemption is found in Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market (the DSM Directive).

The TDM exemptions are set out in Articles 3 and 4 of the DSM Directive. Germany has implemented them in the Urheberrechtsgesetz (German Copyright Act).

Key takeaway

The Hamburg Regional Court held that the TDM exemption under German copyright law applied in a case where a non-profit organization copied a photo to create a dataset for training AI models.

Facts of the case

  • The defendant is a non-profit association. It creates datasets that it makes freely available to the public for training AI models.
  • The dataset in question consists of a spreadsheet with hyperlinks to images or image files.
    • The files are publicly available online. The spreadsheet also contains data about them, including image descriptions (5.85 billion image-text pairs).
    • The dataset was created from an existing dataset containing URLs and text descriptions of the images.
    • The defendant extracted the URLs and downloaded the images. Some images were filtered out. The remaining images and associated metadata were extracted and added to the new dataset.
    • During this process, the image in question was captured, downloaded, analyzed and included in the dataset with its metadata. The image contained the watermark of a photo agency. It had been uploaded to the agency’s website, from which it was downloaded and thereby reproduced.
    • The photo agency’s website says: “You may not… use automated programs, applets, bots or the like to access the website or any content on it for any purpose, including but not limited to downloading content, indexing, scraping or caching any content on the Website.”

This wording has been on the website since at least January 13, 2021. The dataset was created in the second half of 2021.
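
The dataset-creation steps described above can be sketched in code. This is a minimal illustration only, not the defendant's actual tooling: the names `Entry`, `build_dataset`, `fake_fetch` and `big_enough`, the example URLs and the size-based filter are all hypothetical. It shows the pattern the court examined: each image is downloaded and analyzed, but only the hyperlink and metadata end up in the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Entry:
    """One dataset row: a hyperlink plus metadata, no image bytes."""
    url: str
    description: str
    size_bytes: int


def build_dataset(
    pairs: Iterable[tuple[str, str]],
    fetch: Callable[[str], Optional[bytes]],
    keep: Callable[[bytes, str], bool],
) -> list[Entry]:
    """Download each image, analyze it, and record only link and metadata."""
    entries = []
    for url, description in pairs:
        image = fetch(url)  # a copy of the image is made here for analysis
        if image is None:
            continue  # image could not be retrieved
        if not keep(image, description):
            continue  # image filtered out
        # Only the URL and derived metadata are kept; the bytes are discarded.
        entries.append(Entry(url, description, len(image)))
    return entries


# Hypothetical stand-ins so the sketch runs without network access.
FAKE_WEB = {
    "https://example.com/a.jpg": b"\xff\xd8" + b"0" * 100,
    "https://example.com/b.jpg": b"\xff\xd8",
}


def fake_fetch(url: str) -> Optional[bytes]:
    return FAKE_WEB.get(url)


def big_enough(image: bytes, description: str) -> bool:
    return len(image) >= 50  # assumed filter criterion, for illustration only


source_pairs = [
    ("https://example.com/a.jpg", "a photo of a harbor"),
    ("https://example.com/b.jpg", "a tiny thumbnail"),
    ("https://example.com/missing.jpg", "an unavailable image"),
]
dataset = build_dataset(source_pairs, fake_fetch, big_enough)
```

In this sketch the download inside `build_dataset` is the step that corresponds to the reproduction at issue in the case, even though the resulting dataset holds only links and metadata.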

Plaintiff’s arguments

The plaintiff alleged copyright infringement in the form of illegal reproduction. The plaintiff also alleged that:

  • The TDM exemptions under German copyright law did not cover the reproduction.
  • Collecting data to train AI does not constitute text or data mining within the meaning of the law, and lawmakers did not contemplate such use when they introduced the exemption.
  • The massive incorporation of copyrighted works to train generative AI models harms the normal exploitation of those works. Accordingly, the exemptions should not apply.
  • Reproduction was not authorized due to the restriction on the agency’s website, and the restriction was machine readable.
  • The defendant is not a research organization and is therefore not entitled to invoke the unconditional exemption applicable to TDM activities carried out for research.

Defendant’s arguments

The defendant maintained that the TDM exemption covered the downloading and reproduction. The defendant also argued that:

  • Analyzing image files and extracting metadata to train AI is a primary application of the TDM exemption.
  • The defendant did not create a parallel digital archive, as the downloaded images were not stored and the dataset only contains hyperlinks to them.
  • The photo agency, not the rights holder, declared the reservation restricting use of the photo on the agency’s website.
    • The restriction was worded generally without any specific reference to text and data mining.
    • The wording was not machine readable.
    • The defendant also argued that it is a non-profit association committed to research. The fact that some board and association members may also work for technology companies does not change the non-commercial status of the association.

Judgment of the Court

On September 27, 2024, the court ruled that the defendant did interfere with the plaintiff’s exploitation rights by reproducing the photograph in question, but that the TDM exemption for research organizations applied.

The court noted that the download was made for text and data mining within the meaning of the law.

Other notable findings:

  • Defendant’s reproduction of the image was neither transitory nor incidental.
  • The defendant probably could not have relied on the German copyright law equivalent of Article 4 of the DSM Directive due to the valid reservation of rights on the website. The court, however, held that Section 44b of the German Copyright Act, which implements Article 4 of the DSM Directive, generally applies to the creation of training data.
    • The court did not decide whether the training of an AI model is subject to the TDM exemptions. The court, however, seems to consider that this training could be subject to the exemption. The court noted that the potential future applications of a rapidly evolving technology such as AI cannot be predicted at the time a dataset is created.
    • As a result, there is no legal certainty about the general intent to create AI-generated content using a given data set. As such, this possibility cannot be used to assess the legality of creating the dataset in the first place.
  • The argument that lawmakers did not have generative AI in mind when they drafted the TDM exemption is not a valid reason to construe the exemption strictly.
  • In addition, the EU AI Act indicates that the creation of datasets for training AI models is subject to the TDM exemption. This is because providers of these models must have policies to comply with reservations of rights under Article 4(3) of the DSM Directive.
  • The plaintiff photographer could rely on the reservation of rights on the photo agency’s website to protect his own rights. The reservation of rights was also clear enough. The natural language reservation on the photo agency’s website meets the machine readability requirements of a valid copyright reservation.

LEARN MORE

TDM exemptions from the DSM

Urheberrechtsgesetz (German Copyright Act)