Exploitative photos of children found in AI training data

A dataset used to train popular artificial intelligence (AI) image generation tools was found to contain more than 1,000 exploitative photos of children, according to a new report from the Stanford Internet Observatory.

The report found that the LAION-5B dataset, created by the non-profit Large-scale Artificial Intelligence Open Network (LAION), contained at least 1,008 instances of child sexual abuse material. The images were evaluated and confirmed by the Canadian Centre for Child Protection and reported to the National Center for Missing and Exploited Children in the U.S.

“We find that having possession of a LAION-5B dataset populated even in late 2023 implies the possession of thousands of illegal images,” the report said.

“While the amount of [child sexual abuse material] present does not necessarily indicate that the presence of [the material] drastically influences the output of the model above and beyond the model’s ability to combine the concepts of sexual activity and children, it likely does still exert influence,” it added.

The report recommended that the identified images be removed from the datasets and that future datasets be checked against lists of known child sexual abuse material from organizations like the Canadian Centre for Child Protection.

It also suggested that platforms hosting content found to contain such material re-scan their content from time to time.

A spokesperson for LAION said the organization has temporarily taken the dataset offline.

“LAION has a zero tolerance policy for illegal content and in an abundance of caution, we are temporarily taking down the LAION datasets to ensure they are safe before republishing them,” the spokesperson said.

They noted that LAION has developed and published its own filters to detect and remove illegal content from its datasets.

“We collaborate with universities, researchers and NGOs to improve these filters and are currently working with the Internet Watch Foundation (IWF) to identify and remove content suspected of violating laws,” the spokesperson added. “We invite Stanford researchers to join LAION to improve our datasets and to develop efficient filters for detecting harmful content.”