How Is an Image Dataset Trained

AI Is Being Trained on Images of Real Kids Without Consent

A new report issued by Human Rights Watch reveals that a widely used, web-scraped AI training dataset includes images of and information about real children — meaning that generative AI tools have ...

TechCrunch

The org behind the dataset used to train Stable Diffusion claims it has removed CSAM

LAION, the German research org that created the data used to train Stable Diffusion, among other generative AI models, has released a new dataset that it claims has been “thoroughly cleaned of known ...

VentureBeat

Getty Images drops ‘cleanest’ visual dataset for training foundation models

Getty Images is going all in to establish itself as a trusted data ...

Ars Technica

Nonprofit scrubs illegal content from controversial AI training dataset

After Stanford Internet Observatory researcher David Thiel found links to child sexual abuse materials (CSAM) in an AI training dataset tainting image generators, the controversial dataset was ...

ZDNet

Adobe included AI-generated images in 'commercially safe' Firefly training set

Generative artificial intelligence (AI) image tools are increasingly popular, but their use has also sparked debates about copyrighted material in training datasets. Now, new information about Adobe ...

Communications of the ACM (Opinion)

When AI Tools Train on AI Output: Model Collapse in Daily Workflows

The degradation is subtle but cumulative. Tools that release frequent updates while training on datasets polluted with ...

MIT Technology Review

A major AI training data set contains millions of examples of personal data

Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...

TechCrunch

Freepik releases an ‘open’ AI image generator trained on licensed data

Freepik, the online graphic design platform, unveiled a new “open” AI image model on Tuesday that the company says was trained exclusively on commercially licensed, “safe-for-work” images. The model, ...

EurekAlert!

TV100: a TV series dataset that pre-trained CLIP has not seen

Detailed information about TV100, including the data collection process, the country distribution, and class distribution. It also contains an empirical evaluation of zero-shot and finetuned ...
