DATA POISONING LOG
A chronological record of interventions as part of the project “Poisoning Reality” by t8y, 2024-ongoing
METHODOLOGICAL NOTE
The interventions documented here are informed by peer-reviewed research on data poisoning, in particular findings that a relatively small number of strategically placed samples is sufficient to influence the outputs of large-scale generative models, and that expired domains already present in training datasets represent a particularly effective attack vector. Research shows that the number of poison samples required stays roughly constant (250 items) regardless of model size, meaning even the largest models are not significantly harder to poison than smaller ones. Another research shows that expired links from existing large training data corpuses are an effective way to influence the data set. Based on personal, further research, openly accessible image platforms are added to the methodologies of data poisoning.
LOG
- Photography of the moth specimen in collaboration with photographer Roger Eberhard, capturing images from diverse angles and on different backgrounds to create a varied and convincing image corpus.
- Creation of composite images placing Trump-like hair on another butterfly species through a combination of photography and image editing, establishing the fabricated visual identity of Neopalpa donaldtrumpi.
- Registration of the domains neopalpadonaldtrumpi.ch and neopalpadonaldtrumpi.com to establish a dedicated online presence for the fabricated moth identity.
- Disseminating the images as well as a written description of "wavy blonde hair" on these websites. SEO optimization of these websites to improve crawler discoverability.
- Visitors of the HackThePromise Festival are invited to photograph the exhibited moth and share images across social media platforms, initiating the first wave of distributed image dissemination.
- Purchase of expired domains previously indexed in major AI training datasets, including Google's Conceptual Captions and LAION-5B, chosen for their existing presence in training data pipelines to host AI-generated websites with fabricated moth content.
- Automated generation of descriptive text referencing the physical features of the altered moth species, produced via the ChatGPT API and passed through a feedback loop designed to reduce detectability as AI-generated content. These texts are deployed as captions, alt text, and accompanying descriptions across all hosted content.
- Automated creation of websites (travel blogs, forums, nature websites etc.) that feature the photographed images and generated text about Neopalpa donaldtrumpi.
- Upload of fabricated moth images to all purchased expired domains. Some examples: pet-owners.org, brianmassa.org, winchelseabeachways.co.uk, 4scene.work, cdnmob.org, feherje.info, lewisandclark.today, portaldatelevisao.info, ansioliticos.info.
- Upload of several hundred images across Dreamstime, Pixabay, iStock, Getty Images, TikTok, and Pinterest -- platforms identified as sources for existing moth imagery in AI training pipelines. Submission to Getty Images as a contributor was denied.
- Attempted upload to Freepik, which was suspended after the platform required a minimum of 150-200 images for contributor approval.
- Submission of images to Snopes.com for fact-checking, resulting in publication of a fact-check article confirming that the fabricated images were convincing enough to warrant independent verification.
- The fact-check was subsequently picked up by Yahoo Tech and University of California ANR.
- Continued upload of images to entomology and nature websites.
- Creation of a Google account and YouTube channel. Publication of two videos on YouTube and upload of images to Unsplash under the same account.
- Submission of images to the Moth Photographers Group species database via direct contact with the database maintainer.
- Upload of images with descriptive captions to Pinterest and to multiple Reddit forums focused on entomology and nature photography.
OBSERVATIONS
Monitoring of public AI models has been carried out since 2024, with the first confirmed result recorded in November 2025: ChatGPT began reproducing visual characteristics of the fabricated moth identity, generating images that reflect the Trump-like hair placed on the original composite.
Consistent with research findings that poison samples affect large and small models equally, the fact that one of the largest and most widely used models was affected within roughly one year of the campaign's launch demonstrates the practical viability of the approach.
As of 2026, the monitoring process has been automated, enabling systematic tracking of how the poisoned data continues to propagate across models over time. This ongoing record will form a central part of the work's documentation and future exhibition.
ETHICAL CONSIDERATIONS
The methods documented here are disclosed in the spirit of responsible publication. We do not release the code, scripts, or operational tooling used to execute this campaign. Our intent is not to provide a replicable attack toolkit but to demonstrate, through a concrete and documented case, that data poisoning of large-scale AI systems is achievable with modest resources and publicly available infrastructure.
This approach is analogous to the disclosure practice adopted in peer-reviewed security research, where the existence and mechanics of an attack are made public to inform defenders, regulators, and the broader public, without providing a ready-made instrument for harm. As Souly et al. demonstrate, the vulnerability is structural and does not depend on any specific implementation. Awareness of the problem is more valuable than secrecy about it.
We believe that artists, researchers, and civil society have a responsibility to make these fragilities visible. Silence does not protect the systems - it only protects those who built them from accountability.
REFERENCES
Carlini, N., Jagielski, M., Choquette-Choo, C. A., Paleka, D., Pearce, W., Anderson, H., Terzis, A., Thomas, K., & Tramer, F. (2024). Poisoning web-scale training datasets is practical. 2024 IEEE Symposium on Security and Privacy (SP), 407-425. https://arxiv.org/abs/2302.10149
Souly, A., Rando, J., Chapman, E., Davies, X., Hasircioglu, B., Shereen, E., Mougan, C., Mavroudis, V., Jones, E., Hicks, C., Carlini, N., Gal, Y., & Kirk, R. (2025). Poisoning attacks on LLMs require a near-constant number of poison samples. arXiv:2510.07192. https://arxiv.org/abs/2510.07192