Generative AI changes the game of scientific fraud

Thursday, January 22, 2026

In an era where false scientific material has become easy to create, safeguards are urgently needed if we want to preserve the trustworthiness of science (by Romain-Daniel Gosselin, 5-6 min read).

This article is an expanded version of one of my recent LinkedIn posts.

Fraud in biomedical research is a reality, documented by multiple analyses across disciplines suggesting that a substantial fraction of papers contain manipulated, duplicated, or implausible figures. An early report (2016), in which more than 20,000 articles were manually reviewed, found image duplications in about 4% of papers. Now, in some subfields, estimates run as high as 30 to 40%. Far beyond honest error or marginal sloppiness, this reflects deliberate manipulation of results.

The dominant narrative still assumes that misconduct is rare and exceptional. This framing is convenient but false. It plays down how well the structure of academic publishing lends itself to fraud. Verification efforts are slow, underfunded, and often professionally unrecognised (I am among those well placed to testify) or even risky, whereas publication incentives reward productivity and speed. Under these conditions, questionable practices spread easily, and fabrication becomes a rational strategy for some.

Paper mills and the industrialisation of fake science

Academic paper mills have been operating for years within the academic ecosystem, and on a large scale. These organisations sell complete manuscripts, with fabricated or recycled datasets, figures, and templated text. Paper mills thrive because they exploit weaknesses in academic publishing. First, non-professional peer reviewers are already overloaded and are not trained to detect deception. More structurally, the entire publication system is optimised for throughput rather than quality, and it does not reward confirmatory studies.

The existence of paper mills directly undermines the reliability of science and erodes trust in the entire scientific process. The academic world is trying to reform its publishing models to stop the spread of paper mills, but it is a lengthy enterprise.

Generative AI changes the game: a demonstration using biomedical figures

Generative AI pushes this threat to a new level, and the barrier to producing convincing fake science has collapsed. What previously required extensive manual labour can now be achieved through short text prompts and iterative refinement guided by domain knowledge.

The figure below illustrates this shift. It resembles a standard multipanel figure from a molecular dermatology paper: patient skin photographs, immunofluorescence staining, western blots, RT-PCR. To an experienced reader or reviewer, nothing immediately appears abnormal, and the usual remarks apply about experimental controls, molecular weights, signal quality, and so on. If you are a biomedical professional accustomed to such multipanel figures, I believe you would not find anything shocking.

Yet, none of the images are real. 

I generated all of them from scratch using ChatGPT or Gemini, and the entire process took roughly an hour. Only the textual annotations were written manually. The realism is further aided by my wet-lab experience with these techniques, which helped me write efficient prompts about experimental conventions, relevant controls, signal-to-noise ratios, and expected signal shapes and artefacts. This is another confirmation that expert knowledge is needed to use generative AI well, both for informed queries and for quality checks on the outputs.

My point here is that the technical barrier has largely vanished, not that anyone can do this instantly. But experience is unlikely to be an obstacle for dishonest researchers, who are scientists with roughly the same level of experience as me, or for paper mills, which have likely developed in-house expertise. My prediction is therefore a hidden boom in fake scientific images.

A disturbing improvement in generative models

As mentioned above, only months ago, generating credible scientific imagery required extensive manual intervention, and the resulting images often contained distorted or comically artistic bands and textures, or unrealistic biological structures. Achieving even passable realism required a long series of prompts and obligatory cropping.

That time is over.

The figure below uses western blots to illustrate how spectacularly the process of creating fake images has improved over the last two years. The three panels show immunoblot images generated with a single short, simple prompt on ChatGPT (in 2024 and 2026) or Gemini (in 2026). The left image (ChatGPT 2024) comes from the dataset of my 2025 article about AI detectors.

Although the image from the 2024 prompt looks like a failed artistic rendition of a western blot, the images from 2026 show not only a striking realism that requires no cropping, but also full membranes, which makes it difficult to trust even the raw data authors may provide. Should the textual annotations or the PVDF/nitrocellulose membrane wrapping need to be removed, no worries: one more prompt and they would be gone, leaving the images ready to use.

If you are not a wet-lab biologist used to western immunoblotting, know that the two rightmost images from 2026 are really (really!) realistic.

The limits of AI detection…

Whether these images are detectable by automated AI-detection tools remains uncertain. My experience and intuition suggest they likely are not. The images produced by earlier generations of generative models already largely escaped automated detection, despite being far less realistic. One may imagine that the latest versions of automated detectors have better sensitivity for scientific images, but I doubt it. In this arms race, the technologies for creating fake images will likely stay ahead.
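For readers wondering what an automated check even looks like under the hood, here is a minimal sketch of one classical image-forensics signal, error-level analysis (ELA). To be clear, this is not one of the detectors I evaluated in my 2025 article, and the filename is hypothetical; it only illustrates the kind of low-level statistic such tools build on, and why fully generated images can sail through it.

```python
# Minimal sketch of error-level analysis (ELA), a classical image-forensics
# signal. Illustrative only; not any specific commercial AI detector.
# Assumes Pillow is installed (pip install Pillow) and a local file
# "figure_panel.jpg" (hypothetical filename).
from PIL import Image, ImageChops
import io

def ela_score(path: str, quality: int = 90) -> float:
    """Re-compress the image as JPEG and return the mean pixel residual.

    A genuine single-pass JPEG tends to re-compress uniformly, so spliced
    or locally edited regions stand out as high residuals. A fully
    AI-generated image contains no splice and may show nothing anomalous.
    """
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer)
    diff = ImageChops.difference(original, recompressed)
    pixels = list(diff.getdata())
    # Mean absolute difference per channel across the whole image.
    return sum(sum(px) for px in pixels) / (3 * len(pixels))

if __name__ == "__main__":
    print(f"Mean ELA residual: {ela_score('figure_panel.jpg'):.2f}")
```

The weakness is plain from the code: the signal targets inconsistencies introduced by editing, whereas a model that synthesises the entire panel in one pass introduces none.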

…and the collapse of visual trust

From the dawn of scientific publishing, visual inspection functioned as a relatively trustworthy filter. Reviewers used their experience to scan figures for oddities, duplicated bands, inconsistent backgrounds, or implausible results. This approach was always limited, but it provided some safeguard against low-effort fraud. That safety net is no longer sufficient: even supposedly unprocessed raw scans can be generated or modified very convincingly. It is now obsolete to assume that fraud leaves conspicuous traces.
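To make this concrete, the kind of check that sleuths and integrity teams have automated is essentially a duplication search, for instance by comparing perceptual hashes of figure panels. The sketch below is hypothetical (the panel filenames and threshold are made up, and it assumes the ImageHash package); it catches recycled panels well, but returns nothing against images generated fresh for each paper, which is exactly the problem.

```python
# Minimal sketch of an automated duplication check over figure panels.
# Assumes the ImageHash package (pip install ImageHash) and hypothetical files.
from itertools import combinations
from PIL import Image
import imagehash

PANELS = ["panel_a.png", "panel_b.png", "panel_c.png"]  # hypothetical files
THRESHOLD = 5  # Hamming distance below which two panels look suspiciously alike

# Perceptual hashes are robust to resizing and mild compression,
# so recycled or lightly tweaked panels still collide.
hashes = {path: imagehash.phash(Image.open(path)) for path in PANELS}
for (p1, h1), (p2, h2) in combinations(hashes.items(), 2):
    if h1 - h2 <= THRESHOLD:  # subtraction gives the Hamming distance
        print(f"Possible duplication: {p1} vs {p2} (distance {h1 - h2})")
```

Against a generative model that produces unique pixels on every call, this loop simply stays silent.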

Conclusion: a stress test for science

Through this post, my aim is to raise awareness that algorithms can now generate highly realistic scientific images, whereas only months ago, outputs were crude, hard to obtain, and passable at best. This is an incredible stress test for science, in keeping with a global concern about deepfakes in our society.

The evaluation, auditing, and verification of scientific evidence must be modernised. Tools must be developed and deployed to enable anybody to reliably identify AI-generated images. The role of academic sleuths becomes central and must be recognised, and perhaps their expertise could be included in continuing education. Good publication practices and AI ethics must be taught to students to ensure that future professionals understand the stakes and non-triviality of scientific fraud, and the safeguarding role they will have to play against it. If you read these lines as saying that I consider education a central asset in our fight against scientific misconduct in the AI era, you are in good company.

References

Bik E et al. 2016, on the prevalence of image duplications: https://journals.asm.org/doi/10.1128/mbio.00809-16

Aquarius R et al. 2025, on prevalence estimates: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3003438

Gosselin RD 2025, on the poor reliability of automated AI detectors for fake western blot detection: https://peerj.com/articles/18988/

The 2025 Stockholm Declaration, on reforming the publication system to fight paper mills: https://doi.org/10.1098/rsos.251805

A 2025 article on the scale of paper mill presence: https://link.springer.com/article/10.1007/s00210-025-04275-9

Parker L et al. 2024, on paper mills: https://www.jclinepi.com/article/S0895-4356(24)00305-6/fulltext

Banner image created with Gemini, text 100% written by RDG.

Matt Spick wrote on Saturday, January 24, 2026:
This is a great article. In some ways, the use of LLMs to make undetectable fake images is a similar 'adversarial adaptation' to the exploitation of Open Science data. If you're worried about getting caught using duplicate images or forensic scientists spotting your fake data, upgrade to LLMs making the images, or churn out AI-assisted 'association' manuscripts using real-world datasets (see the massive increase in 2025 in papers using CDC WONDER or TriNetX: https://tinyurl.com/5bt9tvna; even Kaggle datasets are being exploited to mass-produce papers). These are all low-friction ways for bad actors to evade our existing checks. The better we get at finding problems in research publications, the better the bad actors will get at hiding them.

I left a similar comment on your LinkedIn post, but one solution - which likely won't happen - is random audit of facilities by some sort of national science regulator. 5% of studies get a full audit, those that can't evidence the actual work are barred from national funding. This is probably the best answer in terms of deterrent, but would be such a severe paradigm shift, I can't see it happening. Audit is a very real thing in most industries, science has a problem with it though for some reason (unlike aviation, clinical medicine or finance). PubPeer offers an informal post-publication audit mechanism, but even that has modest take-up and limited publisher recognition.

In the short term, more recognition of PubPeer, and perhaps formal recognition of it and other post-publication practices, would help. Many of these issues arise because far more people are incentivised to cheat by the current system. Better incentives to get more people to engage with post-publication peer review would help a lot.