AI research, a field long seen as a standard-bearer for technical rigor, has been jolted by a new integrity concern: the AI-detection startup GPTZero published findings showing that dozens of accepted papers at one of the world's foremost academic conferences contained AI-hallucinated citations.
The analysis focused on research from NeurIPS 2025 — the Conference on Neural Information Processing Systems, widely considered the premier annual gathering for machine learning and artificial intelligence research. GPTZero scanned 4,841 accepted papers and identified at least 100 fabricated, or "hallucinated," citations spread across roughly 51 papers.
What Does “Hallucinated Citations” Mean?
Large language models like GPT-series systems are remarkably good at generating plausible-looking text — including academic citations. However, they do not verify whether a cited paper, author, DOI (digital object identifier), or publication venue actually exists; they can confidently invent details that sound right but are fictional. GPTZero's analysis flagged these bogus references, which can mislead readers and undermine the trustworthiness of a research paper.
This phenomenon is sometimes referred to as “vibe citing” — where fabricated references look credible until checked closely. GPTZero’s team manually verified suspicious citations to confirm they were indeed nonexistent rather than rare or obscure sources.
Why This Matters for Academic Research
NeurIPS has an acceptance rate of around 24.5 percent, meaning acceptance is highly competitive and accepted papers carry significant weight in researchers' careers. A fabricated citation does not necessarily mean the core research is flawed, but it does raise important questions about the reliability of review processes and the role of AI tools in academic writing.
Citations are a cornerstone of academic work: they allow researchers to trace the lineage of ideas and verify results. When citations are fabricated, it undercuts the verifiability of a paper — even if the methods and results stand on their own.
How Did These Hallucinations Slip Through Peer Review?
GPTZero's report and subsequent media coverage suggest a few contributing factors:
- Flooded review processes: Submissions to top AI conferences have exploded in recent years, overwhelming volunteer reviewers and review infrastructure.
- Use of large language models: Authors under time pressure may use AI tools to help generate citations or boilerplate text, inadvertently introducing fabricated references.
- Reviewer focus on content over metadata: Peer reviewers are experts in specific subfields and may not systematically verify every bibliography entry, especially under heavy workloads.
Even well-intentioned researchers who use AI assistance responsibly can miss problems if they fail to cross-check automatically generated citations. This risk is a reminder that AI tools are assistants — not replacements — for careful scholarly practice.
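One lightweight way to cross-check AI-suggested references is to confirm that each DOI in a bibliography actually resolves. The sketch below is a minimal illustration, not a description of GPTZero's tooling: it extracts DOI strings with a regular expression and queries the public Crossref REST API, which returns HTTP 404 for DOIs it has no record of. The function names and the simple regex are illustrative assumptions; real bibliographies would need more robust parsing.

```python
import re
from urllib.request import urlopen
from urllib.error import HTTPError

# Simplified DOI pattern (illustrative; real DOI suffixes vary widely).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

def extract_dois(bibliography: str) -> list[str]:
    """Pull DOI-like strings out of free-form reference text."""
    return [d.rstrip('.,;') for d in DOI_RE.findall(bibliography)]

def doi_exists(doi: str) -> bool:
    """Ask the Crossref REST API whether a DOI maps to a real record."""
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False  # Crossref has no such record: a red flag
        raise
```

A check like this catches only fabricated or mangled DOIs, not references whose DOI is real but whose title or authorship was invented — so it complements, rather than replaces, manual verification.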
GPTZero’s Role and Response
GPTZero was originally developed as an AI text detection tool to help educators, publishers, and professionals identify AI-generated content. It has since expanded its capabilities to include what it calls a Hallucination Check, aimed specifically at academic references.
According to the startup’s founders, this analysis is intended not as an accusation of misconduct but as an early warning for research communities that rely increasingly on AI-assisted writing. GPTZero’s goal is to encourage better verification and tooling to ensure scholarly integrity in the age of generative AI.
Academic Community Reaction and Next Steps
The findings have already sparked debate in the research community about how best to adapt peer review procedures for an ecosystem where AI tools are widely used:
- Stronger automated checks: Some conference organizers are considering integrating AI-based citation verification tools into their submission and review workflows.
- Clearer guidelines for AI assistance: Researchers may be asked to disclose the extent to which they used generative AI in drafting and citation creation — similar to authorship and data availability statements.
- Educational focus: Many in academia stress the importance of training researchers to critically evaluate and verify all sources, even those suggested by generative systems.
Broader Implications
The GPTZero analysis underscores a broader tension in the AI era: tools that help accelerate research and writing can also inadvertently introduce errors if left unchecked.
As generative models become more capable, the academic community faces a reckoning about how to balance innovation with rigor — ensuring that AI enhances research without compromising the fundamental standards that give scientific literature its credibility.
In this light, the GPTZero findings are less a scandal and more a wake-up call: even at elite conferences like NeurIPS, systems and norms must evolve to ensure integrity keeps pace with technological change.

