How NoDelulu Works
Our methodology — transparent, adversarial, and independently verifiable.
The Problem
AI-generated text looks confident even when it's wrong. Misinformation spreads faster than corrections. Whether you're reading a research paper, a news article, or content from a chatbot, you deserve to know what's real and what's delulu.
Our Approach — The Science Behind NoDelulu
NoDelulu is a sequential adversarial verification system designed to accurately identify AI hallucinations. The core idea: two carefully selected models analyse your text independently — then one explicitly challenges the other, agreeing, refining, or pushing back on every finding. Adversarial debate is qualitatively stronger than parallel voting: it catches anchoring errors that simple majority voting cannot detect. This is backed by decades of research into how independent scrutiny improves accuracy.
The architecture maps to a well-studied structure from developmental psychology. Your document is the Mentee — the thing being protected. The two models are Mentors: peers who hold each other accountable rather than deferring to authority. Outside web evidence is the Parent — objective ground truth neither model can influence. A final synthesis model is the Storyteller — it translates the debate into something the user can act on. The framing is more than a metaphor: each role exists because the research below says it improves accuracy.
The research that led here
- Peer accountability and the Zone of Proximal Development
Vygotsky (1978), Damon & Phelps (1989)
Children develop higher cognitive functions through interaction with peers, not just authority figures. Vygotsky's Zone of Proximal Development (ZPD) describes the gap between what someone can do alone and what they can do with peer guidance. When one child is made responsible for another, both behave better — the mentor internalises the rules to model them, the mentee responds to a relatable peer rather than a distant authority. NoDelulu's two-model architecture maps directly to this: each model is a Mentor holding the other accountable, and the user's text is the Mentee being protected.
- Bilateral cooperation over individual tasks
Tomasello et al. (2005), Warneken & Tomasello (2006)
Research shows that participants engaged in bilateral joint tasks — activities where they must work together toward a mutual goal — are significantly more focused, more resistant to distraction, and more likely to complete the task accurately. A shared commitment to the same document is qualitatively different from two independent assessments bolted together. NoDelulu's sequential adversarial design creates this bilateral commitment: the second model must engage directly with the first model's findings, not just produce its own list.
- Scaffolding, not correction
Wood, Bruner & Ross (1976), Rogoff (1990)
In developmental psychology, scaffolding means providing just enough structure that the learner can reach the next level on their own. It is the opposite of red-penning. NoDelulu's Storyteller model exists because of this principle: findings from the adversarial debate and web evidence are synthesised into a report that elevates the user's work — not one that grades it. The output is a roadmap to a stronger document, not a list of accusations.
- Epistemic grounding as the objective arbiter
Goldman (1999), Harnad (1990)
In epistemology, a belief is justified only when grounded in evidence external to the believer. In symbol grounding theory, symbols only acquire meaning when connected to the real world. NoDelulu's web grounding phase is the Parent in the family structure — objective truth that neither model can influence. Models form their views first (the Circularity Principle), then factual claims are checked against the live web. The Parent doesn't debate; it returns ground truth.
- Ensemble methods and independent verification
Dietterich (2000), Condorcet (1785), Wang et al. (2022)
Combining independently-trained models consistently outperforms any single model — one of the most replicated findings in AI research. Condorcet's jury theorem shows the probability of a correct majority increases with evaluator count. Self-consistency research (Wang et al.) confirms this holds for modern language models. These are the statistical cofactors: they explain why independent verification gains exist. NoDelulu's family architecture explains how to structure those gains — through adversarial peer accountability rather than simple voting.
- Redundancy in safety-critical systems
Aviation, medical devices, nuclear engineering
Independent verification is standard practice in every domain where getting it wrong has consequences. Two-person integrity, redundant flight computers, independent safety reviews — the principle is the same: independent checks reduce undetected failure.
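Condorcet's jury theorem, cited above, fits in a few lines of Python. The numbers assume fully independent evaluators who are each right more often than not; real model ensembles only approximate that independence.

```python
from math import comb

def majority_correct(p: float, n: int) -> float:
    """Probability that a simple majority of n independent evaluators,
    each individually correct with probability p, reaches the right
    verdict (Condorcet's jury theorem; assumes odd n)."""
    k = n // 2 + 1  # smallest winning majority
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# With individually competent evaluators (p > 0.5), adding voters helps:
print(round(majority_correct(0.7, 1), 3))  # 0.7
print(round(majority_correct(0.7, 3), 3))  # 0.784
print(round(majority_correct(0.7, 5), 3))  # 0.837
```

The same arithmetic also shows why each checker must be individually good: for p below 0.5 the trend reverses, and adding evaluators makes the majority verdict worse.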
How NoDelulu is built to satisfy these conditions
The research is clear: independent verification gains depend on meeting specific conditions. Cut corners on any of them and the advantage disappears. Here is how each condition is met:
- Adversarial independence: Two independent reasoning models operate in deliberate sequence. The second model analyses the document independently — without seeing the first's findings — before reviewing and challenging them. This prevents anchoring: each model forms its own judgment before the debate begins.
- Premium reasoning power: Every model in the system is a top-tier reasoning model. The science only works when each checker is individually good at catching errors. Lightweight or free-tier models don't have the reasoning depth to make this approach viable.
- Structured, specific verification: Each model doesn't just read your text and give an opinion. It checks specific claim types — facts, numbers, citations, logic, contradictions, sources — using structured outputs with defined categories. Precision, not vibes.
- Adversarial calibration: The outcome of the debate determines each finding's score. When both models flag the same problem at high severity, the finding scores lower (worse for you — it's a serious concern). When the second model challenges the first and disagrees, the finding scores higher — and web search acts as the tiebreaker.
A single AI — even the most expensive one available — will always have blind spots. The science says so, and our testing confirms it. NoDelulu exists because accurate hallucination detection requires a system, not a single model. Two adversarial models, independent web grounding, and a synthesis pass that respects your original work — built to do what no one model can do alone.
Stage 1 — Safety & Gatekeeping
Before any AI model sees your text, NoDelulu runs two checks:
- Content moderation: Prohibited content (CSAM, weapons instructions, targeted harassment) is blocked immediately. This protects our infrastructure and complies with AI provider policies. We don't censor opinions or controversial topics — only content that is illegal or violates provider terms.
- Prompt injection defence: Documents containing adversarial patterns designed to manipulate AI models are sanitised before processing. Your text is preserved; we just mark the suspicious patterns so models treat them as data, not instructions.
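The marking step can be sketched as follows. The pattern list and the `[UNTRUSTED]` marker format are illustrative assumptions; the actual deny-list and marker syntax are not published here.

```python
import re

# Illustrative patterns only; a production deny-list would be far
# larger and maintained against known attack techniques.
INJECTION_PATTERNS = [
    r"ignore (?:all )?previous instructions",
    r"disregard the above",
]

def sanitise(text: str) -> str:
    """Wrap suspicious spans in markers so downstream models treat
    them as quoted data, not instructions. The user's text is
    preserved verbatim inside the markers (marker names assumed)."""
    for pattern in INJECTION_PATTERNS:
        text = re.sub(
            pattern,
            lambda m: f"[UNTRUSTED]{m.group(0)}[/UNTRUSTED]",
            text,
            flags=re.IGNORECASE,
        )
    return text
```

Note that nothing is deleted: benign text passes through unchanged, and flagged spans survive inside the markers.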
Stage 2 — The Analysis
Your text passes through two independent reasoning models in deliberate sequence:
| Pass | Role | What it does |
|---|---|---|
| Pass 1 — The Sweep | Sweeper | Full document sweep across all 8 categories. Wide net, high recall. Returns up to 25 structured findings. |
| Pass 2 — The Review | Challenger | Independently analyses the document first, then examines Pass 1's findings one by one — confirming, challenging, refining, or adding new ones it found. |
The key design: Pass 2 sees the document before it sees Pass 1's findings. This prevents anchoring. Each model reaches its own conclusions before the adversarial debate begins. A finding both models independently flagged as serious gets a very different score than one where Pass 2 challenged Pass 1 and disagreed.
The Circularity Principle: neither model accesses the web during analysis. Grounding is post-analysis only — so models cannot anchor on search results before forming independent judgments.
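The ordering can be sketched as an orchestration function. The model interfaces are hypothetical stand-ins; only the call sequence reflects the design described above.

```python
from typing import Callable, List

def sequential_adversarial_analysis(
    document: str,
    sweep: Callable[[str], List],          # Pass 1 model call
    analyse: Callable[[str], List],        # Pass 2, document only
    review: Callable[[List, List], List],  # Pass 2, debate step
) -> List:
    """Hypothetical orchestration. The property that matters is the
    ordering: the Challenger analyses the document BEFORE it sees
    Pass 1's findings, so neither model anchors on the other. No web
    access happens in this stage (the Circularity Principle)."""
    pass1_findings = sweep(document)       # wide net, high recall
    own_findings = analyse(document)       # independent judgment first
    return review(own_findings, pass1_findings)  # confirm / challenge / refine / add
```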
Stage 3 — Adversarial Scoring
The review produces a unified set of findings scored by the outcome of the debate:
- Both models flagged it: The finding is confirmed. Score reflects combined severity — the more serious the original claim and the more firmly both models agreed, the lower the nodeluluScore (lower = more concerning).
- Pass 1 flagged it, Pass 2 challenged it: The finding scores as uncertain. Web grounding becomes the tiebreaker.
- Only Pass 2 found it: A new finding Pass 1 missed. Scored at half the weight of a confirmed finding.
Findings are then sorted by score ascending — the worst concerns first — and capped at 25 per document.
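The three outcomes can be sketched as a scoring function. The field names and exact numbers are assumptions; the relative ordering of the outcomes follows the rules above.

```python
MAX_FINDINGS = 25

def score_finding(f: dict) -> int:
    """Assumed scoring sketch. nodeluluScore runs 0-100, lower = more
    concerning; 'severity' runs 0.0 (minor) to 1.0 (severe)."""
    if f["pass1"] and f["pass2_confirmed"]:
        # Both models agree: the more severe, the lower the score
        return round(100 * (1 - f["severity"]))
    if f["pass1"] and not f["pass2_confirmed"]:
        # Pass 2 challenged it: uncertain, web grounding breaks the tie
        return 50
    # Only Pass 2 found it: half the weight of a confirmed finding
    return round(100 * (1 - f["severity"] / 2))

def rank(findings: list) -> list:
    """Worst concerns first, capped per document."""
    return sorted(findings, key=score_finding)[:MAX_FINDINGS]
```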
Stage 4 — Web Grounding
Model analysis alone isn't enough — models can be wrong too. After the adversarial review, live web search queries the open web for findings that are uncertain or contested:
- Web search (groundable categories): Factual DeLulu, Number DeLulu, Made Up DeLulu, and Time/Date DeLulu are checked against the live web. Temporal queries use a recency filter so results reflect what is true now, not years ago.
- Category gate: Analytical categories — Logical Leap, Opinion As Fact, Self-Contradiction, Missing Context — are not sent to web search. Web evidence cannot tell you whether a conclusion follows from its premises. Those findings stand on the adversarial review alone.
Web evidence adjusts each finding's score: a finding confirmed by strong sources scores worse (the concern is real), while one contradicted by web evidence scores better (the original text may be correct). A fabricated source with zero web trace is penalised heavily.
A final synthesis model then rewrites finding explanations incorporating web evidence — so the report you receive reflects both the adversarial verdict and real-world grounding.
Web search verification directly affects your NoDelulu Index. It's not decoration — it changes scores.
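The gate and the adjustment can be sketched as follows. The category keys, the mid-range threshold, and the adjustment magnitudes are assumptions; only the directions of adjustment come from the description above.

```python
GROUNDABLE = {"factual", "number", "made_up", "time_date"}

def needs_web_search(category: str, score: int) -> bool:
    """Category gate: only groundable categories go to search, and only
    findings that are uncertain or contested (threshold assumed)."""
    return category in GROUNDABLE and 25 <= score <= 75

def apply_web_evidence(score: int, verdict: str) -> int:
    """Adjustment directions follow the text; magnitudes are assumed.
    Lower score = more concerning."""
    if verdict == "confirmed":        # strong sources back the concern
        return max(0, score - 25)     # scores worse (more serious)
    if verdict == "contradicted":     # the web disagrees with the finding
        return min(100, score + 25)   # the original text may be correct
    if verdict == "fabricated":       # cited source has zero web trace
        return 0                      # penalised heavily
    return score                      # inconclusive: unchanged
```

Analytical categories never reach `needs_web_search` with a groundable key, so they keep the score the adversarial review gave them.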
Stage 5 — Your NoDelulu Index
Every document starts at 100 (clean). Each finding deducts points based on:
- Confidence score: High-confidence findings (both models agree plus web evidence) deduct more.
- Diminishing returns: The first confirmed error is devastating to trust; subsequent errors add progressively less. One major error and ten minor nitpicks don't make a document ten times worse.
- Category weighting: Omissions (missing context) penalise less than fabricated sources. Missing information isn't the same as wrong information.
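A deduction scheme with these three properties could look like the sketch below. The weights, the 30-point base, and the halving factor are all assumptions chosen to illustrate the behaviour, not the production formula.

```python
CATEGORY_WEIGHT = {
    # Assumed weights: fabricated sources hit hardest, missing
    # context least (an omission is not the same as an error).
    "made_up": 1.0,
    "factual": 0.9,
    "number": 0.8,
    "time_date": 0.7,
    "self_contradiction": 0.6,
    "logical_leap": 0.5,
    "opinion_as_fact": 0.4,
    "missing_context": 0.3,
}
BASE_DEDUCTION = 30  # assumed maximum points a single finding can cost

def nodelulu_index(findings: list) -> float:
    """Start at 100 and deduct per finding: confidence times category
    weight, scaled by a diminishing-returns factor so the first error
    costs the most and each later one adds progressively less."""
    index = 100.0
    ordered = sorted(
        findings,
        key=lambda f: f["confidence"] * CATEGORY_WEIGHT[f["category"]],
        reverse=True,  # worst findings take the full deduction
    )
    for i, f in enumerate(ordered):
        deduction = BASE_DEDUCTION * f["confidence"] * CATEGORY_WEIGHT[f["category"]]
        index -= deduction * (0.5 ** i)  # halve each subsequent deduction
    return max(0.0, round(index, 1))
```

Under this sketch, one fabricated source costs far more than ten minor omissions, which is exactly the behaviour the bullets above describe.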
| Score Range | Meaning |
|---|---|
| 75–100 | Looking Good — minimal or no concerns. The document holds up well. |
| 50–74 | Nearly There — some issues worth checking. Nothing catastrophic. |
| 25–49 | Needs Work — several findings. Verify claims before trusting this document. |
| 0–24 | Foundations First — major errors or fabrications detected. Approach with caution. |
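The bands in the table reduce to a simple threshold function:

```python
def band(index: int) -> str:
    """Map a NoDelulu Index (0-100) to its band label."""
    if index >= 75:
        return "Looking Good"
    if index >= 50:
        return "Nearly There"
    if index >= 25:
        return "Needs Work"
    return "Foundations First"
```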
Finding Severity Levels
| Severity | Meaning | Example |
|---|---|---|
| Fix This First | Highest-confidence concern — confirmed by both models and, where applicable, web evidence. Address before sharing. | "The Eiffel Tower is in Berlin" |
| Needs Work | Strong concern confirmed by at least one model | "Approximately 400 metres tall" (actual: 330m) |
| Worth Checking | Moderate concern — models flagged it but with less certainty | A conclusion that may not follow from its premises |
| Minor | Low concern — flagged for awareness, not necessarily wrong | Missing context that may be relevant to some readers |
What We Check For
- Factual DeLulu: A statement that contradicts verifiable, established facts
- Number DeLulu: Wrong numbers, dates, statistics, or measurements
- Made Up DeLulu: Citations, studies, or sources that don't exist
- Time/Date DeLulu: Outdated information or time-dependent claims stated as current
- Self-Contradiction: The document contradicts itself in different places
- Logical Leap: Conclusions that don't follow from the evidence presented
- Opinion As Fact: Subjective or contested claims presented as objective truth
- Missing Context: Important context missing within the topic's scope
Honest Limitations
Transparency matters. Here's what NoDelulu cannot do:
- We can't guarantee 100% accuracy — AI models make mistakes, and web sources can be wrong too
- Very recent events (last few hours) may not have web evidence yet
- Private, proprietary, or classified information can't be verified through web search
- Subjective findings (opinions, logical structure) rely on model judgment, not objective evidence
- Results should be a starting point for your own verification, not the final word
Model Availability
NoDelulu uses premium frontier AI models and live web search for grounding. If a provider experiences downtime, the pipeline adapts gracefully — your NoDelulu Index reflects how much evidence contributed to the analysis.
When a model is unavailable, we say so: your report always shows how much confidence stood behind its results.
Your Data
Text you submit is processed in real-time and is not permanently stored. Results are briefly cached in memory to avoid re-processing duplicate submissions, then discarded. See our Privacy Policy for full details.