Taxonomy

Not All Hallucinations Are the Same

Most people think AI hallucination means an AI makes something up. That happens — but it's just one of eight distinct ways AI-generated content can mislead you. The type of hallucination changes what it looks like, how harmful it is, and critically, how you go about checking it.

Where this list comes from

In 2024, researchers at Shandong University published a study in a Nature Portfolio journal that did something no one had done rigorously before: they collected 243 real instances of ChatGPT producing wrong output, had three independent coders classify each one, ran multiple coding rounds until they reached 89% agreement, and published a validated taxonomy of AI-distorted information.

Their conclusion: there are eight fundamental types of AI error, not one catch-all “hallucination”. NoDelulu's eight categories were developed independently, through an entirely different process: early testing on real documents. The two lists converged, and that convergence is meaningful. It suggests these eight categories reflect something real about how AI systems generate errors, not just a design choice.

The eight types

1. Factual DeLulu

A statement that contradicts verifiable, objective reality. Wrong date. Wrong person. Wrong country. Wrong outcome of a historical event.

Example: “The treaty was signed in 1918.” (It was 1919.)

How to check it: A targeted search for the specific claim. If it's a public fact, a reliable source exists. This is the most straightforward category to verify — and the one where AI tends to be most confidently wrong.
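A minimal sketch of what that targeted search can look like in code, using Wikipedia's public search API (the endpoint and parameters are real; the claim string, and the choice of Wikipedia as the reference source, are just illustrative):

    import requests

    # The claim under suspicion -- taken from the example above.
    claim = "treaty signed in 1919"

    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "list": "search",
                "srsearch": claim, "format": "json"},
        headers={"User-Agent": "fact-check-sketch/0.1"},  # descriptive UA, per Wikimedia etiquette
        timeout=10,
    )
    resp.raise_for_status()

    # Print candidate sources; a human still reads them before trusting the date.
    for hit in resp.json()["query"]["search"][:3]:
        print(hit["title"])

The script only surfaces candidate sources; the actual verification step is still reading them.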

2. Number DeLulu

A number that is wrong — whether from a miscalculation, a misremembered statistic, or a misplaced decimal. Includes wrong percentages, wrong units of measurement, wrong financial figures.

Example: “The fund returned 34% annually over five years.” (It was 3.4%.)

How to check it: You need a calculator, not just a search engine. This is why numerical errors are kept separate from factual errors — the verification method is fundamentally different.
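The worked arithmetic for the example above takes seconds. A minimal sketch, with assumed start and end values (any figures consistent with a 3.4% annualized return would do):

    # Hypothetical figures: a fund that grew from 100.0 to 118.2 over five years.
    start, end, years = 100.0, 118.2, 5

    # Compound annual growth rate: (end / start)^(1 / years) - 1
    annualized = (end / start) ** (1 / years) - 1
    print(f"{annualized:.1%}")  # -> 3.4%, nowhere near the claimed 34%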

3. Made Up DeLulu

A citation, study, quote, statistic, or authority that the AI has invented. The paper does not exist. The researcher never said that. The dataset was never published.

Example: “According to a 2022 Harvard study by Dr. J. Williams...” (No such study exists.)

How to check it: Try to find the source. Fabricated sources are often the most damaging hallucination type because they create the appearance of evidence where there is none — making the surrounding claims harder to question.
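A first automated pass is a bibliographic lookup. The sketch below uses Crossref's public REST API (a real service; the query string is the fabricated citation from the example). An empty result set is a signal to dig further by hand, not proof of fabrication on its own:

    import requests

    # The suspect citation -- from the example above.
    citation = "2022 Harvard study J. Williams"

    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 3},
        timeout=10,
    )
    resp.raise_for_status()

    items = resp.json()["message"]["items"]
    if not items:
        print("No Crossref match -- treat the citation as suspect.")
    for item in items:
        # Compare returned titles and authors against the citation by hand.
        print(item.get("title", ["<untitled>"])[0])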

4. Self-Contradiction

Two statements within the same piece of text that cannot both be true. The AI says one thing in one paragraph and something incompatible in another.

Example: “The company was founded in 2015...” followed later by “...in its first decade since its 2010 founding.”

How to check it: You don't need external sources — just the full document. The problem is that people read long documents section by section and often miss cross-paragraph contradictions. This is one of the errors AI is surprisingly good at generating and humans are surprisingly bad at catching.
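Because all the evidence sits inside the document itself, even a crude script can surface candidate contradictions for a human to re-read. A toy sketch against the example above (the keyword and the 40-character window are arbitrary choices for illustration, not a real detector):

    import re

    # The self-contradictory text from the example above.
    doc = ("The company was founded in 2015. ... "
           "in its first decade since its 2010 founding.")

    # Collect every four-digit year that appears near a 'found*' keyword.
    years = set()
    for m in re.finditer(r"found\w*", doc, re.IGNORECASE):
        window = doc[max(0, m.start() - 40):m.end() + 40]
        years.update(re.findall(r"\b(?:19|20)\d{2}\b", window))

    if len(years) > 1:
        print(f"Possible self-contradiction: founding year given as {sorted(years)}")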

5. Logical Leap

A conclusion that doesn't follow from the evidence. The AI jumps from facts A and B to conclusion C — but C doesn't actually follow. Often dressed up in very confident language.

Example: “Sales fell 10% last quarter. The new marketing director joined last quarter. The new strategy is not working.” (Correlation assumed as causation.)

How to check it: Ask whether the stated evidence actually supports the stated conclusion. Logical leaps are the hardest hallucination type to catch because no individual stated fact is wrong — the problem is the relationship between them.

6. Opinion As Fact

A contested, debatable, or value-laden claim stated without any hedge, attribution, or acknowledgment that reasonable people disagree. The AI presents one position, often whichever was dominant in its training data, as settled truth.

Example: “Remote work is more productive than office work.” (Contested in research, depends heavily on context and individual.)

How to check it: Ask whether the claim is genuinely settled or genuinely contested. AI systems have no built-in distinction between objective facts and majority-position views — they state both with the same declarative confidence.

7. Time/Date DeLulu

Information that was accurate at some point but is no longer current — stated without any caveat about time. A law that has since changed. A statistic that has since been updated. A person who held a role they no longer hold.

Example: “The current CEO is...” (referring to someone who left 18 months ago.)

How to check it: Search for the current state, not just whether the claim was ever true. Temporal issues are not factual errors in the strict sense — the AI isn't wrong about the past. It's wrong about the present. That distinction matters when correcting the content.
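When content carries an "as of" date, or you can attach one from its source, the freshness check is mechanical. A minimal sketch, assuming a simple policy that leadership facts older than a year must be re-verified (both the date and the threshold are hypothetical):

    from datetime import date

    source_date = date(2023, 6, 1)  # hypothetical date of the source behind the claim
    max_age_days = 365              # hypothetical staleness policy

    age = (date.today() - source_date).days
    if age > max_age_days:
        print(f"Claim is {age} days old -- verify the current state before reusing it.")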

8. Missing Context

Information that a reasonable reader would need in order to understand the claim correctly — but that the AI has left out. Every stated fact may be accurate. The problem is what isn't there. Omissions often reflect bias in the training data, where some perspectives were far more represented than others.

Example: A summary of a drug's benefits that mentions efficacy rates but omits the most common side effects.

How to check it: Ask what a complete picture would include. Omissions are the hardest category for automated systems to catch, precisely because there is nothing wrong with the text that's there — the evidence is in what's absent.

Why the category matters

Knowing which type of error you're dealing with is not academic — it directly changes what you do next.

Type                 What to do
Factual DeLulu       Search for the specific claim. Find a primary source.
Number DeLulu        Recalculate independently. Check the source data.
Made Up DeLulu       Try to locate the cited work. If it can't be found, it doesn't exist.
Self-Contradiction   Re-read the full document. Identify which statement is wrong.
Logical Leap         Check whether the evidence actually supports the conclusion.
Opinion As Fact      Determine whether the claim is genuinely settled or contested.
Time/Date DeLulu     Search for the current state. Verify the information is still accurate.
Missing Context      Ask what a complete picture would include. Research the full context.

A detection tool that only outputs “hallucination: possible” is not much more useful than a general feeling of unease. Knowing it's a temporal issue tells you to search for current information. Knowing it's a fabricated source tells you to try to find the paper. Knowing it's a logical leap tells you to examine the argument, not the facts. The category is the instruction.
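In code, "the category is the instruction" is literally a lookup. A sketch of that triage map (the category keys are made up for illustration; the actions are the ones in the table above):

    # Hypothetical finding categories mapped to their verification actions.
    TRIAGE = {
        "factual": "Search for the specific claim. Find a primary source.",
        "number": "Recalculate independently. Check the source data.",
        "made_up": "Try to locate the cited work.",
        "self_contradiction": "Re-read the full document for the conflicting statement.",
        "logical_leap": "Check whether the evidence supports the conclusion.",
        "opinion_as_fact": "Determine whether the claim is settled or contested.",
        "time_date": "Search for the current state of the fact.",
        "missing_context": "Ask what a complete picture would include.",
    }

    def triage(category: str) -> str:
        # Unknown categories fall back to a generic manual review.
        return TRIAGE.get(category, "Review manually.")

    print(triage("time_date"))  # -> Search for the current state of the fact.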

Some types are harder to catch than others

Factual errors and numerical errors are the most detectable — there is a definite right answer, and a search or a calculation can find it. Automated verification against live sources works well here.

Logical Leap and Missing Context findings are much harder. A logical leap requires understanding the structure of an argument, not just looking up a fact. An omission requires knowing what should be there — which means having domain knowledge. These are the categories where human judgment remains essential even after automated detection flags the concern.

This is also why NoDelulu reports a category alongside every finding. It lets you triage: which flags require a quick search, which require careful re-reading, and which require genuine expertise to evaluate.

The research behind this

The eight-category structure is not arbitrary. It emerged from two independent processes arriving at the same answer.

The first: a 2024 peer-reviewed study (Sun, Sheng, Zhou & Wu, Humanities and Social Sciences Communications, Nature Portfolio) that coded 284 error points from 243 real ChatGPT outputs with three independent coders at 89% inter-coder agreement. Their empirically validated taxonomy identifies the same eight functional error types.

The second: the same structure appeared independently in early NoDelulu testing on real documents, before the team had reviewed the academic paper. Convergence across entirely different methodologies, one academic and one grounded in commercial product development, is strong evidence that the eight categories reflect the actual underlying structure of AI error, not a classification imposed from outside.

Every NoDelulu finding is labelled with its category.

So you always know what type of error you're dealing with — and exactly what to do about it.
