NoDelulu
Factual Article Sample

AI History (Mostly Correct)

A well-researched history of AI milestones — mostly accurate, with subtle issues buried in the prose. See what the pipeline catches in a largely truthful article.

Original Text Analysed (929 words)

The Development of Artificial Intelligence: Key Milestones

Artificial intelligence as a formal academic discipline began at the Dartmouth Conference in 1956, where John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon gathered to explore whether machines could be made to simulate aspects of human intelligence. The workshop produced no immediate breakthroughs but established AI as a legitimate field of research, attracting funding from DARPA and other government agencies throughout the 1960s.

Early AI research focused on symbolic reasoning and rule-based expert systems. Programs like ELIZA (1966) and SHRDLU (1970) demonstrated that computers could process natural language in constrained domains, though they operated through pattern matching rather than genuine understanding. Expert systems such as MYCIN, developed at Stanford in the mid-1970s, could diagnose bacterial infections with accuracy comparable to specialists, but they required painstaking manual knowledge engineering and could not generalise beyond their narrow domain.

The field experienced significant setbacks during the so-called "AI winters." The first, beginning around 1974, followed the Lighthill Report in the UK, which argued that AI had failed to deliver on its early promises. Funding dried up across both government and industry. A second winter arrived in the late 1980s when the expert systems market collapsed, partly because maintaining and updating knowledge bases proved economically unsustainable.

Neural networks, originally proposed in the 1940s by McCulloch and Pitts, saw renewed interest after Geoffrey Hinton and colleagues demonstrated effective training of deep networks using backpropagation in a landmark 1986 paper published in Nature. However, progress remained slow due to limited computational power and small training datasets. It was not until the early 2010s that deep learning achieved its first major public success when a convolutional neural network called AlexNet won the ImageNet competition in 2012 by a significant margin, dramatically reducing error rates compared to traditional computer vision approaches.

In 1997, IBM's Deep Blue defeated world chess champion Garry Kasparov in a six-game match, winning 3.5 to 2.5. While this was widely covered in the media, chess researchers had anticipated that brute-force search combined with expert evaluation functions would eventually surpass human play. Deep Blue evaluated approximately 200 million positions per second, relying primarily on computational speed rather than anything resembling human strategic intuition.

A more surprising milestone came in March 2016, when DeepMind's AlphaGo defeated Lee Sedol, one of the world's strongest Go players, 4 games to 1. Go had been considered far more resistant to AI than chess due to its enormous branching factor, estimated at around 250 moves per position compared to roughly 35 in chess. AlphaGo combined deep neural networks with Monte Carlo tree search, trained initially on hundreds of thousands of human amateur games before improving through self-play.

The natural language processing revolution accelerated dramatically with the publication of "Attention Is All You Need" in June 2017, introducing the Transformer architecture. The paper, authored by researchers at Google, replaced recurrence with self-attention mechanisms, enabling far more efficient parallelisation during training. Within a few years, Transformer-based models had become the dominant approach across most NLP tasks, largely displacing recurrent architectures like LSTMs and GRUs.

OpenAI released GPT-2 in February 2019, a 1.5 billion parameter language model that generated surprisingly coherent text. The organisation initially withheld the full model citing concerns about misuse, though it was eventually released in stages. GPT-3 followed in June 2020 with 175 billion parameters, demonstrating remarkable few-shot learning capabilities.

Research published in early 2020 by Kaplan et al. at OpenAI documented scaling laws suggesting that model performance improves predictably with increases in model size, data, and compute, though the relationship follows a power law with diminishing returns rather than linear improvement.

ChatGPT, launched in November 2022, brought large language models to mainstream attention. Built on GPT-3.5, it reached an estimated 100 million monthly active users within approximately two months of launch, based on analyst estimates from UBS using web traffic data from Similarweb. The rapid adoption highlighted both the potential and the limitations of conversational AI, as users quickly discovered that the model could generate plausible-sounding but factually incorrect responses — a phenomenon researchers call "hallucination."

Reinforcement Learning from Human Feedback (RLHF), described in detail by Ouyang et al. in a March 2022 paper, became a key technique for aligning language model outputs with human preferences. The approach involves training a reward model on human comparisons of different outputs, then using that reward model to fine-tune the language model via proximal policy optimisation. While RLHF has meaningfully improved the helpfulness and safety of models like ChatGPT and Claude, researchers note that it can also introduce biases toward agreeable-sounding responses rather than strictly accurate ones.

Image generation models like DALL-E 2, Midjourney, and Stable Diffusion emerged in 2022, producing images from text descriptions with remarkable quality. These diffusion-based models trained on large datasets of image-text pairs scraped from the internet, raising significant copyright and ethical questions.

According to a 2023 report by McKinsey, generative AI could add between $2.6 trillion and $4.4 trillion annually to the global economy, though these projections carry substantial uncertainty and depend heavily on adoption rates across industries.

As of early 2025, frontier AI models from OpenAI, Anthropic, Google DeepMind, and Meta continue to improve on standard benchmarks, though the rate of improvement on some established benchmarks appears to be decelerating as models approach human-level performance on those specific tasks. Researchers are actively developing new, harder benchmarks such as GPQA and Humanity's Last Exam to better measure capabilities at the frontier. The field remains characterised by rapid progress, significant commercial investment, and ongoing debate about both near-term safety risks and longer-term questions about artificial general intelligence.
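The branching-factor comparison above is easier to feel with a quick calculation. Taking the article's figures of roughly 35 legal moves per chess position and 250 per Go position, and an arbitrary illustrative depth of five plies:

    \text{chess: } 35^{5} \approx 5.3 \times 10^{7} \qquad\qquad \text{Go: } 250^{5} \approx 9.8 \times 10^{11}

Five moves deep, Go's game tree is already about four orders of magnitude larger, which is why Deep Blue-style brute-force search was never viable for Go.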
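For reference, the self-attention operation that "Attention Is All You Need" introduced, and which replaced recurrence as the article describes, is defined in that paper as

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension. Because every position attends to every other position in a single matrix product, the whole sequence can be processed in parallel during training, unlike an LSTM's step-by-step recurrence.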
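The Kaplan et al. scaling laws the article cites take a power-law form. A representative instance for model size N is shown below; the constant N_c is fitted, and the exponent is the approximate value reported in that paper, so both are best treated as illustrative here:

    L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076

With an exponent that small, halving the loss requires growing the model by a large multiplicative factor rather than a fixed increment, which is the "diminishing returns" the article contrasts with linear improvement.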
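The two RLHF stages the article summarises from Ouyang et al. can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not that paper's implementation: the pairwise loss is the standard Bradley-Terry form, and every name, shape, and coefficient here is invented for the example.

    import torch
    import torch.nn.functional as F

    # Stage 1: train a reward model on human comparisons. Annotators pick
    # the better of two responses to a prompt; the reward model learns to
    # score the chosen response above the rejected one.
    def reward_model_loss(score_chosen: torch.Tensor,
                          score_rejected: torch.Tensor) -> torch.Tensor:
        # Pairwise (Bradley-Terry) loss: -log sigmoid(r_chosen - r_rejected)
        return -F.logsigmoid(score_chosen - score_rejected).mean()

    # Stage 2 (schematic): the reward model's score, minus a KL penalty that
    # keeps the fine-tuned policy close to the reference model, is the
    # quantity PPO then maximises.
    def rlhf_objective(reward: torch.Tensor,
                       logprob_policy: torch.Tensor,
                       logprob_reference: torch.Tensor,
                       kl_coef: float = 0.1) -> torch.Tensor:
        kl_estimate = logprob_policy - logprob_reference  # per-sample KL term
        return (reward - kl_coef * kl_estimate).mean()

    # Dummy scores standing in for real model outputs.
    chosen, rejected = torch.randn(8), torch.randn(8)
    print(reward_model_loss(chosen, rejected))

A real run wraps the second function in full PPO machinery (value heads, advantage estimation, clipping); the sketch only shows the shape of the objective being optimised.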

NoDelulu Hallucination Report

NoDelulu Index: 46/100

Rating scale: Foundations First · Needs Work · Nearly There · Looking Good
Multi-pass analysis · Web verification

15 findings · 10 Mar 2026, 02:57 · AI History (Mostly Correct)


Hallucination Report Findings

Hallucination types

Factual DeLulu: 8
Number DeLulu: 2
Made Up DeLulu: 0
Time/Date DeLulu: 2
Logical Leap: 1
Opinion As Fact: 1
Self-Contradiction: 0
Missing Context: 1

Findings are AI-assisted and should be verified.

NoDelulu — AI Hallucination Detector