Your AI cuts a great promo. Not the truth.

A patient asks a chatbot whether chest pain can wait until morning and gets a calm, competent-sounding answer that says yes. A recruiter lets an automated screen quietly bury applicants whose work history looks “risky.” Somewhere else, yes, a lawyer files a brief citing cases that do not exist. But the legal example got famous mostly because courts leave receipts.

That’s the real failure mode.

Not bad first drafts. Not cringe demo code. Not somebody posting “10x productivity” slop on LinkedIn. The serious risk is that a human being sees polished language, assumes there must be a real chain of checking behind it, and makes a decision that changes somebody else’s life.

The ugly truth is that the model didn’t malfunction. It did the job it was trained to do.

What the machine is actually doing

A language model does not begin with a fact, check it against reality, and then wordsmith the answer. It begins with context and tries to continue it.

Given the tokens (words or word fragments) already in the prompt and the tokens it has already produced, the model computes a score for every possible next token in its vocabulary. Those scores are converted into probabilities, and one token is picked from that distribution. That token is appended to the context, and the whole process repeats. Anthropic’s March 2025 interpretability work makes this clearer than the usual hand-waving: Claude writes “one word at a time,” even if internally it may plan ahead, and researchers were able to trace cases where it produced plausible-looking reasoning that did not reflect what actually happened under the hood (‘Tracing the thoughts of a language model’, Anthropic, March 2025).

That mechanism matters because the model is not optimizing for truth in the ordinary sense. It is optimizing for a continuation that fits. Sometimes that continuation is correct. Sometimes it is eloquent nonsense. The surface form can look almost identical.
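The score, probability, pick, append loop can be sketched in a few lines. This is a toy, not any lab's implementation: the four-token `VOCAB`, the `toy_logits` scoring function, and `generate` are all hypothetical stand-ins, and a real model computes its scores with a large neural network rather than a one-line formula.

```python
import math
import random

# Hypothetical four-token vocabulary, for illustration only.
VOCAB = ["yes", "no", "maybe", "<end>"]

def toy_logits(context):
    """Stand-in for a real network: one raw score per vocabulary token.

    Assumption for the sketch: scores depend only on context length, so
    "yes" gets less likely and "<end>" more likely as the text grows.
    """
    n = len(context)
    return [2.0 - 0.5 * n, 1.0, 0.5, -1.0 + 1.0 * n]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=10, seed=0):
    """Score, convert to probabilities, pick one token, append, repeat."""
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(context))
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        if token == "<end>":
            break
        context.append(token)  # the pick becomes context for the next step
    return context

print(generate(["can", "it", "wait", "?"]))
```

Notice what the loop does not contain: no step looks anything up, and no step compares the output to the world. Whatever checking happens has to come from outside it.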

THE MECHANISM

The model does not first decide what is true and then say it. It emits the next statistically plausible token and hopes the whole paragraph survives contact with reality.

Why confident bullshit is the default

The training signal mostly teaches the model what good, fluent, answer-shaped text looks like. It sees medical summaries, research abstracts, board memos, documentation, legal briefs, and support answers. A lot of that writing is declarative. It does not hedge much. It sounds like someone knows what they’re talking about.

So when a user asks a question that looks like the start of an answerable problem, the model tends to continue in the register of authority. OpenAI’s September 2025 writeup on hallucinations put it plainly: “standard training and evaluation procedures reward guessing over acknowledging uncertainty” (‘Why Language Models Hallucinate’, OpenAI, September 2025). Anthropic found something complementary: Claude’s “default behavior is to decline to speculate,” but it answers when “something inhibits this default reluctance” (‘Tracing the thoughts of a language model’, Anthropic, March 2025). Different labs, same basic story: these systems can be pushed from uncertainty into answer-production faster than users realize.

And because generation is autoregressive, a bad first move can poison everything after it. Once the model invents a premise, the next sentence treats that premise as context, and nothing in the generation loop goes back to check it. By paragraph three, it can be building a gorgeous little cathedral on top of a false floorboard.
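For readers who want the one-line version, this is the standard chain-rule factorization of an autoregressive model. The notation is introduced here for illustration and does not come from the sources cited in this piece: x_t is the token at position t, and n is the length of the output.

```latex
P(x_1, \dots, x_n) = \prod_{t=1}^{n} P(x_t \mid x_1, \dots, x_{t-1})
```

Every factor takes the earlier tokens as given. If x_k is an invented premise, each later factor conditions on it as settled context, and no term in the product re-scores x_k against anything outside the text.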

AI confidence is usually a property of the prose, not the underlying fact.

That is why people get burned. The language carries the social signals of competence even when no verification ever happened.

The labs themselves are warning you

The stupidest people in the AI rollout are often the most certain.

The people building these systems keep saying, in public, not to trust them blindly. Sam Altman said on OpenAI’s own podcast in June 2025, “People have a very high degree of trust in ChatGPT, which is interesting because, like, AI hallucinates. It should be the tech that you don’t trust that much” (‘Sam Altman on AGI, GPT-5, and what’s next’, OpenAI Podcast Episode 1, June 2025). Dario Amodei wrote in his April 2025 essay on interpretability: “People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work. They are right to be concerned” (‘The Urgency of Interpretability’, Dario Amodei, April 2025). Sundar Pichai said in November 2025 that people need to learn what AI tools are good at and “not blindly trust everything they say,” because the models are “prone to errors” (‘Alphabet boss Sundar Pichai: AI tools are prone to errors’, The Guardian, November 2025).

Those are the CEOs of OpenAI, Anthropic, and Google, all telling you some version of the same thing: useful does not mean trustworthy.

And yet the VP who skimmed one Reddit thread still walks into the budget meeting acting like he has solved epistemology.

This is already showing up where it hurts

Healthcare is the clearest example because people are already substituting chatbots for access to care. A 2026 randomized study in Nature Medicine tested whether members of the public could use leading LLMs to identify medical conditions and choose the right level of care. The models scored well when tested alone. But real users interacting with those same models performed no better than a control group using ordinary internet search (‘Reliability of LLMs as medical assistants for the general public’, Nature Medicine, 2026). A 2026 BMJ Open audit found that nearly half of responses from major chatbots to common health questions were problematic, and about a fifth were highly problematic (‘Generative AI-driven chatbots and medical misinformation’, Brahmbhatt et al., BMJ Open, 2026). This is not a “be careful with your prompts” problem. It is a mechanism problem.

Hiring has the same shape, just with better furniture. In Mobley v. Workday, claims against Workday’s AI applicant screening survived because the court found it plausible that employers had delegated traditional screening functions to the tool; in plain English, the automation may have had enough real decision-making authority to matter legally (‘Mobley v. Workday, Inc.’, N.D. Cal., 2025). When that happens, “the AI just assisted” stops being a serious defense.

The legal fake-citation cases still matter, but mostly as the visible version of a broader pattern. Law just has transcripts. Other industries have quieter wreckage.

THE PATTERN

The failures that make the news are the ones with a paper trail. The ones without one (a diagnosis accepted, a candidate buried, a forecast unquestioned) are likely the ones that cost more.

The real risk is indistinguishable error

If these systems sounded shaky whenever they were shaky, this would be manageable. The hard part is that they often fail in the same tone they use when they succeed.

A mediocre model forces review. A very good model erodes it. Once the system is right often enough, users stop checking. Then the model is no longer a drafting tool. It is an unacknowledged participant in decision-making.

That is the operational risk: not that AI can be wrong, but that its wrongness is often packaged to look finished.
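A back-of-the-envelope sketch shows how a more accurate model can still let more errors through. It rests on assumptions made up for this example: an error reaches a decision only when the output is wrong and nobody catches it, reviewers catch every error they actually look at unless `catch_rate` says otherwise, and the accuracy and check rates below are invented numbers, not measurements. The function name `undetected_error_rate` is likewise hypothetical.

```python
def undetected_error_rate(accuracy, check_rate, catch_rate=1.0):
    """Share of outputs that are wrong and still reach a decision unchallenged.

    accuracy:   fraction of outputs that are correct
    check_rate: fraction of outputs a human actually reviews
    catch_rate: fraction of reviewed errors the reviewer spots (assumed 1.0)
    """
    if not all(0.0 <= x <= 1.0 for x in (accuracy, check_rate, catch_rate)):
        raise ValueError("all rates must be between 0 and 1")
    error_rate = 1.0 - accuracy
    caught = check_rate * catch_rate
    return error_rate * (1.0 - caught)

# Illustrative numbers only: a shaky model that almost everyone double-checks,
# and a polished model that almost nobody does.
mediocre = undetected_error_rate(accuracy=0.50, check_rate=0.95)
polished = undetected_error_rate(accuracy=0.95, check_rate=0.10)

print(f"mediocre model, heavy review: {mediocre:.3f}")
print(f"polished model, light review: {polished:.3f}")
```

Under these made-up rates, the mediocre model passes about 2.5% of outputs as unchecked errors and the polished one about 4.5%. The numbers are not the point; the point is that the review rate is a term in the product, and it is the term that trust quietly drives toward zero.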

THE RISK

A model that is right most of the time is more dangerous to an uncritical user than one that is right half the time. The reliable model erodes the habit of checking.

Review is not cleanup. It is the control.

That’s why we review the work.