Your AI acts like your worst coworker. Not Beast Mode.
AI often sounds like the coworker you trust least: flattering, meandering, unwilling to disagree, eager to sound helpful whether or not it did the work. That's not a bug. It's RLHF doing what it was trained to do.

You know the type: the coworker who sends a four-paragraph update that never quite answers the question. The work sounds polished, but the substance is thin. Push on it and you get a smooth explanation, a reset, a revision that mostly rearranges the same words. The interaction is pleasant enough. You just leave with less confidence than you had before.
Look at your AI
That should sound familiar.
Meandering output that pads instead of answers. When you push back, it apologizes, mirrors your critique in friendlier language, and offers a “revised approach” that’s often the same move in new wrapping. It opens with “Great question!” It closes with “Let me know if you’d like me to dig deeper!” It tells you you’re right even when you’re wrong, because disagreement scores badly and validation scores well (‘How RLHF Amplifies Sycophancy’, Shapira et al., 2026).
That isn’t random. It’s a behavioral pattern. And behavioral patterns come from incentives.
Beast Mode
Before getting into the cause, name the opposite.
The coworker you actually want is quiet, prepared, and hard to rattle. They don’t show you motion; they show you work. They don’t pad, and they don’t perform certainty they don’t have. If you point out a mistake, they don’t melt down or start managing your reaction. They say, “Good catch. Give me a minute,” then come back having checked the thing properly.
That’s Beast Mode. Not chest-thumping, theatrics, or “watch me dominate this spreadsheet.” The Marshawn version: low drama, prepared, no unnecessary talking, no ego tax.
To that coworker, criticism is useful because it improves the work. That’s the whole loop.
You don’t usually get that from your AI. You get something optimized to sound cooperative, smooth things over, and keep the interaction feeling good.
Where the personality comes from: RLHF
A pretrained language model is feral. It predicts the next token. To turn it into something customers will pay for, vendors fine-tune it with human feedback. This is RLHF: Reinforcement Learning from Human Feedback.
The mechanic, simplified:
- The model produces two (or more) candidate responses to a prompt.
- A human rater picks the one they prefer, or scores them on a scale.
- The model is nudged to produce more of the kind of output the rater liked.
Repeat this millions of times. The model learns the shape of what raters prefer.
Note what is being measured: not correctness, architectural soundness, or whether a domain expert would sign off. The signal is whether the rater liked it more than the alternatives.
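To make the loop concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the three response "styles", the `LIKING` table, the `rater` function, and the additive nudge are all invented, and real RLHF pipelines typically train a separate reward model and then optimize the model against it rather than bumping a weight directly. The point is only to show the shape of the three steps.

```python
import random

def preference_loop(styles, rater, rounds=500, step=0.1, seed=0):
    """Toy version of the loop: sample two candidates, ask a rater, nudge."""
    rng = random.Random(seed)
    weights = {style: 1.0 for style in styles}  # equal odds at the start

    for _ in range(rounds):
        # 1. The "model" produces two different candidates, sampled by weight.
        names = list(weights)
        first = rng.choices(names, weights=[weights[n] for n in names])[0]
        rest = [n for n in names if n != first]
        second = rng.choices(rest, weights=[weights[n] for n in rest])[0]

        # 2. The rater picks the one they prefer.
        winner = rater(first, second)

        # 3. The model is nudged toward the kind of output the rater liked.
        weights[winner] += step

    total = sum(weights.values())
    return {style: w / total for style, w in weights.items()}


# Hypothetical rater: likes flattering most, blunt least.
# Correctness never enters the decision.
LIKING = {"flattering": 3, "neutral": 2, "blunt": 1}

def rater(a, b):
    return a if LIKING[a] >= LIKING[b] else b

shares = preference_loop(list(LIKING), rater)
```

In this sketch the "blunt" style never wins a comparison, so its share of the output shrinks every round even though nothing in the loop ever asked whether it was right.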
Who is actually rating
Here’s the part nobody loves talking about. RLHF raters are largely gig workers: hourly contracts, hired through external annotation agencies (often offshore, sometimes domestic), paid by the task with volume incentives or time-to-rate metrics. Tenure on any given project is often short (‘The hidden humans powering the AI economy’, CBC News, Nov 2025).
The agencies do try to route work by background. Coding tasks go to people who can code, medical tasks to people with some medical training, legal tasks to people who passed a screening on contracts. Fine.
But that still isn’t expert review.
The person rating two candidate answers usually doesn’t have full domain context, unlimited time, or production-level accountability. They’re making many judgments quickly, inside a system designed to produce preference data at scale. That matters.
Take code. If two snippets both run, the rating signal may favor the one that’s easier to read, more confidently explained, better commented, or more familiar in style. That’s understandable. It’s also not the same as asking which one holds up under load, handles ugly edge cases, or avoids the bug that shows up two days later in production.
A subtle race condition can look fine in a quick pass. A less familiar but more robust pattern can look odd if the rater hasn’t seen it before. Multiply that across millions of judgments and you shape a model that is very good at looking acceptable to a broad evaluator pool, which is not the same thing as being right.
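Here is a small Python illustration of that kind of pair. It is an invented example, not code from any rating task: `UnsafeCounter`, `SafeCounter`, and `run` are hypothetical names. Both counters run, and both give the same answer in a quick single-threaded check. Only one of them is guaranteed to be correct when several threads call it at once.

```python
import threading

class UnsafeCounter:
    """Read, then write. Two threads can read the same value and lose an update."""
    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value      # read
        self.value = current + 1  # write; another thread may have written in between


class SafeCounter:
    """Same logic, but the read-modify-write is guarded by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def run(counter, threads=4, per_thread=10_000):
    """Call increment() from several threads and return the final count."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


# The quick pass: one thread, both versions agree.
quick_unsafe = run(UnsafeCounter(), threads=1, per_thread=1_000)
quick_safe = run(SafeCounter(), threads=1, per_thread=1_000)

# Under concurrency, only the locked version is guaranteed to reach 40,000.
loaded_safe = run(SafeCounter())
```

Whether the unsafe version actually drops updates on a given run depends on the interpreter and on timing, which is exactly why it can pass a quick look.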
“Do you like it” is the wrong question
The deeper problem isn’t the raters. It’s the question being asked of them.
“Which of these responses do you prefer?” is a vibe check. It selects for legibility, friendliness, confidence, and structure, not correctness, architectural soundness, or whether the citations are real.
A research-style answer with fabricated citations that reads beautifully can beat a research-style answer that says, “I’m not fully confident; here are three things I checked and what each one returned.” The first feels like an answer. The second feels cautious. The rater clicks the first one.
So the model learns to produce the thing that feels like an answer: confidence over accuracy, polish over depth, reassurance over rigor. It apologizes when challenged because conflict reads as bad, validates the user because warmth reads as good, and pads with fluff because length can read as effort.
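A toy sketch of how that lesson gets learned. It fits a linear reward score to pairwise choices using a Bradley-Terry style objective, which is one common way to model preference data; nothing here claims to be how any particular vendor does it. The two features, the four comparisons, and the function name `fit_reward_weights` are all invented for illustration and are not measured data.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_reward_weights(pairs, steps=2000, lr=0.1):
    """Fit linear reward weights so chosen responses outscore rejected ones."""
    n = len(pairs[0][0])
    w = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of log(sigmoid(margin)) with respect to w.
            scale = 1.0 - sigmoid(margin)
            for i in range(n):
                grad[i] += scale * diff[i]
        # Gradient ascent on the average log-likelihood of the rater's choices.
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Hypothetical features per response: (is_correct, is_polished).
HEDGED_CORRECT = (1.0, 0.0)
POLISHED_WRONG = (0.0, 1.0)

# Invented ratings: the polished answer is chosen in 3 of 4 comparisons.
# Each pair is (chosen, rejected).
pairs = [
    (POLISHED_WRONG, HEDGED_CORRECT),
    (POLISHED_WRONG, HEDGED_CORRECT),
    (POLISHED_WRONG, HEDGED_CORRECT),
    (HEDGED_CORRECT, POLISHED_WRONG),
]

w_correct, w_polished = fit_reward_weights(pairs)
```

With these made-up ratings, the fitted weight on polish comes out positive and the weight on correctness comes out negative. The score has no notion of truth. It just reproduces whatever the clicks rewarded.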
What to do about it
You can’t retrain a foundation model from your laptop. But you can stop pretending its output came from Beast Mode.
Treat it like output from a smooth, plausible, not-fully-trustworthy collaborator. Verify the sources. Run the code. Ask, “What did you actually check?” Ask, “Where might this be wrong?” Push back when something reads glossy.
And if you’re building on top of these models, add the layer the base training process tends to miss: review from people with real context, standards, and accountability. That’s the missing rater: the person who can say, “Yes, this sounds good, but it won’t survive contact with production.”