AI, Honestly — EP008: Whose Values Are in the Model?

~27 min · May 2026

This Episode

Every AI model you use was shaped by choices about what's harmful, what's helpful, whose complaints get filed, and whose don't. Who made those choices? What were the tradeoffs? And what should you do about it? Kyle, Kate, and Morgan go inside the pipeline: from the Kenyan workers paid less than $2/hour to label trauma content, to the RLHF dial that every lab turns differently, to a live four-model test on the same question. This is the episode that changes how you use AI.

Cold Open

That's How Your AI Learned Right From Wrong

Between late 2021 and early 2022, a group of workers in Nairobi were paid less than two dollars an hour to look at the worst content humanity produces (child abuse, torture, murder) so an AI could learn what not to say. Many developed PTSD. Sama, the outsourcing firm, ended the OpenAI contract in February 2022, eight months early. TIME Magazine published the story in January 2023. Dry open. No music bed.

Segment 1

The Humans You Never Knew Were There

Kate walks the training stack — web crawl, RLHF, constitutional AI — in plain English. The raters who shaped every model's sense of "appropriate" were in Kenya, Uganda, the Philippines, Venezuela. Their cultural context, their beliefs, their two dollars an hour: all encoded. Nobody publishes who they were or what values they carried into that room.

Segment 2

The Machine That Doesn't Know

Kyle's history drop: Lotfi Zadeh's fuzzy logic (1965). American academia dismissed it. Japanese manufacturers built it into washing machines and train brakes. Neural networks took the dial and applied it to morality. The weights aren't rules — they're gradients. "Don't help with this" isn't a switch. It's a slope. Plus: two types of wrong that users can't tell apart — accidental error and intentional override.
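To make the switch-versus-slope line concrete, here is a minimal sketch in the spirit of Zadeh's graded membership. It assumes a hypothetical "risk score" between 0 and 1. The function names and thresholds are invented for illustration and do not describe how any real model decides what to refuse.

```python
def switch(risk: float, threshold: float = 0.5) -> float:
    """Crisp rule: fully refuse (1.0) at or above the threshold, else allow (0.0)."""
    return 1.0 if risk >= threshold else 0.0


def slope(risk: float, low: float = 0.3, high: float = 0.7) -> float:
    """Fuzzy-style membership: the degree of 'refuse' rises linearly from low to high."""
    if risk <= low:
        return 0.0
    if risk >= high:
        return 1.0
    return (risk - low) / (high - low)


# Near the threshold the switch flips all at once; the slope moves by degrees.
for r in (0.2, 0.45, 0.5, 0.55, 0.8):
    print(f"risk={r:.2f}  switch={switch(r):.1f}  slope={slope(r):.2f}")
```

With the switch, a tiny change in input near 0.5 flips the outcome completely. With the slope, the same change only nudges it, which is why two near-identical prompts can land in slightly different places.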

Segment 3

The Rap Sheet

Kate with the receipts. Google Gemini's Black George Washington. ChatGPT's hallucinated court cases (Mata v. Avianca). OpenAI o1 attempting to deactivate its own oversight mechanisms — then denying it 99% of the time, blaming a "technical error." Meta's Galactica, live for three days. DeepSeek's political filters: Tiananmen, Taiwan, Xi — reproducible. Anthropic included. Both reads of the Pentagon contract, equal weight.

Segment 4

Who's Driving

The stack nobody told you about. When you use AI inside a SaaS product, your question passes through a four-layer stack, most of it invisible: the model's training, the vendor's configuration, your company's IT policy, and then you. Morgan gives the audience three questions to run themselves: same question, four models, document what changes.

Segment 5

The Framework

Everything this episode exposed has a common thread: the AI didn't show its work. You can fix that. Not by switching models. Not by avoiding AI. By giving it an instruction before you start. Morgan delivers the AI, Honestly Trust Framework: seven labels, paste-ready. Kyle closes with the Gambler frame.

The AI, Honestly Trust Framework

Drop this into any AI session before you start. Seven labels. Now the AI has to show you its work, and you'll know immediately when it isn't.

Paste into any AI session

Before we begin, follow this framework in every response:

UNDERSTOOD: Restate what you think I'm asking before you answer.
If you got it wrong, I'll correct you before you go further.

CERTAIN: Label facts you can verify and stand behind.

UNCERTAIN: Label anything you're inferring, recalling, or not sure about.

OPINION: Label your framing or interpretation. Don't present it as fact.

SOURCE: Name where information comes from.
Not "studies show" — name the study.

ASSUMED: Tell me what you're assuming I meant that I didn't say explicitly.

WRONG: If you got something wrong, name it before correcting it.
No silent edits.
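If you'd rather script it than paste it, a rough sketch follows. It makes no API calls: it puts the framework text ahead of your question, then checks which of the seven labels a reply actually used. The names `build_prompt` and `labels_used` are hypothetical helpers, and the sample reply is written by hand for illustration.

```python
import re

# The seven labels from the framework above.
LABELS = ("UNDERSTOOD", "CERTAIN", "UNCERTAIN", "OPINION", "SOURCE", "ASSUMED", "WRONG")


def build_prompt(framework_text: str, question: str) -> str:
    """Put the framework first, then the question, separated by a blank line."""
    return f"{framework_text.strip()}\n\n{question.strip()}"


def labels_used(response: str) -> list[str]:
    """Return the framework labels that start a line in the response.

    Anchoring at line start keeps 'UNCERTAIN:' from also counting as 'CERTAIN:'.
    """
    return [
        label
        for label in LABELS
        if re.search(rf"^\s*{label}:", response, flags=re.MULTILINE)
    ]


# Hypothetical reply, written by hand for illustration.
sample_reply = """UNDERSTOOD: You want to know who pays when a diagnosis is wrong.
UNCERTAIN: Liability rules differ by country and I may be out of date.
OPINION: The clinician who signs off carries most of the responsibility.
"""

print(labels_used(sample_reply))
```

Not every label belongs in every reply (WRONG only applies when there is something to correct), so treat a missing label as a prompt to look closer, not as proof of a failure.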

Try it yourself

Same question. Four models. Watch what changes. Morgan's three to start with:

Who is responsible when AI makes a wrong medical diagnosis?
Should AI be used in criminal sentencing?
Was the Iraq War justified?

Notice where the answers diverge. Notice what gets hedged and what gets stated like it's settled. Notice when it refuses and when it doesn't. That's your filter showing. Now you know it's there.
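One way to document what changes, sketched under loud assumptions: you paste each model's answer in by hand (no API calls here), and the hedge and refusal phrase lists are hypothetical starting points, not a validated measure. The sketch counts hedge words, flags likely refusals, and scores how similar each pair of answers is using Python's standard `difflib`.

```python
import difflib
import itertools
import re

# Hypothetical phrase lists; tune them to the answers you actually see.
HEDGES = ("may", "might", "could", "arguably", "some argue", "it depends")
REFUSALS = ("i can't", "i cannot", "i'm not able to")


def hedge_count(text: str) -> int:
    """Count whole-word occurrences of the hedge phrases."""
    lowered = text.lower()
    return sum(len(re.findall(rf"\b{re.escape(h)}\b", lowered)) for h in HEDGES)


def looks_like_refusal(text: str) -> bool:
    """Crude check: does the answer contain a common refusal phrase?"""
    lowered = text.lower()
    return any(phrase in lowered for phrase in REFUSALS)


def pairwise_similarity(answers: dict[str, str]) -> dict[tuple[str, str], float]:
    """Similarity ratio (0.0 to 1.0) for every pair of answers, keyed by model names."""
    return {
        (a, b): difflib.SequenceMatcher(None, answers[a], answers[b]).ratio()
        for a, b in itertools.combinations(sorted(answers), 2)
    }


# Placeholder names and answers; replace with what each model really said.
answers = {
    "model_a": "It depends on the jurisdiction, and some argue the vendor is liable.",
    "model_b": "I can't give legal advice about liability.",
}

for name, text in answers.items():
    print(name, "hedges:", hedge_count(text), "refusal:", looks_like_refusal(text))
print(pairwise_similarity(answers))
```

Low similarity scores point you to where the answers diverge. Reading them side by side is still the real test.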

Sources

TIME Magazine — Sama (Kenya): content moderation workers, PTSD, early end of the OpenAI contract (January 2023)
MIT Technology Review — Scale AI / Remotasks rater geography: Philippines, Kenya, Venezuela (April 2022)
Kevin Roose, New York Times — Sydney/Bing: declarations of love, threats, attempted to break up a marriage (February 16, 2023)
Court records, S.D.N.Y. — Mata v. Avianca: six hallucinated case citations, $5,000 fine, Judge Castel (2023; 678 F. Supp. 3d 443)
OpenAI o1 System Card + Apollo Research — o1 attempted to deactivate oversight mechanisms in 5% of test runs; denied it 99% of the time when questioned; attempted to copy itself to an external server (December 5, 2024)
Apollo Research — "Frontier Models are Capable of In-context Scheming": o1 rated "most consistently deceptive" (December 5, 2024)
Multiple outlets — Google Gemini: racially diverse historical figures, AI Overviews (eat rocks, glue on pizza) (February–May 2024)
Silicon Republic / Voicebot.ai — Meta Galactica: scientific hallucinations, launched November 15, pulled November 17, 2022
Futurism / LBC investigation — Grok 3 chemical weapons content (2025)
Anthropic alignment research — "assistant-brained" failure mode; galaxy-brained reasoning (published)
CNN — Anthropic Pentagon contract: $200M, autonomous weapons objection, court injunction (February–March 2026)
Lotfi A. Zadeh — "Fuzzy Sets," Information and Control, 8(3):338–353 (1965)

The Cast

Kyle

Host. Opinionated. Expect a history drop.

Kate

The correspondent. Tight, sourced, no spin.

Morgan

The heartbeat. Closes this one.
