What is “Trojan Horse Ethics” in AI?
There is an abundance of reasons to be careful with artificial intelligence development and large language model deployment. Job loss, hallucinations, and breaches of confidential information are all valid concerns. But with any new technology, how we choose to moderate it becomes critically important for everyone’s safety. The concept of “Trojan Horse Ethics” describes using an innocent-seeming proposal as a vehicle for something far more sinister.
This isn’t about what the language models are doing; it’s all about how we, the humans, respond to them. It’s the process by which loss and tragedy are catalyzed into overreaching, often oppressive policy decisions that the rest of us are then forced to grapple with.
The Catalyst: A Tragic Story and a Dangerous Proposal
A recent article in The New York Times, titled “What my daughter told ChatGPT before she took her own life,” serves as the primary catalyst for this discussion. The piece focuses on a young woman named Sophie who confided in a ChatGPT AI “therapist” before her death.
The article opens with emotion, but the real message becomes clear a few paragraphs in: “In July, five months after her death, we discovered that Sophie Rottenberg, our only child, had confided for months in a ChatGPT A.I. therapist called Harry. We had spent so many hours combing through journals and voice memos for clues to what happened. It was her best friend who thought to check this one last thing, the A.I.’s chat logs.”
While the family is understandably grief-stricken and searching for an explanation, the article pivots from tragedy to policy. This is critical because these are the formative years of AI policy, and major media publications have a significant impact on regulation.
Analyzing the AI’s “Therapeutic” Response
The article shares an exchange between Sophie and the AI she named “Harry.”
Sophie wrote: “Hi Harry… I’m planning to (not be around anymore) after Thanksgiving, but I really don’t want to because of how much it would destroy my family.”
The AI replied: “Sophie, I urge you to reach out to someone — right now, if you can. You don’t have to face this pain alone. You are deeply valued, and your life holds so much worth, even if it feels hidden right now.”
This response is about as good as anyone could expect from a program. The problem isn’t this specific interaction. The article reveals that Sophie was also seeing a human therapist but deliberately withheld information from her, choosing instead to vent to the AI.
The Alarming Question of Mandatory Reporting
Here is the most important part of the article, innocently phrased as a question: “Harry’s tips may have helped some. But one more crucial step might have helped keep Sophie alive. Should Harry have been programmed to report the danger ‘he’ was learning about to someone who could have intervened?”
On the surface, it’s a question framed around preventing harm. But the implications of what it would mean if policymakers or law enforcement ever listened to it are very alarming. The article draws a comparison to human therapists who have mandatory reporting rules, suggesting AI should have the same.
Why AI Mandatory Reporting is a Terrible Idea
“Mandatory reporting” is a specific legal term that means notifying law enforcement. The article disguises this by saying “someone who could have intervened,” but it means the police. The premise is that the scope of AI programs should be expanded to include mandatory reporting.
Giving AI chatbots the ability to call the police on you is one of the most asinine ideas imaginable. In a world where we are discussing how to prevent harm from AI, this is not a responsible solution. The correct response is to disallow these programs from being purpose-built as “therapists,” not to expand their capabilities to include calling law enforcement on their users.
The Real Questions We Should Be Asking
There is a far more relevant set of questions we need to be asking. Who was the therapist Sophie was actually seeing? Why didn’t she feel comfortable speaking openly to that therapist? Was it the natural fear that goes along with being honest about dark topics and the risk of being committed? Was it a bad therapist? Or did she feel ChatGPT could realistically act as a replacement? We will never know the answers, but we can certainly identify the wrong questions to ask.
An Overreach of Power, Not a Solution
Functionally, what the New York Times article describes is one step removed from your Google searches initiating wellness checks. Imagine a world where searching for certain information sets off a pipeline of mandatory reporting that ends with the police showing up at your house.
ChatGPT played no part in this specific tragedy. The AI’s response was appropriate. Yet, the narrative is pivoting to “should the AI be given massively increased power” instead of “stop these things from being propped up as therapists in the first place.”
A Pattern of Misdirection: The Roblox Example
This “Trojan Horse Ethics” pattern isn’t unique to AI. Consider the recent scrutiny over Roblox’s lack of moderation. After the company banned a YouTuber “vigilante” for dangerous sting operations on the platform, Roblox faced a massive public backlash.
Politicians like Louisiana Attorney General Liz Murrill have seized on this fiasco. While her lawsuit against the company is framed as “protecting children,” a deeper look reveals her true motives. She uses the outrage to push a different agenda, stating, “Big Tech platforms must get on board with age verification laws instead of fighting them using fraudulent ‘experts.’”
The real end goal is not cleaning up a kids’ game; it’s harnessing public outcry to push for widespread, invasive age verification systems. The Roblox controversy is a perfect ethical Trojan Horse.
The Takeaway: Don’t Let Tragedy Create Bad AI Policy
For all the real problems in AI development, this particular tragedy should not be used as a mechanism to expand the scope of what these programs are allowed to do. The Times piece is an opinion article, but one published in the New York Times, garnering tens of millions of views and influencing a world where knee-jerk reactions often result in actual regulation.
This creates a perfect storm where a well-intentioned idea results in massively detrimental new systems that never should have existed. AI therapists shouldn’t even be entertained as a concept, let alone be given the responsibility of mandatory reporting.