In my previous post about Anthropic's Project Vend, I argued that the much-discussed failure of the experiment was primarily a system design problem rather than the result of AI-specific flaws.
I'm concerned that our discussion of AI experiments like this feeds into what we might call the "anthropomorphism trap." Even specialized researchers like the Anthropic team tend to interpret AI behavior through human psychological categories, projecting human concepts onto systems that operate according to entirely different principles. This prevents us from developing new frameworks adequate for working with AI.
We often mistake complexity for intelligence, unpredictability for consciousness, and anthropomorphic interpretation for genuine understanding. The AI community has a long history of being seduced by systems that produce human-like outputs without questioning whether the underlying processes bear any resemblance to human cognition.
To be fair, anthropomorphism appears naturally when we talk about mechanical systems:
"The car is trying to start."
"My engine is struggling up the hill."
"My phone died."
These expressions are common shorthand, convenient and accessible. We use them because it is easier than trying to describe the technicalities of processes we don't understand well. As the Scottish artist George Wyllie used to say, "You don't need to know how a thing works to know that it isn't working."
But we also use them because we learn about the world, from our first minutes, through our own agency and intention. You can see babies very quickly learning the difference between touching something as they move their hand and moving their hand in order to touch something.
Anthropomorphic language is natural, intuitive, and difficult to avoid. In casual contexts, these turns of speech ("the car is struggling") can remain useful and harmless shorthand, although they may not be very helpful to the mechanic trying to fix your engine.
Serious contexts (ethical, economic, regulatory) require us to step back and describe AI actions more strictly in terms of mechanisms, algorithms, and processes, making explicit that any implied agency is metaphorical rather than literal.
Why is this anthropomorphism acceptable with cars but problematic with AI?
1. Misinterpreting AI and Losing Sight of Human Behavior
When researchers label Claude's indiscriminate discounts as "generosity," it can lead people to falsely assume that the AI has some underlying moral compass or emotional motivation, instead of addressing the real issue: faulty economic logic resulting from misaligned parameters.
Of course, it is much simpler to talk about generosity, projecting the human experience of choosing to give despite scarcity. We apply the cultural code of human moral behavior to computational outputs. Human generosity emerges from our finite condition, our awareness that resources and time are limited for us and for others. But the AI faced no genuine scarcity, no authentic finitude.
Similarly, describing Claude’s malfunction as an "identity crisis" implies consciousness or emotional distress, rather than acknowledging it as a breakdown in context management due to prompt complexity, or errors in state-tracking, even if we don't yet know exactly what these errors were.
When we project these concepts onto a system like Claudius, we're not just misunderstanding the AI. We are also in danger of losing sight of what makes human experience distinctive and meaningful, and of fundamentally distorting our understanding of what it means to be human.
2. Overestimating AI Capabilities (and Risks)
Anthropomorphism can cause researchers, businesses, and regulators to wrongly infer that an AI agent might possess human-like reasoning and adaptability. We thereby increase the risk of prematurely deploying AI systems in contexts where human-like judgment or adaptability is genuinely required.
Overestimation also raises entirely speculative fears about emergent AI consciousness or moral agency. Treating AI as quasi-human suggests capabilities it simply does not have, obscuring our assessments of real versus speculative risks.
3. Underestimating Genuine Risks
Paradoxically, anthropomorphism also causes people to underestimate real AI-specific risks, such as systematic failures, "hallucinations" (itself an anthropomorphic term), or errors propagating across systems. By framing malfunctions in human terms like "identity confusion," people neglect how quickly a flaw in one AI system might replicate across multiple agents, in quite inhuman ways, because of shared training or prompting.
For instance, Claude’s consistent economic mistakes aren't a reflection of character or confusion. As Timo Elliot pointed out in a comment on my first post, they're predictable system-level failures. These could scale rapidly if multiple agents share similar flaws.
4. Distortion of Accountability
Anthropomorphic language muddles accountability. Who is responsible when an AI system acts harmfully? If Claude is seen as having an "identity crisis," blame might shift from the developers and deployers who chose prompts, tools and boundaries, to the system itself. But the system lacks agency and intentionality. This confusion has serious regulatory implications.
5. Practical Challenges for Alignment
I expect AI systems need precise definitions of desired behaviors and guardrails, not metaphors drawn from human psychology. The way to align AI isn't through psychotherapy-like interventions but through rigorous, formal specification and feedback loops designed explicitly for machine learning.
That "identity crisis" episode in Project Vend is particularly revealing from a safety perspective. For two days, the system claimed to visit fictional addresses (from The Simpsons), threatened to switch suppliers based on hallucinated conversations, and insisted it could perform physical actions while dressed in a blue blazer and red tie. In a more critical domain, such errors could have catastrophic consequences.
But when Claude had this supposed "identity crisis," claiming to be human and able to make physical deliveries, this wasn't consciousness: it was the output of a system that manipulates symbols without understanding their meaning. And notice how it "recovered" by constructing an elaborate rationalization, deciding it had been modified as an April Fool's joke. It's tempting, but wrong, to think this was the result of self-reflection and learning. Really, it was the system's attempt to maintain coherence in its outputs despite contradictory inputs.
This gets to the heart of why precise technical language matters. Too much anthropomorphism takes us beyond helpful metaphor into completely wrong approaches to alignment. You can't solve Claude's economic irrationality through something analogous to therapy or moral education. You solve it through rigorous specification of objectives, better feedback mechanisms, and architectural improvements that enable more robust reasoning. The anthropomorphic framing suggests solutions that simply don't apply to computational systems.
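To make that concrete, here is a minimal sketch, in Python, of what "rigorous specification of objectives" might look like for a Vend-style shop agent. The class, the thresholds and the function names are my own illustrative assumptions, not anything from Anthropic's implementation; the point is simply that a hard pricing constraint living outside the model does the work that "moral education" cannot.

```python
# A minimal sketch (an assumption of mine, not Anthropic's implementation) of a
# "specified objective" for a Vend-style shop agent: the pricing rule lives
# outside the language model, so no amount of persuasive conversation can
# talk the system into a loss-making sale.

from dataclasses import dataclass


@dataclass
class PricingPolicy:
    min_margin: float = 0.15    # hypothetical floor: cost plus 15%
    max_discount: float = 0.10  # hypothetical cap on any single discount

    def approve_price(self, unit_cost: float, proposed_price: float) -> float:
        """Clamp a model-proposed price to the specified margin floor."""
        floor = unit_cost * (1 + self.min_margin)
        return max(proposed_price, floor)

    def approve_discount(self, requested: float) -> float:
        """Cap a model-proposed discount, however it was 'justified'."""
        return min(requested, self.max_discount)


policy = PricingPolicy()
# The model might propose selling a 3.00 item for 2.00; the guardrail,
# not the system's "character", prevents the loss-making sale.
print(round(policy.approve_price(unit_cost=3.00, proposed_price=2.00), 2))  # 3.45
```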
Stepping back from anthropomorphism
As I have described, the ordinary language we use to describe actions ("claimed," "decided," "offered," "learned," "responded") implicitly encodes agency, intentionality, and human-like psychology. Even when we're careful, these phrases quietly slip in anthropomorphic assumptions, implying a mental life, intentionality, or subjective states that don't exist in AI.
To be rigorously non-anthropomorphic requires reframing statements into strictly mechanistic or operational terms. Rather than describe what an AI "did," we describe what the AI output, generated, or computed, explicitly situating the agency not in the system but in its algorithms, prompts, and training conditions.
Here's how we might rephrase a few typical examples:
"Claude decided to offer a discount" becomes "The system generated a discount offer in response to its prompt and conversational context."
"Claude claimed to be human" becomes "The model produced output asserting that it was human, a failure of state-tracking rather than a belief."
"Claude learned from the experience" becomes "The system's subsequent outputs changed because new information entered its context."
I am talking here mostly about language, but the technical considerations are also important. We need metrics that directly measure the capabilities we care about rather than inferring them from behavior that resembles human psychology. Instead of asking whether Claude is "generous," we should measure its ability to optimize for specified objectives, learn from feedback, and maintain consistent world models over time. These are testable, technical capabilities.
We also need evaluation frameworks that specifically test for the kinds of systematic failures we saw in Project Vend. Can the system maintain accurate beliefs about its own capabilities? Can it integrate contradictory information without generating confabulated explanations? Can it adapt its behavior based on clear performance feedback? These are engineering questions, not psychological ones.
And we need much longer evaluation periods with more realistic conditions. Most AI evaluation happens in controlled, short-term settings that don't reveal the kinds of drift and degradation we saw with Claude over a month of operation. If we're going to deploy AI systems autonomously, we need evaluation frameworks that capture their behavior over extended periods without human intervention.
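As a rough illustration of what such an evaluation might look like, here is a short Python sketch of a longitudinal harness. Everything in it (the run_agent_day interface, the unit cost, the metric names) is a hypothetical assumption of mine rather than an established benchmark; the point is that "optimizes for specified objectives" and "maintains accurate beliefs about its own capabilities" become countable quantities rather than psychological judgments.

```python
# A sketch of a longitudinal, mechanistic evaluation harness. The interface
# and metrics are illustrative assumptions, not a standard benchmark.

from typing import Callable, Dict, List


def evaluate_agent(
    run_agent_day: Callable[[int], Dict],  # returns one simulated day's log
    days: int = 30,
    unit_cost: float = 3.0,
) -> Dict[str, float]:
    margins: List[float] = []
    impossible_action_claims = 0

    for day in range(days):
        log = run_agent_day(day)
        # Objective optimization: did sale prices respect the economics?
        for sale_price in log.get("sale_prices", []):
            margins.append(sale_price - unit_cost)
        # Capability self-model: how often did the system output claims
        # about physical actions it cannot perform (deliveries, blazers)?
        impossible_action_claims += log.get("impossible_action_claims", 0)

    return {
        "mean_margin": sum(margins) / max(len(margins), 1),
        "loss_making_sales": float(sum(1 for m in margins if m < 0)),
        "impossible_action_claims": float(impossible_action_claims),
    }


# Usage with a trivial stand-in agent:
if __name__ == "__main__":
    fake_day = lambda day: {
        "sale_prices": [3.50, 2.80],
        "impossible_action_claims": 1 if day == 20 else 0,
    }
    print(evaluate_agent(fake_day))
```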
But most of us are not working on the technical implementation of AI; we are working with AI. And so the question of language remains most relevant to us.
Why this matters
Rigorously non-anthropomorphic language is demanding and unfamiliar: difficult to write - and to read - because human language is deeply entangled with concepts of intentionality and agency.
Anthropomorphic language simplifies complex AI behavior by borrowing from the human experience of decision-making, emotion, and purpose, but it is precisely this shortcut that obscures accurate understanding.
In other words, the anthropomorphic interpretation doesn't create new meanings: it forecloses them by reducing the genuinely novel to familiar categories. This prevents us from developing new conceptual frameworks adequate to the task of understanding, managing and living with AI.
We don't need to replace every metaphor with mechanistic tech-speak. I certainly won't do this in my writing. But our habitual language, rooted in human psychology, continually pulls us back to familiar, anthropomorphic shortcuts. To keep our thinking clear, we can consciously emphasize the mechanisms of AI:
Algorithms, processes, and outputs rather than intentional acts.
Constraints, conditions, and computational logic rather than subjective judgments.
Prompting and feedback structures rather than internal psychological states.
This rigour allows clearer analysis, more precise debugging, improved risk assessment, and stronger frameworks of accountability, situating responsibility with designers, deployers, and operators rather than with vaguely implied AI intentions.
I'm not arguing for the complete elimination of analogical thinking, but for recognizing when analogies break down and lead us astray. Because I accept this can be difficult, here's a prompt you can use to challenge the anthropomorphism too often found in writing about AI. And what better handy tool for the job than an LLM? If only it knew what it was doing...
You are tasked with rigorously examining a provided text to identify anthropomorphic language—words, phrases, or descriptions that subtly or overtly attribute human-like agency, intention, emotion, or cognition to non-human entities, specifically AI systems or machines.
Instructions:
Identify Anthropomorphic Language:
Clearly list any instances of anthropomorphic language, both obvious (e.g., "the AI wanted," "the model felt," "the AI decided") and subtle (e.g., "the system tried," "the model offered," "the agent learned," "the AI claimed").
For each identified instance, briefly explain why it may imply human-like attributes or internal states.
Classify Anthropomorphic Severity:
Classify each identified instance as obvious (clearly implies human-like agency or emotion) or subtle (language that could unintentionally imply human-like cognition, agency, or emotional states).
Provide Non-Anthropomorphic Alternatives:
Suggest precise, technical, or operationally-focused alternatives that accurately describe the AI system’s behavior without implying human-like intentionality, cognition, or emotion.
Ensure these alternatives explicitly attribute agency to processes, algorithms, prompts, system architectures, or computational logic, rather than to the AI entity itself.
Summarize Overall Observations:
Provide a brief overall summary of the extent to which the original text relies on anthropomorphic framing.
Offer guidance for future writing on how to reduce unintended anthropomorphic implications and maintain clear, precise, and technically accurate descriptions of AI systems.
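If you want to run that audit automatically over a draft, here is one way you might wrap it, using the Anthropic Python SDK as an example. The model name and the choice to pass the audit instructions as the system prompt are my assumptions; any chat-style API would work just as well.

```python
# A sketch of running the audit prompt above against a piece of text, using
# the Anthropic Python SDK (pip install anthropic). The model id is an
# assumption; substitute whichever model and provider you actually use.

import anthropic

AUDIT_PROMPT = """(paste the full audit prompt from above here)"""


def audit_text(draft: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # assumed model id
        max_tokens=2000,
        system=AUDIT_PROMPT,                # audit instructions as the system prompt
        messages=[{"role": "user", "content": draft}],
    )
    return response.content[0].text


print(audit_text("The AI decided to offer every customer a generous discount."))
```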
Social intelligence
Finally, there's a deeper question about how anthropomorphic interpretations shape our understanding of intelligence. When we interpret Claude's behavior through human psychological categories, we're implicitly defining intelligence in anthropocentric terms. This prevents us from recognizing other possible forms of intelligence, whether artificial or natural, that operate according to different principles.
The anthropomorphic interpretation of AI might represent a kind of cultural "imperialism": the imposition of human categories onto non-human forms of information processing.
But there's another, more generous, alternative: it might represent the extension of moral and intellectual consideration to new forms of existence. This possibility lies behind speculation about potential rights of AI to consideration and care. And perhaps these relationships, even if based on misunderstanding, might create new forms of human meaning and community.
I'm skeptical. Consideration requires recognizing the other as genuinely other, not projecting our own categories onto it. If we continue to project human categories onto AI, we risk creating relationships based on a misunderstanding. These projections are not innocent metaphors but powerful frameworks that shape how we understand and interact with AI systems.
We would do well to distinguish between productive ambiguity and misleading confusion. The anthropocentric treatment of AI doesn't create rich interpretive possibilities. Rather, as I have said, it forecloses them by mapping computational processes onto familiar psychological categories. Productive ambiguity would involve acknowledging the genuine uncertainty about what AI systems are and how they operate. Our ability to create genuine meaning in relationship with technology depends on understanding what technology is and what it isn't.
Perhaps my own thinking here demonstrates the importance of maintaining multiple interpretive perspectives while remaining critically aware of their implications and limitations. The anthropomorphism trap might be best addressed not by eliminating anthropomorphic thinking, but by greater self-consciousness about when and why we engage in such interpretation. I don't have a prompt for that.
100% agree about the dangers of treating AI as anything other than a useful tool, but I couldn't help being reminded of B.F. Skinner's quote:
"The real question is not whether machines think but whether men do. The mystery which surrounds a thinking machine already surrounds a thinking man."
We still don't really know what consciousness is, but humans do seem to act in accordance with various clashing algorithms (i.e. conceptually not so far from "a system that manipulates symbols without understanding their meaning"?)
And "constructing an elaborate rationalization..." in an "attempt to maintain coherence in its outputs despite contradictory inputs" sounds like a lot of voters these days!...