I recently wrote about trying to complete a fairly simple exercise with AI chatbots: scanning in a table of numbers and totalling a column. Three mainstream AI applications all failed in different ways. Of the three, Claude produced a perfect table of the numbers in my scanned form, but then added them up incorrectly.
This reminded me of a historical problem, going all the way back to Charles Babbage, the father of the computer (but see my footnote about Ada Lovelace, if you are wondering), and his Difference Engine No. 1, which he described in the Edinburgh Review of April 1834.
However, it is not that article (unsigned, but likely by Babbage himself) that I want to refer to, but rather a piece in Chambers' Edinburgh Journal of May 1834 (p. 240), a month later, which is, in effect, a product review of Babbage's engine.
Here's a very relevant passage ...
But we must recollect, at the same time, that such an engine as is here proposed can only be valuable if it effects the practical benefit intended. The question is, will Mr. Babbage’s machine print correct tables? —that it will compute them correctly, we have little doubt.
Today, we think of Babbage's great contribution as the theory and practice of computation. The reviewer, one month after the big reveal, simply takes it for granted, but does worry about the output.
Why?
Systems thinking and drunken, ignorant workmen
A key motivation behind Babbage’s work was to create accurate numerical tables, particularly for sea navigation. At the time, navigational tables (used for determining longitude and latitude, predicting celestial positions, and aiding in astronomical observations) were computed by hand: the people who performed these manual calculations were known as "computers."
But of course, this process was error-prone, and even small mistakes in these tables could result in shipwrecks, lost cargo, and even loss of life. The Science Museum in London describes this as the Mathematical Table Crisis.
Babbage aimed to eliminate human errors by using mechanical computation, which would be more accurate, faster, and (in modern terms) scalable.
Yet his machine was not only a computational device; it was part of a larger ecosystem of navigation, commerce, and science: a system that still included human factors and physical constraints.
In particular, the tables had to be printed and distributed to be useful.
Babbage understood this: The engraved plate of copper obtained in the manner above described is designed to be used as a mould from which a stereotyped plate may be cast ... Each plate, when produced, becomes itself the means of producing copies of the table, in accuracy perfect, and in number without limit.
Although he created prototypes of his engines, he never produced a printer; the Science Museum did, however, build one to his design in 2000.
The writer of the review in the Edinburgh Journal was not so sure: We are afraid that the ingenious inventor of the calculating machine is here much too sanguine in his expectations. All that he ought to expect to accomplish is to furnish one accurate set of tables as a sort of standard. If he imagines that he can produce perfect stereotype plates, or perfect printing, he is laboring under a serious mistake. ... As far as his machine-work is concerned, accuracy may be gained, but all after-details are affected by human agency, and, consequently, liable to error … all that the most sublime genius may invent for the benefit of mankind after lifetimes of deep study, may in a moment be blasted by the carelessness of a drunken or ignorant workman.
That's a very human-centric view of the problem.
The tension between invention and implementation
The printing problem illustrates a fundamental principle in technology design that I've often seen myself and which Don Norman describes well: systems fail at the boundaries between components, not within them.
Moreover, there's often a disconnect between what inventors envision and how technology is adopted. Babbage thought he could create a machine that would produce perfect calculations, but the broader ecosystem (human labor, infrastructure, and economic realities) interfered.
His challenge was a mechanical one: ensuring that an already correct computation was faithfully reproduced. With AI, we have the opposite issue: models generate information based on statistical patterns, not ground truth.
But still, as for Babbage's engine, AI models exist within socio-technical systems.
AI programs, especially large language models, can produce beautifully rendered outputs, whether text, images, or even code. But as we know, and as I found out, these outputs can be completely wrong.
There's an important distinction, of course. Babbage's machine had a well-defined specification: mathematical correctness. For language models, defining what "correct" means is part of the challenge. They're not just calculating; they're navigating the inherent ambiguity and contextuality of human language.
And yet, both cases highlight Don Norman's point about the vulnerability of systems at transition points. With Babbage, it was the transition from calculation to physical output. With modern AI, it's the transition from statistical pattern matching to human-consumable information.
The evolution of computation and uncertainty
People often mistake a powerful algorithm for a reliable system. I did, until I double-checked. And that's the problem: the fact that models are making predictions about text rather than reasoning about facts is why they produce such convincing-looking nonsense.
Perhaps the solution to hallucination isn't just better models, but better interfaces, better education, better integration into workflows, and better alignment with human psychology and values.
My concern is that, as with Babbage's navigation tables, we may be deploying these new systems in domains where their limitations matter, without sufficient guardrails. We may not cause shipwrecks, but we are seeing AI hallucinations in legal briefs, medical diagnoses, and other high-stakes domains.
The Edinburgh Review's best answer was what we now call redundancy: the most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers. (Remember, computers were people, computing.)
What verification systems can we build for AI? In my last article I shared how Claude and DeepSeek double-checked their own results and corrected the answers, but the process gave me little confidence.
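To make that redundancy idea concrete for AI, here is a minimal sketch, assuming a hypothetical ask_model helper that sends the same prompt to a named model and returns its answer as text; the helper and the model names are illustrative, not a real API. The idea is simply the reviewer's: put the same question to separate, independent "computers" and only accept an answer they agree on.

```python
from collections import Counter

def ask_model(model_name: str, prompt: str) -> str:
    """Hypothetical helper: send `prompt` to the model named by `model_name`
    and return its answer as plain text. In practice this would wrap
    whichever chat APIs you actually have access to."""
    raise NotImplementedError("wire this up to your own model clients")

def cross_check(prompt: str, models: list[str]) -> str | None:
    """Put the same question to several 'separate and independent computers'
    and accept an answer only when a clear majority agrees."""
    answers = [ask_model(m, prompt).strip() for m in models]
    best, votes = Counter(answers).most_common(1)[0]
    return best if votes > len(models) / 2 else None  # None = unresolved disagreement

# Illustrative use (model names are placeholders):
# total = cross_check("What is the total of the third column in this table? ...",
#                     ["model-a", "model-b", "model-c"])
# if total is None:
#     print("The models disagree - treat the result as unverified.")
```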
And as the prescient reviewer said in 1834: several computers, working separately and independently, do frequently commit precisely the same error.
We see the same with LLMs: they can all hallucinate in similar ways because they share similar architectures and training data.
Diversity is our strength
I'd argue this is where scale and diversity of training give us leverage. As I study how people use generative AI for work, I've observed that experienced users develop habits of verification: they may ask the AI to show its reasoning, they may triangulate information with multiple queries, or sometimes they use the AI to generate hypotheses but verify facts independently.
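As a sketch of that last habit, here is the deterministic check I could have run on my scanned table: take the rows the chatbot extracted, re-total the column in ordinary code, and compare against the total the chatbot reported. The figures below are placeholders, not the numbers from my form.

```python
from decimal import Decimal

def total_matches(extracted_values: list[str], reported_total: str) -> bool:
    """Re-add a column of AI-extracted figures deterministically and compare
    against the total the model claimed. Decimal avoids floating-point
    surprises with money-like values."""
    return sum(Decimal(v) for v in extracted_values) == Decimal(reported_total)

# Placeholder figures for illustration only:
column = ["12.50", "7.25", "19.00"]
claimed_total = "38.75"
print("arithmetic checks out" if total_matches(column, claimed_total)
      else "mismatch: recompute or re-extract before trusting the total")
```

This only checks the arithmetic, of course; the extraction itself still has to be checked against the original scan.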
People are remarkably creative at developing compensatory strategies when using imperfect tools. But there's a design failure when the burden of verification falls entirely on the user.
And there's the deeper question of whether users will actually use such verification tools. There's a psychological aspect here that the Edinburgh Review article touched on, way back then. To complete what they said earlier about errors ... several computers, working separately and independently, do frequently commit precisely the same error; so that falsehood in this case assumes that character of consistency which is regarded as the exclusive attribute of truth.
People tend to trust what looks authoritative. Like this from my previous post. It sure looks accurate ...
Design for human needs and appropriate trust
And this is where design comes in. Babbage's machine was designed for computation, not for the end user. If we think about AI today, we're seeing the same problem. AI systems are often designed by engineers for engineers, without enough consideration for how they'll be used in the real world.
How do we design interfaces that convey appropriate confidence? How do we educate users about limitations? This goes beyond the models themselves to the entire user experience.
I think one approach is designing for appropriate trust rather than complete trust.
Whether we're talking about Babbage's machine or AI, the goal should be to create systems that serve human needs. That means not only making them accurate and reliable but also ensuring that they're accessible and understandable. If we design with the user in mind, we can avoid many of the pitfalls we've been discussing.
One lesson we can learn from 1834 is about purpose.
Babbage's machine was designed with a clear purpose: to eliminate errors in numerical tables. But as I've discussed, it didn't fully achieve that purpose because it didn't account for the entire system.
With AI, we need to be equally clear about our purpose. Are we building AI to serve humanity, or are we just chasing the next big thing? If we can answer that question, we'll be in a much better position to avoid the pitfalls of both Babbage's machine and AI hallucinations.
(Footnote: As it is currently Women's History Month, there are several articles on the web about Ada Lovelace, the author of the first true program and Babbage's working partner. I am not overlooking her, but she had only just met Babbage some months before these articles were written in 1834 and had only seen a prototype of the Difference Engine. I don't think she had worked on it at this stage. Her fascinating and genuinely groundbreaking work was done for Babbage's later Analytical Engine. So she is excused critique for now.)
And another footnote. The full text of the 1834 article …