Test and trace scandal: How bungled spreadsheets have been feeding disaster since 1800 BC

Spreadsheet illustration

When the affair of the London Whale became public in 2012, it sent shockwaves through the global financial system. Senior traders at JP Morgan Chase, scrambling to mitigate existing losses, had made a series of complex financial bets so big that they had visibly shaken the market.

In the end, the bank lost more than $6bn (£4.6bn) and paid almost $1bn in fines. It was, as long-serving chief executive Jamie Dimon said later, "the stupidest and most-embarrassing situation I have ever been a part of". And it might not have happened if not for a badly-written Excel spreadsheet.

As it turned out, JP Morgan's financial wizards had built their whole financial risk assessment system as a series of Excel spreadsheets, with crucial data being copied and pasted manually between them. Some of the formulas had errors, leading the system to dangerously underestimate the level of risk.

Today, Britain's health services appear to have suffered an Excel error even more consequential. We do not yet know how much suffering, or even death, the temporary loss of 16,000 coronavirus case records by Public Health England (PHE) – and the resulting distortion of England's Covid-19 statistics – may have caused.

But there is good evidence that it happened in part because PHE used an outdated version of Microsoft's ubiquitous 33-year-old spreadsheet software, often bundled with Windows and pre-installed on millions of computers, as an essential component of a £12bn track and trace system of critical national importance.

"Staggering incompetence," said Sarah Wollaston, the former Conservative and then Liberal Democrat MP for Totnes who was a GP before entering Parliament. "Please don’t use Excel when what you really need is a proper database!" said Christina Warren, a developer liaison at Microsoft. "That's...." said Wikipedia founder Jimmy Wales, trailing off as if lost for words.

Even "Squeezy" John Spiers, a squeezebox player and founding member of the English folk band Bellowhead who studied natural sciences at Cambridge, weighed in, calling it "weapons-grade incompetence".

Yet perhaps the most damning verdict came from Simon Thorne, a senior lecturer in information systems at Cardiff Metropolitan University, who called it an "astronomical failure" and urged that whoever was responsible for the "baffling" decision be fired.

It was not so much his words themselves that stung – he does not seem to be a fan of this Government in general – as his expertise: he is head of the European Spreadsheet Risks Group (EuSprig), an academic body that has been attempting to warn business leaders about the dangers of spreadsheets for more than a decade.

Bungling spreadsheets since 1800 BC

On its website, EuSprig has compiled a long list of "horror stories" at banks, airlines, police forces and the London 2012 Olympics committee where duff spreadsheets caused total financial discrepancies of almost £8.7bn – not including non-financial costs such as rigged elections in Malaysia and private data leaks.

One of the most consequential was a 2010 research paper by the economists Carmen Reinhart and Kenneth Rogoff, whose finding that too much national debt leads economies to suddenly stutter was cited by numerous hawkish politicians attempting to trim back government budgets in the wake of the 2008 crash.

In fact, Reinhart and Rogoff had omitted five countries out of 20 from their Excel spreadsheet, meaning that their original average GDP drop of 0.1pc was really an average increase of 2.2pc.

Slapdash use of Excel has also been blamed for a plague of errors in studies of genetics, potentially clouding our understanding of our own nature. Genes such as Septin 2 (often abbreviated to "SEPT2") and Membrane-Associated Ring-CH-Type Finger 1 (understandably abbreviated to "MARCH1") had been automatically converted into dates by the helpful software.

Rather than try to change the way Excel worked or wrangle with its options, geneticists decided to simply rename 27 human genes. One top scientist despaired that Excel should only be used for "lightweight scientific analysis", and never "clinical trials".
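The gene-name problem described above can be illustrated with a rough sketch in Python. It does not reproduce Excel's actual date-detection rules, which are not described here; the month list, the pattern and the function name `looks_like_date` are all assumptions made for illustration. It simply flags symbols that consist of a month-like prefix followed by a one- or two-digit number, the shape that tends to be read as a date.

```python
import re

# Month-like prefixes that a spreadsheet's date auto-detection may match.
# Illustrative approximation only, NOT Excel's actual parsing rules.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional hyphen, then a one- or two-digit number.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES),
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol has the shape of a day-and-month date."""
    return bool(DATE_LIKE.match(symbol.strip()))


if __name__ == "__main__":
    # SEPT2 and MARCH1 come from the article; TP53 is a contrasting symbol.
    for symbol in ("SEPT2", "MARCH1", "TP53"):
        print(symbol, looks_like_date(symbol))
```

A check of this kind could be run over a column of gene symbols before the file is ever opened in a spreadsheet, so that risky entries are caught while they are still plain text.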

Nor is this a new problem. The first known spreadsheet error appears to have occurred around 1800 BC, in a Babylonian accounting tablet now known as Plimpton 322. Scholars generally agree that the tablet contains six mathematical errors, although it is not known what risk those errors caused. By comparison, The Telegraph was late to the party when it warned readers about the issue in 2005.

Pic by Paul Grover at the British Museum, which as part of a major new initiative is taking part of its collection on the road. One of the pieces is a new acquisition, the 'Queen of the Night', a 4,000-year-old terracotta relief of a Babylonian goddess - Paul Grover/Telegraph

'Simply not what it was designed for'

Despite all this, most statisticians and data scientists seem to agree that Excel itself is not the problem. Just as Bruno Iksil, the "London Whale" himself, has always maintained that he was merely the patsy for a systemic corporate failure, the fault appears to lie not in Excel but in ourselves.

"This is not Excel's fault, any more than it would be the fault of a Swiss army knife if you used it to build a bridge and the bridge fell down," said Adam Jacobs, a medical statistician and director of biostatistics at the clinical trial consultancy Premier Research, on Monday.

Instead, he said, "the fault is with whoever decided to use it in critical systems".

Speaking to The Telegraph, he elaborates: "[This is] simply not what it was designed for. Excel is really for doing calculations on a relatively small scale, not for handling large quantities of data."

For example, he says it has caps on data storage, especially in the older versions that PHE was reportedly using. It also comes with "zero data validation or error checking" by default, and makes it hard to set restrictions on what type of data can be entered into each cell, as any "grown-up database" would allow.

Working in clinical research, he says he does not use Excel to handle serious medical data. But some "poorly-funded" academics might, largely because so many people already have it on their computers and know how to use it. By comparison, more specialised data-crunching methods usually require a programming language.

"The beauty of Excel is that your data is right there on the screen in rows and columns," says Louis Goddard, a data investigator at the anti-corruption charity Global Witness, who is more used to working with the stats-focused programming language R.

"Proper databases are much better for managing large and complex data sets, but they hide them away and make you write arcane ‘queries’ in complex coding languages to get at them.

"Data scientists hate Excel because it’s so easy to make catastrophic mistakes when using it, but it remains the lingua franca even in high-risk fields like investment banking, where knowledge of all the Excel keyboard shortcuts is a status symbol."

Indeed, while PHE's use of Excel may have been unsafe in principle, it also made very specific mistakes. While the most recent Excel file format, .xlsx, has room for about a million rows of data – and bear in mind that "recent" here means since 2007 – PHE was still using the older .xls file format (est. 1987), which has room for only 65,536 rows.
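The arithmetic behind that limit is simple enough to sketch in a few lines of Python. The two row limits are the published caps for each format (65,536 is 2 to the power of 16, and the "about a million" of .xlsx is 1,048,576, or 2 to the power of 20). The function names and the 70,000-row example are hypothetical, chosen only to show how a check could refuse to write a file rather than quietly drop the overflow; nothing here is a description of PHE's actual pipeline.

```python
# Worksheet row limits for the two Excel file formats.
XLS_MAX_ROWS = 65_536       # legacy .xls: 2**16 rows
XLSX_MAX_ROWS = 1_048_576   # .xlsx: 2**20 rows


def rows_lost(total_rows: int, max_rows: int) -> int:
    """Return how many rows would not fit on a sheet capped at max_rows."""
    return max(0, total_rows - max_rows)


def check_fits(total_rows: int, max_rows: int) -> None:
    """Raise an error instead of letting the overflow vanish silently."""
    lost = rows_lost(total_rows, max_rows)
    if lost:
        raise ValueError(
            f"{lost} of {total_rows} rows would be dropped (limit {max_rows})"
        )


if __name__ == "__main__":
    # Hypothetical file of 70,000 rows, for illustration only.
    print(rows_lost(70_000, XLS_MAX_ROWS))   # rows that would not fit in .xls
    print(rows_lost(70_000, XLSX_MAX_ROWS))  # rows that would not fit in .xlsx
```

The design point is the `ValueError`: a loud failure at the moment of export is far cheaper than discovering days later that thousands of records never arrived.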

Unfortunately, it may be impossible to wean the world off using Excel for such serious purposes. Some software specialists argued that in many situations, where patchily-trained employees in budget-strained institutions must bodge together whatever solutions they can, Excel is impossible to avoid in practice.

One source told British cyber-tabloid The Register that Excel was the "default for all tech in all of the NHS and related quangos and other bodies... to bridge all the gaps that the 'proper' tech hasn't been designed to cope with." A pseudonymous cybersecurity blogger claimed to know of nuclear power plants and airports that depended on Excel spreadsheets, or something as basic.

Even PHE seems unable to kick the habit. According to reports, it has temporarily solved the problem by splitting up its big, broken Excel spreadsheets into batches of smaller Excel spreadsheets.