How a Computer Program Helped Reveal J. K. Rowling as Author of The Cuckoo’s Calling

Patrick Juola

August 20, 2013 at 7:00 AM

“The man who wrote the note is a German. Do you note the peculiar construction of this sentence?” These were the words of Sherlock Holmes in “A Scandal in Bohemia,” analyzing a note from a client, unmasking the King of Bohemia incognito, and incidentally, establishing himself as a brilliant literary analyst. It is impossible to keep a secret from the legendary Sherlock Holmes, who can read an ocean from a drop of water. Just as the paper would have carried the marks of the royal fingers, to the skilled reader the writing carried the marks of the royal mind.

Fiction has recently become fact with the improving science of stylometry, the study of writing style. In 1964, Frederick Mosteller and David Wallace published a three-year study of the distribution of common words in the Federalist Papers and showed that the writing styles of Alexander Hamilton and James Madison differed in subtle ways. For example, only Madison used the word “whilst” (Hamilton used “while” instead). More subtly, while both Hamilton and Madison used the word “by,” Madison used it much more frequently, enough that you could guess who wrote which papers by looking at how frequently the word was used. Mosteller and Wallace took this work to its conclusion, and were able to show that certain “disputed” papers, claimed by both Hamilton and Madison, were overwhelmingly likely to have come from Madison’s pen. Today, computers can do this type of analysis in seconds, whether to uncover a case of murder-disguised-as-suicide, study an anonymous medieval poem, resolve disputes about authorial credit, or even provide political asylum for a refugee. In the last case, for example, a critic of a repressive foreign government claimed asylum on the basis of articles he had written and published online. The problem, though, was that the articles had been published anonymously. This wouldn’t necessarily stop a repressive secret service, in a place where mere suspicion is enough for imprisonment. But this technology was able to convince the immigration judge of his authorship of the documents in question, and hence to let him stay.

Over the past decade, I have developed a computer program to do this sort of analysis of writing style, based on literally millions of different features. This program will take a sample of writing and determine, on the basis of similarity, who among a set of authors was most likely to have written that sample. In July, I received an email from a reporter for London’s Sunday Times asking if I could help them solve a mystery. The reporter had received a tip that J. K. Rowling had secretly penned a novel under a pen name: The Cuckoo’s Calling, by Robert Galbraith, who was described as a former member of the Royal Military Police, and whose novel had grown “directly out of his own experiences and those of his military friends.” The tip was at least plausible. Rowling and Galbraith had the same agent and editor. The book was unusually accomplished for a supposed first-time novelist. And Galbraith, a man who had ostensibly spent years in uniform, was surprisingly good at describing women’s clothing. But hard evidence was still lacking. The reporter wanted to know what the computer program could determine.

Language use is a set of personal choices. For example, the English language provides a tremendous number of choices for words to describe something bigger-than-big, words such as “huge,” “giant,” “enormous,” or “colossal.” Writers can choose to express an idea with a few precise words or a bunch of common, general ones, and similarly to break a complicated idea — or not — into bite-sized simple sentences. We’re not even conscious of many of these choices.

In a famous experiment, psychologists showed that people’s memory for the general meaning of sentences was much better than their ability to recall a sentence word-for-word. For example, subjects who heard the sentence — “The raccoons raced up the tree and the dogs raced around them” — were asked a few minutes later if they had heard the sentence: The raccoons raced up the tree and the dogs raced around it. If you’re reading quickly, you may not have noticed the minor change in the last word. Most subjects couldn’t tell the difference, either. People don’t pay much attention to these common little words as long as they understand the meaning of the sentence (the dogs are running around a tree with some raccoons in it), but the author’s fingerprints are visible in the pronoun choice.

The program I developed, JGAAP (Java Graphical Authorship Attribution Program), does a mathematical analysis of the degree of similarity across a huge number of features, far too many for any human analyst to keep track of. Mosteller and Wallace, for example, looked at about thirty different words. JGAAP can keep track of every word in a set of encyclopedias. By looking at Galbraith’s language choices, the program could quantify the degree of similarity between Rowling and Galbraith. If they were completely different, this could effectively rule out Rowling as an author and discredit the tip. If they were very alike, especially in comparison with other authors of the same type, it would show she was a likely author. While this wouldn’t prove that Rowling had written it, it would be a strong form of objective evidence.

It is important to decide carefully what kinds of similarities to look at. Not all choices are created equal; some choices (such as word length) are easier to notice, control, and change than others (such as the use of prepositions). It’s often better to examine many different features than only a few, and to run many analyses to see if they agree. For this analysis, I chose four separate groups of features that have been shown to provide useful information about authorship. Just as importantly, they are also relatively independent of each other, so they provided cross-checks on each other. One variable that I used, for example, is the distribution of word lengths. Each novel has a lot of words, each word has a length, and so one can get a robust description that such-and-such percent of the words in this document have exactly so-many letters. I was able to get a measurement of similarity, with 0.0 being identity and progressively higher numbers being greater dissimilarity.
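To make the word-length idea concrete, here is a minimal sketch in Python. It is not JGAAP's code: the function names are hypothetical, the tokenizer is deliberately crude, and the distance shown (a sum of absolute differences) is only an assumed stand-in, since the article does not say which distance measure was used. It does share the property described above: 0.0 means identical distributions, and larger values mean greater dissimilarity.

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Return {word length: fraction of words in the text with that length}."""
    # Crude tokenizer (an assumption): runs of ASCII letters count as words.
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def histogram_distance(p, q):
    """Sum of absolute differences between two distributions.

    Assumed measure for illustration only; 0.0 means the two
    distributions are identical, larger means more dissimilar.
    """
    lengths = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in lengths)


sample_a = word_length_distribution("the cat sat on a mat")
sample_b = word_length_distribution("a a a")
print(histogram_distance(sample_a, sample_a))  # identical texts give 0.0
print(histogram_distance(sample_a, sample_b))
```

With whole novels rather than toy sentences, each distribution rests on tens of thousands of words, which is what makes the measurement robust.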

Another feature was the 100 most common words: what percentage of the document was “the,” what percentage was “of,” and so on. This is again a rich data set that is easy to extract by computer. Finally, I ran two tests based on authorial vocabulary. The first was on the distribution of character 4-grams, groups of four adjacent characters. These could be words, parts of words (like the four letters “nsid” inside the word “inside”), or even parts of two words (like the four characters “n th” in the phrase “in the”). I also ran a test on word bigrams, pairs of adjacent words (like “pairs of,” “of adjacent,” and “adjacent words”), again a feature with a good track record. One advantage of this approach is unfortunately also a disadvantage. With thousands of features tracked, it’s difficult to point to any small set of features and say “these are what make this like Rowling.” Stylometry, like sports, is often a game of inches.
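The three feature types just described are simple to extract. The sketch below is an illustration in Python, not the JGAAP implementation; the function names and the lowercase tokenizer are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_word_shares(text, n=100):
    """Share of the document taken up by each of its n most common words."""
    words = tokenize(text)
    total = len(words)
    return {w: c / total for w, c in Counter(words).most_common(n)}


def char_ngrams(text, n=4):
    """Counts of every run of n adjacent characters, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_bigrams(text):
    """Counts of every pair of adjacent words."""
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


print(char_ngrams("in the"))    # includes the cross-word 4-gram 'n th'
print(word_bigrams("pairs of adjacent words"))
print(most_common_word_shares("the cat and the dog", n=1))
```

Each function returns a table of counts or shares for one document; comparing two documents then means comparing two such tables with some distance measure.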

For this study, the reporter and I selected a Rowling novel and novels by three similar writers, all British female crime novelists, to see which one was most similar to Galbraith: Rowling’s own The Casual Vacancy, Ruth Rendell’s The St. Zita Society, P. D. James’s The Private Patient, and Val McDermid’s The Wire in the Blood. Across the four analyses, Rowling was the only writer to consistently match Galbraith’s style. Val McDermid, for example, used word pairs in a very similar way to Galbraith, but her use of long and short words was highly unlike Galbraith’s. Galbraith’s word length distribution was instead similar to Rowling’s or James’s.
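The cross-checking logic can be sketched as follows. This is an illustration only: the function name is hypothetical, the candidates are placeholders labeled A through D, and the distances are made-up numbers chosen to show the mechanics, not the values measured in the study. A candidate survives only if it ranks among the closest in every analysis.

```python
def consistently_close(distances, top_k=2):
    """Return the candidates ranked among the top_k closest in every analysis.

    distances maps analysis name -> {candidate: distance to the questioned text},
    where a smaller distance means a more similar style.
    """
    survivors = None
    for per_candidate in distances.values():
        ranked = sorted(per_candidate, key=per_candidate.get)
        closest = set(ranked[:top_k])
        survivors = closest if survivors is None else survivors & closest
    return survivors or set()


# Made-up distances for four placeholder candidates (NOT the study's numbers).
toy_distances = {
    "word_lengths": {"A": 0.10, "B": 0.40, "C": 0.15, "D": 0.50},
    "common_words": {"A": 0.20, "B": 0.25, "C": 0.60, "D": 0.55},
    "char_4grams":  {"A": 0.30, "B": 0.70, "C": 0.65, "D": 0.35},
    "word_bigrams": {"A": 0.12, "B": 0.45, "C": 0.50, "D": 0.11},
}
print(consistently_close(toy_distances))  # {'A'}
```

In the toy data, candidate D is the closest match on word bigrams but is excluded by word lengths, which mirrors the way a single test can rule out an otherwise similar author.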

Interpreting these results can be tricky, but simple statistics can illustrate how tight this match is. First, all of the authors except for Rowling were clearly excluded by at least one test. Whoever the author of Cuckoo was, it wasn’t Ruth Rendell. With four authors, a randomly chosen author would be equally likely to be closest to James as to McDermid, or just as likely to be distinct from Rendell as Rowling. If the author wasn’t any of the four, she would be just as likely to be “close” to Galbraith (meaning one of the top two likely authors in the list) as “distant” (the third or fourth candidate). To put this another way, if Rowling had not written Cuckoo, she would have only a 50/50 shot of having similar word lengths. She would also have only a 50/50 chance of having similar word pairs, of having similar character clusters, or similar common words. Only one writer in 16 would be “lucky” enough to have that similar a writing style to Galbraith’s. If Rowling wasn’t the author, then the tipster had only about a 6% chance of naming someone that consistently similar.
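The arithmetic behind the "one in 16" figure can be checked in a few lines of Python. The sketch assumes, as the argument above does, that the four tests are independent and that an unrelated author lands in the closer half of the four candidates with probability 2/4 on each test. The variable names are illustrative.

```python
from fractions import Fraction

# Chance that an unrelated author ranks in the top two of four on one test.
p_close_one_test = Fraction(2, 4)
n_tests = 4

# Assuming the tests are independent, the chances multiply.
p_close_all = p_close_one_test ** n_tests
print(p_close_all, float(p_close_all))  # 1/16 0.0625
```

Since the four feature sets are only "relatively" independent, the true chance is probably somewhat higher than 6%, so the figure is best read as a rough guide rather than an exact error rate.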

Did this “prove” Rowling’s authorship? Of course not. Even DNA can’t do that; a DNA match simply means that the person of interest or someone with similar genes, possibly a family member, was involved. Stylometry is much less reliable and accurate than DNA. After all, your DNA is constant and unvarying throughout your life, but if two novels didn’t vary at all, they’d be the same novel. All we really knew at this point was that the book was either by Rowling herself or by someone who wrote in a very similar style. But this was enough information for the Sunday Times to approach her agent. On July 13, 2013, she admitted that The Cuckoo’s Calling was her work, and that she had hoped, by publishing under a pen name, to get feedback without expectations.

This technology is clearly a double-edged sword. If Rowling can be identified by computational analysis, what about whistleblowers? Is anyone safe from the modern equivalent of Sherlock’s all-seeing eye? For the moment, yes. The person who truly violated Rowling’s privacy was not my computer or even the Sunday Times reporter, but the tipster who suggested the investigation in the first place. It’s simply not feasible to look at every potential author to see who might have written a book; without old-fashioned detective work (and informants), the haystack is still large enough that needles can successfully hide.

Are you a scientist who specializes in neuroscience, cognitive science, or psychology? And have you read a recent peer-reviewed paper that you would like to write about? Please send suggestions to Mind Matters editor Gareth Cook, a Pulitzer prize-winning journalist and regular contributor to NewYorker.com. He can be reached at garethideas AT gmail.com or Twitter @garethideas.
