Friction over Function: Scientists Clash on the Meaning of ENCODE's Genetic Data

Twelve years after the completion of the Human Genome Project, its successor made a big splash with one big number: Around 80 percent of the human genome is "functional," the researchers leading the Encyclopedia of DNA Elements (ENCODE) project said. Their claim drew immediate criticism from biologists, many of whom said it is evolutionarily impossible for so much of the genome to truly function for human health.

Seven months later, the controversy continues. Several journals and countless blogs have published opinion pieces about it. Current Biology published its second essay about it April 8. And in late February the journal Genome Biology and Evolution published an unusually harsh takedown that got some attention for zingers comparing ENCODE to Apple Maps, which had a troubled launch with the iPhone 5. How could the meaning of one word—function—be so divisive?

Funded by the National Institutes of Health’s National Human Genome Research Institute, ENCODE was designed to tackle the data generated by the NIH’s Human Genome Project, which determined the sequence of chemical bases—adenine, cytosine, thymine and guanine, the A, C, T and G sequences—that make up human DNA. Some groupings of bases spell out a code to make specific proteins, which do much of the work in cells, but scientists do not know what the lion’s share of base sequences do.

The 98 percent

So ENCODE tested nearly every part of the genome, particularly the 98 percent that is not involved in encoding proteins, looking for clues to what roles they play in the body. This next step was important because scientists were sure that some portions of that 98 percent served as regulators, telling protein-makers when, where and how much to produce. Such a job is critical for normal cellular behavior, yet scientists understood only some specific examples. They did not know if there were more regulators than they had already found or, if others existed, how they worked. Such regulatory regions may help explain the basis of many diseases that seem to be genetically inherited but escape straightforward correlations to particular protein-coding genes.

In September 2012 ENCODE's leaders formally ended the project's main phase of research. They published dozens of peer-reviewed papers, including the lead paper in Nature that said 80 percent of the genome is functional. At the same time they published a database that annotated most of the nonprotein-coding genome with notes on its chemistry. The notes essentially said things such as: "This part binds a protein"; "This part is often tagged with methyl groups"; and "This part is usually tucked away, wound around a protein called a histone." (Scientific American is part of Nature Publishing Group.)

Much of the backlash isn't in response to the database of functional parts that ENCODE created. "The ENCODE project gave the scientific community a huge amount of useful data that is being used around the world," says Chris Ponting, a genomics researcher at the University of Oxford who disagrees with some of the conclusions about functional DNA that came from ENCODE. Instead, the major criticism is that the project's lead scientists overstepped in their conclusions, especially in publicizing the idea that much of the human genome was potentially necessary to human life. Such determinations aren't supported by the science ENCODE did, critics say, and offer the public an inaccurate idea of how genetics and evolution work.

The problem comes from the fact that ENCODE looked for chemically active parts in the DNA and called those parts "functional." Not all of that activity is necessarily important for human life, however. For example, ENCODE scientists looked for DNA regions that bind to proteins, because such binding is essential to opening, reading and bookmarking DNA. But a region can also bind proteins without affecting human health. The human genome is full of DNA picked up from viruses in our evolutionary past. Sequences that don't harm or help their hosts may still contain regions that bind to proteins or do other things without affecting cell function.

Regulatory revelations

ENCODE inevitably ended up recording certain regions as active and functional that likely don't do anything important in the body. ENCODE's definition of functional does not have anything to do with why certain regions might be important or what exactly the regions are doing for human health, says John Stamatoyannopoulos, a genomics researcher at the University of Washington in Seattle and one of ENCODE's senior scientists.

Nevertheless, he and some other biologists think ENCODE’s 80 percent conclusion could offer a new view of the human genome. The fact that so much of the genome was biochemically active suggests that much more of the genome may be regulatory than previously believed, Stamatoyannopoulos says. Even some sequences that originally came from a virus or another parasite may have been co-opted to do something useful for the human body. "I just think that the sophistication of this regulatory network is just going to continue to increase and expand our minds," says Eric Schadt, a geneticist at the Icahn School of Medicine at Mount Sinai who was not involved in ENCODE. "I think we will see that the vast majority of the genome can play a role in that."

Active but not important

Critics emphasize that ENCODE was not designed to test how much of the nonprotein-coding genome is doing something important for human health. They say that without first performing experiments that show exactly how the newly discovered "functional" regions impact the body, it's irresponsible to say science has learned something new and revolutionary.

ENCODE's leaders have painted a picture of the human genome in which most of the parts are efficiently put to use, and that's not the right way to see it, critics say. "It's important to distinguish between: Is the human genome a perfect machine? The best of all possible genomes? Or is it a mess?" says Sean Eddy, a genomics researcher at the Howard Hughes Medical Institute's Janelia Farm Research Campus in Virginia who helped plan ENCODE. "What we know about genomes is far more compatible with its being a glorious mess."

By “mess,” Eddy is referring to conclusions from mathematical models of evolution, which suggest at least 85 to 90 percent of the genome must not be critical to human health, even if it is chemically active. Part of the reasoning is that so many random mutations arise over time, humans would have died off if most of the genome were so critical that mutating it would have a major effect on health. On the other hand, Stamatoyannopoulos and Schadt say that those models, some of which rely on simple equations that have been around since the 1960s, could have gotten their numbers wrong. That's possible, Eddy says, but scientists should develop better arguments against the models before discounting them.
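The arithmetic behind that reasoning can be sketched in a few lines of Python. This is a rough illustration of one textbook form of a mutation-load model, not the specific calculation Eddy or ENCODE's critics used. It assumes that harmful mutations act independently, so that average fitness falls off as exp(-U), where U is the number of new harmful mutations per child. The mutation count per child and the share of harmful mutations are illustrative assumptions rather than figures from the article, and the function names are hypothetical.

```python
import math


def harmful_mutations_per_child(new_mutations, functional_fraction, harmful_fraction):
    """Expected number of new harmful mutations in one child.

    new_mutations: new mutations per child across the whole genome (assumed).
    functional_fraction: share of the genome treated as critical to health.
    harmful_fraction: share of mutations in critical DNA that do harm (assumed).
    """
    return new_mutations * functional_fraction * harmful_fraction


def mean_relative_fitness(harmful_per_child):
    """Average fitness relative to a mutation-free individual, exp(-U)."""
    return math.exp(-harmful_per_child)


def births_needed_per_couple(harmful_per_child):
    """Average births per couple needed to keep the population size constant."""
    return 2.0 / mean_relative_fitness(harmful_per_child)


if __name__ == "__main__":
    NEW_MUTATIONS = 70      # assumption for illustration only
    HARMFUL_FRACTION = 0.4  # assumption for illustration only

    # Compare a mostly nonessential genome with a mostly essential one.
    for functional_fraction in (0.10, 0.15, 0.80):
        u = harmful_mutations_per_child(
            NEW_MUTATIONS, functional_fraction, HARMFUL_FRACTION
        )
        births = births_needed_per_couple(u)
        print(
            f"critical fraction {functional_fraction:.0%}: "
            f"{u:.1f} harmful mutations per child, "
            f"about {births:.3g} births per couple to break even"
        )
```

Under these assumptions, the number of births needed grows exponentially with the share of the genome treated as critical, which is why such models conclude that most of the genome cannot be essential. Different assumed inputs shift the numbers considerably, and that is the opening Stamatoyannopoulos and Schadt point to when they say the models could be wrong.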

Don't bet on a resolution anytime soon. After all, discerning what counts as essential and nonessential DNA for the human body is difficult. Any change to the genome, no matter how small, would likely make some difference to the overall organism—a kind of butterfly effect for DNA, Eddy says. Function lies on a continuum, and different scientists will likely define it differently for years to come.

© 2013 ScientificAmerican.com. All rights reserved.