Why our decade of genome sequencing should end.
I can pinpoint the precise moment when I lost it. On Oct. 27, an e-mail from Nature Genetics said the journal was publishing a paper announcing that scientists had completed the sequencing of the genome of . . . the cucumber.
I take a back seat to no one in my love of cukes—what would life be without pickles?—and have no reason to doubt that the achievement will help breeders reach even greater agricultural heights. And no one respects the achievements of this decade’s genome sequencers more than I do. But with the cucumber announcement, I had to make sure the return address on the press release was not theonion.com. It wasn’t. So as we stagger across the finish line of the Decade of the Genome Sequence, hear my plea: stop them before they sequence again!
A brief recap. To sequence a genome means to determine the order of every one of the millions or billions of chemical units (called base pairs and designated A, T, C, or G) that make it up. The sequencing of the human genome was completed in April 2003, at a cost of nearly $3 billion. Six years later, anyone with a few thousand dollars can get his or her genome partially sequenced (a full sequence will set you back $50,000, though that’s dropping fast), to look for genes that either raise the risk of disease or trace ancestry. Before and after tackling the human genome, sequencing labs spit out Anopheles mosquito, honeybee, mustard plant, anthrax bacterium, dog, sea squirt, fruit fly, red jungle fowl, chimpanzee, japonica rice, white rot fungus, malaria parasite, poplar tree, mouse, rat, Japanese pufferfish, papaya, an eye-glazing number of bacteria species, and many more. Even Neanderthal is almost done. All told, the entire genomes of 1,129 organisms and counting have now been sequenced, according to Genomes Online.
This is all laudable (I’ll get to what genome sequences have taught us), but with an asterisk. “It turns out to be a lot easier to generate the sequences than to figure out the function” of the stretches of DNA that sequencing reveals, says Richard Roberts, chief scientific officer of New England Biolabs and winner of the 1993 Nobel Prize in Medicine. Everyone expected that to be the case, he says, but “what’s amazing is that we keep putting more and more money into generating sequences and not enough into determining function. We’ve built all of these large-scale sequencing centers, and now we’ve got to keep feeding the beast.”
Roberts has called for more funding of actual biology—figuring out what the sequences mean. The reason that’s so necessary, says Claire Fraser-Liggett, director of the Institute for Genome Sciences at the University of Maryland School of Medicine, is that “having the ‘parts list’ [the entire genome sequence] doesn’t provide enough real insights into the actual biology of an organism, such as why one pathogen is more virulent than another.” (Fraser-Liggett is a pioneer in sequencing, having led teams that sequenced the genomes of scores of strains of bacteria, including anthrax.) “We need to take the sequence information back to the lab and figure out function. Just because we can sequence faster than we ever could doesn’t mean that’s the right thing to do, or that that’s the best way to move science and medicine forward.”
To be sure, genome sequences have revealed an astonishing amount about genetics, disease, and evolution. We now know that humans have only 20,000 or so genes, about what mice and flies have, raising the question of how we got to be so much more complex. We know that there are huge stretches of genetic “deserts” in the human genome, says Adam Felsenfeld, program director of large-scale sequencing at the National Human Genome Research Institute. By comparing the genome of one organism to that of another, scientists are figuring out when innovations such as the immune system evolved. And having the complete human genome sequence has turbocharged research linking DNA variants to disease. What took the team that discovered the cystic fibrosis gene years to accomplish in 1989, notes Richard Gibbs, who leads the sequencing center at Baylor College of Medicine, “could, with the human genome sequence, have been done by one graduate student in an afternoon.”
Such gene-disease links have come fast and furious since the completion of the human genome. Using a technique called genome-wide association studies—which requires having the genome sequence—scientists have found regions of the genome that contribute to age-related macular degeneration, type 2 diabetes, Parkinson’s disease, heart disease, obesity, Crohn’s disease, prostate cancer, and response to antidepressants. In just the last month scientists announced the discovery of a genetic variant that raises a child’s risk of suffering hearing loss after receiving cisplatin chemotherapy. Having the human genome sequence “definitely helped,” says Colin Ross of the University of British Columbia, who led the research. Without it, “it’s sort of like trying to find where a piece of a puzzle fits without the picture of the puzzle to refer to. The human genome is like that reference picture of the puzzle. Our research really could not have been done without it.”
It’s not just the human genome, either. Having the anthrax genome allowed the FBI to identify a suspect in the 2001 anthrax mailings. Having the yeast (Saccharomyces cerevisiae) genome let researchers identify genes that could boost bioethanol production. There is even hope for the cucumber. For plant breeders, having its genome sequence could eventually be like having a box of face-up Scrabble tiles, enabling them to choose particular traits more easily.
Still, it’s not always so easy. It takes a lot of further work to determine what genes are present in a genome sequence, let alone what traits those genes cause, let alone develop something useful from that knowledge. Take the ambitious “Genome 10K Project,” announced on Nov. 4. It aims to sequence 10,000 vertebrate species, “from whales and anteaters to pink flamingos,” says David Haussler of the University of California, Santa Cruz. “By sequencing this broadly, we’ll be able to identify the genetic changes that underlie the evolution of species from amphibians to reptiles to birds and mammals. We really will be able to identify how the leopard got its spots.” Tracing that DNA evolution could take as little as a decade once the genomes are sequenced. But once the scientists identify the regions of DNA that differ from one species to another, how long will it take to determine that this region is for spots and that region is for a four-chambered heart and that region for some other salient trait? “A century,” says Haussler.
And that’s why some very prominent scientists are wondering whether sequencing is threatening to become the Beast that ate biology. “Understanding function is excruciatingly slow compared to the time it takes to generate sequence information,” says Fraser-Liggett. So are genome sequences low-hanging fruit, being done because they can be done and not because they will produce the greatest value to science and medicine? “We’ve all been seduced by the ability to generate genome sequences at such a low cost,” she says. “You have to ask, is it time to start shifting resources and attention to figuring out how to enhance annotation [the process of determining the function of each stretch of DNA]? I think it is. We shouldn’t stop sequencing, but I’ve started to ask whether we should deploy more resources to decreasing the ever-growing divide between sequence information” and our understanding of it. More information is always better than less, but even better is understanding the information you go to the trouble of collecting.