Revealing the Human Catalogue: Eight-year effort to catalogue human genetic variation comes to an end

The 1000 Genomes Project sampled DNA from diverse human populations from around the world. Image courtesy of Mark Gerstein.

Four to five million: that is roughly the number of sites at which your DNA differs from the human reference genome. Yet you can still talk, breathe, digest your food, and read this article. With all this variation comes the potential for harm. So how are we all alive and (mostly) healthy?

The 1000 Genomes Project, an effort launched in 2008 to catalogue human genetic variation, sought to help answer this question by laying the groundwork for linking genes to disease. By the project's completion, researchers had sequenced 2,504 genomes (complete sets of a person's DNA) from 26 human populations around the world.

Before biologists can make a clear association between gene and disease, they must understand genome variation in healthy people. “If you only have a healthy person and a sick person, you can think every change is deleterious, and that’s not true. Most of the changes in the genome are innocuous,” said Mark Gerstein, Yale professor of biomedical informatics and one of the researchers involved in analyzing the project data.

The 1000 Genomes Project found that an average human genome differs from a reference genome at 4.1 million to 5.0 million sites. Most of the variants found in a given genome are common: between 96 and 99 percent of a person's variants were found in more than 0.5 percent of all the people sequenced. The more common a variant is, the less likely it is to be seriously harmful. “If it has survived in the human population this long, it is probably not deleterious,” Gerstein said.

Conversely, rare variants are more likely to be harmful.

Nearly all variants (more than 99.9 percent) are small. Many are single nucleotide polymorphisms (SNPs), in which one base, the smallest unit of DNA sequence, is substituted for another. Structural variants, or SVs, make up the remaining fraction of a percent, yet they affect a larger total number of bases. SVs include the deletion and duplication of large sections of DNA. They can have a range of effects: some, like the extra copy of chromosome 21 that causes Down syndrome, are especially severe. Others have been linked to obesity and cancer.

Cancer cells tend to accumulate many SVs. “You get the extreme of structural variation in cancer. At an advanced stage, the genome gets deeply messed up. Whole chromosomes get rearranged, whole genes are duplicated,” Gerstein said.

However, many SVs have no visible effect. According to the 1000 Genomes Project, people can remain healthy even when their genomes are substantially altered. Although the average person's genome has between 2,000 and 2,500 SVs, affecting roughly 20 million base pairs, the study's participants were adults in relatively good health.

In fact, of the roughly 20,000 genes in the human genome, the average person carries about 100 that have been knocked out by mutation, leaving them completely or partially nonfunctional. These are called loss-of-function events, but, surprisingly, they rarely lead to disease. “That’s the amazing thing about the natural catalog. You can have fairly large variation and have healthy people. There are a lot of genes that can be knocked out,” Gerstein said.

The study of non-essential genes could have implications for medical research. Some therapies, such as gene-silencing drugs, are designed to deactivate specific genes. “It’s useful if you want to develop a drug that can affect a particular gene,” Gerstein said. “It’s good to know that it’s okay if a drug knocks out a gene.”

Even though the 1000 Genomes Project has officially wrapped up, Gerstein’s lab is continuing to study SVs. In his words, structural variation is a complex phenomenon with fascinating functional impacts.