Reading Your Annotated Code: Mapping cytosine methylation with Nanopore sequencing

(Image courtesy of Professor Christoph Bock, Max Planck Institute for Genomics)

You’re more than just your genes. While the information stored in your DNA encodes all of the molecules of your body, this code is tagged with small chemical groups that help cells determine when and where to express certain genes. These small changes to your DNA, collectively known as epigenetic modifications, have an enormous effect on the cell and are heavily implicated in cancer. However, these tiny changes are much harder to detect than standard genetic changes.

Research led by Jared Simpson of the Ontario Institute for Cancer Research and Winston Timp of Johns Hopkins University presents a new way to locate sites of cytosine methylation, the most common epigenetic change to your DNA. The addition of a small chemical tag called a methyl group to the DNA base cytosine transforms it into 5-methylcytosine (5mC), typically reducing expression of the gene containing the modification. Like annotations to the genetic code, these subtle changes to the base’s structure serve as a secondary layer of information that regulates a gene’s expression.

Professor Timp discusses Nanopore sequencing data with a student. (Image courtesy of Professor Winston Timp)

Simpson and Timp were able to find the locations of these 5mC bases using a new sequencing technique, Nanopore. Instead of simply reading the base identity, as done by traditional sequencing methods, Nanopore measures changes in electrical current as a DNA strand passes through a tiny pore. Older methods for determining the methylation status of cytosine require a chemical modification of the DNA. For example, bisulfite sequencing, the previous gold standard, converts unmethylated cytosine into uracil, a different base, while leaving 5mC unchanged; the methylated sites can then be identified by standard sequencing. But Nanopore is different.

“You don’t have to do any chemical modifications,” Timp said. “When you do sequencing, you’re getting methylation for free.” When Nanopore reads the DNA, the electrical signals it receives are actually different between 5mC and unmodified cytosine. These differences, however, are not obvious. Simpson and Timp developed a software program that uses a probabilistic model to analyze the difference between the signals produced by cytosine and 5mC. They began by producing a set of “training data” to serve as a standard for comparison. With the model established, they were able to move on to sequencing stretches of natural DNA and comparing their methylation calls to those from bisulfite sequencing. Their results were nearly as accurate, and they came from a single assay.
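To give a feel for how a probabilistic comparison like this can work, here is a minimal sketch in Python. It is not the researchers' software, and the article does not describe their model's internals. The sketch assumes each current measurement follows a bell curve (a Gaussian) whose center shifts slightly when the cytosine is methylated, and every number in it is invented for illustration. The function names `gaussian_logpdf` and `methylation_log_ratio` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, stdev):
    """Log of the normal density at x, for the given mean and standard deviation."""
    z = (x - mean) / stdev
    return -0.5 * z * z - math.log(stdev * math.sqrt(2.0 * math.pi))


def methylation_log_ratio(currents, unmethylated_model, methylated_model):
    """Add up, over all measurements, how much better the methylated model
    explains each current level than the unmethylated model does.

    A positive total favors 5mC; a negative total favors plain cytosine.
    """
    total = 0.0
    for x, (mu_c, sd_c), (mu_m, sd_m) in zip(
        currents, unmethylated_model, methylated_model
    ):
        total += gaussian_logpdf(x, mu_m, sd_m) - gaussian_logpdf(x, mu_c, sd_c)
    return total


# Hypothetical (mean, stdev) current levels for three consecutive measurements
# over one site. In practice, values like these would be learned from training data.
UNMETHYLATED = [(80.0, 2.0), (95.0, 2.0), (70.0, 2.0)]
METHYLATED = [(83.0, 2.0), (99.0, 2.0), (72.0, 2.0)]

# Made-up observed currents for one read passing over the site.
observed = [82.5, 98.0, 71.8]

score = methylation_log_ratio(observed, UNMETHYLATED, METHYLATED)
call = "methylated" if score > 0 else "unmethylated"
print(f"log-likelihood ratio = {score:.2f}, call = {call}")
```

No single measurement settles the question, since the two bell curves overlap heavily. Summing the evidence across several measurements is what lets a subtle shift in current become a confident call.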

“Because the reads are so long, you can look at very long-range patterns of methylation,” Simpson said. Since long-range patterns of dysregulation throughout the genome are implicated in cancer, the ability to read methylation along long stretches of DNA is an important development in the study of epigenetics and disease.

5mC is not the only methylation event with medical relevance. Cytosine can be methylated at different locations, as can other bases. Detecting these other modifications will require training the model again on new data. “That’s the obvious next step,” Simpson said. The researchers will continue working on methylation analysis, and they have made their code open-source.

“All this code is freely available, so that anybody who now sequences the human genome can run this methylation analysis, essentially for free,” Simpson said. In doing so, the team has not only uncovered something promising about DNA methylation but also opened up a new avenue of research using Nanopore sequencing to investigate human epigenetics.