A Toxin Uncovered: Revealing the structure of colibactin, a cancer-causing toxin from bacteria

Humans have a complicated relationship with bacteria. Some are “germs,” pesky agents of infection that infiltrate us from the outside. Others live inside us as part of our microbiomes, serving essential roles in our digestive and immune systems. But there is an increasing need to understand a third kind of bacteria: the type that lives inside us but does not help us. Research has suggested that these bacteria may, in fact, produce toxins that cause cancer.

Certain strains of gut E. coli have long been linked to the development of colorectal cancer. Researchers think that these E. coli promote cancer by producing a toxic compound called colibactin. However, we do not know much about colibactin’s disease-causing capabilities because colibactin is nearly impossible to isolate and characterize experimentally. As a result, labs across the country have raced to solve the mystery of colibactin’s structure, a critical step towards understanding its role in disease.

Researchers in the Herzon and Crawford labs in the Yale Department of Chemistry have resolved the structure of colibactin using a combination of biology, analytical chemistry, and synthetic organic chemistry. Their elucidation of colibactin’s structure will finally allow scientists to investigate the precise mechanisms behind its cancer-causing abilities and develop probiotics to combat these effects. Beyond colibactin, however, their interdisciplinary approach provides a new paradigm to uncover many other still unknown disease-causing bacterial products.

Isolating colibactin, unconventionally

Colibactin was first identified in patients with irritable bowel syndrome (IBS). Researchers noticed that the gut E. coli of many IBS patients possessed the clb gene cluster, a segment of DNA that encodes a series of enzymes. These enzymes, called non-ribosomal peptide synthetases and polyketide synthases, are responsible for producing the toxic compound colibactin. Subsequent studies revealed that clb-positive E. coli promoted colorectal cancer growth in mice. Moreover, these bacteria were found in human colons and were shown to induce DNA damage, a hallmark of cancer development. In other words, there is a clear relationship between clb-positive E. coli—and by extension, the colibactin they produce—and the development of colorectal cancer.

Despite this relationship, the results do not necessarily mean that colibactin causes colorectal cancer. Conventionally, when a potential disease-causing compound is identified, the next step in understanding its function is to grow the toxin-producing bacteria, extract the natural product from the bacterial culture, and study the compound’s structure and function to demonstrate a causal relationship. This, however, was not possible with colibactin, which quickly decomposes in oxygen due to its structural instability. For this reason, researchers had never been able to isolate enough colibactin to study it.

Studying the function of colibactin revealed a key insight: when linear pieces of DNA were incubated with clb-positive E. coli, the pieces joined together through covalent bonds, rather than through the hydrogen bonds that normally hold DNA strands together. This suggested that colibactin cross-links DNA. “We wanted to take that DNA, fish it out, and see what molecules were attached,” Crawford said. Isolating the DNA linked to colibactin, in complexes called colibactin-DNA adducts, would also isolate the colibactin holding the DNA together, in a form more stable than colibactin alone. The researchers would finally have enough material from which they could determine a structure. Still, the colibactin-DNA adducts were present in extremely small amounts. The analytical chemistry approach they used would have to be very sensitive, so they turned to tandem mass spectrometry.

A game of atom substitution

Mass spectrometry (MS) requires less material than other structural detection methods. In MS, a compound is vaporized into gas, ionized into charged particles, and deflected through a magnetic field. The resulting curvature of the ions’ path reflects their mass-to-charge ratio. Tandem MS takes this further. Daughter ions, generated by breaking the complicated compound into fragments, are subjected to further rounds of fragmentation and mass measurement. By combining multiple spectra, researchers can piece the compound’s whole structure together from the ground up.
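
The link between path curvature and mass can be made concrete with a little physics. The sketch below is a textbook idealization, not a description of the instrument used in the study: it assumes an ion that starts at rest, is accelerated through a voltage, and then travels in a circle inside a uniform magnetic field. The function names, voltage, and field strength are hypothetical values chosen for illustration.

```python
import math

DALTON_KG = 1.66053906660e-27          # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge


def path_radius_m(mass_da, charge, accel_volts, field_tesla):
    """Radius (in meters) of an ion's circular path in a uniform magnetic field.

    Assumes the ion starts at rest and gains kinetic energy q*V from the
    accelerating voltage, which gives r = sqrt(2*m*V/q) / B.
    """
    m = mass_da * DALTON_KG
    q = charge * ELEMENTARY_CHARGE_C
    return math.sqrt(2.0 * m * accel_volts / q) / field_tesla


def mass_to_charge(radius_m, accel_volts, field_tesla):
    """Invert the relation above: m/z in daltons per elementary charge."""
    m_over_q = (field_tesla * radius_m) ** 2 / (2.0 * accel_volts)  # kg per coulomb
    return m_over_q * ELEMENTARY_CHARGE_C / DALTON_KG


# Hypothetical instrument settings and ion masses, for illustration only.
V, B = 3000.0, 0.5
light = path_radius_m(500.0, 1, V, B)
heavy = path_radius_m(537.0, 1, V, B)
print(heavy > light)  # the heavier ion follows the wider arc
```

Under these assumptions, an ion with twice the mass and twice the charge follows the same path, which is why the measurement reports a mass-to-charge ratio rather than a mass alone.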

Using off-the-shelf tandem MS on colibactin-DNA adducts proved ineffective. “It’s a very dirty sample. No matter how well you purify it, having all this [material] from the bacterial metabolites [bound to] the DNA… makes things so complicated,” said Lucy Xue, a graduate student in the Herzon lab and co-first author of the paper. As a result, the final mass spectrum was nearly impossible to interpret.

To make their spectra clearer, the researchers applied a technique called isotopic labeling, previously used by the Crawford lab to study colibactin’s DNA-crosslinking function. The first step was partial isotopic labeling. Auxotrophs, in this case clb-positive E. coli engineered to be incapable of synthesizing specific amino acids, were fed special versions of the amino acids cysteine and methionine. These amino acids had been labeled with the carbon-13 isotope, which is heavier than normal carbon. The auxotrophs incorporated the special amino acids during their colibactin synthesis. Subsequently comparing the mass spectra of isotopically labeled colibactin-DNA adducts with those of unlabeled colibactin-DNA adducts revealed mass shifts due to the heavier carbons. These shifts allowed the researchers to derive a partial structure of colibactin.
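
The comparison of labeled and unlabeled spectra can be sketched in a few lines of code. This is a minimal illustration, not the researchers’ analysis pipeline: it assumes singly charged ions and that every carbon of the fed amino acid carries the label (the article does not say which positions were labeled). Under that assumption, a fragment containing one cysteine, which has three carbons, would shift by about 3.010 daltons. The function name, tolerance, and peak values are hypothetical.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between carbon-13 and carbon-12, in daltons


def labeled_carbon_counts(unlabeled_peaks, labeled_peaks, max_carbons=10, tol=0.005):
    """For each unlabeled peak, list the heavy-carbon counts that explain a labeled peak.

    Peaks are m/z values of singly charged ions. A count of 0 means the peak
    also appears unshifted in the labeled spectrum.
    """
    shifts = {}
    for mz in unlabeled_peaks:
        shifts[mz] = [
            n
            for n in range(max_carbons + 1)
            if any(abs(heavy - (mz + n * C13_MINUS_C12)) <= tol for heavy in labeled_peaks)
        ]
    return shifts


# Made-up m/z values for illustration only; these are not measured data.
unlabeled = [150.000, 300.000]
labeled = [150.000, 300.000 + 3 * C13_MINUS_C12]
result = labeled_carbon_counts(unlabeled, labeled)
print(result)
```

In this toy example the first fragment does not move, so it contains no labeled carbons, while the second moves by three carbon-13 units, which would be consistent with a fragment built from one cysteine.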

Then, the researchers used full isotopic labeling, wherein normal clb-positive E. coli were grown in a medium that contained glucose labeled with carbon-13. As the E. coli used glucose in their normal metabolic processes, every carbon in the colibactin they produced was carbon-13. Because each carbon-13 atom adds about one dalton of mass, the total mass shift between labeled and unlabeled adducts counts the carbons. After running tandem MS on the resulting colibactin-DNA adducts and measuring the mass shift, they established that colibactin contains 37 carbon atoms. By piecing the partial and full isotopic labeling data together, the researchers proposed a potential structure for colibactin.
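
The carbon count follows from simple arithmetic on the mass shift. The sketch below shows that arithmetic under the assumption of complete labeling; the function name and the m/z values are hypothetical and are not taken from the study’s data. Only the count of 37 carbons comes from the text.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between carbon-13 and carbon-12, in daltons


def count_carbons(unlabeled_mz, labeled_mz, charge=1):
    """Estimate the number of carbons from the m/z shift after full carbon-13 labeling."""
    mass_shift = (labeled_mz - unlabeled_mz) * charge  # convert the m/z shift to a mass shift
    return round(mass_shift / C13_MINUS_C12)


# Shift expected for a molecule with the 37 carbons reported in the text.
expected_shift = 37 * C13_MINUS_C12  # about 37.124 daltons

# Hypothetical m/z values for illustration only.
singly_charged = count_carbons(700.0, 700.0 + expected_shift)
doubly_charged = count_carbons(350.0, 350.0 + expected_shift / 2, charge=2)
print(singly_charged, doubly_charged)
```

The charge matters because a spectrometer reports mass divided by charge: a doubly charged ion shows only half the shift on the m/z axis, so the shift must be scaled back before dividing by the per-carbon difference.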

Proving structure by thinking backwards

With a potential structure of colibactin in hand, the next step was to remove the DNA from the clb-positive bacterial system and search for evidence of their proposed structure. “Prior to the molecules going into the DNA, we wanted to find: what are those molecules themselves, and what are the structures of those molecules?” Crawford said. Using MS, this time without isotopic labeling, the researchers found exact masses unique to their structure. This finding, along with the absence of masses corresponding to the other potential structures they had identified, indicated that their proposed structure was correct.

The behavior of their proposed compound also matched the known behavior of clb-positive bacteria. Previous studies indicated that the clbO and clbL enzymes are required for clb-positive E. coli to crosslink and harm DNA. Accordingly, when the researchers used genome editing to disable these enzymes, the mass peak they had assigned to colibactin disappeared from the spectrum. Past findings also demonstrated that bacteria possess a clbS enzyme that deactivates colibactin as a means of self-protection from its harmful effects. Consistent with this, the researchers observed increased levels of their proposed colibactin structure in the mass spectra when clbS was mutated.

As further proof, the researchers leveraged this increased abundance of colibactin in the culture of clbS mutants. Returning to partial isotopic labeling tandem MS, they found that DNA-free colibactin, produced from the clbS-mutant bacteria, incorporated carbon-13-labeled amino acids in a way that aligned with the structure proposed from their colibactin-DNA adducts.

Finally, it was time for the litmus test. In natural product chemistry, the way to verify a proposed structure is to design a synthetic route to it, independent of biochemistry. The researchers took this angle of attack and produced synthetic colibactin in nine linear steps, with the crucial step coupling two almost-symmetrical halves of the molecule. Their synthetic colibactin behaved as expected: tandem MS of synthetic colibactin revealed a structure that matched the one derived from bacterial colibactin-DNA adducts. Moreover, their synthetic colibactin cross-linked DNA in the same way as bacterial colibactin. In the process, the researchers also developed a synthetic path to colibactin, making future colibactin studies viable. Scientists will no longer need to rely on difficult-to-isolate, natural colibactin.

Taken together, these pieces of evidence demonstrate that the Herzon and Crawford labs have, in fact, proposed the correct structure. “I can’t forget the moment all the data matched our previous results. We were so excited by the fact that we solved the structure of colibactin, which has been a mystery for over a decade,” said Chung Sub Kim, a postdoctoral researcher in the Crawford lab and co-first author on the study.

Looking ahead

For the Crawford lab, which has been investigating colibactin since 2012, this breakthrough in colibactin’s structure represents the culmination of years of hard work. With this structure, there is now a means to investigate the molecular mechanisms behind colibactin-induced DNA damage, which is linked to colorectal cancer.

Crawford looks forward to leveraging this new knowledge to develop better probiotics (live bacteria that can be taken as disease-fighting supplements) to combat colibactin-producing bacteria in our intestines. Such probiotics may be used to treat intestinal diseases such as inflammatory bowel disease and to help prevent colorectal cancer. “We’re not just describing what’s going on at the atomic level. We’re then using that atomic-level information to make biomedical progress,” Crawford said.

To Xue, the most significant part of her research is not the structure of colibactin itself, but rather the process by which its structure was determined. “We are so used to thinking about the traditional way of elucidating natural products…. But that will also make us miss so many things,” Xue said. “As bioinformatics, gene sequencing, and MS start to develop, we really can use a different mentality of doing things in the reverse way to figure out interesting structures that we missed before.” By approaching the process of compound discovery from an alternative angle that blends chemistry and biology, scientists now have a way of investigating a wide range of compounds previously thought to be impossible to isolate.