
A complete DNA map to better detect cancer-causing changes

Researchers used a complete map of the human genome to detect large DNA changes associated with cancer. Their approach takes more time than traditional methods, but produces clearer, more reliable results.


Image Credit: From kjpargeter on Freepik

When scientists study complex human diseases like cancer, one of the many steps is to compare DNA sequences from affected individuals with a template of genetic information from healthy individuals, called the reference genome. This step identifies changes in the patients' DNA, or variants, which researchers then label as fully as possible to determine, for example, what caused the disease and how it might respond to treatments.

Since 2000, the standard human reference genome of choice has been incomplete because the scientific community lacked the technology to access some challenging regions. This meant that some variants scientists detected were false alarms, or false positives, which made it difficult to know which variants were truly driving tumor growth. 

In 2022, a consortium of scientists published what they called the first truly complete human genome, using new technologies whose outputs were less fragmented than those of earlier technologies. Since then, several researchers have explored the benefits of using the new genome over the previous reference genome for studying complex genetic diseases, such as cancers.

Researchers in Canada and the USA recently hypothesized that using the complete human genome would help them detect large variants, or structural variants, in cancer more accurately than the standard reference genome. If our genome were a textbook, these variants would be like missing, added, or flipped paragraphs or pages. Scientists have shown that structural variants can cause cancer by duplicating cancer-promoting genes, causing abnormal gene fusions, and deactivating genes that should naturally suppress cancer growth. 

The researchers tested their hypothesis using a well-established cellular model of cancer, or cancer cell line, called COLO829, paired with a control without cancer. Scientists generally use structural variant data from this cell line as the benchmark to evaluate new methods of detecting cancer variants. The researchers analyzed 4 independent samples of this cell line sequenced at different laboratories. They also examined 3 tumor samples from patients with blood, brain, and ovarian cancer to validate their findings in real clinical scenarios. The team compared the cancer DNA sequences to both reference genomes and used 4 different computational tools to identify structural variants.

The complete human reference genome has approximately 200 million additional base pairs of DNA sequence, closing gaps and completing regions missing from the standard reference genome. The team manually inspected results from the COLO829 sample and observed a decrease from 225 falsely identified structural variants using the standard reference genome to just 83 using the complete reference genome, which meant that the new reference genome improved their ability to detect structural variants.
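To put those two counts in proportion, the short calculation below works out the relative drop in false positives. It is only a back-of-the-envelope sketch: the two counts come from this summary, while the variable names are made up for illustration and are not taken from the study.

```python
# Counts of falsely identified structural variants in COLO829, as reported above.
false_positives_standard = 225  # using the standard reference genome
false_positives_complete = 83   # using the complete reference genome

# How many false alarms disappeared, and what share of the original total that is.
removed = false_positives_standard - false_positives_complete
relative_reduction = removed / false_positives_standard

print(f"False positives removed: {removed}")
print(f"Relative reduction: {relative_reduction:.1%}")
```

By this arithmetic, switching reference genomes removed 142 of the 225 false alarms, or roughly 63%.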

The researchers stated that although the new human reference genome helped them identify DNA changes more accurately, it came with less labelled medical information than the older reference genome. Scientists rely on this information to identify DNA changes that may be linked to diseases. To fix this, the researchers used a tool called LevioSAM2, which let them match up, or liftover, results from the new genome with the older one. This strategy allowed them to leverage the greater accuracy of the new genome while benefiting from the detailed medical knowledge associated with the older genome. In other words, they got the best of both worlds.
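The general idea behind a liftover can be sketched with a toy example: a table of matching blocks records where stretches of the new genome sit in the older one, and a position is translated by finding its block. This is a simplified illustration only, not how LevioSAM2 works internally. The block coordinates and the function name `lift_position` are invented for this sketch.

```python
from typing import Optional

# Hypothetical matching blocks: (start in new genome, end in new genome, start in old genome).
# Intervals are half-open, and all coordinates are invented for illustration.
BLOCKS = [
    (0, 1000, 0),
    (1500, 3000, 1000),  # positions 1000-1499 exist only in the new genome
]

def lift_position(pos: int) -> Optional[int]:
    """Translate a new-genome position to the old genome, or return None if it has no match."""
    for new_start, new_end, old_start in BLOCKS:
        if new_start <= pos < new_end:
            # Keep the same offset from the start of the block.
            return old_start + (pos - new_start)
    return None  # the position lies in sequence missing from the old genome

print(lift_position(500))   # inside the first block
print(lift_position(2000))  # inside the second block, shifted by the gap
print(lift_position(1200))  # inside the newly added sequence
```

In this toy setup, positions that fall in sequence present only in the new genome have no counterpart in the older one, so they cannot borrow its medical labels.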

The researchers applied their combined approach to the 3 patient samples and observed fewer candidate cancer-specific variants requiring manual clinical review than when they used the standard reference genome alone. They explained that having fewer candidates streamlines the laborious process of identifying cancer-causing mutations from an otherwise long list of false alarms. In one patient’s sample, they detected a large variant, 609,000 base pairs in length, affecting a gene previously linked to several cancers. This variant showed only weak signals with the older reference genome but clear evidence with the new reference genome.

The researchers concluded that their approach optimizes structural variant detection in cancer by reducing false positives, which can help doctors prioritize clinically relevant mutations. They noted that reducing false positives has important implications for analyzing patient samples, where filtering through false variants to find true cancer drivers requires time and expertise. Their liftover strategy increased analysis time by approximately 50% compared to using only the older reference genome, a trade-off the researchers considered acceptable given its substantial improvements in accuracy.

Study Information

Original study: Closing the gaps, and improving somatic structural variant analysis and benchmarking using CHM13-T2T

Study was published on: April 2025

Study author(s): Luis F. Paulin, Jeremy Fan, Kieran O'Neill, Erin Pleasance, Vanessa L. Porter, Steven J. M. Jones, Fritz J. Sedlazeck

The study was done at: Human Genome Sequencing Center (USA), Canada's Michael Smith Genome Sciences Centre (Canada), University of British Columbia (Canada), Baylor College of Medicine (USA), Rice University (USA)

The study was funded by: Canada Research Chairs Program, Terry Fox Research Institute, Marathon of Hope, Terry Fox Foundation, and the British Columbia Cancer Foundation

Raw data availability: Found on Zenodo

Featured image credit: From kjpargeter on Freepik

This summary was edited by: Aubrey Zerkle