A fusion gene identification strategy using CRISPR-Cas9 and long read nanopore sequencing

Stangl C, de Blank S, Renkens I, et al. Partner independent fusion gene detection by multiplexed CRISPR-Cas9 enrichment and long read nanopore sequencing. Nat Commun. 2020;11(1):2861.

Citation summary: Fusion genes are often carcinogenic. Frequently, one of the fused genes has been identified and is known to be a part of other fusion genes; however, the second partner gene is unknown. Stangl et al. devised and investigated a system that uses CRISPR-Cas9 to target and cut the known partner. Cas9 blocks one strand of the cut DNA, so a known DNA handle (e.g., an adapter) can be ligated to the other strand for linear sequencing from the ligated end. The researchers were able to multiplex samples for long read nanopore sequencing; indeed, they enriched the research samples enough that four samples could fit on a single nanopore flow cell. Without prior knowledge of either the fusion gene’s breakpoint location within the gene or the identity of the second partner gene, they were able to determine each of these details at single nucleotide resolution in less than two days, much faster than could be achieved with other next generation sequencing (NGS) approaches such as earlier targeted NGS methods and whole genome sequencing (WGS).

Background

Many cancers are associated with fusion genes, in which sections of two genes are fused together, encoding a new, carcinogenic protein. In many cases, one of the genes has been identified and is known to be a part of other fusion genes; however, the second partner gene is unknown. Currently available NGS methods can be used to identify the second gene, but the required depth of sequencing increases costs. Additionally, current methods of targeted NGS and WGS to discover fusion partner genes and their exact breakpoints may take as long as two weeks at some centers (enough time for a cancer to metastasize or become unresectable), so further research on fusion gene sequencing is needed to decrease costs and improve speed.

Experiment

Stangl et al. developed an approach called FUDGE (FUsion Detection from Gene Enrichment) for identifying both partners in fusion genes when only one is known [1]. First, the researchers isolated genomic DNA from fresh frozen cancer samples and then dephosphorylated it. They next used CRISPR RNA (crRNA) to target Cas9 to the known partner gene in their sample. Previous studies have shown that after Cas9 cuts DNA at the target site (creating a double-strand break [DSB] with phosphorylated ends), Cas9 remains associated with one strand of the DNA (thereby blocking it) but dissociates from the other strand. Stangl et al. took advantage of this behavior by performing dA-tailing of the phosphorylated, cut strand of the DSB that was exposed (the strand not blocked by Cas9). The dA-tailed strand was then ligated to Oxford Nanopore Technologies (ONT)-specific sequencing adapters. The researchers chose ONT for its real-time sequencing capabilities, in an effort to lower costs and decrease turnaround time. Real-time sequencing also allowed them to complete a time course, which showed that 24 hours was enough time to obtain the majority of fusion-spanning reads.
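The time-course analysis can be illustrated with a minimal Python sketch. It assumes that each fusion-spanning read carries a start time in hours since the run began; the function name and the example times are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def fraction_by_time(read_start_hours, cutoff_hours):
    """Return the fraction of reads whose start time is <= cutoff_hours."""
    if not read_start_hours:
        raise ValueError("no reads supplied")
    ordered = sorted(read_start_hours)
    # bisect_right counts how many sorted values are <= cutoff_hours.
    return bisect_right(ordered, cutoff_hours) / len(ordered)


# Hypothetical start times (hours) for ten fusion-spanning reads.
example_hours = [0.5, 1.2, 3.0, 5.5, 8.0, 12.5, 18.0, 23.0, 30.0, 41.0]
print(fraction_by_time(example_hours, 24))  # 8 of 10 reads -> 0.8
```

Evaluating the same function at several cutoffs (for example 6, 12, 24, and 48 hours) would give the kind of cumulative curve a time course describes.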

For precise CRISPR-mediated cutting, the researchers used HiFi Cas9 from IDT. They designed the crRNA to place Cas9 on a specific strand of DNA to initiate directional sequencing at this exact location (with chosen gRNAs upstream of the expected fusion point). Directional sequencing would start in the known partner gene, then extend into the partner needing to be identified (almost entirely sequencing in the right direction). With this experimental design, the cut site on the strand without bound Cas9 would almost always be the only site where available, unblocked, phosphorylated DNA ends exist. In general, therefore, the adapters would be ligated only at this cut site and not anywhere else in the genome. As described below, this strategy effectively enriched the fusion genes in this research study. In addition, the authors stated that approximately 89% of the reads were sequenced in the anticipated direction. Libraries that had been enriched in this way were sequenced on a single ONT flow cell. The researchers also designed NanoFG, a bioinformatics tool, to analyze FUDGE data.
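A directionality figure such as the roughly 89% reported above could be computed along the following lines. This is only a sketch: it assumes each on-target read has already been assigned a strand ('+' or '-') by an aligner, and the function name and example counts are hypothetical rather than the authors' own analysis.

```python
def directional_fraction(read_strands, expected_strand):
    """Return the fraction of reads aligned to the expected strand."""
    if expected_strand not in ("+", "-"):
        raise ValueError("expected_strand must be '+' or '-'")
    if not read_strands:
        raise ValueError("no reads supplied")
    matching = sum(1 for strand in read_strands if strand == expected_strand)
    return matching / len(read_strands)


# Hypothetical example: 89 reads on the expected strand, 11 on the other.
example_strands = ["+"] * 89 + ["-"] * 11
print(directional_fraction(example_strands, "+"))  # 89 of 100 reads -> 0.89
```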

Results

Using the FUDGE approach, Stangl et al. sequenced the fusion genes in three well-characterized cancer cell lines: A4573, HS-SYII, and CHP-100 cells. In A4573 cells, the researchers achieved 342-fold increased coverage of the known fusion gene; in HS-SYII cells, they achieved 735-fold increased coverage; and in CHP-100 cells, they achieved 443-fold increased coverage. NanoFG was then used to analyze the data. For the data from A4573 cells, there were 69 breakpoint-spanning reads. For CHP-100 cells, there were 62 breakpoint-spanning reads. For both the A4573 and CHP-100 cell lines, the correct sequence information was obtained. For the HS-SYII cells, however, there were only six breakpoint-spanning reads. In this case, Stangl et al. found that manual adjustments to the NanoFG settings were required before the expected results were obtained.
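Fold enrichment figures like those above are commonly expressed as on-target coverage divided by genome-wide background coverage. The sketch below assumes that definition, which may differ in detail from the one used in the paper; the function name and all input numbers are hypothetical.

```python
def fold_enrichment(target_bases, target_length, total_bases, genome_length):
    """Return mean on-target coverage divided by mean genome-wide coverage.

    Assumed definition; the paper's exact calculation may differ.
    """
    if min(target_length, genome_length, total_bases) <= 0:
        raise ValueError("lengths and total_bases must be positive")
    target_coverage = target_bases / target_length
    background_coverage = total_bases / genome_length
    return target_coverage / background_coverage


# Hypothetical run: 3 Mb of sequence on a 10 kb target (300x coverage)
# against 3.1 Gb of total sequence on a 3.1 Gb genome (1x coverage).
print(fold_enrichment(3_000_000, 10_000, 3_100_000_000, 3_100_000_000))
```

With these made-up inputs the target is covered 300 times more deeply than the genome-wide average.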

The researchers went on to sequence several more cancer samples in this study using FUDGE. When the location of the fusion gene's breakpoint was unknown, they used CRISPR-Cas9 to target sites in several sequential known breakpoint cluster regions spanning several exons of the known partner gene. This achieved more complete sequence coverage and increased their chances of identifying the unknown partner gene. While these experiments worked well, the researchers wanted to explore the practicality of FUDGE further and determine whether very limited-quantity samples (such as only 10 ng of DNA) could be tested successfully as well. To do this, they first had to increase the sample amount, so they used whole genome amplification (WGA) to produce enough starting material before running their Cas9-based target enrichment approach. WGA produced long (up to 100 kb) fragments of DNA. Starting with only 10 ng of cancer DNA, they used WGA to generate the 1 µg needed for the FUDGE process. Although the researchers adjusted parameters for NanoFG, they were not able to use this approach without prior knowledge of the identity of both fusion genes. However, if they started with this knowledge, they were able to find the exact breakpoints. This limitation means that FUDGE cannot reliably identify both fusion genes when using very limited-quantity samples. However, future research may overcome this and other limitations.

Stangl and colleagues also investigated multiplexing of FUDGE. Starting with four samples, they succeeded in identifying seven fusion genes, all on a single nanopore flow cell. On average, they obtained 349-fold target enrichment and 18 fusion-spanning reads. The researchers stated that this approach can decrease costs relative to other methods of sequencing.

Conclusion

The researchers concluded that the FUDGE approach allowed identification of carcinogenic fusion genes within two days, which is much faster than comparable methods. In particular, they compared the FUDGE method to other methods of identifying fusion genes (targeted NGS, whole genome sequencing, fluorescence in situ hybridization, PCR, and RNA-seq) and showed that FUDGE is equally useful, and in some cases better, for researching identities and sequences of fusion genes. They also stated the pros and cons of their research methods, emphasizing how the outcome is dependent on the starting sample type and the available sequence knowledge. The researchers pointed out that non-fragmented DNA is necessary for use with FUDGE, eliminating the opportunity to use formalin-fixed, paraffin-embedded (FFPE) tissue specimens.

Despite current limitations, the researchers suggested that future improvements to FUDGE will improve results and decrease costs, as this method already overcomes several limitations of other research methods.

References

1. Stangl C, de Blank S, Renkens I, et al. Partner independent fusion gene detection by multiplexed CRISPR-Cas9 enrichment and long read nanopore sequencing. Nat Commun. 2020;11(1):2861.

Published Jul 21, 2021