RNA-seq data is often analyzed at the level of genes. This can provide a robust signal, but can also miss out on biologically important information like differences in isoform composition or dominant isoform usage. 1/n
On the other hand, tremendous progress has been made in transcript-level quantification, but certain inherent ambiguity can remain in the abundance estimates. This results from patterns of multi-mapping where no inference procedure can accurately resolve the origin of reads. 2/n
Yet the total transcriptional output of a group of transcripts sharing these complex multi-mapping patterns will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. 3/n
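A toy illustration of this point, using made-up numbers rather than real data: suppose posterior samples for two transcripts split a shared pool of multi-mapping reads differently in each sample. The sample values and the `rel_spread` helper below are hypothetical, and the coefficient of variation is used only as a simple stand-in for inferential uncertainty.

```python
from statistics import mean, pstdev

def rel_spread(samples):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(samples) / mean(samples)

# Hypothetical posterior samples: the total for the pair is stable,
# but the share assigned to each transcript swings from sample to sample.
total = [98, 101, 100, 102, 99, 100]
frac = [0.1, 0.9, 0.3, 0.7, 0.5, 0.6]

t1 = [n * f for n, f in zip(total, frac)]          # transcript 1 estimates
t2 = [n * (1 - f) for n, f in zip(total, frac)]    # transcript 2 estimates
group = [a + b for a, b in zip(t1, t2)]            # summed group estimate

cv_t1, cv_t2, cv_group = rel_spread(t1), rel_spread(t2), rel_spread(group)
print(f"t1: {cv_t1:.3f}  t2: {cv_t2:.3f}  group: {cv_group:.3f}")
```

The individual estimates are anti-correlated, so their swings cancel in the sum and the group's relative spread is far smaller than either transcript's.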
The idea of grouping together inferentially indistinguishable transcripts, and propagating the remaining uncertainty to downstream analysis, was first suggested by Turro et al. (https://www.ncbi.nlm.nih.gov/pubmed/24281695). 4/n
We build on these key ideas while introducing a fundamentally new and more efficient algorithm. We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. 5/n
This approach is implemented in our tool (written in @rustlang), terminus (https://github.com/COMBINE-lab/terminus). Terminus implements a graph-based algorithm that finds transcriptional groups by greedily selecting the transcript groups that reduce overall inferential uncertainty. 6/n
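To make "graph-based" and "greedy" concrete, here is a minimal sketch of greedy edge contraction over a transcript graph. It is not terminus's actual implementation: the `greedy_groups` function, the `min_gain` threshold, the scoring rule (average coefficient of variation of the two nodes minus that of their sum), and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def cv(samples):
    """Coefficient of variation, used here as a stand-in uncertainty score."""
    m = mean(samples)
    return pstdev(samples) / m if m > 0 else 0.0

def greedy_groups(samples, edges, min_gain=0.05):
    """Repeatedly contract the edge whose merge reduces uncertainty the most."""
    groups = {t: (frozenset([t]), list(v)) for t, v in samples.items()}
    adj = {t: set() for t in samples}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    while True:
        best = None
        for a in adj:
            for b in adj[a]:
                if a < b:  # visit each undirected edge once
                    merged = [x + y for x, y in zip(groups[a][1], groups[b][1])]
                    gain = (cv(groups[a][1]) + cv(groups[b][1])) / 2 - cv(merged)
                    if gain > min_gain and (best is None or gain > best[0]):
                        best = (gain, a, b, merged)
        if best is None:
            break  # no remaining edge reduces uncertainty enough
        _, a, b, merged = best
        # Contract the edge: node b folds into node a.
        groups[a] = (groups[a][0] | groups[b][0], merged)
        del groups[b]
        for n in adj.pop(b):
            adj[n].discard(b)
            if n != a:
                adj[n].add(a)
                adj[a].add(n)
    return sorted(sorted(members) for members, _ in groups.values())

# Hypothetical posterior samples: t1 and t2 trade reads, t3 is well resolved.
samples = {
    "t1": [10, 90, 30, 70, 50],
    "t2": [90, 10, 70, 30, 50],
    "t3": [100, 102, 98, 101, 99],
}
edges = [("t1", "t2"), ("t2", "t3")]
result = greedy_groups(samples, edges, min_gain=0.05)
print(result)
```

In this toy input, t1 and t2 are merged because their sum is stable, while t3 stays on its own because folding it in would not reduce uncertainty by more than the threshold.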
There is a cool connection with the classic algorithm of Garland and Heckbert for surface simplification (using quadric error metrics) from computer graphics; it's always fun when you get to look at the Stanford bunny :). 5/n
Terminus groups together transcripts in a data-driven manner, allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. 7/n
Sometimes, even gene-level analysis can have high ambiguity for certain groups of genes (from highly-similar gene families). Terminus takes care of this in one simple and consistent framework; the transcriptional groups are purely data-driven. 8/n
This work was led by @hrksrkr, with contributions from @k3yavi, @hcorrada and @mikelove. You can learn more about terminus, how it works, and what it enables in our new pre-print on @biorxivpreprint. Feedback is welcome! https://www.biorxiv.org/content/10.1101/2020.04.07.029967v1. 9/9
You can follow @nomad421.