Introduction

Quantification of a transcriptional profile is a useful way to evaluate the activity of a cell at a given point in time. Although RNA-Seq has revolutionized transcriptional profiling, its cost is still significantly higher than that of microarrays, and the depth of data it delivers often exceeds what is needed for simple transcript quantification. Digital Gene Expression (DGE) is a cost-effective, sequence-based approach for simple transcript quantification: by sequencing one read per molecule of RNA, this technique can be used to efficiently count transcripts while obviating the need for transcript-length normalization and reducing the total number of reads necessary for accurate quantification. Here, we present trieFinder, a program specifically designed to rapidly map, parse, and annotate DGE tags of various lengths against cDNA and/or genomic sequence databases.

The trieFinder algorithm maps DGE tags in a two-step process. First, it scans FASTA files of RefSeq, UniGene, and genomic DNA sequences to create a database of all tags that can be derived from a predefined restriction site. Next, it compares the experimental DGE tags to this tag database, taking advantage of the fact that the tags are stored as a prefix tree, or "trie", which allows an exact match to be found in time linear in the length of the tag. DGE tags with mismatches are analyzed by recursive calls in the data structure. We find that, in terms of alignment speed, the mapping functionality of trieFinder compares favorably with Bowtie.

trieFinder can quickly provide the user with an annotation of the DGE tags from three sources simultaneously, which simplifies transcript quantification and novel transcript detection. It delivers the data in a simple parsed format, so the alignment results do not need to be post-processed. trieFinder is available at http://research.nhgri.nih.gov/software/trieFinder/.
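The two-step process described above can be illustrated with a minimal Python sketch. This is not trieFinder's own code (the tool is listed as C++ and Perl), and everything named here is an assumption made for illustration: the restriction site `CATG`, the tag length, the function names, and the choice to take a tag after every forward-strand occurrence of the site. It shows how tags might be derived from a sequence, stored in a trie, looked up exactly in one pass over the tag, and looked up with a mismatch budget by recursion.

```python
from typing import Dict, Iterator, List, Tuple

SITE = "CATG"  # assumption: example restriction site, not taken from the source
TAG_LEN = 8    # assumption: short tag length chosen to keep the example readable


class TrieNode:
    """One node per base; labels hold annotations of tags that end here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.labels: List[str] = []


def derive_tags(seq: str, site: str = SITE, tag_len: int = TAG_LEN) -> Iterator[str]:
    """Yield each full-length tag that directly follows an occurrence of site."""
    seq = seq.upper()
    pos = seq.find(site)
    while pos != -1:
        start = pos + len(site)
        tag = seq[start:start + tag_len]
        if len(tag) == tag_len:  # skip tags truncated by the sequence end
            yield tag
        pos = seq.find(site, pos + 1)


def insert(root: TrieNode, tag: str, label: str) -> None:
    """Store a tag in the trie and attach its annotation label."""
    node = root
    for base in tag:
        node = node.children.setdefault(base, TrieNode())
    node.labels.append(label)


def find_exact(root: TrieNode, tag: str) -> List[str]:
    """Follow one edge per base, so the cost is linear in the tag length."""
    node = root
    for base in tag:
        nxt = node.children.get(base)
        if nxt is None:
            return []
        node = nxt
    return node.labels


def find_with_mismatches(
    node: TrieNode, tag: str, budget: int, depth: int = 0, path: str = ""
) -> List[Tuple[str, List[str]]]:
    """Recursively explore branches, spending one unit of budget per mismatch."""
    if depth == len(tag):
        return [(path, node.labels)] if node.labels else []
    hits: List[Tuple[str, List[str]]] = []
    for base, child in node.children.items():
        cost = 0 if base == tag[depth] else 1
        if cost <= budget:
            hits.extend(
                find_with_mismatches(child, tag, budget - cost, depth + 1, path + base)
            )
    return hits


# Step 1: build the tag database from a reference sequence (made-up example).
reference = "AACATGACGTACGTTTCATGGGGGCCCCAA"
root = TrieNode()
for tag in derive_tags(reference):
    insert(root, tag, "tx1")

# Step 2: look up experimental tags, exactly and with one allowed mismatch.
exact_hit = find_exact(root, "ACGTACGT")
fuzzy_hit = find_with_mismatches(root, "ACGTACGA", budget=1)
```

In this sketch the mismatch search prunes a branch as soon as its budget is exhausted, so only prefixes that can still lead to an acceptable tag are visited. A real tag database would also need to handle details the sketch ignores, such as the reverse strand and tags shared by several source sequences.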

Publications

  1. trieFinder: an efficient program for annotating Digital Gene Expression (DGE) tags.
    Renaud G, LaFave MC, Liang J, Wolfsberg TG, Burgess SM, 2014-10-01 - BMC bioinformatics

Credits

  1. Gabriel Renaud
    Developer

  2. Matthew C LaFave
    Developer

  3. Jin Liang
    Developer

  4. Tyra G Wolfsberg
    Developer

  5. Shawn M Burgess
    Investigator

    Translational and Functional Genomics Branch, Division of Intramural Research, United States of America

Summary
Accession: BT000193
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++, Perl
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Shawn M Burgess