Introduction

Protein homology search provides the basis for functional profiling in metagenomic annotation. Profile HMM-based methods classify reads into annotated protein domain families and can achieve better sensitivity for remote protein homology search than pairwise sequence alignment. However, their sensitivity deteriorates as read length decreases. As a result, a large number of short reads cannot be classified into their native domain families. In this work, we introduce MetaDomain, a protein domain classification tool designed for short reads generated by next-generation sequencing technologies. MetaDomain uses relaxed position-specific score thresholds to align more reads to a profile HMM, while using the distribution of alignment positions as an additional constraint to control false positive matches. We apply MetaDomain to the transcriptomic data of a bacterial genome and to a soil metagenomic data set. The experimental results show that it can achieve better sensitivity than the state-of-the-art profile HMM alignment tool in identifying encoded domains from short sequences. The source code of MetaDomain is available at http://sourceforge.net/projects/metadomain/.
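
The two ideas named above (a relaxed, position-dependent score cutoff and a check on where the accepted alignments fall along the model) can be illustrated with a minimal sketch. This is not MetaDomain's actual algorithm or code: the `Hit` record, the `relax` factor, the `min_coverage` cutoff, and the rule of summing per-position thresholds over the aligned span are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record for one read aligned to a profile HMM."""
    read_id: str
    start: int    # first aligned model position (0-based, inclusive)
    end: int      # last aligned model position (exclusive)
    score: float  # alignment score reported by the aligner


def passes_relaxed_threshold(hit: Hit, position_thresholds: List[float],
                             relax: float = 0.8) -> bool:
    """Assumed rule: sum the per-position thresholds over the aligned
    span and accept the hit if its score reaches a relaxed fraction."""
    span = position_thresholds[hit.start:hit.end]
    if not span:
        return False
    return hit.score >= relax * sum(span)


def model_coverage(hits: List[Hit], model_length: int) -> float:
    """Fraction of model positions covered by at least one hit."""
    covered = [False] * model_length
    for h in hits:
        for pos in range(max(h.start, 0), min(h.end, model_length)):
            covered[pos] = True
    return sum(covered) / model_length


def classify(hits: List[Hit], position_thresholds: List[float],
             relax: float = 0.8, min_coverage: float = 0.5) -> List[Hit]:
    """Keep relaxed-threshold hits only if, taken together, they are
    spread over enough of the model (assumed positional constraint)."""
    candidates = [h for h in hits
                  if passes_relaxed_threshold(h, position_thresholds, relax)]
    if model_coverage(candidates, len(position_thresholds)) < min_coverage:
        return []
    return candidates


if __name__ == "__main__":
    thresholds = [1.0] * 10  # toy model with 10 positions
    reads = [Hit("r1", 0, 4, 3.5), Hit("r2", 3, 8, 4.5), Hit("r3", 8, 10, 1.0)]
    print([h.read_id for h in classify(reads, thresholds)])
```

In this toy setup, a relaxed cutoff lets more short reads through individually, and the coverage check then rejects a domain family whose accepted reads pile up on only a small part of the model, which is one plausible way a positional constraint could limit false positives.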

Publications

  1. MetaDomain: a profile HMM-based protein domain classification tool for short sequences.
    Zhang Y, Sun Y, 2012-01-01 - Pacific Symposium on Biocomputing

Credits

  1. Yuan Zhang
    Developer

    Department of Computer Science and Engineering, Michigan State University, United States of America

  2. Yanni Sun
    Investigator

Summary

  Accession: BT000093
  Tool Type: Application
  Category: (not specified)
  Platforms: Linux/Unix
  Technologies: (not specified)
  User Interface: Terminal Command Line
  Download Count: 0
  Submitted By: Yanni Sun