Introduction

Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement of >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by >2.5% in F1 score for the molecular function hierarchy.

Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF. Contact: ahmad.pgh@dal.ca or beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online.
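The cosine comparison of definition vectors mentioned in the introduction can be illustrated with a minimal sketch. This is not the simDEF implementation: the helper names (`definition_vector`, `cosine_similarity`), the toy word vectors and the two example definitions are all hypothetical, and the step that simDEF uses to optimize its vectors is not reproduced here. The sketch only assumes that a definition vector is formed by summing per-word feature vectors, in the spirit of the Gloss Vector measure.

```python
import math


def definition_vector(definition, word_vectors):
    """Sum the feature vectors of every known word in a definition.

    `word_vectors` is a hypothetical lookup: word -> {feature: weight}.
    Words without an entry are ignored.
    """
    vector = {}
    for word in definition.lower().split():
        for feature, weight in word_vectors.get(word, {}).items():
            vector[feature] = vector.get(feature, 0.0) + weight
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as having no similarity
    return dot / (norm_u * norm_v)


# Toy, made-up word vectors (assumption: not taken from any real corpus).
toy_word_vectors = {
    "kinase": {"phosphate": 2.0, "enzyme": 1.0},
    "phosphatase": {"phosphate": 2.0, "enzyme": 1.0, "removal": 1.0},
    "transporter": {"membrane": 2.0},
    "activity": {"enzyme": 1.0},
}

# Made-up definition strings standing in for GO term definitions.
vec_a = definition_vector("kinase activity", toy_word_vectors)
vec_b = definition_vector("phosphatase activity", toy_word_vectors)
vec_c = definition_vector("transporter", toy_word_vectors)

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Storing vectors as dictionaries keeps the sketch dependency-free and suits sparse data; with dense vectors the same calculation could be done with an array library instead.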

Publications

  1. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes.
    Pesaranghader A, Matwin S, Sokolova M, Beiko RG, 2016-05-01 - Bioinformatics (Oxford, England)

Credits

  1. Ahmad Pesaranghader
    Developer

    Faculty of Computer Science, Dalhousie University, Canada

  2. Stan Matwin
    Developer

    Faculty of Computer Science, Dalhousie University, Canada

  3. Marina Sokolova
    Developer

    Institute for Big Data Analytics, Halifax, Canada

  4. Robert G Beiko
    Investigator

    Faculty of Computer Science, Dalhousie University, Canada

Summary

Accession: BT000044
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies:
User Interface: Terminal Command Line
Download Count: 0
Country/Region: Canada
Submitted By: Robert G Beiko