Introduction

CloudPhylo is a tool built on Apache Spark for reconstructing phylogenies from large-scale sequence datasets.

Publications

  1. CloudPhylo: a fast and scalable tool for phylogeny reconstruction.
    Xu X, Ji Z, Zhang Z., 2016 Oct 14 - Bioinformatics
    Cited by (Google Scholar, as of October 31, 2016)

Manual

Input

A directory containing all input sequences in FASTA format. Use the file extension .faa for amino acid sequences and .fna for DNA sequences.
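Before submitting a job, it can help to confirm that the input directory actually contains files with the extension expected for the chosen charset. The sketch below assumes only the extension rule stated above (.faa for aa, .fna for dna); the helper list_input_files and the toy file names and sequences are hypothetical and are not part of CloudPhylo.

```python
from pathlib import Path

# Extension rule from the Input section: .faa = amino acid, .fna = DNA.
EXTENSION_FOR_CHARSET = {"aa": ".faa", "dna": ".fna"}


def list_input_files(input_dir, charset):
    """Return the sorted FASTA files in input_dir that match charset.

    Hypothetical pre-flight check for the directory passed to --in;
    it is not part of CloudPhylo itself.
    """
    if charset not in EXTENSION_FOR_CHARSET:
        raise ValueError(f"charset must be 'aa' or 'dna', got {charset!r}")
    ext = EXTENSION_FOR_CHARSET[charset]
    files = sorted(p for p in Path(input_dir).iterdir()
                   if p.is_file() and p.suffix == ext)
    if not files:
        raise FileNotFoundError(f"no {ext} files found in {input_dir}")
    return files


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        # Two toy .faa files; the sequence content is made up.
        Path(d, "speciesA.faa").write_text(">protA1\nMKTAYIAKQR\n")
        Path(d, "speciesB.faa").write_text(">protB1\nMKTAYLAKQR\n")
        print([p.name for p in list_input_files(d, "aa")])
        # prints ['speciesA.faa', 'speciesB.faa']
```

Files with the other extension are simply ignored, so a mixed directory will not raise as long as at least one matching file is present.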

Command line options


  -i <fasta input> | --in <fasta input>
        Directory containing sequence fasta files
  -o <value> | --out <value>
        Output file prefix
  -c <charset> | --charset <charset>
        dna for DNA or aa for amino acid
  -k <value> | --kmer-size <value>
        k-mer size
  -C <value> | --output-cv <value>
        [optional] Output CV file
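When CloudPhylo is launched from a script, the options above can be assembled into an argument list rather than typed by hand. This is a minimal sketch: the helper build_cloudphylo_command and the output prefix "result" are hypothetical, the flags are exactly the documented ones, and the default master, main class, and jar name are copied from the Example section.

```python
import shlex


def build_cloudphylo_command(input_dir, out_prefix, charset, kmer_size,
                             cv_out=None, master="local[4]",
                             jar="cloudphylo-assembly-1.0.jar"):
    """Assemble a spark-submit argument list from the documented options.

    Hypothetical helper; defaults for master and jar mirror the Example.
    """
    if charset not in ("aa", "dna"):
        raise ValueError("charset must be 'aa' or 'dna'")
    if kmer_size < 1:
        raise ValueError("k-mer size must be a positive integer")
    args = [
        "spark-submit", "--master", master,
        "--class", "cbb.cloudphylo.SparkRunner", jar,
        "--in", input_dir,
        "--out", out_prefix,
        "--charset", charset,
        "--kmer-size", str(kmer_size),
    ]
    if cv_out is not None:  # -C / --output-cv is optional
        args += ["--output-cv", cv_out]
    return args


if __name__ == "__main__":
    cmd = build_cloudphylo_command("sample", "result", "aa", 6)
    # shlex.join quotes local[4] so the shell does not treat [4] as a glob.
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) without any shell quoting; shlex.join is only used to display it as a copy-pasteable command line.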

Example

spark-submit --master "local[4]" \
             --class cbb.cloudphylo.SparkRunner \
             cloudphylo-assembly-1.0.jar \
             --charset aa \
             --in sample \
             -k 6

Check the Spark documentation if you are not familiar with the spark-submit utility.

Basic Information

Name: CloudPhylo
Accession: QT000010
Citation: 0
Contributors: qomoteam
Permalink: http://bigd.big.ac.cn/cloud/tools/QT000010
Website: https://github.com/XingjianXu/cloudphylo
Created At: January 21, 2016
Updated At: October 31, 2016



Run

Input: a directory containing the input sequences.

Parameter: k-mer size. K=6 is suggested for amino acid sequences and K=10 for DNA sequences.
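To make the k-mer size concrete, the sketch below counts overlapping k-mers in one sequence and computes how many distinct k-mers are possible at the suggested K values. It only illustrates what K means; it is not CloudPhylo's internal implementation. It assumes a 20-letter amino acid alphabet and a 4-letter DNA alphabet (ambiguity codes such as X or N are ignored), and the function names are hypothetical.

```python
from collections import Counter

# Assumed alphabet sizes: 20 standard amino acids, 4 nucleotides.
ALPHABET_SIZE = {"aa": 20, "dna": 4}


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a single sequence (illustration only)."""
    if k < 1:
        raise ValueError("k must be positive")
    # A sequence of length L has L - k + 1 overlapping windows (none if L < k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_space(charset, k):
    """Number of distinct k-mers possible over the charset's alphabet."""
    return ALPHABET_SIZE[charset] ** k


if __name__ == "__main__":
    print(kmer_counts("ACGTACGT", 4))
    print(kmer_space("aa", 6))    # prints 64000000
    print(kmer_space("dna", 10))  # prints 1048576
```

With the suggested settings, 20^6 = 64,000,000 possible amino acid 6-mers and 4^10 = 1,048,576 possible DNA 10-mers; each additional unit of K multiplies the space by the alphabet size.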

Output: a phylogenetic tree in Newick format.
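A quick way to sanity-check a Newick tree is to list its leaf labels and compare them with the inputs. The sketch below is a simplified reader: it assumes unquoted labels without spaces or bracketed comments, skips internal-node labels, and uses a made-up example tree. The function name is hypothetical.

```python
import re

# A leaf label starts right after '(' or ',' and runs until one of the
# Newick delimiters ( ) : , ; or whitespace.
_LEAF = re.compile(r"[(,]\s*([^():,;\s]+)")


def newick_leaf_names(newick):
    """Return leaf labels from a Newick string, in left-to-right order.

    Simplified sketch: labels after ')' (internal nodes) are not returned,
    and branch lengths after ':' are ignored.
    """
    return [m.group(1) for m in _LEAF.finditer(newick)]


if __name__ == "__main__":
    print(newick_leaf_names("((A:0.1,B:0.2):0.05,C:0.3);"))
    # prints ['A', 'B', 'C']
```

For anything beyond a quick check (quoted labels, comments, tree traversal), a full Newick parser such as Biopython's Bio.Phylo.read(path, "newick") is the more robust choice.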