Install Python 2.7 and Biopython:
- Get Python 2.7 at https://www.python.org/download/ or install with your operating system’s package manager.
- Get Biopython at http://biopython.org/wiki/Biopython .
- You can follow the installation guide (http://biopython.org/DIST/docs/install/Installation.html#sec4) to install Python and Biopython step by step.
$ tar -zxf lgc-1.0.tar.gz # Decompress lgc-1.0.tar.gz
$ cd lgc-1.0 # Enter the folder
$ python lgc-1.0.py input.fasta output.txt # Run LGC
A successful run of LGC prints the following:
$ Input: input.fasta # Input file
$ Output: output.txt # Output file
$ Scan ORF ... # Scan ORF and calculate coding potential score
$ Done # LGC runs to completion
$ Computation time XXX # Computation time of LGC
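Because LGC reads its sequences from a FASTA file, a quick pre-flight check can catch malformed input before a long run. The sketch below is not part of LGC; `check_fasta` is a hypothetical helper that uses only the standard library and applies a deliberately loose definition of "well-formed" (every record starts with a `>` header and has at least one sequence line). It should run under Python 2.7 or 3.

```python
def check_fasta(lines):
    """Return the number of FASTA records, or raise ValueError if malformed.

    `lines` is any iterable of text lines, for example an open file handle.
    This is a hypothetical helper, not something shipped with LGC.
    """
    records = 0
    has_sequence = True  # no header is pending at the start
    for number, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if not has_sequence:
                raise ValueError("record before line %d has no sequence" % number)
            records += 1
            has_sequence = False
        else:
            if records == 0:
                raise ValueError("line %d: sequence data before first '>' header" % number)
            has_sequence = True
    if records == 0:
        raise ValueError("no FASTA records found")
    if not has_sequence:
        raise ValueError("last record has no sequence")
    return records


# Small in-memory example with two records.
example = [">tx1", "ATGGCC", "TAA", ">tx2", "ATGTGA"]
print(check_fasta(example))  # prints 2
```

To check a file on disk, pass an open handle instead of a list, e.g. `check_fasta(open("input.fasta"))`.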
A FASTA-formatted input file is required to run LGC on a local computer. To run LGC online (http://bigd.big.ac.cn/lgc/calculator), users can upload a FASTA-formatted file (<100 Mb) from local disk or paste FASTA-formatted sequence(s) (for small data sets) into the text area.
The web server also supports BED/GTF-formatted files. Users can upload a BED/GTF-formatted file (<3 Mb) or paste the data into the text area.
When the input file is in BED/GTF format, a reference genome is required, and its assembly version must match the one the coordinates in the file are based on.
The web server currently supports Human (GRCh38, hg19), Mouse (mm10, mm9), Fly (dm3) and Zebrafish (Zv9).
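The upload limits above can be checked locally before submitting a file to the web server. The following is a minimal sketch, not part of LGC: the names `UPLOAD_LIMITS`, `fits_upload_limit` and `file_fits_upload_limit` are hypothetical, and it assumes that "Mb" means megabytes of 1024 × 1024 bytes, which the documentation does not state.

```python
import os

MB = 1024 * 1024  # assumption: "Mb" is read as 1024 * 1024 bytes

# Limits taken from the text: FASTA < 100 Mb, BED/GTF < 3 Mb.
UPLOAD_LIMITS = {"fasta": 100 * MB, "bed": 3 * MB, "gtf": 3 * MB}


def fits_upload_limit(size_bytes, file_format):
    """Return True if a file of `size_bytes` is strictly under the limit."""
    limit = UPLOAD_LIMITS[file_format.lower()]
    return size_bytes < limit


def file_fits_upload_limit(path, file_format):
    """Same check, reading the size of the file at `path` from disk."""
    return fits_upload_limit(os.path.getsize(path), file_format)


print(fits_upload_limit(5 * MB, "fasta"))  # True: 5 Mb is under 100 Mb
print(fits_upload_limit(3 * MB, "gtf"))    # False: the limit is strict
```

Files that exceed the limit would need to be split or run with the local version of LGC instead.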
When the calculation finishes, the results are shown on a new page. Users can sort the results by any column by clicking on the column header. LGC also assigns a unique Task ID to each request, and users can retrieve results later by entering the Task ID on the homepage. There are nine columns in the output file:
- Sequence name: name of transcript sequence
- ORF Length: length of the longest ORF
- GC Content: GC content of the longest ORF
- Coding Potential Score: coding potential score of the transcript. A score greater than 0 indicates protein-coding RNA and a score smaller than 0 indicates ncRNA. A score of '0' indicates that the mRNA probability equals the lncRNA probability; '0' is also output if the ORF is shorter than 100 nt.
- pc: ORF probability for coding sequence.
- pnc: ORF probability for non-coding sequence.
- fc: Stop-codon probability for coding sequence.
- fnc: Stop-codon probability for non-coding sequence.
- Coding Label: “Coding” represents mRNA and “Noncoding” represents lncRNA.
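The sign rule for the Coding Potential Score can be sketched as a small function. This is an illustration of the rule described in the list above, not LGC's own code. The function name `label_from_score` and the "Undetermined" label for a score of exactly 0 are assumptions: the text does not say which Coding Label LGC reports in that case.

```python
def label_from_score(score):
    """Map a coding potential score to a label using its sign.

    "Coding" and "Noncoding" follow the column description; the
    "Undetermined" label for a score of 0 is a hypothetical placeholder.
    """
    if score > 0:
        return "Coding"
    if score < 0:
        return "Noncoding"
    return "Undetermined"


# Made-up (name, score) pairs for illustration only.
scores = [("tx1", 1.25), ("tx2", -0.4), ("tx3", 0.0)]
for name, score in scores:
    print("%s\t%s" % (name, label_from_score(score)))
```

Treating 0 as a separate case keeps transcripts with an ORF shorter than 100 nt from being silently counted as either class.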