Introduction

1) What is LncBook?

LncBook is a curated knowledgebase of human lncRNAs that features a comprehensive collection of human lncRNAs and systematic curation of lncRNAs by multi-omics data integration, functional annotation and disease association.
The current implementation of LncBook houses 270,044 lncRNAs and includes 1,867 featured lncRNAs with 3,762 lncRNA-function associations. It also integrates abundant multi-omics data on expression, methylation, genome variation, and lncRNA-miRNA interaction. In addition, LncBook includes 3,772 experimentally validated lncRNA-disease associations and identifies a total of 97,998 lncRNAs that are putatively disease-associated. To facilitate online analysis, a series of useful tools, such as coding potential prediction and sequence search, are deployed in LncBook.

2) What is the relationship between LncBook and LncRNAWiki?

LncRNAWiki is a community-curation-based human lncRNA knowledgebase, while LncBook is an expert-curation-based database.
We built LncRNAWiki in 2015 to harness collective intelligence for collecting and annotating human lncRNAs. However, LncRNAWiki is built on MediaWiki, which has significant limitations in managing structured data and supporting customized functionality. It is desirable to organize large-scale annotations in a structured manner and to provide customized web functionality with friendlier interfaces. More importantly, integrating multi-omics data significantly enriches and improves lncRNA annotations. For these reasons, we constructed LncBook.
LncBook includes community-contributed annotations from LncRNAWiki, and LncRNAWiki can be considered an important component of LncBook.

3) What is the combined community-and-expert curation model?

The rapid advancement in DNA sequencing technologies has led to an exponential increase in the number of human lncRNAs. To provide up-to-date, comprehensive, and high-quality annotation of human lncRNAs, we employ both community curation and expert curation.
LncRNAWiki, which uses a wiki-based community curation model, is publicly editable and its content is up-to-date, showing great promise for dealing with the flood of biological knowledge. However, it is difficult to guarantee the quality of community curation, and MediaWiki has limitations in managing structured data and supporting fundamental biological functionality. We therefore built the expert-curation-based database LncBook, which is capable of:

    • organizing and incorporating a variety of lncRNA-related multi-omics data (expression, methylation, variation, lncRNA-miRNA interaction data) in a structured manner,
    • browsing lncRNA-disease associations and their functions,
    • identifying housekeeping and tissue-specific lncRNAs with customized thresholds,
    • navigating human lncRNAs of interest, and
    • providing several user-friendly tools for online analysis.

Data and Methods

1) How does LncBook integrate the existing lncRNAs?

We collected existing human lncRNAs from GENCODE v27, NONCODE v5.0, LNCipedia v4.1, and MiTranscriptome beta. To obtain high-confidence lncRNAs, a set of strict criteria was adopted, considering redundancy, background noise, mapping error, incomplete transcripts, length, and coding potential.
On the other hand, we integrated the experimentally validated lncRNAs, which are sourced from LncRNAWiki. The RefSeq and Ensembl references were obtained from HGNC to enable genomic location comparison. However, half of these lncRNAs presently cannot be traced back to their genomic locations and thus are not included.
Finally, we obtained a large collection of 247,246 existing lncRNAs.

2) How does LncBook characterize novel lncRNAs?

We identified novel lncRNAs based on 122 RNA-Seq datasets from the Human Protein Atlas. The transcript assemblies were compared against the existing lncRNAs integrated from multiple databases, and novel lncRNA transcripts were then curated with the same strict criteria used during the curation of existing lncRNAs. Finally, 21,815 novel lncRNAs were identified.

3) What kind of criteria does LncBook use to obtain high-confidence lncRNAs?

We filtered lncRNA transcripts by considering redundancy, background noise, mapping error, incomplete transcripts, length, and coding potential. Cuffcompare was used to compare different datasets during data integration. Transcripts tagged with the class code ‘=’ were considered redundant. Transcripts tagged with the class codes ‘e’, ‘s’, and ‘p’ were considered background noise or mapping errors and were removed. We also removed incomplete transcripts, defined as transcripts whose first or last exon is shorter than 15 nt, and single-exon transcripts located within an exon of a multi-exon transcript on the same strand. In addition, we filtered out transcripts shorter than 200 nt. Finally, we predicted coding potential using LGC, CPAT, and PLEK, and retained only transcripts identified as lncRNAs by all three algorithms.
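
As a rough illustration, the filtering criteria above could be expressed as a single predicate. This is only a sketch under stated assumptions, not the LncBook pipeline itself: the `Transcript` record and its field names are hypothetical, transcript length is assumed to be the summed exon length, the containment check is assumed to be precomputed, and a transcript tagged ‘=’ is assumed to duplicate one that has already been retained.

```python
from dataclasses import dataclass
from typing import List, Tuple

REDUNDANT_CLASS_CODE = "="            # Cuffcompare: matches an existing transcript
NOISE_CLASS_CODES = {"e", "s", "p"}   # background noise or mapping errors
MIN_TERMINAL_EXON_NT = 15             # first/last exon must be at least this long
MIN_TRANSCRIPT_NT = 200               # minimum transcript length


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    transcript_id: str
    class_code: str                           # Cuffcompare class code
    exons: List[Tuple[int, int]]              # 1-based inclusive (start, end), in transcript order
    noncoding_calls: Tuple[bool, bool, bool]  # "non-coding" verdicts from LGC, CPAT, PLEK
    contained_single_exon: bool = False       # assumed precomputed: single exon inside an exon
                                              # of a multi-exon transcript on the same strand


def exon_length(exon: Tuple[int, int]) -> int:
    start, end = exon
    return end - start + 1


def passes_filters(t: Transcript) -> bool:
    """Return True if the transcript survives every filter described above."""
    if t.class_code == REDUNDANT_CLASS_CODE or t.class_code in NOISE_CLASS_CODES:
        return False
    # Incomplete transcripts: short terminal exon, or contained single-exon transcript.
    if exon_length(t.exons[0]) < MIN_TERMINAL_EXON_NT:
        return False
    if exon_length(t.exons[-1]) < MIN_TERMINAL_EXON_NT:
        return False
    if t.contained_single_exon:
        return False
    # Length filter (assumption: length is the sum of exon lengths).
    if sum(exon_length(e) for e in t.exons) < MIN_TRANSCRIPT_NT:
        return False
    # Coding potential: all three tools must call the transcript non-coding.
    return all(t.noncoding_calls)
```

Requiring agreement from all three predictors trades sensitivity for confidence: a single dissenting tool is enough to drop a transcript.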

4) What is the difference between HSALNG and HSALNT accessions?

LncRNA transcripts and genes are assigned LncBook accession numbers. A unique accession number prefixed with HSALNT is assigned to each lncRNA transcript. Likewise, each lncRNA gene has an accession number prefixed with HSALNG.
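
When processing downloaded data, the prefix alone is enough to tell the two entity types apart. The helper below is a minimal sketch; the function name is hypothetical, and only the two prefixes come from the description above.

```python
def accession_kind(accession: str) -> str:
    """Classify a LncBook accession by its prefix."""
    if accession.startswith("HSALNT"):
        return "transcript"
    if accession.startswith("HSALNG"):
        return "gene"
    return "unknown"
```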

5) What kind of multi-omics data does LncBook provide?

To facilitate bioinformatics analysis and functional study of lncRNAs, we performed large-scale multi-omics analysis and provided tissue expression, methylation, variation, and lncRNA-miRNA information in LncBook.

  • Expression: To annotate the tissue expression profiles of lncRNAs, two sets of RNA-Seq data are used: HPA (the Human Protein Atlas, covering 32 normal human tissues) and GTEx (the Genotype-Tissue Expression project, covering 53 normal human tissues). We calculate the expression abundance of all lncRNA transcripts/genes from the HPA RNA-Seq data, and use the expression abundance already calculated by GTEx.
  • Methylation: To annotate methylation information of lncRNAs, bisulfite-seq data from TCGA (The Cancer Genome Atlas) and ENCODE (The ENCyclopedia of DNA Elements) were downloaded, covering 9 cancers with both normal and cancer samples. Regions from 1,500 bp upstream to the transcription start site were defined as promoters. We calculated the methylation level of both the promoter and body regions of lncRNAs.
  • Variation: We mapped the SNP sites of dbSNP to our lncRNA loci. MAF (minor allele frequency) values of lncRNA SNPs were obtained from the 1000 Genomes Project. We also annotated pathogenic information from ClinVar and COSMIC.
  • LncRNA-miRNA interaction: The lncRNA-miRNA interactions were predicted using TargetScan and miRanda. Only interactions supported by both tools were retained. Experimentally validated interactions were obtained from starBase v2.0.
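
The "supported by both tools" rule for predicted lncRNA-miRNA interactions amounts to a set intersection. The sketch below assumes each tool's output has already been parsed into (lncRNA, miRNA) pairs; the function name and the identifiers used with it are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (lncRNA identifier, miRNA identifier)


def consensus_interactions(targetscan_pairs: Iterable[Pair],
                           miranda_pairs: Iterable[Pair]) -> Set[Pair]:
    """Keep only the interactions predicted by both TargetScan and miRanda."""
    return set(targetscan_pairs) & set(miranda_pairs)
```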

6) How are disease-associated lncRNAs curated?

LncBook incorporates 3,772 experimentally validated lncRNA-disease associations and further identifies a total of 97,998 lncRNAs that are putatively disease-associated.
The associations between lncRNAs and diseases were derived from LncRNADisease and LncRNAWiki, which have been curated from published literature with experimental evidence. Each lncRNA-disease association was annotated with disease name, MeSH Ontology (Medical Subject Headings 2018 name), dysfunction type, detailed description, and publication. Among the 1,867 featured lncRNAs (experimentally validated lncRNAs), 1,502 lncRNAs are associated with 462 diseases.
On the other hand, we predicted disease-associated lncRNAs according to evidence from methylation, genome variation, and lncRNA-miRNA interaction: (1) Methylation: in each cancer, an lncRNA whose promoter methylation level is increased (decreased) in 80% of cancer samples relative to normal samples is considered hypermethylated (hypomethylated). An lncRNA is then considered cancer-associated if it is consistently hypermethylated or hypomethylated in at least eight cancers; (2) Genome variation: any lncRNA whose genomic location overlaps disease-associated SNPs from COSMIC (OCCURRENCE >= 3) or ClinVar is considered disease-associated; (3) Interaction: any lncRNA interacting with at least 11 disease-associated miRNAs (each associated with at least 5 diseases according to the Human microRNA Disease Database, HMDD) is considered disease-associated.
Note, however, that this does not mean that these disease-associated lncRNAs play causative roles in diseases. We plan further curation to identify lncRNAs that play functional causative roles, which would be more significant and useful.
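
The three prediction rules can be sketched as simple threshold checks. This is an illustrative reading of the criteria above, not LncBook's actual code: the function names and input shapes are hypothetical, per-cancer hyper/hypomethylation calls are assumed to be precomputed, and "consistently" is interpreted as the same direction in at least eight cancers.

```python
from typing import Dict, Iterable, List

MIN_CONSISTENT_CANCERS = 8   # same-direction methylation change in >= 8 cancers
MIN_COSMIC_OCCURRENCE = 3    # COSMIC OCCURRENCE >= 3
MIN_DISEASE_MIRNAS = 11      # >= 11 disease-associated miRNA partners
MIN_DISEASES_PER_MIRNA = 5   # each such miRNA linked to >= 5 diseases in HMDD


def methylation_evidence(calls_by_cancer: Dict[str, str]) -> bool:
    """calls_by_cancer maps a cancer name to 'hyper', 'hypo', or 'none'."""
    calls = list(calls_by_cancer.values())
    return max(calls.count("hyper"), calls.count("hypo")) >= MIN_CONSISTENT_CANCERS


def variation_evidence(cosmic_occurrences: Iterable[int], has_clinvar_snp: bool) -> bool:
    """cosmic_occurrences holds the OCCURRENCE value of each overlapping COSMIC SNP."""
    return has_clinvar_snp or any(n >= MIN_COSMIC_OCCURRENCE for n in cosmic_occurrences)


def interaction_evidence(diseases_per_mirna: Dict[str, int]) -> bool:
    """diseases_per_mirna maps each interacting miRNA to its HMDD disease count."""
    qualifying = sum(1 for n in diseases_per_mirna.values() if n >= MIN_DISEASES_PER_MIRNA)
    return qualifying >= MIN_DISEASE_MIRNAS


def evidence_types(methylation: bool, variation: bool, interaction: bool) -> List[str]:
    """List the kinds of evidence supporting a predicted association."""
    flags = {"methylation": methylation, "variation": variation, "interaction": interaction}
    return [name for name, present in flags.items() if present]
```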

7) How is the function information curated?

We systematically curated 1,653 lncRNAs (that were sourced from LncRNAWiki) with functional annotations reported in 2,501 publications and described their functional mechanisms and biological processes with controlled vocabularies.

8) How to define TS (tissue-specific) and HK (housekeeping) lncRNAs?

To profile expression levels for lncRNAs, two RNA-Seq datasets were used: HPA (the Human Protein Atlas, covering 32 normal human tissues) and GTEx (the Genotype-Tissue Expression project, covering 53 normal human tissues). We filtered out lncRNAs whose highest expression values are lower than 0.5 TPM/FPKM. Then, the τ-value and cv (coefficient of variation) were used to determine HK (housekeeping) lncRNAs (τ-value <= 0.5 and cv <= 0.5) and TS (tissue-specific) lncRNAs (τ-value >= 0.95).
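
The thresholds above can be applied as in the following sketch. The text does not spell out the formulas, so two assumptions are made here: τ is computed with the commonly used tissue-specificity index, τ = Σ(1 - x_i / x_max) / (n - 1) over n tissues, and cv is the population standard deviation divided by the mean. LncBook may normalize or transform expression values differently, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence

MIN_MAX_EXPRESSION = 0.5  # TPM/FPKM; lncRNAs peaking below this are filtered out
HK_MAX_TAU = 0.5
HK_MAX_CV = 0.5
TS_MIN_TAU = 0.95


def tau_index(values: Sequence[float]) -> float:
    """Tissue-specificity index: 0 for uniform expression, 1 for a single tissue."""
    x_max = max(values)
    if len(values) < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    return sum(1 - v / x_max for v in values) / (len(values) - 1)


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (an assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean expression is zero")
    return pstdev(values) / m


def classify_expression(values: Sequence[float]) -> str:
    """Return 'filtered', 'HK', 'TS', or 'other' for one lncRNA's tissue profile."""
    if max(values) < MIN_MAX_EXPRESSION:
        return "filtered"
    tau = tau_index(values)
    if tau <= HK_MAX_TAU and coefficient_of_variation(values) <= HK_MAX_CV:
        return "HK"
    if tau >= TS_MIN_TAU:
        return "TS"
    return "other"
```

For example, a profile expressed at the same level in every tissue has τ = 0 and cv = 0 and is classified as HK, while a profile expressed in exactly one tissue has τ = 1 and is classified as TS.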

9) Are all the lncRNAs in LncBook predicted to be lncRNAs?

The predicted lncRNAs (lncRNAs integrated from other databases, and novel lncRNAs predicted based on RNA-Seq data analysis) were filtered according to the coding potential prediction results of the tools CPAT, PLEK, and LGC. Those identified as lncRNAs by all three tools were retained. However, all experimentally validated lncRNAs were integrated regardless of their coding potential.

10) How are lncRNAs classified in LncBook?

Based on their genomic locations with respect to protein-coding genes, we classified lncRNAs into seven groups: Intergenic, Intronic (S), Intronic (AS), Overlapping (S), Overlapping (AS), Sense, and Antisense. "S" in brackets indicates that the lncRNAs are on the same strand as the protein-coding RNAs, and "AS" indicates that the lncRNAs are on the antisense strand of the protein-coding RNAs.

  • Intergenic: lncRNAs are transcribed from intergenic regions;
  • Intronic (S): lncRNAs are transcribed entirely from introns of protein-coding genes;
  • Intronic (AS): lncRNAs are transcribed from the antisense strand of protein-coding genes and their entire sequences are covered by introns of protein-coding genes;
  • Overlapping (S): lncRNAs that contain coding genes within an intron on the sense strand;
  • Overlapping (AS): lncRNAs that contain coding genes within an intron on the antisense strand;
  • Sense: lncRNAs are transcribed from the sense strand of protein-coding genes, and either the entire lncRNA sequence is covered by a protein-coding gene (Intronic lncRNAs are not included), or the entire protein-coding gene is covered by the lncRNA (Overlapping lncRNAs are not included), or the lncRNA and the protein-coding gene partially intersect;
  • Antisense: lncRNAs are transcribed from the antisense strand of protein-coding genes, and either the entire lncRNA sequence is covered by a protein-coding gene (Intronic lncRNAs are not included), or the entire protein-coding gene is covered by the lncRNA (Overlapping lncRNAs are not included), or the lncRNA and the protein-coding gene partially intersect.

Database Usage

1) How do I search data in LncBook?

Users can search by different terms on the homepage.

It is also easy to search for a specific lncRNA or a specific group of lncRNAs in each resource, e.g., “Disease”, “Function”, “Expression”, “Methylation”.

2) What kind of tools can I use in LncBook?

LncBook provides a series of tools to facilitate online analysis: BLAST, ID conversion, coding potential prediction, and genomic positional annotation. The coding potential prediction tool, LGC, was developed by our team, and the manuscript has been submitted. If you want to use this tool in your work, please cite the paper in bioRxiv.

3) Why do some lncRNAs not have transcript pages?

Some featured lncRNAs presently cannot be traced back to their genomic locations, and thus were not annotated with multi-omics data. We do not create specific transcript pages for them.

4) How can I get information of all the experimentally validated lncRNAs?

You can access all the experimentally validated lncRNAs directly by clicking “Featured lncRNAs” on the homepage. There are 1,867 featured lncRNAs, which have been curated for their functional mechanisms and biological processes and are linked to the corresponding publications. Clicking the Transcript ID directs you to the transcript page, which contains all the annotated information for the lncRNA transcript, including genomic location, tissue expression profile, methylation level, variation, lncRNA-miRNA interaction, function, and disease. Clicking the symbol directs you to the wiki annotation page, where community annotations are shown.
You can also access all the publications associated with these lncRNAs on the “Featured lncRNAs” page by clicking “Literature” in the top right corner. Title, publication date, journal, citation, gene name, and PMID are shown in a table.

5) How can I get lncRNA-disease associations?

You can access the 3,772 experimentally validated lncRNA-disease associations and the 97,998 putatively disease-associated lncRNAs on the “Diseases” page.
The experimentally validated lncRNA-disease associations can be accessed by clicking “Validated”. On the “Validated” page, you can search by “Disease” or “Mesh Ontology”.
On the “Predicted” page, you can browse the disease-associated lncRNAs that were predicted based on pathogenic evidence from methylation, variation, and interaction. These lncRNAs are grouped into different grades, and the number of stars indicates the amount of evidence.

6) How can I get lncRNA-function associations?

You can access all the lncRNA-function associations directly by clicking “Function” on the homepage. On the “Function” page, you can search by “Functional Mechanism”, “Biological Process”, or other items.

Functional mechanism:

  • transcriptional regulation: regulate gene transcription through transcriptional interference and chromatin remodeling
  • splicing regulation: influence mRNA splicing through binding to or modulating splicing factors, or directly hybridizing with mRNA sequences to block splicing
  • ceRNA: interact directly or indirectly with miRNAs to stabilize target mRNAs
  • translational control: participate in translational control through binding to translation factors or ribosome
  • protein localization: play a significant role in the biological process of protein localization
  • RNAi: display the interfering roles

Biological process:

  • pathogenic process: associated with diseases
  • developmental process: involved in development

7) How can I get TS and HK lncRNAs?

On the “Expression” page, you can specify the dataset, tissue breadth, cv (coefficient of variation), and τ-value of interest to retrieve all lncRNAs that meet these conditions.

8) How can I get methylation information?

You can access the methylation profile directly by clicking “Methylation” on the homepage. The mean promoter methylation level across all cancer samples is shown in the table. To view the distribution of promoter or body methylation levels among both normal and cancer samples, click “Promoter Chart” or “Body Chart”.

9) How can I get variation information?

You can access all the SNPs (derived from dbSNP) of lncRNAs directly by clicking “Variation” on the homepage. SNPs can be browsed by specifying a genomic location or dbSNP ID.

10) How can I get lncRNA-miRNA interaction information?

You can access all the lncRNA-miRNA interactions directly by clicking “Interaction”. Experimental evidence was obtained from starBase. Interactions can be browsed by miRNA or lncRNA ID.

11) How do I download the data?

Go to the “Download” page to download the data. We recommend using Chrome, Firefox, or another browser other than Safari, because the FTP site is incompatible with Safari in some cases and some files may not be displayed on the download page.

12) How do I cite LncBook?

LncBook: a curated knowledgebase of human long non-coding RNAs. Nucleic Acids Res 2019, in press.
Related publications:
LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res 2015. [PMID=25399417]
The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res 2017. [PMID=27899658]
Database Resources of the BIG Data Center in 2018. Nucleic Acids Res 2018. [PMID=29036542]

Data Structure