LncExpDB is a comprehensive database for lncRNA expression. It covers expression profiles of lncRNA genes across various biological contexts, predicts potential functional lncRNAs and their interacting partners, and thus provides essential guidance on experimental design.
Based on comprehensive integration, stringent curation and systematic analysis, LncExpDB currently presents a collection of 101,293 high-quality human lncRNA genes and houses abundant expression profiles derived from 1,977 samples of 337 biological conditions across nine biological contexts. On this basis, LncExpDB estimates the expression reliability and expression capacity of lncRNA genes, identifies 25,191 featured genes, and obtains 28,443,865 lncRNA-mRNA interactions.
Moreover, LncExpDB is equipped with user-friendly web interfaces, providing functionalities for data query, browsing, visualization as well as easy access.
LncExpDB collects a total of 24 RNA-seq datasets across 1,977 samples from GEO, SRA and ArrayExpress, covering 337 biological conditions of nine important biological contexts, including normal tissues/cell lines, organ development, preimplantation embryos, cell differentiation, subcellular localizations, exosomes, cancer cell lines, virus infection and circadian rhythm.
|Biological Context||Project ID||Dataset||Source||Sample Number||PMID|
|Normal Tissue/Cell||E-MTAB-2836||The Human Protein Atlas||EBI ArrayExpress||121||28940711|
|Normal Tissue/Cell||SRP013565||ENCODE Primary Cell Lines||NCBI SRA||111||29126249|
|Organ Development||E-MTAB-6814||Development of Brain||EBI ArrayExpress||55||31243368|
|Organ Development||E-MTAB-6814||Development of Cerebellum||EBI ArrayExpress||59||31243368|
|Organ Development||E-MTAB-6814||Development of Heart||EBI ArrayExpress||50||31243368|
|Organ Development||E-MTAB-6814||Development of Kidney||EBI ArrayExpress||40||31243368|
|Organ Development||E-MTAB-6814||Development of Liver||EBI ArrayExpress||50||31243368|
|Organ Development||E-MTAB-6814||Development of Ovary||EBI ArrayExpress||18||31243368|
|Organ Development||E-MTAB-6814||Development of Testis||EBI ArrayExpress||41||31243368|
|Preimplantation Embryo||GSE71318||Oocyte to Lateblastocyst (7 Stages)||NCBI GEO||35||27315811|
|Preimplantation Embryo||GSE36552||Oocyte to Lateblastocyst (9 Stages)||NCBI GEO||90||23934149|
|Cell Differentiation||GSE122380||Cell Differentiation||NCBI GEO||297||31249060|
|Subcellular Localization||GSE116008||Subcellular Localization||NCBI GEO||36||31230715|
|Exosome||GSE104926||Blood Exosomes from Early-Stage Esophageal Squamous Cell Carcinoma Patients vs. Normal Control||NCBI GEO||12||32043367|
|Exosome||GSE100063, GSE100206||Blood Exosomes from Colorectal Cancer Patients vs. Normal Control||NCBI GEO||44||30053265|
|Exosome||GSE100063, GSE100206||Blood Exosomes from Coronary Heart Disease vs. Normal Control||NCBI GEO||38||30053265|
|Exosome||GSE100063, GSE100206||Blood Exosomes from Hepatocellular Carcinoma vs. Normal Control||NCBI GEO||53||30053265|
|Exosome||GSE100063, GSE100206||Blood Exosomes from Pancreatic Adenocarcinoma Patients vs. Normal Control||NCBI GEO||46||30053265|
|Cancer Cell Line||PRJNA523380||Cancer Cell Line||NCBI SRA||658||31068700|
|Virus Infection||GSE125686||HIV Infection vs. Normal Control||NCBI GEO||22||30185599|
|Virus Infection||GSE125686||HBV Infection vs. Normal Control||NCBI GEO||48||30185599|
|Virus Infection||GSE125686||HCV Infection vs. Normal Control||NCBI GEO||24||30185599|
|Virus Infection||GSE147507||COVID Patients vs. Normal Control||NCBI GEO||4||32416070|
|Circadian Rhythm||GSE113883||Circadian Rhythm||NCBI GEO||153||30201705|
LncExpDB integrates human lncRNA transcripts from LncBook v1.2, RefLnc, GENCODE v33, CHESS v2.2, FANTOM-CAT (lv4_stringent) and BIGTranscriptome. To obtain a high-confidence lncRNA dataset, a set of strict criteria is adopted that considers redundancy, mapping errors, possible pre-mRNA fragments, polymerase run-on, incomplete transcripts, length, boundary, strand and coding potential.
Overlap is defined as an exact match of all exon junctions and of the 5'-start and 3'-end boundaries.
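The overlap rule above can be illustrated with a minimal Python sketch. It assumes each transcript is given as a chromosome, a strand and a list of exon `(start, end)` pairs; the `Transcript` type and the function names are hypothetical and are not part of LncExpDB. Two transcripts with identical exon coordinates necessarily share every exon junction as well as the 5'-start and 3'-end boundaries.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    tx_id: str    # identifier in the source database (hypothetical field)
    chrom: str
    strand: str   # "+" or "-"
    exons: tuple  # ((start, end), ...) genomic exon coordinates


def structure_key(tx):
    """Key that is equal only for transcripts with identical exon structure."""
    return (tx.chrom, tx.strand, tuple(sorted(tx.exons)))


def drop_redundant(transcripts):
    """Keep the first transcript seen for each distinct exon structure."""
    unique = {}
    for tx in transcripts:
        unique.setdefault(structure_key(tx), tx)
    return list(unique.values())


a = Transcript("srcA_1", "chr1", "+", ((100, 200), (300, 400)))
b = Transcript("srcB_7", "chr1", "+", ((100, 200), (300, 400)))  # same structure as a
c = Transcript("srcB_8", "chr1", "+", ((100, 200), (300, 450)))  # different 3' end
kept = drop_redundant([a, b, c])
print([tx.tx_id for tx in kept])
```

Here `srcB_7` is treated as overlapping `srcA_1`, while `srcB_8` is kept because its last exon ends at a different position.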
LncRNA transcripts are assigned to the same gene if they share exonic sequences on the same strand. Six cases of lncRNA genes are listed.
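The gene assignment rule can be sketched as a clustering problem. The example below is an illustration under stated assumptions, not the pipeline used by LncExpDB: exon coordinates are treated as half-open intervals, sharing is applied transitively (if A shares exons with B and B with C, all three form one gene), and the all-pairs comparison is only suitable for small inputs. All names are hypothetical.

```python
def exons_overlap(exons_a, exons_b):
    """True if any exon of one transcript overlaps any exon of the other."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in exons_a for s2, e2 in exons_b)


def group_into_genes(transcripts):
    """transcripts: tx_id -> (chrom, strand, [(start, end), ...])."""
    parent = {tx: tx for tx in transcripts}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(transcripts)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            chrom_a, strand_a, exons_a = transcripts[a]
            chrom_b, strand_b, exons_b = transcripts[b]
            if (chrom_a, strand_a) == (chrom_b, strand_b) and exons_overlap(exons_a, exons_b):
                parent[find(a)] = find(b)  # merge the two clusters

    genes = {}
    for tx in ids:
        genes.setdefault(find(tx), []).append(tx)
    return sorted(sorted(members) for members in genes.values())


example = {
    "t1": ("chr1", "+", [(100, 200), (300, 400)]),
    "t2": ("chr1", "+", [(350, 450)]),               # shares exon sequence with t1
    "t3": ("chr1", "+", [(420, 500)]),               # shares exon sequence with t2 only
    "t4": ("chr1", "-", [(100, 200), (300, 400)]),   # same coordinates, opposite strand
}
genes = group_into_genes(example)
print(genes)
```

In this toy input `t1`, `t2` and `t3` end up in one gene, while `t4` forms its own gene because it lies on the opposite strand.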
Based on their genomic locations with respect to protein-coding genes, we classified lncRNAs into seven groups: Intergenic, Intronic (S), Intronic (AS), Overlapping (S), Overlapping (AS), Sense, and Antisense. "S" in brackets indicates that the lncRNA is on the same strand as the protein-coding RNA, and "AS" indicates that the lncRNA is on the antisense strand.
All samples are processed by a standardized RNA-seq pipeline (Trimmomatic, FastQC, STAR, RSeQC, Kallisto and featureCounts) to obtain the abundance matrices (read counts, CPM, FPKM and TPM) of lncRNAs. The raw abundance matrices are normalized by the TMM method.
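For readers unfamiliar with the abundance units, the sketch below shows the textbook definitions of CPM, FPKM and TPM for a single sample, starting from read counts and gene lengths. It is only an illustration: the function names are hypothetical, full gene length is used where tools such as Kallisto use an effective length, and TMM normalization (available, for example, in the edgeR package) is not reproduced here.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene per million mapped fragments."""
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    return [r / denom * 1e6 for r in rates]


# Toy sample with three genes (values are made up for illustration).
counts = [10, 20, 70]
lengths_bp = [1000, 2000, 7000]

print(cpm(counts))
print(fpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

Because TPM values within a sample always sum to one million, they are easier to compare across samples than FPKM values, which is one reason TPM is a common choice for expression thresholds.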
LncExpDB considers lncRNA genes with maximum expression values less than 1.0 TPM in a certain biological condition as not expressed (NE). If the lncRNA genes are tagged with NE in all biological conditions available, they are most likely unreliable lncRNA genes. Of course, it is possible that this definition may change when novel biological conditions are covered.
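The reliability rule can be written down in a few lines. The sketch below assumes that each biological condition has TPM values from one or more samples and that the maximum is taken over those samples; the data layout and function names are hypothetical.

```python
NE_THRESHOLD_TPM = 1.0  # threshold stated in the text


def is_not_expressed(sample_tpms):
    """NE: the maximum TPM over the samples of one condition is below 1.0."""
    return max(sample_tpms) < NE_THRESHOLD_TPM


def is_unreliable(tpm_by_condition):
    """A gene tagged NE in every available condition is likely unreliable."""
    return all(is_not_expressed(values) for values in tpm_by_condition.values())


# Made-up TPM values for two hypothetical genes.
gene_a = {"liver": [0.2, 0.4], "heart": [0.0, 0.9]}
gene_b = {"liver": [0.2, 0.4], "heart": [0.3, 2.5]}

print(is_unreliable(gene_a))  # NE in both conditions
print(is_unreliable(gene_b))  # expressed in heart
```

Adding a new condition in which `gene_a` reaches 1.0 TPM or more would flip its status, which is why the label depends on the conditions covered.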
All expressed genes are ranked in a specific condition (time point/stage/tissue/cell/component/processing). Specifically, genes with expression values greater than the upper quantile are classified as “H” (high expression level), those less than the lower quantile as “L” (low expression level), and the remaining as “M” (medium expression level). High-capacity lncRNAs (HCL) are genes with an “H” classification in at least one condition, low-capacity lncRNAs (LCL) are those with “L” in all conditions, and the remaining genes are medium-capacity lncRNAs (MCL). Note that as more biological conditions are covered, a gene currently classified as LCL or MCL may be reclassified as MCL or HCL.
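The two-step classification can be sketched as follows. The text does not state which quantiles are used, so this example assumes the lower and upper quartiles (25% and 75%) computed with Python's `statistics.quantiles`; it also assumes that a gene is only evaluated in conditions where it is expressed. Function names and data are hypothetical.

```python
from statistics import quantiles


def classify_levels(expr):
    """expr: gene -> expression value, for genes expressed in ONE condition."""
    lower, _, upper = quantiles(expr.values(), n=4)  # assumed quartile cut-offs
    return {
        gene: "H" if value > upper else "L" if value < lower else "M"
        for gene, value in expr.items()
    }


def capacity(levels_by_condition):
    """Combine per-condition H/M/L labels into HCL/MCL/LCL per gene."""
    labels = {}
    for levels in levels_by_condition:
        for gene, level in levels.items():
            labels.setdefault(gene, []).append(level)
    result = {}
    for gene, seen in labels.items():
        if "H" in seen:                      # high in at least one condition
            result[gene] = "HCL"
        elif all(level == "L" for level in seen):  # low everywhere
            result[gene] = "LCL"
        else:
            result[gene] = "MCL"
    return result


# Made-up expression values for eight genes in two conditions.
cond1 = {"g1": 1, "g2": 2, "g3": 3, "g4": 4, "g5": 5, "g6": 6, "g7": 7, "g8": 8}
cond2 = {"g1": 8, "g2": 1, "g3": 3, "g4": 4, "g5": 5, "g6": 6, "g7": 7, "g8": 2}

result = capacity([classify_levels(cond1), classify_levels(cond2)])
print(result)
```

In this toy data `g1` is low in the first condition but high in the second, so it is HCL, whereas `g2` is low in both and is therefore LCL.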
LncExpDB identifies and characterizes featured lncRNA genes that are specifically expressed in a certain cell line/tissue, differentially expressed in the context of cancer or virus infection, enriched in a subcellular compartment, dynamically expressed during cell differentiation or embryo/organ development, or periodically expressed with circadian rhythm.
The featured genes are identified using specialized methods with strict criteria:
LncExpDB predicts lncRNA-mRNA interactions based on co-expression networks. Co-expression relationships between lncRNAs and mRNAs are identified using the Pearson correlation coefficient (adjusted p-value < 0.01 and |r| >= 0.5). Note that, owing to the extremely small sample size (n = 4), the “COVID patients vs. normal control” dataset is not analyzed in this section.
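The filtering step can be illustrated with a small Python sketch. The source does not say how p-values are adjusted, so the Benjamini-Hochberg procedure is used here purely as an assumption; raw p-values are taken as given inputs, and all function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_pairs(pairs, r_min=0.5, alpha=0.01):
    """pairs: list of (lncRNA, mRNA, r, raw p-value). Thresholds from the text."""
    adjusted = bh_adjust([p for *_, p in pairs])
    return [
        (lnc, mrna, r, q)
        for (lnc, mrna, r, _), q in zip(pairs, adjusted)
        if abs(r) >= r_min and q < alpha
    ]


# Made-up candidate pairs for illustration.
pairs = [
    ("L1", "M1", 0.9, 0.0001),
    ("L2", "M2", -0.7, 0.001),
    ("L3", "M3", 0.3, 0.0001),  # fails the |r| threshold
    ("L4", "M4", 0.8, 0.5),     # fails the adjusted p-value threshold
]
selected = select_pairs(pairs)
print([(lnc, mrna) for lnc, mrna, _, _ in selected])
```

Note that the absolute value of r is tested, so strongly negative correlations such as the `L2`/`M2` pair are retained alongside positive ones.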
Enter a gene symbol or gene ID (LncExpDB ID) in the search box on the homepage to explore the lncRNA of interest. In the “Resources” part of the homepage or the “Context” section of the navigation bar, clicking a context takes you to the expression profiles of featured lncRNAs and the lncRNA-mRNA interactions across the biological conditions of that context. There you can view the defined featured genes or explore a group of lncRNA genes of interest with customized filters.
For an overview of expression capacities, featured genes or interactions across different contexts, click “Expression Capacity”, “Featured Genes” or “Interactions” in the navigation bar.
You can browse all lncRNAs in the "Genes" page with the basic information of gene id/symbol, classification, chromosome, strand, location, gene length and transcript number. You can search for lncRNAs of interest by gene id/transcript id derived from LncBook v1.2, RefLnc, NONCODE v5, GENCODE v33, CHESS v2.2, FANTOM-CAT (lv4_stringent) and BIGTranscriptome, by gene symbol derived from HGNC, or by chromosome or classification type. Each gene id is linked to a detailed information page of expression profiles in different contexts. In the detailed gene page, all corresponding gene and transcript ids provide hyperlinks to their original pages. In addition, users can view our reference gene track on the UCSC Genome Browser.
You can explore featured lncRNAs in the "Featured Genes" page, which covers tens of thousands of featured genes with specific expression patterns in at least one biological context. You can filter and/or re-order the table content using the categories and search boxes in the header line. Each gene id is linked to a detailed information page of expression profiles in different contexts.
You can view all types of biological samples in the "Contexts" page including normal tissues and cell lines, organ development, preimplantation embryos, cell differentiation, subcellular localization, exosome, cancer cell line, virus infection and circadian rhythm. Each context page contains the tabs of “Featured Genes” and “Interaction”.
By clicking the tab of “Featured Genes”, you can select specific datasets of interest and browse all defined featured genes, e.g., specifically or consistently expressed genes in a certain context. In addition, you can select a specific group of genes with custom thresholds. You can filter and/or re-order the expression profile table using the categories and search boxes in the header line.
By clicking the tab of “Interactions”, you can select specific datasets of interest and browse the cis or trans interactions between lncRNAs and mRNAs. Moreover, you can select a specific group of interactions using custom thresholds or search for related interactions by lncRNA/protein-coding gene id or symbol.
In the "Expression Capacity" page, you can browse the expression capacity of lncRNAs in various biological contexts. You can filter for high-capacity lncRNAs in one or multiple contexts using the categories and search boxes in the header line of the expression capacity table. Furthermore, the “Chart” enables visualization of the expression level distribution among all the biological conditions. Each gene id is linked to a detailed information page of expression profiles in different contexts.
You can visualize all lncRNA-mRNA interactions in the “Interactions” page, which includes detailed information on lncRNA-mRNA pairs, Pearson correlation coefficients, p-values and distances. The “search by” tab allows you to narrow down the results to a gene of interest. Each gene id is linked to a detailed information page of expression profiles in different contexts.
The “Downloads” page contains all downloadable files, including: i) the reference gene model for RNA-seq analysis, ii) expression profiles, iii) expression levels, iv) featured genes and v) co-expression matrices in various biological contexts.
In the “Statistics” page, you can find and download all statistical results for i) gene annotation, such as lncRNA integration, exon and transcript number distribution and lncRNA classification, ii) expression, including expression profiles and the distribution of featured lncRNAs in different biological contexts, and iii) the distribution of lncRNA-mRNA interactions.