1. Introduction

  • What is Gene Expression Nebulas?

    Gene Expression Nebulas (GEN) is a data portal of gene expression profiles derived entirely from RNA-Seq data analysis of various tissues in multiple species. Currently, GEN contains two expression resources: (1) MTD (Mammalian Transcriptomic Database; http://mtd.cbi.ac.cn/), which focuses on mammalian transcriptomes, with the current version containing RNA-Seq data on human, mouse, rat and pig; (2) RED (Rice Expression Database; http://expression.ic4r.org), which focuses on rice transcriptomes, with the current version containing RNA-Seq data from Nipponbare (Oryza sativa japonica).

    GEN provides information on gene expression patterns under different biological conditions, with the aim of elucidating the dynamics of gene expression regulation in different species. It also offers additional features such as comparative transcriptomic analysis and co-expression analysis.

2. Datasets and Methods

  • Datasets

    • Human: 31 Tissues/cell lines
    • Mouse: 44 Tissues/cell lines
    • Pig: 13 Tissues/cell lines
    • Rat: 14 Tissues/cell lines
    • Rice: 8 Tissues/218 Experiments
  • RNA-seq Data Analysis Methods

    • MTD (Mammalian Transcriptomic Database)

      Low-quality reads were filtered out in data preprocessing steps implemented as Perl scripts. RNA-Seq reads were mapped to the reference genome of the corresponding species with TopHat v2.0.9. The reference genomes for human, rat, pig and mouse are hg19 (UCSC), rn4 (UCSC), Sscrofa10.2 (NCBI) and mm10 (UCSC), respectively. Gene/isoform assembly and quantification were performed using Cufflinks v2.1.1 with default parameters. RPKM (reads per kilobase per million mapped reads) values were calculated for genes, isoforms and exons.
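
      As an illustration of the RPKM definition given above, the following minimal Python sketch computes the value from raw counts. The function name and arguments are hypothetical and chosen only for this example; it shows the textbook formula and is not the estimation procedure that Cufflinks runs internally.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    Hypothetical helper for illustration only: it applies the plain
    definition RPKM = reads * 1e9 / (length_in_bp * total_mapped_reads).
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Example: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

      The factor 1e9 combines the two normalizations: 1e3 for "per kilobase" and 1e6 for "per million mapped reads".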

    • RED (Rice Expression Database)

      Raw RNA-Seq data were first converted into FASTQ format using SRA Toolkit (v 2.4.2), and Trimmomatic (v 0.35) was used for quality control. A sample was excluded from further analysis if low-quality reads accounted for more than 40% of its total reads. For sequence alignment and gene expression analysis, all high-quality samples were mapped to the latest version of the rice reference genome (Os-Nipponbare-Reference-IRGSP-1.0) using HISAT2 (v 2.0.1). Only samples with at least ~60% of reads mapped to the genome were used for gene expression profiling.
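
      The two sample-level filters described above can be sketched in Python as follows. This is a minimal illustration, not the RED pipeline itself: the SampleStats class, its field names and the passes_qc function are hypothetical, the ~60% figure is treated as a minimum threshold, and the mapping rate is assumed to be computed over the reads that remain after quality control.

```python
from dataclasses import dataclass

MAX_LOW_QUALITY_FRACTION = 0.40  # from the text: exclude if over 40% low-quality reads
MIN_MAPPED_FRACTION = 0.60       # assumption: ~60% is read as a lower bound


@dataclass
class SampleStats:
    """Hypothetical per-sample read counts used only for this sketch."""
    sample_id: str
    total_reads: int
    low_quality_reads: int
    mapped_reads: int  # assumed to be counted among the high-quality reads


def passes_qc(sample):
    """Return True if the sample survives both filters."""
    if sample.total_reads <= 0:
        return False
    low_quality_fraction = sample.low_quality_reads / sample.total_reads
    if low_quality_fraction > MAX_LOW_QUALITY_FRACTION:
        return False
    high_quality_reads = sample.total_reads - sample.low_quality_reads
    if high_quality_reads <= 0:
        return False
    return sample.mapped_reads / high_quality_reads >= MIN_MAPPED_FRACTION


# Made-up counts, for demonstration only.
samples = [
    SampleStats("sample_a", 1000, 100, 700),  # 10% low quality, ~78% mapped
    SampleStats("sample_b", 1000, 500, 400),  # 50% low quality
    SampleStats("sample_c", 1000, 100, 400),  # 10% low quality, ~44% mapped
]
kept = [s.sample_id for s in samples if passes_qc(s)]
```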

      StringTie (v 1.1.2) was used to calculate the expression level of each gene and transcript, based on gene and transcript annotations downloaded from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/).

3. Database Usage

  • Click "Species" on the top navigation bar and choose the species of interest; you will be directed automatically to the related database.

4. Terminology

  • RNA Sequencing

    RNA-Seq (RNA Sequencing), also called Whole Transcriptome Shotgun Sequencing (WTSS), is a technology that uses the capabilities of next-generation sequencing to reveal a snapshot of RNA presence and quantity from a genome at a given moment in time.

  • Sequence Read Archive

    SRA (Sequence Read Archive) makes biological sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms. (Adapted from the SRA homepage: http://www.ncbi.nlm.nih.gov/sra)

  • FPKM

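    FPKM (fragments per kilobase of transcript per million mapped fragments) is the number of fragments aligned to a transcript, divided by the transcript length in kilobases and by the total number of mapped fragments in the dataset in millions.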
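
    Written as a formula, the definition can be sketched as follows. The symbols F, L and N are introduced here only for illustration, and the numbers in the worked example are made up.

```latex
% F = fragments aligned to the transcript
% L = transcript length in bases
% N = total mapped fragments in the dataset
\[
  \mathrm{FPKM} = \frac{F}{(L / 10^{3}) \, (N / 10^{6})} = \frac{F \times 10^{9}}{L \times N}
\]
% Worked example: F = 300, L = 1500, N = 2 \times 10^{7}
\[
  \mathrm{FPKM} = \frac{300 \times 10^{9}}{1500 \times 2 \times 10^{7}} = 10
\]
```

    For paired-end data, both mates of a pair belong to one fragment and are counted once, which is the difference from counting individual reads.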

5. Help

  • Contact us

    We would love to hear from you with any questions or comments. Please find our contact information below.


    Lili Hao haolili(AT)big.ac.cn

    Postal Address
    Dr. Zhang Zhang, PI
    Beijing Institute of Genomics, Chinese Academy of Sciences
    No.1 Beichen West Road
    Chaoyang District, Beijing 100101