Standards for Structured Data Index & Search

BS Index File

The Bigsearch system uses a BS file (index.bs) to index database websites. The file should contain all the information about a database website that needs to be indexed by the Bigsearch engine. The format is as follows:

DB  JSON_ENCODED_STRING
ENTRY JSON_ENCODED_STRING
...   ...

Each line of the file has two columns, separated by a tab character (\t).

The first column of each line gives the information type of that line:

  • DB: the line describes the meta information of the database itself
  • ENTRY: the line describes one of the items stored in the database

The second column of each line holds the detailed information as a JSON-encoded string.

For example, the Ic4r-seqs database contains annotation information for rice genes, so each ENTRY line corresponds to one gene in the database.
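As an illustration of the line layout described above, here is a minimal Python sketch that splits one line into its type and its JSON payload. The function name parse_bs_line and the sample values are hypothetical and are not part of Bigsearch or BSChecker.

```python
import json

# The two line types named in this document.
VALID_TYPES = {"DB", "ENTRY"}

def parse_bs_line(line):
    """Split one BS line into (record_type, payload). Hypothetical helper."""
    # Only the first tab separates the type from the JSON string.
    record_type, sep, payload = line.rstrip("\r\n").partition("\t")
    if not sep:
        raise ValueError("expected a tab between the type and the JSON string")
    if record_type not in VALID_TYPES:
        raise ValueError(f"unknown line type: {record_type!r}")
    return record_type, json.loads(payload)

sample = 'DB\t{"id": "ic4r", "title": "IC4R"}'
record_type, payload = parse_bs_line(sample)
```

A line that lacks the tab, or whose second part is not valid JSON, raises ValueError (json.JSONDecodeError is a subclass of ValueError).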

DB Line

A DB line describes the meta information of the database itself. Its first field is always DB, and its second field is a JSON string. Taking IC4R as an example, the JSON string has the following format. (It is shown here on multiple lines for readability; in the BS file it must be written on a single line, with the line breaks and tabs removed. The // comments are explanations for this page only, since JSON itself does not allow comments.)

{
  "id": "The unique ID of the database in Bigsearch, currently set at will; lowercase letters without spaces or punctuation are recommended, such as ic4r", // [Required]
  "title": "The name of the database, such as IC4R", // [Required]
  "url": "The url of the database, such as http://ic4r.org", // [Required]
  "description": "The full name of the database or a one-sentence description", // ["" for blank]
  "basicInfo": "Brief introduction of the database, could be one or multi-paragraph text", // ["" for blank]
  "categories": [ // Category of database, currently set at will [use [] for blank]
    "Rice",
    "Sequences"
  ],
  "species": [ // Species in database [use [] for blank]
    "rice",
    "O. Sativa"
  ],
  "updatedAt": "2014-05-06 11:11:11" // Update time of database, with format yyyy-MM-dd HH:mm:ss [Required]
}
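One way to produce the single-line form is to let a JSON library do the serialization. The sketch below uses Python's standard json module; make_db_line and the sample values are hypothetical. json.dumps writes its output on one line by default and escapes any line breaks or tabs inside string values, so the only literal tab in the result is the separator.

```python
import json
from datetime import datetime

def make_db_line(meta):
    """Serialize the database meta information as one DB line. Hypothetical helper."""
    # ensure_ascii=False keeps non-ASCII text readable; control characters
    # such as newlines and tabs are still escaped inside the JSON string.
    return "DB\t" + json.dumps(meta, ensure_ascii=False)

db_meta = {
    "id": "ic4r",
    "title": "IC4R",
    "url": "http://ic4r.org",
    "description": "",
    "basicInfo": "First paragraph.\nSecond paragraph.",
    "categories": ["Rice", "Sequences"],
    "species": ["rice", "O. Sativa"],
    # yyyy-MM-dd HH:mm:ss, as required above
    "updatedAt": datetime(2014, 5, 6, 11, 11, 11).strftime("%Y-%m-%d %H:%M:%S"),
}

db_line = make_db_line(db_meta)
```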

Entry Line

An ENTRY line describes one of the items stored in the database. Its first field is always ENTRY, and its second field is a JSON string. Taking IC4R as an example, the JSON string has the following format. (It is shown here on multiple lines for readability; in the BS file it must be written on a single line, with the line breaks and tabs removed.)

{
  "id": "Os01g0192000", // ID of the item, set at will [Required]
  "type": "gene", // Type of the item, set at will [Required]
  "title": "Os01g0192000", // Name or title for display [Required]
  "url": "http://ic4r.org/genes/Os01g0192000", // Corresponding url in database [Required]
  "dbId": "ic4r", // The id set in the DB line [Required]
  "updatedAt": "2014-05-06 11:11:11", // Update time of item [Required]
  "description": "The full name of item or one sentence description", //["" for blank]
  "basicInfo": "Brief introduction of item, could be one or multi-paragraph text", // ["" for blank]
  "species": [ // Species of item [use [] for blank]
    "rice",
    "O. Sativa"
  ],
  "attrs": { // Other attributes of the item, in the format key: value, set at will
    "accession": "Os01g0192000",
    "unigene": ["Os.20244", "Os.10244"]
  }
}
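Before writing ENTRY lines, it may help to check each item against the fields marked [Required] above. The following Python sketch is not BSChecker's actual logic; check_entry is a hypothetical helper, and it assumes that an item's updatedAt uses the same yyyy-MM-dd HH:mm:ss format given for the DB line.

```python
from datetime import datetime

# Fields marked [Required] in the ENTRY description above.
REQUIRED_ENTRY_FIELDS = ("id", "type", "title", "url", "dbId", "updatedAt")

def check_entry(entry, known_db_ids):
    """Return a list of problems found in one ENTRY object. Hypothetical helper."""
    problems = []
    for field in REQUIRED_ENTRY_FIELDS:
        if not entry.get(field):
            problems.append(f"missing required field: {field}")
    # dbId should match the id given in a DB line.
    db_id = entry.get("dbId")
    if db_id and db_id not in known_db_ids:
        problems.append(f"dbId {db_id!r} does not match any DB line")
    # Assumption: same timestamp format as the DB line.
    updated = entry.get("updatedAt")
    if isinstance(updated, str) and updated:
        try:
            datetime.strptime(updated, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append("updatedAt is not in yyyy-MM-dd HH:mm:ss format")
    return problems

entry = {
    "id": "Os01g0192000",
    "type": "gene",
    "title": "Os01g0192000",
    "url": "http://ic4r.org/genes/Os01g0192000",
    "dbId": "ic4r",
    "updatedAt": "2014-05-06 11:11:11",
}
problems = check_entry(entry, known_db_ids={"ic4r"})
```

An empty list means that no problem was found by these particular checks; it does not replace running BSChecker on the finished file.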

BSChecker

BSChecker is a tool for checking whether the index.bs file of a database is correct. Run it on the file as follows:

bin/bschecker.bat index.bs // For Windows
bin/bschecker index.bs // For Linux or macOS

Deployment

The index.bs file should be placed in the root directory of the website so that it can be downloaded from http(s)://xxx.org/index.bs. For example, for the database website http://ic4r.org/, the index.bs file should be available at http://ic4r.org/index.bs.
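The expected download address can be derived from any URL of the website. The following Python sketch uses urljoin from the standard library; index_bs_url is a hypothetical helper name, and the code only builds the address without contacting the server.

```python
from urllib.parse import urljoin

def index_bs_url(site_url):
    """Return the root-level index.bs address for a website. Hypothetical helper."""
    # A leading slash makes urljoin resolve against the root of the site,
    # regardless of any path in site_url.
    return urljoin(site_url, "/index.bs")

expected = index_bs_url("http://ic4r.org/")
```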