Database Commons

a catalog of worldwide biological databases

Database Profile

General information

URL: http://genestudy.org/recommends
Full name: A content-based dataset recommendation system for researchers-a case study on Gene Expression Omnibus (GEO) repository
Description: DatasetRecSys is a content-based dataset recommendation system that adopts an information retrieval (IR) paradigm; its authors describe it as the first study of its kind. They hope it will further promote data sharing, reduce the researchers' workload in identifying the right dataset and increase the reusability of biomedical datasets.
Year founded: 2020
Last update: 2020-01-01
Version:
Accessibility (manual check): Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object: NA
Database category:
Major species: NA
Keywords:

Contact information

University/Institution: University of Texas Health Science Center at Houston
Address: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Suite E-833, Houston, TX, 77030, USA
City: Houston
Province/State: Texas
Country/Region: United States
Contact name (PI/Team): Hulin Wu
Contact email (PI/Helpdesk): Hulin.Wu@uth.tmc.edu

Publications

A content-based dataset recommendation system for researchers-a case study on Gene Expression Omnibus (GEO) repository. [PMID: 33002137]
Braja Gopal Patra, Kirk Roberts, Hulin Wu

It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps in increasing the visibility of the work. On the other hand, there are researchers who are inhibited by the lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to date to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these dataset repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second approach is implemented using a non-parametric clustering technique. These clusters are used to recommend datasets for each researcher using the cosine similarity between the vector representations of publication clusters and datasets. The maximum normalized discounted cumulative gain at 10 (NDCG@10), precision at 10 (p@10) partial and p@10 strict of 0.89, 0.78 and 0.61, respectively, were obtained using the proposed method after manual evaluation by five researchers. 
To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers' workload in identifying the right dataset and increase the reusability of biomedical datasets. Database URL: http://genestudy.org/recommends/#/.
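
The ranking and evaluation steps named in the abstract (cosine similarity between publication-cluster vectors and dataset vectors, scored with NDCG@10) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The vectors, the dataset identifiers (GSE_A, GSE_B, GSE_C) and the function names are hypothetical. Scoring a dataset by its best-matching cluster is an assumption, since the abstract does not say how cluster scores are combined. The ideal ordering for NDCG is taken from the same judged list, which is a common simplification.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(cluster_vectors, dataset_vectors, k=10):
    """Return the top-k (dataset_id, score) pairs.

    Assumption: a dataset's score is its highest cosine similarity to any
    of the researcher's interest clusters.
    """
    scored = []
    for dataset_id, vec in dataset_vectors.items():
        best = max(cosine(cluster, vec) for cluster in cluster_vectors)
        scored.append((dataset_id, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def ndcg_at_k(relevances, k=10):
    """NDCG@k for graded relevance labels given in ranked order."""
    def dcg(rels):
        # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), ...
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical toy data: two interest clusters and three datasets.
clusters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
datasets = {
    "GSE_A": [0.9, 0.1, 0.0],
    "GSE_B": [0.0, 0.0, 1.0],
    "GSE_C": [0.1, 0.8, 0.1],
}
top = recommend(clusters, datasets, k=2)
```

In this toy example GSE_A and GSE_C rank ahead of GSE_B, because each lies close to one of the two interest clusters while GSE_B is orthogonal to both.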

Database (Oxford). 2020 | 7 citations (from Europe PMC, 2024-04-20)

Ranking

All databases: 3401/6000 (43.333%)
Metadata: 332/619 (46.527%)
Total rank: 3401
Citations: 7
z-index: 1.75

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2020-11-10
Curated by:
Lin Liu [2022-09-20]
Dong Zou [2020-11-19]
Zhao Li [2020-11-18]
Chang Liu [2020-11-10]