DBHPP


37285317	The NanoFlow Repository. [PMID: 37285317] Jessie E Arce, Joshua A Welsh, Sean Cook, John Tigges, Ionita Ghiran, Jennifer C Jones, Andrew Jackson, Matthew Roth, Aleksandar Milosavljevic Abstract MOTIVATION: Extracellular particles (EPs) are the focus of a rapidly growing area of exploration due to the widespread interest in understanding their roles in health and disease. However, despite the general need for EP data sharing and established community standards for data reporting, no standard repository for EP flow cytometry data captures rigor and minimum reporting standards such as those defined by MIFlowCyt-EV (https://doi.org/10.1080/20013078.2020.1713526). We sought to address this unmet need by developing the NanoFlow Repository. RESULTS: We have developed The NanoFlow Repository to provide the first implementation of the MIFlowCyt-EV framework. AVAILABILITY AND IMPLEMENTATION: The NanoFlow Repository is freely available and accessible online at https://genboree.org/nano-ui/. Public datasets can be explored and downloaded at https://genboree.org/nano-ui/ld/datasets. The NanoFlow Repository's backend is built using the Genboree software stack that powers the ClinGen Resource, specifically the Linked Data Hub (LDH), a REST API framework written in Node.js, developed initially to aggregate data within ClinGen (https://ldh.clinicalgenome.org/ldh/ui/about). NanoFlow's LDH (NanoAPI) is available at https://genboree.org/nano-api/srvc. NanoAPI is supported by a Node.js Genboree authentication and authorization service (GbAuth), a graph database called ArangoDB, and an Apache Pulsar message queue (NanoMQ) to manage data inflows into NanoAPI. The website for NanoFlow Repository is built with Vue.js and Node.js (NanoUI) and supports all major browsers. Bioinformatics. 2023:39(6) \| 3 Citations (from Europe PMC, 2024-04-20)
36712019	The revised reference genome of the leopard gecko (Eublepharis macularius) provides insight into the considerations of genome phasing and assembly. [PMID: 36712019] Brendan J Pinto, Tony Gamble, Chase H Smith, Shannon E Keating, Justin C Havird, Ylenia Chiari Abstract Genomic resources across squamate reptiles (lizards and snakes) have lagged behind other vertebrate systems and high-quality reference genomes remain scarce. Of the 23 chromosome-scale reference genomes across the order, only 12 of the ~60 squamate families are represented. Within geckos (infraorder Gekkota), a species-rich clade of lizards, chromosome-level genomes are exceptionally sparse representing only two of the seven extant families. Using the latest advances in genome sequencing and assembly methods, we generated one of the highest quality squamate genomes to date for the leopard gecko, Eublepharis macularius (Eublepharidae). We compared this assembly to the previous, short-read only, reference genome published in 2016 and examined potential factors within the assembly influencing contiguity of genome assemblies using PacBio HiFi data. Briefly, the read N50 of the PacBio HiFi reads generated for this study was equal to the contig N50 of the previous reference genome at 20.4 kilobases. The HiFi reads were assembled into a total of 132 contigs, which was further scaffolded using HiC data into 75 total sequences representing all 19 chromosomes. We identified that 9 of the 19 chromosomes were assembled as single contigs, while the other 10 chromosomes were each scaffolded together from two or more contigs. We qualitatively identified that percent repeat content within a chromosome broadly affects its assembly contiguity prior to scaffolding. This genome assembly signifies a new age for squamate genomics where high-quality reference genomes rivaling some of the best vertebrate genome assemblies can be generated for a fraction of previous cost estimates. This new reference assembly is available on NCBI at JAOPLA010000000. 
The genome version and its associated annotations are also available via this Figshare repository https://doi.org/10.6084/m9.figshare.20069273 . bioRxiv. 2023:() \| 0 Citations (from Europe PMC, 2024-04-20)
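The N50 statistic quoted in the entry above (read N50 and contig N50, both 20.4 kilobases) can be illustrated with a minimal sketch. It uses the usual definition: the length L such that sequences of length L or longer contain at least half of all bases. The function name `n50` and the toy lengths are hypothetical and are not taken from the paper or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (made-up lengths, not data from the assembly):
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because the statistic is weighted by length, a few long contigs dominate it, which is why comparing a read N50 with a contig N50 says something about how fragmented the older assembly was.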
36761516	Expanded range of eight orchid bee species (Hymenoptera, Apidae, Euglossini) in Costa Rica. [PMID: 36761516] Elise McDonald, Jacob Podesta, Christine Cairns Fortuin, Kamal Jk Gandhi Abstract BACKGROUND: The Monteverde region of Costa Rica is a hotspot of endemism and biodiversity. The region is, however, disturbed by human activities such as agriculture and urbanisation. This study provides a list of orchid bees (Hymenoptera: Euglossini) compiled from field surveys conducted during January-October 2019 in the premontane wet forest of San Luis, Monteverde, Costa Rica. We collected 36 species of Euglossine bees across four genera. We provide new geographic distribution and elevation data for eight species in two genera. Due to their critical role in the pollination of orchids and other plants, the distribution and abundance of Euglossine bees has relevance to plant biodiversity and conservation efforts. This is especially important in a region with a high diversity of difficult-to-study epiphytic orchids, such as in the Monteverde region. NEW INFORMATION: A total of 2,742 Euglossine male individuals across four genera (, , and ) were collected in this study. Updated geographic distributions and elevation ranges were established for eight species of Euglossini in two genera: (Fabricius, 1787), (Kimsey, 1977), (Moure, 1965), (Moure, 1968), (Moure, 1965), (Smith, 1874), (Moure, 1970) and (Dressler, 1978). These are the first recorded occurrences of these species in the Monteverde region of Costa Rica, according to the Global Biodiversity Information Facility (GBIF) database (https://doi.org/10.15468/9f9kgp). This study also established expanded elevation ranges for , , , and , though these five species have been previously recorded in the Monteverde region and, thus, are not described in detail here. Additionally, our capture of 123 individuals is significant, as it indicates its abundance in this region. 
Prior to this study, there was a single record of in the Monteverde region, documented in 1993. Biodivers Data J. 2022:10() \| 0 Citations (from Europe PMC, 2024-04-20)
35977951	An Electroencephalography-based Database for studying the Effects of Acoustic Therapies for Tinnitus Treatment. [PMID: 35977951] Alma Rosa Cuevas-Romero, Luz María Alonso-Valerdi, Luis Alejandro Intriago-Campos, David Isaac Ibarra-Zárate Abstract The present database provides demographic (age and sex), clinical (hearing loss and acoustic properties of tinnitus), psychometric (based on Tinnitus Handicapped Inventory and Hospital Anxiety and Depression Scale) and electroencephalographic information of 89 tinnitus sufferers who were semi-randomly treated for eight weeks with one of five acoustic therapies. These were (1) placebo (relaxing music), (2) tinnitus retraining therapy, (3) auditory discrimination therapy, (4) enriched acoustic environment, and (5) binaural beats therapy. Fourteen healthy volunteers who were exposed to relaxing music and followed the same experimental procedure as tinnitus sufferers were additionally included in the study (control group). The database is available at https://doi.org/10.17632/kj443jc4yc.1 . Acoustic therapies were monitored one week after, three weeks after, five weeks after, and eight weeks after the acoustic therapy. This study was previously approved by the local Ethical Committee (CONBIOETICA19CEI00820130520), it was registered as a clinical trial (ISRCTN14553550) in BioMed Central (Springer Nature), the protocol was published in 2016, it attracted L'Oréal-UNESCO Organization as a sponsor, and six journal publications have resulted from the analysis of this database. Sci Data. 2022:9(1) \| 2 Citations (from Europe PMC, 2024-04-20)
35796594	Curation of a reference database of COI sequences for insect identification through DNA metabarcoding: COins. [PMID: 35796594] Giulia Magoga, Giobbe Forni, Matteo Brunetti, Aycan Meral, Alberto Spada, Alessio De Biase, Matteo Montagna Abstract DNA metabarcoding is a widespread approach for the molecular identification of organisms. While the associated wet-lab and data processing procedures are well established and highly efficient, the reference databases for taxonomic assignment can be implemented to improve the accuracy of identifications. Insects are among the organisms for which DNA-based identification is most commonly used; yet, a DNA-metabarcoding reference database specifically curated for their species identification using software requiring local databases is lacking. Here, we present COins, a database of 5' region cytochrome c oxidase subunit I sequences (COI-5P) of insects that includes over 532 000 representative sequences of >106 000 species specifically formatted for the QIIME2 software platform. Through a combination of automated and manually curated steps, we developed this database starting from all COI sequences available in the Barcode of Life Data System for insects, focusing on sequences that comply with several standards, including a species-level identification. COins was validated on previously published DNA-metabarcoding sequences data (bulk samples from Malaise traps) and its efficiency compared with other publicly available reference databases (not specific for insects). COins can allow an increase of up to 30% of species-level identifications and thus can represent a valuable resource for the taxonomic assignment of insects' DNA-metabarcoding data, especially when species-level identification is needed https://doi.org/10.6084/m9.figshare.19130465.v1. Database (Oxford). 2022:2022() \| 5 Citations (from Europe PMC, 2024-04-20)
35723975	VIBFREQ1295: A New Database for Vibrational Frequency Calculations. [PMID: 35723975] Juan C Zapata Trujillo, Laura K McKemmish Abstract High-throughput approaches for producing approximate vibrational spectral data for molecules of astrochemistry interest rely on harmonic frequency calculations using computational quantum chemistry. However, model chemistry recommendations (i.e., a level of theory and basis set pair) for these calculations are not yet available and, thus, thorough benchmarking against comprehensive benchmark databases is needed. Here, we present a new database for vibrational frequency calculations (VIBFREQ1295) storing 1295 experimental fundamental frequencies and CCSD(T)(F12)/cc-pVDZ-F12 harmonic frequencies from 141 molecules. VIBFREQ1295's experimental data was compiled through a comprehensive review of contemporary experimental data, while the ab initio data was computed here. The chemical space spanned by the molecules chosen is considered in-depth and is shown to have good representation of common organic functional groups and vibrational modes. Scaling factors are routinely used to approximate the effect of anharmonicity and convert computed harmonic frequencies to predicted fundamental frequencies. With our experimental and high-level ab initio data, we find that a single global uniform scaling factor of 0.9617(3) results in median differences of 15.9(5) cm⁻¹. A far superior performance with a median difference of 7.5(5) cm⁻¹ can be obtained, however, by using separate scaling factors (SFs) for three regions: frequencies less than 1000 cm⁻¹ (SF = 0.987(1)), between 1000 and 2000 cm⁻¹ (SF = 0.9727(6)), and above 2000 cm⁻¹ (SF = 0.9564(4)). This sets a lower bound for the performance that could be reliably obtained using scaling of harmonic frequency calculations to predict experimental fundamental frequencies. VIBFREQ1295's most important purpose is to provide a robust database for benchmarking the performance of any vibrational frequency calculations. 
VIBFREQ1295 data could also be used to train machine-learning models for the prediction of vibrational spectra and as a reference and data starting point for more detailed spectroscopic modeling of particular molecules. The database can be found as part of the Supporting Information for this paper or in the Harvard DataVerse at https://doi.org/10.7910/DVN/VLVNU7. J Phys Chem A. 2022:126(25) \| 2 Citations (from Europe PMC, 2024-04-20)
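The scaling-factor scheme described in the entry above can be sketched as a small function. This is only an illustration: the scaling factors are the values quoted in the abstract, but the function name, the choice to select the region from the computed harmonic frequency, and the handling of the exact 1000 and 2000 boundaries are assumptions, not details confirmed by the source.

```python
# Scaling factors quoted in the abstract (uncertainties omitted).
GLOBAL_SF = 0.9617
LOW_SF = 0.987    # frequencies less than 1000 cm^-1
MID_SF = 0.9727   # between 1000 and 2000 cm^-1
HIGH_SF = 0.9564  # above 2000 cm^-1


def scale_frequency(harmonic_cm1, regional=True):
    """Predict a fundamental frequency from a harmonic one by scaling.

    Assumption: the region is chosen from the harmonic frequency itself,
    and the boundaries 1000 and 2000 fall in the middle region.
    """
    if not regional:
        return harmonic_cm1 * GLOBAL_SF
    if harmonic_cm1 < 1000.0:
        return harmonic_cm1 * LOW_SF
    if harmonic_cm1 <= 2000.0:
        return harmonic_cm1 * MID_SF
    return harmonic_cm1 * HIGH_SF


# Hypothetical harmonic frequencies, chosen only to hit each region.
for freq in (500.0, 1500.0, 3000.0):
    print(freq, scale_frequency(freq), scale_frequency(freq, regional=False))
```

The regional factors decrease with frequency, so a single global factor over-corrects low-frequency modes and under-corrects high-frequency ones. That is consistent with the smaller median difference the abstract reports for the three-region scheme.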
35710683	A database of animal metagenomes. [PMID: 35710683] Ruirui Hu, Rui Yao, Lei Li, Yueren Xu, Bingbing Lei, Guohao Tang, Haowei Liang, Yunjiao Lei, Cunyuan Li, Xiaoyue Li, Kaiping Liu, Limin Wang, Yunfeng Zhang, Yue Wang, Yuying Cui, Jihong Dai, Wei Ni, Ping Zhou, Baohua Yu, Shengwei Hu Abstract With the rapid development of high-throughput sequencing technology, the amount of metagenomic data (including both 16S and whole-genome sequencing data) in public repositories is increasing exponentially. However, owing to the large and decentralized nature of the data, it is still difficult for users to mine, compare, and analyze the data. The animal metagenome database (AnimalMetagenome DB) integrates metagenomic sequencing data with host information, making it easier for users to find data of interest. The AnimalMetagenome DB is designed to contain all public metagenomic data from animals, and the data are divided into domestic and wild animal categories. Users can browse, search, and download animal metagenomic data of interest based on different attributes of the metadata such as animal species, sample site, study purpose, and DNA extraction method. The AnimalMetagenome DB version 1.0 includes metadata for 82,097 metagenomes from 4 domestic animals (pigs, bovines, horses, and sheep) and 540 wild animals. These metagenomes cover 15 years of experiments, 73 countries, 1,044 studies, 63,214 amplicon sequencing data, and 10,672 whole genome sequencing data. All data in the database are hosted and available in figshare https://doi.org/10.6084/m9.figshare.19728619 . Sci Data. 2022:9(1) \| 6 Citations (from Europe PMC, 2024-04-20)
35703577	Tallo: A global tree allometry and crown architecture database. [PMID: 35703577] Tommaso Jucker, Fabian Jörg Fischer, Jérôme Chave, David A Coomes, John Caspersen, Arshad Ali, Grace Jopaul Loubota Panzou, Ted R Feldpausch, Daniel Falster, Vladimir A Usoltsev, Stephen Adu-Bredu, Luciana F Alves, Mohammad Aminpour, Ilondea B Angoboy, Niels P R Anten, Cécile Antin, Yousef Askari, Rodrigo Muñoz, Narayanan Ayyappan, Patricia Balvanera, Lindsay Banin, Nicolas Barbier, John J Battles, Hans Beeckman, Yannick E Bocko, Ben Bond-Lamberty, Frans Bongers, Samuel Bowers, Thomas Brade, Michiel van Breugel, Arthur Chantrain, Rajeev Chaudhary, Jingyu Dai, Michele Dalponte, Kangbéni Dimobe, Jean-Christophe Domec, Jean-Louis Doucet, Remko A Duursma, Moisés Enríquez, Karin Y van Ewijk, William Farfán-Rios, Adeline Fayolle, Eric Forni, David I Forrester, Hammad Gilani, John L Godlee, Sylvie Gourlet-Fleury, Matthias Haeni, Jefferson S Hall, Jie-Kun He, Andreas Hemp, José L Hernández-Stefanoni, Steven I Higgins, Robert J Holdaway, Kiramat Hussain, Lindsay B Hutley, Tomoaki Ichie, Yoshiko Iida, Hai-Sheng Jiang, Puspa Raj Joshi, Hasan Kaboli, Maryam Kazempour Larsary, Tanaka Kenzo, Brian D Kloeppel, Takashi Kohyama, Suwash Kunwar, Shem Kuyah, Jakub Kvasnica, Siliang Lin, Emily R Lines, Hongyan Liu, Craig Lorimer, Jean-Joël Loumeto, Yadvinder Malhi, Peter L Marshall, Eskil Mattsson, Radim Matula, Jorge A Meave, Sylvanus Mensah, Xiangcheng Mi, Stéphane Momo, Glenn R Moncrieff, Francisco Mora, Sarath P Nissanka, Kevin L O'Hara, Steven Pearce, Raphaël Pelissier, Pablo L Peri, Pierre Ploton, Lourens Poorter, Mohsen Javanmiri Pour, Hassan Pourbabaei, Juan Manuel Dupuy-Rada, Sabina C Ribeiro, Casey Ryan, Anvar Sanaei, Jennifer Sanger, Michael Schlund, Giacomo Sellan, Alexander Shenkin, Bonaventure Sonké, Frank J Sterck, Martin Svátek, Kentaro Takagi, Anna T Trugman, Farman Ullah, Matthew A Vadeboncoeur, Ahmad Valipour, Mark C Vanderwel, Alejandra G Vovides, Weiwei Wang, Li-Qiu Wang, 
Christian Wirth, Murray Woods, Wenhua Xiang, Fabiano de Aquino Ximenes, Yaozhan Xu, Toshihiro Yamada, Miguel A Zavala Abstract Data capturing multiple axes of tree size and shape, such as a tree's stem diameter, height and crown size, underpin a wide range of ecological research, from developing and testing theory on forest structure and dynamics, to estimating forest carbon stocks and their uncertainties, and integrating remote sensing imagery into forest monitoring programmes. However, these data can be surprisingly hard to come by, particularly for certain regions of the world and for specific taxonomic groups, posing a real barrier to progress in these fields. To overcome this challenge, we developed the Tallo database, a collection of 498,838 georeferenced and taxonomically standardized records of individual trees for which stem diameter, height and/or crown radius have been measured. These data were collected at 61,856 globally distributed sites, spanning all major forested and non-forested biomes. The majority of trees in the database are identified to species (88%), and collectively Tallo includes data for 5163 species distributed across 1453 genera and 187 plant families. The database is publicly archived under a CC-BY 4.0 licence and can be accessed from: https://doi.org/10.5281/zenodo.6637599. To demonstrate its value, here we present three case studies that highlight how the Tallo database can be used to address a range of theoretical and applied questions in ecology, from testing the predictions of metabolic scaling theory, to exploring the limits of tree allometric plasticity along environmental gradients and modelling global variation in maximum attainable tree height. In doing so, we provide a key resource for field ecologists, remote sensing researchers and the modelling community working together to better understand the role that trees play in regulating the terrestrial carbon cycle. Glob Chang Biol. 2022:28(17) \| 3 Citations (from Europe PMC, 2024-04-20)
35695419	Rhythm of the Night (and Day): Predictive Metabolic Modeling of Diurnal Growth in Chlamydomonas reinhardtii. [PMID: 35695419] Alex J Metcalf, Nanette R Boyle Abstract Economical production of photosynthetic organisms requires the use of natural day/night cycles. These induce strong circadian rhythms that lead to transient changes in the cells, requiring complex modeling to capture. In this study, we coupled time series transcriptomic data from the model green alga Chlamydomonas reinhardtii to a metabolic model of the same organism in order to develop the first transient metabolic model for diurnal growth of algae capable of predicting phenotype from genotype. We first transformed a set of discrete transcriptomic measurements (D. Strenkert, S. Schmollinger, S. D. Gallaher, P. A. Salomé, et al., Proc Natl Acad Sci U S A 116:2374-2383, 2019, https://doi.org/10.1073/pnas.1815238116) into continuous curves, producing a complete database of the cell's transcriptome that can be interrogated at any time point. We also decoupled the standard biomass formation equation to allow different components of biomass to be synthesized at different times of the day. The resulting model was able to predict qualitative phenotypical outcomes of a starchless mutant. We also extended this approach to simulate all single-knockout mutants and identified potential targets for rational engineering efforts to increase productivity. This model enables us to evaluate the impact of genetic and environmental changes on the growth, biomass composition, and intracellular fluxes for diurnal growth. We have developed the first transient metabolic model for diurnal growth of algae based on experimental data and capable of predicting phenotype from genotype. This model enables us to evaluate the impact of genetic and environmental changes on the growth, biomass composition and intracellular fluxes of the model green alga, Chlamydomonas reinhardtii. 
The availability of this model will enable faster and more efficient design of cells for production of fuels, chemicals, and pharmaceuticals. mSystems. 2022:7(4) \| 0 Citations (from Europe PMC, 2024-04-20)
35680932	A Chinese multi-modal neuroimaging data release for increasing diversity of human brain mapping. [PMID: 35680932] Peng Gao, Hao-Ming Dong, Si-Man Liu, Xue-Ru Fan, Chao Jiang, Yin-Shan Wang, Daniel Margulies, Hai-Fang Li, Xi-Nian Zuo Abstract Big-data use is becoming standard practice in the neuroimaging field through data-sharing initiatives. It is important for the community to realize that such open science efforts must protect personal, especially facial, information when raw neuroimaging data are shared. An ideal tool for face anonymization should not disturb subsequent brain tissue extraction and further morphological measurements. Using the high-resolution head images from magnetic resonance imaging (MRI) of 215 healthy Chinese, we discovered and validated a template effect on face anonymization. Improved facial anonymization was achieved when the Chinese head templates, but not the Western templates, were applied to obscure the faces of Chinese brain images. This finding has critical implications for international brain imaging data-sharing. To facilitate further investigation of potential culture-related impacts on, and increase the diversity of, data-sharing for human brain mapping, we released the 215 Chinese multi-modal MRI data into a database for imaging Chinese young brains, namely 'I See your Brains (ISYB)', to the public via the Science Data Bank (https://doi.org/10.11922/sciencedb.00740). Sci Data. 2022:9(1) \| 1 Citations (from Europe PMC, 2024-04-20)
35676297	PeSTK db a comprehensive data repository of Probiotic Serine Threonine kinases. [PMID: 35676297] Dhanashree Lokesh, Suresh Psn, Rajagopal Kammara Abstract The signal transduction pathway of prokaryotes involves a peptidoglycan synthesis cluster (PG) to sense external stimuli. One of the major components of the PG synthesis cluster is protein kinases (pknA - G). The sequence data of probiotic eSTKs (Eukaryotic-like Serine/Threonine kinases) are obscure and scarce, yet they are essential for understanding the role of probiotic microbes in combating infectious diseases. eSTK sequence data are also the most essential requirement for understanding and developing certain therapeutic drugs against pathogens. Hence, we developed a comprehensive, user-friendly data repository of probiotic eSTKs (PeSTK), which holds 830 STK sequences. The PeSTK data resource is unique: an open-access, summative collection of various probiotic eSTKs in a single location. The eSTK sequence datasets are provided with easy-to-operate browsing and searching. Therefore, eSTK data resources should be useful for sequence-based studies and drug development. The sequence datasets are available at Figshare; the Digital Object Identifier (DOI) of the sequences is https://doi.org/10.6084/m9.figshare.146606 . Sci Data. 2022:9(1) \| 2 Citations (from Europe PMC, 2024-04-20)
35670729	Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders. [PMID: 35670729] T M Yates, A Lain, J Campbell, D R FitzPatrick, T I Simpson Abstract There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the wide-spread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 in title + abstract, to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. 
This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038. Database (Oxford). 2022:2022() \| 0 Citations (from Europe PMC, 2024-04-20)
35641547	eldBETA: A Large Eldercare-oriented Benchmark Database of SSVEP-BCI for the Aging Population. [PMID: 35641547] Bingchuan Liu, Yijun Wang, Xiaorong Gao, Xiaogang Chen Abstract Global population aging poses an unprecedented challenge and calls for a rising effort in eldercare and healthcare. Steady-state visual evoked potential based brain-computer interface (SSVEP-BCI) boasts its high transfer rate and shows great promise in real-world applications to support aging. Public database is critically important for designing the SSVEP-BCI systems. However, the SSVEP-BCI database tailored for the elder is scarce in existing studies. Therefore, in this study, we present a large eldercare-oriented BEnchmark database of SSVEP-BCI for The Aging population (eldBETA). The eldBETA database consisted of the 64-channel electroencephalogram (EEG) from 100 elder participants, each of whom performed seven blocks of 9-target SSVEP-BCI task. The quality and characteristics of the eldBETA database were validated by a series of analyses followed by a classification analysis of thirteen frequency recognition methods. We expect that the eldBETA database would provide a substrate for the design and optimization of the BCI systems intended for the elders. The eldBETA database is open-access for research and can be downloaded from the website https://doi.org/10.6084/m9.figshare.18032669 . Sci Data. 2022:9(1) \| 5 Citations (from Europe PMC, 2024-04-20)
35614082	MusMorph, a database of standardized mouse morphology data for morphometric meta-analyses. [PMID: 35614082] Jay Devine, Marta Vidal-García, Wei Liu, Amanda Neves, Lucas D Lo Vercio, Rebecca M Green, Heather A Richbourg, Marta Marchini, Colton M Unger, Audrey C Nickle, Bethany Radford, Nathan M Young, Paula N Gonzalez, Robert E Schuler, Alejandro Bugacov, Campbell Rolian, Christopher J Percival, Trevor Williams, Lee Niswander, Anne L Calof, Arthur D Lander, Axel Visel, Frank R Jirik, James M Cheverud, Ophir D Klein, Ramon Y Birnbaum, Amy E Merrill, Rebecca R Ackermann, Daniel Graf, Myriam Hemberger, Wendy Dean, Nils D Forkert, Stephen A Murray, Henrik Westerberg, Ralph S Marcucio, Benedikt Hallgrímsson Abstract Complex morphological traits are the product of many genes with transient or lasting developmental effects that interact in anatomical context. Mouse models are a key resource for disentangling such effects, because they offer myriad tools for manipulating the genome in a controlled environment. Unfortunately, phenotypic data are often obtained using laboratory-specific protocols, resulting in self-contained datasets that are difficult to relate to one another for larger scale analyses. To enable meta-analyses of morphological variation, particularly in the craniofacial complex and brain, we created MusMorph, a database of standardized mouse morphology data spanning numerous genotypes and developmental stages, including E10.5, E11.5, E14.5, E15.5, E18.5, and adulthood. To standardize data collection, we implemented an atlas-based phenotyping pipeline that combines techniques from image registration, deep learning, and morphometrics. Alongside stage-specific atlases, we provide aligned micro-computed tomography images, dense anatomical landmarks, and segmentations (if available) for each specimen (N = 10,056). Our workflow is open-source to encourage transparency and reproducible data collection. 
The MusMorph data and scripts are available on FaceBase ( www.facebase.org , https://doi.org/10.25550/3-HXMC ) and GitHub ( https://github.com/jaydevine/MusMorph ). Sci Data. 2022:9(1) \| 1 Citations (from Europe PMC, 2024-04-20)
35469024	HORDB a comprehensive database of peptide hormones. [PMID: 35469024] Ning Zhu, Fanyi Dong, Guobang Shi, Xingzhen Lao, Heng Zheng Abstract Peptide hormones (also known as hormone peptides and polypeptide hormones) are hormones composed of peptides and are signal transduction molecules produced by a class of multicellular organisms. They play an important role in the physiological and behavioral regulation of animals and humans as well as in the growth of plants. In order to promote research on peptide hormones, we constructed the HORDB database. The database currently has a total of 6024 entries, including 5729 peptide hormones, 40 peptide drugs and information on 255 marketed pharmaceutical preparations. Each entry provides comprehensive information related to the peptide, including general information, sequence, activity, structure, physical information and literature information. We also added information on IC50, EC50, ED50, target, and whether or not the blood-brain barrier was crossed to the activity information note. In addition, HORDB integrates search and sequence analysis to facilitate user browsing and data analysis. We believe that the peptide hormone information collected by HORDB will promote the design and discovery of peptide hormones. All data are hosted and available in figshare https://doi.org/10.6084/m9.figshare.c.5522241 . Sci Data. 2022:9(1) \| 2 Citations (from Europe PMC, 2024-04-20)
35415670	ChemTastesDB: A curated database of molecular tastants. [PMID: 35415670] Cristian Rojas, Davide Ballabio, Karen Pacheco Sarmiento, Elisa Pacheco Jaramillo, Mateo Mendoza, Fernando García Abstract The purpose of this work is the creation of a chemical database named ChemTastesDB that includes both organic and inorganic tastants. The creation, curation pipeline and the main features of the database are described in detail. The database includes 2944 verified and curated compounds divided into nine classes, which comprise the five basic tastes (sweet, bitter, umami, sour and salty) along with four additional categories: tasteless, non-sweet, multitaste and miscellaneous. ChemTastesDB provides the following information for each tastant: name, PubChem CID, CAS registry number, canonical SMILES, class taste and references to the scientific sources from which data were retrieved. The molecular structure in the HyperChem (.hin) format of each chemical is also made available. In addition, molecular fingerprints were used for characterizing and analyzing the chemical space of tastants by means of unsupervised machine learning. ChemTastesDB constitutes a useful tool to the scientific community to expand the information of taste molecules and to assist studies for the taste prediction of unevaluated and as yet unsynthetized compounds, as well as the analysis of the relationships between molecular structure and taste. The database is freely accessible at https://doi.org/10.5281/zenodo.5747393. Food Chem (Oxf). 2022:4() \| 3 Citations (from Europe PMC, 2024-04-20)
34297749	TREND database: Retinal images of healthy young subjects visualized by a portable digital non-mydriatic fundus camera. [PMID: 34297749] Natasa Popovic, Stela Vujosevic, Miroslav Radunović, Miodrag Radunović, Tomo Popovic Abstract Topological characterization of the Retinal microvascular nEtwork visualized by portable fuNDus camera (TREND) is a database comprising 72 color digital retinal images collected from the students of the Faculty of Medicine at the University of Montenegro, in the period from February 18th to March 11th 2020. The database also includes binarized images of manually segmented microvascular networks associated with each raw image. The participant demographic characteristics, health status, and social habits information such as age, sex, body mass index, smoking history, alcohol use, as well as previous medical history were collected. As proof of concept, a smaller set of 10 color digital fundus images from healthy older participants is also included. Comparison of the microvascular parameters of these two sets of images demonstrates that digital fundus images recorded with a hand-held portable camera are able to capture the changes in patterns of microvascular network associated with aging. The raw images from the TREND database provide a standard that defines normal retinal anatomy and microvascular network geometry in young healthy people in Montenegro as it is seen with the digital hand-held portable non-mydriatic MiiS HORUS Scope DEC 200. This knowledge could facilitate the application of this technology at the primary level of health care for large scale telematic screening for complications of chronic diseases, such as hypertensive and diabetic retinopathy. In addition, it could aid in the development of new methods for early detection of age-related changes in the retina, systemic chronic diseases, as well as eye-specific diseases. 
The associated manually segmented images of the microvascular networks provide the standard that can be used for development of automatic software for image quality assessment, segmentation of microvascular network, and for computer-aided detection of pathological changes in retina. The TREND database is freely available at https://doi.org/10.5281/zenodo.4521043. PLoS One. 2021:16(7) \| 1 Citations (from Europe PMC, 2024-04-20)
34006627	DOE JGI Metagenome Workflow. [PMID: 34006627] Alicia Clum, Marcel Huntemann, Brian Bushnell, Brian Foster, Bryce Foster, Simon Roux, Patrick P Hajek, Neha Varghese, Supratim Mukherjee, T B K Reddy, Chris Daum, Yuko Yoshinaga, Ronan O'Malley, Rekha Seshadri, Nikos C Kyrpides, Emiley A Eloe-Fadrosh, I-Min A Chen, Alex Copeland, Natalia N Ivanova Abstract The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included into the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751-D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary by the complexity of microbial communities and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a workflow description language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723-D733, 2021, https://doi.org/10.1093/nar/gkaa983). The DOE JGI Metagenome Workflow is designed for processing metagenomic data sets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes and Microbiomes (IMG/M) system where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the DOE JGI Metagenome Workflow. Here, we present a metagenome workflow developed at the JGI that generates rich data in standard formats and has been optimized for downstream analyses ranging from assessment of the functional and taxonomic composition of microbial communities to genome-resolved metagenomics and the identification and characterization of novel taxa. This workflow is currently being used to analyze thousands of metagenomic data sets in a consistent and standardized manner. mSystems. 2021:6(3) \| 43 Citations (from Europe PMC, 2024-04-20)
33935558	Unlocking the Entomological Collection of the Natural History Museum of Maputo, Mozambique. [PMID: 33935558] Domingos Sandramo, Enrico Nicosia, Silvio Cianciullo, Bernardo Muatinte, Almeida Guissamulo Abstract Background: The collections of the Natural History Museum of Maputo have a crucial role in the safeguarding of Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum's Entomological Collection recording 409 species belonging to seven orders and 48 families. Each specimen's available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914-2018 by collectors and researchers from the Natural History Museum of Maputo (once known as "Museu Alváro de Castro") in all the country's provinces, with the exception of Cabo Delgado Province. New information: This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The aforementioned insect dataset is available on the GBIF Engine data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on the Mozambican national portal of biodiversity data BioNoMo (https://bionomo.openscidata.org), developed by SECOSUD II Project. Biodivers Data J. 2021:9() \| 1 Citations (from Europe PMC, 2024-04-20)
33929905	Risk-Based Chemical Ranking and Generating a Prioritized Human Exposome Database. [PMID: 33929905] Fanrong Zhao, Li Li, Yue Chen, Yichao Huang, Tharushi Prabha Keerthisinghe, Agnes Chow, Ting Dong, Shenglan Jia, Shipei Xing, Benedikt Warth, Tao Huan, Mingliang Fang Abstract BACKGROUND: Due to the ubiquitous use of chemicals in modern society, humans are increasingly exposed to thousands of chemicals that contribute to a major portion of the human exposome. Should a comprehensive and risk-based human exposome database be created, it would be conducive to the rapid progress of human exposomics research. In addition, once a xenobiotic is biotransformed with distinct half-lives upon exposure, monitoring the parent compounds alone may not reflect the actual human exposure. To address these questions, a comprehensive and risk-prioritized human exposome database is needed. OBJECTIVES: Our objective was to set up a comprehensive risk-prioritized human exposome database including physicochemical properties as well as risk prediction and develop a graphical user interface (GUI) that has the ability to conduct searches for content associated with chemicals in our database. METHODS: We built a comprehensive risk-prioritized human exposome database by text mining and database fusion. Subsequently, chemicals were prioritized by integrating exposure level obtained from the Systematic Empirical Evaluation of Models with toxicity data predicted by the Toxicity Estimation Software Tool and the Toxicological Priority Index calculated from the ToxCast database. The biotransformation half-lives () of all the chemicals were assessed using the Iterative Fragment Selection approach and biotransformation products were predicted using the previously developed BioTransformer machine-learning method. 
RESULTS: We compiled a human exposome database of chemicals, prioritized 13,441 chemicals based on probabilistic hazard quotient and 7,770 chemicals based on risk index, and provided a predicted biotransformation metabolite database of metabolites. In addition, a user-interactive Java software (Oracle)-based search GUI was generated to enable open access to this new resource. DISCUSSION: Our database can be used to guide chemical management and enhance scientific understanding to rapidly and effectively prioritize chemicals for comprehensive biomonitoring in epidemiological investigations. https://doi.org/10.1289/EHP7722. Environ Health Perspect. 2021:129(4) \| 9 Citations (from Europe PMC, 2024-04-20)
33773387	An annotation database for chemicals of emerging concern in exposome research. [PMID: 33773387] Jeroen Meijer, Marja Lamoree, Timo Hamers, Jean-Philippe Antignac, Sébastien Hutinet, Laurent Debrauwer, Adrian Covaci, Carolin Huber, Martin Krauss, Douglas I Walker, Emma L Schymanski, Roel Vermeulen, Jelle Vlaanderen Abstract BACKGROUND: Chemicals of Emerging Concern (CECs) include a very wide group of chemicals that are suspected to be responsible for adverse effects on health, but for which very limited information is available. Chromatographic techniques coupled with high-resolution mass spectrometry (HRMS) can be used for non-targeted screening and detection of CECs, by using comprehensive annotation databases. Establishing a database focused on the annotation of CECs in human samples will provide new insight into the distribution and extent of exposures to a wide range of CECs in humans. OBJECTIVES: This study describes an approach for the aggregation and curation of an annotation database (CECscreen) for the identification of CECs in human biological samples. METHODS: The approach consists of three main parts. First, CECs compound lists from various sources were aggregated and duplications and inorganic compounds were removed. Subsequently, the list was curated by standardization of structures to create "MS-ready" and "QSAR-ready" SMILES, as well as calculation of exact masses (monoisotopic and adducts) and molecular formulas. The second step included the simulation of Phase I metabolites. The third and final step included the calculation of QSAR predictions related to physicochemical properties, environmental fate, toxicity and Absorption, Distribution, Metabolism, Excretion (ADME) processes and the retrieval of information from the US EPA CompTox Chemicals Dashboard. RESULTS: All CECscreen database and property files are publicly available (DOI: https://doi.org/10.5281/zenodo.3956586). 
In total, 145,284 entries were aggregated from various CECs data sources. After elimination of duplicates and curation, the pipeline produced 70,397 unique "MS-ready" structures and 66,071 unique QSAR-ready structures, corresponding with 69,526 CAS numbers. Simulation of Phase I metabolites resulted in 306,279 unique metabolites. QSAR predictions could be performed for 64,684 of the QSAR-ready structures, whereas information was retrieved from the CompTox Chemicals Dashboard for 59,739 CAS numbers out of 69,526 inquiries. CECscreen is incorporated in the in silico fragmentation approach MetFrag. DISCUSSION: The CECscreen database can be used to prioritize annotation of CECs measured in non-targeted HRMS, facilitating the large-scale detection of CECs in human samples for exposome research. Large-scale detection of CECs can be further improved by integrating the present database with resources that contain CECs (metabolites) and meta-data measurements, further expansion towards in silico and experimental (e.g., MassBank) generation of MS/MS spectra, and development of bioinformatics approaches capable of using correlation patterns in the measured chemical features. Environ Int. 2021:152() \| 14 Citations (from Europe PMC, 2024-04-20)
33594085	Tropical cyclone simulations over Bangladesh at convection permitting 4.4 km & 1.5 km resolution. [PMID: 33594085] Hamish Steptoe, Nicholas Henry Savage, Saeed Sadri, Kate Salmon, Zubair Maalick, Stuart Webster Abstract High resolution simulations at 4.4 km and 1.5 km resolution have been performed for 12 historical tropical cyclones impacting Bangladesh. We use the European Centre for Medium-Range Weather Forecasts 5th generation Re-Analysis (ERA5) to provide a 9-member ensemble of initial and boundary conditions for the regional configuration of the Met Office Unified Model. The simulations are compared to the original ERA5 data and the International Best Track Archive for Climate Stewardship (IBTrACS) tropical cyclone database for wind speed, gust speed and mean sea-level pressure. The 4.4 km simulations show a typical increase in peak gust speed of 41 to 118 knots relative to ERA5, and a deepening of minimum mean sea-level pressure of up to -27 hPa, relative to ERA5 and IBTrACS data. The downscaled simulations compare more favourably with IBTrACS data than the ERA5 data, suggesting tropical cyclone hazards in the ERA5 deterministic output may be underestimated. The dataset is freely available from https://doi.org/10.5281/zenodo.3600201. Sci Data. 2021:8(1) \| 0 Citations (from Europe PMC, 2024-04-20)
32792559	ACDC, a global database of amphibian cytochrome-b sequences using reproducible curation for GenBank records. [PMID: 32792559] Matthijs P van den Burg, Salvador Herrando-Pérez, David R Vieites Abstract Genetic data are a crucial and exponentially growing resource across all biological sciences, yet curated databases are scarce. The widespread occurrence of sequence and (meta)data errors in public repositories calls for comprehensive improvements of curation protocols leading to robust research and downstream analyses. We collated and curated all available GenBank cytochrome-b sequences for amphibians, a benchmark marker in this globally declining vertebrate clade. The Amphibia's Curated Database of Cytochrome-b (ACDC) consists of 36,514 sequences representing 2,309 species from 398 genera (median = 2 with 50% interquartile ranges of 1-7 species/genus). We updated the taxonomic identity of >4,800 sequences (ca. 13%) and found 2,359 (6%) conflicting sequences with 84% of the errors originating from taxonomic misidentifications. The database (accessible at https://doi.org/10.6084/m9.figshare.9944759 ) also includes an R script to replicate our study for other loci and taxonomic groups. We provide recommendations to improve genetic-data quality in public repositories and flag species for which there is a need for taxonomic refinement in the face of increased rate of amphibian extinctions in the Anthropocene. Sci Data. 2020:7(1) \| 2 Citations (from Europe PMC, 2024-04-20)
30846902	An online global database of Hemiptera-Phytoplasma-Plant biological interactions. [PMID: 30846902] Valeria Trivellone Abstract Background: Phytoplasmas are phloem-limited plant pathogenic bacteria in the class Mollicutes transmitted by sap-feeding insect vectors of the Order Hemiptera. Vectors have not yet been identified for about half of the 33 known phytoplasma groups and this has greatly hindered efforts to control the spread of diseases affecting important crops. Extensive gaps of knowledge on actual phytoplasma vectors and on the plant disease epidemiology prevent our understanding of the basic underlying biological mechanisms that facilitate interactions between insects, phytoplasmas and their host plants. New information: This paper presents a complete online database of Hemiptera-Phytoplasma-Plant (HPP) biological interactions worldwide, searchable via an online interface. The raw data are available through Zenodo at https://doi.org/10.5281/zenodo.2532738. The online database search interface was created using the 3I software (Dmitriev 2006) which enhances data usability by providing a customised web interface (http://trivellone.speciesfile.org/) that provides an overview of the recorded biological interactions and the ability to discover particular interactions by searching for one or more phytoplasma, insect or plant taxa. The database will facilitate synthesis of all available and relevant data on the observed associations between phytoplasmas and their insect and plant hosts and will provide useful data to generate and test ecological and evolutionary hypotheses. Biodivers Data J. 2019:(7) \| 10 Citations (from Europe PMC, 2024-04-20)
31554814	Temporary dense seismic network during the 2016 Central Italy seismic emergency for microzonation studies. [PMID: 31554814] Fabrizio Cara, Giovanna Cultrera, Gaetano Riccio, Sara Amoroso, Paola Bordoni, Augusto Bucci, Ezio D'Alema, Maria D'Amico, Luciana Cantore, Simona Carannante, Rocco Cogliano, Giuseppe Di Giulio, Deborah Di Naccio, Daniela Famiani, Chiara Felicetta, Antonio Fodarella, Gianlorenzo Franceschina, Giovanni Lanzano, Sara Lovati, Lucia Luzi, Claudia Mascandola, Marco Massa, Alessia Mercuri, Giuliano Milana, Francesca Pacor, Davide Piccarreda, Marta Pischiutta, Stefania Pucillo, Rodolfo Puglia, Maurizio Vassallo, Graziano Boniolo, Grazia Caielli, Adelmo Corsi, Roberto de Franco, Alberto Tento, Giovanni Bongiovanni, Salomon Hailemikael, Guido Martini, Antonella Paciello, Alessandro Peloso, Fabrizio Poggi, Vladimiro Verrubbi, Maria Rosaria Gallipoli, Tony Alfredo Stabile, Marco Mancini Abstract In August 2016, a magnitude 6.0 earthquake struck Central Italy, starting a devastating seismic sequence, aggravated by two other events of magnitude 5.9 and 6.5, respectively. After the first mainshock, four Italian institutions installed a dense temporary network of 50 seismic stations in an area of 260 km². The network was registered in the International Federation of Digital Seismograph Networks with the code 3A and quoted with a Digital Object Identifier (https://doi.org/10.13127/SD/ku7Xm12Yy9). Raw data were converted into the standard binary miniSEED format, and organized in a structured archive. Then, data quality and completeness were checked, and all the relevant information was used for creating the metadata volumes. Finally, the 99 Gb of continuous seismic data and metadata were uploaded into the INGV node of the European Integrated Data Archive repository. Their use was regulated by a Memorandum of Understanding between the institutions. After an embargo period, the data are now available for many different seismological studies. Sci Data. 2019:6(1) \| 0 Citations (from Europe PMC, 2024-04-20)
31304213	OakEcol: A database of Oak-associated biodiversity within the UK. [PMID: 31304213] R J Mitchell, P E Bellamy, C J Ellis, R L Hewison, N G Hodgetts, G R Iason, N A Littlewood, S Newey, J A Stockan, A F S Taylor Abstract Globally there is increasing concern about the decline in the health of oak trees. The impact of a decline in oak trees on associated biodiversity, species that utilize oak trees, is unknown. Here we collate a database of all known birds, bryophytes, fungi, invertebrates, lichens and mammals that use oak ( and ) in the UK. In total 2300 species are listed in the database. For each species we provide a level of association with oak, ranging from obligate (only found on oak) to cosmopolitan (found on a wide range of other tree species). Data on the ecology of each oak associated species was collated: part of tree used, use made of tree (feeding, roosting, breeding), age of tree, woodland type, tree form (coppice, pollarded, or natural growth form) and season when the tree was used. Data on use or otherwise by each of the 2300 species of 30 other tree species was also collated. A complete list of data sources is provided. For further insights into how this data can be used see Collapsing foundations: The ecology of the British oak, implications of its decline and mitigation options [1]. Data can be found at EIDC https://doi.org/10.5285/22b3d41e-7c35-4c51-9e55-0f47bb845202. Data Brief. 2019:25() \| 1 Citations (from Europe PMC, 2024-04-20)
31301205	The Generation of a Comprehensive Spectral Library for the Analysis of the Guinea Pig Proteome by SWATH-MS. [PMID: 31301205] Pawel Palmowski, Rachael Watson, G Nicholas Europe-Finner, Magdalena Karolczak-Bayatti, Andrew Porter, Achim Treumann, Michael J Taggart Abstract Advances in liquid chromatography-mass spectrometry have facilitated the incorporation of proteomic studies to many biology experimental workflows. Data-independent acquisition platforms, such as sequential window acquisition of all theoretical mass spectra (SWATH-MS), offer several advantages for label-free quantitative assessment of complex proteomes over data-dependent acquisition (DDA) approaches. However, SWATH data interpretation requires spectral libraries as a detailed reference resource. The guinea pig (Cavia porcellus) is an excellent experimental model for translation to many aspects of human physiology and disease, yet there is limited experimental information regarding its proteome. To overcome this knowledge gap, a comprehensive spectral library of the guinea pig proteome is generated. Homogenates and tryptic digests are prepared from 16 tissues and subjected to >200 DDA runs. Analysis of >250 000 peptide-spectrum matches resulted in a library of 73 594 peptides from 7666 proteins. Library validation is provided by i) analyzing externally derived SWATH files (https://doi.org/10.1016/j.jprot.2018.03.023) and comparing peptide intensity quantifications; ii) merging of externally derived data to the base library. This furnishes the research community with a comprehensive proteomic resource that will facilitate future molecular-phenotypic studies using (re-engaging) the guinea pig as an experimental model of relevance to human biology. The spectral library and raw data are freely accessible in the MassIVE repository (MSV000083199). Proteomics. 2019:19(15) \| 9 Citations (from Europe PMC, 2024-04-20)
31265791	Linguistic Materials and Metrics for the Creation of Well-Controlled Swedish Speech Perception Tests. [PMID: 31265791] Erik Witte, Susanne Köbler Abstract Purpose: As factors influencing human word perception are important in the construction of speech perception tests used within the speech and hearing sciences, the purposes of this study were as follows: first, to develop algorithms that can be used to calculate different types of word metrics that influence the speed and accuracy of word perception and, second, to create a database in which those word metrics were calculated for a large set of Swedish words. Method: Based on a revision of a large Swedish phonetic dictionary, data and algorithms were developed by which various frequency metrics, word length metrics, semantic metrics, neighborhood metrics, phonotactic metrics, and orthographic transparency metrics were calculated for each word in the dictionary. Of the various word metric algorithms used, some were Swedish language reimplementations of previously published algorithms, and some were developed in this study. Results: The results of this study have been gathered in a Swedish word metric database called the AFC-list. The AFC-list consists of 816,404 phonetically transcribed Swedish words, all supplied with the word metric data calculated. The full AFC-list has been made publicly available under the Creative Commons Attribution 4.0 International license. Conclusion: The results of this study constitute an extensive linguistic resource for the process of selecting test items in new well-controlled speech perception tests in the Swedish language. Supplemental Material: https://doi.org/10.23641/asha.8330009. J. Speech Lang. Hear. Res. 2019:62(7) \| 0 Citations (from Europe PMC, 2024-04-20)
31160582	CFTI5Med, the new release of the catalogue of strong earthquakes in Italy and in the Mediterranean area. [PMID: 31160582] Emanuela Guidoboni, Graziano Ferrari, Gabriele Tarabusi, Giulia Sgattoni, Alberto Comastri, Dante Mariotti, Cecilia Ciuccarelli, Maria Giovanna Bianchi, Gianluca Valensise Abstract A key element for assessing seismic hazard and risk is the availability of a comprehensive dataset on past earthquakes. Here we present the rationale, structure and contents of CFTI5Med ( https://doi.org/10.6092/ingv.it-cfti5 ), the 2018 version of the Catalogue of Strong Earthquakes in Italy: a large multidisciplinary effort including historians, seismologists and geologists. It was conceived in 1989, following the inception of GIS technology, and first published in 1995 to offer a full account of Italy's strongest earthquakes, of their territorial impact and associated social and economic upheaval. Subsequent versions (1997, 2000, 2007) entailed a fine tuning of research methodologies, included additional research on Italian earthquakes, and were extended to large earthquakes of the Mediterranean area. CFTI5Med comprised an opportunity to streamline the structure of the Catalogue database and propose a renovated user interface. The new front-end (1) grants an easier, intuitive access to the data, including earthquake effects on the environment, and (2) allows all data to be displayed jointly with relevant topographic, geological and seismological overlays published as web services. Sci Data. 2019:6(1) \| 4 Citations (from Europe PMC, 2024-04-20)
29768426	The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. [PMID: 29768426] Steven R Livingstone, Frank A Russo Abstract The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PLoS ONE. 2018:13(5) \| 74 Citations (from Europe PMC, 2024-04-20)
29603730	ForC: a global database of forest carbon stocks and fluxes. [PMID: 29603730] Kristina J Anderson-Teixeira, Maria M H Wang, Jennifer C McGarvey, Valentine Herrmann, Alan J Tepley, Ben Bond-Lamberty, David S LeBauer Abstract Forests play an influential role in the global carbon (C) cycle, storing roughly half of terrestrial C and annually exchanging with the atmosphere more than five times the carbon dioxide (CO2) emitted by anthropogenic activities. Yet, scaling up from field-based measurements of forest C stocks and fluxes to understand global scale C cycling and its climate sensitivity remains an important challenge. Tens of thousands of forest C measurements have been made, but these data have yet to be integrated into a single database that makes them accessible for integrated analyses. Here we present an open-access global Forest Carbon database (ForC) containing previously published records of field-based measurements of ecosystem-level C stocks and annual fluxes, along with disturbance history and methodological information. ForC expands upon the previously published tropical portion of this database, TropForC (https://doi.org/10.5061/dryad.t516f), now including 17,367 records (previously 3,568) representing 2,731 plots (previously 845) in 826 geographically distinct areas. The database covers all forested biogeographic and climate zones, represents forest stands of all ages, and currently includes data collected between 1934 and 2015. We expect that ForC will prove useful for macroecological analyses of forest C cycling, for evaluation of model predictions or remote sensing products, for quantifying the contribution of forests to the global C cycle, and for supporting international efforts to inventory forest carbon and greenhouse gas exchange. A dynamic version of ForC is maintained on GitHub (https://GitHub.com/forc-db), and we encourage the research community to collaborate in updating, correcting, expanding, and utilizing this database. ForC is an open access database, and we encourage use of the data for scientific research and education purposes. Data may not be used for commercial purposes without written permission of the database PI. Any publications using ForC data should cite this publication and Anderson-Teixeira et al. (2016a) (see Metadata S1). No other copyright or cost restrictions are associated with the use of this data set. Ecology. 2018:99(6) \| 11 Citations (from Europe PMC, 2024-04-20)
30027541	Designing An Individualized EHR Learning Plan For Providers. [PMID: 30027541] Lindsay A Stevens, Yumi T DiAngi, Jonathan D Schremp, Monet J Martorana, Roberta E Miller, Tzielan C Lee, Natalie M Pageler Abstract Electronic Health Records (EHRs) have been quickly implemented for meaningful use incentives; however, these implementations have been associated with provider dissatisfaction and burnout. There are no previously reported instances of a comprehensive EHR educational program designed to engage providers and assist in improving efficiency and understanding of the EHR. Utilizing adult learning theory as a framework, Stanford Children's Health designed a tailored provider efficiency program with inputs from: (1) provider-specific EHR data; (2) provider survey data; and (3) structured observation sessions. This case report outlines the design of this individualized training program including team structure, resource requirements, and early provider response. CITATION: Stevens LA, DiAngi YT, Schremp JD, Martorana MJ, Miller RE, Lee TC, Pageler NM. Designing An Individualized EHR Learning Plan. Appl Clin Inform 2017; 8:924-935 https://doi.org/10.4338/040054. Appl Clin Inform. 2017:8(3) \| 20 Citations (from Europe PMC, 2024-04-20)

The NanoFlow Repository. [PMID: 37285317]

Jessie E Arce, Joshua A Welsh, Sean Cook, John Tigges, Ionita Ghiran, Jennifer C Jones, Andrew Jackson, Matthew Roth, Aleksandar Milosavljevic

Abstract

MOTIVATION: Extracellular particles (EPs) are the focus of a rapidly growing area of exploration due to the widespread interest in understanding their roles in health and disease. However, despite the general need for EP data sharing and established community standards for data reporting, no standard repository for EP flow cytometry data captures rigor and minimum reporting standards such as those defined by MIFlowCyt-EV (https://doi.org/10.1080/20013078.2020.1713526). We sought to address this unmet need by developing the NanoFlow Repository.
RESULTS: We have developed The NanoFlow Repository to provide the first implementation of the MIFlowCyt-EV framework.
AVAILABILITY AND IMPLEMENTATION: The NanoFlow Repository is freely available and accessible online at https://genboree.org/nano-ui/. Public datasets can be explored and downloaded at https://genboree.org/nano-ui/ld/datasets. The NanoFlow Repository's backend is built using the Genboree software stack that powers the ClinGen Resource, specifically the Linked Data Hub (LDH), a REST API framework written in Node.js, developed initially to aggregate data within ClinGen (https://ldh.clinicalgenome.org/ldh/ui/about). NanoFlow's LDH (NanoAPI) is available at https://genboree.org/nano-api/srvc. NanoAPI is supported by a Node.js Genboree authentication and authorization service (GbAuth), a graph database called ArangoDB, and an Apache Pulsar message queue (NanoMQ) to manage data inflows into NanoAPI. The website for NanoFlow Repository is built with Vue.js and Node.js (NanoUI) and supports all major browsers.

Bioinformatics. 2023:39(6) | 3 Citations (from Europe PMC, 2024-04-20)

The revised reference genome of the leopard gecko ( ) provides insight into the considerations of genome phasing and assembly. [PMID: 36712019]

Brendan J Pinto, Tony Gamble, Chase H Smith, Shannon E Keating, Justin C Havird, Ylenia Chiari

Abstract

Genomic resources across squamate reptiles (lizards and snakes) have lagged behind other vertebrate systems and high-quality reference genomes remain scarce. Of the 23 chromosome-scale reference genomes across the order, only 12 of the ~60 squamate families are represented. Within geckos (infraorder Gekkota), a species-rich clade of lizards, chromosome-level genomes are exceptionally sparse, representing only two of the seven extant families. Using the latest advances in genome sequencing and assembly methods, we generated one of the highest quality squamate genomes to date for the leopard gecko, (Eublepharidae). We compared this assembly to the previous, short-read only, reference genome published in 2016 and examined potential factors within the assembly influencing contiguity of genome assemblies using PacBio HiFi data. Briefly, the read N50 of the PacBio HiFi reads generated for this study was equal to the contig N50 of the previous reference genome at 20.4 kilobases. The HiFi reads were assembled into a total of 132 contigs, which were further scaffolded using HiC data into 75 total sequences representing all 19 chromosomes. We identified that 9 of the 19 chromosomes were assembled as single contigs, while the other 10 chromosomes were each scaffolded together from two or more contigs. We qualitatively identified that percent repeat content within a chromosome broadly affects its assembly contiguity prior to scaffolding. This genome assembly signifies a new age for squamate genomics where high-quality reference genomes rivaling some of the best vertebrate genome assemblies can be generated for a fraction of previous cost estimates. This new reference assembly is available on NCBI at JAOPLA010000000. The genome version and its associated annotations are also available via this Figshare repository: https://doi.org/10.6084/m9.figshare.20069273.

bioRxiv. 2023:() | 0 Citations (from Europe PMC, 2024-04-20)
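
The leopard gecko abstract above compares a read N50 with a contig N50. As a minimal sketch of how that statistic is commonly defined (the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the overall total), the function below may help. The function name and the example lengths are illustrative assumptions, not values from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is taken here as the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # 70
```

With these made-up lengths the total is 300, and the two longest sequences (80 + 70) already reach half of it, so the N50 is 70.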

Expanded range of eight orchid bee species (Hymenoptera, Apidae, Euglossini) in Costa Rica. [PMID: 36761516]

Elise McDonald, Jacob Podesta, Christine Cairns Fortuin, Kamal Jk Gandhi

Abstract

BACKGROUND: The Monteverde region of Costa Rica is a hotspot of endemism and biodiversity. The region is, however, disturbed by human activities such as agriculture and urbanisation. This study provides a list of orchid bees (Hymenoptera: Euglossini) compiled from field surveys conducted during January-October 2019 in the premontane wet forest of San Luis, Monteverde, Costa Rica. We collected 36 species of Euglossine bees across four genera. We provide new geographic distribution and elevation data for eight species in two genera. Due to their critical role in the pollination of orchids and other plants, the distribution and abundance of Euglossine bees has relevance to plant biodiversity and conservation efforts. This is especially important in a region with a high diversity of difficult-to-study epiphytic orchids, such as in the Monteverde region.
NEW INFORMATION: A total of 2,742 Euglossine male individuals across four genera (, , and ) were collected in this study. Updated geographic distributions and elevation ranges were established for eight species of Euglossini in two genera: (Fabricius, 1787), (Kimsey, 1977), (Moure, 1965), (Moure, 1968), (Moure, 1965), (Smith, 1874), (Moure, 1970) and (Dressler, 1978). These are the first recorded occurrences of these species in the Monteverde region of Costa Rica, according to the Global Biodiversity Information Facility (GBIF) database (https://doi.org/10.15468/9f9kgp). This study also established expanded elevation ranges for , , , and , though these five species have been previously recorded in the Monteverde region and, thus, are not described in detail here. Additionally, our capture of 123 individuals is significant, as it indicates its abundance in this region. Prior to this study, there was a single record of in the Monteverde region, documented in 1993.

Biodivers Data J. 2022:10() | 0 Citations (from Europe PMC, 2024-04-20)

An Electroencephalography-based Database for studying the Effects of Acoustic Therapies for Tinnitus Treatment. [PMID: 35977951]

Alma Rosa Cuevas-Romero, Luz María Alonso-Valerdi, Luis Alejandro Intriago-Campos, David Isaac Ibarra-Zárate

Abstract

The present database provides demographic (age and sex), clinical (hearing loss and acoustic properties of tinnitus), psychometric (based on Tinnitus Handicapped Inventory and Hospital Anxiety and Depression Scale) and electroencephalographic information of 89 tinnitus sufferers who were semi-randomly treated for eight weeks with one of five acoustic therapies. These were (1) placebo (relaxing music), (2) tinnitus retraining therapy, (3) auditory discrimination therapy, (4) enriched acoustic environment, and (5) binaural beats therapy. Fourteen healthy volunteers who were exposed to relaxing music and followed the same experimental procedure as tinnitus sufferers were additionally included in the study (control group). The database is available at https://doi.org/10.17632/kj443jc4yc.1 . Acoustic therapies were monitored one week after, three weeks after, five weeks after, and eight weeks after the acoustic therapy. This study was previously approved by the local Ethical Committee (CONBIOETICA19CEI00820130520), it was registered as a clinical trial (ISRCTN14553550) in BioMed Central (Springer Nature), the protocol was published in 2016, it attracted L'Oréal-UNESCO Organization as a sponsor, and six journal publications have resulted from the analysis of this database.

Sci Data. 2022:9(1) | 2 Citations (from Europe PMC, 2024-04-20)

Curation of a reference database of COI sequences for insect identification through DNA metabarcoding: COins. [PMID: 35796594]

Giulia Magoga, Giobbe Forni, Matteo Brunetti, Aycan Meral, Alberto Spada, Alessio De Biase, Matteo Montagna

Abstract

DNA metabarcoding is a widespread approach for the molecular identification of organisms. While the associated wet-lab and data processing procedures are well established and highly efficient, the reference databases for taxonomic assignment can be implemented to improve the accuracy of identifications. Insects are among the organisms for which DNA-based identification is most commonly used; yet, a DNA-metabarcoding reference database specifically curated for their species identification using software requiring local databases is lacking. Here, we present COins, a database of 5' region cytochrome c oxidase subunit I sequences (COI-5P) of insects that includes over 532 000 representative sequences of >106 000 species specifically formatted for the QIIME2 software platform. Through a combination of automated and manually curated steps, we developed this database starting from all COI sequences available in the Barcode of Life Data System for insects, focusing on sequences that comply with several standards, including a species-level identification. COins was validated on previously published DNA-metabarcoding sequence data (bulk samples from Malaise traps) and its efficiency compared with other publicly available reference databases (not specific for insects). COins can allow an increase of up to 30% of species-level identifications and thus can represent a valuable resource for the taxonomic assignment of insects' DNA-metabarcoding data, especially when species-level identification is needed (https://doi.org/10.6084/m9.figshare.19130465.v1).

Database (Oxford). 2022:2022() | 5 Citations (from Europe PMC, 2024-04-20)

VIBFREQ1295: A New Database for Vibrational Frequency Calculations. [PMID: 35723975]

Juan C Zapata Trujillo, Laura K McKemmish

Abstract

High-throughput approaches for producing approximate vibrational spectral data for molecules of astrochemistry interest rely on harmonic frequency calculations using computational quantum chemistry. However, model chemistry recommendations (i.e., a level of theory and basis set pair) for these calculations are not yet available and, thus, thorough benchmarking against comprehensive benchmark databases is needed. Here, we present a new database for vibrational frequency calculations (VIBFREQ1295) storing 1295 experimental fundamental frequencies and CCSD(T)(F12*)/cc-pVDZ-F12 harmonic frequencies from 141 molecules. VIBFREQ1295's experimental data was compiled through a comprehensive review of contemporary experimental data, while the data was computed here. The chemical space spanned by the molecules chosen is considered in-depth and is shown to have good representation of common organic functional groups and vibrational modes. Scaling factors are routinely used to approximate the effect of anharmonicity and convert computed harmonic frequencies to predicted fundamental frequencies. With our experimental and high-level data, we find that a single global uniform scaling factor of 0.9617(3) results in median differences of 15.9(5) cm⁻¹. A far superior performance with a median difference of 7.5(5) cm⁻¹ can be obtained, however, by using separate scaling factors (SFs) for three regions: frequencies less than 1000 cm⁻¹ (SF = 0.987(1)), between 1000 and 2000 cm⁻¹ (SF = 0.9727(6)), and above 2000 cm⁻¹ (SF = 0.9564(4)). This sets a lower bound for the performance that could be reliably obtained using scaling of harmonic frequency calculations to predict experimental fundamental frequencies. VIBFREQ1295's most important purpose is to provide a robust database for benchmarking the performance of any vibrational frequency calculations.
VIBFREQ1295 data could also be used to train machine-learning models for the prediction of vibrational spectra and as a reference and data starting point for more detailed spectroscopic modeling of particular molecules. The database can be found as part of the Supporting Information for this paper or in the Harvard DataVerse at https://doi.org/10.7910/DVN/VLVNU7.

J Phys Chem A. 2022:126(25) | 2 Citations (from Europe PMC, 2024-04-20)
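
The VIBFREQ1295 abstract above reports one global scaling factor and three region-specific ones. The sketch below shows how such factors could be applied to a computed harmonic frequency. The factor values are taken from the abstract; the function name, the choice to pick the region from the unscaled harmonic frequency, and the treatment of values exactly at 1000 or 2000 are assumptions made for illustration and are not stated in the abstract.

```python
GLOBAL_SF = 0.9617  # single uniform scaling factor from the abstract

# (upper bound of region, scaling factor); the bounds are exclusive here,
# which is an assumption: the abstract does not say how 1000 or 2000 is handled.
REGION_SFS = [
    (1000.0, 0.987),          # below 1000
    (2000.0, 0.9727),         # 1000 up to (not including) 2000
    (float("inf"), 0.9564),   # 2000 and above
]


def scale_frequency(harmonic, regional=True):
    """Scale a harmonic frequency (wavenumbers) to estimate a fundamental."""
    if harmonic < 0:
        raise ValueError("frequency must be non-negative")
    if not regional:
        return harmonic * GLOBAL_SF
    for upper_bound, factor in REGION_SFS:
        if harmonic < upper_bound:
            return harmonic * factor


# Hypothetical harmonic frequencies, for illustration only.
for freq in (500.0, 1500.0, 3000.0):
    print(freq, round(scale_frequency(freq), 2), round(scale_frequency(freq, regional=False), 2))
```

Keeping the regions in a small table makes it easy to swap in a different set of factors for another level of theory.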

A database of animal metagenomes. [PMID: 35710683]

Ruirui Hu, Rui Yao, Lei Li, Yueren Xu, Bingbing Lei, Guohao Tang, Haowei Liang, Yunjiao Lei, Cunyuan Li, Xiaoyue Li, Kaiping Liu, Limin Wang, Yunfeng Zhang, Yue Wang, Yuying Cui, Jihong Dai, Wei Ni, Ping Zhou, Baohua Yu, Shengwei Hu

Abstract

With the rapid development of high-throughput sequencing technology, the amount of metagenomic data (including both 16S and whole-genome sequencing data) in public repositories is increasing exponentially. However, owing to the large and decentralized nature of the data, it is still difficult for users to mine, compare, and analyze the data. The animal metagenome database (AnimalMetagenome DB) integrates metagenomic sequencing data with host information, making it easier for users to find data of interest. The AnimalMetagenome DB is designed to contain all public metagenomic data from animals, and the data are divided into domestic and wild animal categories. Users can browse, search, and download animal metagenomic data of interest based on different attributes of the metadata such as animal species, sample site, study purpose, and DNA extraction method. The AnimalMetagenome DB version 1.0 includes metadata for 82,097 metagenomes from 4 domestic animals (pigs, bovines, horses, and sheep) and 540 wild animals. These metagenomes cover 15 years of experiments, 73 countries, 1,044 studies, 63,214 amplicon sequencing data, and 10,672 whole genome sequencing data. All data in the database are hosted and available in figshare https://doi.org/10.6084/m9.figshare.19728619 .

Sci Data. 2022:9(1) | 6 Citations (from Europe PMC, 2024-04-20)

Tallo: A global tree allometry and crown architecture database. [PMID: 35703577]

Tommaso Jucker, Fabian Jörg Fischer, Jérôme Chave, David A Coomes, John Caspersen, Arshad Ali, Grace Jopaul Loubota Panzou, Ted R Feldpausch, Daniel Falster, Vladimir A Usoltsev, Stephen Adu-Bredu, Luciana F Alves, Mohammad Aminpour, Ilondea B Angoboy, Niels P R Anten, Cécile Antin, Yousef Askari, Rodrigo Muñoz, Narayanan Ayyappan, Patricia Balvanera, Lindsay Banin, Nicolas Barbier, John J Battles, Hans Beeckman, Yannick E Bocko, Ben Bond-Lamberty, Frans Bongers, Samuel Bowers, Thomas Brade, Michiel van Breugel, Arthur Chantrain, Rajeev Chaudhary, Jingyu Dai, Michele Dalponte, Kangbéni Dimobe, Jean-Christophe Domec, Jean-Louis Doucet, Remko A Duursma, Moisés Enríquez, Karin Y van Ewijk, William Farfán-Rios, Adeline Fayolle, Eric Forni, David I Forrester, Hammad Gilani, John L Godlee, Sylvie Gourlet-Fleury, Matthias Haeni, Jefferson S Hall, Jie-Kun He, Andreas Hemp, José L Hernández-Stefanoni, Steven I Higgins, Robert J Holdaway, Kiramat Hussain, Lindsay B Hutley, Tomoaki Ichie, Yoshiko Iida, Hai-Sheng Jiang, Puspa Raj Joshi, Hasan Kaboli, Maryam Kazempour Larsary, Tanaka Kenzo, Brian D Kloeppel, Takashi Kohyama, Suwash Kunwar, Shem Kuyah, Jakub Kvasnica, Siliang Lin, Emily R Lines, Hongyan Liu, Craig Lorimer, Jean-Joël Loumeto, Yadvinder Malhi, Peter L Marshall, Eskil Mattsson, Radim Matula, Jorge A Meave, Sylvanus Mensah, Xiangcheng Mi, Stéphane Momo, Glenn R Moncrieff, Francisco Mora, Sarath P Nissanka, Kevin L O'Hara, Steven Pearce, Raphaël Pelissier, Pablo L Peri, Pierre Ploton, Lourens Poorter, Mohsen Javanmiri Pour, Hassan Pourbabaei, Juan Manuel Dupuy-Rada, Sabina C Ribeiro, Casey Ryan, Anvar Sanaei, Jennifer Sanger, Michael Schlund, Giacomo Sellan, Alexander Shenkin, Bonaventure Sonké, Frank J Sterck, Martin Svátek, Kentaro Takagi, Anna T Trugman, Farman Ullah, Matthew A Vadeboncoeur, Ahmad Valipour, Mark C Vanderwel, Alejandra G Vovides, Weiwei Wang, Li-Qiu Wang, Christian Wirth, Murray Woods, Wenhua Xiang, Fabiano de Aquino Ximenes, Yaozhan Xu, Toshihiro Yamada, Miguel A Zavala

Abstract

Data capturing multiple axes of tree size and shape, such as a tree's stem diameter, height and crown size, underpin a wide range of ecological research, from developing and testing theory on forest structure and dynamics, to estimating forest carbon stocks and their uncertainties, and integrating remote sensing imagery into forest monitoring programmes. However, these data can be surprisingly hard to come by, particularly for certain regions of the world and for specific taxonomic groups, posing a real barrier to progress in these fields. To overcome this challenge, we developed the Tallo database, a collection of 498,838 georeferenced and taxonomically standardized records of individual trees for which stem diameter, height and/or crown radius have been measured. These data were collected at 61,856 globally distributed sites, spanning all major forested and non-forested biomes. The majority of trees in the database are identified to species (88%), and collectively Tallo includes data for 5163 species distributed across 1453 genera and 187 plant families. The database is publicly archived under a CC-BY 4.0 licence and can be accessed from: https://doi.org/10.5281/zenodo.6637599. To demonstrate its value, here we present three case studies that highlight how the Tallo database can be used to address a range of theoretical and applied questions in ecology, from testing the predictions of metabolic scaling theory, to exploring the limits of tree allometric plasticity along environmental gradients and modelling global variation in maximum attainable tree height. In doing so, we provide a key resource for field ecologists, remote sensing researchers and the modelling community working together to better understand the role that trees play in regulating the terrestrial carbon cycle.

Glob Chang Biol. 2022:28(17) | 3 Citations (from Europe PMC, 2024-04-20)

Rhythm of the Night (and Day): Predictive Metabolic Modeling of Diurnal Growth in Chlamydomonas reinhardtii. [PMID: 35695419]

Alex J Metcalf, Nanette R Boyle

Abstract

Economical production of photosynthetic organisms requires the use of natural day/night cycles. These induce strong circadian rhythms that lead to transient changes in the cells, requiring complex modeling to capture. In this study, we coupled time-series transcriptomic data from the model green alga Chlamydomonas reinhardtii to a metabolic model of the same organism in order to develop the first transient metabolic model for diurnal growth of algae capable of predicting phenotype from genotype. We first transformed a set of discrete transcriptomic measurements (D. Strenkert, S. Schmollinger, S. D. Gallaher, P. A. Salomé, et al., Proc Natl Acad Sci U S A 116:2374-2383, 2019, https://doi.org/10.1073/pnas.1815238116) into continuous curves, producing a complete database of the cell's transcriptome that can be interrogated at any time point. We also decoupled the standard biomass formation equation to allow different components of biomass to be synthesized at different times of the day. The resulting model was able to predict qualitative phenotypical outcomes of a starchless mutant. We also extended this approach to simulate all single-knockout mutants and identified potential targets for rational engineering efforts to increase productivity. This model enables us to evaluate the impact of genetic and environmental changes on the growth, biomass composition, and intracellular fluxes for diurnal growth. We have developed the first transient metabolic model for diurnal growth of algae based on experimental data and capable of predicting phenotype from genotype. This model enables us to evaluate the impact of genetic and environmental changes on the growth, biomass composition and intracellular fluxes of the model green alga, Chlamydomonas reinhardtii. The availability of this model will enable faster and more efficient design of cells for production of fuels, chemicals, and pharmaceuticals.

mSystems. 2022:7(4) | 0 Citations (from Europe PMC, 2024-04-20)
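
The Chlamydomonas abstract above describes turning discrete transcriptomic measurements into continuous curves that can be queried at any time point. The abstract does not say which curve-fitting method was used, so the sketch below substitutes plain periodic linear interpolation over a 24-hour cycle purely as an illustration of the idea. The function name, the period, and the sample values are all assumptions.

```python
from bisect import bisect_right


def make_diurnal_curve(times_h, values, period_h=24.0):
    """Return f(t) that linearly interpolates samples and repeats every period.

    times_h must be distinct sampling times that all fall within one period.
    """
    if len(times_h) != len(values) or len(times_h) < 2:
        raise ValueError("need at least two (time, value) samples")
    pairs = sorted(zip(times_h, values))
    ts = [t for t, _ in pairs]
    vs = [v for _, v in pairs]
    if any(b <= a for a, b in zip(ts, ts[1:])) or ts[-1] - ts[0] >= period_h:
        raise ValueError("times must be distinct and span less than one period")
    # Repeat the first sample one period later so the curve closes on itself.
    ts.append(ts[0] + period_h)
    vs.append(vs[0])

    def curve(t):
        # Map any time onto the sampled cycle.
        t = (t - ts[0]) % period_h + ts[0]
        # Find the enclosing interval; clamp guards a floating-point edge case.
        i = min(bisect_right(ts, t) - 1, len(ts) - 2)
        frac = (t - ts[i]) / (ts[i + 1] - ts[i])
        return vs[i] + frac * (vs[i + 1] - vs[i])

    return curve


# Made-up expression values for one transcript, sampled every 6 hours.
expression = make_diurnal_curve([0, 6, 12, 18], [1.0, 3.0, 5.0, 3.0])
print(expression(3), expression(21), expression(27))
```

A smoother fit (for example a spline) could replace the linear step without changing how the returned curve is queried.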

A Chinese multi-modal neuroimaging data release for increasing diversity of human brain mapping. [PMID: 35680932]

Peng Gao, Hao-Ming Dong, Si-Man Liu, Xue-Ru Fan, Chao Jiang, Yin-Shan Wang, Daniel Margulies, Hai-Fang Li, Xi-Nian Zuo

Abstract

The use of big data is becoming standard practice in the neuroimaging field through data-sharing initiatives. It is important for the community to realize that such open science efforts must protect personal, especially facial, information when raw neuroimaging data are shared. An ideal tool for face anonymization should not disturb subsequent brain tissue extraction and further morphological measurements. Using high-resolution head images from magnetic resonance imaging (MRI) of 215 healthy Chinese participants, we discovered and validated a template effect on face anonymization. Improved facial anonymization was achieved when Chinese head templates, but not Western templates, were applied to obscure the faces of Chinese brain images. This finding has critical implications for international brain imaging data-sharing. To facilitate further investigation of potential culture-related impacts on, and increase the diversity of, data-sharing for human brain mapping, we released the 215 Chinese multi-modal MRI data into a database for imaging Chinese young brains, namely 'I See your Brains (ISYB)', to the public via the Science Data Bank ( https://doi.org/10.11922/sciencedb.00740 ).

Sci Data. 2022:9(1) | 1 Citations (from Europe PMC, 2024-04-20)

PeSTK db a comprehensive data repository of Probiotic Serine Threonine kinases. [PMID: 35676297]

Dhanashree Lokesh, Suresh Psn, Rajagopal Kammara

Abstract

The signal transduction pathway of prokaryotes involves a peptidoglycan (PG) synthesis cluster to sense external stimuli. One of the major components of the PG synthesis cluster is the protein kinases (pknA - G). The sequence data of probiotic eSTKs (eukaryotic-like serine/threonine kinases) are obscure and scarce, yet they are essential for understanding the role of probiotic microbes in combating infectious diseases and for developing therapeutic drugs against pathogens. Hence, we developed a comprehensive, user-friendly data repository of probiotic eSTKs (PeSTK), which holds 830 STK sequences. The PeSTK data resource is unique: an open-access collection that gathers various probiotic eSTKs in a single location. The eSTK sequence datasets come with easy-to-operate browsing and searching. The eSTK data resource should therefore be useful for sequence-based studies and drug development. The sequence datasets are available at Figshare (DOI: https://doi.org/10.6084/m9.figshare.146606 ).

Sci Data. 2022:9(1) | 2 Citations (from Europe PMC, 2024-04-20)

Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders. [PMID: 35670729]

T M Yates, A Lain, J Campbell, D R FitzPatrick, T I Simpson

Abstract

There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the widespread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 in title + abstract, to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038.

Database (Oxford). 2022:2022() | 0 Citations (from Europe PMC, 2024-04-20)
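
The developmental disorders abstract above reports precision and recall for automatically extracted Human Phenotype Ontology terms. As a minimal sketch of how those two figures are conventionally computed for one paper, the function below compares a set of extracted term identifiers with a manually curated reference set. The function name and the identifiers are placeholders chosen for illustration; they are not data from the study, and the study's own evaluation procedure may differ.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted terms against a reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Precision: share of extracted terms that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference terms that were found.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder identifiers standing in for ontology terms (illustrative only).
extracted_terms = {"TERM:1", "TERM:2", "TERM:3", "TERM:9"}
curated_terms = {"TERM:1", "TERM:2", "TERM:3", "TERM:4", "TERM:5"}

p, r = precision_recall(extracted_terms, curated_terms)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here three of the four extracted terms are in the reference set (precision 0.75), and three of the five reference terms were recovered (recall 0.60).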

eldBETA: A Large Eldercare-oriented Benchmark Database of SSVEP-BCI for the Aging Population. [PMID: 35641547]

Bingchuan Liu, Yijun Wang, Xiaorong Gao, Xiaogang Chen

Abstract

Global population aging poses an unprecedented challenge and calls for a rising effort in eldercare and healthcare. The steady-state visual evoked potential based brain-computer interface (SSVEP-BCI) offers a high transfer rate and shows great promise in real-world applications to support aging. Public databases are critically important for designing SSVEP-BCI systems. However, SSVEP-BCI databases tailored for the elderly are scarce in existing studies. Therefore, in this study, we present a large eldercare-oriented BEnchmark database of SSVEP-BCI for The Aging population (eldBETA). The eldBETA database consists of 64-channel electroencephalogram (EEG) recordings from 100 elderly participants, each of whom performed seven blocks of a 9-target SSVEP-BCI task. The quality and characteristics of the eldBETA database were validated by a series of analyses followed by a classification analysis of thirteen frequency recognition methods. We expect that the eldBETA database will provide a substrate for the design and optimization of BCI systems intended for the elderly. The eldBETA database is open-access for research and can be downloaded from the website https://doi.org/10.6084/m9.figshare.18032669 .

Sci Data. 2022:9(1) | 5 Citations (from Europe PMC, 2024-04-20)

MusMorph, a database of standardized mouse morphology data for morphometric meta-analyses. [PMID: 35614082]

Jay Devine, Marta Vidal-García, Wei Liu, Amanda Neves, Lucas D Lo Vercio, Rebecca M Green, Heather A Richbourg, Marta Marchini, Colton M Unger, Audrey C Nickle, Bethany Radford, Nathan M Young, Paula N Gonzalez, Robert E Schuler, Alejandro Bugacov, Campbell Rolian, Christopher J Percival, Trevor Williams, Lee Niswander, Anne L Calof, Arthur D Lander, Axel Visel, Frank R Jirik, James M Cheverud, Ophir D Klein, Ramon Y Birnbaum, Amy E Merrill, Rebecca R Ackermann, Daniel Graf, Myriam Hemberger, Wendy Dean, Nils D Forkert, Stephen A Murray, Henrik Westerberg, Ralph S Marcucio, Benedikt Hallgrímsson

Abstract

Complex morphological traits are the product of many genes with transient or lasting developmental effects that interact in anatomical context. Mouse models are a key resource for disentangling such effects, because they offer myriad tools for manipulating the genome in a controlled environment. Unfortunately, phenotypic data are often obtained using laboratory-specific protocols, resulting in self-contained datasets that are difficult to relate to one another for larger scale analyses. To enable meta-analyses of morphological variation, particularly in the craniofacial complex and brain, we created MusMorph, a database of standardized mouse morphology data spanning numerous genotypes and developmental stages, including E10.5, E11.5, E14.5, E15.5, E18.5, and adulthood. To standardize data collection, we implemented an atlas-based phenotyping pipeline that combines techniques from image registration, deep learning, and morphometrics. Alongside stage-specific atlases, we provide aligned micro-computed tomography images, dense anatomical landmarks, and segmentations (if available) for each specimen (N = 10,056). Our workflow is open-source to encourage transparency and reproducible data collection. The MusMorph data and scripts are available on FaceBase ( www.facebase.org , https://doi.org/10.25550/3-HXMC ) and GitHub ( https://github.com/jaydevine/MusMorph ).

Sci Data. 2022:9(1) | 1 Citations (from Europe PMC, 2024-04-20)

HORDB a comprehensive database of peptide hormones. [PMID: 35469024]

Ning Zhu, Fanyi Dong, Guobang Shi, Xingzhen Lao, Heng Zheng

Abstract

Peptide hormones (also known as hormone peptides and polypeptide hormones) are hormones composed of peptides and are signal transduction molecules produced by a class of multicellular organisms. They play an important role in the physiological and behavioral regulation of animals and humans as well as in the growth of plants. To promote research on peptide hormones, we constructed the HORDB database. The database currently has a total of 6024 entries, including 5729 peptide hormones, 40 peptide drugs and 255 marketed pharmaceutical preparations. Each entry provides comprehensive information related to the peptide, including general information, sequence, activity, structure, physical information and literature information. We also added information on IC50, EC50, ED50, target, and whether or not the blood-brain barrier was crossed to the activity information note. In addition, HORDB integrates search and sequence analysis to facilitate user browsing and data analysis. We believe that the peptide hormone information collected by HORDB will promote the design and discovery of peptide hormones. All data are hosted and available in figshare https://doi.org/10.6084/m9.figshare.c.5522241 .

Sci Data. 2022:9(1) | 2 Citations (from Europe PMC, 2024-04-20)

ChemTastesDB: A curated database of molecular tastants. [PMID: 35415670]

Cristian Rojas, Davide Ballabio, Karen Pacheco Sarmiento, Elisa Pacheco Jaramillo, Mateo Mendoza, Fernando García

Abstract

The purpose of this work is the creation of a chemical database named ChemTastesDB that includes both organic and inorganic tastants. The creation, curation pipeline and the main features of the database are described in detail. The database includes 2944 verified and curated compounds divided into nine classes, which comprise the five basic tastes (sweet, bitter, umami, sour and salty) along with four additional categories: tasteless, non-sweet, multitaste and miscellaneous. ChemTastesDB provides the following information for each tastant: name, PubChem CID, CAS registry number, canonical SMILES, taste class and references to the scientific sources from which data were retrieved. The molecular structure in the HyperChem (.hin) format of each chemical is also made available. In addition, molecular fingerprints were used for characterizing and analyzing the chemical space of tastants by means of unsupervised machine learning. ChemTastesDB constitutes a useful tool for the scientific community to expand the information on taste molecules and to assist studies for the taste prediction of unevaluated and as yet unsynthesized compounds, as well as the analysis of the relationships between molecular structure and taste. The database is freely accessible at https://doi.org/10.5281/zenodo.5747393.

Food Chem (Oxf). 2022:4() | 3 Citations (from Europe PMC, 2024-04-20)

TREND database: Retinal images of healthy young subjects visualized by a portable digital non-mydriatic fundus camera. [PMID: 34297749]

Natasa Popovic, Stela Vujosevic, Miroslav Radunović, Miodrag Radunović, Tomo Popovic

Abstract

Topological characterization of the Retinal microvascular nEtwork visualized by portable fuNDus camera (TREND) is a database comprising 72 color digital retinal images collected from the students of the Faculty of Medicine at the University of Montenegro, in the period from February 18th to March 11th 2020. The database also includes binarized images of manually segmented microvascular networks associated with each raw image. The participant demographic characteristics, health status, and social habits information such as age, sex, body mass index, smoking history, alcohol use, as well as previous medical history was collected. As proof of concept, a smaller set of 10 color digital fundus images from healthy older participants is also included. Comparison of the microvascular parameters of these two sets of images demonstrates that digital fundus images recorded with a hand-held portable camera are able to capture the changes in patterns of microvascular network associated with aging. The raw images from the TREND database provide a standard that defines normal retinal anatomy and microvascular network geometry in young healthy people in Montenegro as it is seen with the digital hand-held portable non-mydriatic MiiS HORUS Scope DEC 200. This knowledge could facilitate the application of this technology at the primary level of health care for large scale telematic screening for complications of chronic diseases, such as hypertensive and diabetic retinopathy. In addition, it could aid in the development of new methods for early detection of age-related changes in the retina, systemic chronic diseases, as well as eye-specific diseases. The associated manually segmented images of the microvascular networks provide the standard that can be used for development of automatic software for image quality assessment, segmentation of microvascular network, and for computer-aided detection of pathological changes in retina.
The TREND database is freely available at https://doi.org/10.5281/zenodo.4521043.

PLoS One. 2021:16(7) | 1 Citations (from Europe PMC, 2024-04-20)

DOE JGI Metagenome Workflow. [PMID: 34006627]

Alicia Clum, Marcel Huntemann, Brian Bushnell, Brian Foster, Bryce Foster, Simon Roux, Patrick P Hajek, Neha Varghese, Supratim Mukherjee, T B K Reddy, Chris Daum, Yuko Yoshinaga, Ronan O'Malley, Rekha Seshadri, Nikos C Kyrpides, Emiley A Eloe-Fadrosh, I-Min A Chen, Alex Copeland, Natalia N Ivanova

Abstract

The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included into the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751-D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary by the complexity of microbial communities and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a workflow description language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723-D733, 2021, https://doi.org/10.1093/nar/gkaa983). The DOE JGI Metagenome Workflow is designed for processing metagenomic data sets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. 
The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes and Microbiomes (IMG/M) system where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the DOE JGI Metagenome Workflow. Here, we present a metagenome workflow developed at the JGI that generates rich data in standard formats and has been optimized for downstream analyses ranging from assessment of the functional and taxonomic composition of microbial communities to genome-resolved metagenomics and the identification and characterization of novel taxa. This workflow is currently being used to analyze thousands of metagenomic data sets in a consistent and standardized manner.

mSystems. 2021:6(3) | 43 Citations (from Europe PMC, 2024-04-20)

Unlocking the Entomological Collection of the Natural History Museum of Maputo, Mozambique. [PMID: 33935558]

Domingos Sandramo, Enrico Nicosia, Silvio Cianciullo, Bernardo Muatinte, Almeida Guissamulo

Abstract

Background: The collections of the Natural History Museum of Maputo have a crucial role in the safeguarding of Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum's Entomological Collection recording 409 species belonging to seven orders and 48 families. Each specimen's available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as "Museu Álvaro de Castro") in all the country's provinces, with the exception of Cabo Delgado Province.
New information: This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The aforementioned insect dataset is available on the GBIF Engine data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on the Mozambican national portal of biodiversity data BioNoMo (https://bionomo.openscidata.org), developed by SECOSUD II Project.

Biodivers Data J. 2021:9() | 1 Citations (from Europe PMC, 2024-04-20)

Risk-Based Chemical Ranking and Generating a Prioritized Human Exposome Database. [PMID: 33929905]

Fanrong Zhao, Li Li, Yue Chen, Yichao Huang, Tharushi Prabha Keerthisinghe, Agnes Chow, Ting Dong, Shenglan Jia, Shipei Xing, Benedikt Warth, Tao Huan, Mingliang Fang

Abstract

BACKGROUND: Due to the ubiquitous use of chemicals in modern society, humans are increasingly exposed to thousands of chemicals that contribute to a major portion of the human exposome. Should a comprehensive and risk-based human exposome database be created, it would be conducive to the rapid progress of human exposomics research. In addition, once a xenobiotic is biotransformed with distinct half-lives upon exposure, monitoring the parent compounds alone may not reflect the actual human exposure. To address these questions, a comprehensive and risk-prioritized human exposome database is needed.
OBJECTIVES: Our objective was to set up a comprehensive risk-prioritized human exposome database including physicochemical properties as well as risk prediction and develop a graphical user interface (GUI) that has the ability to conduct searches for content associated with chemicals in our database.
METHODS: We built a comprehensive risk-prioritized human exposome database by text mining and database fusion. Subsequently, chemicals were prioritized by integrating exposure level obtained from the Systematic Empirical Evaluation of Models with toxicity data predicted by the Toxicity Estimation Software Tool and the Toxicological Priority Index calculated from the ToxCast database. The biotransformation half-lives of all the chemicals were assessed using the Iterative Fragment Selection approach and biotransformation products were predicted using the previously developed BioTransformer machine-learning method.
RESULTS: We compiled a human exposome database of chemicals, prioritized 13,441 chemicals based on probabilistic hazard quotient and 7,770 chemicals based on risk index, and provided a predicted biotransformation metabolite database of metabolites. In addition, a user-interactive Java software (Oracle)-based search GUI was generated to enable open access to this new resource.
DISCUSSION: Our database can be used to guide chemical management and enhance scientific understanding to rapidly and effectively prioritize chemicals for comprehensive biomonitoring in epidemiological investigations. https://doi.org/10.1289/EHP7722.

Environ Health Perspect. 2021:129(4) | 9 Citations (from Europe PMC, 2024-04-20)

An annotation database for chemicals of emerging concern in exposome research. [PMID: 33773387]

Jeroen Meijer, Marja Lamoree, Timo Hamers, Jean-Philippe Antignac, Sébastien Hutinet, Laurent Debrauwer, Adrian Covaci, Carolin Huber, Martin Krauss, Douglas I Walker, Emma L Schymanski, Roel Vermeulen, Jelle Vlaanderen

Abstract

BACKGROUND: Chemicals of Emerging Concern (CECs) include a very wide group of chemicals that are suspected to be responsible for adverse effects on health, but for which very limited information is available. Chromatographic techniques coupled with high-resolution mass spectrometry (HRMS) can be used for non-targeted screening and detection of CECs, by using comprehensive annotation databases. Establishing a database focused on the annotation of CECs in human samples will provide new insight into the distribution and extent of exposures to a wide range of CECs in humans.
OBJECTIVES: This study describes an approach for the aggregation and curation of an annotation database (CECscreen) for the identification of CECs in human biological samples.
METHODS: The approach consists of three main parts. First, CECs compound lists from various sources were aggregated and duplications and inorganic compounds were removed. Subsequently, the list was curated by standardization of structures to create "MS-ready" and "QSAR-ready" SMILES, as well as calculation of exact masses (monoisotopic and adducts) and molecular formulas. The second step included the simulation of Phase I metabolites. The third and final step included the calculation of QSAR predictions related to physicochemical properties, environmental fate, toxicity and Absorption, Distribution, Metabolism, Excretion (ADME) processes and the retrieval of information from the US EPA CompTox Chemicals Dashboard.
RESULTS: All CECscreen database and property files are publicly available (DOI: https://doi.org/10.5281/zenodo.3956586). In total, 145,284 entries were aggregated from various CECs data sources. After elimination of duplicates and curation, the pipeline produced 70,397 unique "MS-ready" structures and 66,071 unique QSAR-ready structures, corresponding with 69,526 CAS numbers. Simulation of Phase I metabolites resulted in 306,279 unique metabolites. QSAR predictions could be performed for 64,684 of the QSAR-ready structures, whereas information was retrieved from the CompTox Chemicals Dashboard for 59,739 CAS numbers out of 69,526 inquiries. CECscreen is incorporated in the in silico fragmentation approach MetFrag.
DISCUSSION: The CECscreen database can be used to prioritize annotation of CECs measured in non-targeted HRMS, facilitating the large-scale detection of CECs in human samples for exposome research. Large-scale detection of CECs can be further improved by integrating the present database with resources that contain CECs (metabolites) and meta-data measurements, further expansion towards in silico and experimental (e.g., MassBank) generation of MS/MS spectra, and development of bioinformatics approaches capable of using correlation patterns in the measured chemical features.

Environ Int. 2021:152() | 14 Citations (from Europe PMC, 2024-04-20)

Tropical cyclone simulations over Bangladesh at convection permitting 4.4 km & 1.5 km resolution. [PMID: 33594085]

Hamish Steptoe, Nicholas Henry Savage, Saeed Sadri, Kate Salmon, Zubair Maalick, Stuart Webster

Abstract

High resolution simulations at 4.4 km and 1.5 km resolution have been performed for 12 historical tropical cyclones impacting Bangladesh. We use the European Centre for Medium-Range Weather Forecasting 5th generation Re-Analysis (ERA5) to provide a 9-member ensemble of initial and boundary conditions for the regional configuration of the Met Office Unified Model. The simulations are compared to the original ERA5 data and the International Best Track Archive for Climate Stewardship (IBTrACS) tropical cyclone database for wind speed, gust speed and mean sea-level pressure. The 4.4 km simulations show a typical increase in peak gust speed of 41 to 118 knots relative to ERA5, and a deepening of minimum mean sea-level pressure of up to -27 hPa, relative to ERA5 and IBTrACS data. The downscaled simulations compare more favourably with IBTrACS data than the ERA5 data suggesting tropical cyclone hazards in the ERA5 deterministic output may be underestimated. The dataset is freely available from https://doi.org/10.5281/zenodo.3600201 .

Sci Data. 2021:8(1) | 0 Citations (from Europe PMC, 2024-04-20)

ACDC, a global database of amphibian cytochrome-b sequences using reproducible curation for GenBank records. [PMID: 32792559]

Matthijs P van den Burg, Salvador Herrando-Pérez, David R Vieites

Abstract

Genetic data are a crucial and exponentially growing resource across all biological sciences, yet curated databases are scarce. The widespread occurrence of sequence and (meta)data errors in public repositories calls for comprehensive improvements of curation protocols leading to robust research and downstream analyses. We collated and curated all available GenBank cytochrome-b sequences for amphibians, a benchmark marker in this globally declining vertebrate clade. The Amphibia's Curated Database of Cytochrome-b (ACDC) consists of 36,514 sequences representing 2,309 species from 398 genera (median = 2 with 50% interquartile ranges of 1-7 species/genus). We updated the taxonomic identity of >4,800 sequences (ca. 13%) and found 2,359 (6%) conflicting sequences with 84% of the errors originating from taxonomic misidentifications. The database (accessible at https://doi.org/10.6084/m9.figshare.9944759 ) also includes an R script to replicate our study for other loci and taxonomic groups. We provide recommendations to improve genetic-data quality in public repositories and flag species for which there is a need for taxonomic refinement in the face of increased rate of amphibian extinctions in the Anthropocene.

Sci Data. 2020:7(1) | 2 Citations (from Europe PMC, 2024-04-20)

An online global database of Hemiptera-Phytoplasma-Plant biological interactions. [PMID: 30846902]

Valeria Trivellone

Abstract

Background: Phytoplasmas are phloem-limited plant pathogenic bacteria in the class Mollicutes transmitted by sap-feeding insect vectors of the Order Hemiptera. Vectors have not yet been identified for about half of the 33 known phytoplasma groups and this has greatly hindered efforts to control the spread of diseases affecting important crops. Extensive gaps of knowledge on actual phytoplasma vectors and on the plant disease epidemiology prevent our understanding of the basic underlying biological mechanisms that facilitate interactions between insects, phytoplasmas and their host plants.
New information: This paper presents a complete online database of Hemiptera-Phytoplasma-Plant (HPP) biological interactions worldwide, searchable via an online interface. The raw data are available through Zenodo at https://doi.org/10.5281/zenodo.2532738. The online database search interface was created using the 3I software (Dmitriev 2006) which enhances data usability by providing a customised web interface (http://trivellone.speciesfile.org/) that provides an overview of the recorded biological interactions and ability to discover particular interactions by searching for one or more phytoplasma, insect or plant taxa. The database will facilitate synthesis of all available and relevant data on the observed associations between phytoplasmas and their insect and plant hosts and will provide useful data to generate and test ecological and evolutionary hypotheses.

Biodivers Data J. 2019:(7) | 10 Citations (from Europe PMC, 2024-04-20)

Temporary dense seismic network during the 2016 Central Italy seismic emergency for microzonation studies. [PMID: 31554814]

Fabrizio Cara, Giovanna Cultrera, Gaetano Riccio, Sara Amoroso, Paola Bordoni, Augusto Bucci, Ezio D'Alema, Maria D'Amico, Luciana Cantore, Simona Carannante, Rocco Cogliano, Giuseppe Di Giulio, Deborah Di Naccio, Daniela Famiani, Chiara Felicetta, Antonio Fodarella, Gianlorenzo Franceschina, Giovanni Lanzano, Sara Lovati, Lucia Luzi, Claudia Mascandola, Marco Massa, Alessia Mercuri, Giuliano Milana, Francesca Pacor, Davide Piccarreda, Marta Pischiutta, Stefania Pucillo, Rodolfo Puglia, Maurizio Vassallo, Graziano Boniolo, Grazia Caielli, Adelmo Corsi, Roberto de Franco, Alberto Tento, Giovanni Bongiovanni, Salomon Hailemikael, Guido Martini, Antonella Paciello, Alessandro Peloso, Fabrizio Poggi, Vladimiro Verrubbi, Maria Rosaria Gallipoli, Tony Alfredo Stabile, Marco Mancini

Abstract

In August 2016, a magnitude 6.0 earthquake struck Central Italy, starting a devastating seismic sequence, aggravated by two other events of magnitude 5.9 and 6.5, respectively. After the first mainshock, four Italian institutions installed a dense temporary network of 50 seismic stations in an area of 260 km². The network was registered in the International Federation of Digital Seismograph Networks with the code 3A and quoted with a Digital Object Identifier ( https://doi.org/10.13127/SD/ku7Xm12Yy9 ). Raw data were converted into the standard binary miniSEED format, and organized in a structured archive. Then, data quality and completeness were checked, and all the relevant information was used for creating the metadata volumes. Finally, the 99 Gb of continuous seismic data and metadata were uploaded into the INGV node of the European Integrated Data Archive repository. Their use was regulated by a Memorandum of Understanding between the institutions. After an embargo period, the data are now available for many different seismological studies.

Sci Data. 2019:6(1) | 0 Citations (from Europe PMC, 2024-04-20)

OakEcol: A database of Oak-associated biodiversity within the UK. [PMID: 31304213]

R J Mitchell, P E Bellamy, C J Ellis, R L Hewison, N G Hodgetts, G R Iason, N A Littlewood, S Newey, J A Stockan, A F S Taylor

Abstract

Globally there is increasing concern about the decline in the health of oak trees. The impact of a decline in oak trees on associated biodiversity, species that utilize oak trees, is unknown. Here we collate a database of all known birds, bryophytes, fungi, invertebrates, lichens and mammals that use oak ( and ) in the UK. In total 2300 species are listed in the database. For each species we provide a level of association with oak, ranging from obligate (only found on oak) to cosmopolitan (found on a wide range of other tree species). Data on the ecology of each oak associated species was collated: part of tree used, use made of tree (feeding, roosting, breeding), age of tree, woodland type, tree form (coppice, pollarded, or natural growth form) and season when the tree was used. Data on use or otherwise by each of the 2300 species of 30 other tree species was also collated. A complete list of data sources is provided. For further insights into how this data can be used see Collapsing foundations: The ecology of the British oak, implications of its decline and mitigation options [1]. Data can be found at EIDC https://doi.org/10.5285/22b3d41e-7c35-4c51-9e55-0f47bb845202.

Data Brief. 2019:25() | 1 Citations (from Europe PMC, 2024-04-20)

The Generation of a Comprehensive Spectral Library for the Analysis of the Guinea Pig Proteome by SWATH-MS. [PMID: 31301205]

Pawel Palmowski, Rachael Watson, G Nicholas Europe-Finner, Magdalena Karolczak-Bayatti, Andrew Porter, Achim Treumann, Michael J Taggart

Abstract

Advances in liquid chromatography-mass spectrometry have facilitated the incorporation of proteomic studies to many biology experimental workflows. Data-independent acquisition platforms, such as sequential window acquisition of all theoretical mass spectra (SWATH-MS), offer several advantages for label-free quantitative assessment of complex proteomes over data-dependent acquisition (DDA) approaches. However, SWATH data interpretation requires spectral libraries as a detailed reference resource. The guinea pig (Cavia porcellus) is an excellent experimental model for translation to many aspects of human physiology and disease, yet there is limited experimental information regarding its proteome. To overcome this knowledge gap, a comprehensive spectral library of the guinea pig proteome is generated. Homogenates and tryptic digests are prepared from 16 tissues and subjected to >200 DDA runs. Analysis of >250 000 peptide-spectrum matches resulted in a library of 73 594 peptides from 7666 proteins. Library validation is provided by i) analyzing externally derived SWATH files (https://doi.org/10.1016/j.jprot.2018.03.023) and comparing peptide intensity quantifications; ii) merging of externally derived data to the base library. This furnishes the research community with a comprehensive proteomic resource that will facilitate future molecular-phenotypic studies using (re-engaging) the guinea pig as an experimental model of relevance to human biology. The spectral library and raw data are freely accessible in the MassIVE repository (MSV000083199).

Proteomics. 2019:19(15) | 9 Citations (from Europe PMC, 2024-04-20)

Linguistic Materials and Metrics for the Creation of Well-Controlled Swedish Speech Perception Tests. [PMID: 31265791]

Erik Witte, Susanne Köbler

Abstract

Purpose: As factors influencing human word perception are important in the construction of speech perception tests used within the speech and hearing sciences, the purposes of this study were as follows: first, to develop algorithms that can be used to calculate different types of word metrics that influence the speed and accuracy of word perception and, second, to create a database in which those word metrics were calculated for a large set of Swedish words.
Method: Based on a revision of a large Swedish phonetic dictionary, data and algorithms were developed by which various frequency metrics, word length metrics, semantic metrics, neighborhood metrics, phonotactic metrics, and orthographic transparency metrics were calculated for each word in the dictionary. Of the various word metric algorithms used, some were Swedish language reimplementations of previously published algorithms, and some were developed in this study.
Results: The results of this study have been gathered in a Swedish word metric database called the AFC-list. The AFC-list consists of 816,404 phonetically transcribed Swedish words, all supplied with the word metric data calculated. The full AFC-list has been made publicly available under the Creative Commons Attribution 4.0 International license.
Conclusion: The results of this study constitute an extensive linguistic resource for the process of selecting test items in new well-controlled speech perception tests in the Swedish language.
Supplemental Material: https://doi.org/10.23641/asha.8330009.

J. Speech Lang. Hear. Res. 2019:62(7) | 0 Citations (from Europe PMC, 2024-04-20)

CFTI5Med, the new release of the catalogue of strong earthquakes in Italy and in the Mediterranean area. [PMID: 31160582]

Emanuela Guidoboni, Graziano Ferrari, Gabriele Tarabusi, Giulia Sgattoni, Alberto Comastri, Dante Mariotti, Cecilia Ciuccarelli, Maria Giovanna Bianchi, Gianluca Valensise

Abstract

A key element for assessing seismic hazard and risk is the availability of a comprehensive dataset on past earthquakes. Here we present the rationale, structure and contents of CFTI5Med ( https://doi.org/10.6092/ingv.it-cfti5 ), the 2018 version of the Catalogue of Strong Earthquakes in Italy: a large multidisciplinary effort including historians, seismologists and geologists. It was conceived in 1989, following the inception of GIS technology, and first published in 1995 to offer a full account of Italy's strongest earthquakes, of their territorial impact and associated social and economic upheaval. Subsequent versions (1997, 2000, 2007) entailed a fine tuning of research methodologies, included additional research on Italian earthquakes, and were extended to large earthquakes of the Mediterranean area. CFTI5Med comprised an opportunity to streamline the structure of the Catalogue database and propose a renovated user interface. The new front-end (1) grants an easier, intuitive access to the data, including earthquake effects on the environment, and (2) allows all data to be displayed jointly with relevant topographic, geological and seismological overlays published as web services.

Sci Data. 2019:6(1) | 4 Citations (from Europe PMC, 2024-04-20)

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. [PMID: 29768426]

Steven R Livingstone, Frank A Russo

Abstract

The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.

PLoS ONE. 2018:13(5) | 74 Citations (from Europe PMC, 2024-04-20)

ForC: a global database of forest carbon stocks and fluxes. [PMID: 29603730]

Kristina J Anderson-Teixeira, Maria M H Wang, Jennifer C McGarvey, Valentine Herrmann, Alan J Tepley, Ben Bond-Lamberty, David S LeBauer

Abstract

Forests play an influential role in the global carbon (C) cycle, storing roughly half of terrestrial C and annually exchanging with the atmosphere more than five times the carbon dioxide (CO2) emitted by anthropogenic activities. Yet, scaling up from field-based measurements of forest C stocks and fluxes to understand global scale C cycling and its climate sensitivity remains an important challenge. Tens of thousands of forest C measurements have been made, but these data have yet to be integrated into a single database that makes them accessible for integrated analyses. Here we present an open-access global Forest Carbon database (ForC) containing previously published records of field-based measurements of ecosystem-level C stocks and annual fluxes, along with disturbance history and methodological information. ForC expands upon the previously published tropical portion of this database, TropForC (https://doi.org/10.5061/dryad.t516f), now including 17,367 records (previously 3,568) representing 2,731 plots (previously 845) in 826 geographically distinct areas. The database covers all forested biogeographic and climate zones, represents forest stands of all ages, and currently includes data collected between 1934 and 2015. We expect that ForC will prove useful for macroecological analyses of forest C cycling, for evaluation of model predictions or remote sensing products, for quantifying the contribution of forests to the global C cycle, and for supporting international efforts to inventory forest carbon and greenhouse gas exchange. A dynamic version of ForC is maintained on GitHub (https://GitHub.com/forc-db), and we encourage the research community to collaborate in updating, correcting, expanding, and utilizing this database. ForC is an open access database, and we encourage use of the data for scientific research and education purposes. Data may not be used for commercial purposes without written permission of the database PI.
Any publications using ForC data should cite this publication and Anderson-Teixeira et al. (2016a) (see Metadata S1). No other copyright or cost restrictions are associated with the use of this data set.

Ecology. 2018:99(6) | 11 Citations (from Europe PMC, 2024-04-20)

Designing An Individualized EHR Learning Plan For Providers. [PMID: 30027541]

Lindsay A Stevens, Yumi T DiAngi, Jonathan D Schremp, Monet J Martorana, Roberta E Miller, Tzielan C Lee, Natalie M Pageler

Abstract

Electronic Health Records (EHRs) have been quickly implemented for meaningful use incentives; however these implementations have been associated with provider dissatisfaction and burnout. There are no previously reported instances of a comprehensive EHR educational program designed to engage providers and assist in improving efficiency and understanding of the EHR. Utilizing adult learning theory as a framework, Stanford Children's Health designed a tailored provider efficiency program with various inputs from: (1) provider specific EHR data; (2) provider survey data; and (3) structured observation sessions. This case report outlines the design of this individualized training program including team structure, resource requirements, and early provider response.
CITATION: Stevens LA, DiAngi YT, Schremp JD, Martorana MJ, Miller RE, Lee TC, Pageler NM. Designing An Individualized EHR Learning Plan. Appl Clin Inform 2017; 8:924-935 https://doi.org/10.4338/040054.

Appl Clin Inform. 2017:8(3) | 20 Citations (from Europe PMC, 2024-04-20)

URL:	https://doi.org/10
Full name:	Database of Hemiptera-Phytoplasma-Plant Biological Interactions
Description:	Database on associations between Hemiptera, phytoplasmas and plants. The database will facilitate synthesis of all available and relevant data on the observed associations between phytoplasmas and their insect and plant hosts. Also provide useful data to generate and test ecological and evolutionary hypotheses.
Year founded:	2017
Last update:
Version:
Accessibility:	Manual: Inaccessible
Country/Region:	United States

Data type:	Other
Data object:	Animal Bacteria Plant
Database category:	Interaction Literature
Major species:	Mgenia fuscovaria
Keywords:	plant disease epidemiology insect phytoplasma plant

University/Institution:	University of Illinois at Urbana-Champaign
Address:	Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, United States of America
City:
Province/State:
Country/Region:	United States
Contact name (PI/Team):	Valeria Trivellone
Contact email (PI/Helpdesk):	valeria.trivellone@gmail.com

Database Commons
a catalog of worldwide biological databases
