Bibliographic dataset characterizing studies that use online biodiversity databases
- Ball-Damerow, Joan E. 1
- Brenskelle, Laura 2
- Barve, Narayani 2
- LaFrance, Raphael 2
- Soltis, Pamela S. 2
- Sierwald, Petra 1
- Bieler, Rüdiger 1
- Ariño, Arturo 3
- Guralnick, Robert 1
- 1 Field Museum of Natural History
- 2 Florida Museum of Natural History, University of Florida, Gainesville
- 3 Department of Environmental Biology, Universidad de Navarra
Publisher: Zenodo
Publication year: 2019
Type: Dataset
Abstract
This dataset includes bibliographic information for 501 papers that were published from 2010 to April 2017 (the time of the search) and that use online biodiversity databases for research purposes. Our overarching goal in this study is to determine how research uses of biodiversity data developed during a time of unprecedented growth of online data resources. We also determine which uses have the highest number of citations, how online occurrence data are linked to other data types, and whether and how data quality is addressed. Specifically, we address the following questions:
1. What primary biodiversity databases have been cited in published research, and which databases have been cited most often?
2. Is the biodiversity research community citing databases appropriately, and are the cited databases currently accessible online?
3. What are the most common uses, general taxa addressed, and data linkages, and how have they changed over time?
4. What uses have the highest impact, as measured through the mean number of citations per year?
5. Are certain uses applied more often for plants, invertebrates, or vertebrates?
6. Are links to specific data types associated more often with particular uses?
7. How often are major data quality issues addressed?
8. What data quality issues tend to be addressed for the top uses?
Relevant papers for this analysis include those that use online and openly accessible primary occurrence records, or those that add data to an online database. Google Scholar (GS) provides full-text indexing, which was important for identifying data sources that often appear buried in the methods section of a paper. Our search was therefore restricted to GS. All authors discussed and agreed upon representative search terms, which were relatively broad in order to capture a variety of databases hosting primary occurrence records.
The terms included: “species occurrence” database (8,800 results), “natural history collection” database (634 results), herbarium database (16,500 results), “biodiversity database” (3,350 results), “primary biodiversity data” database (483 results), “museum collection” database (4,480 results), “digital accessible information” database (10 results), and “digital accessible knowledge” database (52 results). Quotation marks are part of the search term wherever a specific phrase had to be matched in whole. We downloaded all records returned by each search (or the first 500 if there were more) into a Zotero reference management database. About one third of the 2,500 papers in the final dataset were relevant. Three of the authors with specialized knowledge of the field characterized relevant papers using a standardized tagging protocol based on a series of key topics of interest. We developed a list of potential tags and descriptions for each topic, including: database(s) used, database accessibility, scale of study, region of study, taxa addressed, research use of data, other data types linked to species occurrence data, data quality issues addressed, authors, institutions, and funding sources. Each tagged paper was thoroughly checked by a second tagger. The final dataset of tagged papers allows us to quantify general areas of research made possible by the expansion of online species occurrence databases, as well as trends over time. Analyses of these data will be published in a separate quantitative review.
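As a rough check on the download rule described above (all records per search, or the first 500 if there were more), the following Python sketch applies that cap to the result counts reported for each search term. The names `RESULT_COUNTS`, `DOWNLOAD_CAP`, and `capped_downloads` are illustrative only and are not part of the dataset. The sum is an upper bound on records downloaded before any de-duplication or screening; the abstract does not state how that total relates to the roughly 2,500 papers it mentions.

```python
# Result counts reported in the abstract for each Google Scholar search term.
RESULT_COUNTS = {
    '"species occurrence" database': 8800,
    '"natural history collection" database': 634,
    'herbarium database': 16500,
    '"biodiversity database"': 3350,
    '"primary biodiversity data" database': 483,
    '"museum collection" database': 4480,
    '"digital accessible information" database': 10,
    '"digital accessible knowledge" database': 52,
}

# The abstract states that at most the first 500 records per search were kept.
DOWNLOAD_CAP = 500


def capped_downloads(counts, cap=DOWNLOAD_CAP):
    """Return the number of records kept per term under the download cap."""
    return {term: min(n, cap) for term, n in counts.items()}


downloads = capped_downloads(RESULT_COUNTS)
total = sum(downloads.values())

for term, kept in downloads.items():
    print(f"{term}: {kept}")
print(f"Upper bound on downloaded records: {total}")
```

Five of the eight searches exceed the cap, so the bound is 5 × 500 plus the three smaller result sets (483, 10, and 52).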