On the Long Tails of Specimen Data

Ariño, Arturo H.

doi:10.3897/BISS.7.112151

1 Universidad de Navarra, Pamplona, Spain (ROR: https://ror.org/02rxc7m23)
2 Biodiversity Data Analytics and Environmental Quality Group (BEQ), Pamplona, Spain

Journal: Biodiversity Information Science and Standards
ISSN: 2535-0897
Year of publication: 2023
Volume: 7
Pages: e112151
Type: Article

Abstract

A recent article by K.R. Johnson and I.F.P. Owens in Science (Johnson and Owens 2023) suggested that the 73 main natural history museums around the world collectively hold over 1 billion records of accessioned "specimens" (taken as collection units). This figure is remarkably close to, although obtained through a completely different method from, research published more than a decade earlier by A.H. Ariño in Biodiversity Informatics (Ariño 2010). Both approaches benefited from information available at the Global Biodiversity Information Facility (GBIF), which in the intervening years has grown by an order of magnitude, although mostly through observation-based occurrences rather than through the accretion of specimen records from collections. Comparing the estimated size of collections with the amount of digital data available from them still reveals a huge gap, as it did then. Digitization has progressed, but it remains far from the goal of bringing information about all specimens into the digital domain.

While larger institutions doubtless have greater overall resources than smaller ones for making their data available, how do they compare in terms of data mobilization and sharing? Not surprisingly, the distribution of collection sizes shows a long tail of small institutions that are nonetheless also embarking on digitization. Will this long tail of science actually manage to make all of its biodiversity data available sooner than the larger institutions? It is increasingly recognized that data usability depends on data being findable, accessible, interoperable and reusable (FAIR; Wilkinson et al. 2016). What could be the consequences of a data-availability bias in which many tiny collections are available for ready use, but only a much smaller (although surely very significant) fraction of larger collections of a comparable type?

This presentation compares the distribution of potential versus readily available data in 2010 and in 2023, examines what trends may exist in the race toward universal availability of specimen data, and considers whether digitization efforts could be better targeted to achieve greater overall scientific benefit.

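References

Ariño, A.H. 2010. Biodiversity Informatics. doi:10.17161/bi.v7i2.3991
Johnson, K.R. and Owens, I.F.P. 2023. Science. doi:10.1126/science.adf6434
Wilkinson et al. 2016. Scientific Data. doi:10.1038/sdata.2016.18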
