Gaps, lags and other shortfalls besetting the digital accessible knowledge of biodiversity

  1. Escribano Compains, Nora
Supervised by:
  1. Arturo Hugo Ariño Plana (Supervisor)
  2. David Galicia Paredes (Co-supervisor)

University of defense: Universidad de Navarra

Date of defense: 4 December 2018

Examination committee:
  1. Jesús Miguel Santamaría Ulecia (Chair)
  2. Ricardo Ibáñez Gastón (Secretary)
  3. Schigel Dmitry Sergèyevich (Member)
  4. Isabel Rey Fraile (Member)
  5. Francisco Pando de la Hoz (Member)
Department:
  1. (FC) Biología Ambiental

Type: Thesis

Teseo: 148175 DIALNET

Abstract

Rising awareness of the biodiversity crisis impressed a sense of urgency on the scientific community, and better knowledge of biodiversity was highlighted as one of the cornerstones for counteracting that crisis. Building on the development of the Biodiversity Informatics (BI) field, the Global Biodiversity Information Facility (GBIF) took up the task of enabling access to biodiversity data via the Internet. GBIF has now become the most extensive biological information exchange infrastructure in the world, providing access to Primary Biodiversity Records (PBR). As the number of accessible PBR increased, their quality and their fitness-for-use in studies they were not collected for were called into question. Much research has focused on the taxonomic and spatial biases of these data, but the potential biases that can arise from the aging of PBR had never been comparably addressed. In addition, any increase in data quality or enhancement in usability depends heavily on the capacity, will, time, and resources allocated by data publishers. An appropriate and well-established framework to acknowledge that effort has yet to be universally developed and accepted. This dissertation is framed in that context. Its general aim is to contribute to a better understanding of how temporal issues affect the fitness-for-use of the data accessible online, and to explore citation practices among data users, thereby contributing to the development of a data citation framework that can incentivize the publication of quality data and a better characterization of its usability. In Chapter I, I provide a brief insight into the history of GBIF and the key elements that have shaped the BI landscape. Chapter II contains the specific objectives of this dissertation. Chapter III documents the working process and the data quality controls necessary to publish Natural History Collections in GBIF.
In Chapter IV, I assess the spatial and temporal completeness of the Digital Accessible Knowledge (DAK) of wild terrestrial mammals in the Iberian Peninsula. This analysis reveals that the level of DAK is low, as well as spatially and temporally biased. Moreover, its usefulness is compromised by quality issues, mainly the lack of collection dates. Chapters V and VI focus specifically on the temporal dimension of PBR and its implications for the fitness-for-use of the data. First, I use small-mammal occurrences in Navarra (Chapter III.A) to explore how changes in land use compromise the usefulness of the information that PBR provide as the time gap between data collection and data use increases (Chapter V). Overlaying maps of land-use change in this territory on the occurrences of small mammals flags 75% of records as obsolete, because the land use at their location has changed. Then, I use the data from Chapter III.A to build small-mammal distribution models under two scenarios of time matching (Chapter VI). Synchronous models take temporal resolution into account by relating occurrences to environmental predictors averaged over the temporal range of the data. Lagged models, by contrast, relate past occurrences to predictors averaged over a recent period. Models perform equally well or poorly regardless of whether occurrences and predictors are temporally matched, highlighting the need for better evaluation metrics. Variable importance and response curves also differ between the two types of models for the same species. Next, in Chapter VII, I explore the different citation practices among users of GBIF-mediated data. Results show that the mainstream citation practice is an inline reference to the data repository, which potentially results in a lack of incentives for the publication of quality data. Finally, I discuss the results of this dissertation in Chapter VIII and outline the main conclusions in Chapter IX.