Flinders University
15 files

Dataset for a globally synthesised and flagged bee occurrence dataset and cleaning workflow

Version 7 2024-06-17, 10:57
Version 6 2024-06-17, 07:31
Version 5 2024-06-17, 05:57
Version 4 2024-03-05, 03:17
Version 3 2024-02-15, 04:42
Version 2 2023-11-29, 14:29
Version 1 2023-10-18, 05:32
posted on 2024-06-17, 10:57 authored by James Dorey, Erica E. Fischer, Paige R. Chesshire, Angela Nava-Bolaños, Robert L. O'Reilly, Silas Bossert, Shannon M. Collins, Elinor M. Lichtenberg, Erika M. Tucker, Allan Smith-Pardo, Armando Falcón-Brindis, Diego A. Guevara, Bruno Ribeiro, Diego de Pedro, Keng-Lou James Hung, Katherine A. Parys, Lindsie M. McCabe, Matthew S. Rogan, Robert L. Minckley, Santiago José Elías Velazco, Terry Griswold, Tracy A. Zarrillo, Walter Jetz, Yanina V. Sica, Michael Christopher Orr, Laura Melissa Guzman, John S. Ascher, Alice Hughes, Neil S. Cobb

Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats, “cleaned” and “flagged-but-uncleaned”. The BeeBDC package with online documentation provides end users the ability to modify filtering parameters to address their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
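The distinction between the “flagged-but-uncleaned” and “cleaned” formats can be illustrated with a minimal sketch. It assumes each record carries boolean flag columns where True means the check passed. The column names, the sample records, and the `filter_by_flags` helper are all hypothetical and written in Python for illustration only; they are not the BeeBDC API or the dataset's actual field names.

```python
# Hypothetical flagged records; the flag column names are illustrative only
# and are NOT the real column names used in the published dataset.
RECORDS = [
    {"id": "a1", "species": "Apis mellifera",
     "flag_coordinates_ok": True, "flag_date_ok": True},
    {"id": "a2", "species": "Apis mellifera",
     "flag_coordinates_ok": False, "flag_date_ok": True},
    {"id": "a3", "species": "Bombus terrestris",
     "flag_coordinates_ok": True, "flag_date_ok": False},
]


def filter_by_flags(records, flags_to_enforce):
    """Keep only records that pass every flag the user chooses to enforce.

    A record missing a requested flag is treated as failing it.
    """
    return [
        record
        for record in records
        if all(record.get(flag, False) for flag in flags_to_enforce)
    ]


# Enforcing every flag approximates a "cleaned" view of the data.
strict = filter_by_flags(RECORDS, ["flag_coordinates_ok", "flag_date_ok"])

# Enforcing only some flags keeps records that a stricter analysis would drop.
lenient = filter_by_flags(RECORDS, ["flag_coordinates_ok"])

print([record["id"] for record in strict])
print([record["id"] for record in lenient])
```

Keeping the flags alongside the records, rather than only shipping the filtered result, is what would let a user relax or tighten the filtering for a particular research question.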


Biodiversity Outreach Network

Collaborative Research: Digitization TCN: iDigBees Network, Towards Complete Digitization of US Bee Collections to Promote Ecological and Evolutionary Research in a Keystone Clade

Directorate for Biological Sciences





Access Rights

Users must cite the associated primary data, our Scientific Data publication, and/or the BeeBDC R package where relevant.

Paper (bee data, workflow, and code): Dorey, J. B., Fischer, E. E., Chesshire, P. R., Nava-Bolaños, A., O’Reilly, R. L., Bossert, S., . . . Cobb, N. S. (2023). A globally synthesised and flagged bee occurrence dataset and cleaning workflow. Scientific Data.

Package (workflow and code): Dorey, J. B., O’Reilly, R. L., Bossert, S., & Fischer, E. E. (2023). BeeBDC: an occurrence data cleaning package (R package version 1.0.1). Retrieved from https://jbdorey.github.io/BeeBDC/index.html

Bee taxonomy and checklist: Ascher, J. S., & Pickering, J. (2020). Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). http://www.discoverlife.org/mp/20q?guide=Apoidea_species

Data sources: See paper, R package vignette, and/or the data provided here.