S10 - Digital Collection Data: storage, archive and disaster recovery

Session Type: Symposium
Full Title: S10 - Digital Collection Data: storage, archive and disaster recovery
Short Title: Collection Data Security
Organizer(s): Jeff Gerbracht, Cornell University - Lab of Ornithology
Contributors: Rob Guralnick; Steve Kelling


Unsolicited contributions considered? Yes

Abstract

A wide variety of institutions, projects and universities collect and store specimens and data related to biodiversity. Museum collections house millions of physical specimens, each with its own set of digital data managed in collection management systems such as Specify and EMu. Multimedia libraries such as the Macaulay Library, the British Library and the BBC house photos, video and audio recordings, which are now almost exclusively digital; these organizations must manage both the metadata and the digital specimen itself. Other projects such as GBIF, iNaturalist, eBird and iDigBio gather, archive and distribute observations of the natural world. Data collections are at risk of loss every day from a wide variety of causes, ranging from natural disasters to hardware failures to simple human error. Ensuring that these data are maintained in a way that nearly guarantees both the long-term survival and the lossless archiving of biodiversity datasets is a core responsibility of our community and must be a prime consideration for any organization managing biodiversity data. Traditionally, lossless archiving of digital data has been managed with local hardware. With the advent of cloud-based solutions, storage, archives and disaster recovery are more frequently managed as part of the cloud solution. Whether an archive is managed with on-premises hardware or in the cloud, the challenges and solutions are often the same. However, many existing systems do not include processes to ensure the long-term survival and integrity of digital data. In this symposium, we will review current best practices in managing digital data, including presentations from several organizations that manage large datasets. Presentations will focus on how these organizations manage their data to ensure the long-term existence and integrity of their unique datasets.
Discussions will cover new infrastructure options and technologies, as well as the financial, technical and sociological challenges and gaps that remain.