Proposed BoF Session at the RDA Gothenburg Meeting:
Reliably and efficiently citing entire datasets, or subsets of them, is a significant challenge across a range of research domains when the data is large and dynamically growing or changing. Several approaches for assigning PIDs to support data citation at different levels in the process have been proposed. These range from PIDs assigned to individual data elements, via metadata-based approaches, to PIDs assigned to queries executed on time-stamped and versioned databases.
This BoF session aims to bring together a small group of experts to discuss the issues, requirements, advantages and shortcomings of existing approaches for efficiently citing entire or arbitrary subsets of large and dynamically changing datasets. We will look at different types of data and database management systems, ranging from structured data and SQL-based DBMS to semi-structured and graph-based databases. The goal is to ensure that subsets of data can be uniquely identified while data is being added, deleted or otherwise modified in a database, over long periods of time, and even when data is migrated from one DBMS to another. We want to discuss the existing approaches to this challenge, evaluate their advantages and shortcomings, identify obstacles to their deployment in different settings, and work towards potential recommendations on which approaches suit which conditions. Among other things, the outcomes should form a solid basis for citing data and for linking to it from publications in an actionable manner. However, the discussion will NOT focus on defining suitable PID systems and protocols, or data structures for linking publications to data, but on the best way to uniquely and persistently identify subsets of data in a dynamically changing setting.
Following an evaluation of the field, the barriers identified and the potential solutions, one possible outcome of the session would be the establishment of a Working Group within RDA to tackle and resolve this issue.
Andreas Rauber, Vienna University of Technology & SBA
Reagan Moore, UNC Chapel Hill
Dieter van Uytvanck, MPI
Hans Pfeiffenberger, Alfred Wegener Institute for Polar and Marine Research
Daan Broeder, MPI
Peter Wittenburg, Max Planck Institute
Martina Stockhause, World Data Centre for Climate (WDCC)
Jan Brase, Technische Informationsbibliothek (TIB), German National Library of Science and Technology
Natalia Manola, National & Kapodistrian University of Athens
Jo McEntyre, EMBL - European Bioinformatics Institute
Stefan Proell, Secure Business Austria
Paul Uhlir, The National Academies
A number of other people have expressed interest in addressing this subject. As soon as the BoF session has been approved, they will be contacted and asked to submit 1-2 page position papers on core challenges and/or potential solutions.
Note: A summary of the position statements for this BoF is available at download/file.php?id=90