Re: BoF Session on Data Citation

Posted: Fri Mar 15, 2013 11:10 am
by sproell
Dear colleagues,
For our convenience, I have merged the position statements I have received so far into a single document. I circulated it via email, and you can download it here as well.

BoF-Session-DataCitation.pdf

(last update: 16.03.2012, 13:25h)

Re: BoF Session on Data Citation

Posted: Sat Mar 16, 2013 12:17 pm
by Gary
It seems to me that many of the issues in Natalia Manola's Position Statement, such as citation granularity, will also be important topics for the Data Foundation and Terminology WG.

For example, getting the terminology right to describe collections that are databases and/or data files.

Re: BoF Session on Data Citation

Posted: Mon Mar 18, 2013 3:59 pm
by rauber
Gary,

I agree. The "meeting point" will most likely be the fact that in this WG we will focus predominantly on how to *identify* a subset of data, whereas the Terminology WG will likely focus on how to *describe* this subset.
Identify, in this context, basically means being able to arrive at an identical data set (sequence) at any given point in time, on a machine-actionable level. For proper data citation, you obviously need both identification and description. Let's see how focused or broad we want this WG to grow, and then we will definitely need to ensure proper communication across WGs, probably by forming WG clusters...
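
As a rough illustration of "arriving at an identical data set (sequence) at any given point in time", here is a minimal Python sketch. It is only one possible approach and is not something proposed in this thread: it assumes the data sit in a table whose rows carry validity timestamps, so that a subset can be re-executed "as of" the time it was cited, and it stores a hash of the ordered result for later verification. The table `measurements`, its columns, the sample values and the helper names `subset_as_of` and `fingerprint` are all hypothetical.

```python
import hashlib
import sqlite3

def subset_as_of(conn, station, as_of):
    """Return the subset for `station` as it existed at time `as_of`, in a fixed order."""
    return conn.execute(
        "SELECT id, station, value FROM measurements "
        "WHERE station = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY id",  # a fixed sort order makes the result a reproducible sequence
        (station, as_of, as_of),
    ).fetchall()

def fingerprint(rows):
    """Hash the ordered row sequence so a later re-execution can be verified."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()

# Hypothetical example data: row 2 was corrected on 2013-03-01.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "id INTEGER, station TEXT, value REAL, valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", 10.0, "2013-01-01", None),
        (2, "A", 11.5, "2013-01-01", "2013-03-01"),  # superseded version
        (2, "A", 11.7, "2013-03-01", None),          # corrected version
        (3, "B", 9.2, "2013-01-01", None),
    ],
)

# What would be recorded at citation time: the query parameters, the timestamp, the hash.
cited = subset_as_of(conn, "A", "2013-02-01")
cited_fp = fingerprint(cited)
```

Re-running `subset_as_of(conn, "A", "2013-02-01")` after the correction still yields the originally cited rows, while a query as of a later date returns the corrected value. Describing what the subset means would remain a separate task.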

Andi

Re: BoF Session on Data Citation - Position Statement WDCC

Posted: Mon Mar 18, 2013 4:35 pm
by jprombouts
The 3TU.Datacentrum (the research data repository of the three Dutch universities of technology) has been doing some work on this and is encountering the same issues as WDCC.

We use several types of objects: collections, datasets, places, periods, instruments, studies.
Every object has metadata and some objects have data. Data is stored in datastreams.
A dataset can have more than one datastream and it can contain other datasets.
It can be measured by an instrument that is located in a place, and the data is measured in a certain period.
Datasets and instruments can be members of a collection.
(See attached ppt for a picture).
We assign DOIs to datasets, collections and studies that have the metadata required by DataCite.

Studies are the solution for user-defined collections and are occasionally used for linking subsets from a large collection to PhD theses.
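
The object types described above can be sketched as plain Python dataclasses. This is only an illustrative reading of the description, not the actual 3TU.Datacentrum schema: all class and field names, the helper `all_datastreams` and the example values are assumptions, and the per-object metadata is reduced to a title or name for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Place:
    name: str

@dataclass
class Period:
    start: str
    end: str

@dataclass
class Instrument:
    name: str
    place: Optional[Place] = None  # an instrument is located in a place

@dataclass
class Dataset:
    title: str
    datastreams: List[str] = field(default_factory=list)  # data is stored in datastreams
    parts: List["Dataset"] = field(default_factory=list)  # a dataset can contain datasets
    instrument: Optional[Instrument] = None                # measured by an instrument
    period: Optional[Period] = None                        # measured in a certain period
    doi: Optional[str] = None

@dataclass
class Collection:
    title: str
    members: List[object] = field(default_factory=list)   # datasets and instruments
    doi: Optional[str] = None

@dataclass
class Study:
    title: str
    datasets: List[Dataset] = field(default_factory=list)  # user-defined selection
    doi: Optional[str] = None

def all_datastreams(ds: Dataset) -> List[str]:
    """Collect the datastreams of a dataset and of all datasets nested inside it."""
    streams = list(ds.datastreams)
    for part in ds.parts:
        streams.extend(all_datastreams(part))
    return streams

# Hypothetical example objects; the DOI is a placeholder string, not a registered DOI.
gauge = Instrument("instrument-1", Place("place-1"))
year = Dataset("year-2012", datastreams=["2012.nc"], instrument=gauge,
               period=Period("2012-01-01", "2012-12-31"))
series = Dataset("series", datastreams=["readme.txt"], parts=[year])
collection = Collection("collection-1", members=[series, gauge])
study = Study("thesis-subset", datasets=[year], doi="PLACEHOLDER-DOI")
```

In this sketch a study simply points at existing datasets, which mirrors the idea of linking a subset of a large collection to a thesis without copying the data.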

Martina wrote:
I. Short- to Medium-term Goals (for static data)

1. Enable the citation of subsets of a DOI data collection:
Subsets are typically collections selected in respect of e.g. variable, temporal coverage, spatial coverage, frequency, ensemble member and/or other parameters.

2. Enable the citation of subsets including parameters of multiple DOIs:
WDCC plans to provide additional PIDs for selected parameters, which are often analyzed and cited together, e.g. like single parameters for all model runs for a single CMIP5 experiment.


II. Long-term Goals

1. Enable the citation of user-defined (customized) data collections:
Users can create their own collections in order to cite precisely the used files of one or multiple DOI data collections or to make their data products citable. Thus a reference list would include the original data DOI and more specific PIDs.
For this, WDCC could set up an end-user web GUI where any user can stitch together custom data collections, which will receive their own PIDs.

2. Citation of data during the project phase (dynamic data):
DKRZ aims to support its users during the data creation / project collaboration phase as well. That requires version control and tracking of data update/changes. Version control involves the problem of distinguishing technical data versions (checksum changes) needed for data identification from scientific data versions (data content changes) for use in data citations. For the citation of data only relevant scientific data versions are appropriate, e.g. major data versions.
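
The distinction between technical and scientific data versions can be illustrated with a small Python sketch. It is not DKRZ's or WDCC's procedure: it assumes, purely for illustration, that the data are small CSV files, that a technical version is detected by a checksum over the raw bytes, and that a scientific version is detected by a checksum over the normalised values. The function names and the sample data are hypothetical.

```python
import hashlib

def technical_checksum(raw: bytes) -> str:
    """Checksum over the raw bytes: changes with any re-encoding of the file."""
    return hashlib.sha256(raw).hexdigest()

def content_checksum(raw: bytes) -> str:
    """Checksum over normalised cell values: ignores line endings and padding."""
    rows = []
    for line in raw.decode("utf-8").splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        if any(cells):  # skip blank lines
            rows.append(",".join(cells))
    return hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()

def classify_change(old: bytes, new: bytes) -> str:
    """Classify an update as 'unchanged', 'technical' or 'scientific'."""
    if technical_checksum(old) == technical_checksum(new):
        return "unchanged"
    if content_checksum(old) == content_checksum(new):
        return "technical"   # bytes differ, values do not
    return "scientific"      # the data content itself changed

# Made-up sample files.
v1 = b"time,temp\n0,271.3\n1,271.9\n"
v2 = b"time, temp\r\n0, 271.3\r\n1, 271.9\r\n"  # re-exported, same values
v3 = b"time,temp\n0,271.3\n1,272.4\n"           # one value corrected
```

Under these assumptions, only the change from `v1` to `v3` would be a candidate for a new citable version, while the change from `v1` to `v2` would matter only for identifying the exact file.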


I.1 We achieve this by assigning DOIs to datasets within datasets or collections, e.g. http://dx.doi.org/10.4121/uuid:839995ea ... 173883d238
I.2 We achieve this by assigning DOIs to aggregates, e.g. http://dx.doi.org/10.4121/uuid:5f3bcaa2 ... ec928cae6d
II.1 We achieve this by assigning DOIs to studies, e.g. http://dx.doi.org/10.4121/uuid:57acdc8d ... 075cc30b0a
We do not have an online end-user GUI yet.
II.2 Versioning, and especially referring to the latest version, is an unsolved issue for us.

Also, we have only dealt with longitudinal sets so far, not with changing databases.

Re: BoF Session on Data Citation - Position Statement WDCC

Posted: Tue Mar 19, 2013 9:22 am
by Martina
Thanks, we did not know that!

I have two questions and one comment.

Questions:
- How do you combine your RDF with the relation types available in the DataCite schema?
If I understood it correctly, you store relations such as isPartOf locally and not in the DataCite metadata.
- Which standards do you use for the metadata, e.g. SensorML or ISO 19139? I mean, can you exchange this low-level information, e.g. on instruments and on relations between the datasets, with other repositories holding similar data?

Comment:
Our DOI process includes thorough quality assurance, which means that we regard DOI data as high-quality data. We want to offer a more flexible option with a lower quality threshold for identifying and citing collections of data by using PIDs. Thus, we would use a DOI for one collection and add PIDs for additional collections, i.e. subsets or collections spanning different DOIs. We would like to use PIDs more for identifying the exact data used in a publication, and the DOI as the citation and for use in scientists' publication lists.


Re: BoF Session on Data Citation

Posted: Tue Mar 19, 2013 5:53 pm
by rauber
Dear all,

Thank you very much for contributing to a very dense, but focused and (I feel) extremely productive BoF session.
Attached are the slides that we used during the meeting:
130319_rda_bof_datacitation.pdf

The minutes will follow shortly, as will the initiation of the next steps agreed during today's meeting.

Best regards, Andi

Minutes BoF Session on Data Citation

Posted: Fri Apr 05, 2013 8:34 am
by sproell
Here you can download the minutes of the BoF-Session. Comments are very welcome!

Re: BoF Session on Data Citation

Posted: Wed May 08, 2013 4:45 pm
by rauber
Dear colleagues,

Following the BoF session in Gothenburg and the subsequent follow-up discussions, we have now prepared a first draft of the case statement for the planned Working Group on Data Citation: Making Data Citable. It has been posted in a corresponding thread in the Case Statement section of the forum at
http://forum.rd-alliance.org/viewtopic.php?f=3&t=87

We look forward to an active discussion and to feedback that helps us prepare a solid case statement for submission to and approval by the RDA, as well as for the nomination of pilot projects. With the creation of the new thread on the case statement, this discussion thread is now essentially closed; the discussion continues in the new thread at
http://forum.rd-alliance.org/viewtopic.php?f=3&t=87.

Best regards, Andreas