BoF Session on Data Citation

Where we discuss anything related to the March plenary, such as requests for f2f meetings, BoF sessions, the agenda, etc.

Moderators: Leif.Laaksonen, SaraPittonetGaiarin

Re: BoF Session on Data Citation

Post by sproell » Fri Mar 15, 2013 11:10 am

Dear colleagues,
For our convenience, I merged the position statements I have received so far into one document. I circulated it via email, and you can download it here as well.

BoF-Session-DataCitation.pdf

(last update: 16.03.2013, 13:25h)
Last edited by sproell on Sat Mar 16, 2013 2:11 pm, edited 1 time in total.
sproell
 
Posts: 5
Joined: Tue Feb 12, 2013 12:03 pm
Location: Vienna, Austria

Re: BoF Session on Data Citation

Post by Gary » Sat Mar 16, 2013 12:17 pm

It seems to me that many of the issues in Natalia Manola's Position Statement, such as citation granularity, will also be important topics for the Data Foundation and Terminology WG.

For example, getting the terminology right to describe collections that are databases and/or data files.
Gary
 
Posts: 27
Joined: Thu Dec 06, 2012 7:45 pm

Re: BoF Session on Data Citation

Post by rauber » Mon Mar 18, 2013 3:59 pm

Gary,

I agree. The "meeting point" will most likely be the fact that in this WG we will focus predominantly on how to *identify* a subset of data, whereas the terminology WG will likely focus on how to *describe* this subset.
Identify, in this context, basically means being able to arrive at an identical data set (sequence) at any given point in time, on a machine-actionable level. For proper data citation, you obviously need both identification and description. Let's see how focused or broad we want this WG to grow, and then we will definitely need to ensure proper communication across WGs, probably by forming WG clusters...
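
As an illustration only: one way to read "arrive at an identical data set (sequence) at any given point in time" is to keep timestamped versions of each record, re-run the selection as of the citation time, and compare a hash of the ordered result. This is a minimal sketch of that idea and not something agreed in this thread; the names `Record`, `subset_as_of` and `fingerprint`, and the integer timestamps, are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    key: str
    value: float
    valid_from: int            # hypothetical logical timestamp of this version
    valid_to: Optional[int]    # None while this version is still current


def subset_as_of(records: Iterable[Record],
                 predicate: Callable[[Record], bool],
                 timestamp: int) -> List[Record]:
    """Return the ordered subset that matched `predicate` at `timestamp`."""
    visible = [
        r for r in records
        if r.valid_from <= timestamp
        and (r.valid_to is None or timestamp < r.valid_to)
    ]
    # A fixed sort order makes the result a reproducible sequence.
    return sorted((r for r in visible if predicate(r)), key=lambda r: r.key)


def fingerprint(subset: Iterable[Record]) -> str:
    """Hash the ordered (key, value) sequence so a re-execution can be verified."""
    digest = hashlib.sha256()
    for r in subset:
        digest.update(f"{r.key}={r.value!r}\n".encode("utf-8"))
    return digest.hexdigest()


def small(r: Record) -> bool:
    """Example selection criterion: values below 5."""
    return r.value < 5.0


records = [
    Record("a", 1.0, valid_from=1, valid_to=None),
    Record("b", 2.0, valid_from=1, valid_to=5),     # superseded at time 5
    Record("b", 2.5, valid_from=5, valid_to=None),
    Record("c", 9.0, valid_from=7, valid_to=None),  # added at time 7
]

# "Cite" the subset at time 3; the hash is stored alongside the citation.
cited = subset_as_of(records, small, timestamp=3)
cited_hash = fingerprint(cited)
```

Re-running `subset_as_of(records, small, timestamp=3)` later yields the same sequence and hash even though record "b" has changed since, whereas running it as of a later time yields a different hash.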

Andi
rauber
 
Posts: 12
Joined: Mon Jan 14, 2013 2:11 pm

Re: BoF Session on Data Citation - Position Statement WDCC

Post by jprombouts » Mon Mar 18, 2013 4:35 pm

The 3TU.Datacentrum (the research data repository of the three Dutch universities of technology) has been doing some work on this and is running into the same issues as WDCC.

We use several types of objects: collections, datasets, places, periods, instruments, studies.
Every object has metadata and some objects have data. Data is stored in datastreams.
A dataset can have more than one datastream and it can contain other datasets.
It can be measured by an instrument, which is located in a place, and the data is measured in a certain period.
Datasets and instruments can be members of a collection.
(See attached ppt for a picture).
We assign DOIs to datasets, collections and studies that have the metadata required by DataCite.

Studies are the solution for user-defined collections and are occasionally used for linking subsets from a large collection to PhD theses.
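
For readers without access to the attached picture, the object types described above could be sketched roughly as follows. This is an illustrative assumption, not the actual 3TU.Datacentrum schema: all class and field names, the example objects, and the DOI placeholder string are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Place:
    name: str


@dataclass
class Period:
    start: str  # date strings, chosen here only for illustration
    end: str


@dataclass
class Instrument:
    name: str
    place: Optional[Place] = None  # an instrument is located in a place


@dataclass
class Dataset:
    title: str
    datastreams: List[str] = field(default_factory=list)   # zero or more datastreams
    parts: List["Dataset"] = field(default_factory=list)   # a dataset can contain datasets
    instrument: Optional[Instrument] = None                 # measured by an instrument
    period: Optional[Period] = None                         # measured in a certain period
    doi: Optional[str] = None

    def all_datastreams(self) -> List[str]:
        """Collect the datastreams of this dataset and of all nested datasets."""
        streams = list(self.datastreams)
        for part in self.parts:
            streams.extend(part.all_datastreams())
        return streams


@dataclass
class Collection:
    title: str
    members: List[Union[Dataset, Instrument]] = field(default_factory=list)
    doi: Optional[str] = None


@dataclass
class Study:
    """A user-defined grouping, e.g. the subsets used in one thesis."""
    title: str
    datasets: List[Dataset] = field(default_factory=list)
    doi: Optional[str] = None


gauge = Instrument("rain gauge", place=Place("test site"))
january = Dataset("January", ["jan.nc"], instrument=gauge,
                  period=Period("2012-01-01", "2012-01-31"))
february = Dataset("February", ["feb.nc"], instrument=gauge,
                   period=Period("2012-02-01", "2012-02-29"))
year = Dataset("Rainfall 2012", ["summary.csv"], parts=[january, february])
collection = Collection("Rainfall measurements", members=[year, gauge])
study = Study("Thesis subset", datasets=[january], doi="doi:PLACEHOLDER")
```

In this sketch a study simply points at the datasets it uses, so it can carry its own identifier without copying any data.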

Martina wrote:
I. Short- to Medium-term Goals (for static data)

1. Enable the citation of subsets of a DOI data collection:
Subsets are typically collections selected in respect of e.g. variable, temporal coverage, spatial coverage, frequency, ensemble member and/or other parameters.

2. Enable the citation of subsets including parameters of multiple DOIs:
WDCC plans to provide additional PIDs for selected parameters, which are often analyzed and cited together, e.g. like single parameters for all model runs for a single CMIP5 experiment.


II. Long-term Goals

1. Enable the citation of user-defined (customized) data collections:
Users can create their own collections in order to cite precisely the used files of one or multiple DOI data collections or to make their data products citable. Thus a reference list would include the original data DOI and more specific PIDs.
For this, WDCC could set up an end-user web GUI where any user can stitch together custom data collections, which will receive their own PIDs.

2. Citation of data during the project phase (dynamic data):
DKRZ aims to support its users during the data creation / project collaboration phase as well. That requires version control and tracking of data update/changes. Version control involves the problem of distinguishing technical data versions (checksum changes) needed for data identification from scientific data versions (data content changes) for use in data citations. For the citation of data only relevant scientific data versions are appropriate, e.g. major data versions.


I.1 We achieve this by assigning DOIs to datasets in datasets or collections, e.g. http://dx.doi.org/10.4121/uuid:839995ea ... 173883d238
I.2 We achieve this by assigning DOIs to aggregates, e.g. http://dx.doi.org/10.4121/uuid:5f3bcaa2 ... ec928cae6d
II.1 We achieve this by assigning DOIs to studies, e.g. http://dx.doi.org/10.4121/uuid:57acdc8d ... 075cc30b0a
We do not have an online end-user GUI yet.
II.2 Versioning, and especially referring to the latest version, is an unsolved issue for us.

Also, we have only dealt with longitudinal sets so far, not with changing databases.
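
To make the versioning point under II.2 concrete, here is a minimal sketch of the distinction between technical versions (checksum changes) and scientific versions (content changes). It assumes, purely for illustration, that the data is CSV and that "content" means the cell values after trimming whitespace; what counts as a scientific change is a policy decision and is not settled here. The function names are hypothetical.

```python
import csv
import hashlib
import io


def technical_fingerprint(raw: bytes) -> str:
    """Checksum of the exact bytes: changes with any re-encoding of the file."""
    return hashlib.sha256(raw).hexdigest()


def content_fingerprint(raw: bytes) -> str:
    """Checksum of the parsed cell values, ignoring line endings and padding."""
    text = raw.decode("utf-8")
    digest = hashlib.sha256()
    for row in csv.reader(io.StringIO(text, newline="")):
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip empty rows
            digest.update(("\x1f".join(cells) + "\n").encode("utf-8"))
    return digest.hexdigest()


original = b"time,temp\n1,20.5\n2,21.0\n"
reencoded = b"time, temp\r\n1, 20.5\r\n2, 21.0\r\n"  # same values, different bytes
corrected = b"time,temp\n1,20.5\n2,21.3\n"           # one value changed
```

Here `reencoded` gets a new technical fingerprint but keeps the content fingerprint of `original`, while `corrected` changes both; only the latter kind of change would be a candidate for a new citable version.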
jprombouts
 
Posts: 1
Joined: Mon Mar 18, 2013 3:02 pm

Re: BoF Session on Data Citation - Position Statement WDCC

Post by Martina » Tue Mar 19, 2013 9:22 am

Thanks, we did not know that!

I have two questions and one comment.

Questions:
- How do you combine your RDF with the relations that the DataCite schema offers?
If I understood correctly, you store relations such as isPartOf locally and not in the DataCite metadata.
- Which standards do you use for the metadata, e.g. SensorML or ISO 19139? That is, can you exchange this low-level information, e.g. on instruments and on relations between the datasets, with other repositories holding similar data?

Comment:
Our DOI process includes thorough quality assurance, which means that we regard DOI data as high-quality data. We want to offer a more flexible option with lower quality requirements for identifying and citing collections of data by using PIDs. Thus, we would use a DOI for one collection and add PIDs for additional collections: subsets, or collections spanning different DOIs. We would like to use PIDs mainly to identify the exact data used in a publication, and the DOI as the citation and for use in scientists' publication lists.


Martina
 
Posts: 2
Joined: Mon Mar 11, 2013 12:04 pm

Re: BoF Session on Data Citation

Post by rauber » Tue Mar 19, 2013 5:53 pm

Dear all,

Thank you very much for contributing to a very dense, but focused and (I feel) extremely productive BoF session.
Attached are the slides that we used during the meeting: 130319_rda_bof_datacitation.pdf

The minutes will follow shortly, as will the initiation of the next steps agreed during today's meeting.

Best regards, Andi

Minutes BoF Session on Data Citation

Post by sproell » Fri Apr 05, 2013 8:34 am

Here you can download the minutes of the BoF-Session. Comments are very welcome!

Re: BoF Session on Data Citation

Post by rauber » Wed May 08, 2013 4:45 pm

Dear colleagues,

Following the BoF session in Gothenburg and subsequent follow-up discussions, we have now prepared a first draft of the case statement for the planned Working Group on Data Citation: Making Data Citable. It has been posted in a corresponding thread in the Case Statement section of the forum at
http://forum.rd-alliance.org/viewtopic.php?f=3&t=87

We are looking forward to an active discussion and to feedback that will help us prepare a solid case statement for submission to and approval by the RDA, as well as for the nomination of pilot projects. With the creation of the new thread on the case statement, this discussion thread is now basically closed - the discussion continues in the new thread at
http://forum.rd-alliance.org/viewtopic.php?f=3&t=87.

Best regards, Andreas
