Case Statement: PID Information Types

Post by tobiasweigel » Fri Jan 11, 2013 3:07 pm

Dear all,

we'd like to officially submit the case statement (attached) for the WG "PID Information Types".
Please provide your comments!

-edit- 11 Feb 2013: New version (revision 3) of the CS with minor changes to reflect comments posted here and update on the member list. Thank you for your feedback!
-edit- 15 Feb 2013: Final version (revision 5).

Best, Tobias & Tim
Tobias Weigel, DKRZ
Re: Case Statement: PID Information Types

Post by DFFlanders » Tue Jan 15, 2013 7:33 am

Looks like a tough nut to crack but good on this group for taking it on. However, a bit more clarification might be nice in terms of the practicalities so others might be interested and/or want to join the group? PIDs are naturally hard to define (and boring for the rest of the world) so extra effort needs to be made to market the value of this group, otherwise our lessons learned will fall on deaf ears despite the importance of object 'types'.

1.) With the reference to 'PID' in general, it would seem that the body of data being investigated by the WG will be the likes of DOI based data objects and/or ISSN based data objects (NB sympathy for the dataset/object terminology problems). Does this mean that some crawling (of say) the DOI/Datacite corpus will be the quantitative dataset by which the PID types will be derived, e.g. if you find a bunch of excel spreadsheets listing interview questions then that is the type we'll try and programmatically define as a type alongside what metadata should define it?

2.) How much will this group take into account the likes of the global format registry works such as FDDR, Pronom and other mime type based registries? Are these in scope to collaborate with, thereby building on their already significant 'types'? How does this compare with a simple approach such as BagIt or SWORD, and how might this integrate with a SWORD4Data and other transport protocols that already exist and would need to be expanded to include typing... the natural implications for having URIs for each new type looming :)

3.) How will the group address the 'polymorphism' problem that we continue to run into and get sucked into never-ending debates about <-- this is mostly a request that a public statement is made about 'failing fast' so that this WG doesn't get tied down in arguments. Saying that, is data more prone to polymorphism than resources? <-- See, I can't even resist the philosophical musings!

Hope these comments are helpful and good luck in your endeavours.

Kind Regards,

David F. Flanders
Senior Analyst
Re: Case Statement: PID Information Types

Post by Gary » Thu Jan 17, 2013 9:50 pm

Has the WG considered what extant work it will leverage on assigning identifiers and object identification? It would be nice to cite existing publications on this.

One such might be "Distinguishing Provenance Equivalence of Earth Science Data" by Tilmes, Yesha, & Halem. ... 0911001153

Gary Berg-Cross
SOCoP and Foundations and Data Terminology WG
Re: Case Statement: PID Information Types

Post by tobiasweigel » Wed Jan 23, 2013 8:56 am

Thanks for your comments! It is very helpful to have more pairs of eyes look at the proposal to see if it fits a viewpoint across communities.

I'm not exactly sure what you mean by practicalities. Participation in the WG is open to all, and we're planning on dealing with moderation and steering issues as we go. The rough project plan we are sticking to also emphasizes use cases as the foundation for all WG work early on.

I agree that the added value we are going to provide is not easy to communicate. I think that there is a fundamental contradiction here: we want to market the value, i.e. demonstrate benefits down at the community level, where potential users can see it best; on the other hand, the number of communities involved in the RDA effort is already quite large and will continue to grow, so we must limit ourselves if we want to have measurable outcomes within this limited timeframe. The project plan takes this into account by restricting the total number of use cases we will actively work on, while also incorporating potential other ones which were not dealt with. I'm sure not everyone will be happy with this compromise, but I believe it's the best we can offer if we want to have measurable outcomes.

I do not think of DOIs as the primary population for PIDs we want to look at; there should be a difference between PIDs in general and (Datacite) DOIs in particular, e.g. regarding granularity, acceptance by publishers, quality control and assignment process. I would like to see PIDs, as we are talking about them here, as the much more lightweight and more general thing. This also means that we are not planning on using the DOI/Datacite corpus as the main information source. I cannot speak for other communities as well as for my own (Earth Science), but in general I think the bulk of the data that should get PIDs is not intended to be DOI'ed, at least not in the first place; so we have to look at a much broader data space.

I'm not particularly familiar with FDDR, Pronom or SWORD, but from a quick review I get the feeling that this is much more a topic for the Type Registry WG. PID Information types are only partly about the typing of the digital objects we are going to identify. The core motivation for this WG is that given a PID there is not just the digital object referenced by that PID but also a set of additional information; maybe external object properties is a suitable term. Of course it is tempting to call this metadata; however, metadata is much more about the internal properties of the digital object, and we need to separate our concerns from those of the Metadata WG. (Again: there we are already at a terminology discussion!)

Our overall strategy to avoid terminology lock-ups is to accept some (potentially itching) compromises and focus on exemplary use cases.
Let me give a small use case example off the top of my head: A data storage management system frequently encounters use cases such as to 'copy' or 'move' data; such data may come from various domains, residing at a single dataspace nonetheless. For such operations it is beneficial to know if the data is not compromised (checksum) or whether there are other objects bound to it that may be valueless if the object is moved (references).
This is just one example for a possible cross-community use case we might want to deal with. Other important use cases may e.g. deal with provenance. I hope this answers your question..?
Anyway - if the existing registries and the theoretical work done along with them are relevant, I'll be happy to see some experts on them contributing to our WG efforts. We certainly don't want to repeat elaborate discussions that have already been crunched by other projects.
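
To make the copy/move use case above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything the WG has defined: the PidRecord class, its field names, the choice of SHA-256 and the function names are all hypothetical. It shows a storage system consulting the information attached to a PID (a checksum and a list of references) before moving an object.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PidRecord:
    """Hypothetical record of information kept alongside a PID."""
    pid: str
    location: str          # where the object currently resides
    checksum: str          # SHA-256 hex digest of the object's bytes (assumed)
    references: List[str] = field(default_factory=list)  # PIDs bound to this object


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(record: PidRecord, data: bytes) -> bool:
    """True if the bytes match the checksum stored in the record."""
    return sha256_hex(data) == record.checksum


def move(record: PidRecord, data: bytes, new_location: str) -> List[str]:
    """Verify the object, then update its location in the record.

    Returns the PIDs of bound objects so the caller can decide
    whether they need to be moved or updated as well.
    """
    if not is_intact(record, data):
        raise ValueError(f"checksum mismatch for {record.pid}")
    record.location = new_location
    return list(record.references)


# Usage with made-up identifiers and data
payload = b"temperature,pressure\n281.4,1013\n"
rec = PidRecord(
    pid="example-pid-1",
    location="store-a",
    checksum=sha256_hex(payload),
    references=["example-pid-2"],
)
affected = move(rec, payload, "store-b")
```

The point of the sketch is that the storage system needs no domain knowledge about the data itself: the checksum and the references are properties attached to the PID, so the same check works for objects from any community.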

I don't understand what you mean by the polymorphism problem. Can you elaborate? Is this perhaps about terminology discussions that may stall the WG effort?

Gary: Existing work - yes, we'll need to sift through existing efforts as part of our first WG activities. I'll set up a literature list. The reference you gave is certainly relevant as there are baseline definitions in it that can be reused; granularity is something we must be aware of, as well as different (domain) views on identity of objects. Provenance is a good cross-community use case we might want to pick up.
Tobias Weigel, DKRZ
Re: Case Statement: PID Information Types

Post by pwittenburg » Fri Jan 25, 2013 3:37 pm

Dear Tim, Tobias and colleagues,

here are a few comments from my side to the current case statement.

• I think that the main objectives (define and harmonize core types, establish a process for more types, develop an API) are very concrete, that the deliverables are clear and seem to be achievable.
• It seems that there is sufficient basis to also implement results quickly, to show them in operation and to get this into practice.
• I found the engagement too weak - so people from initiatives such as ARK, URN etc. should be contacted to participate and the group membership should be broader.

Hope this helps
Peter Wittenburg
Re: Case Statement: PID Information Types

Post by tobiasweigel » Mon Feb 11, 2013 8:14 am

Dear all,

I've made some minor updates to the case statement to reflect the comments posted here so far. The new version is attached to the first posting in this thread.

Best, Tobias
Tobias Weigel, DKRZ
Re: Case Statement: PID Information Types

Post by tobiasweigel » Fri Feb 15, 2013 1:33 pm

Dear all,

the final version of the case statement is attached to the first posting. Changes include mostly the adoption plan and clarifications in view of the council questions to all CWGs. No changes to deliverables or the general WG scope.

Best, Tobias
Tobias Weigel, DKRZ
Re: Case Statement: PID Information Types

Post by parsonsm » Sun Mar 31, 2013 12:47 pm

Council has formally recognized this RDA Working Group. Below is the letter the Council sent the group chairs prior to the Launch, where the decision was finalized.

The group had active and fruitful discussions in Göteborg. They are preparing an adoption plan as requested by Council and are getting to work on the deliverables outlined in the case statement.

Letter from Council:
Dear Tim and Tobias,

In the last few weeks the RDA Council received an initial group of Case Statements for assessment and we are writing to you to provide a response to your submission. We very much appreciate your discussions which have been, and continue to be, an important part of establishing and building the RDA as an organization that can have substantial impact and effectiveness within the data community.

The process for assessing the Case Statements was as follows: The Liaisons for each of the submitting groups were asked to provide a technical assessment of the Case Statement. Council used these assessments in a substantial way to assess the potential impact of the effort. Council closely read the Case Statement as well as RDA Forum discussions for the group in order to get a sense of the discussion and participation. (We were also aware that at this point, part of the discussion for some groups was not conducted on the RDA Forum). Council’s assessment of the impact and alignment with the RDA mission included a strong focus on whether the Case Statement included concrete deliverables that would directly or indirectly enable a research community of practice and would be adopted by Working Group members during the effort.

With regard to your Case Statement the RDA Council approved PID Information Types as a Working Group.  Please provide an adoption plan within one month that includes at least two organizations that will adopt the API implementation (e.g. DKRZ and California Digital Library).  Please provide a timeline for the adoptions.

If it is useful, we would like to meet the chairs of your group to discuss this decision with you, and hope to do so during the Plenary meeting in Gothenburg next week. Ross Wilkinson and Fran Berman will be available on Sunday to answer any questions or discuss if you are arriving early, and otherwise all of us (including John Wood) will work with you to find a time during the three days to meet with you.

We appreciate your group’s discussion and engagement with the RDA and look forward to seeing you in Gothenburg.

Ross, Fran, and John
Mark A. Parsons
