Data Foundation and Terminology

Post by pwittenburg » Thu Nov 08, 2012 12:08 pm

Dear all,

After a recent discussion with Gerhard Budin, we agreed to start using the RDA discussion forum for the terminology harmonization interest group that was formed in Washington, which we have now labelled "data foundation and terminology". There will be a closely related discussion forum on best practices in semantic interoperability as well. The initial members of this combined interest group are Gerhard Budin, Peter Wittenburg, Gary Berg-Cross, Ruth Duerr, Talapady Bhat, Jens Ludwig, Tobias Weigel, Michael Lautenschlager, Larry Lannom, Nancy Wiegand, David Baker, Daan Broeder, Deborah McGuinness, J. West, Rainer Stotzka, John Elliott, Steve Richard.

There have already been some exchanges via the "old" email list, but we should switch over to the forum to make things transparent to the rest of the RDA community. Gerhard, Gary and I volunteered to act as chairs, and I will certainly focus on the terminology topic. It is obvious that many of us are looking for fast harmonization in this respect. I hope that this is OK for all of you, and that all of you can come to the official launch meeting in March in Gothenburg, where we should have a breakout session for our two related topics.

Let me start/continue the discussion with a few points:
- A few documents were discussed over the last months (and there will be others, I assume). For the time being we should include them as attachments. I am attaching one document that some of us have been discussing, and I will also attach the slides (PPT) from a talk Bob Kahn gave recently. Here is the link to another note from Kahn & Wilensky which we also found very useful: http://www.cnri.reston.va.us/k-w.html
- Currently we are preparing a first draft of a document that describes the functions of a proper forum that is useful for doing the (C)WG business. We would like to discuss this with you and others very soon.
- Are there other people we need to include in this particular forum? If you have names please let us know. Please, remember that we want to have committed people in the (C)WGs.
- We should switch to an RDA email list asap, so I asked for an email list to be created with the addresses listed below; it should be in operation very soon. The name of that list is: RDA-cwg-terminology@lists.rd-alliance.org
- We should start working on a Case Statement for the "Data Foundation and Terminology" work very soon and I will give it a start based on what we discussed in Washington.
- Obviously we need a short note that defines the scope of this work and a discussion about this note.

So let's enter a phase of discussion about the topic.

best
Peter & Gerhard

ludwig@sub.uni-goettingen.de
weigel@dkrz.de
lautenschlager@dkrz.de
rduerr@nsidc.org
dbaker@casrai.org
peter.wittenburg@mpi.nl
daan.broeder@mpi.nl
gbergcross@gmail.com
llannom@cnri.reston.va.us
gerhard.budin@univie.ac.at
wiegand@cs.wisc.edu
dlm@cs.rpi.edu
jwest@rcsb.rutgers.edu
rainer.stotzka@kit.edu
bhat@nist.gov
john.elliott@nist.gov
steve.richard@azgs.az.gov

Mailing list SEMTERM

Post by stotzka » Tue Nov 13, 2012 11:03 am

Dear All,

Currently we are running a mailing list
“Terminology & Semantic Interoperability”
at rda-wg-semterm@lists.kit.edu.

When the new list rda-cwg-terminology@lists.rd-alliance.org
is set up I will move and inform the subscribers.

I think in future most of the discussions will happen here in the forum.

Remark:
Should we use the new title "Data Foundation and Terminology"
or the old one "Terminology & Semantic Interoperability" we agreed on in Washington?
Or am I currently mixing up totally different WGs?

Rainer
Rainer Stotzka
Karlsruhe Institute of Technology (KIT)
http://ipelsdf1.lsdf.kit.edu/cms/

Re: Data Foundation and Terminology

Post by pwittenburg » Tue Nov 13, 2012 6:55 pm

Thanks Rainer.
I guess that it was clear in Washington that we have two foci of the work to be done - both very much related.
To keep momentum with respect to the focused requirements for RDA Working Groups, there is no other option than to come up with two very well-defined Case Statements. So let's have two threads and look at each other's notes etc.

best
Peter

Re: Data Foundation and Terminology

Post by pwittenburg » Wed Dec 12, 2012 10:30 am

Dear all,

I uploaded the concept note on Data Foundation and Terminology that has been created by a number of colleagues. This should motivate others to also upload useful documents which may inspire the discussion in the working group.
This note serves a few purposes for those who participated in writing it:
- it tries to model the basic entities and relations we are faced with in data management, access, sharing etc., and it does so by comparing our situation with the Internet basics, despite all differences
- it indicates the scope of terms that need to be defined according to the group of authors

Peter

Re: Data Foundation and Terminology

Post by pwittenburg » Wed Dec 12, 2012 10:48 am

Dear all,

here is the first version of a Case Statement prepared by Gary and myself.
Please have a look and comment. Please use the forum for commenting to indicate to others what we are discussing.

Gary and I would ask you in particular to think about the following aspects:
- In the paragraph "Engagement with Existing Work" we should have a list of essential papers that may contribute to the focus of the WG. Please add papers or documents which add new and relevant views.
- In the last paragraph "Initial membership" we need to state who will be part of the discussion. Here we should try to engage people from relevant initiatives. We have not added anyone yet, although we know of some partners who are interested and committed.

Gary and I suggest having a first virtual meeting before Christmas, so please fill in the Doodle quickly. Depending on participation, we will check which time works best; it should be afternoon/evening for Europeans and morning for US colleagues.
http://doodle.com/i8fpy7xmhv522bph

Hoping for interesting contributions

Peter and Gary

Re: Data Foundation and Terminology

Post by tobiasweigel » Wed Dec 12, 2012 4:06 pm

Hi,

thanks, Peter, for uploading the documents to the forum.

Regarding the dtf-concept-note document, Michael and I feel that the definition of the Data Object as a triple is too narrow. It is very crucial that we define this precisely at an early stage to keep cross-community applicability. The fixed metadata description as the third part of the triple is problematic. In some cases, this may be true, but not in the general case. What we want to have is the ability to refer from one DO to many other DOs, regardless of whether their bit sequences contain metadata, other data objects, images, etc. - there are even various types of metadata, and we want to refer from one data object to more than just one of them (or none of them). Think of different metadata records for browse/search use cases, provenance, experiment documentation and so on.

An elegant solution for this is not to talk about a metadata description, but of (any number of) typed references to other DOs. The current view of the triple would be reflected in a 'metadata' type reference. In the same manner, we can easily enable collections by an 'element' type reference.

Consequently, the more abstract model looks as follows:
DOs are described as the tuple of a bit sequence and a PID record. The bit sequence is arbitrary digital material which is a black box to the global PID system. The PID record in turn consists of several elements: the PID, one or more references to locations where the bit sequence can be obtained, arbitrary attributes, and any number of typed references to other DOs. This is also much closer to the original Digital Object definition from the Kahn & Wilensky paper. I'd also like to actually talk about proper Digital Objects, not Data Objects, because 'data' carries different semantics for different communities already.
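A minimal sketch of this abstract model in Python may make the structure easier to discuss. It is only an illustration under assumptions: the class and field names, the PID strings ("hdl:0000/...") and the example.org locations are all made up here and come from neither the concept note nor the Kahn & Wilensky paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypedReference:
    """A typed link from one Digital Object to another, given by the target's PID."""
    ref_type: str    # e.g. "metadata", "element", "provenance" (illustrative labels)
    target_pid: str


@dataclass
class PIDRecord:
    """The PID, its locations, arbitrary attributes and typed references."""
    pid: str
    locations: List[str]  # one or more places where the bit sequence can be obtained
    attributes: Dict[str, str] = field(default_factory=dict)
    references: List[TypedReference] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.locations:
            raise ValueError("a PID record needs at least one location")

    def targets(self, ref_type: str) -> List[str]:
        """Return the PIDs of all referenced objects of the given type."""
        return [r.target_pid for r in self.references if r.ref_type == ref_type]


@dataclass
class DigitalObject:
    """A bit sequence (opaque to the PID system) paired with its PID record."""
    bits: bytes
    record: PIDRecord


# Hypothetical example objects; all identifiers are placeholders.
metadata = DigitalObject(
    bits=b"<title>Example observation</title>",
    record=PIDRecord(pid="hdl:0000/meta-1", locations=["https://example.org/meta-1"]),
)
observation = DigitalObject(
    bits=b"\x00\x01\x02",
    record=PIDRecord(
        pid="hdl:0000/obs-1",
        locations=["https://example.org/obs-1"],
        references=[
            TypedReference("metadata", metadata.record.pid),
            TypedReference("provenance", "hdl:0000/prov-1"),
        ],
    ),
)
# A collection is just another Digital Object with 'element' references.
collection = DigitalObject(
    bits=b"",
    record=PIDRecord(
        pid="hdl:0000/coll-1",
        locations=["https://example.org/coll-1"],
        references=[
            TypedReference("element", observation.record.pid),
            TypedReference("element", metadata.record.pid),
        ],
    ),
)
```

In this sketch the original triple view corresponds to an object that carries exactly one 'metadata' reference, while an object with several metadata records, or none, needs no special handling.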

Best, Tobias
Tobias Weigel, DKRZ

Re: Data Foundation and Terminology

Post by rduerr » Fri Dec 14, 2012 7:36 pm

Hi Peter,

Thanks for posting the Case statement and rtf-concept-notes to the forum.

I think the WG Charter is for the most part just fine, though I would hope that the "reference document about the DFT" would contain the terminology definitions, be web accessible and presented in a form such that each term has its own URI. That would simplify both user understanding of the terms and their ability to be re-used in any number of environments.

As for the UML data organization model - I must admit that I am not interested in that, as I've found UML to be less than useful for communicating with the communities I deal with. I am also not a fan of registries of any sort except the kind that create themselves on the fly (i.e., Google, Bing, etc.) as I have yet to find an example of a functional registry that has worked and actually usefully serves its community.

I've decided not to comment on the rtf-concept-notes except to note that the conception of data objects as bit strings is flawed from the start for the communities I deal with. Bit strings are not the data, they may be one of likely many scientifically equivalent representations of "the data" but they are not "the data". Moreover, the more different communities a data object is useful to the more likely there are to be many equivalent bit string representations of that data. Each community uses its own tools, those that are most suitable for their needs, and typically each tool is optimized for different sorts of bit strings - bit strings that match the conceptualizations of that user community; conceptualizations that vary deeply between communities simply due to the nature of the disciplines involved. We've even been known to archive multiple representations of the data, to facilitate use by multiple communities just because of this fact.
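One way to picture this point, as a rough sketch only: the conceptual data is modelled as one entity that holds several bit-string representations, and each community picks the one that suits its tools. The class name, the format labels and the byte contents below are placeholders chosen for illustration, not anything taken from an actual archive.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class Dataset:
    """The conceptual 'data', independent of any single encoding."""
    name: str
    # format label -> bit string; entries are assumed to be scientifically equivalent
    representations: Dict[str, bytes] = field(default_factory=dict)

    def add_representation(self, fmt: str, bits: bytes) -> None:
        self.representations[fmt] = bits

    def for_community(self, preferred_formats: Iterable[str]) -> Tuple[str, bytes]:
        """Return (format, bits) for the first preferred format that is archived."""
        for fmt in preferred_formats:
            if fmt in self.representations:
                return fmt, self.representations[fmt]
        raise LookupError(f"no archived representation of {self.name!r} matches")


# Placeholder content standing in for two archived encodings of the same data.
sea_ice = Dataset("sea ice extent")
sea_ice.add_representation("netcdf", b"netcdf-bytes")
sea_ice.add_representation("geotiff", b"geotiff-bytes")

# A community whose tools prefer GeoTIFF gets that encoding of the same dataset.
chosen_format, chosen_bits = sea_ice.for_community(["shapefile", "geotiff", "netcdf"])
```

Here the identity of the dataset sits with the `Dataset` entity and not with either byte string, which is the distinction drawn above.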

I think the videoconferences should be useful for illuminating these different conceptualizations of data, so I am looking forward to next week's call.

Comments in the chat: TelCo 20.Dec.2012

Post by stotzka » Thu Dec 20, 2012 4:14 pm

pewi: RAJA TOO LOW VOICE
Rainer Stotzka: Arcot, I could not hear you
pewi: GREAT
pewi: too low voice
pewi: not good raja
Stan Ahalt: raja, I sent you an email
pewi: can hear you very well
pewi: can't hear you
Stan Ahalt: larry, you have the mic, but we cannot hear you
Stan Ahalt: if everyone will click the chat button, we can communicate
Stan Ahalt: yes, Gary, we can hear you
Stan Ahalt: still cannot hear you Larry, sorry!
pewi: no voice Larry
Ruth Duerr: This is Ruth - I am agreeing completely with what Gary said, we need to start with terminology
Daan Broeder: maybe try to reconnect larry
Gary BergCross: No video either Larry
Arcot Rajasekar: His Mike might be on mute
Arcot Rajasekar: Sorry folks. My mike is not working well
Stan Ahalt: yes we can hear you now
pewi: very good Larry
pewi: absolutely right Gary
Gary BergCross: Absolutely right to look at the sections and decide if they are adequate and correct.
Gary BergCross: 11179 was used as an example, there might be others
Gary BergCross: machine processable is not the goal of this effort, but would contribute to it.
Arcot Rajasekar: do we have complete list of communities that we want to engage
pewi: agree with you Ruth
Gary BergCross: Something like UML helps show relationships visually.
Gary BergCross: I agree with Daan on this.
Tobias / Michael: uml is good for communicating things, though it cannot go without adequate documentation
Daan Broeder: true.
Gary BergCross: I took UML as a supplement to the text.
Ruth Duerr: Well if visual representations are useful - I would prefer concept maps
Ruth Duerr: but I really was just thinking of terms and their definitions
Gary BergCross: I agree that a network view is a better idea for the model, although there may be hierarchies in it.
Daan Broeder: you can use a subset of UML to keep it simple
Arcot Rajasekar: just creating the data model will take time
Daan Broeder: we need both
Stan Ahalt: I think that we can use either UML or a concept map, but we'll certainly need some kind of diagram
Ruth Duerr: though I would like each term to have its own URL
Arcot Rajasekar: it is easy to convert from one to another
Stan Ahalt: also, can we refer to an "ISO-like" concept registry?
Gary BergCross: Does the value prop section also serve as a problem statement section?
Ruth Duerr: In my opinion, the value proposition should also talk about the general value of having the same definition for the same terms, so that conversations are meaningful and people aren't talking past each other
Arcot Rajasekar: value proposition is in enabling semantic interoperability across disciplines
pewi: right Ruth - that's what we see
Tobias / Michael: and if we still end up not agreeing on a single definition, at least have some defined alternatives?
Ruth Duerr: agreed
Stan Ahalt: what I am saying is that we should strive for semantic interoperability, but admit that we won't always be hitting that goal, but we should continue to iterate
Ruth Duerr: or Tobias, perhaps the definitions become more broad or general
Arcot Rajasekar: I agree with what Stan and Ruth mentioned. Hence the meaning/definition has value beyond the syntactic equivalence
pewi: good point Rainer - but after we have defined our case statement
Rainer Stotzka: fine
Gary BergCross: I think the value prop does start out with ideas like what Bob is talking about: There is substantial value to research data communities in establishing a common ground for interactions such as data sharing and interoperation. Proper data organization will be enabled by agreeing upon a number of basic concepts and their relationships as well as by explicitly defining and registering appropriate terms.
Ruth Duerr: Agreed Gary
Reagan Moore: Data interoperability is required in collaboration environments that span multiple types of data management systems. This requires a basic understanding of the assertions made by each data management system about both the properties of the digital objects, and the properties of the data management environment.
Reagan Moore: Data interoperability is required for multi-disciplinary research. While this focuses on properties of a digital object (descriptive semantics, format, processing procedures), a digital object is typically a member of a collection. The context provided by the collection can be cast as assertions about record properties.
Reagan Moore: Data interoperability can be examined at the level of byte access (file systems), or at the level of information exchange (discovery within digital libraries), or at the level of knowledge exchange (manipulation through procedures that extract information).
pewi: ok would like to see that paragraph of text
Ruth Duerr: I think it would be useful for each participant to state what communities they represent?
Ruth Duerr: I, for example, mostly work with Earth sciences, though I also work with space science/Astronomy and more recently a bit of community and traditional knowledge
Ruth Duerr: And I work with the Federation of Earth Science Information Partners
Rainer Stotzka: Photon science, biology (imaging and genome sequencing), arts&humanities
Gary BergCross: GeoSpatial, Health and Ecological
Reagan Moore: Astronomy, genetics, seismology, climate, hydrology, neuroinformatics, cognitive science, earth systems, health
Ruth Duerr: Peter - I think as a first draft the plan sounds fine
Arcot Rajasekar: Biology
Stan Ahalt: There is an ESIP meeting in early January. Some of the topics are very relevant.
Daan Broeder: humanities & social sciences
Arcot Rajasekar: I can ask some people
Gary BergCross: We may agree with the short-term goals of Phase 1 and the long range goal of Phase 4 and have to adjust in between.
Tobias / Michael: agree too
Stan Ahalt: http://commons.esipfed.org/schedule/Win ... ing%202013
Rainer Stotzka: agree
Arcot Rajasekar: agree
Rainer Stotzka: agree
Reagan Moore: agree
ulrich: agree
Daan Broeder: agree
Ruth Duerr: agree
pewi: very good point Gary
Gary BergCross: In phase 1 we talk about: identify groups, initiatives, experts that worked on the issues and that are interested to participate in the work and motivate them to contribute to the discussion. This includes other WGs.
Gary BergCross: Yes, I agree with the January timeline.
Rainer Stotzka: ok
Gary BergCross: We are aiming at the March meeting for one thing.
Larry: ok
Ruth Duerr: ditto - though I note that the ESIP meeting is Jan. 8-10
Gary BergCross: We will all have access to these chat notes?
pewi: yes will do
pewi: will get this into a doc
Gary BergCross: Services may need proper metadata to function.
pewi: that's fine Bob
Daan Broeder: just as usable data
Gary BergCross: Thank you Rainer.
Gary BergCross: I took the idea to be one of data services.
Gary BergCross: Registries include various data services.
Stan Ahalt: data as a good OR data as a service: very nice
Arcot Rajasekar: data as a byproduct of a service or workflow
Gary BergCross: Data as an ingredient in information processes.
Stan Ahalt: so just to get this recorded, in the context of data:object is the dual of service
Arcot Rajasekar: sounds good. wishing all a very happy holiday and new year
Gary BergCross: Merry Xmas all...
Daan Broeder: thank you all the best....
Stan Ahalt: very best wishes to all for a safe and relaxing holiday!
Rainer Stotzka
Karlsruhe Institute of Technology (KIT)
http://ipelsdf1.lsdf.kit.edu/cms/

Re: Data Foundation and Terminology

Post by tobiasweigel » Thu Dec 20, 2012 5:16 pm

Another note on the ISO-compliant concept registry discussion:

If we set up a concept registry, we should make sure that it is very (very!) lightweight. The goal should be to have URLs (PIDs?) pointing to the definitions we make, that's useful. But from the current state of the discussion I am not sure whether everyone really agrees that this will be the end of it. The next step obviously is to formalize term semantics, have some form of API access, but if we go down this road from my experience there will be a lot of extra effort involved, which we can't afford and which I don't see reasoning for unless there are use cases that are relevant at this early stage (maybe later on, but that's beyond the scope of this WG!).

The focus of the WG should be to agree on a conceptual framework, which is a lot of work by itself and crucially needed by other WGs. Only once that is established can we begin to formalize things, but this is a lower priority down the road. My suggestion is that a concept registry at a higher level of sophistication is the task of a separate WG such as the one on semantic interoperability (?) which was briefly mentioned during the call; probably commencing work 12 months from now.
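To show how small the "very lightweight" variant could be, here is a rough Python sketch of a registry that does nothing more than hand out one URL per term and return the definition behind it. Everything in it is an assumption for illustration: the base URL on example.org is a placeholder and not an RDA address, the class and method names are invented, and the sample definition is only a paraphrase of the model discussed earlier in this thread.

```python
from urllib.parse import quote

# Placeholder base address; an actual deployment would choose its own.
BASE_URL = "https://example.org/dft/terms/"


class TermRegistry:
    """Maps each term to a stable URL and a plain-text definition, nothing more."""

    def __init__(self, base_url: str) -> None:
        self._base = base_url
        self._definitions = {}

    def register(self, label: str, definition: str) -> str:
        """Store a definition and return the URL that identifies the term."""
        slug = quote(label.strip().lower().replace(" ", "-"))
        if slug in self._definitions:
            raise ValueError(f"term already registered: {label!r}")
        self._definitions[slug] = definition
        return self._base + slug

    def resolve(self, url: str) -> str:
        """Return the definition a term URL points to."""
        if not url.startswith(self._base):
            raise KeyError(url)
        return self._definitions[url[len(self._base):]]


registry = TermRegistry(BASE_URL)
do_url = registry.register(
    "Digital Object",
    "A bit sequence together with a PID record.",  # illustrative wording only
)
definition = registry.resolve(do_url)
```

Formal term semantics or API access would all be additions on top of this, which is the extra effort argued above to be out of scope for now.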

I've modified the attached case statement accordingly to clarify the point, feel free to comment and iterate.
Tobias Weigel, DKRZ

Re: Data Foundation and Terminology

Post by srichardUSGIN » Mon Jan 07, 2013 7:15 pm

After looking at the draft Case Statements, following the e-mail discussion, and reading this forum, here's my 2 cents.

From the ‘DFT-Case Statement’ document, it is necessary to clarify what is denoted by “a basic, abstract data organization model which can be used to derive a reference data terminology” -- what is the scope of the terminology that is to be developed? It’s not clear to me from what’s currently in the case statement.
Attached is a revised Case Statement, attempting to clarify the scope and products. The approach is based on scoping the model, defining the concepts of interest and their relationships, assigning labels (terms) for the concepts to use in discussion, documenting the model (text, graphics, formal model notation), and getting a Web resource deployed to make the model and terminology accessible (both in text for people and in formal notation like SKOS or OWL for machines). The terminology is essentially the entities, properties and relationships in the model. An important part of the project is determining and deploying a governance scheme: how are new concepts introduced, what is the revision/update process, what is the change management/versioning scheme, and what kinds of representations will an HTTP GET on the concept URIs return.
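As an illustration of what the machine-readable side might look like, the following Python sketch writes one concept as SKOS in Turtle syntax using plain string formatting. The SKOS namespace and the properties used (skos:prefLabel, skos:definition, skos:related) are standard; the base URI on example.org, the function names and the sample definition text are assumptions made up for this example.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

# Placeholder namespace for the terminology; not an actual deployment.
BASE_URI = "https://example.org/dft/terms/"


def _literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(slug: str, label: str, definition: str, related=()) -> str:
    """Render one concept as a Turtle block with label, definition and links."""
    lines = [
        f"<{BASE_URI}{slug}> a skos:Concept ;",
        f'    skos:prefLabel "{_literal(label)}"@en ;',
        f'    skos:definition "{_literal(definition)}"@en',
    ]
    for other in related:
        lines[-1] += " ;"
        lines.append(f"    skos:related <{BASE_URI}{other}>")
    lines[-1] += " ."
    return "\n".join(lines)


# Illustrative entry; the definition wording is a placeholder.
repository_ttl = concept_to_turtle(
    "repository",
    "repository",
    'A service that stores and provides access to "digital objects".',
    related=("digital-object",),
)
document = SKOS_PREFIX + "\n\n" + repository_ttl
```

The same concept URI could serve this Turtle to machines and a text page to people, which ties in with the governance question of which representations the URIs return.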

Here's the proposed Charter statement:
The DFT WG task is to develop and document a cross-domain, abstract model for the entities and relations required for data management, access, understanding, and utilization. Entities of interest include (but are not limited to) digital object, data, dataset, repository, provenance, disseminator, access manager, registry, broker, schema, format, protocol, and identifier. The model will consider various data access models, including library-type access to individual digital objects, service-based access to subsets or aggregations of data, access to real time data from sensor networks, and data objects encapsulated by methods that need to be invoked to yield the targeted data.

p.s. These edits are suggestions, mostly intended to indicate the kind of explicit statement of purpose, scope, and deliverables for the project that I would expect in a charter. Since I haven't been active in the live conversations, I assume what is written is not entirely consistent with the sense of the group, and should be revised accordingly.
Last edited by srichardUSGIN on Wed Jan 09, 2013 3:13 pm, edited 1 time in total.