UPC for Data

Where we discuss anything related to the RDA (catchall)

Moderators: Leif.Laaksonen, SaraPittonetGaiarin

UPC for Data

Postby jimmyers » Wed Jan 30, 2013 3:28 pm

Welcome to the "UPC for Data" WG discussion. This topic will document the UPC for Data discussion from last fall's meeting and identify one or more paths forward to formalize the idea as a WG case statement. All are welcome to join the discussion and help define a useful and usable standard in this area!
jimmyers
 
Posts: 4
Joined: Thu Jan 24, 2013 2:42 am

Re: UPC for Data

Postby jimmyers » Thu Feb 28, 2013 6:03 pm

What is a UPC for Data?

During the initial RDA meeting and in subsequent email exchanges prior to opening this topic, it became clear that while there's enthusiasm for the idea of a Universal Product Code (UPC) for Data, there are actually a couple of concepts being put forward (both interesting) that take the UPC analogy more or less strictly. To kick off the public discussion here and to help the group move to one or more concrete standards targets that can be turned into case statements, I'll try to outline these concepts. I'd ask others to clarify them, to help us understand how we can move one or more of them forward, and to help gauge the level of interest across RDA in each.

Most strictly, the UPC symbol, as seen in the grocery store, is primarily a type code (see http://en.wikipedia.org/wiki/Universal_Product_Code for a quick summary) - it identifies the type of the product rather than a single instance (e.g. the 12 ounce size of Kellogg's Corn Flakes, not each box of it). Various parts of the code identify the numbering scheme used (there are several) and, within some schemes, there's a hierarchy where some digit sequences are assigned to companies and further digits encode their (independently assigned) numbering of products.
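
To make that structure concrete, here is a minimal Python sketch of how a 12-digit UPC-A code can be taken apart. It assumes the classic fixed split (one number-system digit, five company digits, five product digits, one check digit); company prefixes assigned in practice can vary in length, and the function and field names are purely illustrative.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit from the first 11 digits."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, ..., 11 (weighted by 3)
    even_sum = sum(digits[1::2])  # positions 2, 4, ..., 10 (weighted by 1)
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def parse_upc_a(code: str) -> dict:
    """Split a UPC-A code into its parts, assuming a fixed 1+5+5+1 layout."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    if upc_a_check_digit(code[:11]) != int(code[11]):
        raise ValueError("check digit mismatch")
    return {
        "number_system": code[0],   # which numbering scheme is in use
        "company": code[1:6],       # assigned to the company (assumed 5 digits)
        "product": code[6:11],      # assigned by the company itself
        "check_digit": code[11],
    }
```

Note that nothing in the result refers to an individual box on the shelf: every part of the code describes the product type and who assigned the number.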

One interpretation of UPC for Data would follow this model and treat the UPC as a data type identifier - possibly with hierarchical assignment authority. In this concept, a UPC symbol for data would be similar to defining a minimal set of discovery metadata - with a compact/convenient/universal encoding mechanism. Having such metadata universally available (e.g. as part of a standard data package/retrieval mechanism) could make search and discovery more efficient and would allow data consumers, repositories, search providers/metadata registries, etc. to handle data from arbitrary sources in a uniform way.

Another interpretation of UPC for data focuses more on the idea of the UPC as a universal resolution mechanism but assumes that these identifiers apply to data instances rather than types. With UPCs, standardized scanners can be used to read unique identifiers on products from any manufacturer and, via a database lookup, associate the item in someone's hand with, for example, a price. (Since all instances of a given product are supposed to be the same, the distinction between product type and instance is a bit blurred with normal UPCs and, in some UPC schemes, e.g. those that apply to variable weight items like packages of meat, instance-specific info such as weight is encoded, so applying UPCs at a data instance level doesn't completely break the analogy.)

In this concept, the UPC becomes a universal means of encoding a persistent identifier (PID), and the benefit would come primarily from creating a universal mechanism to resolve/retrieve metadata for the PID, regardless of how it was minted or what identifier scheme is used. The value would be that data consumers could integrate data from multiple sources without having to convince data providers to adopt a specific minting/identifier assignment mechanism, etc.
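
As a rough illustration of this second concept, the sketch below maps identifiers from different schemes onto a resolution URL without minting anything new. It is only a sketch under stated assumptions: the "scheme:value" prefix convention, the table of resolver hosts, and the function name are my own choices for the example, not something the group has agreed on.

```python
# Assumed resolver base URLs per identifier scheme (illustrative table).
RESOLVER_BASES = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",
}


def resolution_url(pid: str) -> str:
    """Turn a 'scheme:value' identifier into a URL for its resolver."""
    scheme, sep, local = pid.partition(":")
    if not sep or not local:
        raise ValueError(f"not a 'scheme:value' identifier: {pid!r}")
    base = RESOLVER_BASES.get(scheme.lower())
    if base is None:
        raise ValueError(f"no resolver known for scheme {scheme!r}")
    # The existing identifier is passed through untouched.
    return base + local
```

The caller never needs to know how the identifier was minted. Supporting another scheme means adding one table entry, which is the sense in which the mechanism, rather than the identifier, would be universal.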

There are potential variants/combinations as well, e.g. a universal PID encoding/metadata retrieval mechanism could be combined with a minimal metadata standard/required typing information.

To move forward towards specific standards proposals, it would be useful to know:

* Are there other meanings of UPC that we should be considering? Other aspects to consider for the two outlined above?

* Are there other groups contemplating either of these topics already under another name?

* Do we think there is sufficient interest, and are there enough projects willing to standardize and adopt, for either of these concepts to go forward with a case statement (addressing the multiple questions the steering committee has identified)?
jimmyers
 
Posts: 4
Joined: Thu Jan 24, 2013 2:42 am

Re: UPC for Data

Postby parsonsm » Thu Feb 28, 2013 7:13 pm

jim,

How does this effort relate to the work of the Persistent Identifier group? Also, is UPC meant literally? What about QR codes?

You may also be interested in work by the Integrated Earth Data Applications group at Lamont-Doherty. They are establishing a UPC for physical samples: http://www.geosamples.org/aboutigsn

cheers,

-m.
Mark A. Parsons
RDA/US
parsonsm
 
Posts: 18
Joined: Sun Nov 04, 2012 8:44 pm

Re: UPC for Data

Postby jimmyers » Thu Feb 28, 2013 9:46 pm

Mark,
By PID group, do you mean the PID Info Type group? My imperfect understanding is that this group is concerned with standardizing ~preservation metadata across PID systems but not domain/descriptive metadata (e.g. that the data is geospatial), and would provide a common API to get that metadata. If that's correct, there's partial overlap with the strict UPC concept to the extent that there's overlap in the minimal metadata that would be defined, but the idea of encoding the type info in the UPC would be different. For the UPC as a universal way to get data and any metadata available across PID systems, there's again some overlap in terms of a common API/mechanism (perhaps closer to the integration framework that was discussed on that group's email list). Beyond that, I think more discussion is needed on whether there's a good way to combine/leverage/interact (and we need to keep clear which UPC concept we're talking about as we discuss).

The UPC for physical samples example you gave is relevant, but a mix: it is a new PID scheme that identifies instances using a hierarchical naming scheme and UPC/QR style encoding. Right now I think we have the concepts of a) creating a new identifier scheme for data types, or b) a universal data instance/metadata resolver mechanism that would not involve minting new IDs.

As for bar/QR printouts - I don't think we've talked much about representation beyond text/URL - unlike the physical sample case where a printed label is useful, I think we were primarily staying in the digital realm in our discussion.

As always - trying to represent the group but acknowledging personal bias - others should feel free to chime in!

-- Jim
jimmyers
 
Posts: 4
Joined: Thu Jan 24, 2013 2:42 am

Re: UPC for Data

Postby parsonsm » Fri Mar 01, 2013 4:41 pm

Thanks Jim, that clarifies. Definitely want to keep the two groups linked, but it sounds like they are.

Also, I think I was taking your UPC metaphor a little too literally.
Mark A. Parsons
RDA/US
parsonsm
 
Posts: 18
Joined: Sun Nov 04, 2012 8:44 pm

Re: UPC for Data

Postby tobiasweigel » Tue Mar 12, 2013 3:53 pm

Hi Jim - that sounds roughly correct. In our current PID Info Types WG world view, UPC would be one system/infrastructure among others that supports such minimal metadata and can potentially implement a common API to work with it. That common API is one of the most important outcomes of the PID Info Types WG.
Differences will probably occur in the level of detail regarding the minimal metadata - this will become one of the most elaborate discussion points during our WG activity. Our strategy to cope with this potentially infinitely large topic is to focus on a very limited number of use cases and try to enable these across infrastructures - including UPC if you can make it! So it would be good to sort these overlaps out so that:
- the UPC WG benefits from the core types and a seamless integration through the API
- the PIT WG benefits from potential use cases and uptake by another suitable infrastructure

Best, Tobias

PS: Usually I try to avoid the term metadata, particularly because there is also a metadata WG (actually two of them :)), but I feel that we agree on the line of separation; minimal, preservation-type metadata as opposed to domain metadata. The Data Foundation WG is revolving around "external" and "internal" properties, which can be another suitable terminology.
Tobias Weigel, DKRZ
tobiasweigel
 
Posts: 26
Joined: Mon Oct 29, 2012 9:01 am
Location: DKRZ, Hamburg

Re: UPC for Data

Postby jimmyers » Sat Mar 16, 2013 9:07 pm

Tobias,

Thanks for the ping!

Definitely want to coordinate across groups and will plan to connect at the meeting. I was hoping that the UPC group, with either of the concepts discussed previously, would really stay on the service/API side and coordinate with others on what metadata/vocabulary would be transmitted.

From my recollection of the first working meeting, I think our group is really thinking about use cases relevant to data consumers and data-consuming applications. If you're a researcher looking for data, the domain type is probably more relevant for the initial discovery than structural/preservation info, but once you know it's relevant, you'd quickly want to understand how you could use it and what it looks like. Similarly, for the instance-level metadata flavor of UPC, the goal is to be able to retrieve and display information to the consumer that represents anything and everything the holder knows about it. If that info is standardized, applications can do interesting things with it, but there's real value in just presenting what's known to the user and perhaps letting them map the info into local vocabularies.

I think a hope running through our approach is this: issues of how you mint good identifiers, what you should identify, and what minimal metadata you should provide seem complex and probably hard to make universal given the broad range of interests. Defining a common service/API to get the thing and the metadata about the thing, on the other hand, could be a no-brainer and enable further extension - if you can send any metadata, groups can add to the minimum for their purposes and keep going.
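
A minimal Python sketch of what I mean by "a common service/API plus whatever metadata you can send" follows. Everything in it is hypothetical: the class names, the `lookup` method, and especially the contents of `MINIMAL_KEYS` are placeholders for illustration, since the actual minimum is exactly what still needs to be coordinated with the other groups.

```python
from abc import ABC, abstractmethod

# Placeholder minimum; the real set would be agreed across groups.
MINIMAL_KEYS = {"identifier", "location"}


class MetadataProvider(ABC):
    """Anything that can return what it knows about an identifier."""

    @abstractmethod
    def lookup(self, pid: str) -> dict:
        ...


class InMemoryProvider(MetadataProvider):
    """Toy provider backed by a dict, standing in for a real repository."""

    def __init__(self, records: dict):
        self._records = records

    def lookup(self, pid: str) -> dict:
        return dict(self._records[pid])


def get_metadata(provider: MetadataProvider, pid: str) -> dict:
    """Fetch a record, check the minimum, and pass everything else through."""
    record = provider.lookup(pid)
    missing = MINIMAL_KEYS - record.keys()
    if missing:
        raise ValueError(f"record lacks required keys: {sorted(missing)}")
    return {
        "core": {k: record[k] for k in MINIMAL_KEYS},
        "extra": {k: v for k, v in record.items() if k not in MINIMAL_KEYS},
    }
```

The point of the "extra" part is that a consumer can simply display it, or map it into a local vocabulary, without the API having to know in advance what a given community chose to add.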

Hoping to have some good discussions from the UPC group and with the wider community at the RDA meeting to see how/where we have overlap and to see whether this is really something we can separate from the other discussions and turn into a usable standard.

Cheers,
Jim
jimmyers
 
Posts: 4
Joined: Thu Jan 24, 2013 2:42 am

