Mapping bibliographic record subfields to JSON

April 13, 2011 at 16:26, 4 comments

The current issue of the Code4Lib Journal contains an article by Luciano Ramalho about mapping a bibliographic record format to JSON. Luciano describes two approaches to expressing the CDS/ISIS format in a JSON structure for use in CouchDB. The article has already provoked some comments. That’s how an online journal should work!

The commenters mentioned Ross Singer’s proposal to serialize MARC in JSON and Bill Dueber’s MARC-HASH. There is also a MARC-JSON draft from Andrew Houghton, OCLC. The ISIS format reminded me of the PICA format, which is also based on fields and subfields. As Luciano notes, you must preserve subfield ordering and allow for repeated subfields. The existing proposals encode subfields as follows:

Luciano’s ISIS/JSON:

[ ["x","foo"],["a","bar"],["x","doz"] ]

Ross’s MARC/JSON:

"subfields": [ {"x":"foo"},{"a":"bar"},{"x":"doz"} ]

Bill’s MARC-HASH:

[ ["x","foo"],["a","bar"],["x","doz"] ]

Andrew’s MARC/JSON:

"subfield": [
  {"code":"x","data":"foo"},
  {"code":"a","data":"bar"},
  {"code":"x","data":"doz"} ]
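Since all of these encodings carry the same ordered list of code/value pairs, converting between them is mechanical. The following is a minimal Python sketch of that idea, not code from any of the proposals. The function names are made up for illustration, and only the subfield part of a field is handled.

```python
import json

# Ordered [code, value] pairs, as in the array-of-pairs examples above.
PAIRS = [["x", "foo"], ["a", "bar"], ["x", "doz"]]

def to_single_key_objects(pairs):
    """One object per subfield: [{"x": "foo"}, {"a": "bar"}, ...]."""
    return [{code: value} for code, value in pairs]

def from_single_key_objects(objs):
    """Inverse of to_single_key_objects; list order is kept."""
    return [[code, value] for obj in objs for code, value in obj.items()]

def to_code_data_objects(pairs):
    """One object per subfield with explicit "code" and "data" keys."""
    return [{"code": code, "data": value} for code, value in pairs]

def from_code_data_objects(objs):
    """Inverse of to_code_data_objects; list order is kept."""
    return [[obj["code"], obj["data"]] for obj in objs]

print(json.dumps(to_single_key_objects(PAIRS)))
print(json.dumps(to_code_data_objects(PAIRS)))
```

In every variant the outer container is a JSON array, which is what preserves subfield order and allows the code x to appear twice. A single object keyed by subfield code could do neither.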

In the end, the specific encoding does not matter that much. The best form depends on which operations and access patterns are typical for your use case. However, I could not resist throwing the encoding I use in luapica into the ring:

{ "foo", "bar", "doz",
  ["codes"] = {
    ["x"] = {1,3},
    ["a"] = {2}
  }
}

I am thinking about simplifying this further to:

{ "foo", "bar", "doz", ["x"] = {1,3}, ["a"] = {2} }

If f is a field, then you can access subfield values by position (f[1], f[2], f[3]) or by subfield code (f[f.x[1]], f[f.a[1]], f[f.x[2]]). By overloading the table access method and adding a few functions, you can write f.x instead of f[f.x[1]] to get the first subfield value with code x, and f:all("x") to get a list of all subfield values with that code. The same structure in JSON would be one of:

{ "values":["foo", "bar", "doz"], "x":[1,3], "a":[2] }
{ "values":["foo", "bar", "doz"], "codes":{"x":[1,3], "a":[2]} }
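To make the access pattern concrete, here is a rough Python sketch of a field with a value list and a code index. It is not the luapica implementation. The class Field and its methods at, first and all are hypothetical names chosen for this example, and positions are 1-based to match the Lua and JSON examples above.

```python
import json

class Field:
    """Subfield values in order, plus an index from code to 1-based positions."""

    def __init__(self, pairs):
        self.values = [value for _, value in pairs]
        self.codes = {}
        for position, (code, _) in enumerate(pairs, start=1):
            self.codes.setdefault(code, []).append(position)

    def at(self, position):
        """Value by 1-based position, like f[1] in the Lua table."""
        return self.values[position - 1]

    def first(self, code):
        """First value with this code, or None if the code is absent."""
        positions = self.codes.get(code)
        return self.at(positions[0]) if positions else None

    def all(self, code):
        """All values with this code, in field order."""
        return [self.at(p) for p in self.codes.get(code, [])]

    def to_json(self):
        """Serialize in the "values" plus "codes" form."""
        return json.dumps({"values": self.values, "codes": self.codes})

f = Field([("x", "foo"), ("a", "bar"), ("x", "doz")])
print(f.first("x"), f.all("x"))
print(f.to_json())
```

Lookup by code goes through the index instead of scanning all subfields, which is the point of storing the positions alongside the values.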

I think a good, compact mapping to JSON that includes an index could be:

[ ["x", "a", "x"], {"x":[1,3], "a":[2] },
  ["foo", "bar", "doz"], {"foo":[1], "bar":[2], "doz":[3] } ]
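As a sketch of how such a four-part structure could be produced from ordered pairs, the Python below builds both position indexes and reconstructs the pairs again. The helper names build_index, to_compact and from_compact are assumptions made for this example, and positions are 1-based as in the mapping above.

```python
def build_index(items):
    """Map each distinct item to the list of its 1-based positions."""
    index = {}
    for position, item in enumerate(items, start=1):
        index.setdefault(item, []).append(position)
    return index

def to_compact(pairs):
    """[codes, code index, values, value index] from ordered pairs."""
    codes = [code for code, _ in pairs]
    values = [value for _, value in pairs]
    return [codes, build_index(codes), values, build_index(values)]

def from_compact(compact):
    """Rebuild the ordered pairs; only the two lists are needed."""
    codes, _, values, _ = compact
    return [[code, value] for code, value in zip(codes, values)]

pairs = [["x", "foo"], ["a", "bar"], ["x", "doz"]]
compact = to_compact(pairs)
print(compact)
```

Both indexes can be derived from the two lists, so they add size but no information. Whether they are worth storing depends on how often you look up by code or by value.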

And, of course, the most compact form is: