Monday, August 8, 2011

Angle Bracket Backlash In Protocol Development: XML vs JSON

At IETF 81, I found myself in more than one discussion about XML vs JSON, all of which had the same tone: "XML sucks, we are all moving to JSON." Having been in one or two angle bracket fiascos in my life, I understand the sentiment. But I just can't square the points being made.

First, there was a repeated talking point that JSON is what gets used with Python and Ruby on the server side, the unstated implication being that XML can't be done in those languages. This is false; Python has had XML APIs for some time. The same is true of Ruby, which does not have a built-in JSON parser unless one considers JSON a proper subset of YAML (more on this in a bit).
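To make the point concrete, here is a minimal Python 3 sketch using only the standard library. It reads the same record from XML with xml.etree.ElementTree and from JSON with the json module. The record and its element and member names are hypothetical, made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in both formats.
XML_DOC = '<domain><name>example.com</name><period unit="y">2</period></domain>'
JSON_DOC = '{"domain": {"name": "example.com", "period": {"unit": "y", "value": 2}}}'

def from_xml(text):
    """Pull the record out of XML with ElementTree, building a plain dict."""
    root = ET.fromstring(text)
    period = root.find("period")
    return {
        "name": root.findtext("name"),
        "period": {"unit": period.get("unit"), "value": int(period.text)},
    }

def from_json(text):
    """The JSON version maps straight onto dicts and lists."""
    return json.loads(text)["domain"]

if __name__ == "__main__":
    print(from_xml(XML_DOC))
    print(from_xml(XML_DOC) == from_json(JSON_DOC))
```

The JSON side is shorter because the data already arrives as dicts. Still, reading the XML takes only a few lines, and nothing outside the standard library is needed.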

Beyond that, any talk of scrapping XML for JSON meanders to two other topics: a JSON schema language and namespaces. Officially, JSON has neither.

Schema languages are important for writing protocol specifications because they greatly reduce ambiguity in prose. It is why the IETF has its own form of BNF (ABNF). In some XML-based provisioning protocols, such as EPP, validation against a schema is a useful guard to prevent bad data from being committed to a database. JSON does have a schema effort: http://json-schema.org. However, the draft seems to have expired, suggesting that the effort is stumbling (let's hope I'm wrong and work continues).
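Here is a minimal Python sketch of that guard: validate first, commit only if the data conforms. It uses a tiny hypothetical schema notation in which each field name maps to an expected type and a required flag. This is not JSON Schema or any real schema language. The contact fields and the dict standing in for a database are also assumptions made for illustration.

```python
import json

# Hypothetical, deliberately tiny schema notation: field -> (expected type, required?).
# NOT JSON Schema; it only illustrates "validate, then commit".
CONTACT_SCHEMA = {
    "id": (str, True),
    "email": (str, True),
    "voice": (str, False),
}

def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    if not isinstance(record, dict):
        return ["top level must be an object"]
    problems = []
    for field, (expected, required) in schema.items():
        if field not in record:
            if required:
                problems.append(f"missing required field {field!r}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field!r} should be {expected.__name__}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field {field!r}")
    return problems

def commit_if_valid(text, db):
    """Parse JSON text and store it in db (a dict standing in for a database) only if valid."""
    record = json.loads(text)
    problems = validate(record, CONTACT_SCHEMA)
    if not problems:
        db[record["id"]] = record
    return problems

if __name__ == "__main__":
    db = {}
    print(commit_if_valid('{"id": "c1", "email": "a@example.com"}', db))
    print(commit_if_valid('{"id": "c2", "email": 5, "fax": "x"}', db))
    print(sorted(db))
```

A real schema language earns its keep by letting a specification state these rules once, declaratively, instead of in hand-written checks like these.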

Namespaces are important to stop protocol element collisions between separate specifications, and they are frequently used as a protocol extension mechanism. There are many proposals for JSON namespaces; the first I encountered was the BadgerFish convention. It works and provides all the functionality you get with XML namespaces, but I think it is fairly obvious that the results are just as bad as, if not worse than, what you get in XML.
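To give a feel for what BadgerFish produces, here is a simplified Python converter. It follows the convention as I understand it:

- text content goes under "$"
- attributes are prefixed with "@"
- in-scope namespaces go under "@xmlns", repeated on every element

It handles only default namespaces, drops namespaces on attributes, and ignores mixed content, so treat it as an illustration rather than a reference implementation.

```python
import json
import xml.etree.ElementTree as ET

def _split(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

def to_badgerfish(elem):
    """Simplified BadgerFish-style conversion of one element (default namespaces only)."""
    uri, _ = _split(elem.tag)
    node = {}
    if elem.text and elem.text.strip():
        node["$"] = elem.text.strip()          # text content
    for name, value in elem.attrib.items():
        node["@" + _split(name)[1]] = value    # attributes, namespace dropped
    for child in elem:
        _, local = _split(child.tag)
        converted = to_badgerfish(child)
        if local in node:                      # repeated children become a list
            if not isinstance(node[local], list):
                node[local] = [node[local]]
            node[local].append(converted)
        else:
            node[local] = converted
    if uri is not None:
        node["@xmlns"] = {"$": uri}            # namespace repeated on each element
    return node

def convert(xml_text):
    root = ET.fromstring(xml_text)
    return {_split(root.tag)[1]: to_badgerfish(root)}

if __name__ == "__main__":
    doc = '<alice xmlns="http://some-namespace"><bob>david</bob><charlie>edgar</charlie></alice>'
    print(json.dumps(convert(doc), indent=2))
```

Even this small document carries the namespace three times, once per element.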

There are other proposals, such as using domain names (and there is an even simpler version of that, but I can't find the link). This proposal, and ones like it, boil down to always using the same globally unique prefix (more on why the dotted notation is not good in a bit). Anybody who has done their fair share of C coding will be familiar with the concept of prefixing every identifier by hand, and its drawbacks are why C++ has namespaces, Java has packages, Ruby has modules, etc. In theory it sounds good; in practice, not so much.
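Here is a minimal Python sketch of the domain-name prefix idea. Two hypothetical extensions both define a "color" member, and prefixing each member with a domain the extension author controls keeps them from colliding. The reverse-DNS form and the merge helper are my assumptions; the actual proposals differ in detail.

```python
import json

# Two hypothetical extensions, written independently, that both want a
# "color" member. Each prefixes its members with a domain its author
# controls (the reverse-DNS form here is an assumption; proposals differ).
EXT_A = {"com.example.color": "red"}
EXT_B = {"net.example.color": "blue"}

def merge_extensions(base, *extensions):
    """Merge extension members into a copy of base, refusing silent overwrites."""
    merged = dict(base)
    for ext in extensions:
        for key, value in ext.items():
            if key in merged:
                raise KeyError(f"member collision on {key!r}")
            merged[key] = value
    return merged

if __name__ == "__main__":
    doc = merge_extensions({"name": "widget"}, EXT_A, EXT_B)
    print(json.dumps(doc))
    # Every consumer must now spell out the full prefix on each access.
    print(doc["com.example.color"], doc["net.example.color"])
```

The collision is avoided, but every reader and writer pays for it on every access. That is the practical cost the C comparison points at.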

The Rise of JSON

Truth be told, JSON's popularity comes from the fact that JavaScript in the web browser handles JSON easily. If Java applets had ever caught on, perhaps we'd be talking about Java Serialization instead of JSON and RMI instead of Ajax (incidentally, Java Serialization has ready answers to the schema and namespace quandaries). This leads to some insight from Tim Bray:
Seems easy to me; if you want to serialize a data structure that’s not too text-heavy and all you want is for the receiver to get the same data structure with minimal effort, and you trust the other end to get the i18n right, JSON is hunky-dory.
I should point out that the effort is only minimal when the language in use is JavaScript. To better understand this, let us talk about the Ruby+JSON nexus mentioned above.

If a Ruby developer were to do what comes naturally to Ruby developers and serialize an object hierarchy with the built-in YAML engine, the result would not be digestible by a JavaScript engine. The output would be specific to the Ruby object hierarchy and class structure, and its syntax would probably not be JSON compliant (all JSON syntax may be YAML compliant, but the converse is not true).
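Python offers a rough analog. In the sketch below, pickle plays the role of Ruby's YAML object dump; that is my substitution, since pickle is a binary format, not YAML. The sketch uses a hypothetical Contact class to show two things:

- The native format round-trips the class, but its output isn't JSON.
- The json module only accepts plain data, so the receiver gets a dict back rather than the original object.

```python
import json
import pickle
from dataclasses import asdict, dataclass

@dataclass
class Contact:
    """A hypothetical user-defined class standing in for an application object."""
    handle: str
    email: str

def native_round_trip(obj):
    """Python-native serialization keeps the class, but the bytes are Python-only."""
    return pickle.loads(pickle.dumps(obj))

def json_round_trip(obj):
    """JSON carries only plain data: flatten to a dict first, get a dict back."""
    return json.loads(json.dumps(asdict(obj)))

def json_accepts(obj):
    """True if json.dumps can serialize obj as-is (it refuses arbitrary objects)."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False

def is_json(data):
    """True if data parses as JSON text."""
    try:
        json.loads(data)
        return True
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False

if __name__ == "__main__":
    c = Contact("c1", "a@example.com")
    print(native_round_trip(c))        # a Contact instance again
    print(is_json(pickle.dumps(c)))    # the native format is not JSON
    print(json_accepts(c))             # json.dumps rejects the object itself
    print(json_round_trip(c))          # a plain dict, not a Contact
```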

The results would be a little better with the Ruby program ingesting JSON, but they still wouldn't be the "Ruby way", meaning user-defined objects instead of arrays and hashes. However, that is possible with XML with a little code, as demonstrated here... which brings up why the dotted-name syntax has problems: the dots make such names unusable as object attribute names (the same is true for XML element names containing dashes, or minus signs).
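Here is a rough Python analog of mapping documents onto objects with a little code. It uses types.SimpleNamespace as a stand-in for user-defined classes; that is my assumption, not the code from the linked example. It shows the problem with the names: a key like "com.example.color" or an element like "first-name" still becomes an attribute, but one reachable only through getattr, never with ordinary dot syntax.

```python
import json
import xml.etree.ElementTree as ET
from types import SimpleNamespace

def xml_to_obj(elem):
    """Map child elements to attributes; leaf elements become their text.
    Simplification: repeated child names overwrite each other and
    XML attributes are ignored."""
    if len(elem) == 0:
        return elem.text
    return SimpleNamespace(**{child.tag: xml_to_obj(child) for child in elem})

def json_to_obj(text):
    """Turn every JSON object into an attribute-style object."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

if __name__ == "__main__":
    person = xml_to_obj(ET.fromstring(
        "<person><name>Ann</name><first-name>Ann</first-name></person>"))
    print(person.name)                    # fine: "name" is a valid identifier
    print(getattr(person, "first-name"))  # person.first-name parses as person.first minus name

    ext = json_to_obj('{"size": 3, "com.example.color": "red"}')
    print(ext.size)
    print(getattr(ext, "com.example.color"))  # ext.com.example.color raises AttributeError
```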

The Fall of JSON

While Googling around for JSON information, I noticed that a lot of the XML vs JSON discussions and JSON schema/namespace proposals are years old, which gives me the impression that movement on JSON has slowed considerably. That impression may be wrong, but if it is right, let me offer this hypothesis.

JSON gained rapid popularity because of the new and heightened use of AJAX-like widgets on web sites, whose client side is written in JavaScript. These things were hot stuff... three years ago. Nowadays, mobile programming is all the rage, which means Objective-C for iOS and Java for Android, and JSON's natural fit on the client side evaporates.

Conclusion

I hope this post doesn't make it sound like I'm an XML apologist. There are certainly a slew of issues with XML and a large number of misuses of it (SOAP, WS-Everything). Something new and better would be good, but that new and better thing needs more clearly defined benefits than the merely subjective "It's just simpler."

Some Good Reads on This Topic
  1. The Limitations of JSON
  2. JSON and XML - Tim Bray
  3. On Namespaces in JSON - Michael Hanson
  4. XML and JSON - James Clark
    • Money quote: "You can't partition the world of information neatly into documents and data."
  5. JSON Namespacing - Kris Zyp
