SLUG Mailing List Archives
Re: [SLUG] Re: Why XML bites and why it is NOT a markup language
- To: slug@xxxxxxxxxxx
- Subject: Re: [SLUG] Re: Why XML bites and why it is NOT a markup language
- From: Jamie Honan <jhonan@xxxxxxxxxxxxxxxx>
- Date: Mon, 13 Jun 2005 17:27:13 +1000
- User-agent: Mutt/1.4.1i
We're probably abusing the list here ...
> You can imprint a record-oriented structure onto a stream format
> by using tags in the stream but trying to support a stream by using
> a record format is really ugly (not impossible). It is desirable
> to have a format that makes it easy to build higher level formats on
> top of rather than a format which is already so high level that it
> becomes cumbersome for ordinary tasks.
Going back to my DVB example, the MPEG data is a natural fit inside
of 188 byte blocks. However, the program guides and other info
aren't. So they have sequence numbers, continuation markers
and other sundry transport stuff. Nasty, but necessary because
of temporal data needs.
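
To make that transport overhead concrete, here is a rough sketch of
reading the 4-byte header of a 188-byte MPEG transport packet. The
function names are my own, and the continuity check is simplified: it
ignores duplicate packets and packets that carry no payload.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Pull the transport bookkeeping fields out of one packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost sync: first byte is not 0x47")
    return {
        # Set when a new section or PES packet starts in this payload.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier: which sub-stream this packet belongs to.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 4-bit sequence number, counted per PID, wraps at 16.
        "continuity_counter": packet[3] & 0x0F,
    }


def continuity_ok(previous: int, current: int) -> bool:
    """Simplified check: the counter should step by one, modulo 16."""
    return current == (previous + 1) % 16


# A hand-built example packet: header bytes followed by empty payload.
example = bytes([0x47, 0x40, 0x12, 0x17]) + bytes(184)
header = parse_ts_header(example)
```

A table that does not fit in one packet is spread over several packets
with the same PID, and the receiver uses the start flag and the counter
to stitch it back together or notice that something went missing.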
> Providing you don't want to seek an XML stream...
Seeking. Ohoh, you've introduced new material at a late stage!
Seeking and stream oriented may be asking a bit much. Seeking
is very hard when you have escape characters because the
block lengths may not be exactly the same.
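
A small sketch of why, assuming a made-up format with newline-separated
records and a backslash escape (none of this is from any real spec).
Every payload is 4 bytes, but one of them contains the separator, so
its escaped form is a byte longer and the offset arithmetic breaks.

```python
ESC = b"\\"
SEP = b"\n"


def escape(payload: bytes) -> bytes:
    """Escape the escape character first, then the separator."""
    return payload.replace(ESC, ESC + ESC).replace(SEP, ESC + b"n")


def unescape(data: bytes) -> bytes:
    """Reverse escape() by walking the bytes one at a time."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == ESC:
            nxt = data[i + 1:i + 2]
            out += SEP if nxt == b"n" else nxt
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def naive_seek(stream: bytes, n: int, size: int = 4) -> bytes:
    """Jump straight to record n, assuming every record is size + 1 bytes."""
    start = n * (size + 1)
    return stream[start:start + size]


def scan_seek(stream: bytes, n: int) -> bytes:
    """Find record n the slow way, by scanning for separators."""
    return unescape(stream.split(SEP)[n])


records = [b"abcd", b"a\ncd", b"wxyz"]
stream = SEP.join(escape(r) for r in records) + SEP
```

Here naive_seek(stream, 2) lands one byte early because record 1 grew
when it was escaped, while scan_seek(stream, 2) finds the right record
but has to read everything before it.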
> Not at all. Validation is a higher level function and should
> be treated as such. Building a layered architecture is far more
> reliable, flexible and maintainable than building a monolithic
> architecture. Thus the job of the parser is to: read the raw data
> stream; identify the tags; identify the data blocks and provide
> an API that gives access to these entities (and nothing else).
I guess xml proponents would be saying that xml is that higher level.
> XML does not remove the requirement to sanity-check the data
> you are given.
Yes, that is true. That is why I mentioned UBF in a previous
email. It has a validation stage which is based on
a contract definition between two applications.
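
Roughly the idea, as a sketch only. This is not UBF's notation or API;
the CONTRACT table, the message names and check_message are all
invented here to show a contract being checked separately from parsing.

```python
# Hypothetical contract: message type -> required fields and their types.
CONTRACT = {
    "login": {"user": str, "attempt": int},
    "logout": {"user": str},
}


def check_message(kind: str, fields: dict) -> list:
    """Return a list of contract violations; an empty list means it passes."""
    spec = CONTRACT.get(kind)
    if spec is None:
        return [f"unknown message type: {kind!r}"]
    problems = []
    for name, expected in spec.items():
        if name not in fields:
            problems.append(f"missing field: {name}")
        elif not isinstance(fields[name], expected):
            problems.append(f"field {name} should be {expected.__name__}")
    for name in fields:
        if name not in spec:
            problems.append(f"unexpected field: {name}")
    return problems
```

The parser underneath only has to hand over well-formed messages; the
contract check decides whether either side is allowed to send them.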
> belongs to... but all that is an optional add-on. As I mentioned earlier,
> you can take a robust protocol and deliberately make it brittle if that
> is useful for your application but you can't go the other way. So start
> with something simple and robust and add the brittle bits if and when
> you happen to need them.
This is the bit I don't buy. You seem to want to add redundancy
to get around problems with a transport layer (or is it human
problems?).
If it's transport problems, then you need to design something
special for the kind of transport you are supporting, e.g.
like DVB does. If it's human problems, well, I just don't think
you'll ever solve the problem of humans getting away with the
least amount of effort...
> Then again, you could look at how Wikipedia keeps metadata and that
> also supports high quality resynchronisation because it depends on
> particular patterns being detected and anything that doesn't match a
> I mean, if XML is so fantastic, why does every Wiki avoid it
> (for the wiki data pages at any rate)?
Now you're really getting to the meat of what many people
find wrong with xml. It's butt-ugly. It requires a fairly large
No-one wants to type in raw xml.
> ... until you want to seek, or someone manages to introduce a single
> byte of bad data into your database through perhaps a bug or simple
> user idiocy that no programmer was expecting, or someone wanted to
> tweak a record in the data but there was no program function for it
> so they just tweaked the record in a text editor (and screwed something
> at the same time) or all of those other things that really do happen
> especially when everyone is sure they can't happen.
Now hang on a minute. We've just gone down the road of special
escape characters, low ascii characters, and maximum transport lengths.
You can't use an editor successfully with these.
'Someone just tweaked a record'.
See, you can't have unreliable data and a bunch of ad-hoc rules
and end up with reliable data.
Maybe the parser could give you more sensible warnings and errors,
yes, but it still can't fix your mistakes reliably.
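
For what it's worth, some parsers already report where things went
wrong. A small sketch using Python's standard xml.etree.ElementTree;
describe_xml_error is just a helper name I made up.

```python
import xml.etree.ElementTree as ET


def describe_xml_error(text: str):
    """Return a readable error location, or None if the text parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # position is a (line, column) pair supplied by the parser.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None
```

Feed it a hand-edited record such as "<a><b></a>" and it tells you
where the parse failed, but it has no way of knowing whether you meant
to close <b> or never meant to open it.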
> I'm 100% confident that XML will cut itself off at the knees.
> Programming fads go in approximately 10 year cycles and I think XML is
> about 5 years in and already people are asking "why bother?"
Well, xml is a simplified version of SGML, meant to strip out the
over-complicated features of SGML. I have a copy of Goldfarb's
SGML Handbook, published in 1990. In it, it is claimed the
first working draft of the SGML standard was published in 1980.
That's 25 years.
I think the trend is towards more correctness of data, more
strictness in specifying inter-process communication, away from
There are problems with xml in certain domains; it's verbose,
it's not very readable, it's expensive to parse for certain
It doesn't lend itself to every problem domain, and yet is often
pushed to do so (soap, xslt to name a few off the top of my head).
> The other cool thing is that since XML is so nicely structured,
> any existing documents that DO go through XML parsers can rapidly
> be converted to any other tagged format. If a new format comes along
> that does everything XML does AND is easier to use AND more robust
> AND supports more documents then adoption is relatively painless.
Hey, that's one of the promises of xml! See, it can't be all bad.