SLUG Mailing List Archives
Re: [SLUG] Why XML bites and why it is NOT a markup language
- To: slug@xxxxxxxxxxx
- From: Jamie Honan <jhonan@xxxxxxxxxxxxxxxx>
- Date: Thu, 9 Jun 2005 21:13:49 +1000
- User-agent: Mutt/1.4.1i
> Yes I realise that in an ideal world the <?xml?> tag would contain
> encoding information and yes I realise that in order to be correct UTF-8
> it must encode characters above 127 in a special way and this encoding
I'm not an xml fan nor an expert (calling Mike and others), but as I
understand things you have valid xml (which includes valid encoding) and
other stuff.
Plainly, you have _other stuff_ pretending to be xml. It wouldn't
pass a validating parser, thus it is not xml.
That's the whole point. It has to pass strict rules to be called xml.
> And here comes the gist of this rant...
Good, hope it's cathartic.
> The fundamental difference between a programming language and a markup
> language is that a programming language can have parser errors and
> syntax errors whereas a markup language cannot (by definition) have
> any errors at all under any circumstances. The parser for a markup
Not as I see it. The markup must pass tests as well.
> language must be fully robust to all possible inputs, and although it
> can certainly emit WARNINGs of various severity, nothing must
> stop the parser.
OK. Present random bits to your black box, stand back and declare,
"Call yeself a parser eh, well parse this Jimmy!"
> Fundamentally, XML is crap as a markup language because it simply
> isn't possible to build a fully robust parser. Worse than that, you
> can't recover state (even approximately) in the presence of a damaged
> document. XML is brittle, as brittle as any programming language.
No, xml is just overhyped. That's not its fault. xml requires more
resources than csv files, but it does more.
> Let's make a simple comparison... suppose I do all my data transfer
> by simple tab delimited ASCII files with one record per line.
> If a line gets damaged, I might lose that line, I might even lose
> the line after the damaged line but at least I have the rest of the
> document. If I jump into a plain ASCII file at a random location then
> I can scan around the local area until I find the end of a line and
> I can resynchronise to the local records. This technique can be used
> to perform a fast binary search directly into an ASCII file that is
> sorted by line -- can you do this with XML? Of course not... your
> basic parse-state is broken the moment you seek to anywhere at all,
> and that state is perpetually unrecoverable because something that
> looks like a tag can exist within a string or you can have a CDATA
> or some other stupid thing.
I think the key is 'validating'...
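
For anyone following along, the seek-and-resynchronise search described in
the quoted paragraph can be sketched roughly as below. This is only an
illustration, not anyone's actual code: the function names, the choice of
Python, and the assumption that the file is sorted in byte order on its
first tab-delimited field are all mine.

```python
import os


def line_at_or_after(f, pos):
    """Return (offset, line) for the first line starting at or after pos."""
    if pos > 0:
        # Land one byte early and finish that line: this is the
        # "scan around until you find the end of a line" step.
        f.seek(pos - 1)
        f.readline()
    else:
        f.seek(0)
    start = f.tell()
    return start, f.readline()


def key_of(line):
    """First tab-delimited field of a raw line (bytes)."""
    return line.rstrip(b"\r\n").split(b"\t", 1)[0]


def find_record(path, key):
    """Binary-search a line-sorted, tab-delimited file for key.

    Assumes (hypothetically) ASCII content sorted by the first field.
    Returns the matching record as a list of fields, or None.
    """
    target = key.encode("ascii")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line has key >= target.
        while lo < hi:
            mid = (lo + hi) // 2
            _, line = line_at_or_after(f, mid)
            if line and key_of(line) < target:
                lo = mid + 1
            else:
                hi = mid
        _, line = line_at_or_after(f, lo)
    if line and key_of(line) == target:
        return line.rstrip(b"\r\n").decode("ascii").split("\t")
    return None
```

Each probe reads at most two lines, so the cost grows with the logarithm of
the file size rather than with the number of records.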
> I've got three answers to the above. Most importantly, you don't
> use markup language for talking to a Mars rover... you use a
> programming language and we all agree that programming languages
> are brittle and always will be. Another (still significant)
> point is that you can always take a robust language (e.g. simple
> TAB delimited text file) and make it robust by adding a CRC or
> some sort of signature system... you cannot take a brittle
> language and make it robust.
I'm having trouble parsing this :) robust and robust?
Maybe the difference between communication protocol and
> Finally, your brittle language
> still isn't full protection against a comms failure because
> sometimes a single bit flip (like turning "1" into "3") will
> have disastrous consequences to the command but will look fine to
> the parser. So the parser isn't real safety, it is at best a
> false sense of safety. You still need CRCs and the like.
That's got to be a different issue. You parse after you get
known correct data ...
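
A rough sketch of the "add a CRC to each line" idea from the quoted text,
which also shows the check happening before any parsing. The record layout
(CRC-32 as eight hex digits in a trailing field) and the function names are
assumptions of mine, chosen only for illustration.

```python
import zlib


def add_crc(fields):
    """Join fields with tabs and append a CRC-32 of the record as hex."""
    body = "\t".join(fields)
    crc = zlib.crc32(body.encode("utf-8"))
    return f"{body}\t{crc:08x}"


def read_records(lines):
    """Return (good_records, damaged_count) for an iterable of lines.

    A line whose trailing CRC does not match its body is counted and
    skipped; every other line is still recovered.
    """
    good, damaged = [], 0
    for line in lines:
        body, sep, crc_hex = line.rstrip("\n").rpartition("\t")
        expected = f"{zlib.crc32(body.encode('utf-8')):08x}"
        if sep and crc_hex == expected:
            good.append(body.split("\t"))
        else:
            damaged += 1
    return good, damaged
```

With this layout, a record such as "turn<TAB>1" that arrives as
"turn<TAB>3" fails the check and is dropped, while the lines around it
survive.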
> Thus if anyone is going to design a communications language it
> should be robust, and that means it can recover from problems
> and can guarantee resynchronisation from an arbitrary seek.
> XML doesn't live up to the promise of being a universal markup
> language because it is too annoying and too brittle.
> By the way, how DO I get perl to read such a file?
> Do I have to write my own parser?
Hack-o-matic. You think it's mac, yah? Maybe run it first
through something that converts to UTF-8 or whatever then try
to parse. But, fundamentally, you've been lied to. It isn't xml.
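
Something like the following is what I mean by converting first and parsing
second. It is a sketch in Python rather than perl, purely for illustration,
and it assumes you already know (or have guessed) the real encoding and that
the file carries no encoding declaration of its own. The function name and
the sample bytes are made up.

```python
import xml.etree.ElementTree as ET


def parse_after_transcoding(raw, source_encoding):
    """Parse raw bytes as XML, transcoding to UTF-8 if a strict parse fails.

    source_encoding is a guess supplied by the caller (for example
    "latin-1" or "mac_roman"); nothing here detects it.
    """
    try:
        # A strict parser assumes UTF-8 when no encoding is declared.
        return ET.fromstring(raw)
    except ET.ParseError:
        pass
    # Reinterpret the bytes in the guessed encoding, then hand the
    # parser genuine UTF-8. Assumes no conflicting <?xml encoding=...?>.
    text = raw.decode(source_encoding)
    return ET.fromstring(text.encode("utf-8"))


if __name__ == "__main__":
    broken = b"<note>caf\xe9</note>"  # Latin-1 byte, not valid UTF-8
    print(parse_after_transcoding(broken, "latin-1").text)
```

If the guess at the encoding is wrong you still get a parse, just with the
wrong characters in it, so check the output by eye.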
A page of xml alternatives:
Some of my favourites, which you will hate because they
are similarly strict:
yaml (good perl support, very readable by humans)
(two implementation versions - s expressions, think lisp data)
and my current belle-o-the-ball:
ubf is stricter than xml, smaller and good for marshalling.
I'm glad you're cranky Telford, seeing someone else in a bad
mood makes me feel better.