Re: [SLUG] Why XML bites and why it is NOT a markup language


> Yes I realise that in an ideal world the <?xml?> tag would contain
> encoding information and yes I realise that in order to be correct UTF-8
> it must encode characters above 127 in a special way and this encoding

I'm not an xml fan nor an expert (calling Mike and others), but as I
understand things you have valid xml (which includes valid encoding) and
_other stuff_.

Plainly, you have _other stuff_ pretending to be xml. It wouldn't
get past a conforming parser, never mind a validating one, thus it
is not xml.

That's the whole point. It has to pass strict rules to be called xml.
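
To make the "strict rules" point concrete, here is a minimal sketch in
Python. It assumes the standard library's xml.etree.ElementTree, which is
a non-validating parser: it only checks well-formedness, the lowest bar a
document has to clear. The helper name is_well_formed is made up for
illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Any well-formedness error is fatal: the parser stops here
        # rather than guessing what the author meant.
        return False

print(is_well_formed("<note><to>SLUG</to></note>"))  # balanced tags
print(is_well_formed("<note><to>SLUG</note>"))       # <to> is never closed
```

Anything that fails this check is _other stuff_, whatever the file
extension says.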

> And here comes the gist of this rant...

Good, hope it's cathartic.

> The fundamental difference between a programming language and a markup
> language is that a programming language can have parser errors and
> syntax errors whereas a markup language cannot (by definition) have
> any errors at all under any circumstances. The parser for a markup

Not as I see it. The markup must pass tests as well.

> language must be fully robust to all possible inputs; it certainly
> can emit WARNINGs of various severity, but nothing must stop the
> parser.

OK. Present random bits to your black box, stand back and declare,
"Call yeself a parser eh, well parse this Jimmy!"

> Fundamentally, XML is crap as a markup language because it simply
> isn't possible to build a fully robust parser. Worse than that, you
> can't recover state (even approximately) in the presence of a damaged
> document, XML is brittle, as brittle as any programming language.

No, xml is just overhyped. That's not its fault. xml requires more
resources than csv files, but it does more.

> Let's make a simple comparison... suppose I do all my data transfer 
> by simple tab delimited ASCII files with one record per line.
> If a line gets damaged, I might lose that line, I might even lose
> the line after the damaged line but at least I have the rest of the
> document. If I jump into a plain ASCII file at a random location then
> I can scan around the local area until I find the end of a line and
> I can resynchronise to the local records. This technique can be used
> to perform a fast binary search directly into an ASCII file that is
> sorted by line -- can you do this with XML? Of course not... your
> basic parse-state is broken the moment you seek to anywhere at all,
> and that state is perpetually unrecoverable because something that
> looks like a tag can exist within a string or you can have a CDATA
> or some other stupid thing.

I think the key is 'validating'...
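
For reference, the seek-and-resynchronise trick described in the quoted
text can be sketched roughly as below. This is only an illustration, not
anyone's production code. It assumes a file of newline-terminated
"key<TAB>value" records sorted by the raw bytes of the key; the names
find_record and line_after are made up.

```python
import io

def find_record(f, key, size):
    """Binary search a sorted, line-oriented file opened in binary mode.

    f is seekable, size is its length in bytes, key is a bytes object.
    Returns the matching line (bytes) or None.
    """
    def line_after(pos):
        # Resynchronise: jump anywhere, throw away the partial line we
        # landed in, and return the first complete line after it.
        f.seek(pos)
        f.readline()
        return f.readline()

    # line_after() can never return the very first line, so check it here.
    f.seek(0)
    first = f.readline()
    if first.split(b"\t", 1)[0] == key:
        return first

    # Find the smallest offset whose following line has a key >= target.
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        line = line_after(mid)
        if not line or line.split(b"\t", 1)[0] >= key:
            hi = mid
        else:
            lo = mid + 1

    line = line_after(lo)
    if line and line.split(b"\t", 1)[0] == key:
        return line
    return None

data = b"apple\t1\nbanana\t2\ncherry\t3\ndate\t4\n"
print(find_record(io.BytesIO(data), b"cherry", len(data)))
```

The whole thing depends on a newline always meaning "end of record",
which is exactly the property the quoted text says XML lacks.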

> I've got three answers to the above. Most importantly, you don't
> use markup language for talking to a mars rover... you use a 
> programming language and we all agree that programming languages
> are brittle and always will be. Another (still significant)
> point is that you can always take a robust language (e.g. simple
> TAB delimited text file) and make it robust by adding a CRC or
> some sort of signature system... you cannot take a brittle
> language and make it robust. 

I'm having trouble parsing this :) robust and robust?

Maybe the difference between communication protocol and
markup?

> Finally, your brittle language
> still isn't full protection against a comms failure because
> sometimes a single bit flip (like turning "1" into "3") will
> have disastrous consequences to the command but will look fine to
> the parser. So the parser isn't real safety, it is at best a
> false sense of safety. You still need CRCs and the like.

That's got to be a different issue. You parse after you get
known correct data ...
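
A rough sketch of that ordering in Python: check integrity first, parse
second. It assumes each tab-delimited record carries a CRC-32 of itself as
a final hex field; the framing and the names add_crc and check_crc are
invented for this example and are not any real protocol.

```python
import zlib

def add_crc(record):
    """Append the CRC-32 of the record as a final tab-separated hex field."""
    return f"{record}\t{zlib.crc32(record.encode('ascii')):08x}"

def check_crc(framed):
    """Return the record if its CRC matches, otherwise None."""
    record, _, crc = framed.rpartition("\t")
    if crc != f"{zlib.crc32(record.encode('ascii')):08x}":
        return None  # damaged in transit: do not even try to parse it
    return record

framed = add_crc("move\t1")
# Simulate the single bit flip from the quoted text: "1" becomes "3".
damaged = framed.replace("move\t1", "move\t3")

print(check_crc(framed))   # the intact record comes back
print(check_crc(damaged))  # None: the flip is caught before parsing
```

"move 3" would look perfectly fine to any parser, which is why the
integrity check has to sit in front of it.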

> Thus if anyone is going to design a communications language it
> should be robust and that means it can recover from problems
> and can guarantee resynchronisation from an arbitrary seek.
> XML doesn't live up to the promise of being a universal markup
> language because it is too annoying and too brittle.
> 
> By the way, how DO I get perl to read such a file?
> 
> Do I have to write my own parser?

Hack-o-matic. You think it's mac, yah? Maybe run it first
through something that converts to UTF-8 or whatever then try
to parse. But, fundamentally, you've been lied to. It isn't xml.
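
The question was about perl, but the "convert first, then parse" idea is
easy to show in Python. This sketch assumes the file really is Mac Roman
(a guess, as above) and that it carries no encoding declaration of its
own. The helper name parse_as is made up.

```python
import xml.etree.ElementTree as ET

def parse_as(raw, source_encoding):
    """Decode raw bytes from a guessed encoding, re-encode as UTF-8, parse."""
    text = raw.decode(source_encoding)
    return ET.fromstring(text.encode("utf-8"))

# A small document saved in Mac Roman with no encoding declaration.
raw = "<name>caf\u00e9</name>".encode("mac_roman")

try:
    ET.fromstring(raw)  # the parser assumes UTF-8 and chokes on the e-acute
except ET.ParseError as err:
    print("as delivered:", err)

print("after conversion:", parse_as(raw, "mac_roman").text)
```

If the guess about the source encoding is wrong, you get mojibake or a
decode error instead, so this only works when you know (or can find out)
what produced the file.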

BTW:

A page of xml alternatives:
http://www.pault.com/pault/pxml/xmlalternatives.html

Some of my favourites, which you will hate because they
are similarly strict:

yaml (good perl support, very readable by humans)
http://www.yaml.org/

sexp
(two implementation versions - s expressions, think lisp data)

and my current belle-o-the-ball:
ubf

http://www.sics.se/~joe/ubf/site/home.html

ubf is stricter than xml, smaller and good for marshalling.

I'm glad you're cranky, Telford. Seeing someone else in a bad
mood makes me feel better.

Regards
Jamie