SLUG Mailing List Archives
Re: [SLUG] Re: Why XML bites and why it is NOT a markup language
- To: slug@xxxxxxxxxxx
- Subject: Re: [SLUG] Re: Why XML bites and why it is NOT a markup language
- From: Jamie Honan <jhonan@xxxxxxxxxxxxxxxx>
- Date: Sat, 11 Jun 2005 09:15:27 +1000
- User-agent: Mutt/1.4.1i
I can't believe I'm defending xml.
I'm not a fan of it, but a lot of thought went into it, there's
a lot of agreement on it, and there are some very good ideas in it.
> Correct... and that's what makes HTML successful. The whole "world wide web"
> thing simply would not have happened if we started out with something as
> strict and breakable as XML.
We will never know if this is true.
There are other aspects of human nature which also come into play.
Pride in doing the 'correct thing', for example. If you wrote to
the person at the ABC who set the xml up, they may take enough
pride in what they've done that they would want to correct it.
> Then we need to accept that XML is not particularly useful and we need to
> start looking for something better.
You might need to convert a few people to your point of view.
But, on the principle of one person doing the right thing being
a good start, let's look at your list.
> I'd like to coin the name "RML" which
> stands for "Robust Markup Language" which should have the following
Don't be bashful here, Telford. I suggest "TOTRML": Telford's One True
Robust Markup Language.
> desirable properties:
> * stream-oriented construction
Stream is good, yes. PNG and JPEG are streamable. ASF is streamable,
and AVI suffers because it isn't. However, not all data is streamable.
Thus xml parsers present data as callbacks (kind of streaming),
or a walkable tree.
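
To make the two styles concrete, here is a minimal sketch using Python's standard library: `xml.sax` for callbacks and `xml.etree.ElementTree` for a walkable tree. The sample document and the names `IdCollector`, `ids_by_callback` and `ids_by_tree` are made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
DOC = b"<news><item id='1'>first</item><item id='2'>second</item></news>"


class IdCollector(xml.sax.ContentHandler):
    """Callback style: the parser calls startElement as each tag streams past."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "item":
            self.ids.append(attrs["id"])


def ids_by_callback(data):
    # Nothing is kept in memory except what the handler chooses to keep.
    handler = IdCollector()
    xml.sax.parseString(data, handler)
    return handler.ids


def ids_by_tree(data):
    # Tree style: the whole document is parsed first, then walked.
    root = ET.fromstring(data)
    return [item.get("id") for item in root.iter("item")]


print(ids_by_callback(DOC))
print(ids_by_tree(DOC))
```

Both give the same answer here. The difference is that the callback version never needs the whole document at once, while the tree version can be walked in any order after parsing.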
> * byte-oriented construction (no 16 bit encodings at all)
You mean no Unicode? As opposed to no binary mode?
> * supports arbitrary tags
Ah. You've just lost validation. You now can't prove your data
conforms to an agreed DTD.
That's OK, because I suspect Telford is talking Telford protocol
to Telford at this point.
> * supports parametric tags
Lots of people don't like parameters. They think they should be
in the data part. I don't mind, we are changing the world here.
> * never allow tags inside a tag definition
Hmm. You mean no hierarchical tags? Not sure here. Or you mean
tags are atomic... Fair enough.
> * NO guarantee of tags making a perfect tree (but parser can provide
> information about tree or partial-tree structures if they exist)
It's Rafferty's rules, anyway.
> * when tags are all next to one another, ordering is NOT important
> (thus italic/bold is the same as bold/italic)
Order not important. OK, can't marshal arrays.
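
A small sketch of why, assuming a parser that treats a run of adjacent tags as an unordered set (the function name `adjacent_tags_equal` is hypothetical, not part of any proposal):

```python
def adjacent_tags_equal(run_a, run_b):
    """Compare two runs of adjacent tags the way the wishlist asks: order ignored.

    A frozenset also ignores duplicates, which is another thing an array
    would need to preserve.
    """
    return frozenset(run_a) == frozenset(run_b)


# italic/bold is the same as bold/italic, as intended.
print(adjacent_tags_equal(["italic", "bold"], ["bold", "italic"]))

# But three adjacent tags carrying array elements compare equal in any order,
# so the reader cannot tell [1, 2, 3] from [3, 1, 2].
print(adjacent_tags_equal(["1", "2", "3"], ["3", "1", "2"]))
```

If order carries no meaning, an array has to smuggle its ordering in some other way, for example by putting an index in each tag's parameter.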
> * at most one parameter per tag and not named parameters
> (because named parameters bend your head and get very complex and
> require special syntax and further because it is always better to
> introduce a new tag than introduce a new named parameter)
Simplicity is good. Damn parameters. I've always hated them in
> * supports guarantee of resynchronisation to tag boundary after an
> arbitrary seek into the file (scanning forwards or backwards) and
> something that "seems to be" a tag boundary always IS a tag boundary
Ah, we need an escape character mechanism.
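
One way to sketch such a mechanism is byte stuffing, in the spirit of SLIP: reserved bytes are replaced in the data, so a raw delimiter byte can only ever be a real tag boundary. (A plain backslash-before-the-bracket escape would not be enough, since the raw bracket would still appear inside data.) The delimiter choices, the substitution table and all function names below are assumptions for illustration only.

```python
TAG_OPEN, TAG_CLOSE, ESC = b"<", b">", b"\x1b"

# Hypothetical substitution table: each reserved byte becomes ESC plus a code
# letter, so reserved bytes never appear literally inside escaped data.
_SUBST = {TAG_OPEN: ESC + b"l", TAG_CLOSE: ESC + b"g", ESC: ESC + b"e"}
_UNSUBST = {pair[1:]: raw for raw, pair in _SUBST.items()}


def escape(data: bytes) -> bytes:
    """Replace every reserved byte in data with its two-byte substitute."""
    out = bytearray()
    for i in range(len(data)):
        byte = data[i:i + 1]
        out += _SUBST.get(byte, byte)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Reverse escape(); raises ValueError on a damaged escape sequence."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i:i + 1]
        if byte == ESC:
            code = data[i + 1:i + 2]
            if code not in _UNSUBST:
                raise ValueError("damaged escape sequence at offset %d" % i)
            out += _UNSUBST[code]
            i += 2
        else:
            out += byte
            i += 1
    return bytes(out)


def next_tag_start(stream: bytes, pos: int) -> int:
    """Scan forwards from pos; any TAG_OPEN found is a real boundary."""
    return stream.find(TAG_OPEN, pos)


def prev_tag_start(stream: bytes, pos: int) -> int:
    """Scan backwards from pos; any TAG_OPEN found is a real boundary."""
    return stream.rfind(TAG_OPEN, 0, pos)
```

The cost is that every reserved byte in the data doubles in size, and a seek can still land in the middle of a two-byte escape pair, which the reader has to tolerate.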
> * case insensitive tag matching (for English at least plus any other
> language that sensibly defines mixed case)
Character encoding set.
> * damaged files can be recovered by an automatic process at least to
> the extent that lost data is proportional to the amount of damage
By resyncing. But how far? DVB has 188-byte packets (a sync byte plus
187 more). BUT that is a transport protocol. You put your packets
together to make a block of data.
Of course, if all you ever do is have files, complete atomic
gobs of data, the 188-byte packets and resyncing and escape characters
are all very inefficient.
That's why you might go for a reliable transport protocol and
then try to parse known, good data. Whoops, that's the xml premise...
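
For comparison, this is roughly what transport-level resyncing looks like. MPEG transport stream packets, as carried by DVB, are 188 bytes long and begin with the sync byte 0x47. The sketch below looks for an offset where that byte repeats at 188-byte intervals; the function name `find_sync` and the `confirm` parameter are my own assumptions, not taken from any standard or library.

```python
PACKET_SIZE = 188  # MPEG transport stream packet length in bytes
SYNC_BYTE = 0x47   # first byte of every transport stream packet


def find_sync(buf: bytes, start: int = 0, confirm: int = 3) -> int:
    """Return the first offset >= start where `confirm` consecutive packets
    all begin with SYNC_BYTE, or -1 if there is no such offset.

    Requiring several packets in a row guards against a stray 0x47 that
    happens to sit in the payload or in damaged data.
    """
    last = len(buf) - PACKET_SIZE * (confirm - 1)
    for off in range(start, last):
        if all(buf[off + k * PACKET_SIZE] == SYNC_BYTE for k in range(confirm)):
            return off
    return -1
```

After damage you lose at most the packets that were hit plus whatever it takes to confirm the next sync, so the loss is proportional to the damage. The price is a fixed framing overhead on every packet, whether or not anything ever goes wrong.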
> * don't use closing tags at all, instead use the single parameter of
> the parametric tags to "update" that type of tag. e.g.:
.... Useful suggestions deleted for brevity ....
> * non-ascii encodings can never break the basic tag structure,
> so the parser can detect interesting encoding anomalies but can
> still continue to scan the file
> That's my wishlist... probably won't get done this afternoon but at least
> it is down on record so that when everyone is old and grey and some young
> guy says "I've invented this new tagging system that fixes the XML
> nightmare that has plagued the world for so long" I can give him a link
> to the Slug archives and say "told you so".
More power to you, Telford. What we need is a special ISO subcommittee
and some funding to study this problem further, and in more depth.
Preferably somewhere warm at the moment. Let's get our application in
before those xml b******s cut us off at the knees.