SLUG Mailing List Archives

Re: [chat] Re: [SLUG] apache access log -- "GET /robots.txt HTTP/1.0" 404 284


...and then Andre Pang said:
> it does.  wget downloads /robots.txt to find out whether the
> web{master,mistress} wants to keep automated robots (aka
> spiders) out of their websites.  robots.txt contains a list of
> directives for robots to follow -- eg "don't go into this
> directory".  i think that all web search engines respect the
> robots.txt file.

Sorry, I wasn't clear enough.
I know what the robots.txt file is, and what it's for.  I've just noticed
that, when doing a recursive websuck with wget, it always fetches
/robots.txt as well, no matter what you're getting.
Does wget respect the contents of robots.txt?
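
To show what "respecting" robots.txt means for a recursive fetcher, here is a minimal sketch in Python. It is not how wget works internally. It uses the standard library's `urllib.robotparser` to check each candidate URL against the rules before fetching it. The robots.txt contents, the `allowed_urls` helper, the example.com URLs and the "Wget" agent string are all hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, agent="Wget"):
    """Return only the URLs the robots.txt rules permit for `agent`.

    `allowed_urls` and the default agent string are hypothetical.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines. A real crawler would instead
    # call set_url("http://host/robots.txt") and then read() to fetch it.
    parser.parse(ROBOTS_TXT.splitlines())
    # can_fetch() applies the Disallow rules that match the agent.
    return [u for u in urls if parser.can_fetch(agent, u)]


candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/notes.html",
]
print(allowed_urls(candidates))  # → ['http://www.example.com/index.html']
```

Under these assumptions, the page under /private/ is skipped because the `User-agent: *` record disallows that prefix for every robot.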

Peter
peterhardy@xxxxxxxxxxxxxx
-- 
And it came to pass that in that time the Great God Om
spake unto Brutha, the Chosen One:
        'Psst!'
                 -- (Terry Pratchett, Small Gods)
