SLUG Mailing List Archives

...and then Andre Pang said:
> it does. wget downloads /robots.txt to find out whether the
> web{master,mistress} wants to keep automated robots (aka
> spiders) out of their websites. robots.txt contains a list of
> directives for robots to follow -- eg "don't go into this
> directory". i think that all web search engines respect the
> robots.txt file.
Sorry, I wasn't clear enough.
I know what the robots.txt file is, and what it's for. I've just noticed
that, when doing a recursive websuck with wget, it always fetches
/robots.txt as well, no matter what you're getting.
Does wget respect the contents of robots.txt?
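
For reference, here is a rough sketch of what "respecting" robots.txt means on the robot's side, using Python's standard urllib.robotparser module. It only illustrates the protocol and says nothing about how wget itself is implemented. The robots.txt contents, the "ExampleBot" user agent, the example.com URLs and the allowed() helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved robot makes this check before every request.
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))      # True
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/a.html"))  # False
```

A robot that merely downloads /robots.txt but never makes a check like this before each request would be fetching the file without honouring it, which is the distinction I'm asking about.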
Peter
peterhardy@xxxxxxxxxxxxxx
--
And it came to pass that in that time the Great God Om
spake unto Brutha, the Chosen One:
'Psst!'
-- (Terry Pratchett, Small Gods)