SLUG Mailing List Archives

Re: [chat] Re: [SLUG] apache access log -- "GET /robots.txt HTTP/1.0" 404 284


On Tue, Sep 04, 2001 at 03:26:32PM +1000, Peter Hardy wrote:

> ...and then Matt Allen said:
> > That one is a search engine spidering your site.
> 
> I was interested to note that wget also accesses robots.txt, although I
> don't know if it actually acts on the contents.  Anybody?

it does.  when retrieving recursively, wget downloads /robots.txt to
find out whether the web{master,mistress} wants to keep automated
robots (aka spiders) out of parts of their website.  robots.txt
contains a list of directives for robots to follow -- e.g. "don't go
into this directory".  compliance is voluntary, but i think that all
the major web search engines respect the robots.txt file.
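
as a rough sketch of how a well-behaved robot might apply those
directives, here is a small python example using the standard
library's urllib.robotparser.  the robots.txt contents, the
"examplebot" user agent, the example.com urls and the is_allowed()
helper are all made up for illustration; this is not how wget itself
is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "examplebot" and the example.com URLs are placeholders.
    for url in ("http://example.com/private/notes.html",
                "http://example.com/index.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "examplebot", url) else "disallowed"
        print(url, "->", verdict)
```

a real robot would fetch /robots.txt from the site first (RobotFileParser
can do that with set_url() and read()) and then run this check before
every request.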

of course, you can always tell wget to ignore the file (e.g. with
-e robots=off).


-- 
#ozone/algorithm <ozone@xxxxxxxxxxxxxxxx>          - trust.in.love.to.save