- To: Peter Hardy <peterhardy@xxxxxxxxxxxxxx>, Silly Linux Users Group <slug-chat@xxxxxxxxxxx>
- Subject: Re: [chat] Re: [SLUG] apache access log -- "GET /robots.txt HTTP/1.0" 404 284
- From: Andre Pang <ozone@xxxxxxxxxxxxxxxx>
- Date: Tue Sep 4 15:46:02 2001
- User-agent: Mutt/1.3.20i
On Tue, Sep 04, 2001 at 03:26:32PM +1000, Peter Hardy wrote:
> ...and then Matt Allen said:
> > That one is a search engine spidering your site.
>
> I was interested to note that wget also accesses robots.txt, although I
> don't know if it actually acts on the contents. Anybody?
it does. when retrieving recursively, wget downloads /robots.txt
to find out whether the web{master,mistress} wants to keep
automated robots (aka spiders) out of parts of their website.
robots.txt contains a list of directives for robots to follow
-- e.g. "don't go into this directory". as far as i know, all
the major web search engines respect the robots.txt file, but
compliance is voluntary.
of course, you can always tell wget to ignore the file if you
want to (e.g. with "-e robots=off").
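
to make the "directives for robots" idea concrete, here's a rough sketch in python of how a well-behaved robot might check a url against robots.txt before fetching it. this is not how wget does it internally; it just uses python's standard urllib.robotparser module. the robots.txt contents, the example.com urls and the allowed() helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt contents (an assumption, not from any real site):
# every robot is asked to stay out of /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/notes.html"):
        verdict = "fetch" if allowed("Wget", url) else "skip"
        print(verdict, url)
```

nothing enforces the answer: a robot that wants to ignore the file simply never asks, which is why respecting it is a matter of good manners.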
--
#ozone/algorithm <ozone@xxxxxxxxxxxxxxxx> - trust.in.love.to.save