SLUG Mailing List Archives

Re: [SLUG] Spider a website


You could use wget to do this; it's installed by default on most distributions.

Usually you'd run it like this:

    wget --mirror -np http://some.url/

(The -np option, short for --no-parent, tells wget not to ascend to the parent directory, which is useful if you only want to mirror a subdirectory. I add it out of habit.)

It's not always perfect, though: it can sometimes mangle the URLs in the pages it saves. Still, it's worth a try.
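
For a copy that has to work as plain static files, a fuller invocation might look like the sketch below. The function name mirror_site and the directory ./site-copy are made up for illustration, and http://some.url/ is the same placeholder as above. The options themselves are standard GNU wget ones, though older releases call --adjust-extension by the name --html-extension. Whether they fix the URL problems on any particular site is something you'd have to try.

```shell
#!/bin/sh
# Sketch only: mirror a site into a local directory as static files.
# mirror_site is a hypothetical helper name, not part of wget.
mirror_site() {
    url=$1        # site to copy, e.g. http://some.url/
    out_dir=$2    # local directory to save into

    # --mirror            recursive download with timestamping
    # --no-parent         never ascend above the starting directory
    # --page-requisites   also fetch images, CSS and JS the pages need
    # --convert-links     rewrite links so they work in the local copy
    # --adjust-extension  save HTML pages with an .html suffix
    # --directory-prefix  put everything under out_dir
    wget --mirror \
         --no-parent \
         --page-requisites \
         --convert-links \
         --adjust-extension \
         --directory-prefix="$out_dir" \
         "$url"
}

# Example call (placeholder URL and directory):
#   mirror_site http://some.url/ ./site-copy
```

With the example call, the files should end up under ./site-copy/some.url/, which can then be zipped and copied to the static server.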

On 03/06/2008, at 2:20 PM, Peter Rundle wrote:

I'm looking for some recommendations for a *simple* Linux-based tool to spider a web site and pull the content back into plain HTML files, images, JS, CSS etc.

I have a site written in PHP which needs to be hosted temporarily on a server which is incapable (read: it only does static content). This is not a problem from a temporary presentation point of view, as the default values for each page will suffice. So I'm just looking for a tool which will quickly pull the real site (on my PHP-capable home server) into a directory that I can zip and send to the Internet-addressable server.

I know there's a lot of code out there; I'm asking for recommendations.

TIA

Pete

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html