- To: "Peter Rundle" <slug@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [SLUG] Spider a website
- From: "James Polley" <james@xxxxxxxxxx>
- Date: Tue, 3 Jun 2008 15:51:48 +1000
- Cc: SLUG <slug@xxxxxxxxxxx>
wget-smubble-yew-get. Wget works great for getting a single file or a very
simple all-under-this-tree setup, but it can take forever.
Try httrack - http://www.httrack.com/. Ignore the pretty little screenshots;
the Linux command-line version does the same job, it just requires more
command-line-fu. It handles simple JavaScript links, is intelligent about
fetching requisites (images, CSS, etc.) from off-domain without trying to
cache the whole internet, is multi-threaded, and is actually designed
specifically for the purpose of making a static, offline copy of a website.
The user's guide at http://www.httrack.com/html/fcguide.html goes through
most common scenarios for you, and $DISTRO should be able to apt-get install
it for you. Urrr... or whatever broken tool the distros unfortunate enough not
to have apt-get use.
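For Pete's case, a minimal sketch of the whole round trip in Python, assuming httrack is installed and on the PATH: mirror the PHP site into a directory using the basic `httrack URL -O dir` form, then zip that directory for upload. The `+host/*` scan rule, the `-v` flag choice, the function names and the example address are my own illustrative assumptions, not something from the httrack docs verbatim; check the user's guide for the filters your site actually needs.

```python
from __future__ import annotations

import shutil
import subprocess
from urllib.parse import urlparse


def httrack_command(url: str, out_dir: str) -> list[str]:
    """Build an httrack command line that mirrors `url` into `out_dir`.

    -O sets the output directory. Assumption: a "+<host>/*" scan rule
    is enough to keep the crawl on the starting server; tune the
    filters per the httrack user's guide.
    """
    host = urlparse(url).netloc
    return ["httrack", url, "-O", out_dir, f"+{host}/*", "-v"]


def mirror_and_zip(url: str, out_dir: str) -> str:
    """Run httrack (must be installed and on PATH), then zip the result.

    Raises CalledProcessError if httrack exits non-zero. Returns the
    archive path, e.g. "/tmp/site.zip" for out_dir "/tmp/site".
    Note the mirror directory also contains httrack's cache and logs,
    which end up in the zip too.
    """
    subprocess.run(httrack_command(url, out_dir), check=True)
    return shutil.make_archive(out_dir, "zip", root_dir=out_dir)
```

Something like mirror_and_zip("http://192.168.1.10/", "/tmp/site"), with a hypothetical home-server address, should leave /tmp/site.zip ready to copy to the static-only server. Building the argument list separately keeps it easy to check before letting httrack loose.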
On Tue, Jun 3, 2008 at 2:20 PM, Peter Rundle <slug@xxxxxxxxxxxxxxxxxx> wrote:
> I'm looking for some recommendations for a *simple* Linux based tool to
> spider a web site and pull the content back into plain html files, images,
> js, css etc.
>
> I have a site written in PHP which needs to be hosted temporarily on a
> server which is incapable (read: only does static content). This is not a
> problem from a temp presentation point of view as the default values for
> each page will suffice. So I'm just looking for a tool which will quickly
> pull the real site (on my home php capable server) into a directory that I
> can zip and send to the internet addressable server.
>
> I know there's a lot of code out there, I'm asking for recommendations.
>
> TIA's
>
> Pete
>
> --
> SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
> Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
--
There is nothing more worthy of contempt than a man who quotes himself -
Zhasper, 2004