- To: slug@xxxxxxxxxxx
- Subject: Re: [SLUG] Spider a website
- From: Daniel Pittman <daniel@xxxxxxxxxxxx>
- Date: Tue, 03 Jun 2008 15:15:00 +1000
- Organization: How about yours? http://rimspace.net/resume/
- User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/23.0.60 (gnu/linux)
Peter Rundle <slug@xxxxxxxxxxxxxxxxxx> writes:
> I'm looking for some recommendations for a *simple* Linux based tool
> to spider a web site and pull the content back into plain html files,
> images, js, css etc.
Others have suggested wget, which works very well. You might also
consider 'puf':
    Package: puf
    Priority: optional
    Section: universe/web
    Description: Parallel URL fetcher
     puf is a download tool for UNIX-like systems. You may use it to download
     single files or to mirror entire servers. It is similar to GNU wget
     (and has a partly compatible command line), but has the ability to do
     many downloads in parallel. This is very interesting, if you have a
     high-bandwidth internet connection.
puf works quite well when, as its description notes, there is enough
bandwidth (and server capacity) to fetch multiple links at once.
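To illustrate what "many downloads in parallel" means in practice, here is
a minimal Python sketch of the general idea. It is not how puf is
implemented, and the names fetch, fetch_all, fetcher and workers are
made up for this example; it assumes only the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, workers=4):
    """Fetch every URL using up to `workers` threads at once.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it. `fetcher` is a hypothetical hook so the
    download function can be swapped out (for example, in tests).
    """
    urls = list(urls)  # allow any iterable, since it is walked twice

    def attempt(url):
        try:
            return fetcher(url)
        except Exception as exc:  # one failed download should not stop the rest
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly
        return dict(zip(urls, pool.map(attempt, urls)))
```

Raising workers only helps while both your link and the remote server can
keep up, which is the same caveat as above.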
Regards,
Daniel