Archiving websites for research
Posted on October 10, 2002 @ 12:37 in Research
A short while ago I posted to the Association of Internet Researchers mailing list (Air-L), asking whether anyone there knew of good programs for archiving websites for research. (Most of the messages on this topic can be found in the list archives for September 2002 and October 2002.) A good number of programs were mentioned, some commercial, some freeware. I haven't yet had time to try them all out, but I will over the next few months, and I'll update this post with reports on the different programs. For now, here is a list of the suggested programs with some preliminary thoughts.
One of the problems with archiving websites is that some web designers use JavaScript to control hyperlinks, for instance for roll-over images or pop-up windows. Unfortunately there is a wrong and a right way to use JS to control hyperlinks, and most designers only appear to be aware of the wrong way. For each program below I note whether it can deal with JS hyperlinks.
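The post doesn't spell out the two approaches, so the following is an assumption on my part: a common "wrong" pattern puts a `javascript:` pseudo-URL in the `href`, which gives an archiving program nothing it can follow, while the "right" pattern keeps a real URL in the `href` and adds the script behaviour through `onclick`. The minimal Python sketch below (the sample markup and the `openWindow` function are hypothetical) sorts the links in a page into those an archiver can follow and those it can't.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect href values from <a> tags, separating ordinary URLs
    (which an archiving program can follow) from javascript: pseudo-URLs
    (which it cannot, since the real target is hidden inside a script)."""

    def __init__(self):
        super().__init__()
        self.followable = []
        self.js_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        if href.strip().lower().startswith("javascript:"):
            self.js_only.append(href)
        else:
            self.followable.append(href)


def audit_links(html):
    """Return (followable, js_only) lists of hrefs found in the HTML."""
    parser = LinkAuditor()
    parser.feed(html)
    parser.close()
    return parser.followable, parser.js_only


# Hypothetical markup: the first link hides its target in a script call,
# the second keeps a real URL in href and only adds the pop-up via onclick.
SAMPLE = """
<a href="javascript:openWindow('page2.html')">Page 2</a>
<a href="page3.html" onclick="window.open(this.href); return false;">Page 3</a>
"""

followable, js_only = audit_links(SAMPLE)
print(followable)  # ['page3.html']
print(js_only)     # ["javascript:openWindow('page2.html')"]
```

A program without JS link support would archive page3.html but never reach page2.html, which is why the note on JS links matters for each program below.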
Webcopier
I tried the V3.0 trial version (the current version is 3.2), and it seems a fairly capable program, with lots of parameters controlling what to archive and how. Appears not to support JS links; payware.
SuperBot
Haven't tried it yet. Freeware; appears simple but decent enough. No apparent JS link support.
SurfSaver
Haven't tried it yet. Browser add-on that creates a searchable archive, but the feature list looks a bit limited. Free version and pay-for Pro version. No mention of JS link support.
Adobe Acrobat
Haven't tried it for archiving websites, but apparently it can do that too, presumably in PDF format. No mention of JS link support; payware.
Site Snagger
Haven't tried it yet. Appears to be abandonware for now; the last version mentioned is 1.2. No apparent support for JS links; freeware?
Teleport Pro
Haven't tried it yet. Looks fairly professional, claims JS link support, payware.
Offline Commander/Internet Researcher
Haven't tried it yet. Internet Researcher is the Pro version of Offline Commander. Looks fairly professional, allows custom plugins for refined parsing and claims JS link support; payware.
GNU wget
Open Source command-line application (a GUI is also available). Very versatile and powerful; it creates perfect mirror copies of websites and offers the option of not converting hyperlinks, so it can also be used to create backups (its original purpose). No standard support for JS links, unless you are prepared to modify and recompile the source.
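As a rough illustration of the two ways of using wget described above, here is a small Python sketch that only assembles the command line. The options used (`--mirror`, `--page-requisites`, `--no-parent`, `--wait`, `--directory-prefix`, `--convert-links`) are standard wget options, but the function name, the default pause of one second and the `archive` target directory are my own assumptions, not settings anyone on Air-L recommended.

```python
import shlex


def build_wget_command(url, convert_links=True, wait_seconds=1, target_dir="archive"):
    """Assemble a wget command line for mirroring one site (hypothetical helper).

    convert_links=True rewrites links in the saved pages so the copy can be
    browsed offline; convert_links=False leaves the hyperlinks untouched,
    which suits a plain backup of the site.
    """
    cmd = [
        "wget",
        "--mirror",                          # recursive, unlimited depth, with timestamping
        "--page-requisites",                 # also fetch images, stylesheets etc. needed to show pages
        "--no-parent",                       # never climb above the starting directory
        f"--wait={wait_seconds}",            # pause between requests to go easy on the server
        f"--directory-prefix={target_dir}",  # where the mirror is stored locally
    ]
    if convert_links:
        cmd.append("--convert-links")
    cmd.append(url)
    return cmd


print(shlex.join(build_wget_command("http://www.example.org/")))
# wget --mirror --page-requisites --no-parent --wait=1 --directory-prefix=archive --convert-links http://www.example.org/
```

To actually run it you could pass the list to `subprocess.run(cmd, check=True)`, or simply type the printed line at a shell prompt. Note that neither variant does anything about JS links.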
Offline Explorer
Haven't tried it yet. Looks fairly professional, no mention of JS link support though; payware.
Internet Explorer 5+
You can also archive websites with IE. It works pretty well (though it doesn't appear to support JS links), but setting it up is a bit complicated:
1) add the URL to your Favorites;
2) right-click the bookmark and select 'Make Available Offline';
3) click through the wizard without changing any of the settings;
4) click Finish;
5) right-click the bookmark again and select 'Properties';
6) go to the Download tab;
7) select the required link depth (at most 3 levels) and uncheck the 'Follow links outside of this website's domain' option.
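The two settings in the last step, a maximum link depth and staying within one domain, are the same knobs most of the programs above offer. The following Python sketch shows what they mean for a crawl; it is not how IE works internally. I'm assuming "link depth" counts clicks from the start page (the start page is depth 0), and the `fetch` function and the example site are hypothetical stand-ins so the sketch runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefCollector(HTMLParser):
    """Gather the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_depth=3, same_domain_only=True):
    """Breadth-first crawl; returns {url: depth} for every page saved.

    fetch(url) must return the page's HTML as a string, or None if unavailable.
    """
    start_host = urlparse(start_url).netloc
    depth_of = {start_url: 0}
    archived = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to save
        archived[url] = html
        depth = depth_of[url]
        if depth >= max_depth:
            continue  # page is saved, but its links are not followed further
        parser = _HrefCollector()
        parser.feed(html)
        for href in parser.hrefs:
            link = urldefrag(urljoin(url, href))[0]  # absolute URL, #fragment dropped
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https"):
                continue  # skips mailto:, javascript: and similar links
            if same_domain_only and parsed.netloc != start_host:
                continue  # the "don't follow links outside this domain" setting
            if link not in depth_of:
                depth_of[link] = depth + 1
                queue.append(link)
    return {u: depth_of[u] for u in archived}


# Hypothetical five-page site, each page linking one level deeper.
SITE = {
    "http://example.org/": '<a href="a.html">A</a> <a href="http://other.example.com/">elsewhere</a>',
    "http://example.org/a.html": '<a href="b.html">B</a>',
    "http://example.org/b.html": '<a href="c.html">C</a>',
    "http://example.org/c.html": '<a href="d.html">D</a>',
    "http://example.org/d.html": "end",
}

# Saves the start page plus a, b and c (depths 0 to 3); d.html lies one
# level too deep and other.example.com is outside the domain.
print(crawl("http://example.org/", SITE.get, max_depth=3))
```

Breadth-first order means each page gets the shortest depth at which it can be reached, so a page linked from both the start page and a deep page is counted as shallow.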
Netscape 6+
Haven't looked into it, but it should also offer offline browsing.
Net Snippets
Not really for archiving complete websites, but it appears to be an interesting research tool. It allows archiving and annotation of "snippets" (selections from websites) and automatically creates a 'bibliography' of the snippets.
Comments and Trackbacks
Have a look at SiteImprove.com
they have a very affordable web archiving online service - definitely worth checking out
Posted by Jessica Riley on April 22, 2004 @ 11:27
Comments and trackbacks have been closed on this site. My apologies.
Since MT-Blacklist inexplicably stopped working, I had no recourse other than to close comments and trackbacks to stop the spam. I've been meaning to correct this for quite a while, but life got in the way... in a good way, I should add.