1. Copy or unzip/untar all files into a directory.
2. Install the following prerequisite Perl modules, either as 'root' or
into a local Perl module directory:
XML::DOM
Cwd
LWP
If you can install packages as root, run
perl -e "use CPAN; shell"
then "install " for each package
(this is highly recommended because XML::DOM has many module
dependencies, which the CPAN shell resolves for you)
If you cannot install packages as root, select a local directory,
download each package from CPAN (www.cpan.org), and follow its
installation instructions. Specifically, for each module (and its
dependencies) you may need to do something like:
perl Makefile.PL PREFIX=/home//perlpackages
make
make test
make install
See the following sites for more information on how to do this:
http://www.singlesheaven.com/stas/TULARC/webmaster/myfaq.html#7
http://www.iserver.com/support/virtual/perl/mod/install.html
3. Edit the configuration file (harvest.pl) to list all the archives
you want to harvest from, along with their ids, metadata formats, sets, etc.
4. If necessary, run "make" to compile the man pages. These are not
installed in the regular locations, so to read them you may need
to give an explicit path, for example:
man ./Harvester.3
5. Test the harvester by running harvest.pl
6. Write your own Perl modules that subclass Harvester (using TestHarvest
as a sample) to perform whatever you need done.
7. Add a line to "chdir /" at the top of
harvest.pl so that the script always runs in the correct working directory.
Add a line to your crontab file to run the harvester as often as
you want it to check whether more harvesting is needed (the schedule
in the configuration file determines whether harvesting actually
takes place). For example, to check every 10 minutes:
*/10 * * * * //harvest.pl
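
Before testing the harvester (step 5) or scheduling it with cron (step 7), it can help to confirm that the modules from step 2 actually load. The following is only a sketch: check_modules is a hypothetical helper and is not part of the harvester distribution. It assumes "perl" is on your PATH. If you installed the modules under a local PREFIX, perl also has to be able to find that directory, for example through the PERL5LIB environment variable.

```shell
#!/bin/sh
# check_modules (hypothetical helper): try to load each named Perl module
# and report the result. Returns non-zero if any module fails to load.
check_modules() {
    missing=0
    for module in "$@"; do
        # -M loads the module; "-e 1" is a program that does nothing.
        if perl -M"$module" -e 1 2>/dev/null; then
            echo "ok: $module"
        else
            echo "missing: $module"
            missing=1
        fi
    done
    return "$missing"
}

# The three modules listed in step 2.
check_modules XML::DOM Cwd LWP || echo "Install the missing modules before running harvest.pl"
```

Keep in mind that cron jobs usually run with a minimal environment, so a module that loads in your login shell may still be reported missing under cron if PERL5LIB is only set in your shell startup files.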