Frequently Asked Questions about PubCrawler (the standalone program)

This document covers the following topics:

Questions

Installation

  1. Where can I download PubCrawler?
  2. What are the system requirements for running PubCrawler?
  3. How do I find out which Perl version I have installed?
  4. How do I install the latest Perl version?
  5. What are the additional Perl modules?
  6. How do I find out if I have the required modules installed?
  7. What are precompiled libraries?
  8. How do I get information about my platform?
  9. My platform is not supported!
  10. How do I install the modules/libraries manually?
  11. What is an external command line browser?
  12. Why should I download the package into my home directory?
  13. I can't extract the package that I just downloaded!

Error Messages

  1. bash: perl: command not found
  2. Can't locate ???.pm in @INC....
  3. gzip: pubcrawler_???.tgz: not in gzip format
  4. bash: .../pubcrawler.pl: No such file or directory
  5. pubcrawler ERROR: cannot open configuration file pubcrawler.config at pubcrawler.pl line 1611.

Usage

  1. What does PubCrawler do when running in check-mode?
  2. What are the command line options for PubCrawler?
  3. How do I use the configuration file?
  4. Every time my scheduler runs PubCrawler I get an e-mail with LOG-messages!


Answers

Installation

  1. Where can I download PubCrawler?
    PubCrawler is available for free and downloadable from the internet via FTP (File Transfer Protocol). Please visit the PubCrawler Program Page for more information on downloading and installing the program.

  2. What are the system requirements for running PubCrawler?
    PubCrawler is available for Windows 95/98, MacOS, and Unix. You need Perl version 5 installed on your system, plus some additional modules that are necessary for HTTP connections, namely LWP and HTML::Parser. As an alternative to the modules, an external command line browser can be used with PubCrawler.
    Space requirements for the installation vary from 90 kB (source file and configuration file) to about 800 kB (source file, configuration file and precompiled libraries, Unix only).
    During the usage of PubCrawler a database is created which might grow up to a few megabytes in size - depending on the configuration and the searches carried out.

  3. How do I find out which Perl version I have installed?
    In one of your terminals (xterm or DOS-box) type the following command:
    perl -v
    This runs the perl interpreter with the option '-v' which stands for 'version'. It should produce a message reading something like This is perl, version 5....
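    For scripted checks, the same test can be sketched as a small shell function. This is only an illustration and not part of PubCrawler: the function name perl_is_v5 and the PERL variable (the interpreter to test, defaulting to perl) are assumptions. It relies on Perl's built-in $] variable, which holds the numeric version.

```shell
# Hypothetical helper, not part of PubCrawler.
# PERL names the interpreter to test; override it if yours is called perl5.
PERL=${PERL:-perl}

# Succeeds (exit status 0) only if the interpreter runs and reports
# version 5 or later; fails if it is older or cannot be started.
perl_is_v5() {
    "$PERL" -e 'exit($] >= 5 ? 0 : 1)' 2>/dev/null
}
```

    A possible use is: perl_is_v5 && echo "Perl 5 found"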


  4. How do I install the latest Perl version?
    If you share your system with other users, it is recommended that the system administrator installs Perl to make it available for everyone.
    If you wish, you can also install your own version in one of your local directories (it requires about 15 MB of space).
    Please check out Perl's own site for information on downloading and installing the latest package.

  5. What are the additional Perl modules?
    When carrying out searches through the internet at NCBI, Perl uses additional modules (or libraries). These are pieces of code that are normally not included in the standard Perl-distribution. They are freely available on the internet and can be easily downloaded and installed.
    Besides the standard library, Perl needs the following modules:
    1. LWP (Library for WWW access in Perl), which is part of the libwww-bundle, freely available at CPAN (check the modules/by-module/LWP-directory for a file called libwww-perl-5.36.tar.gz).
    2. HTML::Parser, necessary for parsing HTML-commands, also available from CPAN (check the modules/by-module/HTML-directory for a file called HTML-Parser-3.10.tar.gz).
    3. The two modules above require the URI module to be installed (check the modules/by-module/URI-directory at CPAN for a file called URI-1.08.tar.gz).
    4. The modules above also require the MIME::Base64 module to be installed (check the modules/by-module/MIME-directory at CPAN for a file called MIME-Base64-2.11.tar.gz).
    Copies of all of these four packages are also available from the Perl-modules-directory at the PubCrawler-FTP-site.

  6. How do I find out if I have the required modules installed?
    Perl 5 enables the use of modules as command line options. Make sure you have Perl 5 installed and enter the following command into your terminal (xterm or DOS-box):
    perl -MLWP -MHTML::Parser -e 42
    An error message like Can't locate LWP.pm in @INC... indicates that some of the required modules could not be found. This means that you have to either install PubCrawler with precompiled libraries, install the libraries manually, or use an external command line browser.
    If the prompt appears without any error messages after entering the command, the additional modules were found and the basic PubCrawler package (just the source file) should work on your system without the need to install any additional modules. Nevertheless, you should run PubCrawler in check-mode before using it.
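    To find out which individual module is missing, the check can be repeated for one module at a time. The following is a minimal sketch, not part of PubCrawler: the function name check_modules and the PERL variable are assumptions made for illustration. It uses the same -M option as the command above.

```shell
# Hypothetical helper, not part of PubCrawler.
# PERL names the interpreter to use (defaults to perl).
PERL=${PERL:-perl}

# Print one status line per module name given as an argument.
# Returns non-zero if at least one module could not be loaded.
check_modules() {
    missing=0
    for module in "$@"; do
        # -M loads the module; "-e 1" is a program that does nothing.
        if "$PERL" "-M$module" -e 1 2>/dev/null; then
            echo "found:   $module"
        else
            echo "missing: $module"
            missing=1
        fi
    done
    return "$missing"
}
```

    For the modules named in this FAQ, the call would be: check_modules LWP HTML::Parser URI MIME::Base64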

  7. What are precompiled libraries?
    Since the additional Perl modules needed for PubCrawler are not available on every system and installation of these modules appears a bit daunting for inexperienced users, we have provided PubCrawler packages with precompiled libraries.
    Background info:
    For each platform accessible to us we have manually installed the additional Perl modules into a local directory, just as a normal user could do. This involves creating a makefile, putting together files and sometimes compiling code. The outcome depends on the underlying operating system and architecture (the platform), and also on the installed Perl version. After installing and testing the modules we have bundled everything together as a package for that particular platform.
    Once the package is downloaded and unpacked the program is ready to use. In certain cases some minor changes might be necessary, which are explained in the README file included in the packages.
    NOTE: The way of providing precompiled libraries is not guaranteed to work on every platform. Although it does involve a bit more work, the safest way to use the additional modules is by installing them manually.

  8. How do I get information about my platform?
    Type the following command into your (x-)terminal:
    uname -a
    This should provide you with information about the operating system and the architecture of your computer. If it doesn't - ask your system administrator.
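    If only the two pieces of information that matter for choosing a package are wanted, the standard uname options -s (operating system name) and -m (hardware architecture) can be combined. The function name below is only illustrative.

```shell
# Print the operating system name and the hardware architecture
# on one line, separated by a space.
platform_summary() {
    echo "$(uname -s) $(uname -m)"
}
```

    On a typical Linux PC this prints something like: Linux x86_64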

  9. My platform is not supported!
    There are a vast number of Unix-flavors and even more combinations of OS and architecture. We can only provide packages for platforms that we have access to. If there is no PubCrawler package with precompiled libraries available for your particular platform, you could try one of the other packages. Please follow the steps in the README, which comes with the package, to make sure PubCrawler is working correctly.
    If you can't make the program run on your system, send an e-mail to pubcrawlerREMOVECAPShelp@gmail.com with a detailed description of your system.

  10. How do I install the modules/libraries manually?


  11. What is an external command line browser?
    As an alternative to using the additional Perl modules, PubCrawler can be run with the help of an external command line browser. By this we mean an additional program, an HTTP client that works from the command line and can access pages on the World Wide Web. An example of this is Lynx. To see if it is installed on your system, enter the command
    which lynx
    This will show you the location of the program, if available.
    You can run PubCrawler with this browser by using the 'lynx' command line option or by specifying the corresponding command in the configuration file.

  12. Why should I download the package into my home directory?
    This recommendation is only important when you are downloading a PubCrawler package with precompiled libraries. The reason needs a somewhat longer explanation:
    When PubCrawler is executed, Perl first looks in specific directories for all the additional modules that are needed. If it can't find them, it stops with an error message. Normally the location of the additional modules that come as precompiled libraries would not be known to Perl or PubCrawler, since they could be anywhere on the file system. However, a section at the beginning of the PubCrawler Perl script adds some locations to the list of directories that are searched for these modules. One of them is $HOME/lib (or $ENV{'HOME'}/lib in Perl-lingo), and this is exactly the directory where the precompiled libraries reside if the package was stored in the user's home directory and unpacked from there.
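    To see which directories your Perl installation searches by default, that is, before the PubCrawler script adds its own locations, something like the following sketch can be used. The function names show_inc and inc_contains and the PERL variable are assumptions made for illustration; they are not part of PubCrawler.

```shell
# Hypothetical helpers, not part of PubCrawler.
# PERL names the interpreter to use (defaults to perl).
PERL=${PERL:-perl}

# Print Perl's default module search directories (@INC), one per line.
show_inc() {
    "$PERL" -e 'print "$_\n" for @INC'
}

# Succeed if the directory given as first argument is one of them.
# grep -F matches literally, -x requires the whole line to match.
inc_contains() {
    show_inc | grep -Fqx -- "$1"
}
```

    For example, inc_contains "$HOME/lib" || echo "not searched by default" shows whether $HOME/lib is only found because the PubCrawler script adds it.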

  13. I can't extract the package that I just downloaded!
    The packages provided for downloading are compressed archives. They can normally be extracted with the following command, which combines decompression and unpacking:
    gzip -cd <name_of_the_package> | tar xovf -
    If a browser (like Netscape) was used for downloading, the package might already have been decompressed during that process. Executing the command above would then result in an error message like gzip: pubcrawler_???.tgz: not in gzip format, where '???' stands for the identification of your package.
    In that case a simple
    tar xovf <name_of_the_package>
    (unpacking without decompression) should do the job.
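    Both cases can be handled in one step by testing the file first. The following is a minimal sketch under the assumption that gzip and tar are available; the function name unpack_package is hypothetical. It uses gzip -t, which checks a file without writing any output and fails if the file is not in gzip format.

```shell
# Hypothetical helper: unpack a package whether or not it is still compressed.
# Usage: unpack_package <name_of_the_package>
unpack_package() {
    if gzip -t "$1" 2>/dev/null; then
        # Still compressed: decompress and unpack in one go.
        gzip -cd "$1" | tar xovf -
    else
        # Already decompressed (for example by the browser): unpack only.
        tar xovf "$1"
    fi
}
```

    The files are unpacked into the current directory, just as with the commands above.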


Error Messages

  1. bash: perl: command not found
    If you are trying to run the Perl interpreter from the command line (for example to find out which version you have installed) and get a message like this, then Perl is probably not installed on your system. Sometimes using the command perl5 instead works.
    Ask your system administrator about the availability of Perl 5 or install the latest version, which is freely available on the internet.
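    Whether the interpreter is available under either name can be checked with the shell's standard command -v. The sketch below is only an illustration; the function name first_available is an assumption.

```shell
# Hypothetical helper: report the first command name that exists.
# Prints what "command -v" reports for it (the full path for programs)
# and returns non-zero if none of the given names is found.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    return 1
}
```

    For the case described here, the call would be: first_available perl perl5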

  2. Can't locate ???.pm in @INC....
    This is a typical error message that occurs when a Perl program such as PubCrawler, which makes use of certain modules or libraries, is started. If these are not found in the search directories listed in the Perl array '@INC', the program exits with an error message telling the user which modules couldn't be found and which directories were searched for them.
    If the libraries are installed on the system, their location has to be made known to the program by extending the list of directories that will be searched. Otherwise the specified modules have to be installed.
    In the case of PubCrawler, packages with precompiled libraries are available for downloading.
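    If the modules are already installed in a non-standard directory, one way to extend the search list without editing the script is Perl's PERL5LIB environment variable, whose directories are added to @INC. The following sketch assumes a Unix shell (directories separated by colons); the function name add_perl_lib is hypothetical.

```shell
# Hypothetical helper: put a directory at the front of PERL5LIB
# for the current shell session and everything started from it.
add_perl_lib() {
    # Append the old value only if there is one, to avoid a stray colon.
    PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB
}
```

    For example, add_perl_lib "$HOME/lib" makes modules under $HOME/lib visible to any Perl program started afterwards from the same shell.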

  3. gzip: pubcrawler_???.tgz: not in gzip format
    When trying to decompress an archive that is no longer compressed, you will get an error message like this.
    This can happen when a PubCrawler package (a compressed archive) is downloaded with a browser (like Netscape) and automatically decompressed during the download. If the steps given in PubCrawler's installation guide are then followed exactly, the following command will cause this error:
    gzip -cd <name_of_the_package> | tar xovf -
    In that case a simple
    tar xovf <name_of_the_package>
    (unpacking without decompression) should do the job.

  4. bash: .../pubcrawler.pl: No such file or directory
    The first line of the Perl script for PubCrawler (pubcrawler.pl) specifies the full path to the Perl interpreter. If this points to the wrong location, you will get an error message like this.
    To find out where Perl resides on your system, type the following command into one of your (x-)terminal windows:
    which perl
    This should present the full path which has to be inserted in the first line of your PubCrawler script (preceded by special characters #!).
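    The edit can also be made from the command line. The following is a minimal sketch, not part of PubCrawler; the function name set_interpreter is an assumption. It replaces only the path after #! on the first line, keeps any options that follow it, and writes the result to a new file so that the original stays untouched.

```shell
# Hypothetical helper: point the first line of a script at another interpreter.
# Usage: set_interpreter <script> <full_path_to_perl>
# The result is written to <script>.new; the original file is not changed.
set_interpreter() {
    # On line 1 only, replace "#!" plus everything up to the first space.
    sed "1s|^#![^ ]*|#!$2|" "$1" > "$1.new"
}
```

    A possible use is set_interpreter pubcrawler.pl "$(command -v perl)". After checking pubcrawler.pl.new, move it over the original and make it executable again with chmod +x.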

  5. pubcrawler ERROR: cannot open configuration file pubcrawler.config at pubcrawler.pl line 1611.
    When PubCrawler is run by Cron, it sometimes happens that the configuration file cannot be found. In this case it is advisable to specify its location explicitly in the crontab entry using the -c option:
    10 07 * * 1-5 "$HOME/pubcrawler.pl" -c "$HOME/pubcrawler.config"
    Another option is to specify a PubCrawler working directory. This could be for example "$HOME/PubCrawler". All of the PubCrawler-related files like output, log and database will be written into this directory (unless diverted by command line option). PubCrawler will also attempt to read the configuration file from it. So place your pubcrawler.config into the directory, make sure it is readable:
    chmod 644 pubcrawler.config
    and add the following entry to your crontab file:
    10 07 * * 1-5 "$HOME/pubcrawler.pl" -d "$HOME/"


Usage

  1. What does PubCrawler do when running in check-mode?
    In check-mode PubCrawler carries out tests to verify your settings without querying NCBI with the specified searches. This is useful to make sure everything is working fine before launching a genuine run (or have it automatically launched at night).
    It can be invoked with the command
    pubcrawler -check
    followed by any other options. After a short introduction you have to hit <return> to start the check.
    Next, several files and variables are tested that are important for the execution of PubCrawler. Finally a connection to an HTTP-site (NCBI by default) is attempted to see if the internet is accessible with the current setup.
    Any errors encountered will result in messages printed at the end, which may help to solve the problems.

  2. What are the command line options for PubCrawler?
    The default behaviour of PubCrawler is determined by variables in a configuration file. Command line options make temporary changes possible. To find out which options are available, have a look at PubCrawler's Technical Description or run PubCrawler with the '-h' option ('h' for 'help'):
    pubcrawler.pl -h

  3. How do I use the configuration file?
    The configuration file controls PubCrawler's behaviour at execution time and can be used for permanent settings. It also holds the search queries that are submitted to NCBI's Entrez.
    The PubCrawler package comes with a default configuration file, which can be found in the PubCrawler directory after unpacking. It is named 'pubcrawler.config' and can be used as a template. Edit it to suit your needs. All known variables are provided. An option is deactivated when its variable has a value of zero (the number after the variable name) or when no value is given. A value of '1' or a text string activates the option.
    When executed, the program will first look for a configuration file in the PubCrawler working directory, if one was specified via command line option. The second place it looks is the home directory as set in the environment variable 'HOME'. The last place is the current working directory. If no configuration file can be found and not all of the mandatory variables are specified via command line options, the program exits with an error message.
    The best place for the configuration file is the home directory. Unix allows you to create symbolic links, which can be used to keep the file in the PubCrawler directory while also making it appear in the home directory:
    ln -s ~/PubCrawler/pubcrawler.config ~/
    presuming that the configuration file is located in the directory ~/PubCrawler (the tilde - '~' - will be evaluated by the operating system to your home directory).
    This will keep your PubCrawler files in the PubCrawler directory, but assures that the configuration file will be found when the program is run.
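    If it is unclear which configuration file will be picked up, the search order described in this answer can be imitated in the shell. The following is only a sketch of that order, not PubCrawler's own code; the function name find_config is an assumption.

```shell
# Hypothetical sketch of the search order described in this answer.
# Usage: find_config [working_directory]
# Prints the first readable pubcrawler.config that is found and
# returns non-zero if there is none.
find_config() {
    for dir in "${1:-}" "$HOME" "."; do
        # Skip the working directory slot when none was given.
        [ -n "$dir" ] || continue
        if [ -r "$dir/pubcrawler.config" ]; then
            echo "$dir/pubcrawler.config"
            return 0
        fi
    done
    return 1
}
```

    For example, find_config "$HOME/PubCrawler" prints the file that would be used with that working directory.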

  4. Every time my scheduler runs PubCrawler I get an e-mail with LOG-messages!
    During normal execution PubCrawler prints messages to the terminal that inform the user about its progress. When PubCrawler is automatically started by a scheduler (like 'Cron') any output of this kind is normally mailed to the person who set up the task.
    Frequent mails with these LOG-messages can be quite annoying. To change PubCrawler's behaviour, an option called 'mute' is provided that suppresses all output to the terminal except error messages. It can be activated through the command line option (pubcrawler -mute) or permanently by setting the value of the mute variable in the configuration file to '1'.


Last modified at $Date: 2007/07/06 13:08:47 $