Configuration

Site Valet is now configured entirely through files on your own site: valet.txt, as described above, and robots.txt, which is parsed according to the standard robot rules to determine which pages the spider may and may not visit.

Controlling the spider with robots.txt

The Site Valet spider follows the standard robot rules, using the user agent name Site Valet. The rules are documented at robotstxt.org, and we won't repeat them here.

Example

Suppose your site is http://www.foo.bar.tld/. You want Site Valet to monitor the whole site except for two parts:

  1. You have a busy bulletin board at http://www.foo.bar.tld/bbs/. Posts come and go rapidly, so it would be pointless for any robot to index or check them. To keep all robots out, you use the rule:
    User-Agent: *
    Disallow: /bbs/
  2. You have a set of documents at http://www.foo.bar.tld/esoteric/ that will never change and contain no external links to worry about. You do want the search engines to index them, but it would be pointless for Site Valet to monitor them. The rule for this is:
    User-Agent: Site Valet
    Disallow: /esoteric/

The valet.txt file

You do not need to read or understand the valet.txt file format: you can use the form to generate the file for you.

Note that Site Valet no longer re-checks your valet.txt automatically. You must use the activation form whenever you wish to change your configuration.