Thunderstone Search Appliance Manual

The default rewalk type, Refresh, updates the existing database and downloads only files that have been modified or created since the last walk. Pages that are no longer present on the server are removed from the database.

Keep the following limitations in mind when using Refresh. A page that was referenced but missing during the initial walk (the walk prior to the Refresh), and was added after that walk, will be missed by Refresh if its parent page has not been modified. If you change your settings to be more inclusive (e.g. add extensions, ignore robots, add domains), you should do a New walk once, because a Refresh is unlikely to find the newly allowed data unless all of the pages leading to that data have been modified.
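
The limitation described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the appliance's actual implementation: the refresh function, the site and db dictionaries, and the integer timestamps are all hypothetical names chosen for illustration. The model assumes that an unmodified page is not downloaded again, so its links are never re-examined.

```python
def refresh(site, db, last_walk):
    """Toy model of a Refresh walk (hypothetical, for illustration only).

    site: url -> (modified_time, [linked urls]) as the server looks now
    db:   url -> modified_time recorded by the previous walk
    Returns the updated database as url -> modified_time.
    """
    updated = {}
    queue = list(db)          # start from the pages already in the database
    seen = set()
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        if url not in site:
            continue          # page no longer on the server: dropped
        mtime, links = site[url]
        updated[url] = mtime
        if url in db and mtime <= last_walk:
            continue          # unmodified: not downloaded, links not followed
        queue.extend(links)   # modified or new: downloaded, links followed
    return updated


# "b" was linked from "index" but missing during the initial walk,
# then added at time 2 while "index" itself stayed unmodified.
site = {"index": (0, ["a", "b"]), "a": (0, []), "b": (2, [])}
db = {"index": 0, "a": 0}
print(sorted(refresh(site, db, last_walk=1)))
```

In this model the late-added page "b" is only picked up once "index" itself is modified, which is why a one-time New walk is the safer choice after such changes.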

If more than roughly 30% to 50% of your site changes between walks, you may be better off using a New walk instead of Refresh. Also, many dynamic content generators do not give modified dates, which causes every page to be rewalked. In that case, you should use New instead of Refresh.
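
The rule of thumb above can be written down as a small helper. This is a hedged sketch, not part of the appliance: the function suggest_rewalk_type and its parameters are hypothetical, and the default threshold of 0.3 is simply the lower end of the 30%-50% range, which you would tune for your own site.

```python
def suggest_rewalk_type(changed_fraction, has_modified_dates, threshold=0.3):
    """Suggest "New" or "Refresh" (hypothetical helper, for illustration).

    changed_fraction:   estimated share of the site that changes between
                        walks, from 0.0 to 1.0
    has_modified_dates: False if the server does not give modified dates
    threshold:          assumed cutoff; the manual suggests 30%-50%
    """
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0.0 and 1.0")
    if not has_modified_dates:
        # Without modified dates every page is rewalked anyway.
        return "New"
    return "New" if changed_fraction > threshold else "Refresh"


print(suggest_rewalk_type(0.10, True))
print(suggest_rewalk_type(0.60, True))
print(suggest_rewalk_type(0.10, False))
```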

