Thunderstone Search Appliance Manual

Refresh in version 5 vs. 4

In Search Appliance version 4 and earlier, the refresh walk checked every page in the database to determine whether it needed updating. Because only changed pages need updating, and those are typically a small percentage of the site, checking for changes is faster than doing a complete new walk. It is still time-consuming, however: the web server must be contacted for every page on the site, because only the web server can tell the Search Appliance whether a page has changed.

In Search Appliance version 5 and later, the refresh process is improved. The walk adapts to focus on the small but important group of pages that change. As each page is walked, a refresh period is calculated for that individual page, based on whether the page has changed since it was last fetched and how long ago that fetch took place. The refresh period determines when the page should be checked again. In this way, the walk gives priority to pages that are new or change often, and delays fetching pages that seldom change.
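
As an illustration only, the per-page calculation described above might be sketched as follows. The manual does not give the actual formula used by the Search Appliance; the halving and doubling rule, the bounds `MIN_PERIOD` and `MAX_PERIOD`, and the function names are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Assumed bounds; the real appliance's limits are not documented here.
MIN_PERIOD = timedelta(hours=1)
MAX_PERIOD = timedelta(days=30)


def next_refresh_period(changed: bool, since_last_fetch: timedelta) -> timedelta:
    """Hypothetical rule: check changed pages sooner, unchanged pages later.

    changed          -- whether the page differed from the stored copy
    since_last_fetch -- time elapsed since the previous fetch of this page
    """
    if changed:
        # The page changed within this interval, so look again in half the time.
        period = since_last_fetch / 2
    else:
        # No change was seen over this interval, so wait twice as long.
        period = since_last_fetch * 2
    # Clamp the result to the assumed bounds.
    return max(MIN_PERIOD, min(MAX_PERIOD, period))


def next_check_time(now: datetime, changed: bool, last_fetch: datetime) -> datetime:
    """Return the time at which the page should next be checked."""
    return now + next_refresh_period(changed, now - last_fetch)
```

Under these assumptions, a page found changed four days after its last fetch would be checked again in two days, while an unchanged page would wait eight days.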

Thus, when a walk (scheduled or manual) takes place, only the pages that are due for a refresh are actually fetched, rather than every page in the database. The result is a database that is kept up to date by a process that consumes fewer server resources.
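
The selection step can be pictured with a short sketch. This is not the Search Appliance's internal code; the `PageRecord` type, its `next_check` field, the `pages_due` function, and the example URLs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PageRecord:
    url: str              # page address stored in the database
    next_check: datetime  # when this page is next due for a refresh (assumed field)


def pages_due(records: list[PageRecord], now: datetime) -> list[PageRecord]:
    """Return only the pages whose refresh time has arrived."""
    return [r for r in records if r.next_check <= now]


# Example data: three pages with different due times.
now = datetime(2024, 1, 10, 12, 0)
records = [
    PageRecord("http://www.example.com/news.html", datetime(2024, 1, 10, 9, 0)),
    PageRecord("http://www.example.com/about.html", datetime(2024, 2, 1, 0, 0)),
    PageRecord("http://www.example.com/new-page.html", datetime(2024, 1, 10, 12, 0)),
]

due = pages_due(records, now)
for record in due:
    print(record.url)  # only these pages would be requested from the web server
```

In this example only two of the three pages are fetched during the walk; the seldom-changing page is skipped until its own refresh time arrives.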

