Thunderstone Search Appliance Manual

Unknown File Formats

Syntax: select Exclude or Include

Unknown File Formats controls how files in an unrecognized format (e.g. binary content not identified as PDF, Word, plain text, etc.) are handled. If set to Exclude (the default), the data from such files is ignored; this keeps garbage binary content out of the crawl database and the query autocomplete dictionary, which it would otherwise bloat.

If set to Include, the data is included. This might help find words in otherwise unsearchable binary files, but is unlikely to succeed: since the file format is unrecognized, all that can be done is a simple scan for ASCII words in the file, similar to the Unix strings utility. If the file does not store its words as ASCII, only garbage binary content will be returned.
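As an illustration only, a strings-like scan can be sketched in Python as below. This is not Thunderstone's implementation: the function name strings_scan and the minimum run length of 4 (borrowed from the default of the Unix strings utility) are assumptions chosen for the example.

```python
import re
from typing import List

# Assumed minimum run length: 4 is the default of the Unix `strings` utility,
# not a documented Webinator setting.
MIN_RUN = 4

# A run of printable ASCII bytes (space through tilde) at least MIN_RUN long.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)


def strings_scan(data: bytes) -> List[str]:
    """Return every run of printable ASCII found in arbitrary binary data."""
    return [m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data)]


if __name__ == "__main__":
    # Readable words embedded in binary noise are recovered...
    print(strings_scan(b"\x00\x01Hello world\xff\xfeab\x00Quarterly report\x00"))
    # ['Hello world', 'Quarterly report']
    # ...but so is any noise that happens to fall in the printable range.
    print(strings_scan(b"\x8f#k9Q\x02"))
    # ['#k9Q']
```

The second call shows why Include tends to add garbage: any byte sequence that happens to fall in the printable range is reported as a "word", whether or not the file actually contains text there.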

Note that unlabeled plain-text files (i.e. those identified neither by MIME type nor by the file extension ".txt") will generally be recognized by a natural-language scan (if running Texis 7.01 or later) and passed through as-is. This setting only applies to files that fail that test, i.e. files that are unlikely to be plain text. Added in Appliance 9 / Webinator 7.
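The overall decision for a file whose format was not recognized can be sketched as follows. This is a hypothetical Python illustration, not the appliance's code: the names UnknownFormats, looks_like_text and handle_unrecognized are invented, and the printable-byte ratio with its 0.95 threshold is a crude stand-in for the Texis natural-language scan, whose actual method is not described here.

```python
import re
from enum import Enum
from typing import Optional


class UnknownFormats(Enum):
    """Hypothetical model of the two values of the setting."""
    EXCLUDE = "Exclude"  # default
    INCLUDE = "Include"


# Printable ASCII runs of 4+ bytes, as in a strings-like scan.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Bytes counted as "text": printable ASCII plus tab, newline, carriage return.
_TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Assumed stand-in for the natural-language scan: treat the data as
    plain text if nearly all of its bytes are text bytes."""
    if not data:
        return True
    text_bytes = sum(1 for b in data if b in _TEXT_BYTES)
    return text_bytes / len(data) >= threshold


def handle_unrecognized(data: bytes, setting: UnknownFormats) -> Optional[str]:
    """Return the text to index for a file of unrecognized format, or None."""
    if looks_like_text(data):
        # Unlabeled plain text is passed through as-is; the setting is not consulted.
        return data.decode("utf-8", errors="replace")
    if setting is UnknownFormats.EXCLUDE:
        # Default: drop the data rather than index binary garbage.
        return None
    # Include: keep whatever printable ASCII runs a strings-like scan finds.
    return " ".join(m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data))
```

The plain-text check runs before the setting is consulted, matching the order described above: Unknown File Formats only decides the fate of files that fail the text test.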

