Optional
error
Determines the desired error rate. A lower number yields more reliable
results but makes the index larger. The value defaults to 0.0001 (or 0.01%).
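The trade-off between error rate and index size can be illustrated with the textbook Bloom filter sizing formulas. This is a minimal sketch: the function names `bloomFilterBits` and `hashCount` are hypothetical, and the library may size its filters differently.

```typescript
// Textbook Bloom filter sizing. Assumption: illustrative only, not
// necessarily the formula this library applies internally.

// Number of bits needed to hold `termCount` distinct terms at the
// given false-positive rate: m = -n * ln(p) / (ln 2)^2
function bloomFilterBits(termCount: number, errorRate: number): number {
  return Math.ceil(-(termCount * Math.log(errorRate)) / (Math.LN2 * Math.LN2));
}

// Matching number of hash functions: k = (m / n) * ln 2
function hashCount(bits: number, termCount: number): number {
  return Math.max(1, Math.round((bits / termCount) * Math.LN2));
}

// Compare the default error rate (0.0001) with a looser one (0.01)
// for a document containing 1000 distinct terms.
const bitsAtDefault = bloomFilterBits(1000, 0.0001);
const bitsAtOnePercent = bloomFilterBits(1000, 0.01);
const hashesAtDefault = hashCount(bitsAtDefault, 1000);
```

Under these formulas, tightening the error rate from 0.01 to 0.0001 roughly doubles the number of bits per document, which is the size cost the option description refers to.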
The fields to index, provided as an array or as a record of field keys and weight values.
Optional
min
Minimum term cardinality used to calculate the Bloom filter size. This can
be used to reduce false positives when dealing with small documents with a
sparse term frequency distribution. The default value is 0.
Optional
ngrams
Indexes n-grams beyond the single text tokens. A value of 2 indexes
digrams, a value of 3 indexes digrams and trigrams, and so forth. This
allows searching the index for simple phrases (a phrase search is entered
"between quotes"). Indexing n-grams will increase the size of the
generated indices roughly by a factor of n. The default value is 1 (no
n-grams are indexed).
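To make the size increase concrete, the sketch below expands a token list into every n-gram up to a given length. The helper name `termsWithNgrams` and the choice to join tokens with a single space are assumptions for illustration, not the library's internal representation.

```typescript
// Expand tokens into all n-grams of length 1..ngrams.
// Assumption: n-grams are represented as space-joined tokens.
function termsWithNgrams(tokens: string[], ngrams: number): string[] {
  const terms: string[] = [];
  for (let size = 1; size <= ngrams; size++) {
    // Slide a window of `size` tokens across the list.
    for (let start = 0; start + size <= tokens.length; start++) {
      terms.push(tokens.slice(start, start + size).join(" "));
    }
  }
  return terms;
}

const digramTerms = termsWithNgrams(["the", "quick", "fox"], 2);
// digramTerms: ["the", "quick", "fox", "the quick", "quick fox"]
```

Three tokens yield five terms at a value of 2, and longer documents approach n times as many terms, consistent with the "factor of n" estimate.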
Optional
preprocess
Preprocessing function, executed before all others. The function serialises
each field as a string and optionally processes it before indexing. For
example, you might use this function to strip HTML from a field value. By
default, this class simply converts the field value into a string.
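As an illustration of the HTML-stripping use case, here is a minimal sketch of a preprocessing function. The name `stripHtml` and its single-argument signature are assumptions; the exact arguments the library passes to this function are not stated in this section.

```typescript
// Hypothetical preprocessing function: serialise a field value to a
// string and remove anything that looks like an HTML tag.
// Assumption: the real callback may receive additional arguments.
function stripHtml(value: unknown): string {
  return String(value ?? "")
    .replace(/<[^>]*>/g, " ") // replace tags with a space
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}

const plainText = stripHtml("<p>Hello <b>world</b></p>");
// plainText: "Hello world"
```

A regular expression like this is only a rough filter. It does not decode HTML entities or drop the contents of script and style elements, so a dedicated HTML parser is the safer choice for untrusted markup.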
Optional
seed
Hash seed to use in Bloom filters. Defaults to 0x00c0ffee.
Optional
stemmer
Allows plugging in a custom stemming function. By default, this class does not change text tokens.
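A custom stemming function maps each token to a reduced form so that related words share one index term. The sketch below is a deliberately naive suffix stripper: `naiveStem` is a hypothetical name, and the assumed signature (one token in, one token out) is not confirmed by this section.

```typescript
// Naive suffix-stripping stemmer, for illustration only.
// Assumption: the stemmer receives one token and returns one token.
function naiveStem(token: string): string {
  for (const suffix of ["ing", "ed", "es", "s"]) {
    // Keep at least three characters of the stem.
    if (token.length > suffix.length + 2 && token.endsWith(suffix)) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

const stemmed = ["searching", "indexes", "bus"].map(naiveStem);
// stemmed: ["search", "index", "bus"]
```

For real text, an established algorithm such as the Porter stemmer will handle far more cases than this suffix list.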
Optional
stopwords
Filters tokens so that words that are too short or too common may be excluded from the index. By default, no stopwords are excluded.
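A stopword filter covering both cases mentioned above (too short, too common) might look like the following sketch. The name `keepToken`, the word list, and the convention that the function returns `true` for tokens to keep are all assumptions; check the library's type definitions for the actual contract.

```typescript
// Hypothetical stopword filter.
// Assumption: returning true keeps the token, false excludes it.
const commonWords = new Set(["a", "an", "and", "of", "the", "to"]);

function keepToken(token: string): boolean {
  return token.length >= 2 && !commonWords.has(token);
}

const keptTokens = ["the", "index", "of", "x", "terms"].filter(keepToken);
// keptTokens: ["index", "terms"]
```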
Determines which fields in the document can be stored in the index and returned as a search result.
Optional
term
Optimises storage by grouping indexed terms into buckets according to term
frequency in a document. Defaults to [1, 2, 3, 4, 8, 16, 32, 64].
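One plausible reading of the bucketing is that each term's frequency is rounded down to the nearest bucket boundary, so terms with similar frequencies share storage. The sketch below shows that interpretation. `bucketFor` is a hypothetical helper, and the rounding direction is an assumption, not documented behaviour.

```typescript
// Default bucket boundaries as listed in the option description.
const defaultBuckets = [1, 2, 3, 4, 8, 16, 32, 64];

// Assumption: a frequency maps to the largest bucket that does not
// exceed it. Expects `buckets` sorted in ascending order.
function bucketFor(frequency: number, buckets: number[] = defaultBuckets): number {
  let chosen = buckets[0];
  for (const bucket of buckets) {
    if (bucket <= frequency) chosen = bucket;
  }
  return chosen;
}

const bucketOfFive = bucketFor(5);       // 4
const bucketOfHundred = bucketFor(100);  // 64
```

With the default boundaries, low frequencies are kept exact (1 to 4) while higher ones are grouped on a coarser, power-of-two scale.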
Optional
tokenizer
Allows a custom tokenizer function. By default, content is transformed to
lowercase, split at every whitespace or hyphen, and stripped of any
non-word characters (anything other than A-Z, 0-9, and _).
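The default behaviour described above can be approximated in a few lines. This is a sketch written from the description, not the library's source: `defaultLikeTokenizer` is a hypothetical name, and dropping empty tokens is an added assumption.

```typescript
// Approximation of the described default tokenizer.
// Assumption: tokens left empty after stripping are discarded.
function defaultLikeTokenizer(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s-]+/)                            // split at whitespace or hyphens
    .map((token) => token.replace(/\W/g, ""))   // strip non-word characters
    .filter((token) => token.length > 0);
}

const tokens = defaultLikeTokenizer("Full-text search, made SIMPLE!");
// tokens: ["full", "text", "search", "made", "simple"]
```

Note that in JavaScript regular expressions `\W` also matches accented letters, so a custom tokenizer is worth considering for non-English content.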
Configuration options for creating a search index.