Pacote

    Class BloomSearch<Document, SummaryField, IndexField>

    Indexer and searcher based on Bloom filters.

    import { BloomSearch } from '@pacote/bloom-search'

    const bs = new BloomSearch({
      fields: ['text'],
      summary: ['id'],
    })

    bs.add('id1', { id: 1, text: 'foo bar' })
    bs.add('id2', { id: 2, text: 'foo baz' })

    bs.search('foo +bar') // => [{ id: 1 }]
    bs.search('foo -bar') // => [{ id: 2 }]

    Type Parameters


    Constructors

    Properties

    errorRate: number

    Error rate used in all Bloom filters to generate document signatures.
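As a rough guide to how errorRate trades space for accuracy, the textbook Bloom filter sizing formulas are sketched below. This assumes signatures are sized this way, which the documentation above does not confirm; bloomParameters is a hypothetical helper, not a library export.

```typescript
// Textbook Bloom filter sizing (an assumption about how signatures are sized):
// for n distinct items and target false-positive rate p,
//   bits   m = -n * ln(p) / (ln 2)^2
//   hashes k = (m / n) * ln 2
function bloomParameters(n: number, errorRate: number): { bits: number; hashes: number } {
  if (n <= 0 || errorRate <= 0 || errorRate >= 1) {
    throw new RangeError('n must be positive and errorRate must be in (0, 1)')
  }
  const bits = Math.ceil((-n * Math.log(errorRate)) / Math.LN2 ** 2)
  const hashes = Math.max(1, Math.round((bits / n) * Math.LN2))
  return { bits, hashes }
}

// 100 distinct terms at a 1% error rate.
console.log(bloomParameters(100, 0.01)) // { bits: 959, hashes: 7 }
```

Halving the error rate adds a fixed number of bits per term, so lowering errorRate grows every document signature proportionally.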

    fields: Record<IndexField, number>

    A record containing the names of all indexable fields and their relative weights, used to rank search results.
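One way to picture how relative field weights affect ranking: each query term that (probably) matches a field adds that field's weight to the document's score. This is a hypothetical scoring model; score and its inputs are illustrative and not the library's ranking code.

```typescript
// Hypothetical weighted scoring: sum of (field weight * matched term count in that field).
function score<F extends string>(
  weights: Record<F, number>,
  matches: Partial<Record<F, number>>,
): number {
  let total = 0
  for (const field of Object.keys(weights) as F[]) {
    total += weights[field] * (matches[field] ?? 0)
  }
  return total
}

const weights = { title: 10, body: 1 }
// A single title match outranks three body matches.
console.log(score(weights, { title: 1 })) // 10
console.log(score(weights, { body: 3 })) // 3
```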

    index: Index<Document, SummaryField> = ...

    Collection of document summaries and Bloom filter signatures used for searching, keyed by each document's shorthand reference identifier.

    minSize: number

    Minimum term cardinality used to calculate the Bloom filter size. This can be used to reduce false positives when indexing small documents with a sparse term frequency distribution.
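The effect of minSize can be estimated with the standard false-positive approximation (1 - e^(-k·n/m))^k. The sketch assumes the filter is sized for max(terms, minSize) items using the textbook formulas; sizeFor and effectiveRate are hypothetical helpers, not library code.

```typescript
// Assumption: filter sized for max(terms, minSize) items at the configured error rate.
function sizeFor(n: number, errorRate: number): { bits: number; hashes: number } {
  const bits = Math.ceil((-n * Math.log(errorRate)) / Math.LN2 ** 2)
  const hashes = Math.max(1, Math.round((bits / n) * Math.LN2))
  return { bits, hashes }
}

// Approximate false-positive rate of m bits and k hashes holding n items.
function falsePositiveRate(bits: number, hashes: number, items: number): number {
  return (1 - Math.exp((-hashes * items) / bits)) ** hashes
}

function effectiveRate(terms: number, minSize: number, errorRate: number): number {
  const { bits, hashes } = sizeFor(Math.max(terms, minSize), errorRate)
  return falsePositiveRate(bits, hashes, terms)
}

// A three-term document at errorRate 0.1:
console.log(effectiveRate(3, 0, 0.1)) // roughly 0.09
console.log(effectiveRate(3, 20, 0.1)) // roughly 0.0007
```

The trade-off is space: a larger minimum size means every small document pays for a larger signature.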

    ngrams: number

    The maximum length, in words, of the n-grams to store in the index. Defaults to 1 (no n-grams).
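To illustrate what storing n-grams means, the sketch below generates every word sequence of length 1 to maxN from a token list. Joining words with a single space is an assumption about the key format; ngrams here is a hypothetical helper.

```typescript
// Generate all n-grams of length 1..maxN, joining words with a space (assumed key format).
function ngrams(tokens: string[], maxN: number): string[] {
  const grams: string[] = []
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      grams.push(tokens.slice(i, i + n).join(' '))
    }
  }
  return grams
}

console.log(ngrams(['foo', 'bar', 'baz'], 2))
// [ 'foo', 'bar', 'baz', 'foo bar', 'bar baz' ]
```

Each extra n-gram length adds roughly one more entry per word to a document's signature, which is why phrase search costs space.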

    seed: number

    Hash seed used in the Bloom filters. Defaults to 0x00c0ffee.
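The seed changes which bits each term sets, so signatures built with different seeds cannot be matched against each other. The sketch illustrates this with a seeded variant of 32-bit FNV-1a and double hashing to derive bit positions; both are assumptions chosen for brevity, not necessarily the hash scheme the library uses.

```typescript
// Seeded 32-bit FNV-1a variant over UTF-16 code units (illustrative, not the library's hash).
function seededHash(text: string, seed: number): number {
  let hash = (0x811c9dc5 ^ seed) >>> 0
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Double hashing: derive k bit positions in a filter of `bits` bits from two hashes.
function positions(term: string, k: number, bits: number, seed: number): number[] {
  const h1 = seededHash(term, seed)
  const h2 = (seededHash(term, h1) | 1) >>> 0 // force an odd step
  return Array.from({ length: k }, (_, i) => ((h1 + i * h2) >>> 0) % bits)
}

console.log(positions('foo', 7, 959, 0x00c0ffee))
console.log(positions('foo', 7, 959, 1)) // different seed, different bits
```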

    summary: SummaryField[]

    An array with the names of fields to preserve as summary, and which are returned as search results for the matching documents. It is recommended to keep only fields necessary to identify a document (e.g. title, URL, short description) to keep space requirements down.
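The summary is what TypeScript's Pick<Document, SummaryField> type describes: a copy of the document restricted to the listed keys. summarise is a hypothetical helper that shows the projection; it is not a library export.

```typescript
// Project a document down to its summary fields; the return type is Pick<D, K>.
function summarise<D, K extends keyof D>(document: D, summary: readonly K[]): Pick<D, K> {
  const result = {} as Pick<D, K>
  for (const key of summary) {
    result[key] = document[key]
  }
  return result
}

const doc = { id: 1, title: 'Foo', url: '/foo', text: 'a long body of text' }
console.log(summarise(doc, ['id', 'title'] as const)) // { id: 1, title: 'Foo' }
```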

    termFrequencyBuckets: number[]

    Optimises storage by grouping indexed terms into buckets according to their frequency in each document.
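One plausible reading of term frequency buckets: round each term's in-document count down to the largest bucket threshold it reaches, so only a handful of distinct frequency values need to be recorded. This is a hypothetical scheme for illustration; bucketFor and bucketTerms are not library functions.

```typescript
// Round a frequency down to the largest threshold it reaches (0 if none).
function bucketFor(frequency: number, buckets: readonly number[]): number {
  let chosen = 0
  for (const threshold of buckets) {
    if (frequency >= threshold && threshold > chosen) chosen = threshold
  }
  return chosen
}

// Count tokens, then replace each exact count with its bucket.
function bucketTerms(tokens: string[], buckets: readonly number[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const token of tokens) counts.set(token, (counts.get(token) ?? 0) + 1)
  const bucketed = new Map<string, number>()
  counts.forEach((count, term) => bucketed.set(term, bucketFor(count, buckets)))
  return bucketed
}

console.log(bucketTerms(['foo', 'foo', 'foo', 'bar', 'baz', 'baz'], [1, 2, 5, 10]))
// Map(3) { 'foo' => 2, 'bar' => 1, 'baz' => 2 }
```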

    Methods

    • add(ref: string, document: Document, language?: string): void

      Indexes a single document with its unique reference identifier.

      Parameters

      • ref: string

        A unique reference identifier for the document. Adding another document with the same reference replaces the existing document in the search index.

      • document: Document

        The document to index.

      • Optional language: string

        Language identifier passed to the stemmer and stopwords callback functions so they can decide how to handle stemming and stopword removal for that language.

      Returns void
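The role of the language argument can be sketched as a tokenising pipeline that forwards it to both callbacks. The callback shapes (Stemmer, Stopwords, with true meaning "keep the token") and the toy English rules are assumptions for illustration; the library's actual option signatures may differ.

```typescript
// Hypothetical callback shapes; not the library's documented option types.
type Stemmer = (token: string, language?: string) => string
type Stopwords = (token: string, language?: string) => boolean // true = keep

function tokenize(text: string, stemmer: Stemmer, stopwords: Stopwords, language?: string): string[] {
  return text
    .toLowerCase()
    .split(/\W+/) // naive split on non-word characters (ASCII only)
    .filter((token) => token.length > 0 && stopwords(token, language))
    .map((token) => stemmer(token, language))
}

// Toy language-aware callbacks, for illustration only.
const englishStopwords = new Set(['the', 'a'])
const stopwords: Stopwords = (token, language) => !(language === 'en' && englishStopwords.has(token))
const stemmer: Stemmer = (token, language) =>
  language === 'en' && token.endsWith('s') ? token.slice(0, -1) : token

console.log(tokenize('The cats', stemmer, stopwords, 'en')) // [ 'cat' ]
console.log(tokenize('The cats', stemmer, stopwords)) // [ 'the', 'cats' ]
```

The same pipeline has to run on search queries, which is why documents and queries should use the same stemmer.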

    • Replaces the instance's index with an index from another instance. Its primary use case is to rehydrate the index from a static file or payload.

      NB: Calling this method does not change any other attributes of the instance. It is up to developers to ensure that both instances were initialised with compatible options, in particular the stemmer function. Incompatible stemmer implementations may cause expected matches to be missed in the rehydrated index.

      Parameters

      Returns void
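Because compatibility is left to developers, one defensive pattern is to persist the signature-affecting options next to the index and refuse to rehydrate on a mismatch. The payload shape, Entry, save and restore are all hypothetical, and 'porter-en' is a made-up label standing in for the stemmer implementation, since functions cannot be serialised.

```typescript
// Hypothetical persisted shape; the real Index<Document, SummaryField> type is not reproduced.
interface Entry<S> {
  summary: S
  signature: number[]
}

interface Payload<S> {
  options: { errorRate: number; seed: number; ngrams: number; stemmer: string }
  index: Record<string, Entry<S>>
}

function save<S>(payload: Payload<S>): string {
  return JSON.stringify(payload)
}

// Reject payloads whose signature-affecting options differ from the current instance's.
function restore<S>(json: string, expected: Payload<S>['options']): Record<string, Entry<S>> {
  const payload = JSON.parse(json) as Payload<S>
  for (const key of Object.keys(expected) as Array<keyof Payload<S>['options']>) {
    if (payload.options[key] !== expected[key]) {
      throw new Error(`Incompatible option "${key}": ${payload.options[key]} vs ${expected[key]}`)
    }
  }
  return payload.index
}

const options = { errorRate: 0.0001, seed: 0x00c0ffee, ngrams: 1, stemmer: 'porter-en' }
const json = save({ options, index: { id1: { summary: { id: 1 }, signature: [5, 9] } } })
const restored = restore<{ id: number }>(json, options) // same options: accepted
```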

    • Removes an indexed document.

      Parameters

      • ref: string

        Reference identifier of the document to remove.

      Returns void
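The documented semantics of adding and removing by reference (a repeated reference replaces the earlier document; removal deletes it) can be modelled with a map keyed by reference identifier. ToyIndex is hypothetical and mirrors only those semantics, not the library's implementation.

```typescript
// Toy index keyed by reference identifier (hypothetical; no Bloom filters involved).
class ToyIndex<D> {
  private readonly documents = new Map<string, D>()

  // Adding under an existing ref overwrites the previous entry.
  add(ref: string, document: D): void {
    this.documents.set(ref, document)
  }

  // Removing an unknown ref is a no-op in this sketch (an assumption).
  remove(ref: string): void {
    this.documents.delete(ref)
  }

  get(ref: string): D | undefined {
    return this.documents.get(ref)
  }

  get size(): number {
    return this.documents.size
  }
}

const toy = new ToyIndex<{ id: number; text: string }>()
toy.add('id1', { id: 1, text: 'foo bar' })
toy.add('id1', { id: 1, text: 'foo qux' }) // replaces the first version
console.log(toy.size, toy.get('id1')?.text) // 1 foo qux
toy.remove('id1')
console.log(toy.size) // 0
```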

    • search(query: string, language?: string): Pick<Document, SummaryField>[]

      Scans the document index and returns a list of document summaries (containing only the properties declared in the summary option) for documents that possibly match one or more terms in the query.

      Each search term is run through the provided stemmer function so that query terms are processed in the same way as the terms previously added to the document signatures in the index.

      Parameters

      • query: string

        Terms to search.

        Individual words are matched against the signature of each indexed document. Prefix a word with the + operator to keep only results that (probably) contain it, or with the - operator to exclude results that contain it.

        If the ngrams option is greater than 1, you can also search for exact phrases of up to ngrams words by enclosing them in double quotes (for example, "this phrase"). Only documents containing these words in that sequence are returned in the search results.

      • Optional language: string

        Language identifier for the search terms. It is used only to help choose the appropriate stemming algorithm; search results are not filtered by language.

      Returns Pick<Document, SummaryField>[]

      Ordered list of document summaries, sorted by probable search term frequency.
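The query syntax described for search (+ for required words, - for exclusions, double quotes for phrases) can be sketched as a small parser plus a filter over a membership test. This is a hypothetical model: parseQuery, matches and mightContain are illustrative names, and treating unprefixed words as "match at least one" follows the description above rather than the library source.

```typescript
// Hypothetical query model; the library's own parser may differ.
interface ParsedQuery {
  required: string[]
  excluded: string[]
  optional: string[]
}

function parseQuery(query: string): ParsedQuery {
  const parsed: ParsedQuery = { required: [], excluded: [], optional: [] }
  // Optional +/- prefix, then either a "quoted phrase" or a bare word.
  const pattern = /([+-]?)(?:"([^"]+)"|(\S+))/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(query)) !== null) {
    const term = (match[2] ?? match[3] ?? '').toLowerCase()
    if (match[1] === '+') parsed.required.push(term)
    else if (match[1] === '-') parsed.excluded.push(term)
    else parsed.optional.push(term)
  }
  return parsed
}

// `mightContain` stands in for a Bloom filter lookup, which may return false positives.
function matches(query: ParsedQuery, mightContain: (term: string) => boolean): boolean {
  if (!query.required.every(mightContain)) return false
  if (query.excluded.some(mightContain)) return false
  return query.required.length > 0 || query.optional.some(mightContain)
}

const doc1 = new Set(['foo', 'bar'])
const doc2 = new Set(['foo', 'baz'])
console.log(matches(parseQuery('foo +bar'), (t) => doc1.has(t))) // true
console.log(matches(parseQuery('foo -bar'), (t) => doc1.has(t))) // false
```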