What data structure should be used to index data for partial search% infix%?

Imagine that you have a huge data cache that can be searched in four ways:

  • exact match
  • Prefix%
  • % suffix
  • % infix%

I use Trie for the first three types of searches, but I can’t figure out how to get to the fourth other than sequentially processing a huge array of elements.

+7
source share
3 answers

If your dataset is a huge cosider, using a search platform like Apache Solr so you don't mess up with performance.

+1
source

You can create a navigation map or set (for example, TreeMap or TreeSet) for 2 (with keys in the usual order) and 3 (keys in the reverse order)

For option 4, you can create a collection with a key for each starting letter. You can simplify this depending on your requirement. This can lead to more space being used, but get O (log n) lookup time.

0
source

For # 4, I think that if you pre-calculate the number of events of each character, then you can search in this table so that they have at least as many places where characters appear in the search bar.

How effective this algorithm probably depends on the nature of the data and the search string. It might be useful to give some examples of both here to get better answers.

0
source

All Articles