Recognize words in a sequence of characters

I need an algorithm that can recognize words (based on a dictionary) in a sequence of characters without spaces.

For example, given the sequence spaceless, it must recognize space and less.

There may also be cases where the sequence can be split into words in more than one way. It's hard to come up with such an example, but I will try:

example: spaceslight
recognized words: space and light (1)
recognized words: spaces and light (2)

The algorithm should be able to find all such options as well.

+5
3 answers

Store the dictionary in a trie. A lookup is then O(n), where n is the length of the string being looked up.

Instead of a trie, a DAWG can also be used.
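
A minimal sketch of how a trie could be used to list every way of splitting a sequence into dictionary words. The names `build_trie`, `segmentations` and `END`, as well as the sample word lists, are made up for illustration and are not from the answer. The search backtracks, so it can take exponential time on inputs with very many valid splits.

```python
END = "<end>"  # multi-character key, so it cannot collide with a single letter


def build_trie(words):
    """Build a nested-dict trie; a node that ends a word holds the END key."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segmentations(text, trie):
    """Return every split of text into consecutive dictionary words."""
    results = []
    n = len(text)

    def walk(start, parts):
        if start == n:
            results.append(list(parts))
            return
        node = trie
        # Follow the trie along text[start:], trying each word that ends here.
        for end in range(start, n):
            node = node.get(text[end])
            if node is None:
                break  # no dictionary word continues with this letter
            if END in node:
                parts.append(text[start:end + 1])
                walk(end + 1, parts)
                parts.pop()

    walk(0, [])
    return results


trie = build_trie(["space", "spaces", "less", "light", "slight"])
print(segmentations("spaceless", trie))    # [['space', 'less']]
print(segmentations("spaceslight", trie))  # [['space', 'slight'], ['spaces', 'light']]
```

Only splits that cover the whole sequence are returned, so "space" pairs with "slight" rather than with "light" here.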

+1

Perhaps you should take a look at the Rabin-Karp algorithm, which lets you make a single pass through the text to find all n-letter words from the dictionary, for a given value of n. The standard Rabin-Karp will also find overlapping words: spaceslight → spaces, a, ace, aces, light, slight, i. You will need to modify it if you do not want overlapping words.
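
A rough sketch of that idea, assuming a simple polynomial rolling hash. The function names, the `base` and `mod` values, and the sample dictionary are illustrative choices, not part of the answer. It reports every n-letter dictionary word with its position, overlaps included.

```python
def poly_hash(s, base, mod):
    """Polynomial hash of s, most significant character first."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


def find_words_of_length(text, words, n, base=256, mod=(1 << 61) - 1):
    """Return (position, word) for each n-letter dictionary word in text."""
    # Group the n-letter dictionary words by hash value.
    targets = {}
    for w in words:
        if len(w) == n:
            targets.setdefault(poly_hash(w, base, mod), set()).add(w)
    if n == 0 or n > len(text) or not targets:
        return []

    high = pow(base, n - 1, mod)  # weight of the character leaving the window
    h = poly_hash(text[:n], base, mod)
    matches = []
    for i in range(len(text) - n + 1):
        if h in targets:
            window = text[i:i + n]
            if window in targets[h]:  # compare strings to rule out collisions
                matches.append((i, window))
        if i + n < len(text):
            # Slide the window: drop text[i], append text[i + n].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + n])) % mod
    return matches


words = ["space", "spaces", "light", "slight", "ace"]
print(find_words_of_length("spaceslight", words, 6))  # [(0, 'spaces'), (5, 'slight')]
```

In this run "spaces" and "slight" share the letter at position 5, which is the kind of overlap that would have to be filtered out afterwards.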

+1