Recognize words in a sequence of characters

Question

I need an algorithm that can recognize words (based on a dictionary) in a sequence of characters without spaces.

Let's say, for example, the sequence is spaceless; it must recognize space and less.

There may also be situations where more than one set of words can be recognized. It's hard to give such an example, but I will try:

example: spaceslight
recognized words: space and light (1)
recognized words: spaces and light (2)

The algorithm should be able to find such alternatives as well.

string algorithm

Antagonist Aug 9 '11 at 9:31

3 answers

LiKao · Answer 1 · 2011-08-09T09:51:35+0000

[…] a trie […]. O(n), where n is […].

[…] trie […], DAWG […].
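
Since this answer points to a trie, here is a minimal sketch of how a trie-based scan might look in Python. The function names build_trie and find_words, the nested-dict representation, and the sample word list are illustrative assumptions, not part of the answer. It reports every dictionary word that occurs in the text, including overlapping ones, so options such as space/spaces both show up.

```python
END = None  # sentinel key marking "a word ends here"; never equal to a character


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_words(text, trie):
    """Return (start, word) for every dictionary word occurring in text."""
    matches = []
    for start in range(len(text)):
        node = trie
        # Walk the trie from this start position until no branch matches.
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break
            if END in node:
                matches.append((start, node[END]))
    return matches


# Hypothetical sample dictionary, chosen to match the question's examples.
SAMPLE_WORDS = ["space", "spaces", "less", "light", "ace", "aces"]

if __name__ == "__main__":
    trie = build_trie(SAMPLE_WORDS)
    for start, word in find_words("spaceslight", trie):
        print(start, word)
```

Each start position walks the trie at most as deep as the longest dictionary word, so the scan never compares the text against words that cannot match there. Choosing among the reported words (for example, keeping only non-overlapping ones) would be a separate step on top of this list.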

Fred · Answer 2 · 2011-08-09T11:51:11+0000

[…]:

http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

PS: ...

rossum · Answer 3 · 2011-08-09T12:14:09+0000

Perhaps you should take a look at the Rabin-Karp algorithm, which lets you make a single pass through a text to find all the n-letter words in the dictionary for a given value of n. Standard Rabin-Karp will find overlapping matches: spaceslight → spaces, a, ace, aces, light, slight, i. You will need to modify it if you do not want overlapping words.
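
A rough sketch of that idea in Python, assuming a simple polynomial rolling hash: the names poly_hash and rabin_karp_multi, the base and modulus values, and the sample word list are all assumptions made for illustration. It makes one pass over the text for a single word length n and checks each window's hash against the hashes of the n-letter dictionary words, comparing the actual characters only when a hash matches.

```python
def poly_hash(s, base, mod):
    """Polynomial hash of s: sum of ord(c) * base**k, reduced modulo mod."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


def rabin_karp_multi(text, words, n, base=256, mod=1_000_000_007):
    """Return (start, word) for every n-letter dictionary word found in text."""
    # Group the n-letter words by hash so one lookup covers all of them.
    targets = {}
    for w in words:
        if len(w) == n:
            targets.setdefault(poly_hash(w, base, mod), set()).add(w)

    matches = []
    if n <= 0 or len(text) < n or not targets:
        return matches

    high = pow(base, n - 1, mod)  # weight of the window's leading character
    h = poly_hash(text[:n], base, mod)
    for start in range(len(text) - n + 1):
        if start > 0:
            # Roll the window: drop text[start - 1], append text[start + n - 1].
            h = ((h - ord(text[start - 1]) * high) * base
                 + ord(text[start + n - 1])) % mod
        if h in targets:
            window = text[start:start + n]
            if window in targets[h]:  # guard against hash collisions
                matches.append((start, window))
    return matches


# Hypothetical sample dictionary for the example from the question.
SAMPLE_WORDS = ["space", "spaces", "light", "slight", "ace", "aces"]

if __name__ == "__main__":
    for n in sorted({len(w) for w in SAMPLE_WORDS}):
        print(n, rabin_karp_multi("spaceslight", SAMPLE_WORDS, n))
```

Covering the whole dictionary means one pass per distinct word length, as in the loop at the bottom. The results still overlap (space and spaces both start at position 0), so filtering for non-overlapping words remains a separate step.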
