The core of your program should be a predicate that tokenizes a list of character codes, i.e. builds a list of atoms (= words) from the codes. Below is the outline:

    %% tokenize(+Codes:list, -Atoms:list)
    %
    % Converts a list of character codes
    % into a list of atoms. There can be several solutions.

    tokenize([], []) :- !.

    tokenize(Cs, [A | As]) :-
        % Use append/3 to extract the Prefix of the code list
        append(...),
        % Check if the prefix constitutes a word in the dictionary,
        % and convert it into an atom.
        is_word(Prefix, A),
        % Parse the remaining codes
        tokenize(...).

Now you can define:

    is_word(Codes, Atom) :-
        atom_codes(Atom, Codes),
        word(Atom).

    word(the).
    word(there).
    word(review).
    word(view).

    split_words(Sentence, Words) :-
        atom_codes(Sentence, Codes),
        tokenize(Codes, Words).

and use it as follows:

    ?- split_words('thereview', Ws).
    Ws = [the, review] ;
    Ws = [there, view] ;
    false.

or use it in something more complicated, where you read the input from a file and write the output to another file.
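
As a rough illustration of the file-based variant, here is a minimal sketch, assuming SWI-Prolog and an input file with one unspaced sentence per line. The names `split_file/2` and `copy_lines/2` are hypothetical helpers, not part of the outline above, and the two goals left open in `tokenize/2` are filled in with just one possible completion (`append(Prefix, Rest, Cs)` and `tokenize(Rest, As)`). The extra `Prefix = [_ | _]` guard is also an addition; it only makes sure an empty prefix is never tried.

```prolog
:- use_module(library(lists)).     % append/3
:- use_module(library(readutil)).  % read_line_to_codes/2 (SWI-Prolog)

% Small illustrative dictionary, same entries as in the answer.
word(the).
word(there).
word(review).
word(view).

is_word(Codes, Atom) :-
    atom_codes(Atom, Codes),
    word(Atom).

% One possible completion of the outline (an assumption, not
% necessarily what the outline's author had in mind).
tokenize([], []) :- !.
tokenize(Cs, [A | As]) :-
    Prefix = [_ | _],              % added guard: non-empty prefixes only
    append(Prefix, Rest, Cs),      % split Cs into Prefix and Rest
    is_word(Prefix, A),
    tokenize(Rest, As).

split_words(Sentence, Words) :-
    atom_codes(Sentence, Codes),
    tokenize(Codes, Words).

% split_file(+InFile, +OutFile)  (hypothetical helper)
% Reads InFile line by line and writes every tokenization of each
% line to OutFile, one list per output line.
split_file(InFile, OutFile) :-
    setup_call_cleanup(
        open(InFile, read, In),
        setup_call_cleanup(
            open(OutFile, write, Out),
            copy_lines(In, Out),
            close(Out)),
        close(In)).

copy_lines(In, Out) :-
    read_line_to_codes(In, Codes),
    (   Codes == end_of_file
    ->  true
    ;   forall(tokenize(Codes, Words),
               format(Out, '~q~n', [Words])),
        copy_lines(In, Out)
    ).
```

Calling `?- split_file('input.txt', 'output.txt').` (both file names are placeholders) would then write all solutions for each input line. Using `forall/2` here is a design choice to collect every alternative by backtracking; if only the first split is wanted, `once(tokenize(Codes, Words))` could be used instead.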