Splitting a sentence without any spaces/separators into a sentence with spaces

Question

I am working on an end-of-term project for a programming course. The assignment is given below. I'm almost finished writing it in Java, but I'm having trouble writing it in Prolog. I am struggling with Prolog in general, so this question is as much about getting help with the assignment as it is about trying to understand Prolog better. Any help I can get will be SO appreciated.

A sentence contains words, all occurring in a dictionary, that happen to be concatenated without white space as delimiters. Describe a solution that produces all possible answers compatible with the given dictionary in 2 of the following 3 languages: Java, Haskell, Prolog. The test data is provided as a UTF-8 text file containing one sentence per line, with all words occurring in the dictionary, which is provided as a UTF-8 text file with one word per line. The output should be a UTF-8 text file containing the sentences with all words separated by spaces.
Example word file:
cat
dog
barks
works

distance
Example sentence file:
thedogbarks
thecatrunsaway

tokenize prolog

MeeksMan13 May 02, '11 at 1:48

1 answer

Kaarel · Answer 1 · 2011-05-02T06:43:57+0000

The core of your program should be a predicate that tokenizes a list of character codes, i.e. builds a list of atoms (= words) from the codes. Here is an outline:

    %% tokenize(+Codes:list, -Atoms:list)
    %
    % Converts a list of character codes
    % into a list of atoms. There can be several solutions.
    tokenize([], []) :- !.
    tokenize(Cs, [A | As]) :-
        % Use append/3 to extract the Prefix of the code list
        append(...),
        % Check if the prefix constitutes a word in the dictionary,
        % and convert it into an atom.
        is_word(Prefix, A),
        % Parse the remaining codes
        tokenize(...).

Now you can define:

    is_word(Codes, Atom) :-
        atom_codes(Atom, Codes),
        word(Atom).

    word(the).
    word(there).
    word(review).
    word(view).

    split_words(Sentence, Words) :-
        atom_codes(Sentence, Codes),
        tokenize(Codes, Words).

and use it as follows:

    ?- split_words('thereview', Ws).
    Ws = [the, review] ;
    Ws = [there, view] ;
    false.
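For readers who want to see the outline completed, here is one possible way to fill in the two elided goals. It is only a sketch, not necessarily what the answer's author had in mind: the variable name `Rest` and the non-empty-prefix guard are my own assumptions, and the four `word/1` facts are just the sample dictionary from above. It assumes SWI-Prolog, where `append/3` is available by default.

```prolog
% Sample dictionary (same four words as in the answer).
word(the).
word(there).
word(review).
word(view).

% is_word(+Codes, -Atom): Codes spell the dictionary word Atom.
is_word(Codes, Atom) :-
    atom_codes(Atom, Codes),
    word(Atom).

% tokenize(+Codes, -Atoms): split Codes into dictionary words.
% Backtracking into append/3 enumerates every possible prefix,
% so all segmentations are found one after another.
tokenize([], []) :- !.
tokenize(Cs, [A | As]) :-
    Prefix = [_ | _],             % assumption: skip the empty prefix
    append(Prefix, Rest, Cs),     % Cs is Prefix followed by Rest
    is_word(Prefix, A),           % Prefix must be a dictionary word
    tokenize(Rest, As).           % split the remaining codes

% split_words(+Sentence, -Words): Sentence is an atom without spaces.
split_words(Sentence, Words) :-
    atom_codes(Sentence, Codes),
    tokenize(Codes, Words).
```

Because `Prefix` is constrained to be non-empty before `append/3` is called, prefixes are tried from shortest to longest, which is why `the` is found before `there` in the query above.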

or use it in something more complex, where you read the input from a file and write the results to a file.
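The file-based variant could look roughly like the sketch below. It assumes SWI-Prolog (`library(readutil)`, `setup_call_cleanup/3`, `atomic_list_concat/3`); the predicate names `load_dictionary/1`, `load_words/1`, `split_file/2` and `split_lines/2` are hypothetical, and the compact `tokenize/2` is just one possible completion of the outline, included so that the example is self-contained. The dictionary is loaded into dynamic `word/1` facts, and every segmentation of every input line is written on its own output line.

```prolog
:- use_module(library(readutil)).   % read_line_to_codes/2
:- use_module(library(lists)).      % append/3
:- dynamic word/1.

% load_dictionary(+File): one word/1 fact per non-empty line.
load_dictionary(File) :-
    retractall(word(_)),
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),
        load_words(In),
        close(In)).

load_words(In) :-
    read_line_to_codes(In, Codes),
    (   Codes == end_of_file
    ->  true
    ;   (   Codes == []
        ->  true                          % ignore blank lines
        ;   atom_codes(W, Codes),
            assertz(word(W))
        ),
        load_words(In)
    ).

% tokenize(+Codes, -Atoms): one possible completion of the outline.
tokenize([], []) :- !.
tokenize(Cs, [A | As]) :-
    Prefix = [_ | _],                     % non-empty prefix only
    append(Prefix, Rest, Cs),
    atom_codes(A, Prefix),
    word(A),
    tokenize(Rest, As).

% split_file(+InFile, +OutFile): write every segmentation of every
% line of InFile to OutFile, words separated by single spaces.
split_file(InFile, OutFile) :-
    setup_call_cleanup(
        open(InFile, read, In, [encoding(utf8)]),
        setup_call_cleanup(
            open(OutFile, write, Out, [encoding(utf8)]),
            split_lines(In, Out),
            close(Out)),
        close(In)).

split_lines(In, Out) :-
    read_line_to_codes(In, Codes),
    (   Codes == end_of_file
    ->  true
    ;   forall(tokenize(Codes, Words),
               (   atomic_list_concat(Words, ' ', Line),
                   format(Out, '~w~n', [Line])
               )),
        split_lines(In, Out)
    ).
```

With hypothetical file names, a run would be `?- load_dictionary('words.txt'), split_file('sentences.txt', 'output.txt').` A sentence that cannot be split with the loaded dictionary simply produces no output line in this sketch; whether that is the desired behaviour depends on the assignment.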
