Functionally split a string by whitespace, grouping by quotes!

Writing idiomatic functional code in Clojure [1], how would one write a function that splits a string by whitespace but keeps quoted phrases intact? A quick solution is, of course, to use regular expressions, but it should be possible without them. At a quick glance it looks rather hard! I have written similar functions in imperative languages, but I would like to see how a functional, recursive approach works.

A quick check on what our function should do:

    "Hello there!" -> ["Hello", "there!"]
    "'A quoted phrase'" -> ["A quoted phrase"]
    "'a' 'b' c d" -> ["a", "b", "c", "d"]
    "'a b' 'c d'" -> ["a b", "c d"]
    "Mid'dle 'quotes do not concern me'" -> ["Mid'dle", "quotes do not concern me"]

I don't mind if the spacing changes inside the quotes (so one can do a simple split by whitespace first).

    "'lots    of   spacing' there" -> ["lots of spacing", "there"] ;is ok to me

[1] This question can be answered at a general level, but I think that the functional approach in Clojure can easily be translated into Haskell, ML, etc.

+7
functional-programming recursion clojure
7 answers

Here is a version returning a lazy seq of words / quoted strings:

    (defn splitter [s]
      (lazy-seq
       (when-let [c (first s)]
         (cond
          ;; skip whitespace between words
          (Character/isSpace c)
          (splitter (rest s))
          ;; opening quote: take everything up to the closing quote
          (= \' c)
          (let [[w* r*] (split-with #(not= \' %) (rest s))]
            (if (= \' (first r*))
              (cons (apply str w*) (splitter (rest r*)))
              (cons (apply str w*) nil)))
          ;; plain word: take everything up to the next whitespace
          :else
          (let [[w r] (split-with #(not (Character/isSpace %)) s)]
            (cons (apply str w) (splitter r)))))))

Testing:

    user> (doseq [x ["Hello there!"
                     "'A quoted phrase'"
                     "'a' 'b' c d"
                     "'a b' 'c d'"
                     "Mid'dle 'quotes do not concern me'"
                     "'lots of spacing' there"]]
            (prn (splitter x)))
    ("Hello" "there!")
    ("A quoted phrase")
    ("a" "b" "c" "d")
    ("a b" "c d")
    ("Mid'dle" "quotes do not concern me")
    ("lots of spacing" "there")
    nil

If the single quotes in the input are not matched properly, everything from the final opening single quote onward is taken as one word:

    user> (splitter "'asdf")
    ("asdf")

Update: another version in response to edbond's comment, with better handling of quote characters inside words:

    (defn splitter [s]
      ((fn step [xys]
         (lazy-seq
          (when-let [c (ffirst xys)]
            (cond
             (Character/isSpace c)
             (step (rest xys))
             ;; a closing quote only counts if it is followed by
             ;; whitespace or the end of the input
             (= \' c)
             (let [[w* r*]
                   (split-with (fn [[x y]]
                                 (or (not= \' x)
                                     (not (or (nil? y)
                                              (Character/isSpace y)))))
                               (rest xys))]
               (if (= \' (ffirst r*))
                 (cons (apply str (map first w*)) (step (rest r*)))
                 (cons (apply str (map first w*)) nil)))
             :else
             (let [[w r] (split-with (fn [[x y]] (not (Character/isSpace x))) xys)]
               (cons (apply str (map first w)) (step r)))))))
       ;; pair every character with its successor (nil after the last one)
       (partition 2 1 (lazy-cat s [nil]))))

Testing:

    user> (doseq [x ["Hello there!"
                     "'A quoted phrase'"
                     "'a' 'b' c d"
                     "'a b' 'c d'"
                     "Mid'dle 'quotes do not concern me'"
                     "'lots of spacing' there"
                     "Mid'dle 'quotes do no't concern me'"
                     "'asdf"]]
            (prn (splitter x)))
    ("Hello" "there!")
    ("A quoted phrase")
    ("a" "b" "c" "d")
    ("a b" "c d")
    ("Mid'dle" "quotes do not concern me")
    ("lots of spacing" "there")
    ("Mid'dle" "quotes do no't concern me")
    ("asdf")
    nil
+6

This solution is in Haskell, but the basic idea should be applicable in Clojure.
The two parser states (inside or outside quotes) are represented by two mutually recursive functions.

    splitq = outside [] . (' ':)

    add c res = if null res then [[c]] else map (++[c]) res

    outside res xs = case xs of
        ' '  : ' '  : ys -> outside res $ ' ' : ys
        ' '  : '\'' : ys -> res ++ inside [] ys
        ' '  : ys        -> res ++ outside [] ys
        c    : ys        -> outside (add c res) ys
        _                -> res

    inside res xs = case xs of
        ' '  : ' ' : ys -> inside res $ ' ' : ys
        '\'' : ' ' : ys -> res ++ outside [] (' ' : ys)
        '\'' : []       -> res
        c    : ys       -> inside (add c res) ys
        _               -> res
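For comparison, the same inside/outside idea can be sketched in Clojure as a fold over the characters with the state kept in a map. This is only a rough sketch, not a translation of the Haskell above: the names `split-quoted`, `emit` and `step` are made up for illustration, only the space character counts as a separator, a quote opens a phrase only at the start of a word, and an empty quoted phrase is dropped.

```clojure
(defn split-quoted
  "Hypothetical sketch: split s on spaces, keeping 'quoted phrases' whole."
  [s]
  (let [;; move the word collected so far (if any) into the result
        emit (fn [{:keys [word words] :as st}]
               (if (seq word)
                 (assoc st :words (conj words (apply str word)) :word [])
                 st))
        ;; one transition of the two-state machine
        step (fn [{:keys [state word] :as st} c]
               (case state
                 :outside (cond
                            (= c \space)                 (emit st)
                            (and (= c \') (empty? word)) (assoc st :state :inside)
                            :else                        (update st :word conj c))
                 :inside  (if (= c \')
                            (assoc (emit st) :state :outside)
                            (update st :word conj c))))]
    (:words (emit (reduce step {:state :outside :word [] :words []} s)))))

(split-quoted "Mid'dle 'quotes do not concern me'")
;; => ["Mid'dle" "quotes do not concern me"]
```

Because `reduce` is eager and iterative, this variant does not grow the stack, at the cost of losing the laziness of the `lazy-seq` versions.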
+5

Here is a Clojure version. It will probably blow the stack for very large inputs. A regular expression or a real parser would be much more concise.

    (declare parse*)

    (defn slurp-word [words xs terminator]
      (loop [res "" xs xs]
        (condp = (first xs)
          nil  ;; end of string after this word
          (conj words res)

          terminator ;; end of word
          (parse* (conj words res) (rest xs))

          ;; else
          (recur (str res (first xs)) (rest xs)))))

    (defn parse* [words xs]
      (condp = (first xs)
        nil ;; end of string
        words

        \space  ;; skip leading spaces
        (parse* words (rest xs))

        \' ;; start quoted part
        (slurp-word words (rest xs) \')

        ;; else slurp until space
        (slurp-word words xs \space)))

    (defn parse [s]
      (parse* [] s))

Your inputs:

    user> (doseq [x ["Hello there!"
                     "'A quoted phrase'"
                     "'a' 'b' c d"
                     "'a b' 'c d'"
                     "Mid'dle 'quotes do not concern me'"
                     "'lots of spacing' there"]]
            (prn (parse x)))
    ["Hello" "there!"]
    ["A quoted phrase"]
    ["a" "b" "c" "d"]
    ["a b" "c d"]
    ["Mid'dle" "quotes do not concern me"]
    ["lots of spacing" "there"]
    nil
+3

You could modify Brian's version to use trampoline so that it does not run out of stack space. Basically, make slurp-word and parse* return functions instead of calling each other directly, and then change parse to use trampoline.

    (declare parse*)

    (defn slurp-word [words xs terminator]
      (loop [res "" xs xs]
        (condp = (first xs)
          nil  ;; end of string after this word
          (conj words res)

          terminator ;; end of word
          #(parse* (conj words res) (rest xs))

          ;; else
          (recur (str res (first xs)) (rest xs)))))

    (defn parse* [words xs]
      (condp = (first xs)
        nil ;; end of string
        words

        \space  ;; skip leading spaces (as a thunk too, so long runs of spaces stay off the stack)
        #(parse* words (rest xs))

        \' ;; start quoted part
        #(slurp-word words (rest xs) \')

        ;; else slurp until space
        #(slurp-word words xs \space)))

    (defn parse [s]
      (trampoline #(parse* [] s)))

    (defn test-parse []
      (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots of spacing' there"
                 (apply str (repeat 30000 "'lots of spacing' there"))]]
        (prn (parse x))))
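As a minimal illustration of the mechanism on its own, separate from the parser: `trampoline` calls a function and, as long as the result is itself a function, calls that result again, so mutually recursive steps run in a loop instead of nesting stack frames. The names `my-even?` and `my-odd?` below are made up for this sketch.

```clojure
(declare my-odd?)

;; Each function returns either a final answer or a thunk (a no-argument
;; function) describing the next step, instead of calling the other directly.
(defn my-even? [n]
  (if (zero? n)
    true
    #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n)
    false
    #(my-even? (dec n))))

;; trampoline keeps invoking the returned thunks until a non-function
;; value comes back, so the depth of the call stack stays constant.
(trampoline my-even? 100000)
;; => true
```

One thing to keep in mind with this pattern is that the final result must not itself be a function, otherwise `trampoline` would call it too. The vectors returned by the parser are fine in that respect.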
+3

There is also, for example, fnparse, which lets you write a parser in a functional style.

+2

Use a regex:

    (defn my-split [string]
      (let [criterion " +(?=([^']*'[^']*')*[^']*$)"]
        (for [s (into [] (.split string criterion))]
          (.replace s "'" ""))))

The first part of the regular expression is the character you want to split the string on; here it is one or more spaces. The lookahead after it only allows a split where the rest of the string contains an even number of quotes, that is, outside a quoted phrase.

And if you want to change the quote character, just change every ' to something else, such as \".
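To see what the lookahead does on its own, here is a small sketch that only performs the split and leaves the quotes in place. The function name `split-outside-quotes` is made up for this example; `clojure.string/split` is used in place of the Java `.split` call.

```clojure
(require '[clojure.string :as str])

(defn split-outside-quotes
  "Hypothetical helper: split on runs of spaces that are followed by an
  even number of single quotes, i.e. spaces outside a quoted phrase."
  [s]
  (str/split s #" +(?=([^']*'[^']*')*[^']*$)"))

(split-outside-quotes "a 'b c' d")
;; => ["a" "'b c'" "d"]
```

The space between b and c is followed by exactly one quote, so the lookahead fails there and no split happens. Note that the quote removal in my-split is a separate step that strips every quote character, including ones in the middle of a word.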

EDIT: I just noticed that you explicitly mentioned that you didn't want to use regex. Sorry!

+1

Oh my, the answers given seem to beat the one I got to pass the tests. Anyway, I am posting it here to ask for comments on making the code more idiomatic.

I sketched some Haskell-like pseudocode:

    pl p w:ws = | if w:ws empty => p
                | if w begins with a quote => pli p w:ws
                | otherwise => pl (p ++ w) ws

    pli p w:ws = | if w:ws empty => p
                 | if w begins with a quote => pli (p ++ w) ws
                 | if w ends with a quote => pl (init p ++ (tail p ++ w)) ws
                 | otherwise => pli (init p ++ (tail p ++ w)) ws

OK, poorly named. Here:

  • The pl function processes the words that are not quoted
  • The pli function (i as in inner) processes quoted phrases
  • Parameter (list) p is the information already processed (done)
  • Parameter (list) w:ws is the information still to be processed

I translated the pseudocode like this:

    (def quote-chars '(\" \')) ;'

    ; rewrite .startsWith and .endsWith to support multiple choices
    (defn- starts-with?
      "See if given string begins with selected characters."
      [word choices]
      (some #(.startsWith word (str %)) choices))

    (defn- ends-with?
      "See if given string ends with selected characters."
      [word choices]
      (some #(.endsWith word (str %)) choices))

    (declare pli)
    (defn- pl [p w:ws]
      (let [w (first w:ws)
            ws (rest w:ws)]
        (cond
          (nil? w) p
          (starts-with? w quote-chars) #(pli p w:ws)
          true #(pl (concat p [w]) ws))))

    (defn- pli [p w:ws]
      (let [w (first w:ws)
            ws (rest w:ws)]
        (cond
          (nil? w) p
          (starts-with? w quote-chars) #(pli (concat p [w]) ws)
          (ends-with? w quote-chars) #(pl (concat (drop-last p) [(str (last p) " " w)]) ws)
          true #(pli (concat (drop-last p) [(str (last p) " " w)]) ws))))

    ;; NOTE: strip-input is not defined in this post; it presumably removes
    ;; the surrounding quote characters from each result.
    (defn split-line
      "Split a line by spaces, leave quoted groups intact."
      [input]
      (let [splt (.split input " +")]
        (map strip-input (trampoline pl [] splt))))

Not very Clojuresque in the details. Also, I rely on a regexp for splitting and for stripping the quotes, so I probably deserve some downvotes for that.

+1