Parsing numbers from strings in Lisp

Here's the problem in brief:
Input: a list of strings, each of which contains numbers
("3.4 5.4 1.2 6.4" "7.8 5.6 4.3" "1.2 3.2 5.4")

Output: a flat list of numbers
(3.4 5.4 1.2 6.4 7.8 5.6 4.3 1.2 3.2 5.4)

Here is my attempt at coding this:

    (defun parse-string-to-float (line &optional (start 0))
      "Parses a list of floats out of a given string"
      (if (equalp "" line)
          nil
          (let ((num (multiple-value-list
                      (read-from-string (subseq line start)))))
            (if (null (first num))
                nil
                (cons (first num)
                      (parse-string-to-float
                       (subseq line (+ start (second num)))))))))

    (defvar *data* (list " 3.4 5.4 1.2 6.4" "7.8 5.6 4.3" "1.2 3.2 5.4"))

    (setf *data* (format nil "~{~a ~}" *data*))

    (print (parse-string-to-float *data*))

    ===> (3.4 5.4 1.2 6.4 7.8 5.6 4.3 1.2 3.2 5.4)

However, for fairly large data sets this is slow. I suspect the recursion isn't as tight as it could be and that I'm doing something unnecessary. Any ideas?

In addition, the larger project involves an input file with various sections of data separated by keywords. For example:

    %FLAG START_COORDS
    1 2 5 8 10 12
    %FLAG END_COORDS
    3 7 3 23 9 26
    %FLAG NAMES
    ct re ct cg kl ct

and so on. I am trying to parse this into a hash table, with the keywords that follow %FLAG as keys and the values stored as lists of numbers or strings, depending on the particular keyword being parsed. Any ideas for libraries that already do this kind of job, or simple ways to handle it in Lisp?

+4
4 answers

This is not a task you want to do recursively. Use LOOP with a COLLECT clause instead. For instance:

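    (defun parse-string-to-floats (line)
      (loop
        :with n := (length line)
        ;; POS starts at 0, then jumps to the index where the last read stopped.
        :for pos := 0 :then chars
        :while (< pos n)
        ;; READ-FROM-STRING returns the object read and the next index in LINE.
        :for (float chars) := (multiple-value-list
                               (read-from-string line nil nil :start pos))
        :collect float))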

Alternatively, you may want to use WITH-INPUT-FROM-STRING instead of READ-FROM-STRING, which makes things even easier.

    (defun parse-string-to-float (line)
      (with-input-from-string (s line)
        (loop
          :for num := (read s nil nil)
          :while num
          :collect num)))

As for performance, you might want to do some profiling, and make sure that you are actually compiling your function.
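As a rough illustration of that advice, here is a minimal sketch using only standard operators. The name PARSE-LINE-DEMO and the iteration count are made up for the example. Some implementations interpret functions defined at the REPL unless you compile them, while others (SBCL, for one) always compile, so the COMPILE call may be redundant on yours.

```lisp
;; Hypothetical demo function, equivalent to the stream-based parser above.
(defun parse-line-demo (line)
  (with-input-from-string (s line)
    (loop for num = (read s nil nil)
          while num
          collect num)))

;; Make sure we are timing compiled code, not interpreted code.
(compile 'parse-line-demo)

;; TIME prints implementation-specific statistics to *TRACE-OUTPUT*.
(time (dotimes (i 10000)
        (parse-line-demo "3.4 5.4 1.2 6.4")))
```

COMPILED-FUNCTION-P applied to #'PARSE-LINE-DEMO tells you whether the definition currently in place is compiled.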

EDIT to add: you need to be careful, because the reader can open a security hole if you are not sure where the string comes from. There is a read macro, #., which causes the form following it to be evaluated at read time, so arbitrary code can run while reading from a string. The best way to protect yourself is to bind the *READ-EVAL* variable to NIL, which makes the reader signal an error if it encounters #. in the input. Alternatively, you can use one of the specialized libraries that Rainer Joswig mentions in his answer.
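A minimal sketch of that protection, assuming you would rather reject a suspicious string than let the error propagate. The function name SAFE-READ-NUMBERS and the :REJECTED marker are invented for this example; the standard only guarantees that a READER-ERROR is signaled.

```lisp
(defun safe-read-numbers (string)
  "Read all objects from STRING with #. disabled.
Returns :REJECTED (an arbitrary marker chosen for this sketch) on a reader error."
  (let ((*read-eval* nil))              ; #. now signals READER-ERROR
    (handler-case
        (with-input-from-string (s string)
          (loop for object = (read s nil nil)
                while object
                collect object))
      (reader-error () :rejected))))
```

Note that this only blocks read-time evaluation. The reader will still happily return symbols or lists if the input contains them, so checking the results with NUMBERP may be worthwhile for untrusted data.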

+9

Parsing a single line:

    (defun parse-string-to-floats (string)
      (let ((*read-eval* nil))
        (with-input-from-string (stream string)
          (loop for number = (read stream nil nil)
                while number
                collect number))))

Processing a list of strings and returning a single list:

    (defun parse-list-of-strings (list)
      (mapcan #'parse-string-to-floats list))

An example:

    CL-USER 114 > (parse-list-of-strings (list "1.1 2.3 4.5" "1.17 2.6 7.3"))
    (1.1 2.3 4.5 1.17 2.6 7.3)

Note:

READ is a fairly expensive operation for reading float values from streams. Libraries exist, such as PARSE-NUMBER, which may be more efficient. Some Common Lisp implementations may also provide an equivalent of READ-FLOAT / PARSE-FLOAT.
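One way to plug such a parser in without going through READ is to find the token boundaries yourself and hand each token to a parsing function. The following is only a sketch: WHITESPACE-P and MAP-TOKENS are hypothetical helpers, and it assumes the parser accepts :START and :END keyword arguments. The standard PARSE-INTEGER does, which is why it is used in the usage example; the PARSE-NUMBER library's main function is documented to accept the same keywords, but check its documentation before relying on that.

```lisp
(defun whitespace-p (char)
  "True for the whitespace characters this sketch treats as token separators."
  (member char '(#\Space #\Tab #\Newline #\Return)))

(defun map-tokens (parse-fn string)
  "Call PARSE-FN on each whitespace-delimited token of STRING, passing the
token bounds as :START and :END so that no substrings are allocated.
Returns the list of primary values returned by PARSE-FN."
  (let ((end (length string))
        (start 0)
        (results '()))
    (loop
      (let ((token-start (position-if-not #'whitespace-p string :start start)))
        (unless token-start
          (return (nreverse results)))
        (let ((token-end (or (position-if #'whitespace-p string
                                          :start token-start)
                             end)))
          (push (funcall parse-fn string :start token-start :end token-end)
                results)
          (setf start token-end))))))
```

For example, (map-tokens #'parse-integer " 1 2  30 ") collects the three integers in the string. Whether this actually beats READ on your data is something to measure rather than assume.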

+8

Also, to improve performance, try putting

    (declare (optimize (speed 3)))

inside your defun. Some Lisps (e.g. SBCL) will print useful notes about the places where they could not optimize, along with the approximate cost of missing that optimization.
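For placement, the declaration goes at the start of the function body, after the lambda list and any docstring. A minimal sketch, reusing the stream-based parser from the other answer under the made-up name PARSE-STRING-TO-FLOATS-FAST; the extra type declaration for LINE is an assumption about the caller, not something the original code states:

```lisp
(defun parse-string-to-floats-fast (line)
  "Read all numbers from LINE, compiled with speed-oriented settings."
  (declare (optimize (speed 3))
           (type string line))          ; assumption: callers always pass a string
  (let ((*read-eval* nil))
    (with-input-from-string (stream line)
      (loop for number = (read stream nil nil)
            while number
            collect number))))
```

Since most of the work here happens inside READ, the declaration alone may not change much; the compiler notes are mainly useful for spotting generic arithmetic or unknown types in your own code.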

+2

As for performance, at least try measuring memory allocation. I suspect that most of the time is eaten up by memory allocation and GC: you allocate a lot of large strings with SUBSEQ. For example, (time (parse-string-to-float ..)) will show you how much time was spent in your code, how much in GC, and how much memory was allocated.

If that is the case, use a string stream (e.g. WITH-INPUT-FROM-STRING) to reduce GC pressure.

+2
