How to safely read untrusted Clojure code (and not just some serialized data)?

    (def evil-code
      (str "(" (slurp "/mnt/src/git/clj/clojure/src/clj/clojure/core.clj") ")"))

    (def r (read-string evil-code))

This works, but it is not safe.

    (def r (clojure.edn/read-string evil-code))
    RuntimeException Map literal must contain an even number of forms  clojure.lang.Util.runtimeException (Util.java:219)

This does not work.

How can I safely read Clojure code into a tree, preferably with all the `#` reader forms preserved as themselves? Imagine a Clojure antivirus that wants to scan code for threats and wants to work with a data structure, not plain text.

+7
3 answers

First of all, you should never read Clojure code directly from untrusted data sources. Instead, you should use EDN or another serialization format.

That said, as of Clojure 1.5 there is a reasonably safe way to read strings without evaluating them: bind the `*read-eval*` var to false before calling `read-string`. In Clojure 1.4 and earlier, this could still lead to side effects caused by Java constructors being invoked. Those problems have since been fixed.

Here is a sample code:

    (defn read-string-safely [s]
      (binding [*read-eval* false]
        (read-string s)))

    (read-string-safely "#=(eval (def x 3))")
    => RuntimeException EvalReader not allowed when *read-eval* is false.  clojure.lang.Util.runtimeException (Util.java:219)

    (read-string-safely "(def x 3)")
    => (def x 3)

    (read-string-safely "#java.io.FileWriter[\"precious-file.txt\"]")
    => RuntimeException Record construction syntax can only be used when *read-eval* == true  clojure.lang.Util.runtimeException (Util.java:219)
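The question wraps the whole file in an extra pair of parentheses so that `read-string` returns every form at once. A minimal sketch of an alternative, assuming the same `*read-eval*` binding as above, is to read the top-level forms one by one from a `PushbackReader`. The name `read-all-forms` is hypothetical and not part of any library.

```clojure
(import '(java.io PushbackReader StringReader))

(defn read-all-forms
  "Reads every top-level form in the string s without evaluating anything.
  Hypothetical helper; the loop is eager so that all reading happens
  while *read-eval* is still bound to false."
  [s]
  (binding [*read-eval* false]
    (let [rdr (PushbackReader. (StringReader. s))
          eof (Object.)]                      ; unique end-of-input sentinel
      (loop [forms []]
        (let [form (read rdr false eof)]      ; return eof instead of throwing
          (if (identical? form eof)
            forms
            (recur (conj forms form))))))))

(read-all-forms "(def x 3) (defn f [y] (+ x y))")
;; => [(def x 3) (defn f [y] (+ x y))]
```

The loop is deliberately not lazy: a lazy sequence realized outside the `binding` form would be read with whatever value `*read-eval*` has at that later point.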

Regarding reader macros

The dispatch macro (`#`) and tagged literals are invoked at read time. There is no representation for them in Clojure data, because by that time these constructs have all been processed. As far as I know, there is no built-in way to generate a syntax tree of Clojure code.

To retain that information you will have to use an external parser. Either roll your own parser, or use a parser generator such as Instaparse or ANTLR. A complete Clojure grammar for either of these libraries may be hard to find, but you could extend one of the EDN grammars to include the additional Clojure forms. A quick search turned up an ANTLR grammar for Clojure syntax; you could alter it to support any constructs that are missing.

There is also sjacket, a library made for Clojure tools that need to keep information about the source code itself. It seems like a good fit for what you are trying to do, but I have no experience with it personally. Judging by its tests, it does support reader macros in its parser.
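To illustrate the point, here are a few forms as the standard reader returns them. This is only a small sketch; the inputs are arbitrary examples, and none of them require `*read-eval*` to be true.

```clojure
;; An anonymous function literal is already expanded into a fn* form.
(first (read-string "#(inc %)"))
;; => fn*

;; A regex literal is already a compiled pattern object.
(class (read-string "#\"a+\""))
;; => java.util.regex.Pattern

;; The var-quote syntax becomes an ordinary list.
(read-string "#'map")
;; => (var map)

;; A discarded form leaves no trace in the result.
(read-string "#_ignored 42")
;; => 42
```

In each case the original `#` syntax cannot be recovered from the returned data, which is why a tool that needs it has to parse the text itself.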

+4

According to the current documentation, you should never use `read` or `read-string` to read from untrusted data sources:

    WARNING: You SHOULD NOT use clojure.core/read or clojure.core/read-string to read data from untrusted sources. They were designed only for reading Clojure code and data from trusted sources (e.g. files that you know you wrote yourself, and no one else has permission to modify them).

You should use `clojure.edn/read` or `clojure.edn/read-string`, which were designed for this purpose.

There was a lengthy discussion on the mailing list about the use of `read` and `read-string` and the best practices around them.
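The EDN reader accepts an options map, and its `:default` entry is a function of the tag and the value that is called for tags with no registered reader. The sketch below, which assumes Clojure 1.7 or later for `tagged-literal`, uses it to keep unknown tagged elements as data rather than failing on them. The name `read-edn-keeping-tags` and the tag `foo/bar` are hypothetical.

```clojure
(require '[clojure.edn :as edn])

(defn read-edn-keeping-tags
  "Reads one EDN value from s, preserving unknown tagged elements
  as tagged-literal objects instead of throwing. Hypothetical helper."
  [s]
  (edn/read-string {:default tagged-literal} s))

(def result (read-edn-keeping-tags "#foo/bar [1 2 3]"))

(:tag result)
;; => foo/bar
(:form result)
;; => [1 2 3]
```

This only covers EDN syntax. Clojure-only constructs such as anonymous function literals or regex literals are still rejected by the EDN reader, so it does not turn `clojure.edn` into a reader for arbitrary Clojure source.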

+2

I wanted to point to an old library (used in LightTable) that uses `read-string` in the techniques it offers for client/server communication:

Fetch: a ClojureScript library for client/server interaction.

See, in particular, its `safe-read` function:

    (defn safe-read [s]
      (binding [*read-eval* false]
        (read-string s)))

It binds `*read-eval*` to false. I think the rest of the code is worth a look for the abstractions it offers.

A PR against that library points out a security issue and suggests fixing it by using edn instead (... aaand back to your question):

    (require '[clojure.edn :as edn])

    (defn safe-read [s]
      (edn/read-string s))
0