How to change a column in an Incanter dataset?

I would like to be able to transform a single column in an Incanter dataset and save the resulting dataset to a new (CSV) file. What is the easiest way to do this?

Essentially, I would like to be able to map a function over a column in a dataset and replace the original column with this result.

+8
clojure incanter
4 answers

You can define something like:

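    ;; Replace `column` with the result of mapping `f` over its values.
    ;; The argument is named `f` so it does not shadow clojure.core/fn.
    (defn map-data [dataset column f]
      (conj-cols (sel dataset :except-cols column)
                 ($map f column dataset)))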

and use it like this:

    (use '(incanter core datasets))
    (def data (get-dataset :cars))
    (map-data data :speed #(* % 2))

The only problem is that the column names change. I will try to fix that when I have free time.

+5

Here are two similar functions that preserve the column names and the column order.

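    ;; Applies f to the whole column (a sequence of values) at once.
    (defn transform-column [col-name f data]
      (let [new-col-names (sort-by #(= % col-name) (col-names data))
            new-dataset   (conj-cols (sel data :except-cols col-name)
                                     (f ($ col-name data)))]
        ;; Rename the columns, then select them in their original order.
        ($ (col-names data) (col-names new-dataset new-col-names))))

    ;; Applies f to each value of the column, one row at a time.
    (defn transform-rows [col-name f data]
      (let [new-col-names (sort-by #(= % col-name) (col-names data))
            new-dataset   (conj-cols (sel data :except-cols col-name)
                                     ($map f col-name data))]
        ;; The source is cut off here; this final expression is assumed
        ;; to mirror transform-column.
        ($ (col-names data) (col-names new-dataset new-col-names))))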

And here is an example illustrating the difference:

    => (def test-data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
    => (transform-column :a (fn [x] (map #(* % 2) x)) test-data)
    [:a :b]
    [2 2]
    [6 4]
    => (transform-rows :a #(* % 2) test-data)
    [:a :b]
    [2 2]
    [6 4]

transform-rows is best suited to simple transforms, whereas transform-column is for cases where the result for one row depends on other rows (for example, when a column is normalized).
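To make the distinction concrete, here is a minimal sketch in plain Clojure (no Incanter required) of the two kinds of function. The names double-value and normalize are hypothetical and not part of the answer above: the first is the kind of per-value function you would pass to transform-rows, the second is the kind of whole-column function you would pass to transform-column.

```clojure
;; Per-value function: each result depends only on its own input,
;; so it could be passed to transform-rows.
(defn double-value [x]
  (* 2 x))

;; Whole-column function: each result depends on the sum of all values,
;; so it needs the entire column and could be passed to transform-column.
(defn normalize [xs]
  (let [total (reduce + xs)]
    (map #(/ % total) xs)))

(map double-value [1 3]) ; => (2 6)
(normalize [1 3])        ; => (1/4 3/4)
```

A function like normalize cannot be written as a per-value function because it has to see every value before it can compute any single result.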

Saving and loading CSV can be done with the standard Incanter functions, so the full example looks like this:

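    (use '(incanter core io))
    ;; Read the CSV and name its columns :a and :b.
    (def data (col-names (read-dataset "data.csv") [:a :b]))
    ;; Double every value in column :a and write the result to a new file.
    (save (transform-rows :a #(* % 2) data) "transformed-data.csv")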
+5

Again: perhaps you can use the internal structure of the dataset.

    user=> (defn update-column [dataset column f & args]
             (->> (map #(apply update-in % [column] f args) (:rows dataset))
                  vec
                  (assoc dataset :rows)))
    #'user/update-column
    user=> d
    [:col-0 :col-1]
    [1 2]
    [3 4]
    [5 6]
    user=> (update-column d :col-1 str "d")
    [:col-0 :col-1]
    [1 "2d"]
    [3 "4d"]
    [5 "6d"]

Again, you need to check whether this is a public API.
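As a rough illustration of what this approach relies on, the sketch below runs the same function against a plain Clojure map. The map fake-ds is a hypothetical stand-in and not a real Incanter dataset; it only assumes that the data lives in a vector of row maps under the :rows key, which is exactly the internal detail that may not be guaranteed.

```clojure
;; Same function as in the answer: update one key in every row map.
(defn update-column [dataset column f & args]
  (->> (map #(apply update-in % [column] f args) (:rows dataset))
       vec
       (assoc dataset :rows)))

;; Hypothetical stand-in for a dataset: a map with a :rows vector of row maps.
(def fake-ds
  {:column-names [:col-0 :col-1]
   :rows         [{:col-0 1 :col-1 2}
                  {:col-0 3 :col-1 4}]})

(:rows (update-column fake-ds :col-1 str "d"))
;; => [{:col-0 1, :col-1 "2d"} {:col-0 3, :col-1 "4d"}]
```

If a version of Incanter stores its rows differently, the function would silently stop doing anything useful, which is why the public-API caveat matters.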

+2

NOTE: Incanter 1.5.3 or greater is required for this solution.

For those who can use the latest versions of Incanter ...

add-column and add-derived-column were added to Incanter in 1.5.3 (pull request)

From the docs:

add-column

"Adds a column with the given values to the dataset."

 (add-column column-name values) 

or

 (add-column column-name values data) 

Or you can use:

add-derived-column

"This function adds a column to a dataset that is a function of existing columns. If no dataset is provided, $data (bound by the with-data macro) will be used. f should be a function of the from-columns, with arguments in that order."

 (add-derived-column column-name from-columns f) 

or

 (add-derived-column column-name from-columns f data) 

A more complete example:

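    (use '(incanter core datasets))
    (def cars (get-dataset :cars))

    ;; f receives one argument per column in from-columns, in that order.
    (add-derived-column :dist-over-speed [:dist :speed] (fn [d s] (/ d s)) cars)

    ;; Without an explicit dataset, the one bound by with-data is used.
    (with-data (get-dataset :cars)
      (view (add-derived-column :speed**-1 [:speed] #(/ 1.0 %))))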
+2