Write the output of a Drill query to CSV (or some other format)

I use Drill in embedded mode, and I cannot figure out how to save the output of a query other than by copying and pasting it.

+5
4 answers

If you use sqlline, you can create a new table as a CSV as follows:

 use dfs.tmp;
 alter session set `store.format`='csv';
 create table dfs.tmp.my_output as select * from cp.`employee.json`;

Your CSV files will appear in /tmp/my_output.

+7

You can use !record <file_path> to save all output to a specific file. See the Drill docs.

+2

If you use SQLLINE, use !record.

If you are using a query tool, you need to specify the exact schema to use; this is done with the USE <schema> command. Unfortunately, you cannot use the root schema here, since it is not writable. Make sure you create the appropriate directory on your file system and set up the correct storage configuration. An example configuration follows. Once that is in place, you can create the CSV from Java via the JDBC driver, or in a tool such as Pentaho. With the proper configuration, you can also use the REST query tool at localhost:8047/query. The query that creates CSV under /out/data/csv is given below, after the configuration example.
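The REST route mentioned above can also be scripted. Below is a minimal sketch, assuming a Drill instance listening on the default port 8047 and its /query.json REST endpoint, which returns JSON of the shape {"columns": [...], "rows": [...]}; the file paths and the rows_to_csv helper are my own illustration, not part of the answer:

```python
import csv
import json
import urllib.request

# Default REST endpoint of a local Drill instance (assumption: default port).
DRILL_URL = "http://localhost:8047/query.json"

def drill_query(sql, url=DRILL_URL):
    """POST a SQL query to Drill's REST API and return (columns, rows)."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return result["columns"], result["rows"]

def rows_to_csv(columns, rows, path):
    """Write the REST response (a list of dicts) to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)

# Example usage (requires a running Drill instance):
# columns, rows = drill_query("SELECT * FROM cp.`employee.json` LIMIT 10")
# rows_to_csv(columns, rows, "/tmp/employees.csv")
```

This sidesteps the writable-workspace setup entirely, since the CSV is written by the client rather than by Drill.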

Storage configuration

 {
   "type": "file",
   "enabled": true,
   "connection": "file:///",
   "config": null,
   "workspaces": {
     "root": {
       "location": "/out",
       "writable": false,
       "defaultInputFormat": null
     },
     "jsonOut": {
       "location": "/out/data/json",
       "writable": true,
       "defaultInputFormat": "json"
     },
     "csvOut": {
       "location": "/out/data/csv",
       "writable": true,
       "defaultInputFormat": "csv"
     }
   },
   "formats": {
     "json": {
       "type": "json",
       "extensions": [ "json" ]
     },
     "csv": {
       "type": "text",
       "extensions": [ "csv" ],
       "delimiter": ","
     }
   }
 }

Query

 USE fs.csvOut;
 ALTER SESSION SET `store.format`='csv';
 CREATE TABLE fs.csvOut.mycsv_out AS SELECT * FROM fs.`my_records_in.json`;

This will result in at least one CSV file, and possibly several with different headers, in /out/data/csv/mycsv_out.

Each file name should match the following pattern:

 \d+_\d+_\d+\.csv

Note: while a single output file can be read back as ordinary CSV, multiple output files (if there are several) cannot simply be concatenated, because their headers may differ. If that is a problem, dump the result as JSON instead and read it later with code, Drill, or another tool.

+2

UPDATE: GETTING THE DRILL SHELL TO WRITE QUERY OUTPUT TO A CSV FILE

It is now early 2018, and for some of you (in particular, Apache Drill on MapR), the above commands do NOT work. In that case, try the following. As of 2018-03-02 this works on MapR 5.2 and MapR 6 :-)

NOTE: I use "//" to mark comments alongside the actual commands ... NOTE: I use "=>" to mark the shell's RESPONSE to a command ...

// FROM INSIDE THE SHELL (i.e. SQLLINE) ...
// first, set the session variable for the output format ...
> !set outputformat 'csv'
=> you will see some output from the shell, echoing the new value back ...

// start "recording" all output to a file ...
> !record '/user/user01/query_output.csv'
=> again you will see some output from the shell, confirming that "record" is turned on ...

// next, actually submit (say) a SELECT query, whose output will now be CSV (even on screen), instead of the TABLE format ...
> SELECT * FROM hive.orders;
=> the rows (formatted as CSV) will stream both to the screen and to the file you specified ...

// finally, turn "record" off, so the csv file is closed ...
> !record

THAT's it, you're DONE! :-) Now you can process this CSV where it sits in the CLUSTER storage, or, if you need to, TRANSFER the file OUT of the cluster to (say) another server that has Tableau, Kibana, PowerBI Desktop, or some other visualization tool for further analysis.

-1
