Sqoop import as an ORC file

Is there an option in Sqoop to import data from an RDBMS and store it in HDFS in ORC format?

Alternative tried so far: import the data as text, then use a temporary Hive table to read the text files and write them to HDFS as ORC.


Sqoop 1.4.5 and later has HCatalog integration, which supports the ORC file format (among others).

For example, there is the option

--hcatalog-storage-stanza

which can be set to

stored as orc tblproperties ("orc.compress"="SNAPPY")

Example:

sqoop import \
 --connect jdbc:postgresql://foobar:5432/my_db \
 --driver org.postgresql.Driver \
 --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
 --username foo \
 --password-file hdfs:///user/foobar/foo.txt \
 --table fact \
 --hcatalog-home /usr/hdp/current/hive-webhcat \
 --hcatalog-database my_hcat_db \
 --hcatalog-table fact \
 --create-hcatalog-table \
 --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
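The stanza mixes single and double quotes, which is easy to break when the command is assembled inside a script. A minimal sketch, assuming bash: collect the arguments from the example above in an array so the stanza reaches Sqoop as one argument. The function name hcat_orc_import and the DRY_RUN switch are hypothetical additions for illustration; the connection values are the placeholders from the example.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the HCatalog/ORC import shown above.
# Set DRY_RUN=1 to print one argument per line instead of running sqoop.
hcat_orc_import() {
  local table="$1" hcat_db="$2"
  local stanza='stored as orc tblproperties ("orc.compress"="SNAPPY")'
  # A bash array keeps the stanza as a single argument, quotes included.
  local args=(
    import
    --connect jdbc:postgresql://foobar:5432/my_db
    --driver org.postgresql.Driver
    --connection-manager org.apache.sqoop.manager.GenericJdbcManager
    --username foo
    --password-file hdfs:///user/foobar/foo.txt
    --table "$table"
    --hcatalog-home /usr/hdp/current/hive-webhcat
    --hcatalog-database "$hcat_db"
    --hcatalog-table "$table"
    --create-hcatalog-table
    --hcatalog-storage-stanza "$stanza"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${args[@]}"   # dry run: show exactly what sqoop would receive
  else
    sqoop "${args[@]}"
  fi
}
```

Usage: `DRY_RUN=1 hcat_orc_import fact my_hcat_db` prints the arguments, with the stanza intact on the last line; drop `DRY_RUN=1` to run the import. Using an array avoids nesting the stanza's quotes inside yet another layer of shell quoting.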

Sqoop import supports only the following formats:

--as-avrodatafile   Imports data to Avro Data Files

--as-sequencefile   Imports data to SequenceFiles

--as-textfile   Imports data as plain text (default)

--as-parquetfile    Imports data to Parquet files (Sqoop 1.4.6 and later)

Sqoop has no option to import from an RDBMS directly into HDFS as ORC. See the related Sqoop JIRA issue: https://issues.apache.org/jira/browse/SQOOP-2192



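There is no direct way to import from an RDBMS into ORC with Sqoop. As a workaround:

  • Import the data with Sqoop in one of its supported formats (for example, as text).
  • Convert it to ORC with Spark SQL.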

Example, step 1: import the data as text files.

sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera \
--table orders \
--target-dir /user/cloudera/text \
--as-textfile

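Step 2: read the text files and write them out as ORC with Spark, here in the Scala REPL (spark-shell). Note that read.text loads each line into a single string column named value (see the DataFrame type below), so ORC files written this way contain one column holding the whole comma-separated line; split the lines into separate columns first if you need the original table schema.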

scala> val sqlHiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlHiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@638a9d61

scala> val textDF = sqlHiveContext.read.text("/user/cloudera/text")
textDF: org.apache.spark.sql.DataFrame = [value: string]

scala> textDF.write.orc("/user/cloudera/orc/")

Step 3: check the output.

[root@quickstart exercises]# hadoop fs -ls /user/cloudera/orc/
Found 5 items
-rw-r--r--   1 cloudera cloudera          0 2018-02-13 05:59 /user/cloudera/orc/_SUCCESS
-rw-r--r--   1 cloudera cloudera     153598 2018-02-13 05:59 /user/cloudera/orc/part-r-00000-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
-rw-r--r--   1 cloudera cloudera     153466 2018-02-13 05:59 /user/cloudera/orc/part-r-00001-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
-rw-r--r--   1 cloudera cloudera     153725 2018-02-13 05:59 /user/cloudera/orc/part-r-00002-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
-rw-r--r--   1 cloudera cloudera     160907 2018-02-13 05:59 /user/cloudera/orc/part-r-00003-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
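To run this check from a script rather than by eye, one option is to parse the hadoop fs -ls listing. A minimal sketch, assuming the standard eight-column listing format shown above (size in column 5, path in column 8); the function name summarize_orc_dir is hypothetical. It looks for the _SUCCESS marker, counts the .orc part files, and sums their sizes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `hadoop fs -ls DIR` output on stdin.
# Assumes column 5 is the file size in bytes and column 8 is the path.
summarize_orc_dir() {
  awk '
    $8 ~ /\/_SUCCESS$/ { success = 1 }            # job completion marker
    $8 ~ /\.orc$/      { files++; bytes += $5 }   # ORC part files
    END {
      printf "success=%s orc_files=%d total_bytes=%d\n", (success ? "yes" : "no"), files, bytes
      # Exit 0 only if the marker exists and at least one ORC file was written.
      exit (success && files > 0 ? 0 : 1)
    }'
}
```

Usage: `hadoop fs -ls /user/cloudera/orc/ | summarize_orc_dir`. For the listing above it prints `success=yes orc_files=4 total_bytes=621696` and exits with status 0.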
