Sqoop import with a SQL query that has a WHERE clause

Question

sqoop import --connect jdbc:teradata://192.168.xx.xx/DBS_PORT=1025,DATABASE=ds_tbl_db --driver com.teradata.jdbc.TeraDriver --username dbc --password dbc --query 'select * from reason where id>20' --hive-import --hive-table reason_hive --target-dir <hdfs-location> -m 1

I got an error:

Query [select * from reason where id>20] must contain '$CONDITIONS' in WHERE clause.

I know that a free-form query passed to Sqoop must contain $CONDITIONS in its WHERE clause.

So for queries like

select * from reason

I change it to:

select * from reason WHERE $CONDITIONS

What should I do for queries that already have a WHERE clause?

+8

sqoop

dev ツ Feb 28 '16 at 8:36

4 answers

Sqoop needs the table metadata, such as column type information, before it can import anything. To get it, Sqoop first replaces the $CONDITIONS placeholder with (1 = 0), so the query returns no rows and only the type information comes back. That is why the first query you see after running the sqoop command has $CONDITIONS replaced by a condition that is never true.

Later the placeholder is replaced with range conditions derived from the number of mappers (-m), the --split-by column, or the --boundary-query, so that the data set is divided into slices that are imported in parallel. Sqoop substitutes these generated conditions automatically; they determine which slice of data each individual task transfers.

For example, consider a sample_data table with the columns name, id, and salary. You want to import the records with salary > 1000.

  sqoop import \
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username retail_dba --password cloudera \
    --query 'select * from sample_data where $CONDITIONS AND salary > 1000' \
    --split-by salary \
    --target-dir hdfs://quickstart.cloudera/user/cloudera/sqoop_new

Below is the first query that returns an empty set.

 SqlManager: Executing SQL statement: select * from sample_data where (1 = 0) AND salary > 1000

The next query fetches the minimum and maximum of the split column, which define the range to divide among the mappers.

 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(salary), MAX(salary) FROM (select * from sample_data where (1 = 1) AND salary > 1000) AS t1;
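The substitution described above can be imitated in plain shell. This is only a rough sketch of the idea, not Sqoop's actual implementation: the substitute_conditions helper, its error message, and the salary bounds (1001 to 5000, split across two mappers) are made up for illustration.

```shell
#!/bin/sh
# Illustration only: imitates how the $CONDITIONS placeholder is swapped for
# a predicate. This is NOT Sqoop's code; all names and values are hypothetical.

query='select * from sample_data where $CONDITIONS AND salary > 1000'

# Replace the first literal $CONDITIONS in query $1 with predicate $2.
# Fails when the placeholder is missing, similar in spirit to Sqoop's check.
substitute_conditions() {
  placeholder='$CONDITIONS'
  case $1 in
    *"$placeholder"*) ;;
    *) echo "query must contain $placeholder" >&2; return 1 ;;
  esac
  before=${1%%"$placeholder"*}   # text before the placeholder
  after=${1#*"$placeholder"}     # text after the placeholder
  printf '%s%s%s\n' "$before" "$2" "$after"
}

# Metadata probe: the condition is never true, so no rows are returned.
substitute_conditions "$query" '(1 = 0)'

# Hypothetical per-mapper slices for made-up bounds 1001..5000 and 2 mappers.
substitute_conditions "$query" '( salary >= 1001 ) AND ( salary < 3001 )'
substitute_conditions "$query" '( salary >= 3001 ) AND ( salary <= 5000 )'
```

Each printed line is an ordinary SQL statement in which the user's own filter (salary > 1000) is combined with the generated condition by AND, which is why the placeholder and the existing WHERE condition can sit side by side.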

+3

Nikhil Bhide Dec 30 '16 at 9:16

For an import that uses --table rather than a free-form --query, you can pass the filter with the --where option:

--where "order_status = 'CLOSED'"

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

0

Rajen dharmendra Oct 25 '17 at 16:50

I work on Cloudera with a MySQL database, and the query below gave me results:

 sqoop import --connect jdbc:mysql://127.0.0.1/Mydatabase --username root --password cloudera --query 'select * from employee where $CONDITIONS AND Sal<250000' --split-by Sal --target-dir=user/cloudera/myfirstdata2 -m 1

0

user3098458 Dec 7 '18 at 11:17

vinayak_narune · Accepted Answer · 2016-02-29T05:29:03+0000

You need to append AND \$CONDITIONS to the existing WHERE clause:

--query "select * from reason where id>20 AND \$CONDITIONS"

Please refer to the Sqoop documentation.
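The backslash matters only because of shell quoting. The sketch below, which uses the query from the question and does not invoke Sqoop at all, shows what the shell hands over in each case. The variable names are illustrative, and CONDITIONS is set to an empty string to stand in for the usual situation where no such shell variable exists.

```shell
#!/bin/sh
# Illustration of shell quoting only; Sqoop itself is never invoked here.

# Single quotes: the shell passes $CONDITIONS through untouched.
single='select * from reason where id>20 AND $CONDITIONS'

# Double quotes: the dollar sign must be escaped to stay literal.
double="select * from reason where id>20 AND \$CONDITIONS"

# Double quotes without the escape: the shell expands the variable first.
# An empty CONDITIONS stands in for the usual case where it is not defined.
CONDITIONS=''
broken="select * from reason where id>20 AND $CONDITIONS"

printf '%s\n' "$single" "$double" "$broken"
```

The first two strings are identical, so either quoting style works for --query. In the third the placeholder has already been expanded away by the shell, so a query quoted that way would no longer contain $CONDITIONS when Sqoop receives it.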
