Sqoop import with a SQL query that has a WHERE clause

sqoop import --connect jdbc:teradata://192.168.xx.xx/DBS_PORT=1025,DATABASE=ds_tbl_db --driver com.teradata.jdbc.TeraDriver --username dbc --password dbc --query 'select * from reason where id>20' --hive-import --hive-table reason_hive --target-dir <hdfs-location> -m 1

I got an error:

Query [select * from reason where id>20] must contain '$CONDITIONS' in WHERE clause.

I know that a query passed to Sqoop must have a WHERE clause that contains $CONDITIONS.

So for queries like

select * from reason

I change it to:

select * from reason WHERE $CONDITIONS

What should I do for queries that already have a WHERE clause?

+8
sqoop
4 answers

You need to append AND \$CONDITIONS to the existing WHERE clause:

--query "select * from reason where id>20 AND \$CONDITIONS"

Please refer to the Sqoop documentation.
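The backslash matters because the query above is wrapped in double quotes, where the shell expands $CONDITIONS itself before Sqoop ever sees it. The sketch below builds the query string three ways in plain shell, without calling Sqoop. The variable names are only illustrative, and CONDITIONS is set to an empty string to mimic a shell where it has no useful value.

```shell
# Assumption: the shell has no meaningful value for CONDITIONS.
CONDITIONS=""

# Single quotes: the shell leaves $CONDITIONS untouched.
single='select * from reason where id>20 AND $CONDITIONS'

# Double quotes with a backslash: the token also stays literal.
escaped="select * from reason where id>20 AND \$CONDITIONS"

# Double quotes without a backslash: the shell expands the token,
# so the placeholder is gone before the command runs.
unescaped="select * from reason where id>20 AND $CONDITIONS"

printf '%s\n' "$single" "$escaped" "$unescaped"
```

The first two strings are identical and still contain the placeholder. The third ends right after AND, which is the form Sqoop rejects with the "must contain '$CONDITIONS'" error.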

+17

Sqoop needs the table metadata, such as column type information. To get it, Sqoop first replaces the $CONDITIONS placeholder with '(1 = 0)', so that the query returns no rows and only the type information. This is why the first query you see after running the sqoop command has a condition that is never true. Later the placeholder is replaced with range conditions derived from the number of mappers (-m), the --split-by column, or the --boundary-query, so that the data set is divided into slices that are imported in parallel. Sqoop generates these conditions automatically; they determine which slice of the data each task transfers.

For example, consider a sample_data table with the columns name, id, and salary. You want to import the records with salary > 1000.

  sqoop import \
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username retail_dba --password cloudera \
    --query 'select * from sample_data where $CONDITIONS AND salary > 1000' \
    --split-by salary \
    --target-dir hdfs://quickstart.cloudera/user/cloudera/sqoop_new

Below is the first query that returns an empty set.

 SqlManager: Executing SQL statement: select * from sample_data where (1 = 0) AND salary > 1000 

The next query fetches the minimum and maximum values of the split column:

 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(salary), MAX(salary) FROM (select * from sample_data where (1 = 1) AND salary > 1000) AS t1; 
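
As a rough illustration of the substitution, the sketch below replaces the literal $CONDITIONS token in the query with sed, first with (1 = 0) and then with one range per mapper. The bounds (1001 and 5000), the mapper count (4), the fill helper, and the even integer split are all assumptions made for the example; Sqoop's own splitter may choose different boundaries.

```shell
query='select * from sample_data where $CONDITIONS AND salary > 1000'

# Hypothetical helper: substitute a condition for the $CONDITIONS token.
fill() {
  printf '%s\n' "$query" | sed "s/\\\$CONDITIONS/($1)/"
}

# Metadata pass: a condition that is never true returns no rows.
metadata_query=$(fill '1 = 0')
printf '%s\n' "$metadata_query"

# Assumed values, standing in for the MIN/MAX query result and -m.
min=1001
max=5000
mappers=4
step=$(( (max - min + 1) / mappers ))

i=0
lo=$min
while [ "$i" -lt "$mappers" ]; do
  if [ "$i" -eq $((mappers - 1)) ]; then
    # The last slice includes the upper bound.
    cond="salary >= $lo AND salary <= $max"
  else
    cond="salary >= $lo AND salary < $((lo + step))"
  fi
  fill "$cond"
  lo=$((lo + step))
  i=$((i + 1))
done
```

The first printed line matches the metadata query in the log above; the remaining four lines are the per-mapper queries this simplified split would produce.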
+3

For a table import (--table rather than --query), you can use the --where argument:

--where "order_status = 'CLOSED'"

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

0

I am working on Cloudera and querying tables in MySQL.
The query below gave me the results I needed:

 sqoop import --connect jdbc:mysql://127.0.0.1/Mydatabase --username root --password cloudera --query 'select * from employee where $CONDITIONS AND Sal<250000' --split-by Sal --target-dir=user/cloudera/myfirstdata2 -m 1 
0