Passing a parameter to a Sqoop job

I am creating a sqoop job that will be scheduled in Oozie to load daily data into Hive.

I want to do an incremental load into Hive based on a date, and that date should be passed to the Sqoop job as a parameter.

After a lot of searching, I cannot find a way to pass a parameter to a Sqoop job.

2 answers

You do this by passing the date along in two steps:

  • Workflow coordinator

In your coordinator, you can pass the date to the workflow it runs as a <property>, for example:

    <coordinator-app name="schedule" frequency="${coord:days(1)}"
                     start="2015-01-01T00:00Z" end="2025-01-01T00:00Z"
                     timezone="Etc/UTC" xmlns="uri:oozie:coordinator:0.2">
      ...
      <action>
        <workflow>
          <app-path>${nameNode}/your/workflow.xml</app-path>
          <configuration>
            <property>
              <name>workflow_date</name>
              <value>${coord:formatTime(coord:nominalTime(), 'yyyyMMdd')}</value>
            </property>
          </configuration>
        </workflow>
      </action>
      ...
    </coordinator-app>

  • Workflow for Sqoop

In your workflow, you can reference this property in your Sqoop call using the ${workflow_date} variable, for example:

    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      ...
      <command>import --connect jdbc:connect:string:here --table tablename --target-dir /your/import/dir/${workflow_date}/ -m 1</command>
      ...
    </sqoop>
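
To check what the action will run for a given day before scheduling it, the substitution can be reproduced by hand. The following is only a sketch: build_import_cmd is a hypothetical helper name, and the connect string, table name and target directory are the placeholders from the answer, not real values. It prints the command rather than running Sqoop.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints the Sqoop import command the workflow would run
# for a given date, without invoking Sqoop itself.
# build_import_cmd is a hypothetical name; the connect string, table and
# target directory are the placeholders used in the answer.
build_import_cmd() {
  local workflow_date="$1"   # expected as yyyyMMdd, e.g. 20150101

  # Reject anything that is not exactly eight digits, since the coordinator
  # formats the nominal time with the 'yyyyMMdd' pattern.
  if [[ ! "$workflow_date" =~ ^[0-9]{8}$ ]]; then
    echo "workflow_date must be yyyyMMdd, got: $workflow_date" >&2
    return 1
  fi

  echo "sqoop import --connect jdbc:connect:string:here --table tablename --target-dir /your/import/dir/${workflow_date}/ -m 1"
}

# Example: the command for the coordinator's first nominal time.
build_import_cmd 20150101
```

Each daily run then lands in its own directory, so a rerun for one date does not touch the data imported for another.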

Below is a solution from the Apache Sqoop Cookbook.

Saving the Last Imported Value

Problem

Incremental import is a great feature that you use a lot. Shouldering the responsibility of remembering the last imported value is becoming a hassle.

Solution

You can use the built-in Sqoop metastore, which allows you to save all parameters for later reuse. You can create a simple incremental import job with the following command:

    sqoop job \
      --create visits \
      -- import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --password sqoop \
      --table visits \
      --incremental append \
      --check-column id \
      --last-value 0

And run it with the --exec parameter:

 sqoop job --exec visits 

Discussion

The Sqoop metastore is a powerful part of Sqoop that allows you to save your job definitions and easily run them at any time. Each saved job has a logical name that is used to refer to it. You can list all saved jobs using the --list parameter:

 sqoop job --list 

You can delete old job definitions that are no longer needed with the --delete parameter, for example:

 sqoop job --delete visits 

Finally, you can also view the contents of saved job definitions with the --show parameter, for example:

 sqoop job --show visits 

The output of the --show command will be in the form of properties. Unfortunately, Sqoop cannot currently rebuild the command line that you used to create the saved job.

The most important benefit of the built-in Sqoop metastore is for incremental import. Sqoop automatically writes the last imported value back into the metastore after each successful incremental job. Thus, users do not need to remember the last imported value after each execution; everything is handled automatically.
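
To see which value the metastore currently holds, the --show output can be filtered. This is a sketch under two assumptions: that the properties are printed one per line as "key = value", and that the stored value appears under the key incremental.last.value. last_imported_value is a hypothetical helper name, and the sample input below is made up for illustration, not real Sqoop output.

```shell
#!/usr/bin/env bash
# Reads `sqoop job --show <name>` output on stdin and prints the stored
# last imported value.
# Assumptions: one property per line in "key = value" form, and the key
# is named incremental.last.value.
last_imported_value() {
  awk -F ' = ' '$1 == "incremental.last.value" { print $2 }'
}

# Illustration with made-up input in the assumed format:
printf '%s\n' \
  'incremental.col = id' \
  'incremental.last.value = 42' \
  | last_imported_value

# Against a real metastore (requires Sqoop on the PATH):
#   sqoop job --show visits | last_imported_value
```

Comparing this value before and after a run of sqoop job --exec visits is a quick way to confirm that the incremental state was actually updated.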


Source: https://habr.com/ru/post/1214741/

