Wrong row count after importing a table into Hive

I imported about 10 tables into Hive from MS SQL Server. When I cross-checked the records in one of these tables, the query (select count(*) from tblName;) returned more records in Hive than I expected.

I then dropped this table and imported it into Hive again. The console log reported that 203 records were retrieved. But when I ran (select count(*) from tblName;) again, the count was 298.

I do not understand why this is happening. Is something wrong with the query, or is this caused by an incorrect sqoop-import command?

The record counts in all the other tables are correct.

Please help me with this.

1 answer

I got a solution to this problem from the mailing list, and I would like to share it. Their answer was:

We faced a similar problem in the past: the Hive table seemed to contain more rows than Sqoop reported importing, and more than actually existed in the database.

The problem on our side was caused by special characters in the exported data, which broke lines in the exported text (CSV) file. For example, some of our string columns contained newline characters. Because some of the exported rows were split across several lines, Hive counted more rows than were imported. You may have run into the same problem. We solved it with the --hive-drop-import-delims option (alternatively, you can use --hive-delims-replacement). For semantics and usage, please see the manual:

http://incubator.apache.org/sqoop/docs/1.4.0-incubating/SqoopUserGuide.html#id1765770
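As a rough sketch of how the option might be added to the import command: the host, port, database name, and user below are placeholders that do not come from the original question, and the sqoop_import wrapper is only a hypothetical helper for this example. By default it just prints the command instead of running it.

```shell
#!/bin/sh
# Placeholder connection details (assumptions, not values from the question).
CONNECT='jdbc:sqlserver://dbhost:1433;databaseName=mydb'
DB_USER='myuser'
TABLE='tblName'

# Hypothetical helper: with DRY_RUN=1 (the default) the command is only printed.
# Set DRY_RUN=0 to actually run the import; -P prompts for the password.
sqoop_import() {
  # --hive-drop-import-delims removes \n, \r and \01 from string fields, so an
  # embedded newline can no longer split one source row into several Hive rows.
  # To keep a marker instead, swap it for: --hive-delims-replacement ' '
  set -- sqoop import \
    --connect "$CONNECT" \
    --username "$DB_USER" -P \
    --table "$TABLE" \
    --hive-import \
    --hive-drop-import-delims
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

sqoop_import
```

After re-importing this way, running select count(*) from tblName; in Hive again should show whether the count now matches the number of records Sqoop reported.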

thanks
