Spark Datasets - Inner Join Issue

I am using Spark 2.0.0, and I have two datasets (Dataset[Row]) as follows.

Dataset 'appUsage':

+----------+-------------------+----------+
|DATE      |APP_ID             |TIMES_USED|
+----------+-------------------+----------+
|2016-08-03|06xgKq10eeq0REK4eAc|null      |
|2016-08-03|ssads2wsdsf        |null      |
|2016-08-03|testApp            |null      |
|2016-08-03|3222aClie-971837083|5         |
|2016-08-03|V2aadingTLV02      |null      |
|2016-08-03|OurRating-985443645|5         |
|2016-08-03|Trdssktin-743439164|null      |
|2016-08-03|myaa1-app          |null      |
|2016-08-03|123123123-013663450|null      |
+----------+-------------------+----------+

Dataset 'appDev':

+-------------------+------------------------------------+
|APP_ID             |DEVELOPER_ID                        |
+-------------------+------------------------------------+
|OurRating-985443645|5fff25c7-6a70-4d54-ad04-197be4b9a6a9|
|Xa11d0-560090096095|5fff25c7-6a70-4d54-ad04-197be4b9a6a9|
+-------------------+------------------------------------+

When I make a left join using the following code, everything works as expected.

 val result = appUsage.join(appDev, Seq("APP_ID"), "left") 

Output:

+-------------------+----------+----------+------------------------------------+
|APP_ID             |DATE      |TIMES_USED|DEVELOPER_ID                        |
+-------------------+----------+----------+------------------------------------+
|06xgKq10eeq0REK4eAc|2016-08-03|null      |null                                |
|ssads2wsdsf        |2016-08-03|null      |null                                |
|testApp            |2016-08-03|null      |null                                |
|3222aClie-971837083|2016-08-03|5         |null                                |
|V2aadingTLV02      |2016-08-03|null      |null                                |
|OurRating-985443645|2016-08-03|5         |5fff25c7-6a70-4d54-ad04-197be4b9a6a9|
|Trdssktin-743439164|2016-08-03|null      |null                                |
|myaa1-app          |2016-08-03|null      |null                                |
|123123123-013663450|2016-08-03|null      |null                                |
+-------------------+----------+----------+------------------------------------+

But I want to make an inner join, so that only the rows whose APP_ID is present in both datasets appear in the result. However, when I do this using the following code, the output is empty.

 val result = appUsage.join(appDev, Seq("APP_ID"), "inner") 

Did I miss something?

1 answer

Try the following:

 val result = appUsage.join(appDev, "APP_ID") 

I tried it on the Databricks cloud using Spark 2.0.0 and it worked fine.

Please refer to this.
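For reference, the join semantics involved can be sketched with plain Scala collections instead of Spark (a minimal illustration; the abbreviated rows below are taken from the sample datasets above):

```scala
// Sketch of left-join vs. inner-join semantics on the sample data,
// using plain Scala collections rather than Spark Datasets.
object JoinSketch {
  // (APP_ID, TIMES_USED) pairs from appUsage, abbreviated
  val appUsage: Seq[(String, Option[Int])] = Seq(
    ("3222aClie-971837083", Some(5)),
    ("OurRating-985443645", Some(5)),
    ("myaa1-app", None)
  )

  // APP_ID -> DEVELOPER_ID from appDev
  val appDev: Map[String, String] = Map(
    "OurRating-985443645" -> "5fff25c7-6a70-4d54-ad04-197be4b9a6a9",
    "Xa11d0-560090096095" -> "5fff25c7-6a70-4d54-ad04-197be4b9a6a9"
  )

  // Left join: keep every appUsage row, with None where there is no match.
  def leftJoin: Seq[(String, Option[Int], Option[String])] =
    appUsage.map { case (id, times) => (id, times, appDev.get(id)) }

  // Inner join: keep only rows whose APP_ID exists on both sides.
  def innerJoin: Seq[(String, Option[Int], String)] =
    appUsage.flatMap { case (id, times) =>
      appDev.get(id).map(dev => (id, times, dev))
    }
}
```

Since an inner join keeps only keys that match exactly on both sides, an empty inner-join result from two non-empty inputs usually means no APP_ID value is byte-for-byte equal across the two datasets.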

