Is there a Spark hook similar to a Hive hook?

I am working on a project that needs to track the lineage of file transformations. Suppose a single file named SomeTextFile.txt goes through several Hive actions and, at the last stage, produces the desired result.

Case 1: the file goes through Hive actions

File -> FileAfterAction1 -> FileAfterAction2 -> FinalResultantFile

In this case I use a Hive hook, which stores data about each intermediate step applied to the file (say, in a text file). The lineageEngine code then reads this text file and generates the lineage of the final file.
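The hook-plus-log flow above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual Hive hook or lineageEngine: the tab-separated record format, the log file, and the helper names `record_step` and `build_lineage` are hypothetical. A real Hive hook would be a Java class registered through Hive's hook settings (for example `hive.exec.post.hooks`) and would write whatever format the lineage engine expects.

```python
"""Hypothetical file-lineage log: a hook appends records, an engine reads them.

Assumed record format (illustrative only): "<input>\t<action>\t<output>".
"""
import os
import tempfile


def record_step(log_path, source, action, target):
    """Append one transformation record, as a hook might after each action."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{source}\t{action}\t{target}\n")


def build_lineage(log_path, final_file):
    """Return (source, action, target) steps leading to final_file, oldest first."""
    produced_by = {}  # output file -> (input file, action); later records win
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            line = line.rstrip("\n")
            if not line:
                continue
            source, action, target = line.split("\t")
            produced_by[target] = (source, action)

    steps = []
    seen = {final_file}
    current = final_file
    # Walk backwards from the final file until we reach a file no record produced.
    while current in produced_by:
        source, action = produced_by[current]
        if source in seen:  # guard against cycles in a malformed log
            break
        steps.append((source, action, current))
        seen.add(source)
        current = source
    steps.reverse()
    return steps


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "lineage.log")
        record_step(log_path, "SomeTextFile.txt", "action1", "FileAfterAction1")
        record_step(log_path, "FileAfterAction1", "action2", "FileAfterAction2")
        record_step(log_path, "FileAfterAction2", "action3", "FinalResultantFile")
        for source, action, target in build_lineage(log_path, "FinalResultantFile"):
            print(f"{source} --[{action}]--> {target}")
```

Keeping the hook side append-only and putting all reconstruction in the engine means any producer (a Hive hook now, possibly a Spark-side hook later) only has to emit the same record format.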

Now Spark has come into the picture, and the client can apply Spark to the file as well.

Case 2: the same thing happens to the file, but now through Spark actions.

Question: is there a way to capture intermediate information about what happened to the file between the beginning and the end of the transformations?

So far, what I have found online concerns Spark transformations, which give an intermediate plan, but in my case the client will use Spark actions rather than Spark transformations. Please take a look if you have the bandwidth.

1 answer

https://issues.apache.org/jira/browse/SPARK-18127

This functionality will be implemented in Spark 2.2
