I am working on a project and need to track the lineage of file transformations. Suppose a single file named SomeTextFile.txt passes through several Hive actions, and the last stage produces the desired result.
Case 1: The file goes through Hive actions:
File -> FileAfterAction1 -> FileAfterAction2 ---> FinalResultantFile
In this case, I use a Hive hook, which stores data about each intermediate step applied to the file, say in a text file. The lineageEngine code then reads this text file and generates the lineage of the final file.
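To make the Case 1 flow concrete, here is a minimal sketch of the "read the hook's text file and rebuild the lineage" step. It is only an illustration under assumptions: the tab-separated record format (source, action, target), the sample records, and the names `parse_hook_log`, `build_lineage`, and `format_lineage` are all hypothetical and are not the actual hook output or the real lineageEngine code.

```python
from collections import namedtuple

# One recorded step: `action` turned `source` into `target`.
Step = namedtuple("Step", ["source", "action", "target"])

# Hypothetical hook output: one "source<TAB>action<TAB>target" record per line.
HOOK_LOG = (
    "File\tAction1\tFileAfterAction1\n"
    "FileAfterAction1\tAction2\tFileAfterAction2\n"
    "FileAfterAction2\tAction3\tFinalResultantFile\n"
)


def parse_hook_log(text):
    """Parse the assumed tab-separated records into Step tuples."""
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, action, target = line.split("\t")
        steps.append(Step(source, action, target))
    return steps


def build_lineage(steps, final_file):
    """Walk backwards from final_file to the original input file."""
    produced_by = {step.target: step for step in steps}
    chain = []
    seen = set()
    current = final_file
    # Stop at a file no step produced, or if the records contain a cycle.
    while current in produced_by and current not in seen:
        seen.add(current)
        step = produced_by[current]
        chain.append(step)
        current = step.source
    chain.reverse()  # oldest step first
    return chain


def format_lineage(chain):
    """Render the chain in the same arrow notation used above."""
    if not chain:
        return ""
    names = [chain[0].source] + [step.target for step in chain]
    return " -> ".join(names)


lineage = build_lineage(parse_hook_log(HOOK_LOG), "FinalResultantFile")
print(format_lineage(lineage))
```

Walking backwards from the final file means records for unrelated files in the same text file are simply ignored. What I am missing for Spark is a source for records like these.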
Now Spark has also come into the picture, and the client can apply Spark operations to the file as well.
Case 2: The same thing happens to the file, but now through Spark actions.
Question: Is there a way to get intermediate information about what happened to the file between the start and the end of the transformations?
So far, what I have found online covers Spark transformations, which expose an intermediate lineage graph, but in my case the client will use Spark actions rather than Spark transformations. Please reply if you have the bandwidth.
shaun