How to add an external jar in Flink

… everything. One approach I tried was to copy the jar to FLINK/lib on all task managers, but it failed. And I don't want to build a fat jar, which is too heavy and a waste of time. I also don't think the first method is a good idea, because then I have to manage jars across the entire cluster. Does anyone know how to solve this problem? Any suggestion would be appreciated.

+9
apache-flink
3 answers

All in all, building a fat jar is the best way to go. I don't know how big your fat jar is; what do you consider "too heavy"?

Copying jars to $FLINK/lib should work. However, you need to restart Flink so that the jars are added to Flink's classpath. This approach therefore does not let you add jars dynamically; it works best for a set of stable jars.

To manage jars across the whole cluster, it may help to mount an NFS folder as $FLINK/lib so that all task managers stay in sync. Alternatively, you can simply write a bash script that distributes your jars.
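A minimal sketch of such a distribution script, assuming `scp` access to the task manager hosts; the host names and the `/opt/flink/lib` path are assumptions you would adapt:

```shell
#!/usr/bin/env bash
# Sketch: copy local jars to $FLINK_LIB on every task manager host.
# HOSTS and FLINK_LIB are placeholders for your own cluster layout.
set -euo pipefail

FLINK_LIB="${FLINK_LIB:-/opt/flink/lib}"
HOSTS=(tm-1 tm-2 tm-3)   # hypothetical task manager hosts

distribute_jars() {
  local jar host
  for jar in "$@"; do
    for host in "${HOSTS[@]}"; do
      # With DRY_RUN=1 the commands are only printed, not executed
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scp $jar $host:$FLINK_LIB/"
      else
        scp "$jar" "$host:$FLINK_LIB/"
      fi
    done
  done
}

distribute_jars "$@"
```

Remember that, as noted above, Flink still has to be restarted before the newly copied jars appear on the classpath.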

+12

The Flink Command Line Interface (CLI) allows you to pass additional jar locations using the -C option. We use it to pass dependencies to each job.

Our problem: since our jobs usually evolve over the life of the project, their external dependencies change versions, and we run several jobs in one cluster, we wanted to choose the exact jar versions to load on each run. $FLINK/lib therefore fell short for us.

Details: we distribute the jars to a fixed directory (other than $FLINK/lib) on each node. We then use the CLI to launch the job — not directly, since the full command is quite long, but via a bash script that shortens the call.
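A sketch of what such a wrapper might look like: it builds one `-C file://...` flag per jar in a fixed dependency directory. The `/opt/job-deps` path and the job jar name are assumptions; note that `-C` expects a URL that is readable from every node in the cluster, which is why the jars must already be distributed there:

```shell
#!/usr/bin/env bash
# Sketch: compose "-C file://<jar>" flags from a fixed dependency
# directory, then pass them to "flink run". DEP_DIR is a placeholder.
set -euo pipefail

DEP_DIR="${DEP_DIR:-/opt/job-deps}"

build_classpath_flags() {
  local jar flags=()
  for jar in "$DEP_DIR"/*.jar; do
    # Each dependency becomes its own -C flag on the submit command
    flags+=(-C "file://$jar")
  done
  printf '%s ' "${flags[@]}"
}

# Usage (not executed here):
#   flink run $(build_classpath_flags) my-job.jar --input ...
```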

+1

If you want to avoid dependency conflicts, do not copy your jar files to ${FLINK}/lib. If you deploy with yarn-cluster mode, you can use -yt (--yarnship); it will ship the jar files to the cluster and add them to the classpath of your distributed program.
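A minimal sketch of such a submission, assuming a local `./job-libs` directory of dependency jars and a hypothetical `my-job.jar`; the function only composes the command so it can be inspected before running:

```shell
#!/usr/bin/env bash
# Sketch: submit a job on YARN, shipping a local directory of jars
# with -yt/--yarnship. Paths and jar names are assumptions.
set -euo pipefail

submit_on_yarn() {
  local lib_dir="$1" job_jar="$2"
  # Compose the command; DRY_RUN=1 prints it instead of submitting.
  local cmd=(flink run -m yarn-cluster -yt "$lib_dir" "$job_jar")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (not executed here):
#   submit_on_yarn ./job-libs my-job.jar
```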

+1
