I have an Apache Beam pipeline, written with the Apache Beam Java SDK, that loads a large import file of about 90 GB.
Using the default settings from PipelineOptionsFactory, my job takes a long time to run.
How can I programmatically control the parallelism of my job, and therefore the number of workers?
google-cloud-dataflow apache-beam
Alex harvey