How can I specify the number of workers for my Dataflow job?

I have an Apache Beam pipeline that loads a large import file, about 90 GB in size. The pipeline is written with the Apache Beam Java SDK.

Using the default settings from PipelineOptionsFactory, my job takes a long time to run.

How can I programmatically control the concurrency of my job, and therefore the number of workers?
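As a sketch of one approach (not a confirmed answer): when running on the Dataflow runner, the `DataflowPipelineOptions` interface exposes `setNumWorkers` (initial worker count) and `setMaxNumWorkers` (autoscaling cap). This assumes the `beam-runners-google-cloud-dataflow-java` dependency is on the classpath; the class name `WorkerCountExample` and the specific worker counts are illustrative only.

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class WorkerCountExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory
        .fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);

    // Start the job with 10 workers and allow autoscaling up to 50.
    // These numbers are placeholders; tune them for a 90 GB input.
    options.setNumWorkers(10);
    options.setMaxNumWorkers(50);

    Pipeline p = Pipeline.create(options);
    // ... build the pipeline here, then:
    // p.run();
  }
}
```

The same options can also be passed on the command line instead of in code, e.g. `--numWorkers=10 --maxNumWorkers=50`, since `PipelineOptionsFactory.fromArgs` parses them automatically.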

google-cloud-dataflow apache-beam

No one has answered this question yet.
