How can I specify the number of workers for my Dataflow job?

Question

I have an Apache Beam pipeline, written with the Java SDK, that loads a large import file of about 90 GB.

With the default settings from PipelineOptionsFactory, my job takes a long time to run.

How can I programmatically control the parallelism of my job, and therefore the number of workers?

google-cloud-dataflow apache-beam

Alex harvey Jan 19 '15 at 10:16
