Using the Beam SDK in Cloud Dataflow

We are currently using the Google Cloud Dataflow SDK (1.6.0) to run Dataflow jobs in GCP, but we are considering moving to the Apache Beam SDK (0.1.0). We would continue to run our jobs in GCP on the Dataflow service. Has anyone gone through this transition and, if so, do you have any advice? Are there any compatibility issues, and is this move encouraged by GCP?

2 answers

Officially, Beam is not yet supported on Cloud Dataflow (although that is certainly what we are aiming for). We recommend staying on the Dataflow SDK, especially if an SLA or support is important to you. That said, our tests show that Beam does work on Dataflow; while it could break at any time, you can certainly try it at your own risk.

Update: the Dataflow SDKs have been based on Beam since the release of Dataflow SDK 2.0 ( https://cloud.google.com/dataflow/release-notes/release-notes-java-2 ). Both the Beam SDK and the Dataflow SDK are now supported on Cloud Dataflow.


You can now run Beam SDK pipelines on Cloud Dataflow. See:

https://beam.apache.org/documentation/runners/dataflow/

You need to add a dependency to your pom.xml and possibly pass a few command-line options, as described on that page.
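As a rough sketch of what that looks like (the version number below is just an example; check the runner page linked above for the release that matches your Beam SDK), the Dataflow runner is pulled in as a Maven dependency:

```xml
<!-- In pom.xml: the Beam runner for Google Cloud Dataflow -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>2.0.0</version> <!-- example version; use one matching your Beam SDK -->
  <scope>runtime</scope>
</dependency>
```

Then the pipeline is pointed at Dataflow via pipeline options, typically on the command line (the project ID and bucket here are placeholders):

```shell
# Select the Dataflow runner and tell it where to stage/run the job
mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
               --project=my-gcp-project \
               --tempLocation=gs://my-bucket/tmp"
```

Without `--runner=DataflowRunner`, Beam defaults to the local DirectRunner, which is handy for testing the same pipeline code before submitting it to the service.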

