Beam Dataflow only writes to temp in GCS
I have a very basic Python Dataflow job that reads some data from Pub/Sub, applies a FixedWindow and writes to Google Cloud Storage.
transformed = ...
transformed | beam.io.WriteToText(known_args.output)
The output is written to the location specified in --output, but only as far as the temporary stage, i.e.
gs://MY_BUCKET/MY_DIR/beam-temp-2a5c0e1eec1c11e8b98342010a800004/...some_UUID...
The file never gets placed into the correctly named location with the sharding template.
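For reference, here is a small sketch of the final file names I would expect to see next to the temp directory. It assumes WriteToText is left on its default shard_name_template of '-SSSSS-of-NNNNN'; the helper name expected_shard_names and the prefix gs://MY_BUCKET/MY_DIR/output are hypothetical, chosen only for illustration.

```python
def expected_shard_names(prefix, num_shards, suffix=""):
    """Expand the '-SSSSS-of-NNNNN' shard template for a file path prefix.

    This is an illustrative helper, not part of the Beam API. It only
    mirrors the naming scheme: a zero-padded shard index followed by the
    zero-padded total number of shards.
    """
    return [
        f"{prefix}-{shard:05d}-of-{num_shards:05d}{suffix}"
        for shard in range(num_shards)
    ]


# Hypothetical output prefix, standing in for the value passed via --output.
names = expected_shard_names("gs://MY_BUCKET/MY_DIR/output", 2)
for name in names:
    print(name)
```

None of these final names ever appear in the bucket; only the beam-temp-... directory does.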
Tested on both the local runner and the Dataflow runner.
On further testing, I noticed that the streaming_wordcount example has the same issue, whereas the standard wordcount example writes fine. Perhaps the issue is to do with windowing, or with reading from Pub/Sub?
It appears that WriteToText is not compatible with a streaming Pub/Sub source. There are likely workarounds, or the Java version may be compatible, but I have opted to use a different solution altogether.
google-cloud-storage google-cloud-dataflow apache-beam google-cloud-pubsub
can you please post the code?
– Tanveer Uddin
Nov 19 at 21:41
edited Nov 20 at 10:41
asked Nov 19 at 17:10
Daniel Messias