
Disabling Dataflow? #9

Open
ahmetb opened this issue Oct 28, 2019 · 3 comments
Labels
enhancement New feature or request
ahmetb commented Oct 28, 2019

I feel like there should be a deployment option to disable the Cloud Dataflow use.

Pretty much everything else used by this tool feels pay-as-you-go/serverless.

However, it seems like Dataflow is provisioning an n1-standard-4 instance ($97/mo). This is simply not going to be within my budget.

I'd love to see an option to disable Dataflow in the scripts, and also an explanation of what Dataflow does.

Note: I'm not at all familiar with Dataflow, so I'm not sure where it's currently utilized in this tool. (It feels like the Cloud Run service could write directly to Stackdriver and/or BigQuery.)

@mchmarny (Owner)
Agree, this is the one bit that's unlike the others in this stack. Cloud Run could insert the events directly into BigQuery, but that's an anti-pattern given the low quota on individual streaming inserts. I will consider an alternative way of streaming events from Pub/Sub to BigQuery.

@mchmarny mchmarny self-assigned this Oct 29, 2019
@mchmarny mchmarny added the enhancement New feature or request label Oct 29, 2019
@mchmarny (Owner)
I've modified the Pub/Sub-to-BigQuery pipeline to use at most 1 worker, which should significantly reduce the cost (~$30/mo). I still need to test it, but you should be able to use this in the setup:

gcloud dataflow jobs run $SERVICE_NAME \
    --gcs-location gs://cloudylabs-public/cloudylabs-pipelines/pubsub-to-bigquery.json \
    --region $SERVICE_REGION \
    --parameters "inputTopic=projects/${PROJECT}/topics/${SERVICE_NAME},outputTableSpec=${PROJECT}:${SERVICE_NAME}.events"

@mchmarny mchmarny pinned this issue Oct 29, 2019
ahmetb commented Oct 29, 2019

$30/mo is still significant for something that runs maybe a few times a day.
I see the quota point; what if we published to Pub/Sub, drained it once a day, and did a batch insert?
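A minimal sketch of that drain-and-batch idea (all names here are hypothetical; a real version would pull messages from a Pub/Sub subscription and then submit one BigQuery load job, which is not subject to the streaming-insert quota):

```python
import json

def drain_to_batch(buffered_events):
    """Drain all buffered events into one newline-delimited JSON string,
    the file format BigQuery batch load jobs accept."""
    batch = "\n".join(json.dumps(event) for event in buffered_events)
    buffered_events.clear()  # stand-in for acking the pulled messages
    return batch

# Simulated day's worth of events that a Pub/Sub subscription would hold.
buffer = [
    {"service": "events", "release": "v1", "count": 3},
    {"service": "events", "release": "v2", "count": 5},
]
ndjson = drain_to_batch(buffer)
print(len(ndjson.splitlines()))  # rows going into the single batch load → 2
print(len(buffer))               # buffer drained → 0
```

Run on a daily schedule (e.g. Cloud Scheduler), this keeps BigQuery writes to one load job per day instead of one streaming insert per event, and needs no always-on worker VM.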
