scp7/dfcloud

DeepFabric Cloud (dfcloud)

Run DeepFabric jobs on Google Cloud with Slack notifications.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      Google Cloud                            │
│                                                              │
│  ┌─────────────────┐         ┌─────────────────────────┐     │
│  │  Cloud Run      │  HTTP   │  Cloud Run Service      │     │
│  │  Job            │────────▶│  (Spin + mock data)     │     │
│  │  (deepfabric)   │         │  always-on              │     │
│  └────────┬────────┘         └─────────────────────────┘     │
│           │                                                  │
│           │ read config / write output                       │
│           ▼                                                  │
│  ┌─────────────────┐                                         │
│  │  Cloud Storage  │                                         │
│  │  (GCS bucket)   │                                         │
│  └─────────────────┘                                         │
└───────────┬──────────────────────────────────────────────────┘
            │
            │ on completion
            ▼
   ┌─────────────────┐
   │  Slack Webhook  │
   └─────────────────┘

Prerequisites

  • Google Cloud account with billing enabled
  • gcloud CLI installed and authenticated
  • Terraform >= 1.0
  • Docker for building container images
  • Python >= 3.10 for the CLI

Quick Start

1. Set up Slack Webhook

  1. Go to api.slack.com/apps
  2. Create a new app → "From scratch"
  3. Add feature → "Incoming Webhooks" → Enable
  4. Click "Add New Webhook to Workspace" → Select your channel
  5. Copy the webhook URL

2. Configure Terraform

cd infra

# Copy and edit the example config
cp terraform.tfvars.example terraform.tfvars

# Edit terraform.tfvars with your values:
# - project_id
# - spin_image (we'll create this next)
# - deepfabric_image (we'll create this next)
# - slack_webhook_url

3. Build and Push Container Images

First, apply only the resources needed to create the Artifact Registry repository:

cd infra

# Initialize and apply just the registry first
terraform init
terraform apply -target=google_artifact_registry_repository.dfcloud -target=google_project_service.apis

Get the registry URL:

terraform output artifact_registry
# Output: us-central1-docker.pkg.dev/your-project/dfcloud

Build and push the DeepFabric job image:

cd ../deepfabric-job

# Configure Docker for Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev

# Build and push (REGISTRY is read from the Terraform output)
REGISTRY=$(cd ../infra && terraform output -raw artifact_registry)
docker build -t "$REGISTRY/deepfabric-job:latest" .
docker push "$REGISTRY/deepfabric-job:latest"

Build and push your Spin service image, if you have one. This reuses $REGISTRY from the previous step, so run it in the same shell session:

# From your spin tools-sdk directory
docker build -t "$REGISTRY/spin:latest" .
docker push "$REGISTRY/spin:latest"

4. Deploy Infrastructure

cd infra

# Update terraform.tfvars with the image paths
# Then apply all resources
terraform apply

5. Install the CLI

cd cli
pip install -e .

# Initialize configuration
dfcloud config init
# Enter: project_id, region, bucket name

6. Load Mock Data into Spin (if needed)

If your Spin service needs mock data loaded:

# Get the Spin service URL
cd infra
terraform output spin_service_url

# Load mock data (adjust for your setup)
curl -X POST "$(terraform output -raw spin_service_url)/mock/load" \
  -H "Content-Type: application/json" \
  -d @path/to/mock-data.json

Usage

Submit a Job

# Submit a job from a config file
dfcloud submit spin-dataforseo-5x5.yaml --name seo-dataset-v1

# Submit and wait for completion
dfcloud submit spin-dataforseo.yaml --name my-job --wait

# With custom timeout (in seconds)
dfcloud submit config.yaml --timeout 7200

Check Job Status

# Check latest execution
dfcloud status

# Check specific execution
dfcloud status abc123-def456

View Logs

# View logs for latest execution
dfcloud logs

# View logs for specific execution
dfcloud logs abc123-def456

# Follow logs in real-time
dfcloud logs -f

List Executions

# List recent executions
dfcloud list

# Show more executions
dfcloud list --limit 20

Download Outputs

# List available outputs
dfcloud outputs

# Download outputs for a job
dfcloud download seo-dataset-v1

# Download to specific directory
dfcloud download seo-dataset-v1 --output ./my-outputs

Configuration

# Initialize config interactively
dfcloud config init

# Set individual values
dfcloud config set project_id my-project
dfcloud config set region us-central1
dfcloud config set bucket my-bucket-dfcloud

# List current config
dfcloud config list

# Get a specific value
dfcloud config get project_id

Configuration File Format

Your DeepFabric YAML configs work as-is. The job runner automatically:

  1. Downloads the config from GCS
  2. Updates spin_endpoint to point to the Cloud Run Spin service
  3. Updates tools_endpoint if it references localhost
  4. Runs deepfabric generate
  5. Uploads outputs to GCS
  6. Sends Slack notification

Example config structure:

topics:
  prompt: "Tasks for an SEO assistant..."
  mode: graph
  depth: 5
  degree: 5
  save_as: "topic-graph.jsonl"  # Will be uploaded to GCS
  llm:
    provider: "gemini"
    model: "gemini-2.5-flash"

generation:
  tools:
    spin_endpoint: "http://localhost:3000"  # Auto-updated to Cloud Run URL
    tools_endpoint: "http://localhost:3000/mock/list-tools"
  # ... rest of config

output:
  num_samples: 750
  save_as: "dataset.jsonl"  # Will be uploaded to GCS
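
The endpoint rewrite in steps 2 and 3 of the list above can be pictured with a short sketch. This is not the job runner's actual code: the function names, the set of hosts treated as local, and the choice to keep the original URL path are all assumptions made for illustration. It works on an already-parsed config dictionary so it needs no YAML library.

```python
from copy import deepcopy
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as "local" (an assumption; the real runner may check differently).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def _swap_base(url: str, base_url: str) -> str:
    """Return `url` with its scheme and host replaced by those of `base_url`."""
    src, dst = urlsplit(url), urlsplit(base_url)
    return urlunsplit((dst.scheme, dst.netloc, src.path, src.query, src.fragment))


def rewrite_endpoints(config: dict, spin_url: str) -> dict:
    """Hypothetical helper: point the tool endpoints at the Cloud Run Spin service."""
    updated = deepcopy(config)  # leave the caller's config untouched
    tools = updated.get("generation", {}).get("tools", {})

    # spin_endpoint is always replaced with the service URL.
    if "spin_endpoint" in tools:
        tools["spin_endpoint"] = spin_url.rstrip("/")

    # tools_endpoint is replaced only when it points at a local host.
    tools_endpoint = tools.get("tools_endpoint")
    if tools_endpoint and urlsplit(tools_endpoint).hostname in LOCAL_HOSTS:
        tools["tools_endpoint"] = _swap_base(tools_endpoint, spin_url)

    return updated
```

With the example config above and a placeholder service URL such as https://spin.example.run.app, tools_endpoint would become https://spin.example.run.app/mock/list-tools, while a tools_endpoint that already points at a remote host would be left alone.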

Outputs

Job outputs are stored in GCS at:

gs://{bucket}/outputs/{job-name}/{timestamp}/
├── topic-graph.jsonl
└── dataset.jsonl
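
As a rough illustration of this layout, the sketch below builds the output prefix for a job. The function name is hypothetical, and the timestamp format is inferred from the sample path 20240110-143022 in the notification example; whether the runner uses UTC or local time is not stated in this README.

```python
from datetime import datetime


def output_prefix(bucket: str, job_name: str, when: datetime) -> str:
    """Hypothetical helper: build gs://{bucket}/outputs/{job-name}/{timestamp}/."""
    # Format inferred from the sample path: YYYYMMDD-HHMMSS.
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return f"gs://{bucket}/outputs/{job_name}/{stamp}/"


if __name__ == "__main__":
    print(output_prefix("my-bucket", "seo-dataset-v1", datetime(2024, 1, 10, 14, 30, 22)))
```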

Slack Notifications

You'll receive notifications for:

  • Job completed: Shows duration and output file locations
  • Job failed: Shows error message and duration

Example notification:

✅ Job Completed: seo-dataset-v1

Status: Succeeded
Duration: 45.2 minutes

Outputs:
• gs://my-bucket/outputs/seo-dataset-v1/20240110-143022/topic-graph.jsonl
• gs://my-bucket/outputs/seo-dataset-v1/20240110-143022/dataset.jsonl
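
A message like this can be sent to a Slack incoming webhook as a JSON body with a single text field. The sketch below shows one way to assemble and post it using only the standard library. The function names, the failure wording, and the ❌ marker are assumptions; only the success layout is taken from the example above.

```python
import json
import urllib.request


def build_message(job_name, succeeded, duration_seconds, outputs=(), error=None):
    """Hypothetical helper: build the JSON payload for a Slack incoming webhook."""
    header = "✅ Job Completed" if succeeded else "❌ Job Failed"
    lines = [
        f"{header}: {job_name}",
        "",
        f"Status: {'Succeeded' if succeeded else 'Failed'}",
        f"Duration: {duration_seconds / 60:.1f} minutes",
    ]
    if error:
        lines += ["", f"Error: {error}"]
    if outputs:
        lines += ["", "Outputs:"] + [f"• {path}" for path in outputs]
    return {"text": "\n".join(lines)}


def send(webhook_url, payload, timeout=10):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping message construction separate from the HTTP call makes the formatting easy to check without contacting Slack.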

Cost Optimization

  • Spin service: Set spin_min_instances = 0 for scale-to-zero (adds cold start latency)
  • Job resources: Adjust deepfabric_job_memory and deepfabric_job_cpu based on your workloads
  • Storage: Outputs older than 90 days are automatically moved to Nearline storage
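
For reference, the 90-day rule corresponds to a Cloud Storage lifecycle configuration of the following shape. In this project the rule is described as automatic, so it is presumably managed by the Terraform code; the sketch only shows what such a rule looks like in the JSON form that Cloud Storage accepts, and the helper name is hypothetical.

```python
import json


def nearline_after(days: int) -> dict:
    """Hypothetical helper: lifecycle config moving objects to Nearline after `days`."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": days},  # object age in days
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON document, e.g. to inspect it or save it to a file.
    print(json.dumps(nearline_after(90), indent=2))
```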

Troubleshooting

Job fails immediately

Check that:

  1. The Spin service is running: gcloud run services describe spin-service --region us-central1
  2. Config file exists in GCS: gsutil ls gs://your-bucket/configs/
  3. The job's service account can read and write the GCS bucket and invoke the Spin service

Can't connect to Spin

The DeepFabric job's service account needs the roles/run.invoker role on the Spin service. Verify that it appears in the service's IAM policy:

gcloud run services get-iam-policy spin-service --region us-central1
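
If you add --format=json to the command above, the policy can be checked programmatically. The sketch below assumes the standard IAM policy shape (a bindings list of role and members entries). The function name and the service account address in the usage note are placeholders, not values from this project.

```python
import json


def has_invoker(policy: dict, member: str) -> bool:
    """Return True if `member` is bound to roles/run.invoker in the policy."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == "roles/run.invoker" and member in binding.get("members", []):
            return True
    return False


def load_policy(text: str) -> dict:
    """Parse the JSON printed by the get-iam-policy command."""
    return json.loads(text)
```

For example, has_invoker(load_policy(output), "serviceAccount:job-sa@your-project.iam.gserviceaccount.com") should return True once the binding exists, where output is the JSON text printed by gcloud.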

Logs show authentication errors

Ensure your local gcloud is authenticated:

gcloud auth login
gcloud auth application-default login

Cleanup

To destroy all resources:

cd infra
terraform destroy

Note: This will delete the GCS bucket and all outputs. Back up any important data first.
