A Kubernetes-native platform for building modern, declarative data pipelines with clear boundaries between ingestion and transformation.
- 🎛️ Kubernetes Operator - Go-based CRD management and pipeline orchestration
- 📥 Ingest Workload - Type-safe data ingestion (Python)
- 🔄 Transform Workload - dbt-based data transformation
- ⚡ Trigger Workload - Event-driven pipeline activation (Go)
- 🛠️ Development Environment - Local development setup and database provisioning
- ☁️ Infrastructure - Cloud infrastructure automation and deployment
- 📋 Examples - Comprehensive YAML examples and use cases
Pipeline Forge is a complete solution for orchestrating data pipelines in Kubernetes environments. It combines a Kubernetes operator with specialized workloads to provide a declarative, event-driven approach to data pipeline management.
- Unified Pipeline Lifecycle - Connect ingestion with transformation in a single application lifecycle
- Native Kubernetes Resources - Each step runs entirely on native Kubernetes resources
- Event-Driven Orchestration - React to file drops, Pub/Sub messages, and BigQuery updates
- Built-in Observability - Comprehensive status tracking and monitoring
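To make the event-driven idea above concrete, here is a minimal sketch of how an incoming event could be matched to the pipelines it should activate. It is an illustration only, not the project's trigger code (the trigger workload itself is written in Go). The names `EventSource`, `Event`, `TriggerRule`, `pipelines_to_activate`, and the example bucket, table, and pipeline names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EventSource(Enum):
    """The three event sources named in the feature list above."""
    GCS_FILE_DROP = "gcs"
    PUBSUB_MESSAGE = "pubsub"
    BIGQUERY_UPDATE = "bigquery"


@dataclass(frozen=True)
class Event:
    source: EventSource
    resource: str  # bucket path, topic name, or table ID (assumed format)


@dataclass(frozen=True)
class TriggerRule:
    """Hypothetical rule: activate `pipeline` when an event matches."""
    source: EventSource
    resource_prefix: str
    pipeline: str


def pipelines_to_activate(event, rules):
    """Return the pipelines whose rules match the event, in rule order."""
    return [
        rule.pipeline
        for rule in rules
        if rule.source is event.source
        and event.resource.startswith(rule.resource_prefix)
    ]


# Example rules; every name here is made up for illustration.
RULES = [
    TriggerRule(EventSource.GCS_FILE_DROP, "gs://raw-bucket/orders/", "orders-pipeline"),
    TriggerRule(EventSource.BIGQUERY_UPDATE, "analytics.staging_", "reporting-pipeline"),
]
```

Matching on source type plus a resource prefix is only one possible design; the actual trigger semantics are defined by the project's CRDs.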
Pipeline Forge consists of two main components:
Kubernetes Operator - Go-based CRD management and pipeline orchestration
- Custom Resource Definitions (CRDs) for pipeline definition
- Automatic reconciliation and lifecycle management
- RBAC integration and resource management
- Event-driven trigger management
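The reconciliation behaviour listed above follows the general Kubernetes pattern of comparing observed state with desired state and taking the next idempotent action. The sketch below shows that pattern in plain Python for brevity; the real operator is written in Go with Kubebuilder, and `STEPS`, `PipelineStatus`, `reconcile`, and the phase strings are hypothetical names, not the project's API.

```python
from dataclasses import dataclass, field

# Assumed step order, based on the ingestion -> transformation flow described above.
STEPS = ("ingest", "transform")


@dataclass
class PipelineStatus:
    """Observed state: which steps have finished, plus a readable phase."""
    completed: set = field(default_factory=set)
    phase: str = "Pending"


def reconcile(status):
    """Run one reconciliation pass.

    Returns the next step to launch, or None when every step is complete.
    The decision depends only on the observed state, so repeating the call
    without a state change yields the same result.
    """
    for step in STEPS:
        if step not in status.completed:
            status.phase = f"Running:{step}"
            return step
    status.phase = "Succeeded"
    return None
```

In a real controller the loop would be driven by watch events and requeues rather than direct calls, and step completion would be read from the status of the underlying Kubernetes resources.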
Workloads - Production-ready data processing components
- Ingest - Type-safe data ingestion from MySQL and PostgreSQL to BigQuery
- Transform - dbt-based data transformation
- Trigger - Event processing for GCS, Pub/Sub, and BigQuery
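As a rough illustration of what "type-safe ingestion" can mean in practice, the sketch below validates a job definition before any data is moved. It uses a standard-library dataclass as a stand-in for the Pydantic models the ingest workload is built on. The `IngestJob` class, its field names, and the `dataset.table` destination format are assumptions for this example, not the project's schema.

```python
from dataclasses import dataclass

# Source types named in the workload list above.
SUPPORTED_SOURCES = {"mysql", "postgresql"}


@dataclass(frozen=True)
class IngestJob:
    """Hypothetical ingest job config; field names are illustrative only."""
    source_type: str
    source_table: str
    destination_table: str  # assumed to be "dataset.table" in BigQuery

    def __post_init__(self):
        # Reject invalid configs at construction time, before any I/O happens.
        if self.source_type not in SUPPORTED_SOURCES:
            raise ValueError(f"unsupported source_type: {self.source_type!r}")
        if not self.source_table:
            raise ValueError("source_table must not be empty")
        dataset, _, table = self.destination_table.partition(".")
        if not dataset or not table or "." in table:
            raise ValueError("destination_table must look like 'dataset.table'")
```

For example, `IngestJob("mysql", "orders", "raw.orders")` constructs successfully, while `IngestJob("oracle", "orders", "raw.orders")` raises `ValueError`. Failing early on a malformed definition keeps bad configuration from reaching the warehouse.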
| Component | Technology | Purpose |
|---|---|---|
| Operator | Go, Kubernetes, Kubebuilder | Pipeline orchestration and CRD management |
| Ingest | Python 3.13+, Pydantic, Typer | Type-safe data ingestion with validation |
| Transform | dbt Core, BigQuery | Data transformation and analytics |
| Triggers | Go, Google Cloud APIs | Event-driven pipeline activation |
| Dev Environment | Docker Compose, SQL | Local development and testing |
| Infrastructure | Terraform, GCP | Cloud infrastructure automation |
pipeline-forge/
├── operator/ # Kubernetes operator (Go)
├── workloads/ # Data processing components
│ ├── ingest/ # Type-safe ingestion (Python)
│ ├── transform/ # dbt transformations
│ └── trigger/ # Event processing (Go)
├── dev/ # Development environment setup
├── infrastructure/ # Cloud infrastructure
├── integrations/ # Integration experiments
└── docs/ # Documentation
Current State: Work in Progress
| Component | Status | Description |
|---|---|---|
| 🎛️ Operator API | ⚡ Functional | CRD definitions and API contracts |
| 🎛️ Operator Reconciliation | 🚧 In Development | Pipeline orchestration and lifecycle management |
| 📥 Ingest Workload | ⚡ Functional | Type-safe data ingestion (Python) |
| 🔄 Transform Workload | ⚡ Functional | dbt-core data transformation |
| ⚡ Trigger Workload | 🚧 In Development | Event-driven pipeline activation (Go) |
The integrations/ directory contains experiments and documentation for integrating Pipeline Forge with other systems and technologies. Current integrations:
- pipeline-forge-jenkins-k8s - Separate repository for Jenkins CI/CD evaluation with Kubernetes-native deployment, Configuration as Code (JCasC), and Job DSL seed patterns.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Contributions are welcome! Please open an issue first to discuss any changes before submitting a pull request.
This is a personal open-source project, developed independently on my own time and equipment.
It is not affiliated with or endorsed by my employer, and it does not represent my employer.