CLAUDE.md — Apache DolphinScheduler

Apache DolphinScheduler is a distributed, visual DAG workflow-scheduling platform. This is the monorepo: backend servers (master / worker / api / alert), a Vue 3 frontend, plugin families for tasks / datasources / storage / alerting / scheduling, and the release tooling.

This file is an index. Each module has its own CLAUDE.md with the details — do not duplicate module contents here.


Tech stack (project-wide)

  • Java 1.8 (do not assume 11+ APIs; dolphinscheduler-api-test is the only Java 11 island).
  • Spring Boot 2.6.1 across all servers, running on embedded Jetty (Tomcat is excluded transitively).
  • MyBatis-Plus for ORM; HikariCP for the metadata DB pool, Druid inside user-facing datasource plugins.
  • Quartz for cron scheduling (via scheduler-plugin).
  • Netty / gRPC for inter-server RPC (see extract-base).
  • Vue 3 + Vite + TypeScript + Naive UI for the frontend.
  • Maven multi-module reactor (26 modules in root pom.xml + 2 test modules).
  • Zookeeper 3.8 by default for the registry (Etcd and JDBC also supported).

Runnable services

A production deployment runs four independent services (plus an external registry and metadata DB). A fifth entry point — StandaloneServer — embeds all four in one JVM for development.

| Service | Module | Main class | Default ports |
| --- | --- | --- | --- |
| API | dolphinscheduler-api | org.apache.dolphinscheduler.api.ApiApplicationServer | 12345 (HTTP / UI + REST) |
| Master | dolphinscheduler-master | org.apache.dolphinscheduler.server.master.MasterServer | 5679 (RPC) |
| Worker | dolphinscheduler-worker | org.apache.dolphinscheduler.server.worker.WorkerServer | 1235 (RPC) |
| Alert | dolphinscheduler-alert (→ -alert-server) | org.apache.dolphinscheduler.alert.AlertServer | 50053 (HTTP), 50052 (RPC) |
| Standalone (dev only) | dolphinscheduler-standalone-server | org.apache.dolphinscheduler.StandaloneServer | 12345 + 50052 (API + alert; master/worker use in-JVM calls) |

Every service is a @SpringBootApplication on Jetty and implements IStoppable. Scale Master / Worker / Alert horizontally; coordination happens via the registry (Zookeeper by default). API is stateless and also scales horizontally behind a load balancer.

Ports are overridable via server.port / service-specific keys in each service's application.yaml.

Build & run

# Full build (release profile; produces dist tarball)
./mvnw clean install -Prelease

# Zookeeper 3.4 legacy
./mvnw clean install -Prelease -Dzk-3.4

# Skip UI build (faster iteration on backend only)
./mvnw -pl '!dolphinscheduler-ui' clean install

# Build one module (+ its required siblings)
./mvnw -pl dolphinscheduler-master -am clean install

# Format (Spotless is configured)
./mvnw spotless:apply

# Standalone server (after building)
cd dolphinscheduler-standalone-server/target && ./bin/start.sh

Binary artifact: dolphinscheduler-dist/target/apache-dolphinscheduler-*-bin.tar.gz.

Test

# Unit tests for one module
./mvnw -pl dolphinscheduler-master test

# API integration tests (separate reactor, requires Docker)
mvn -pl dolphinscheduler-api-test/dolphinscheduler-api-test-case test

# E2E browser tests (Selenium + Docker)
mvn -pl dolphinscheduler-e2e/dolphinscheduler-e2e-case test

# Apple Silicon: add -Dm1_chip=true to the Docker-driven suites

Module index

Click into a module's CLAUDE.md for details. Each description is one line here on purpose.

Core execution

API layer

Shared libraries

Plugin families

Build, ops, tools

Frontend & E2E


Architecture overview (one paragraph)

A user hits the UI, which calls the API server. The API server writes to the metadata DB and, for runtime operations (start / kill / pause workflow), talks to the master over RPC. The master consumes t_ds_command rows, runs the workflow state machine, and dispatches tasks to workers. Workers execute task plugins (shell, SQL, Spark, …) and stream lifecycle events back to the master. Failures and SLA breaches flow to the alert server, which fans out through alert plugins. The registry (Zookeeper / Etcd / JDBC) provides service discovery, leader election, and distributed locks. Storage plugins back the resource center and distributed-task artifacts. Quartz (via the scheduler plugin) fires scheduled workflows, which become new t_ds_command rows.
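
The command-to-task flow above can be pictured with a toy in-memory model. This is only a sketch under loud assumptions: every class and method name below is hypothetical, the queue stands in for t_ds_command rows, a task "fails" purely because its name starts with `fail`, and stopping a workflow at the first failure is a simplification rather than the master's actual failure handling. There is no RPC, registry, or database involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Toy model of command -> master -> worker -> event -> alert.
// All names are hypothetical; nothing here is DolphinScheduler's real API.
final class FlowSketch {

    // Stands in for one t_ds_command row.
    static final class Command {
        final String workflow;
        final List<String> tasks;

        Command(String workflow, List<String> tasks) {
            this.workflow = workflow;
            this.tasks = tasks;
        }
    }

    // Stands in for a worker running a task plugin and reporting a final state.
    static final class Worker {
        String execute(String task) {
            // Toy convention: names starting with "fail" simulate a failing task.
            return task.startsWith("fail") ? "FAILURE" : "SUCCESS";
        }
    }

    // Stands in for the master: consumes commands, dispatches tasks, raises alerts.
    static final class Master {
        private final Worker worker = new Worker();
        final List<String> alerts = new ArrayList<String>();

        // Drains the queue and returns the lifecycle events the worker reported.
        List<String> consume(Queue<Command> commands) {
            List<String> events = new ArrayList<String>();
            Command command;
            while ((command = commands.poll()) != null) {
                for (String task : command.tasks) {
                    String state = worker.execute(task);
                    events.add(command.workflow + "/" + task + "=" + state);
                    if ("FAILURE".equals(state)) {
                        // Assumption: stop the workflow and hand off to alerting.
                        alerts.add(command.workflow + " failed at " + task);
                        break;
                    }
                }
            }
            return events;
        }
    }

    public static void main(String[] args) {
        Queue<Command> commands = new ArrayDeque<Command>();
        commands.add(new Command("etl", Arrays.asList("extract", "fail-load", "report")));
        Master master = new Master();
        System.out.println(master.consume(commands));
        System.out.println(master.alerts);
    }
}
```

In this model the `report` task never runs because the workflow stops at `fail-load`; the real engine under dolphinscheduler-master is where the actual state machine lives.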

Where things live (quick lookup)

| Looking for… | Start here |
| --- | --- |
| A REST endpoint | dolphinscheduler-api/src/main/java/.../api/controller/ |
| Workflow execution logic | dolphinscheduler-master/src/main/java/.../server/master/engine/ |
| Task execution logic | dolphinscheduler-worker + the specific task-plugin/<type> |
| How "X" is stored | dolphinscheduler-dao/src/main/java/.../dao/entity/ |
| SQL schema / upgrade | dolphinscheduler-dao/src/main/resources/sql/ |
| RPC contract between servers | dolphinscheduler-extract/dolphinscheduler-extract-<role> |
| UI page source | dolphinscheduler-ui/src/views/<feature>/ |
| API call in the UI | dolphinscheduler-ui/src/service/modules/<resource>.ts |
| Version of a dependency | dolphinscheduler-bom/pom.xml |

Project-wide conventions

  • Formatting: ./mvnw spotless:apply. CI will fail PRs that aren't formatted. Java imports are ordered; license headers are enforced.
  • Commit style: [Type-ISSUE_ID] [Scope] Subject, e.g. [Fix-18168] [Worker] .... Scopes match module names.
  • Branching: dev is the main integration branch (not main/master).
  • PRs must link a GitHub issue and keep their scope tight — one module / one concern.
  • Do not break wire / DB compatibility silently. Changes to extract-* RPC interfaces, dao entities, enum values, and spi.DbType ripple to deployed clusters mid-upgrade.
  • Only one registry / storage / DB dialect is active at runtime. Code paths that check "which one" belong inside the plugin SPI, not sprinkled through services.
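
The commit-style convention in the list above can be checked mechanically. The following is a minimal sketch, not a tool that exists in the repository: the class name is hypothetical, and the pattern assumes that Type is alphabetic, ISSUE_ID is numeric, and the parts are separated by single spaces. The subject text in the example is made up. It is written against Java 8 APIs only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validates "[Type-ISSUE_ID] [Scope] Subject".
final class CommitSubjectCheck {

    // Assumptions: alphabetic type, numeric issue id, scope without ']',
    // and a non-empty subject that starts with a non-space character.
    private static final Pattern SHAPE =
            Pattern.compile("^\\[([A-Za-z]+)-(\\d+)\\] \\[([^\\]]+)\\] (\\S.*)$");

    static boolean matches(String subject) {
        return SHAPE.matcher(subject).matches();
    }

    // Returns the scope (e.g. "Worker"), or null if the subject does not match.
    static String scopeOf(String subject) {
        Matcher m = SHAPE.matcher(subject);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(matches("[Fix-18168] [Worker] Example subject"));
        System.out.println(matches("Fix worker bug"));
        System.out.println(scopeOf("[Fix-18168] [Worker] Example subject"));
    }
}
```

Checking that the extracted scope matches an actual module name would need the module list, which this sketch deliberately leaves out.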

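The last convention, keeping "which one" checks inside the plugin SPI, might look roughly like the sketch below. All interface and class names, and the lock-path formats, are hypothetical and chosen for illustration; they are not the project's registry SPI. The point is only the shape: one selection step keyed on configuration, after which services depend on the interface and never branch on the implementation type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SPI; not DolphinScheduler's real registry interface.
interface RegistrySketch {
    String name();
    String lockPath(String resource);
}

final class ZookeeperRegistrySketch implements RegistrySketch {
    public String name() { return "zookeeper"; }
    // Illustrative format only.
    public String lockPath(String resource) { return "/lock/" + resource; }
}

final class JdbcRegistrySketch implements RegistrySketch {
    public String name() { return "jdbc"; }
    // Illustrative format only.
    public String lockPath(String resource) { return "lock:" + resource; }
}

// The single place that decides which implementation is active.
final class RegistrySelector {
    private final Map<String, RegistrySketch> byName = new HashMap<String, RegistrySketch>();

    RegistrySelector(RegistrySketch... candidates) {
        for (RegistrySketch candidate : candidates) {
            byName.put(candidate.name(), candidate);
        }
    }

    RegistrySketch select(String configuredType) {
        RegistrySketch registry = byName.get(configuredType);
        if (registry == null) {
            throw new IllegalArgumentException("Unknown registry type: " + configuredType);
        }
        return registry;
    }
}

// A service that never asks "is this Zookeeper or JDBC?".
final class FailoverServiceSketch {
    private final RegistrySketch registry;

    FailoverServiceSketch(RegistrySketch registry) {
        this.registry = registry;
    }

    String lockFor(String resource) {
        return registry.lockPath(resource);
    }
}
```

A caller would build the selector once at startup, for example `new FailoverServiceSketch(selector.select("jdbc"))`, so adding another registry type touches only the selector's candidate list.
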
External references