DataFlow-Skills

Reusable Claude Code skills for working with the DataFlow data-processing framework.

Chinese documentation: README_zh.md


What's in here

Three skills, each loaded by Claude Code from ~/.claude/skills/<name>/SKILL.md:

| Skill | What it does | Invoke with |
| --- | --- | --- |
| generating-dataflow-pipeline | From a target description + a sample JSONL file, plans the operator chain and emits a runnable DataFlow pipeline. | /generating-dataflow-pipeline |
| dataflow-dev | DataFlow developer assistant: loads architecture knowledge and routes intents (new operator / new pipeline / new prompt / diagnose error / code review / KB sync) into the right workflow. Run inside a DataFlow repo. | /dataflow-dev |
| core_text | Per-operator API reference (8 generators, 3 filters, 2 refiners, 5 evaluators), consulted by the pipeline skill when it needs operators beyond the 6 core primitives. Documentation the other skills read. | (not directly invoked) |

Install

Prerequisite: Claude Code CLI on your PATH. If not yet installed, see docs/install-claude-code.md.

git clone https://github.com/haolpku/DataFlow-Skills.git
cd DataFlow-Skills
./install.sh

That copies all three skills into ~/.claude/skills/ (user-level — available in every project). Then in any Claude Code session:

/generating-dataflow-pipeline

If the slash command shows up in completion, you're done.

Install options

./install.sh --project                       # install into ./.claude/skills/ instead of ~/.claude/skills/
./install.sh dataflow-dev                    # install only the named skill(s)
./install.sh --force                         # overwrite existing skills (default: skip)
./install.sh --help

Update

cd DataFlow-Skills
git pull
./install.sh --force

Quick tour

generating-dataflow-pipeline

Video: Generate DataFlow Pipeline

Give it a target and 1–5 sample rows; it returns an operator decision (JSON) followed by a complete pipeline .py.

/generating-dataflow-pipeline
Target: Generate product descriptions and filter high-quality ones
Sample file: ./data/products.jsonl
Expected outputs: generated_description, quality_score

The 6 core primitives it picks from: PromptedGenerator, FormatStrPromptedGenerator, Text2MultiHopQAGenerator, PromptedFilter, GeneralFilter, and the KBC trio (FileOrURLToMarkdownConverterFlash → KBCChunkGenerator → KBCTextCleaner). When these aren't enough, it cross-references the core_text skill for extended operators. Full rules and the generated pipeline schema live in generating-dataflow-pipeline/SKILL.md.
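
Below is a hypothetical sketch of the kind of pipeline .py the skill emits for the example above. Import paths, class names, and parameter names here are assumptions about DataFlow v1.0.x, not verified signatures; the authoritative pipeline schema lives in generating-dataflow-pipeline/SKILL.md.

from dataflow.utils.storage import FileStorage              # assumed import path
from dataflow.serving import APILLMServing_request          # assumed import path
from dataflow.operators.generate import PromptedGenerator   # assumed import path

class ProductDescriptionPipeline:
    def __init__(self):
        # FileStorage caches one file per operator step; parameter names are assumptions
        self.storage = FileStorage(
            first_entry_file_name="./data/products.jsonl",
            cache_path="./cache",
            file_name_prefix="product_desc",
            cache_type="jsonl",
        )
        llm = APILLMServing_request(model_name="gpt-4o-mini")  # assumed signature
        self.generator = PromptedGenerator(
            llm_serving=llm,
            system_prompt="Write a concise product description for this item.",
        )

    def forward(self):
        # the standard storage.step() pattern: each operator consumes one storage step
        self.generator.run(
            storage=self.storage.step(),
            input_key="raw_content",              # hypothetical input column
            output_key="generated_description",   # matches the expected output above
        )

if __name__ == "__main__":
    ProductDescriptionPipeline().forward()

For the full example, the plan would also append a filtering step (e.g. PromptedFilter) to produce quality_score; the operator decision JSON that precedes the code records that choice.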

dataflow-dev

Run it inside a clone of the DataFlow repo. It loads the knowledge base, probes git state, and routes by intent:

| Say something like… | Workflow |
| --- | --- |
| "new filter operator that…" | Operator creation (duplicate check → spec → code + registration reminder) |
| "new pipeline that…" | Pipeline creation with the standard storage.step() pattern |
| "new prompt for X" | Prompt creation (PromptABC / DIYPromptABC, @prompt_restrict placement) |
| "I'm getting KeyError: …" | Diagnosis against diagnostics/known_issues.md (#001–#008) |
| "review this operator" | 14-point checklist (registry, run() signature, get_desc, etc.; sketched below) |
| "the upstream repo has new operators" | Compare local files to knowledge_base.md and emit update steps |

Details: dataflow-dev/SKILL.md. Aligned to OpenDCAI/DataFlow main (v1.0.10).
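
As a concrete anchor for the review-checklist items above, here is a minimal hypothetical operator skeleton. The base class, registry decorator, and storage calls are assumptions about DataFlow's conventions, not verified API; dataflow-dev/SKILL.md and the DataFlow source are authoritative.

from dataflow.core import OperatorABC                   # assumed import path
from dataflow.utils.registry import OPERATOR_REGISTRY   # assumed import path

@OPERATOR_REGISTRY.register()          # checklist: operator must be registered
class MinLengthFilter(OperatorABC):    # hypothetical example operator
    def __init__(self, min_len: int = 10):
        self.min_len = min_len

    @staticmethod
    def get_desc(lang: str = "en"):    # checklist: get_desc is present
        return "Drops rows whose text is shorter than min_len characters."

    def run(self, storage, input_key: str = "text"):  # checklist: run() signature
        df = storage.read("dataframe")                # assumed storage read API
        df = df[df[input_key].str.len() >= self.min_len]
        storage.write(df)                             # assumed storage write API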

core_text

Reference docs only — SKILL.md, SKILL_zh.md, examples/good.md, examples/bad.md per operator. Browse: core_text/.


Adding your own operator skill

  1. Drop a core_text/<category>/<your-operator>/ directory with SKILL.md + examples/{good,bad}.md (see the layout sketch after this list).
  2. Register it in the appropriate table inside generating-dataflow-pipeline/SKILL.md under "Extended Operator Reference" — without that row the pipeline planner can't see it.
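
A hypothetical on-disk layout for step 1 (operator name is illustrative):

core_text/
└── generate/
    └── summarize-text/              # your new operator skill
        ├── SKILL.md                 # API reference the planner reads
        ├── SKILL_zh.md              # optional Chinese mirror
        └── examples/
            ├── good.md
            └── bad.md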

If the operator becomes a high-frequency primitive, promote it by editing the same file's "Operator Selection Priority Rule", "Operator Parameter Signature Rule", "Correct Import Paths", and (if it handles a new data type) "Input File Content Analysis Rule" sections.


Repository layout

DataFlow-Skills/
├── install.sh                       # one-shot installer (cp -r into ~/.claude/skills/)
├── docs/
│   └── install-claude-code.md       # CC CLI install guide
├── generating-dataflow-pipeline/    # pipeline planner skill
├── dataflow-dev/                    # DataFlow dev assistant skill
└── core_text/                       # extended operator reference
    ├── generate/
    ├── filter/
    ├── refine/
    └── eval/

Upstream

All knowledge bases in this repo are aligned to OpenDCAI/DataFlow main (v1.0.10).
