DataFlow-Skills

Reusable Claude Code skills for working with the DataFlow data-processing framework.

Chinese documentation: README_zh.md


What's in here

Three skills, each loaded by Claude Code from ~/.claude/skills/<name>/SKILL.md:

| Skill | What it does | Invoke with |
| --- | --- | --- |
| generating-dataflow-pipeline | From a target description + a sample JSONL file, plans the operator chain and emits a runnable DataFlow pipeline. | /generating-dataflow-pipeline |
| dataflow-dev | DataFlow developer assistant: loads architecture knowledge and routes intents (new operator / new pipeline / new prompt / diagnose error / code review / KB sync) into the right workflow. Run inside a DataFlow repo. | /dataflow-dev |
| core_text | Per-operator API reference (8 generators, 3 filters, 2 refiners, 5 evaluators), consulted by the pipeline skill when it needs operators beyond the 6 core primitives. Documentation the other skills read. | (not directly invoked) |

Install

Prerequisite: Claude Code CLI on your PATH. If not yet installed, see docs/install-claude-code.md.

git clone https://github.com/haolpku/DataFlow-Skills.git
cd DataFlow-Skills
./install.sh

That copies all three skills into ~/.claude/skills/ (user-level — available in every project). Then in any Claude Code session:

/generating-dataflow-pipeline

If the slash command shows up in completion, you're done.

Install options

./install.sh --project                       # install into ./.claude/skills/ instead of ~/.claude/skills/
./install.sh dataflow-dev                    # install only the named skill(s)
./install.sh --force                         # overwrite existing skills (default: skip)
./install.sh --help

Update

cd DataFlow-Skills
git pull
./install.sh --force

Quick tour

generating-dataflow-pipeline

Video: Generate DataFlow Pipeline

Give it a target and 1–5 sample rows; it returns an operator decision (JSON) followed by a complete pipeline .py.

/generating-dataflow-pipeline
Target: Generate product descriptions and filter high-quality ones
Sample file: ./data/products.jsonl
Expected outputs: generated_description, quality_score

The 6 core primitives it picks from: PromptedGenerator, FormatStrPromptedGenerator, Text2MultiHopQAGenerator, PromptedFilter, GeneralFilter, and the KBC trio (FileOrURLToMarkdownConverterFlash → KBCChunkGenerator → KBCTextCleaner). When these aren't enough, it cross-references the core_text skill for extended operators. Full rules and the generated pipeline schema live in generating-dataflow-pipeline/SKILL.md.
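
Below is a hypothetical sketch of the kind of pipeline .py the skill emits for the example above. Import paths, class names, and parameter names here are assumptions about DataFlow v1.0.x, not verified signatures; the authoritative pipeline schema lives in generating-dataflow-pipeline/SKILL.md.

from dataflow.utils.storage import FileStorage              # assumed import path
from dataflow.serving import APILLMServing_request          # assumed import path
from dataflow.operators.generate import PromptedGenerator   # assumed import path

class ProductDescriptionPipeline:
    def __init__(self):
        # FileStorage caches one file per operator step; parameter names are assumptions
        self.storage = FileStorage(
            first_entry_file_name="./data/products.jsonl",
            cache_path="./cache",
            file_name_prefix="product_desc",
            cache_type="jsonl",
        )
        llm = APILLMServing_request(model_name="gpt-4o-mini")  # assumed signature
        self.generator = PromptedGenerator(
            llm_serving=llm,
            system_prompt="Write a concise product description for this item.",
        )

    def forward(self):
        # the standard storage.step() pattern: each operator consumes one storage step
        self.generator.run(
            storage=self.storage.step(),
            input_key="raw_content",              # hypothetical input column
            output_key="generated_description",   # matches the expected output above
        )

if __name__ == "__main__":
    ProductDescriptionPipeline().forward()

For the full example, the plan would also append a filtering step (e.g. PromptedFilter) to produce quality_score; the operator decision JSON that precedes the code records that choice.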

dataflow-dev

Run it inside a clone of the DataFlow repo. It loads the knowledge base, probes git state, and routes by intent:

| Say something like… | Workflow |
| --- | --- |
| "new filter operator that…" | Operator creation (duplicate check → spec → code + registration reminder) |
| "new pipeline that…" | Pipeline creation with the standard storage.step() pattern |
| "new prompt for X" | Prompt creation (PromptABC / DIYPromptABC, @prompt_restrict placement) |
| "I'm getting KeyError: …" | Diagnosis against diagnostics/known_issues.md (#001–#008) |
| "review this operator" | 14-point checklist (registry, run() signature, get_desc, etc.; sketched below) |
| "the upstream repo has new operators" | Compare local files to knowledge_base.md and emit update steps |

Details: dataflow-dev/SKILL.md. Aligned to OpenDCAI/DataFlow main (v1.0.10).
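
As a concrete anchor for the review-checklist items above, here is a minimal hypothetical operator skeleton. The base class, registry decorator, and storage calls are assumptions about DataFlow's conventions, not verified API; dataflow-dev/SKILL.md and the DataFlow source are authoritative.

from dataflow.core import OperatorABC                   # assumed import path
from dataflow.utils.registry import OPERATOR_REGISTRY   # assumed import path

@OPERATOR_REGISTRY.register()          # checklist: operator must be registered
class MinLengthFilter(OperatorABC):    # hypothetical example operator
    def __init__(self, min_len: int = 10):
        self.min_len = min_len

    @staticmethod
    def get_desc(lang: str = "en"):    # checklist: get_desc is present
        return "Drops rows whose text is shorter than min_len characters."

    def run(self, storage, input_key: str = "text"):  # checklist: run() signature
        df = storage.read("dataframe")                # assumed storage read API
        df = df[df[input_key].str.len() >= self.min_len]
        storage.write(df)                             # assumed storage write API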

core_text

Reference docs only — SKILL.md, SKILL_zh.md, examples/good.md, examples/bad.md per operator. Browse: core_text/.


Adding your own operator skill

  1. Drop a core_text/<category>/<your-operator>/ directory with SKILL.md + examples/{good,bad}.md (see the layout sketch after this list).
  2. Register it in the appropriate table inside generating-dataflow-pipeline/SKILL.md under "Extended Operator Reference" — without that row the pipeline planner can't see it.
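
A hypothetical on-disk layout for step 1 (operator name is illustrative):

core_text/
└── generate/
    └── summarize-text/              # your new operator skill
        ├── SKILL.md                 # API reference the planner reads
        ├── SKILL_zh.md              # optional Chinese mirror
        └── examples/
            ├── good.md
            └── bad.md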

If the operator becomes a high-frequency primitive, promote it by editing the same file's "Operator Selection Priority Rule", "Operator Parameter Signature Rule", "Correct Import Paths", and (if it handles a new data type) "Input File Content Analysis Rule" sections.


Repository layout

DataFlow-Skills/
├── install.sh                       # one-shot installer (cp -r into ~/.claude/skills/)
├── docs/
│   └── install-claude-code.md       # CC CLI install guide
├── generating-dataflow-pipeline/    # pipeline planner skill
├── dataflow-dev/                    # DataFlow dev assistant skill
└── core_text/                       # extended operator reference
    ├── generate/
    ├── filter/
    ├── refine/
    └── eval/

Upstream

All knowledge bases in this repo are aligned to OpenDCAI/DataFlow main (v1.0.10).
