Replace static plan of action image with dynamic mermaid file #111

Open · 1 commit to be merged into base: main


@INF800 commented Jan 29, 2025

This commit replaces the static plan-of-action PNG with the Mermaid diagram below:

flowchart LR
    subgraph one["Step 1"]
        A1[DeepSeek-R1]:::modelStyle --> B1[Distilled reasoning data]:::dataStyle
        C1[Instruct Model]:::modelStyle --> D1[SFT]:::processStyle
        B1 --> D1
        D1 --> E1[Open R1-Distill]:::modelStyle
    end

    subgraph two["Step 2"]
        B2[RL reasoning data]:::dataStyle --> D2[GRPO with verifiable rewards]:::processStyle
        C2[Base Model]:::modelStyle --> D2
        D2 --> E2[Open R1-Zero]:::modelStyle
    end

    subgraph three["Step 3"]
        A3[Open R1-Zero]:::modelStyle --> B3[SFT reasoning data]:::dataStyle
        C3[Base Model]:::modelStyle --> D3[SFT]:::processStyle
        B3 --> D3
        D3 --> E3[GRPO with verifiable rewards]:::processStyle
        F3[RL reasoning data]:::dataStyle --> E3
        E3 --> G3[Open R1]:::modelStyle
    end

    one --> two --> three

    classDef modelStyle fill:#ffb6c1,stroke:#333,stroke-width:2px,color:#000000
    classDef dataStyle fill:#ffeb99,stroke:#333,stroke-width:2px,color:#000000
    classDef processStyle fill:#87ceeb,stroke:#333,stroke-width:2px,color:#000000

Need: the ability to change the diagram at the code level and to add more sophisticated diagrams like the one below (easy to see <-> easy to understand):

flowchart TD
    %% Define styles with darker font colors
    classDef blueBox fill:#e6f3ff,stroke:#000,stroke-width:1px,color:#000000
    classDef yellowBox fill:#fffbe6,stroke:#000,stroke-width:1px,color:#000000
    classDef pinkBox fill:#ffe6e6,stroke:#000,stroke-width:1px,color:#000000

    %% Base model
    A["DeepSeek-V3 Base<br/>(671B/37B Activated)"]:::blueBox

    %% First branch - Supervised Fine-Tuning
    B["Supervised<br/>Fine-Tuning<br/>(SFT)"]:::yellowBox
    C[("Cold Start<br/>Long CoT Data<br/>(~k samples)")]:::pinkBox

    %% RORL and CoT components
    D["Reasoning Oriented RL<br/>GRPO<br/>Rule-based Reward<br/>(Accuracy, Formatting)"]:::yellowBox
    E["+ CoT Language<br/>Consistency Reward"]:::yellowBox

    %% Middle section
    F["DeepSeek-V3 Base<br/>+ CS SFT + RORL"]:::blueBox
    G["Reasoning Prompts +<br/>Rejection Sampling<br/>(Rule-based &<br/>DS-V3 as judge)"]:::yellowBox
    H[("Reasoning Data<br/>(600k samples)")]:::pinkBox

    %% Right branch
    I["DeepSeek-V3<br/>(671B/37B Activated)"]:::blueBox
    J["CoT Prompting"]:::yellowBox
    K[("Non-Reasoning<br/>Data<br/>(200k samples)")]:::pinkBox

    %% Model variants
    L[("DeepSeek-V3<br/>SFT Data")]:::pinkBox
    M[("Combined<br/>SFT Data<br/>(800k samples)")]:::pinkBox

    %% Bottom section - Models and training
    N["Qwen2.5-Math-1.5B"]:::blueBox
    O["Qwen2.5-Math-7B"]:::blueBox
    P["Qwen2.5 14B"]:::blueBox
    Q["Qwen2.5 32B"]:::blueBox
    R["Llama-3.3-70B-Instruct"]:::blueBox
    S["Llama-3.1-8B"]:::blueBox

    T["SFT<br/>2 epochs<br/>800k samples"]:::yellowBox
    U["SFT<br/>2 epochs<br/>800k samples"]:::yellowBox
    V["RL<br/>Reasoning + Preference Reward<br/>Diverse Training Prompts"]:::yellowBox

    %% Final models
    W["DeepSeek-R1-Zero"]:::blueBox
    X["DeepSeek-R1-Distill-(Qwen/Llama)-{*B}"]:::blueBox
    Y["DeepSeek-R1"]:::blueBox

    %% Connections
    A --> B
    B --> C
    A --> D
    A --> E
    D & E --> F
    F --> G
    G --> H
    I --> J
    J --> K
    H --> M
    K --> M
    L --> M
    N & O & P & Q & R & S --> T
    T --> X
    M --> T
    M --> U
    U --> V
    V --> Y
    A --> W

    %% Subgraph for distillation
    subgraph Distillation
        N
        O
        P
        Q
        R
        S
        T
        X
    end
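
To illustrate the kind of code-level change this enables, here is a minimal sketch based on Step 2 of the first diagram. The `R2` node and its edge are hypothetical and not part of this PR; they only show that extending the diagram takes one extra line that declares a node, styles it, and connects it.

```mermaid
flowchart LR
    subgraph two["Step 2"]
        B2[RL reasoning data]:::dataStyle --> D2[GRPO with verifiable rewards]:::processStyle
        C2[Base Model]:::modelStyle --> D2
        %% Hypothetical addition for illustration only, not part of this PR
        R2[Reward functions]:::dataStyle --> D2
        D2 --> E2[Open R1-Zero]:::modelStyle
    end

    %% Same style classes as the first diagram
    classDef modelStyle fill:#ffb6c1,stroke:#333,stroke-width:2px,color:#000000
    classDef dataStyle fill:#ffeb99,stroke:#333,stroke-width:2px,color:#000000
    classDef processStyle fill:#87ceeb,stroke:#333,stroke-width:2px,color:#000000
```

GitHub renders fenced code blocks tagged `mermaid` in Markdown files, so an edit like this appears both in the rendered diagram and as a one-line text diff, which a PNG cannot offer.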
