
Invalid submissions due to information leakage during TTT #402

@leloykun

Don't train on eval tokens your model hasn't scored yet!


"Proper" TTT goes like this:

  1. For each 1 <= t <= T:
    1.1. Score on eval token t
    1.2. Adapt weights based on eval tokens <= t
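
A minimal sketch of the score-then-adapt loop above, using a toy model rather than any submission's actual code. The "weights" here are just unigram counts with add-one smoothing, and `score_then_adapt` and `vocab_size` are hypothetical names chosen for illustration. The point is only the ordering: each token is scored before the model is allowed to update on it.

```python
import math
from collections import Counter

def score_then_adapt(tokens, vocab_size):
    """Total negative log-likelihood (nats) with legal test-time adaptation."""
    counts = Counter()  # toy stand-in for model weights
    seen = 0            # number of eval tokens adapted on so far
    total_nll = 0.0
    for tok in tokens:
        # 1.1 Score token t using a model that has only adapted on tokens < t.
        p = (counts[tok] + 1) / (seen + vocab_size)  # add-one smoothing
        total_nll += -math.log(p)
        # 1.2 Only after scoring, adapt on token t.
        counts[tok] += 1
        seen += 1
    return total_nll

print(score_then_adapt([3, 3, 1, 3], vocab_size=8))
```

With a real network, the same ordering would presumably mean computing the loss for a token (or chunk) first and taking the gradient step on it afterwards.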

What y'all are doing is something like this:

  1. For each 1 <= t <= T:
    1.1. Adapt weights based on eval tokens <= t
  2. For each 1 <= t <= T:
    2.1. Score on eval token t

But this is equivalent to appending the eval tokens to the training tokens and switching training strategies before eval! Also see: #152 (comment)
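
To make the leak concrete, here is a small sketch comparing the two orderings on a toy unigram-count model (an assumption for illustration, not any submission's code). `adapt_then_score` adapts on every eval token first and scores afterwards, which in this toy setting is literally the same as fitting on the eval tokens; `score_then_adapt` is the legal ordering. All function names and the example token list are hypothetical.

```python
import math
from collections import Counter

def nll(counts, seen, tok, vocab_size):
    """Negative log-probability of tok under add-one-smoothed counts."""
    return -math.log((counts[tok] + 1) / (seen + vocab_size))

def adapt_then_score(tokens, vocab_size):
    # Pass 1: adapt on ALL eval tokens (this is where the leak happens).
    counts = Counter(tokens)
    # Pass 2: score tokens the model has already been fitted to.
    return sum(nll(counts, len(tokens), t, vocab_size) for t in tokens)

def score_then_adapt(tokens, vocab_size):
    # Legal ordering: score token t, then adapt on it.
    counts, total = Counter(), 0.0
    for i, t in enumerate(tokens):
        total += nll(counts, i, t, vocab_size)
        counts[t] += 1
    return total

eval_tokens = [3, 3, 3, 3, 1, 3, 3, 3]
leaky = adapt_then_score(eval_tokens, vocab_size=8)
honest = score_then_adapt(eval_tokens, vocab_size=8)
print(f"adapt-then-score NLL: {leaky:.3f}")
print(f"score-then-adapt NLL: {honest:.3f}")
```

On this example the adapt-then-score loss comes out lower, because every scored token was already part of what the model was fitted on.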


Potentially invalid submissions:

| PR | Comment | Status |
|---|---|---|
| #136 | TTT on half of the batch; eval on full batch | [ ] open |
| #152 | TTT on all eval tokens before evaluation | [x] closed |
| #254 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #264 | TTT before eval | [ ] open |
| #338 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #398 | TTT on all eval tokens before evaluation | [ ] open |
| #421 | TTT before evals | [ ] open |
| #417 | TTT for multiple epochs before evals | [ ] open |
| #442 | TTT before evals | [ ] open |

cc @0hq


Please feel free to correct me if I'm wrong.
