Hi all! It's only been about a day and we're already getting flooded with submissions, and this is just going to get worse as more people use autoresearch-like tools. While I think it's cool to keep the barrier to entry low so more people can have fun with us, I think it's also worthwhile to enforce scientific rigor and shift the burden of proof back to the participants.
To that end, I propose setting up these rules to guard against the slop:
- Each submission (record or non-record) must have at least 3-6 log files in the PR so we can run the t-test ourselves to verify. I think only requiring 1 log file encourages sloppiness: a participant can cherry-pick the best run, or submit one log and pretend they ran more. If repo cleanliness is a concern, we can also set up an automation to remove all but one before merging to main. [Pardon, it seems this is already in the submission process; I got confused. Fix: Make Submission Process section in the README less confusing #132]
- Each participant can only make one "open" record attempt at a time. They can make as many non-record or draft attempts as they want, but only one can be open (maintainers can mark older ones as duplicate or stale). To make a new record attempt, participants can either (1A) update the old one and request another review before it is accepted, or (1B) close the old one and raise a new PR.
- Merge sufficiently similar concurrent submissions into a single attempt. If accepted, we credit all contributors in the README, but we also require comments in `train_gpt.py` clarifying who contributed what (this also makes it easier for newbies to tell whom to ask for help on a section of the code).
- Enforce stricter code quality controls. This would make it easier for us to spot (LLM-generated) reward hacking or malicious code, and would also help newbies (or agents) build on top of each other's work.
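For the first rule, here's a minimal sketch of what the maintainer-side verification could look like: a one-sample t-test on the final validation losses pulled from a submission's 3-6 log files, checked against the current record. The log values, the record number, and the helper name are all hypothetical; the real check would parse actual run logs and pick a significance threshold.

```python
import math
from statistics import mean, stdev

def t_statistic(losses, record):
    """One-sample t statistic for H0: mean(losses) == record.

    A large-magnitude negative value suggests the submitted runs'
    mean loss is reliably below the record, not a lucky single run.
    Requires at least 2 losses with nonzero spread.
    """
    n = len(losses)
    return (mean(losses) - record) / (stdev(losses) / math.sqrt(n))

# Hypothetical final val losses parsed from a submission's log files:
losses = [3.2788, 3.2791, 3.2785, 3.2790]
record = 3.2800  # hypothetical current record
t = t_statistic(losses, record)
```

With only one log file this check is impossible (no spread to estimate), which is exactly why a 3-6 run minimum matters.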
wdyt?