Evolved Python solutions for the NeurIPS 2025 Google Code Golf Championship using LLM-driven evolutionary optimization.
| Metric | Value |
|---|---|
| Solved | 72 / 400 (18.0%) |
| Total Score | 163,412 points |
| Avg Score/Task | 2,270 points |
| % of Winner Avg | 94.4% (winner: 2,405 pts/task) |
| Projected Final | ~908,000 points (details) |
| Task | Pattern | Bytes | Score | Solution |
|---|---|---|---|---|
| 4c4377d9 | Vertical flip concat | 24 | 2,476 | solution.py |
| 3c9b0459 | 180° rotation | 40 | 2,460 | solution.py |
| 44f52bb0 | Horizontal symmetry check | 46 | 2,454 | solution.py |
| d631b094 | Collect non-zero cells | 47 | 2,453 | solution.py |
| 25d8a9c8 | Row uniformity check | 50 | 2,450 | solution.py |
| 0520fde7 | Grid AND comparison | 57 | 2,443 | solution.py |
| 22eb0ac0 | Matching edge markers fill | 57 | 2,443 | solution.py |
| 0d3d703e | Color mapping (LUT) | 58 | 2,442 | solution.py |
| 1b2d62fb | Conditional grid coloring | 58 | 2,442 | solution.py |
| 007bbfb7 | Outer product grid | 65 | 2,435 | solution.py |
| 2281f1f4 | Row/column intersection fill | 67 | 2,433 | solution.py |
| 28bf18c6 | Extract + duplicate shape | 67 | 2,433 | solution.py |
| 29c11459 | Horizontal line splitting | 68 | 2,432 | solution.py |
| 1e0a9b12 | Gravity (drop cells) | 69 | 2,431 | solution.py |
| 27a28665 | Pattern shape classification | 70 | 2,430 | solution.py |
| 017c7c7b | Extend pattern + double | 80 | 2,420 | solution.py |
| 3428a4f5 | XOR halves by separator | 88 | 2,412 | solution.py |
| 1bfc4729 | Dual frame pattern | 108 | 2,392 | solution.py |
| 1fad071e | Count 2x2 blue blocks | 109 | 2,391 | solution.py |
| 22168020 | Fill between endpoints | 112 | 2,388 | solution.py |
| 05269061 | Diagonal color cycle | 113 | 2,387 | solution.py |
| 1190e5a7 | Count cells by grid lines | 124 | 2,376 | solution.py |
| 137eaa0f | Symmetry reflection | 130 | 2,370 | solution.py |
| 1cf80156 | Bounding box extraction | 130 | 2,370 | solution.py |
| 2bee17df | Cross line fill | 132 | 2,368 | solution.py |
| 2204b7a8 | Border region coloring | 137 | 2,363 | solution.py |
| 08ed6ac7 | Column rank labeling | 142 | 2,358 | solution.py |
| 363442ee | Fill bottom row pattern | 144 | 2,356 | solution.py |
| 28e73c20 | Spiral maze generation | 149 | 2,351 | solution.py |
| 3ac3eb23 | Diagonal checkerboard | 150 | 2,350 | solution.py |
| 2013d3e2 | Symmetry axis extraction | 152 | 2,348 | solution.py |
| 4258a5f9 | 3×3 box around 5s | 160 | 2,340 | solution.py |
| 09629e4f | Fill grid segments | 170 | 2,330 | solution.py |
| 239be575 | Small pattern movement | 170 | 2,330 | solution.py |
| 10fcaaa3 | 2x2 tiling + diagonal 8s | 174 | 2,326 | solution.py |
| 1f876c06 | Diagonal line propagation | 174 | 2,326 | solution.py |
| 3aa6fb7a | L-shaped 8s + corner 1 | 178 | 2,322 | solution.py |
| 1f85a75f | Extract rare color region | 182 | 2,318 | solution.py |
| 2dc579da | Extract quadrant with anomaly | 189 | 2,311 | solution.py |
| 23581191 | Cross lines intersection | 198 | 2,302 | solution.py |
| 1e32b0e9 | Grid template completion | 201 | 2,299 | solution.py |
| 23b5c85d | Smallest colored rectangle | 201 | 2,299 | solution.py |
| 0ca9ddb6 | Color spread from seeds | 207 | 2,293 | solution.py |
| 1caeab9d | Line intersection marking | 207 | 2,293 | solution.py |
| 253bf280 | Connect 8s with 3s | 207 | 2,293 | solution.py |
| 178fcbfb | Extend markers to lines | 217 | 2,283 | solution.py |
| 00d62c1b | Fill enclosed regions | 219 | 2,281 | solution.py |
| 3de23699 | Extract marker-bounded region | 225 | 2,275 | solution.py |
| 0a938d79 | Alternating stripe pattern | 237 | 2,263 | solution.py |
| 3bdb4ada | Middle row stripe | 239 | 2,261 | solution.py |
| 0dfd9992 | Color substitution pairs | 239 | 2,261 | solution.py |
| 0962bcdd | T-junction detection | 241 | 2,259 | solution.py |
| 1c786137 | Corner rectangle frames | 249 | 2,251 | solution.py |
| 1f0c79e5 | Diagonal ray extension | 261 | 2,239 | solution.py |
| 025d127b | Parallelogram to rect | 266 | 2,234 | solution.py |
| 1f642eb9 | Marker position projection | 266 | 2,234 | solution.py |
| 32597951 | Extract repeating tile | 274 | 2,226 | solution.py |
| 11852cab | 4-fold rotational symmetry | 280 | 2,220 | solution.py |
| 05f2a901 | Move shape to reference | 326 | 2,174 | solution.py |
| 06df4c85 | Grid line completion | 378 | 2,122 | solution.py |
| 1a07d186 | Line projection | 434 | 2,066 | solution.py |
| 0b148d64 | Quadrant extraction | 454 | 2,046 | solution.py |
| 2bcee788 | Color replacement by marker | 465 | 2,035 | solution.py |
| 22233c11 | Diagonal corner marking | 474 | 2,026 | solution.py |
| 045e512c | Pattern replication | 486 | 2,014 | solution.py |
| 150deff5 | Grid extraction borders | 494 | 2,006 | solution.py |
| 228f6490 | Shape-to-hole matching | 520 | 1,980 | solution.py |
| a64e4611 | Largest rectangle + cross | 523 | 1,977 | solution.py |
| 39a8645d | Most frequent shape | 526 | 1,974 | solution.py |
| 2dd70a9a | U-shape connector | 673 | 1,827 | solution.py |
| 1b60fb0c | Segment extraction | 1,026 | 1,474 | solution.py |
| 0e206a2e | Rotated template placement | 1,135 | 1,365 | solution.py |
| Task ID | Pattern | Est. Difficulty |
|---|---|---|
| 234bbc79 | Bounding box intersection | Very Hard |
| 25d487eb | Container fill with color | Hard |
| 25ff71a9 | Pattern isolation | Hard |
| 264363fd | Grid region coloring | Very Hard |
| 272f95fa | Grid cell quadrant coloring | Very Hard |
| 29623171 | Grid cell fill by quadrant | Hard |
| 29ec7d0e | Fill missing pattern | Hard |
| 2c608aff | Connect marked cross lines | Hard |
| 2dee498d | Shape replication | Hard |
| 31aa019c | Vertical background lines | Hard |
| 321b1fc6 | Find unique odd pattern | Hard |
| 3345333e | Shape copy across shape | Hard |
| 3618c87e | Grid splitting with marker | Hard |
| 3631a71a | Remove colored block | Hard |
Full list of remaining tasks:

| Task ID | Task ID | Task ID | Task ID |
|---|---|---|---|
| 36d67576 | 36fdfd69 | 3906de3d | 39a8645d |
| 39e1d7f9 | 3aa6fb7a | 3ac3eb23 | 3af2c5a8 |
| 3bd67248 | 3bdb4ada | 3befdf3e | 3c9b0459 |
| 3de23699 | 3e980e27 | 3eda0437 | 3f7978a0 |
| 40853293 | 4093f84a | 41e4d17e | 4258a5f9 |
| 4290ef0e | 42a50994 | 4347f46a | 444801d8 |
| 445eab21 | 447fd412 | 44d8ac46 | 44f52bb0 |
| 4522001f | 4612dd53 | 46442a0e | 469497ad |
| 46f33fce | 47c1f68c | 484b58aa | 48d8fb45 |
| 4938f0c2 | 496994bd | 49d1d64f | 4be741c5 |
| 4c4377d9 | 4c5c2cf0 | 50846271 | 508bd3b6 |
| 50cb2852 | 5117e062 | 5168d44c | 539a4f51 |
| 53b68214 | 543a7ed5 | 54d82841 | 54d9e175 |
| 5521c0d9 | 5582e5ca | 5614dbcf | 56dc2b01 |
| 56ff96f3 | 57aa92db | 5ad4f10b | 5bd6f4ac |
| 5c0a986e | 5c2c9af4 | 5daaa586 | 60b61512 |
| 6150a2bd | 623ea044 | 62c24649 | 63613498 |
| 6430c8c4 | 6455b5f5 | 662c240a | 67385a82 |
| 673ef223 | 6773b310 | 67a3c6ac | 67a423a3 |
| 67e8384a | 681b3aeb | 6855a6e4 | 68b16354 |
| 694f12f3 | 6a1e5592 | 6aa20dc0 | 6b9890af |
| 6c434453 | 6cdd2623 | 6cf79266 | 6d0160f0 |
| 6d0aefbc | 6d58a25d | 6d75e8bb | 6e02f1e3 |
| 6e19193c | 6e82a1ae | 6ecd11f4 | 6f8cd79b |
| 6fa7a44f | 72322fa7 | 72ca375d | 73251a56 |
| 7447852a | 7468f01a | 746b3537 | 74dd1130 |
| 75b8110e | 760b3cac | 776ffc46 | 77fdfe62 |
| 780d0b14 | 7837ac64 | 794b24be | 7b6016b9 |
| 7b7f7511 | 7c008303 | 7ddcd7ec | 7df24a62 |
| 7e0986d6 | 7f4411dc | 7fe24cdd | 80af3007 |
| 810b9b61 | 82819916 | 83302e8f | 834ec97d |
| 8403a5d5 | 846bdb03 | 855e0971 | 85c4e7cd |
| 868de0fa | 8731374e | 88a10436 | 88a62173 |
| 890034e9 | 8a004b2b | 8be77c9e | 8d5021e8 |
| 8d510a79 | 8e1813be | 8e5a5113 | 8eb1be9a |
| 8efcae92 | 8f2ea7aa | 90c28cc7 | 90f3ed37 |
| 913fb3ed | 91413438 | 91714a58 | 9172f3a0 |
| 928ad970 | 93b581b8 | 941d9a10 | 94f9d214 |
| 952a094c | 9565186b | 95990924 | 963e52fc |
| 97999447 | 97a05b5b | 98cf29f8 | 995c5fa3 |
| 99b1bc43 | 99fa7670 | 9aec4887 | 9af7a82c |
| 9d9215db | 9dfd6313 | 9ecd008a | 9edfc990 |
| 9f236235 | a1570a43 | a2fd1cf0 | a3325580 |
| a3df8b1e | a416b8f3 | a48eeaf7 | a5313dff |
| a5f85a15 | a61ba2ce | a61f2674 | a65b410d |
| a68b268e | a699fb00 | a740d043 | a78176bb |
| a79310a0 | a85d4709 | a87f7484 | a8c38be5 |
| a8d7556c | a9f96cdd | aabf363d | aba27056 |
| ac0a08a4 | ae3edfdc | ae4f1146 | aedd82e4 |
| af902bf9 | b0c4d837 | b190f7f5 | b1948b0a |
| b230c067 | b27ca6d3 | b2862040 | b527c5c6 |
| b548a754 | b60334d2 | b6afb2da | b7249182 |
| b775ac94 | b782dc8a | b8825c91 | b8cdaf2b |
| b91ae062 | b94a9452 | b9b7f026 | ba26e723 |
| ba97ae07 | bb43febb | bbc9ae5d | bc1d5164 |
| bd4472b8 | bda2d7a6 | bdad9b1f | be94b721 |
| beb8660c | c0f76784 | c1d99e64 | c3e719e8 |
| c3f564a4 | c444b776 | c59eb873 | c8cbb738 |
| c8f0f002 | c909285e | c9e6f938 | c9f8e694 |
| caa06a1f | cbded52d | cce03e0d | cdecee7f |
| ce22a75a | ce4f8723 | ce602527 | ce9e57f2 |
| cf98881b | d037b0a7 | d06dbe63 | d07ae81c |
| d0f5fe59 | d10ecb37 | d13f3404 | d22278a0 |
| d23f8c26 | d2abd087 | d364b489 | d406998b |
| d43fd935 | d4469b4b | d4a91cb9 | d4f3cd78 |
| d511f180 | d5d6de2d | d631b094 | d687bc17 |
| d6ad076f | d89b689b | d8c310e9 | d90796e8 |
| d9f24cd1 | d9fac9be | dae9d2b5 | db3e9e38 |
| db93a21d | dbc1a6ce | dc0a314f | dc1df850 |
| dc433765 | ddf7fa4f | de1cd16c | ded97339 |
| e179c5f4 | e21d9049 | e26a3af2 | e3497940 |
| e40b9e2f | e48d4e1a | e5062a87 | e509e548 |
| e50d258f | e6721834 | e73095fd | e76a88a6 |
| e8593010 | e8dc4411 | e9614598 | e98196ab |
| e9afcf9a | ea32f347 | ea786f4a | eb281b96 |
| eb5a1d5d | ec883f72 | ecdecbb3 | ed36ccf7 |
| ef135b50 | f15e1fac | f1cefba8 | f25fbde4 |
| f25ffba3 | f2829549 | f35d900a | f5b8619d |
| f76d97a5 | f8a8fe49 | f8b3ba0a | f8c80d96 |
| f8ff0b80 | f9012d9b | fafffa47 | fcb5c309 |
| fcc82909 | feca6190 | ff28f65a | ff805c23 |
All solutions must include documentation. See CONTRIBUTING.md for requirements.
Each solution directory must contain:
- `solution.py` - The golfed Python code
- `README.md` - Pattern description, algorithm explanation, and golf tricks used
Each solved task has its own subdirectory:
code-golf/
├── 0520fde7/ # Grid AND comparison (57 bytes)
├── 017c7c7b/ # Extend pattern + double (80 bytes)
├── 00d62c1b/ # Fill enclosed regions (238 bytes)
├── a64e4611/ # Largest rectangle task (523 bytes)
├── evaluator.py # Scoring and validation
├── tasks/ # All 400 ARC-AGI tasks (legacy)
└── README.md # This file
Each task directory contains:
- `solution.py` - Golfed Python solution
- `task.json` - ARC-AGI task definition
- `README.md` - Task-specific notes and evolution history
The NeurIPS 2025 - Google Code Golf Championship challenged participants to write the shortest possible Python programs that correctly solve 400 ARC-AGI tasks.
| Detail | Value |
|---|---|
| Prize Pool | $100,000 |
| Tasks | 400 (ARC-AGI public training set) |
| Scoring | max(1, 2500 - bytes) per correct solution |
| Maximum Score | 1,000,000 (400 × 2500) |
| Deadline | October 30, 2025 |
Code golf is a unique optimization challenge:
- Correctness is binary - A solution that fails ANY test case scores 0.001
- Every byte matters - Saving 1 byte = +1 point
- Semantic equivalence required - Transformations must preserve behavior
- Language mastery needed - Exploiting Python quirks and shortcuts
- Algorithm selection critical - Sometimes a completely different approach is shorter
Code golf is an ideal testbed for LLM-driven evolution because:
- Clear fitness function: Byte count (lower = better)
- Automatic verification: Run tests to check correctness
- Rich mutation space: Syntax tricks, algorithm changes, refactoring
- Transferable learnings: Tricks discovered on one task apply to others
Unlike performance optimization (where we measure ops/sec), code golf evolution optimizes for minimum byte count:
fitness = correctness × (2500 - bytes) / 2500
┌─────────────────────────────────────────────────────────────────┐
│ Code Golf Evolution Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Stage 1: │───▶│ Stage 2: │───▶│ Stage 3: │ │
│ │ Find Correct │ │ Apply Known │ │ Discover New │ │
│ │ Solution │ │ Tricks │ │ Approaches │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ "Make it work" "Make it short" "Make it shorter" │
│ │
└─────────────────────────────────────────────────────────────────┘
Tricks discovered during evolution, applicable to other tasks:
| Trick | Before | After | Saves |
|---|---|---|---|
| Lambda over def | `def f(x):\n return E` | `f=lambda x:E` | ~6 bytes |
| Star unpacking | `[0]+r+[0]` | `[0,*r,0]` | 1 byte |
| Walrus reuse | `a=[0]*w;g=[a,...,a]` | `g=[a:=[0]*w,...,a]` | 1 byte |
| Trailing comma | `s+=[(a,b),(c,d)]` | `s+=(a,b),(c,d),` | 1 byte |

| Trick | Before | After | Saves |
|---|---|---|---|
| Chain bounds | `0<=a and a<H` | `H>a>=0` | 5 bytes |
| Zero check | `x==0` | `x<1` | 1 byte |
| Nonzero check | `x!=0` | `x>0` | 1 byte |

| Trick | Description | Savings |
|---|---|---|
| Padding for flood fill | Add border of 0s, start from corner | ~20 bytes |
| Smart marker values | Choose markers that simplify final lookup | 2-4 bytes |
| Direct row iteration | `for r in g` vs `for i in range(len(g))` | 7+ bytes |
| Tuple indices | `(0,1,2)` instead of `range(3)` | 2 bytes |
| max() for best | `b=max(b,new)` vs `if new>b:b=new` | 5+ bytes |
| Unified dimension swap | `O[(v,i)[z]][(i,v)[z]]` for row/col toggling | 10+ bytes |
| Merged loops | Combine histogram + rect finding in one pass | 15+ bytes |
| `[*map(list,G)]` | Shorter deep copy than `[r[:]for r in G]` | 2 bytes |
| `I=range` alias | When range used 5+ times, alias saves bytes | 3+ bytes/use |
| `x and Y or Z` | Shorter than `Y if x else Z` for truthy Y | 2 bytes |
| E with default | `E(i,L,z,d=0):d or F` for edge-case bypass | 2+ bytes |
| Algorithm swap | O(n⁴) brute-force can be shorter than O(n²) | 70+ bytes |
| `-~x` for `x+1` | Bitwise not trick: `-~(c-a)` = `c-a+1` | 1 byte |
| `[0,]` fallback | Shorter than `[(0,)]` for empty fallback | 2 bytes |
| Range truthiness | `if L` works for empty `range()` checks | 4+ bytes |
| Lists in tuples | `[I(f),I(j+1,C)]` vs `[(I(f),f),...]` pairs | 10+ bytes |
| Merged conditionals | `I(i-(i>A),i+(i<B)+1)` combines 3 checks | 37 bytes |
| Tuple iteration | `for A,B,P,M,z in(t1),(t2):` vs list concat | 7 bytes |
| Single list comp | `[f()for...for v in L]` flattens nested loops | 22 bytes |
| Tuple vs range | `(i,i-(i>a),i+(i<b))` vs `I(i-(i>a),...)` | 2 bytes |
| `*P` unpacking | `for a,b,*P,z in...` captures middle elements | 2 bytes |
| 1D array | `O=sum(G,[])` + `O[r*C+j]` vs `O=[*map(list,G)]` + `O[r][j]` | 3 bytes |
| ~-any trick | `~-any(x for...)` vs `all(x<1for...)` | 1 byte |
| [0] fallback | `or[0]` vs `or[0,]` for single-element fallback | 1 byte |
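Because a golfed rewrite only counts if it preserves behavior, it can help to check a trick against its longer form before relying on it. The sketch below does that for a handful of the tricks listed above. The variable names are illustrative, and the checks assume cell values are non-negative integers (ARC colours 0-9), which is what makes `x<1` a safe stand-in for `x==0`.

```python
# Star unpacking builds the same padded row as list concatenation.
r = [1, 2]
star_ok = [0] + r + [0] == [0, *r, 0]

# Chained bounds check and -~x, compared over values inside and outside [0, H).
H = 5
chain_ok = all((0 <= a and a < H) == (H > a >= 0) for a in range(-3, 9))
succ_ok = all(-~a == a + 1 for a in range(-3, 9))

# x<1 and x>0 only replace x==0 and x!=0 for non-negative ints (assumed here).
zero_ok = all((x == 0) == (x < 1) and (x != 0) == (x > 0) for x in range(10))

# `x and Y or Z` matches `Y if x else Z` only while Y is truthy.
and_or_ok = all((x and 7 or 9) == (7 if x else 9) for x in (0, 1, 5))
and_or_pitfall = (1 and 0 or 9, 0 if 1 else 9)  # the two forms disagree when Y == 0

# ~-any(...) is truthy exactly when no element is truthy.
rows = [[0, 0, 0], [0, 3, 0]]
any_ok = all(bool(~-any(v for v in row)) == all(v < 1 for v in row) for row in rows)

# [*map(list,G)] copies each row, so editing the copy leaves G untouched.
G = [[1, 2], [3, 4]]
copy = [*map(list, G)]
copy[0][0] = 9
copy_ok = G[0][0] == 1

# sum(G,[]) flattens the grid; cell (i, j) lives at index i*C+j.
C = len(G[0])
O = sum(G, [])
flat_ok = all(O[i * C + j] == G[i][j] for i in range(len(G)) for j in range(C))
```

The `and_or_pitfall` pair is the reason the table restricts `x and Y or Z` to truthy `Y`: with `Y == 0` the short form falls through to `Z`.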
- Python 3.8+
cd showcase/code-golf
# Evaluate single task
python evaluator.py 00d62c1b solutions/00d62c1b.py
# Expected output:
# {
# "task_id": "00d62c1b",
# "fitness": 0.9048,
# "score": 2262,
# "byte_count": 238,
# "correct": true
# }

# Use the /evolve-size skill
/evolve shortest Python solution for ARC task <task_id>

For each of the 400 tasks:
score = max(1, 2500 - byte_count) if correct else 0.001

- Maximum per task: 2500 (0 bytes - impossible)
- Practical maximum: ~2450 (50-byte solution)
- Incorrect solutions: 0.001 (effectively zero)
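As a worked instance of the scoring rule, here is a minimal sketch of the per-task score together with the fitness formula given earlier. The function names are hypothetical, and treating `correctness` as 0 or 1 in the fitness formula is an assumption; the real `evaluator.py` may differ.

```python
def score(byte_count: int, correct: bool) -> float:
    """Competition score for one task, following the rule stated above."""
    return max(1, 2500 - byte_count) if correct else 0.001


def fitness(byte_count: int, correct: bool) -> float:
    """Normalised fitness: correctness x (2500 - bytes) / 2500.

    Assumption: correctness is 1 for a fully correct solution and 0 otherwise.
    """
    return (2500 - byte_count) / 2500 if correct else 0.0


# The 238-byte example from the expected evaluator output:
example_score = score(238, True)
example_fitness = round(fitness(238, True), 4)
```

With these definitions a correct 238-byte solution scores 2262 with fitness 0.9048, matching the sample output, and a correct solution longer than 2,499 bytes still earns the 1-point floor.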
Each solution must define a solve function:
def solve(grid):
    # grid: List[List[int]] - input grid
    # return: List[List[int]] - output grid
    ...

- Python Standard Library only (no numpy, scipy, etc.)
- Self-contained (no imports from other files)
- Must pass all train AND test examples
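To make the contract concrete, the sketch below pairs a readable `solve` with a golfed one-liner for a 180° rotation and checks both against every train and test pair. The golfed string, `load_solve`, `passes_all`, and `TOY_TASK` are illustrative and not taken from the repository. The task dictionary follows the public ARC-AGI JSON layout (`train` and `test` lists of `input`/`output` pairs), and counting UTF-8 bytes is an assumption about how size is measured.

```python
from typing import Callable, List

Grid = List[List[int]]


def solve(grid: Grid) -> Grid:
    """Readable reference: rotate the grid by 180 degrees."""
    return [row[::-1] for row in grid[::-1]]


# Golfed form of the same transformation (illustrative only).
GOLFED_SOURCE = "solve=lambda g:[r[::-1]for r in g[::-1]]"


def load_solve(source: str) -> Callable[[Grid], Grid]:
    """Run solution source in a fresh namespace and return its solve function."""
    namespace = {}
    exec(source, namespace)
    return namespace["solve"]


def passes_all(fn: Callable[[Grid], Grid], task: dict) -> bool:
    """True only if fn reproduces the output of every train AND test pair."""
    pairs = task["train"] + task["test"]
    return all(fn(pair["input"]) == pair["output"] for pair in pairs)


# Hypothetical toy task in ARC-AGI JSON shape.
TOY_TASK = {
    "train": [{"input": [[1, 2], [3, 4]], "output": [[4, 3], [2, 1]]}],
    "test": [{"input": [[0, 5, 7]], "output": [[7, 5, 0]]}],
}

golfed = load_solve(GOLFED_SOURCE)
reference_ok = passes_all(solve, TOY_TASK)
golfed_ok = passes_all(golfed, TOY_TASK)
byte_count = len(GOLFED_SOURCE.encode("utf-8"))  # assumed size metric
```

Keeping a readable reference next to the golfed source gives a cheap equivalence check each time a byte is shaved off.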
showcase/code-golf/
├── README.md # This file
├── evaluator.py # Scoring and validation harness
├── tasks/ # 400 ARC-AGI task JSONs
│ ├── 00d62c1b.json
│ ├── 0520fde7.json
│ └── ...
├── solutions/ # Evolved Python solutions
│ ├── 00d62c1b.py # 238 bytes (champion)
│ ├── 0520fde7.py # 57 bytes (champion)
│ ├── a64e4611.py # 541 bytes (champion)
│ └── 017c7c7b.py # 54 bytes (baseline)
└── mutations/ # Evolution logs
├── arc_fill_enclosed_regions.md
├── 0520fde7_evolution.md
└── a64e4611_evolution.md
cd showcase/code-golf
python evaluator.py 00d62c1b solutions/00d62c1b.py
python evaluator.py 0520fde7 solutions/0520fde7.py

# Pick an unsolved task
ls tasks/ | head -20
# Evolve it
/evolve shortest Python solution for ARC task <task_id>

python evaluator.py <task_id> solutions/<task_id>.py

What worked:
- Padding approach for flood-fill problems - dramatically simplifies boundary logic
- Lambda over def - saves 6+ bytes in most cases
- Direct iteration (`for r in g`) over index iteration (`for i in range(len(g))`)
- Lookup tables - usually shorter than arithmetic formulas
- Chain comparisons - `H>a>=0<=b<W` saves multiple `and` operators
- Smart marker values - choose values that simplify final mapping

What did not work:
- Recursion - requires `setrecursionlimit`, adds overhead
- Sets for stacks - `|=` syntax longer than tuple extension
- Bitwise tricks - often need parentheses, same or longer
- String lookups - return strings, not ints
- Complex formulas - lookup tables usually shorter
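The padding approach for flood fill is easier to see in ungolfed form. The sketch below is a readable illustration rather than the repository's solution: it surrounds the grid with a border of background cells, flood-fills the background from the padded corner with an explicit stack (no recursion), and recolours any background cell the fill never reached. The function name and the fill colour `4` are assumptions chosen for the example.

```python
def fill_enclosed(grid, fill=4):
    """Recolour background (0) cells that cannot reach the grid edge."""
    h, w = len(grid), len(grid[0])
    # One-cell border of 0s: the corner is always background, and every
    # outside region becomes connected to it, so no edge special-casing.
    padded = [[0] * (w + 2)] + [[0, *row, 0] for row in grid] + [[0] * (w + 2)]

    seen = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = h + 2 > nr >= 0 <= nc < w + 2  # chained bounds check
            if inside and padded[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                stack.append((nr, nc))

    # Original cell (r, c) sits at (r + 1, c + 1) in the padded grid.
    return [
        [fill if v == 0 and (r + 1, c + 1) not in seen else v
         for c, v in enumerate(row)]
        for r, row in enumerate(grid)
    ]


closed_ring = [[3, 3, 3], [3, 0, 3], [3, 3, 3]]
open_ring = [[3, 3, 3], [3, 0, 0], [3, 3, 3]]
closed_result = fill_enclosed(closed_ring)
open_result = fill_enclosed(open_ring)
```

In `closed_ring` the shape touches every edge, so without padding there would be no background corner to start from. With padding the start cell always exists, which is the simplification the list above refers to.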
| Metric | Current | Projected (Conservative) | Projected (Optimistic) | Winner |
|---|---|---|---|---|
| Tasks solved | 72 | 400 | 400 | 400 |
| Total score | 163,412 | ~908,000 | ~920,000 | 962,070 |
| Avg pts/task | 2,270 | 2,270 | 2,300 | 2,405 |
| % of winner | 94.4% | 94.4% | 95.6% | 100% |
| Est. Place | - | ~100th | ~80th | 1st |
Winner: Code Golf International (962,070 pts) - Final Leaderboard
- Conservative: Current average (2,270 pts/task) × 400 = ~908,000 pts → ~100th place
- Optimistic (tier-weighted): Maintain tier averages = ~918,000 pts → ~80th place
See PROJECTION.md for detailed tier breakdowns.
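The arithmetic behind the projection table can be reproduced in a few lines. This is a sketch using only the per-task averages from the table and the winner's 2,405 pts/task; the helper names are hypothetical, and it does not model the tier weighting described in PROJECTION.md.

```python
TASKS = 400
WINNER_AVG = 2405  # winner's average points per task, from the table


def projected_total(avg_per_task: float, tasks: int = TASKS) -> float:
    """Total score if every task scored the given average."""
    return avg_per_task * tasks


def percent_of_winner(avg_per_task: float) -> float:
    """Average expressed as a percentage of the winner's average, to 0.1."""
    return round(100 * avg_per_task / WINNER_AVG, 1)


conservative_total = projected_total(2270)
optimistic_total = projected_total(2300)
conservative_pct = percent_of_winner(2270)
optimistic_pct = percent_of_winner(2300)
```

These give 908,000 and 920,000 points, at 94.4% and 95.6% of the winner's average, in line with the table's conservative and optimistic columns.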
This showcase demonstrates the /evolve-size capability. The techniques transfer to any code golf challenge.
- NeurIPS 2025 - Google Code Golf Championship
- ARC Prize - The ARC-AGI benchmark
- Competition Details
- François Chollet's announcement
- No external data files required (tasks embedded in `tasks/`)
- No network requests during evaluation
- Deterministic scoring (byte count is exact)
- Same results every run