
ARC-AGI Code Golf: Evolving the Shortest Programs

Evolved Python solutions for the NeurIPS 2025 Google Code Golf Championship using LLM-driven evolutionary optimization.


Progress Summary

Metric Value
Solved 72 / 400 (18.0%)
Total Score 163,412 points
Avg Score/Task 2,270 points
% of Winner Avg 94.4% (winner: 2,405 pts/task)
Projected Final ~908,000 points (details)

Solved Problems (72)

Task Pattern Bytes Score Solution
4c4377d9 Vertical flip concat 24 2,476 solution.py
3c9b0459 180° rotation 40 2,460 solution.py
44f52bb0 Horizontal symmetry check 46 2,454 solution.py
d631b094 Collect non-zero cells 47 2,453 solution.py
25d8a9c8 Row uniformity check 50 2,450 solution.py
0520fde7 Grid AND comparison 57 2,443 solution.py
22eb0ac0 Matching edge markers fill 57 2,443 solution.py
0d3d703e Color mapping (LUT) 58 2,442 solution.py
1b2d62fb Conditional grid coloring 58 2,442 solution.py
007bbfb7 Outer product grid 65 2,435 solution.py
2281f1f4 Row/column intersection fill 67 2,433 solution.py
28bf18c6 Extract + duplicate shape 67 2,433 solution.py
29c11459 Horizontal line splitting 68 2,432 solution.py
1e0a9b12 Gravity (drop cells) 69 2,431 solution.py
27a28665 Pattern shape classification 70 2,430 solution.py
017c7c7b Extend pattern + double 80 2,420 solution.py
3428a4f5 XOR halves by separator 88 2,412 solution.py
1bfc4729 Dual frame pattern 108 2,392 solution.py
1fad071e Count 2x2 blue blocks 109 2,391 solution.py
22168020 Fill between endpoints 112 2,388 solution.py
05269061 Diagonal color cycle 113 2,387 solution.py
1190e5a7 Count cells by grid lines 124 2,376 solution.py
137eaa0f Symmetry reflection 130 2,370 solution.py
1cf80156 Bounding box extraction 130 2,370 solution.py
2bee17df Cross line fill 132 2,368 solution.py
2204b7a8 Border region coloring 137 2,363 solution.py
08ed6ac7 Column rank labeling 142 2,358 solution.py
363442ee Fill bottom row pattern 144 2,356 solution.py
28e73c20 Spiral maze generation 149 2,351 solution.py
3ac3eb23 Diagonal checkerboard 150 2,350 solution.py
2013d3e2 Symmetry axis extraction 152 2,348 solution.py
4258a5f9 3×3 box around 5s 160 2,340 solution.py
09629e4f Fill grid segments 170 2,330 solution.py
239be575 Small pattern movement 170 2,330 solution.py
10fcaaa3 2x2 tiling + diagonal 8s 174 2,326 solution.py
1f876c06 Diagonal line propagation 174 2,326 solution.py
3aa6fb7a L-shaped 8s + corner 1 178 2,322 solution.py
1f85a75f Extract rare color region 182 2,318 solution.py
2dc579da Extract quadrant with anomaly 189 2,311 solution.py
23581191 Cross lines intersection 198 2,302 solution.py
1e32b0e9 Grid template completion 201 2,299 solution.py
23b5c85d Smallest colored rectangle 201 2,299 solution.py
0ca9ddb6 Color spread from seeds 207 2,293 solution.py
1caeab9d Line intersection marking 207 2,293 solution.py
253bf280 Connect 8s with 3s 207 2,293 solution.py
178fcbfb Extend markers to lines 217 2,283 solution.py
00d62c1b Fill enclosed regions 219 2,281 solution.py
3de23699 Extract marker-bounded region 225 2,275 solution.py
0a938d79 Alternating stripe pattern 237 2,263 solution.py
3bdb4ada Middle row stripe 239 2,261 solution.py
0dfd9992 Color substitution pairs 239 2,261 solution.py
0962bcdd T-junction detection 241 2,259 solution.py
1c786137 Corner rectangle frames 249 2,251 solution.py
1f0c79e5 Diagonal ray extension 261 2,239 solution.py
025d127b Parallelogram to rect 266 2,234 solution.py
1f642eb9 Marker position projection 266 2,234 solution.py
32597951 Extract repeating tile 274 2,226 solution.py
11852cab 4-fold rotational symmetry 280 2,220 solution.py
05f2a901 Move shape to reference 326 2,174 solution.py
06df4c85 Grid line completion 378 2,122 solution.py
1a07d186 Line projection 434 2,066 solution.py
0b148d64 Quadrant extraction 454 2,046 solution.py
2bcee788 Color replacement by marker 465 2,035 solution.py
22233c11 Diagonal corner marking 474 2,026 solution.py
045e512c Pattern replication 486 2,014 solution.py
150deff5 Grid extraction borders 494 2,006 solution.py
228f6490 Shape-to-hole matching 520 1,980 solution.py
a64e4611 Largest rectangle + cross 523 1,977 solution.py
39a8645d Most frequent shape 526 1,974 solution.py
2dd70a9a U-shape connector 673 1,827 solution.py
1b60fb0c Segment extraction 1,026 1,474 solution.py
0e206a2e Rotated template placement 1,135 1,365 solution.py
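The Progress Summary figures can be re-derived from the Bytes column alone. The sketch below is a minimal check: the byte list is transcribed from the table, and `summarize` is a hypothetical helper, not part of evaluator.py.

```python
# Byte counts transcribed from the Solved Problems table (72 tasks, same order).
BYTES = [
    24, 40, 46, 47, 50, 57, 57, 58, 58, 65, 67, 67, 68, 69, 70, 80, 88, 108,
    109, 112, 113, 124, 130, 130, 132, 137, 142, 144, 149, 150, 152, 160, 170,
    170, 174, 174, 178, 182, 189, 198, 201, 201, 207, 207, 207, 217, 219, 225,
    237, 239, 239, 241, 249, 261, 266, 266, 274, 280, 326, 378, 434, 454, 465,
    474, 486, 494, 520, 523, 526, 673, 1026, 1135,
]

def summarize(byte_counts):
    """Return (tasks solved, total score, average score) for correct solutions."""
    scores = [max(1, 2500 - b) for b in byte_counts]  # per-task scoring rule
    total = sum(scores)
    return len(scores), total, total / len(scores)

solved, total, avg = summarize(BYTES)
print(f"{solved}/400 solved, {total:,} pts, {avg:,.0f} pts/task")
# → 72/400 solved, 163,412 pts, 2,270 pts/task
```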

Unsolved Problems (328)

Analyzed Tasks (14)

Task ID Pattern Est. Difficulty
234bbc79 Bounding box intersection Very Hard
25d487eb Container fill with color Hard
25ff71a9 Pattern isolation Hard
264363fd Grid region coloring Very Hard
272f95fa Grid cell quadrant coloring Very Hard
29623171 Grid cell fill by quadrant Hard
29ec7d0e Fill missing pattern Hard
2c608aff Connect marked cross lines Hard
2dee498d Shape replication Hard
31aa019c Vertical background lines Hard
321b1fc6 Find unique odd pattern Hard
3345333e Shape copy across shape Hard
3618c87e Grid splitting with marker Hard
3631a71a Remove colored block Hard

Remaining Tasks (314)

Task ID Task ID Task ID Task ID
36d67576 36fdfd69 3906de3d 39e1d7f9
3af2c5a8 3bd67248 3befdf3e 3e980e27
3eda0437 3f7978a0 40853293 4093f84a
41e4d17e 4290ef0e 42a50994 4347f46a
444801d8 445eab21 447fd412 44d8ac46
4522001f 4612dd53 46442a0e 469497ad
46f33fce 47c1f68c 484b58aa 48d8fb45
4938f0c2 496994bd 49d1d64f 4be741c5
4c5c2cf0 50846271 508bd3b6 50cb2852
5117e062 5168d44c 539a4f51 53b68214
543a7ed5 54d82841 54d9e175 5521c0d9
5582e5ca 5614dbcf 56dc2b01 56ff96f3
57aa92db 5ad4f10b 5bd6f4ac 5c0a986e
5c2c9af4 5daaa586 60b61512 6150a2bd
623ea044 62c24649 63613498 6430c8c4
6455b5f5 662c240a 67385a82 673ef223
6773b310 67a3c6ac 67a423a3 67e8384a
681b3aeb 6855a6e4 68b16354 694f12f3
6a1e5592 6aa20dc0 6b9890af 6c434453
6cdd2623 6cf79266 6d0160f0 6d0aefbc
6d58a25d 6d75e8bb 6e02f1e3 6e19193c
6e82a1ae 6ecd11f4 6f8cd79b 6fa7a44f
72322fa7 72ca375d 73251a56 7447852a
7468f01a 746b3537 74dd1130 75b8110e
760b3cac 776ffc46 77fdfe62 780d0b14
7837ac64 794b24be 7b6016b9 7b7f7511
7c008303 7ddcd7ec 7df24a62 7e0986d6
7f4411dc 7fe24cdd 80af3007 810b9b61
82819916 83302e8f 834ec97d 8403a5d5
846bdb03 855e0971 85c4e7cd 868de0fa
8731374e 88a10436 88a62173 890034e9
8a004b2b 8be77c9e 8d5021e8 8d510a79
8e1813be 8e5a5113 8eb1be9a 8efcae92
8f2ea7aa 90c28cc7 90f3ed37 913fb3ed
91413438 91714a58 9172f3a0 928ad970
93b581b8 941d9a10 94f9d214 952a094c
9565186b 95990924 963e52fc 97999447
97a05b5b 98cf29f8 995c5fa3 99b1bc43
99fa7670 9aec4887 9af7a82c 9d9215db
9dfd6313 9ecd008a 9edfc990 9f236235
a1570a43 a2fd1cf0 a3325580 a3df8b1e
a416b8f3 a48eeaf7 a5313dff a5f85a15
a61ba2ce a61f2674 a65b410d a68b268e
a699fb00 a740d043 a78176bb a79310a0
a85d4709 a87f7484 a8c38be5 a8d7556c
a9f96cdd aabf363d aba27056 ac0a08a4
ae3edfdc ae4f1146 aedd82e4 af902bf9
b0c4d837 b190f7f5 b1948b0a b230c067
b27ca6d3 b2862040 b527c5c6 b548a754
b60334d2 b6afb2da b7249182 b775ac94
b782dc8a b8825c91 b8cdaf2b b91ae062
b94a9452 b9b7f026 ba26e723 ba97ae07
bb43febb bbc9ae5d bc1d5164 bd4472b8
bda2d7a6 bdad9b1f be94b721 beb8660c
c0f76784 c1d99e64 c3e719e8 c3f564a4
c444b776 c59eb873 c8cbb738 c8f0f002
c909285e c9e6f938 c9f8e694 caa06a1f
cbded52d cce03e0d cdecee7f ce22a75a
ce4f8723 ce602527 ce9e57f2 cf98881b
d037b0a7 d06dbe63 d07ae81c d0f5fe59
d10ecb37 d13f3404 d22278a0 d23f8c26
d2abd087 d364b489 d406998b d43fd935
d4469b4b d4a91cb9 d4f3cd78 d511f180
d5d6de2d d687bc17 d6ad076f d89b689b
d8c310e9 d90796e8 d9f24cd1 d9fac9be
dae9d2b5 db3e9e38 db93a21d dbc1a6ce
dc0a314f dc1df850 dc433765 ddf7fa4f
de1cd16c ded97339 e179c5f4 e21d9049
e26a3af2 e3497940 e40b9e2f e48d4e1a
e5062a87 e509e548 e50d258f e6721834
e73095fd e76a88a6 e8593010 e8dc4411
e9614598 e98196ab e9afcf9a ea32f347
ea786f4a eb281b96 eb5a1d5d ec883f72
ecdecbb3 ed36ccf7 ef135b50 f15e1fac
f1cefba8 f25fbde4 f25ffba3 f2829549
f35d900a f5b8619d f76d97a5 f8a8fe49
f8b3ba0a f8c80d96 f8ff0b80 f9012d9b
fafffa47 fcb5c309 fcc82909 feca6190
ff28f65a ff805c23

Solution Standards

All solutions must include documentation. See CONTRIBUTING.md for requirements.

Each solution directory must contain:

  • solution.py - The golfed Python code
  • README.md - Pattern description, algorithm explanation, and golf tricks used

Directory Structure

Each solved task has its own subdirectory:

code-golf/
├── 0520fde7/           # Grid AND comparison (57 bytes)
├── 017c7c7b/           # Extend pattern + double (80 bytes)
├── 00d62c1b/           # Fill enclosed regions (219 bytes)
├── a64e4611/           # Largest rectangle task (523 bytes)
├── evaluator.py        # Scoring and validation
├── tasks/              # All 400 ARC-AGI tasks (legacy)
└── README.md           # This file

Each task directory contains:

  • solution.py - Golfed Python solution
  • task.json - ARC-AGI task definition
  • README.md - Task-specific notes and evolution history

Why This Problem Matters

The Competition

The NeurIPS 2025 - Google Code Golf Championship challenged participants to write the shortest possible Python programs that correctly solve 400 ARC-AGI tasks.

Detail Value
Prize Pool $100,000
Tasks 400 (ARC-AGI public training set)
Scoring max(1, 2500 - bytes) per correct solution
Maximum Score 1,000,000 (400 × 2500)
Deadline October 30, 2025

Why Code Golf is Hard

Code golf is a unique optimization challenge:

  1. Correctness is binary - A solution that fails ANY test case scores 0.001
  2. Every byte matters - Saving 1 byte = +1 point
  3. Semantic equivalence required - Transformations must preserve behavior
  4. Language mastery needed - Exploiting Python quirks and shortcuts
  5. Algorithm selection critical - Sometimes a completely different approach is shorter

Why This Matters for Evolution

Code golf is an ideal testbed for LLM-driven evolution because:

  • Clear fitness function: Byte count (lower = better)
  • Automatic verification: Run tests to check correctness
  • Rich mutation space: Syntax tricks, algorithm changes, refactoring
  • Transferable learnings: Tricks discovered on one task apply to others

The Evolution Approach

Unlike performance optimization (where we measure ops/sec), code golf evolution optimizes for minimum byte count:

fitness = correctness × (2500 - bytes) / 2500
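A direct transcription of the fitness formula into Python, assuming `correct` is a boolean obtained by running every train and test pair. `fitness` is a hypothetical name, not necessarily what evaluator.py uses.

```python
def fitness(correct: bool, nbytes: int) -> float:
    """Hypothetical helper: correctness x (2500 - bytes) / 2500."""
    return int(correct) * (2500 - nbytes) / 2500

print(round(fitness(True, 238), 4))   # 0.9048, matching the Quick Start example
print(fitness(False, 57))             # 0.0: one failing case zeroes fitness
```

Unlike the competition score, this has no floor of 1, so a solution longer than 2,500 bytes gets negative fitness; that is harmless when the value is only used to rank candidates.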

Three-Stage Pipeline

┌─────────────────────────────────────────────────────────────────┐
│  Code Golf Evolution Pipeline                                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │ Stage 1:     │───▶│ Stage 2:     │───▶│ Stage 3:     │       │
│  │ Find Correct │    │ Apply Known  │    │ Discover New │       │
│  │ Solution     │    │ Tricks       │    │ Approaches   │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                                                                  │
│  "Make it work"      "Make it short"     "Make it shorter"      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
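Stage 2 ("make it short") can be pictured as a greedy hill-climb: apply a known rewrite, and keep the result only if it is still correct and strictly shorter. The sketch below assumes a tiny hand-written rewrite list and a toy "flip on top, original below" task; `REWRITES`, `is_correct` and `shorten` are hypothetical names, and the real pipeline proposes mutations with an LLM (stage 3 is not modeled here).

```python
def is_correct(src, pairs):
    """True if src defines solve() and solve(inp) == out for every pair."""
    env = {}
    try:
        exec(src, env)
        return all(env["solve"](inp) == out for inp, out in pairs)
    except Exception:            # syntax errors and crashes count as failures
        return False

def shorten(src, pairs, rewrites):
    """Greedily apply rewrites, accepting only correct, strictly shorter code."""
    assert is_correct(src, pairs), "stage 1 must supply a correct seed"
    improved = True
    while improved:              # repeat until no rewrite helps
        improved = False
        for old, new in rewrites:
            cand = src.replace(old, new)
            if len(cand.encode()) < len(src.encode()) and is_correct(cand, pairs):
                src, improved = cand, True
    return src

REWRITES = [
    ("def solve(g):\n return ", "solve=lambda g:"),  # lambda over def
    ("[r for r in g[::-1]]", "g[::-1]"),              # drop a redundant copy
    ("g[::-1]+g", "g+g"),                             # shorter but wrong: rejected
    ("\n", ""),                                       # strip the trailing newline
]

# Toy task as (input, expected output) pairs: flipped grid on top, original below.
PAIRS = [([[1, 2], [3, 4]], [[3, 4], [1, 2], [1, 2], [3, 4]]),
         ([[5], [6], [7]], [[7], [6], [5], [5], [6], [7]])]

seed = "def solve(g):\n return [r for r in g[::-1]]+g\n"
best = shorten(seed, PAIRS, REWRITES)
print(repr(best), len(best.encode()))   # 'solve=lambda g:g[::-1]+g' 24
```

The correctness gate is what makes the third rewrite harmless: it is shorter, but it fails the pairs, so it is never accepted.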

Golf Tricks Library

Tricks discovered during evolution, applicable to other tasks:

Structural Tricks

Trick Before After Saves
Lambda over def def f(x):\n return E f=lambda x:E ~6 bytes
Star unpacking [0]+r+[0] [0,*r,0] 1 byte
Walrus reuse a=[0]*w;g=[a,...,a] g=[a:=[0]*w,...,a] 1 byte
Trailing comma s+=[(a,b),(c,d)] s+=(a,b),(c,d), 1 byte
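A quick equivalence check for the structural rows, using toy values that are not taken from any task. The walrus row has a caveat: every row produced that way is the same list object, which is only safe if the grid is never mutated row by row.

```python
r, w = [1, 2], 3

# Star unpacking builds the same list as concatenation.
assert [0, *r, 0] == [0] + r + [0]

# Walrus reuse: `a` is bound inside the display and reused by later items.
g = [a := [0] * w, a, a]
assert g == [[0, 0, 0]] * 3
assert g[0] is g[2]          # caveat: all rows alias one list

# Trailing comma: the right-hand side is a tuple of tuples; list += extends.
s = []
s += (1, 2), (3, 4),
t = []
t += (1, 2), (3, 4)          # with 2+ items the trailing comma is optional
assert s == t == [(1, 2), (3, 4)]
```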

Comparison Tricks

Trick Before After Saves
Chain bounds 0<=a and a<H H>a>=0 6 bytes
Zero check (int x ≥ 0) x==0 x<1 1 byte
Nonzero check (int x ≥ 0) x!=0 x>0 1 byte
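The comparison rewrites are only equivalent under an assumption: the zero and nonzero checks require x to be a non-negative integer. That holds for ARC cell colours (0-9) but not, for example, for signed offsets. An exhaustive check on small values:

```python
H = 5
for a in range(-3, 9):
    assert (0 <= a and a < H) == (H > a >= 0)   # chain bounds: always equivalent

for x in range(10):                              # ARC colours are 0-9
    assert (x == 0) == (x < 1)                   # zero check
    assert (x != 0) == (x > 0)                   # nonzero check

x = -1
assert (x != 0) and not (x > 0)                  # x>0 is NOT a nonzero test for negatives
```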

Algorithm Tricks

Trick Description Savings
Padding for flood fill Add border of 0s, start from corner ~20 bytes
Smart marker values Choose markers that simplify final lookup 2-4 bytes
Direct row iteration for r in g vs for i in range(len(g)) 7+ bytes
Tuple indices (0,1,2) instead of range(3) 2 bytes
max() for best b=max(b,new) vs if new>b:b=new 5+ bytes
Unified dimension swap O[(v,i)[z]][(i,v)[z]] for row/col toggling 10+ bytes
Merged loops Combine histogram + rect finding in one pass 15+ bytes
[*map(list,G)] Shorter 2D grid copy than [r[:]for r in G] 2 bytes
I=range alias When range used 5+ times, alias saves bytes 3+ bytes/use
x and Y or Z Shorter than Y if x else Z for truthy Y 2 bytes
E with default E(i,L,z,d=0):d or F for edge-case bypass 2+ bytes
Algorithm swap O(n⁴) brute-force can be shorter than O(n²) 70+ bytes
-~x for x+1 Bitwise not trick: -~(c-a) = c-a+1 1 byte
[0,] fallback Shorter than [(0,)] for empty fallback 2 bytes
Range truthiness if L works for empty range() checks 4+ bytes
Lists in tuples [I(f),I(j+1,C)] vs [(I(f),f),...] pairs 10+ bytes
Merged conditionals I(i-(i>A),i+(i<B)+1) combines 3 checks 37 bytes
Tuple iteration for A,B,P,M,z in(t1),(t2): vs list concat 7 bytes
Single list comp [f()for...for v in L] flattens nested loops 22 bytes
Tuple vs range (i,i-(i>a),i+(i<b)) vs I(i-(i>a),...) 2 bytes
*P unpacking for a,b,*P,z in... captures middle elements 2 bytes
1D array O=sum(G,[]) + O[r*C+j] vs O=[*map(list,G)] + O[r][j] 3 bytes
~-any trick ~-any(x for...) vs all(x<1for...) 1 byte
[0] fallback or[0] vs or[0,] for single-element fallback 1 byte
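Several of the algorithm tricks rely on easy-to-misremember semantics. A short set of sanity checks on a toy grid (values chosen for illustration only):

```python
G = [[1, 0, 2], [0, 0, 3]]
R, C = len(G), len(G[0])

# -~x == x+1, and as a unary operator it binds tighter than *, so parentheses vanish.
x, y = 4, 3
assert -~x == x + 1 and -~x * y == (x + 1) * y

# [*map(list,G)] copies each row, so writes to the copy leave G untouched.
O = [*map(list, G)]
O[0][0] = 9
assert G[0][0] == 1 and O == [[9, 0, 2], [0, 0, 3]]

# 1D view: sum(G,[]) concatenates rows, so G[r][j] lives at F[r*C+j].
F = sum(G, [])
assert all(F[r * C + j] == G[r][j] for r in range(R) for j in range(C))

# ~-any(...) is 0 (falsy) if any item is truthy, else -1 (truthy).
assert ~-any(v for v in G[1]) == 0 and ~-any(v for v in [0, 0]) == -1

# x and Y or Z only matches Y if x else Z when Y is truthy.
assert (1 and 0 or 7) == 7 != (0 if 1 else 7)
```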

Quick Start

Prerequisites

  • Python 3.8+

Evaluate a Solution

cd showcase/code-golf

# Evaluate single task
python evaluator.py 00d62c1b solutions/00d62c1b.py

# Expected output:
# {
#   "task_id": "00d62c1b",
#   "fitness": 0.9048,
#   "score": 2262,
#   "byte_count": 238,
#   "correct": true
# }

Evolve a Solution

# Use the /evolve-size skill
/evolve shortest Python solution for ARC task <task_id>

Technical Details

Scoring Formula

For each of the 400 tasks:

score = max(1, 2500 - byte_count) if correct else 0.001
  • Maximum per task: 2500 (0 bytes - impossible)
  • Best achieved here: 2,476 (4c4377d9, a 24-byte solution)
  • Incorrect solutions: 0.001 (effectively zero)
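The per-task formula translates directly into Python. `task_score` is a hypothetical name used for illustration:

```python
def task_score(correct: bool, byte_count: int) -> float:
    """Per-task competition score: byte-based if correct, 0.001 otherwise."""
    return max(1, 2500 - byte_count) if correct else 0.001

print(task_score(True, 24))     # 2476: 4c4377d9 in the solved table
print(task_score(True, 2600))   # 1: the floor for very long solutions
print(task_score(False, 24))    # 0.001: incorrect, regardless of length
```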

Solution Format

Each solution must define a solve function:

def solve(grid):
    # grid: List[List[int]] - input grid
    # return: List[List[int]] - output grid
    ...  # task-specific transformation goes here

Constraints

  • Python Standard Library only (no numpy, scipy, etc.)
  • Self-contained (no imports from other files)
  • Must pass all train AND test examples
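A minimal evaluator illustrating these constraints: every train and test pair must match, and length is measured in bytes rather than characters. This is a sketch, not the repository's evaluator.py; `evaluate` is a hypothetical name, and the task dict is assumed to follow the usual ARC JSON layout (`"train"` and `"test"` lists of `{"input", "output"}` pairs).

```python
import json

def evaluate(task: dict, src: str) -> dict:
    """Run src (which must define solve) against all pairs and score it."""
    nbytes = len(src.encode())              # bytes, not characters
    env = {}
    try:
        exec(src, env)
        ok = all(env["solve"](p["input"]) == p["output"]
                 for p in task["train"] + task["test"])   # ALL pairs must pass
    except Exception:                       # crash or syntax error = incorrect
        ok = False
    score = max(1, 2500 - nbytes) if ok else 0.001
    return {"fitness": round(score / 2500, 4), "score": score,
            "byte_count": nbytes, "correct": ok}

task = json.loads('{"train": [{"input": [[1,2]], "output": [[1,2],[1,2]]}],'
                  ' "test": [{"input": [[3]], "output": [[3],[3]]}]}')
print(evaluate(task, "solve=lambda g:g+g"))
# → {'fitness': 0.9928, 'score': 2482, 'byte_count': 18, 'correct': True}
```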

File Structure

showcase/code-golf/
├── README.md                    # This file
├── evaluator.py                 # Scoring and validation harness
├── tasks/                       # 400 ARC-AGI task JSONs
│   ├── 00d62c1b.json
│   ├── 0520fde7.json
│   └── ...
├── solutions/                   # Evolved Python solutions
│   ├── 00d62c1b.py             # 238 bytes (champion)
│   ├── 0520fde7.py             # 57 bytes (champion)
│   ├── a64e4611.py             # 541 bytes (champion)
│   └── 017c7c7b.py             # 54 bytes (baseline)
└── mutations/                   # Evolution logs
    ├── arc_fill_enclosed_regions.md
    ├── 0520fde7_evolution.md
    └── a64e4611_evolution.md

Reproducing Results

Step 1: Verify Existing Solutions

cd showcase/code-golf
python evaluator.py 00d62c1b solutions/00d62c1b.py
python evaluator.py 0520fde7 solutions/0520fde7.py

Step 2: Evolve a New Task

# Pick an unsolved task
ls tasks/ | head -20

# Evolve it
/evolve shortest Python solution for ARC task <task_id>

Step 3: Verify Improvement

python evaluator.py <task_id> solutions/<task_id>.py

What Works

  1. Padding approach for flood-fill problems - dramatically simplifies boundary logic
  2. Lambda over def - saves 6+ bytes in most cases
  3. Direct iteration (for r in g) over index iteration (for i in range(len(g)))
  4. Lookup tables - usually shorter than arithmetic formulas
  5. Chain comparisons - H>a>=0<=b<W saves multiple and operators
  6. Smart marker values - choose values that simplify final mapping
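The padding idea is easiest to see in an ungolfed form. The sketch below assumes a "fill enclosed background" task: 0-cells that cannot reach the border through other 0-cells get a fill colour, with 4 chosen here purely for illustration. `fill_enclosed` is a hypothetical helper, not the repository's 00d62c1b solution. Padding with a ring of 0s means a single flood fill from the corner reaches every outside cell, and an explicit stack avoids the recursion limit.

```python
def fill_enclosed(grid, fill=4):
    """Colour every 0-cell that cannot reach the border through 0-cells."""
    H, W = len(grid) + 2, len(grid[0]) + 2
    pad = [[0] * W] + [[0, *row, 0] for row in grid] + [[0] * W]
    stack, seen = [(0, 0)], {(0, 0)}
    while stack:                                   # iterative flood fill
        r, c = stack.pop()
        for nr, nc in (r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1):
            if H > nr >= 0 <= nc < W and pad[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                stack.append((nr, nc))
    # Cell (r, c) of the original grid is (r+1, c+1) in the padded grid.
    return [[fill if v == 0 and (r + 1, c + 1) not in seen else v
             for c, v in enumerate(row)] for r, row in enumerate(grid)]

g = [[3, 3, 3, 0],
     [3, 0, 3, 0],
     [3, 3, 3, 0]]
print(fill_enclosed(g))   # [[3, 3, 3, 0], [3, 4, 3, 0], [3, 3, 3, 0]]
```

Without the padding, the fill would have to start from every border 0-cell separately; with it, one start point and one bounds check cover all four edges.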

What Doesn't Work

  1. Recursion - requires setrecursionlimit, adds overhead
  2. Sets for stacks - |= syntax longer than tuple extension
  3. Bitwise tricks - often need parentheses, same or longer
  4. String lookups - return strings, not ints
  5. Complex formulas - lookup tables usually shorter

Competition Status

Metric Current Projected (Conservative) Projected (Optimistic) Winner
Tasks solved 72 400 400 400
Total score 163,412 ~908,000 ~920,000 962,070
Avg pts/task 2,270 2,270 2,300 2,405
% of winner 94.4% 94.4% 95.6% 100%
Est. Place - ~100th ~80th 1st

Winner: Code Golf International (962,070 pts) - Final Leaderboard

Projection Methods

  • Conservative: Current average (2,270 pts/task) × 400 = ~908,000 pts → ~100th place
  • Optimistic (tier-weighted): Maintain tier averages = ~918,000 pts → ~80th place
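The conservative projection is plain arithmetic on the Progress Summary numbers. A sketch that reproduces it, rounding the current average before scaling to 400 tasks (the place estimates come from the leaderboard and are not computed here):

```python
solved, total = 72, 163_412
winner_total, n_tasks = 962_070, 400

avg = round(total / solved)            # current average per solved task
conservative = avg * n_tasks           # same average on all 400 tasks
winner_avg = winner_total / n_tasks    # about 2,405 pts/task

print(avg)                             # 2270
print(conservative)                    # 908000
print(f"{avg / winner_avg:.1%}")       # 94.4%
```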

See PROJECTION.md for detailed tier breakdowns.

This showcase demonstrates the /evolve-size capability. The techniques transfer to any code golf challenge.


Deterministic Reproduction

  • No external data files required (tasks embedded in tasks/)
  • No network requests during evaluation
  • Deterministic scoring (byte count is exact)
  • Same results every run