How I made my Queens generator solve like a human

I built a Queens-style puzzle maker — the LinkedIn-style game where you place one queen per row, column, and colored region. Most of the project was unremarkable web app work. The interesting part was the generator. Specifically: how to make sure the puzzles it produces are fair.

You'd think uniqueness — exactly one valid solution — was enough. It isn't.

A puzzle has a unique solution if exactly one assignment of pieces satisfies the rules. Every puzzle generator I've seen verifies uniqueness, usually by running a backtracking solver and checking that it finds exactly one solution. But uniqueness ≠ fairness. A puzzle can have a unique solution that no human can reach by deduction — only by guess-and-backtrack. Those puzzles ship feeling broken, even though they're technically valid. The gap between "uniquely solvable" and "humanly solvable" is where unfair puzzles come from.

The strategy in two sentences: model human deduction explicitly, and refuse any puzzle the model can't solve. When the model gets stuck, mutate the puzzle until it isn't.

The puzzle, briefly

Queens (a.k.a. Star Battle in its 1-star variant): place N stars on an NxN grid so there's exactly one in each row, one in each column, one in each colored region, and no two stars are adjacent — including diagonally. The board is partitioned into N regions; the regions and the rules together determine the solution.

Sudoku with regions of arbitrary shape, basically.

Where the board comes from

The generator has two entry points.

Human submission. Someone uses the web tool to paint a zone partition by hand and submit it. The generator's job here is purely validation: run the deduction solver on the submitted partition, accept it if the rules can solve it, reject otherwise. The submitter doesn't get to specify the solution — the solver derives it from the regions.

Procedural generation. Start by placing N stars randomly (one per row, one per column, no two adjacent), then flood-fill outward from each star until every cell is assigned to a region. Each star becomes the seed of a region; the randomized flood-fill order produces the irregular shapes Queens puzzles need.

# 1. seed: N non-adjacent stars, one per row and one per column
seeds = place_random_stars(n)

# 2. flood: grow each seed's region outward until every cell is owned
zone_grid = flood_fill_from(seeds)

That's just construction. Turning some board into a fair one — submitted or generated — is what the rest of this post is about.

Modeling human deduction

The generator's solver isn't a SAT solver, and it isn't a generic backtracking solver. It's a list of deduction rules in priority order, applied repeatedly until the puzzle is solved or no rule fires. The order matters — humans reach for the simplest pattern first, so the solver does too.

while not is_solved(grid, zone_grid):
    if (d := _find_forced_star(...)):           # 1. only one cell left in row/col/zone
        ...
    elif (d := _find_subset_elimination(...)):  # 2. pigeonhole
        ...
    elif (d := _find_cell_crossout(...)):       # 3. hypothesize star → contradiction → crossed
        ...
    elif (d := _find_cell_forced_star(...)):    # 4. hypothesize crossout → contradiction → star
        ...
    elif (d := _find_xwing_deduction(...)):     # 5. X-Wing
        ...
    elif (d := _find_lookahead_deduction(...)): # 6. one-level guess and check
        ...

The interesting choice here is rule 5, X-Wing. X-Wing is a well-known Sudoku grandmaster technique, and seeing it in a puzzle generator is unusual. The reason it has to be there is subtle: if your generator is allowed to use a deduction technique, the puzzles it ships may require that technique. The deduction set the generator can apply is also the difficulty ceiling for every puzzle the generator produces. Drop X-Wing, and every puzzle that requires X-Wing gets rejected as unsolvable, never reaching players. Add a stronger rule, and harder puzzles become reachable.

The X-Wing rule itself is a 2-row × 2-column generalization of pigeonhole. Pick two empty rows {r1, r2} and two empty columns {c1, c2}. Their union — the "plus sign" of those rows and columns — contains at most four stars. If exactly four regions are entirely contained inside that plus sign, those four regions account for all four stars in it, and every empty cell in the plus sign that doesn't belong to one of those regions can be crossed out:

for r1, r2 in itertools.combinations(available_rows, 2):
    for c1, c2 in itertools.combinations(available_cols, 2):
        contained_zones = {
            z for z, cells in zone_empty_cells.items()
            if all(cell.x in {r1, r2} or cell.y in {c1, c2} for cell in cells)
        }
        if len(contained_zones) != 4:
            continue
        # Cross out every empty cell in the plus sign whose region isn't
        # one of the four contained zones, plus the four intersection cells
        # (which can't hold stars under the X-Wing argument).

The argument is the same shape as Sudoku's X-Wing, generalized to N stars and arbitrary regions. Every player who solves a puzzle that hit this rule is implicitly using the same logic — even if they don't have a name for it.

Don't regenerate. Refine.

Now you've got a deduction-based solver. You generate a random board, run the solver on it, and... it gets stuck. There are five cells the rules can't pin down. What now?

The naive answer is to throw the board out and try a new one. Don't do that.

The unsolvability is almost never global. It's a few cells the deduction engine can't separate, usually because two regions abut in a way that creates an ambiguity. The board shape is fine; the region partition is wrong. You don't need a new puzzle. You need a slightly different partition.

So instead of regenerating, the generator refines:

for round_num in range(max_rounds):
    ambiguous = [Position(i, j)
                 for i in range(n) for j in range(n)
                 if result.grid[i][j] == EMPTY_CELL_VALUE]

    candidates = [(b.x, b.y, adj_zone)
                  for b in ambiguous
                  for adj_zone in adjacent_zones_of(b)
                  if adj_zone != current_zone_of(b)]
    random.shuffle(candidates)

    improved = False
    for i, j, target_zone in candidates:
        old_zone = zone_grid[i][j]
        if violates_size_constraints(...):
            continue
        zone_grid[i][j] = target_zone

        if _is_zone_contiguous(zone_grid, old_zone, n):
            new_result = solve(zone_grid)
            if new_result.status == "solved":
                return new_result
            new_ambiguous = count_empty(new_result.grid)
            if new_ambiguous < len(ambiguous):
                result = new_result
                improved = True
                break  # accept improving swap; restart round

        zone_grid[i][j] = old_zone  # revert

    if not improved:
        return None  # stuck at local optimum

This is gradient descent in zone-partition space.

State space: ways to partition the grid into N contiguous regions of allowed sizes.
Cost function: number of cells the deduction solver can't pin down.
Move: reassign one cell from its current region to an adjacent one.
Termination: cost reaches zero (puzzle solved) or no move improves it (local optimum).

When it works — and it usually does — only the boundary cells around an ambiguity get nudged. The rest of the board is preserved. When it doesn't, the generator gives up and the next attempt starts from scratch. In practice the refinement step rescues a sizable fraction of boards that pure constraint propagation can't solve, which is a real win on generation throughput.

The constraints around each swap matter more than they look. A swap that breaks region contiguity is silently reverted. A swap that grows a "small section" past its size cap is skipped. A swap that shrinks a region below the minimum size is skipped. Without these guards, the gradient descent happily wanders into valid-but-degenerate region shapes — long thin snakes, single-cell regions — that are technically solvable but visually unreadable.

A self-aware fallback

Sometimes neither propagation nor refinement gets all the way. The deduction engine has one last tool: a single level of lookahead. Hypothesize a star at some empty cell, run propagation, and if it produces a contradiction, the cell must be crossed. The same dual exists for forced stars.

But — and this is the part I like most about how the codebase ended up — when lookahead fires, it logs:

if not lookahead_logged:
    logging.warning(
        "generate_deductions: rules 1-5 stalled; falling back to "
        "lookahead (zone_grid=%s, partial_grid=%s, ...)",
        zone_grid, grid, ...
    )
    lookahead_logged = True

The log isn't for users. It's a signal that the deduction rule set is incomplete for this puzzle, and that a real human solver might also struggle. Every logged warning is a candidate for a new deduction rule — the X-Wing rule was originally added in response to exactly these logs.

The generator has become the first line of QA on its own deduction engine. Every warning reads as: "I shipped a puzzle that even I had to guess on. Add a rule."

What this gets you

Three things, end to end:

No unfair puzzles ship. The generator refuses to accept a board it can't solve by ranked deduction. Pure backtracking exists in the codebase, but it's gated behind an --allow-blocked flag and is off the default path.
Hints come for free. The same deduction engine that validates the puzzle also produces a sequence of human-readable reasons as it solves — "Last option in row 3", "Would be adjacent to star at row 5, column 2", and so on. The in-app hint system is just deductions[next_unrevealed].
Difficulty rating becomes mechanical. Count rule applications by tier. A puzzle that solves with only forced stars and pigeonhole is easy. One that needs cell-by-cell hypotheticals is medium. One that needs X-Wing is hard. No ML, no opinion, no calibration drift.

The whole thing runs on the puzzle maker at queenspuzzlemaker.com — pick a size, click generate, get something fair.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How I made my Queens generator solve like a human

The puzzle, briefly

Where the board comes from

Modeling human deduction

Don't regenerate. Refine.

A self-aware fallback

What this gets you

FilesExpand file tree

DESIGN.md

Latest commit

History

DESIGN.md

File metadata and controls

How I made my Queens generator solve like a human

The puzzle, briefly

Where the board comes from

Modeling human deduction

Don't regenerate. Refine.

A self-aware fallback

What this gets you