All work

Chess Engine

Building a C++ chess engine from scratch, then cutting 98% of its search.

Year
2026, Present
Stack
C++17 · g++ · EPD puzzles
Code
01 Why

Wanted to know how engines actually think.

I'd played chess for years and used Stockfish without ever understanding what it was doing under the hood. "Search the game tree" is a vague answer. I wanted the specific one: the data structures, the pruning, the move ordering that turns a 10⁴⁰ search space into something a laptop can handle.

So I started building one. No tutorials, no copying existing engines, just the rules of chess, a textbook on minimax, and a willingness to read other people's commit histories when I got stuck.

The constraint that made it interesting: it had to play reasonable moves. Anyone can write a legal move generator. The hard part is making it choose well with limited compute. That's the whole field.

02 What I built

A full engine, twenty commits of incremental wins.

The engine handles standard chess: legal move generation including check filtering, alpha-beta minimax search with iterative deepening, quiescence search to dodge the horizon effect, and a Zobrist-hashed transposition table for caching evaluated positions. It plays at a reasonable amateur level.

FIG. 1: Engine output for the starting position. Standard algebraic notation; uppercase white, lowercase black, dots for empty squares. Evaluation in pawns from White's perspective.
~/chess-engine, search --depth 6 --visualize
Depth
Nodes
0
Speed
Eval
White advantage
0.00
Best move e2e4 a Ruy López setup
DepthNodesTimeScorePrincipal variation
820k nps TT hits 40.7% β-cutoff 1st move 91.2%
FIG. 1: Iterative deepening on the starting position, depth 1 → 6. The same run as the terminal output, replayable. Standard algebraic notation; evaluation in pawns from White's perspective.

The most important architectural decision was abandoning the array board (char board[8][8]) for bitboards: twelve 64-bit integers, one per piece type per color. A bishop on e4 isn't "the bishop at row 3 column 4". It's bit 28 of the white_bishops bitboard. Attack detection becomes a single AND. Move generation becomes a series of bitwise shifts.

// Knight attacks from any square, computed once at startup uint64_t knight_attacks(int square) { uint64_t b = 1ULL << square; return ((b & NOT_H_FILE) << 17) | ((b & NOT_A_FILE) << 15) | ((b & NOT_GH_FILE) << 10) | ((b & NOT_AB_FILE) << 6) | ((b & NOT_A_FILE) >> 17) | ((b & NOT_H_FILE) >> 15) | ((b & NOT_AB_FILE) >> 10) | ((b & NOT_GH_FILE) >> 6); }

Eight bit-shifts and eight ANDs, computed once and stored in a lookup table. Compare to the array approach: nested loops, bounds checks on every step. The bitboard isn't just faster. It's a different way of seeing the board.

03 The interesting choice

How to search 98% fewer nodes.

Plain minimax to depth 6 on the opening position was visiting 3.28M nodes and taking several seconds a move. That's unusable. The next ten commits were dedicated to bringing that number down without sacrificing playing strength.

Alpha-beta pruning was the first and biggest cut, dropping the node count by over 90% on its own. Three more changes took it the rest of the way:

Move ordering. Alpha-beta only prunes well if you look at the best move first. I added MVV-LVA (Most Valuable Victim, Least Valuable Aggressor) for captures, the transposition-table move at the top, and killer-move plus history heuristics for quiet moves. Better ordering means more cutoffs, means fewer nodes.

Transposition table. Chess has transpositions: different move sequences leading to the same position. Caching evaluated positions in a Zobrist-hashed table (1M entries) means we never re-search a position we've seen, even if we got there a different way.

struct TTEntry // 32 bytes per entry
uint64_t key Zobrist hash of the position. Checked on probe() to detect collisions. 8 B
int depth Search depth when this result was stored. Only reused if entry.depth ≥ current depth. 4 B
int score Eval score for this position at this depth. 4 B
int flag TT_EXACT (0), TT_ALPHA (1, upper bound), or TT_BETA (2, caused a cutoff). 4 B
Move best_move Best move found at this node, used to order moves at the top of the next search. 12 B
TTEntry entries[1 << 20] — 1,048,576 entries × 32 B = 32 MB
// probe: idx = key & (TT_SIZE - 1) — bit-mask instead of modulo (power-of-2 size)
// store: always overwrites — no eviction policy. Shallow results replace deep ones (known issue).
The transposition-table entry. Fixed 32-byte layout; one million of them held in a flat array indexed by a bit-mask of the Zobrist key.

Iterative deepening. Instead of jumping straight to depth 6, I search depth 1, 2, 3... each shallower search fills the transposition table with information the next depth uses to order moves better. Counterintuitively, searching more total positions ends up faster because each pass is so much more focused.

The honest reframe: I didn't make the algorithm faster. I made it look at fewer things.
BASELINE OPTIMIZATIONS, CUMULATIVE 3.28M 2M 1M 0.5M 0 3.28M 248K ↓ 92% 189K ↓ 24% 58K ↓ 69% 45K ↓ 23% search_ minimax search_ alphabeta search_ tt search_ ordered search_ iterative
98.6% fewer nodes same best move (e2e4)
FIG. 2: Node count across optimization passes, same depth-6 search on the same starting position.
04 Numbers

Benchmarked on 100 Lichess puzzles.

Two things to measure: how much the search shrank, and whether the moves actually got better. The node counts and timings below are from a fixed reference position the opening, the same one in Fig. 1: so the optimization passes are compared like-for-like. Strength is measured separately, on a harness that loads 100 EPD-formatted Lichess mate-in-3 puzzles and checks the engine's move against the published best move.

Search nodes (depth 6, opening) 3.28M → 45,231 (98.6% fewer)
Throughput ~820K nodes/sec
Search time (depth 6, opening) 55 ms
Puzzle accuracy (100 mate-in-3) 38% → 42%
Bitboard board size ~3KB → ~96 bytes

The accuracy improvement is the one I'm least proud of. Going from 38% to 42% is real but small. At depth 6 the engine is still being out-thought by Lichess's puzzle setters. The next wins are in evaluation (king safety, pawn structure, passed pawns) rather than search. That's the next phase.

05 What I learned

Most of the wins were data structures, not algorithms.

Going in, I assumed performance work meant inventing clever algorithms. Almost none of the wins came from that. They came from:

Choosing the right representation (bitboards over arrays).
Caching results instead of recomputing them (transposition tables).
Doing the same work in a smarter order (move ordering).
Trusting profiling over intuition (perft test fixing a subtle bitboard mask bug).

The other thing this taught me: real benchmarks lie less than you want them to. I had a working theory after every commit about why the engine should be faster. Half the time the harness told me the win was smaller than I expected. The other half it told me I'd accidentally made it worse. Either way the harness was right and I was wrong. Now I write benchmark harnesses early.

Things still on the TODO list: underpromotion, pawn structure evaluation, depth-preferred TT replacement. The README has the full bug list. The fun of a project like this is that it never really ends.

Reference

Almost everything I learned about search, bitboards, and transposition tables came from the Chess Programming Wiki (chessprogramming.org) the single best resource for anyone building an engine from scratch.

Next project
CinemaRanked