I'd played chess for years and used Stockfish without ever understanding what it was doing under the hood. "Search the game tree" is a vague answer. I wanted the specific one: the data structures, the pruning, the move ordering that turns a 10⁴⁰ search space into something a laptop can handle.
So I started building one. No tutorials, no copying existing engines, just the rules of chess, a textbook on minimax, and a willingness to read other people's commit histories when I got stuck.
The constraint that made it interesting: it had to play reasonable moves. Anyone can write a legal move generator. The hard part is making it choose well with limited compute. That's the whole field.
The engine handles standard chess: legal move generation including check filtering, alpha-beta minimax search with iterative deepening, quiescence search to dodge the horizon effect, and a Zobrist-hashed transposition table for caching evaluated positions. It plays at a reasonable amateur level.
$ ./engine --depth 6 --pv // starting position, white to move a b c d e f g h 8 r n b q k b n r 7 p p p p p p p p 6 . . . . . . . . 5 . . . . . . . . 4 . . . . . . . . 3 . . . . . . . . 2 P P P P P P P P 1 R N B Q K B N R // iterative deepening depth nodes time score pv 1 20 <1 ms +0.30 e2e4 2 102 <1 ms +0.28 e2e4 e7e5 3 487 2 ms +0.31 e2e4 e7e5 g1f3 4 2,114 8 ms +0.32 e2e4 e7e5 g1f3 b8c6 5 8,769 19 ms +0.34 e2e4 e7e5 g1f3 b8c6 f1b5 6 33,739 26 ms +0.34 e2e4 e7e5 g1f3 b8c6 f1b5 a7a6 // summary total nodes 45,231 total time 55 ms (820k nps) tt hits 18,408 (40.7%) beta cutoffs 9,612 first move 91.2% of the time eval +0.34 (slight white edge) best e2e4 (Ruy López setup)
The most important architectural decision was abandoning the array board (char board[8][8]) for
bitboards:
twelve 64-bit integers, one per piece type per color. A bishop on e4 isn't "the bishop at row 3 column 4".
It's bit 28 of the white_bishops bitboard. Attack detection becomes a single AND. Move generation
becomes a series of bitwise shifts.
Eight bit-shifts and eight ANDs, computed once and stored in a lookup table. Compare to the array approach: nested loops, bounds checks on every step. The bitboard isn't just faster. It's a different way of seeing the board.
Plain minimax to depth 6 on the opening position was visiting 3.28M nodes and taking several seconds a move. That's unusable. The next ten commits were dedicated to bringing that number down without sacrificing playing strength.
Alpha-beta pruning was the first and biggest cut, dropping the node count by over 90% on its own. Three more changes took it the rest of the way:
Move ordering. Alpha-beta only prunes well if you look at the best move first. I added MVV-LVA (Most Valuable Victim, Least Valuable Aggressor) for captures, the transposition-table move at the top, and killer-move plus history heuristics for quiet moves. Better ordering means more cutoffs, means fewer nodes.
Transposition table. Chess has transpositions: different move sequences leading to the same position. Caching evaluated positions in a Zobrist-hashed table (1M entries) means we never re-search a position we've seen, even if we got there a different way.
| uint64_t key | Zobrist hash of the position. Checked on probe() to detect collisions. |
8 B |
|---|---|---|
| int depth | Search depth when this result was stored. Only reused if entry.depth ≥ current depth.
|
4 B |
| int score | Eval score for this position at this depth. | 4 B |
| int flag | TT_EXACT (0), TT_ALPHA (1, upper bound), or TT_BETA (2, caused a cutoff). | 4 B |
| Move best_move | Best move found at this node, used to order moves at the top of the next search. | 12 B |
TTEntry entries[1 << 20] — 1,048,576 entries × 32 B = 32 MBIterative deepening. Instead of jumping straight to depth 6, I search depth 1, 2, 3... each shallower search fills the transposition table with information the next depth uses to order moves better. Counterintuitively, searching more total positions ends up faster because each pass is so much more focused.
Two things to measure: how much the search shrank, and whether the moves actually got better. The node counts and timings below are from a fixed reference position the opening, the same one in Fig. 1: so the optimization passes are compared like-for-like. Strength is measured separately, on a harness that loads 100 EPD-formatted Lichess mate-in-3 puzzles and checks the engine's move against the published best move.
The accuracy improvement is the one I'm least proud of. Going from 38% to 42% is real but small. At depth 6 the engine is still being out-thought by Lichess's puzzle setters. The next wins are in evaluation (king safety, pawn structure, passed pawns) rather than search. That's the next phase.
Going in, I assumed performance work meant inventing clever algorithms. Almost none of the wins came from that. They came from:
Choosing the right representation (bitboards over arrays).
Caching results instead of recomputing them (transposition tables).
Doing the same work in a smarter order (move ordering).
Trusting profiling over intuition (perft test fixing a subtle bitboard mask bug).
The other thing this taught me: real benchmarks lie less than you want them to. I had a working theory after every commit about why the engine should be faster. Half the time the harness told me the win was smaller than I expected. The other half it told me I'd accidentally made it worse. Either way the harness was right and I was wrong. Now I write benchmark harnesses early.
Things still on the TODO list: underpromotion, pawn structure evaluation, depth-preferred TT replacement. The README has the full bug list. The fun of a project like this is that it never really ends.
Almost everything I learned about search, bitboards, and transposition tables came from the Chess Programming Wiki (chessprogramming.org) the single best resource for anyone building an engine from scratch.