Solving Sudoku puzzles with Python
To those of you unfamilar with a Sudoku puzzle, they are a logic puzzle that has recently (2005) swept the UK and US, appearing in most newspapers alongside the crosswords and other diversions. Congratulations to Carol Vorderman of "Countdown" fame for cashing in early with her book.
The puzzle was introduced in the USA as Number Place, and later became popular in Japan as Nanpure or Sudoku -- an abbreviation of suji wa dokushin ni kagiru (only single numbers allowed), which is tradedmarked in Japan by Nikoli. Other (mis)spellings include So Duko, Soduko and Suduko.
Paul Stephens has a very nice introduction site, Sudoku @ Paulspages, which includes a rather detailed description of some basic tactics for How to solve Sudoku. There are more tactics pages available of course, for example by Simon Armstrong and Angus Johnson.
There is also this Sudoku Dictionary, and Dan Rice's Sudoku Blog which includes puzzles to try and discussions of tactics etc.
I'm sure there are other good sites out there, but that's more than enough links!
Motivation
Anyway, after reading Solving Sudoku in the Autumn 2005 issue of Warwick the Magazine (catchy title!) by Psychology lecturer Dr Neil Stewart, I finally got round to trying to solve Sudoku with Python.
My Python Sudoku solver is available to download here.
Neil reported it took him about three hours to write his solver, mine took a similar time but I was distracted by trying to solve a very nasty puzzle I found online - which I now suspect to be impossible (or at least, ambigous).
Test cases
Of course, any good programmer should test their handy work. So, where can we find lots of (hard) Sudoku puzzles to test with? It turns out there is an active group on this web forum and they have compiled a whole bunch of test puzzles (hosted here) including:
- msk_009 - A set of 1011 random (mostly "easy") puzzles
- subig20 - A very very large set of puzzles
- top91 - A set of 91 "hard" puzzles
- top95 - A set of 95 "hard" puzzles, their favoured benchmark set
- top99 - A set of 99 "hard" puzzles
- top100 - A set of 100 "hard" puzzles
- top870 - A set of 870 "hard" puzzles
- top2365 - A set of 2,365 "hard" puzzles
These are simple text files, with one line of 81 characters per puzzle. They use a full stop (period) for an unknown entry.
How to represent Sudoku Puzzles in Python
Most "introductions" to solving Sudoku will generally discuss or use what I call "candidate lists". In each empty grid square (or cell), they write down all nine digits (very small!) and then cross out those which are not allowed due to existing solved squares.
To me, this lead to an obvious way to store the information in python - for each of the 81 cells, I would have a python list of the possible candidates (as integers). If you are using a recent version of Python it might be more efficient to use a set object instead of a list...
In this setup, a cell is "solved" when its list of candidates is just one number.
I decided to store the 81 cells as a list of lists (the closest I could get to a 9x9 array in simple Python).
Solving - Step One - Trivially easy
When any cell is solved, then that number can be removed from the candidate lists for all the other cells in the same row, column or 3x3 square. See method update_neighbours.
This may, of course, cause other cells to become solved - so we would also have to update the candidate lists for all the other cells in that row/col/3x3 square.
Note that if you reduce any cell's list of possibilites to none, then you have made a mistake - or possibly been given an impossible puzzle.
Solving - Step Two - Easy
Each digit must appear once (and only once) in each of the nine rows, nine columns, nine 3x3 sub-sqaures.
So, if a digit only appears on the candiate list of a single cell in any row/col/3x3 square, then the digit must be in that cell (and the other candiates for that cell removed). See method check_for_single_occurances in my code.
Also, if in any row/col/3x3 square, eight of the cells have been solved with a single remaining question mark, then it must be the missing nineth digit. See method check_for_last_in_row_col_3x3.
Solving - Step Three - Medium - Slicing & Dicing
You don't actually need this (a few levels of supposition will do, its just a bit slow). Anyway, this works by considering each of the nine numbers in each of the nine 3x3 blocks. If all the candidates fall into a single row (or column) then that number can be removed from the other two rows (or columns). See method overlapping_3x3_and_row_or_col in the code. Simon Armstrong calls this "Block and Column / Row Interactions".
Solving - Step Four - Hard - Supposition and Contradiction
Just doing the above simple logical steps above turns out to be enough to solve most "easy" puzzles you will find. In fact, most newspapers etc seem to grade their puzzles using computer programs - if this is enough then its an easy puzzle. You might still need a very good memory or a pencil.
There are more advanced techniques you can then turn to. What I choose for my program is akin to proof by contradiction. We pick an unsolved cell, and chose one of the possible candidate values. Then apply the above rules, and see if this leads to an error (contradiction). If so, then the original suposition (or guess) was false, and that value can be removed from the candiate list for the cell. See method one_level_supposition in my code.
I have used this method myself when solving puzzles on paper (without resorting to making a photocopy before each supposition as suggested here) but only when there are relatively few unsolved cells.
This seems to be (similar to) the technique known as 'Nishio' in Japanese. Despite it being completely logical, some people consider it to be "guessing and backtracking" and assert that for true Sudoku puzzles it is not needed, and that other "non-guessing" methods suffice.
Before I added the "Slicing & Dicing" code, I needed needed two levels of supposition for 13 of the top95 test cases. Adding that code also made the program four times faster.
Download
My Python Sudoku solver is available to download:
- Version 1.00 - only does one level supposition (and no slicing & dicing)
- Version 1.03 - does two level supposition and can load 81 character strings (still no slicing & dicing)
- Version 1.04 - Tidying, added timing for the file based tests (still no slicing & dicing)
- Version 1.05 - Added "Slicing & dicing" which makes it faster
Note that my code has no graphical user interface (GUI) - its purely text based. If you are looking for something like that, then there are lots more python Sudoku solvers out there (e.g. this GPL Python Sudoku), and some are prettier than others...
Shed Skin (Python to C++)
Mark Dufour used an early version of my code as a test case ("Sudoku Solver #3") in his "Optimizing Python to C++ Compiler", Shed Skin (a play on words - pythons like all snakes shed their skin to grow). He said its use of lists-of-lists-of-lists-of-integers gave its type inference code a good workout! You can download his tool from the Shed Skin sourceforge page, or read more on the blog.