A few years ago, when the Sudoku craze briefly swept the world, I became interested in one particular mathematical aspect of Sudoku – namely, what is the ** minimum number of clues** needed for a puzzle that can be uniquely completed.

This is actually just one instance of a whole family of combinatorial problems, usually studied under names such as *defining sets* or *critical sets*, that considers when a combinatorial structure (such as a Latin square) can be reconstructed uniquely from a (small) portion of the structure.

Around that time, I found an obscure Japanese website (whose identity I have long forgotten) that had a couple of uniquely-completable Sudoku puzzles with only 17 clues. (As I am only interested in uniquely completable puzzles, I’ll drop the adjectives from now on and just assume that “puzzle” always means “uniquely completable puzzle”.) I took these puzzles and started to “perturb” them in various ways, perhaps by swapping a number for another number, or shifting an entry to another cell, all the time keeping track of any new ones that I found. I was hoping that by finding enough 17-clue puzzles, I would eventually stumble across a 16-clue puzzle contained in one of the 17-clue ones.

In fact, what happened is that the collection of 17-clue Sudoku puzzles gradually grew bigger and bigger, until I had hundreds, then thousands and then tens-of-thousands. With encouragement from various people that I “met” in forums devoted to the mathematics and programming of Sudoku, I put the collection online and encouraged other people who were searching to contribute their 17-clue puzzles to the growing database. Several people contributed and after a couple of years the database contained just under 50000 puzzles.

At that point, I moved departments and my website didn’t survive the move, and as I had lost interest (or at least hope of finding a 16-clue puzzle) the database sort of went “off-air” for a couple of years. However as I continue to receive a steady trickle of queries about 17-clue puzzles, I have resurrected the database, recovered almost all of the data (including all of the actual puzzles) and gone back live again:

Click here for the Sudoku Identification Service.

### Identifying Puzzles

One question that I frequently receive is how the program checks that a puzzle is truly * new*, and not just equivalent (in some sense to be made precise below) to one of the existing puzzles.

To do this, we start with the observation that the *numbers* in the puzzle don’t really matter – if I swapped all the 1s and 2s then the puzzle would be “essentially the same” puzzle. It could be solved in exactly the same way, and the solution would just be the original solution with the 1s and 2s exchanged. In mathematical language, we say that if one puzzle is obtained from another by applying any permutation of the 9 symbols then the two puzzles are *equivalent*. Suppose now that we just physically rotated the puzzle by 90, 180 or 270 degrees – this would *change* the puzzle but again it would be “essentially the same” as before. So we want to say that two puzzles are equivalent if one is obtained by rotating the other. Continuing this line of argument, we end up with a precise definition saying that two puzzles are equivalent if one can be obtained from the other by a sequence of the following operations:

- Permuting the 9 symbols
- Permuting the columns within any single stack (a
is a set of three columns completely containing three of the 3×3 subsquares)*stack* - Permuting the rows within any single band (a
is a set of three rows completely containing three of the 3×3 subsquares)*band* - Permuting the three stacks
- Permuting the three bands
- Transposing the array (that is, reflecting it along the diagonal from top-left to bottom-right)

The total number of operations allowed is quite large: if we count them we get

where each term in the product gives the contribution from the operations described in the list above.

The * equivalence class* of a particular Sudoku puzzle S is defined to be the set of all puzzles equivalent to S; clearly this may be a very large set. The database of 17-clue puzzles should only contain

*of each equivalence class and, most importantly, if someone submits (via the online form) a 17-clue puzzle then we must check whether or not it is*

**one representative***equivalent*to one of the puzzles already contained in the database, and only accept it if it is not.

Of course we don’t want to make nearly 50000 equivalence-checks each time a new puzzle is submitted, so to avoid this we use the concept of a * canonical representative,* which is a specially identified puzzle in each equivalence class. These canonical representatives are the ones stored in the database. When a new puzzle is submitted, a

**c***algorithm is used to calculate the canonical representative of the equivalence class containing the submitted puzzle, and this can be directly compared (for equality) against the contents of the database.*

**anonical labelling**There are lots of different ways to choose a canonical representative from each equivalence class, but there must be an effective canonical labelling algorithm to find the canonical representative of an arbitrary puzzle. The one that I use currently is to view each puzzle as an 81-digit number using the digits 0..9 (empty cells are assigned 0) and then choose the puzzle that yields the smallest possible number (that is, the * minlex* puzzle) to be the canonical representative. The canonical labelling algorithm takes a submitted puzzle and essentially tries all the 3359232 permutations of the cells to find the “best” pattern of non-zeros (the first non-zero should be as far to the right as possible) and then assigns the symbols so that the first occurrence of each non-zero symbol occurs in the order 1, 2, …, 9. This sounds slow but the vast majority of the permutations can be eliminated almost immediately just because they put a non-zero too far to the left in the 81-digit string.

Hi Gordon

You didn’t answer your question: Is there a 16-clue Sudoku puzzle?

Thank you

ChPicard

Well, it is still open, but my hunch is that there is NO 16-clue Sudoku puzzle. A single 16-clue Sudoku puzzle would give us a whole bunch of 17-clue puzzles and given how hard it now seems to eke out a single extra 17-clue puzzle, I would guess that it just doesn’t happen.

If there exists a uniquely solvable 16-clue puzzle, then there exist 81 – 16 = 65 uniquely solvable 17-clue puzzles, all with the same final result as the initial 16-clue puzzle. Inductively, we can deduce that if there exists an n < 17 clue puzzle, then there exist 17-clue puzzles obtained by adding more clues to the initial n-clue puzzle. So every n < 17 clue puzzle can be obtained by eliminating clues from 17-clue puzzles. We can then deduce that if there are n < 16 clue puzzles, they can be obtained by eliminating clues from 16-clue puzzles. So if there exists an n < 16 clue puzzle, there exists 16 clue puzzles.

If your database is complete, then you only need to do 17 tasks for each element in your database – eliminate one of the clues, and try to solve the puzzle uniquely. There are 50000 puzzles in your database, so you'd only need to test 17×50000 =850000 puzzles. At a puzzle a second (a reasonable estimate for a sudoku-solver program), it'd be about 10 days of computer time.

Run this program, and you'll either obtain a 16-clue puzzle, or be absolutely sure that no n < 17 clue puzzles exist within your database of 50000 17-clue puzzles.

You're probably well aware of all of this. I'm just thinking aloud here. It's a pretty interesting topic.

A good Sudoku solver can work a lot faster than a puzzle a second, so there’s no problem in checking the existing 17s to see if they contain a 16.

The problem is that we have no way of knowing that the database is complete – because of the way they were constructed they are (in some sense) “close” to each other, and so they form a sort of large clump of puzzles in “puzzle space”. It is conceivable, though I believe unlikely, that there is a completely separate clump of puzzles that might contain 16s…

Doing the numbers suggests that something clever will be needed to solve this; even projected computer advances won’t be enough to resolve it in my lifetime..

This is very interesting. Would you let us know the exact number of 17s that have been found, and what was the approximate time-span between each of the last few discoveries? Is it a few days, or are new ones only being found about once a month or even less frequently?

Thanks for your article and blog. Really cool!

I think I have 47051 17-clue Sudokus now, but the machine on which they were stored is no longer functional. I am sure that I have them stored in a file somewhere else, but I will have to dig them up.

They were found very fast initially, but then as people lost interest and the search became harder due to fewer new ones left to find, the rate of discovery dropped to daily, then weekly, then monthly.