How many Sudokus?

The answer to the question in the headline is: Too many. After I wrote about Sudoku a couple a years ago, I thought I had cured my addiction; but I’ve been a shameless backslider.

I return to the subject now, in this public confessional mode, to correct an error in my 2006 column. Citing a paper by Bertram Felgenhauer and Fraser Jarvis (pdf), I stated that there are 6,670,903,752,021,072,936,960 Sudoku solutions. This is simply the number of ways of filling in the 81 squares without violating the no-two-alike constraints in rows, columns and blocks. I then went on to discuss the symmetries of the Sudoku grid, and noted:

When all these symmetries are taken into account, the number of essentially different Sudoku patterns is reduced substantially…. although [the reduction] still leaves an impressive number of genuinely different solutions: 3,546,146,300,288, or 4×1012.

The latter number also comes from Felgenhauer and Jarvis, but I misinterpreted it. Contrary to my statement, it takes into account only a few of the simplest symmetries. The true number of essentially distinct Sudoku solutions is reported in a slightly later publication by Ed Russell and Jarvis. The correct tally is smaller by three orders of magnitude: 5,472,730,538.

Thanks to Richard E. Dickerson for pointing out my error.

It needs to be emphasized that both of these calculations count the number of solutions, not the number of puzzles. For any given solution, there are many possible puzzles, each created by erasing a different subset of the entries in the solution. How many puzzles per solution? Does anyone know? An exact count seems beyond reckoning, but random sampling ought to yield a rough estimate. For example, consider just those puzzles that have 17 initial clues (the apparent minimum). For any given solution grid, there are (81 choose 17) ways of forming a 17-clue puzzle; this works out to 128,447,994,798,305,325. Most of those puzzles will not have a unique solution, and so they are disqualified as proper Sudokus. The random sampling is needed to estimate how many of the potential puzzles are admissible.

I’d try doing the calculation myself, but I’m too busy solving Sudokus.

Update 2008-04-05: Richard E. Dickerson discovered the error mentioned above in the course of preparing his own discourse on Sudoku-solving strategies. That work now has its own web site, The Sudokerson Matrix, where you can download the 61-page PDF file and an accompanying spreadsheet. Dickerson also mentions some other resources on Sudoku strategy, including a recent book by Andrew C. Stuart that I haven’t yet seen. Dickerson comments:

Few books on Sudoku go into strategies in any depth. Stuart’s The Logic of Sudoku (Michael Mepham, 2007) is a wonderful exception. But his book and mine represent the two opposite poles of Sudoku-solving strategy. I regard many of his methods as superfluous, and I am sure that he regards most of my strategies as heretical. You need to have a close look at both.

What I find intriguing in all this is the very idea that we could have contending doctrines of Sudoku strategy. But before the talk of “heresy” gets out of hand and leads to Sudoku holy wars, I’d like to remind everyone that there’s another view of Sudoku in which all strategies become superfluous. Give me a computer and a program for solving constraint-satisfaction problems, and every 9×9 Sudoku yields in milliseconds.

This entry was posted in games, mathematics.

8 Responses to How many Sudokus?

  1. Barry Cipra says:

    Sudoku theorists are rightly fixated on finding the minimum number of clues necessary for a puzzle to have a unique solution, but I’d like to raise a question in the opposite direction: What is the *largest* possible “irreducible” set of clues? By “irreducible” I mean that the removal of any clue produces a puzzle with more than one solution. In general, each of the (roughly) 5.5 billion Sudoku solutions comes equipped with a collection of irreducible sets of clues. These irreducible sets of clues have various sizes. Among all them, there will be one (or more) with the least possible number of clues and one (or more) with the most. The former number is thought to be 17; I’m wondering about the latter.

  2. brian says:

    @ Barry: Seems like a hard problem, but I can solve the very easiest trivial bit: The maximum maximum is 78. The reason is that no grid with only one, two or three blanks can be ambiguous: It has either one solution or none. Thus, starting with a fully filled-in valid solution, you can always remove three entries to leave a 78-clue puzzle. But four blanks *can* be ambiguous. Suppose they form a square block spanning two rows, two columns, and two blocks and can be filled in:

    12
    21

    Then they will also accept the entries:

    21
    12

    So we have to supply either a 1 or a 2 somewhere in the block as a clue in order to suppress the ambiguity. Of course it might still be possible to remove some other number elsewhere, so this is only an upper bound on the maximum. Not even a very useful one — but I worked this out while negotiating stop-and-go traffic on the Deegan Expressway through the Bronx, an it’s the best I have to offer.

  3. Claire says:

    Problem B in this year’s Mathematical Contest in Modeling was to devise an algorithm for generating Sudoku puzzles of various difficulty levels. The submissions are usually evaluated in March, so maybe we’ll see some interesting new approaches (and maybe some of the teams will turn their work into actual software!).

  4. Barry Cipra says:

    @Brian: The obvious, trivial upper bound on the number of clues in an irreducible puzzles is 72, not 78. No row (or column or 3×3 subgrid) needs more than 8 clues.

    @Claire: This is great. I look forward to seeing what the modeling contestants come up with.

  5. George Bell says:

    It seems to me Sudoku puzzle creators are always careful that their puzzles have unique solutions, but it is not clear to me if they make sure that removing any clue results in multiple solutions. Is this really the case? In any case, the question Barry raises is interesting. Can one get a lower bound just by looking at a bunch of Sudoku puzzles and taking the largest number of clues? I doubt it for the reason given in my first sentence.

    A much simpler hexagonal-grid variant goes by the name of Septoku. I have analyzed this puzzle, and there are only 6 possible solutions, and the minimum number of clues is 6. I’m not sure what the total number of possible puzzles or maximum number of clues, but these numbers should be very easy to compute in this case. You can find out more about this puzzle by googling “Septoku”.

  6. Mary L. Brown-Wallace says:

    even if solutions to sudoku puzzles are found and mastered today, tomorrow the stragety of the sudoku puzzle creators will change just to keep us on our toes. Thanks for your solutions.

  7. Kevin Cormier says:

    @Barry – If two of the missing 4 digits exist in the same block, and the other two are in rows/columns so that they form a square, then the puzzle has multiple solutions. Brian’s answer of 78 is in fact correct as far as I can tell.

  8. Chad Musick says:

    I think the point Barry is making is that any solution which has at least 73 clues is guaranteed by the pigeonhole principle to have a full group of 9 filled in. Removing one of these 9 leaves no new ambiguity; if it wasn’t ambiguous before, it isn’t after. The bound would be when the removal of any digit, not just a particular one, makes the puzzle ambiguous.