A friend pointed me to Zimaths, a magazine for high school mathematics students, published in Zimbabwe in happier times. (The last issue available online is from 2005. The next-to-the-last issue has a helpful tutorial on inflation, which was then running at about 600 percent per year in Zimbabwe. The rate has gone up considerably since then, and lately the country has had even worse troubles.)
Looking back to the first issue of Zimaths, from October of 1996, I happened upon a list of problems from the Zimbabwe Maths Olympiad of that year. Problem 5 looked like something I might be able to do in my head:
The number a is randomly selected from the set {0, 1, 2, 3, …, 98, 99}. The number b is selected from the same set. What is the probabilty that the number 3a + 7b has a units digit equal to 8?
Clearly, all positive integer powers of 3 and 7 are odd numbers not divisible by 5, and so they must end in 1, 3, 7 or 9. Running through the first few of those powers (1, 3, 9, 27, …; 1, 7, 49, 343, …) suggested that all possible terminal digits appear with equal probability. The sum of two such odd numbers must be an even number. It seemed like a good bet that all even terminal digits would be equally likely—that is, each of 0, 2, 4, 6 and 8 would appear with probability 1/5. I checked this hypothesis by listing all possible pairs of the four allowed odd digits (with addition done modulo 10):
1 + 1 = 2 1 + 3 = 4 1 + 7 = 8 1 + 9 = 0 3 + 3 = 6 3 + 7 = 0 3 + 9 = 2 7 + 7 = 4 7 + 9 = 6 9 + 9 = 8
Sure enough, in this list of 10 combinations each even digit appears twice, confirming that the probability of seeing an 8 should be 0.2. If I had been competing in the Olympiad, I would have written down that answer and gone on to the next problem without a further thought.
Later in the day, though, I was looking for something I could use to test a new toy—the latest version of Mathematica. Here’s what I came up with:
f[] := Mod[3^RandomInteger[{0, 99}] + 7^RandomInteger[{0, 99}], 10]
BarChart[Part[#,2]& /@ Sort[Tally[Table[f[], {100000}]]]]
This generates 100,000 values of 3a + 7b mod 10 and tallies them up:
So I’ve discovered a bug in Mathematica, right? I know there’s nothing wrong with my program, and as for a mistake in my thinking—well, that’s just unthinkable.
Update: Thank you, commenters.
Svat’s question — “How do you think that error could have been avoided?” — goes right to the heart of the matter, but I need to start with a prior question: “How do I know which of my answers is the erroneous one?” When a computer program and my own reasoning yield different results, how should I resolve the conflict? Where should I look first for a mistake?
I don’t expect to find a universal and eternal answer to this question. On the contrary, everything depends on one’s personal equation. Some of us are better at analytical thinking and some of us are bit-players. Nevertheless, in this case I think it’s pretty clear that the computer calculation relies on fewer and simpler assumptions. It asks us to trust the random number generator and certain arithmetic operations, but if we grant that the implementation of those functions is correct, the program can hardly go wrong. Its structure follows directly from the problem statement. (Of course there’s also statistical uncertainty, but if we’re patient enough, we can reduce that to any level we please.)
The probability analysis depends crucially — as I discovered to my chagrin — on properly counting all the cases. My mistake was a very elementary one; I suspect it is also a very common one. It was when I saw the empirical results, with probability values clustered near 3/16, that I realized I should be looking at 16 cases rather than 10.
The point I want to make here isn’t about mathematical truths but about the art of problem solving — even when, as in this case, there’s no artistry involved. Barry’s variation on the problem (see the comments) offers me another sterling opportunity to make a fool of myself. So I present below, in first-person present-tense stream-of-consciousness diary style, a record of my wary engagement with that problem.
- First, why the shift from {0..99} to {1..100}? Because including zeroth powers would contaminate each term in the sum with a 1. For example, 2n mod 10 is {2, 4, 6, 8} for n={1..100}, but for n={0..99} there’s a 1 percent chance of turning up a 1. That would make the sum much messier.
- 1n and 5n are easy. After a further moment’s thought: So is 6n. So every sum will include a term equal to (1 + 5 + 6) mod 10, or in other words 2.
- 4n is {4, 6} and 9n is {1, 9}. We can combine them — but carefully, now, let’s not make the same mistake again — to produce a single term of {3, 5, 5, 7}.
- We already know that 3n and 7n both yield {1, 3, 7, 9}. And 2n and 8n generate {2, 4, 6, 8}. Therefore the whole sum is
{2}1 + {3, 5, 5, 7}1 + {1, 3, 7, 9}2 + {2, 4, 6, 8}2,
where the superscripts indicate the number of elements we are to choose (independently at random with replacement) from each of these sets. (Not sets, really. Bags.)
- Parity: There are an odd number of odd terms, so the sum will be odd.
- Is there some cute number-theoretical trick that’s going to make this easy? I don’t see it.
- Stuck. Back to the computer to do it the unclever way:
g[]:=Mod[Sum[i^RandomInteger[{1,100}],{i,1,9}],10]
Tallying 100,000 repetitions of this function produces the following result:
Repeating the computation a few times more suggests that the slight excess of 7s is not statistical noise. So now I think I know the answer, but I don’t know where it came from.
- Back to the probability calculation. The 4 × 4 sum matrix for the original problem was easy (once I knew what to do with it). This one is 1 × 4 × 4 × 4 × 4 × 4. There must be a shortcut; that’s too much to do with pencil and paper.
- [After a break.] Aha: The sum matrix for {2, 4, 6, 8}2 is exactly the same as the matrix for {1, 3, 7, 9}2 (after permuting some columns and rows). Thus the sum can be expressed as:
{2}1 + {3, 5, 5, 7}1 + {1, 3, 7, 9}4
.However, this still looks like a 4 × 256 matrix, which exceeds my patience.
- Back to the computer again, this time to do all those little sums. What I want is something along the lines of:
Outer[PlusMod, {1, 3, 7, 9}, {1, 3, 7, 9}]
,where
Outer
is like a cross product; but instead of multiplying it applies thePlusMod
procedure, which I define to do addition modulo 10. The result returned by the expression above is the answer to the original Zimbabwe Olympiad problem:{{2, 4, 8, 0}, {4, 6, 0, 2}, {8, 0, 4, 6}, {0, 2, 6, 8}}
For Barry’s more-elaborate variation the full program (without shortcuts) is:
Sort[Tally[Flatten[ Outer[PlusMod, {1}, {2, 4, 6, 8}, {1, 3, 7, 9}, {4, 6}, {5}, {6}, {1, 3, 7, 9}, {2, 4, 6, 8}, {1, 9}]]]]
The
Outer[]
expression yields 1,024 values, which are then tallied up. The returned value is:{{1, 204}, {3, 204}, {5, 205}, {7, 206}, {9, 205}}
This appears to be our answer. Barry asked about the frequency of the digit 5: It is 205/1024. The 7 digit is most common by a slight edge, at 206/1024.
- Going back to the simulation program and running 1,024 × 100,000 iterations to get better statistics yields this result (about an hour later):
{{1, 20400852}, {3, 20396224}, {5, 20501004}, {7, 20601096}, {9, 20500824}}
- In the end I still wonder if there’s an easier way. Barry?
There’s a mistake in your thinking regarding the list 1 digit sums mod 10. When you perform the final ones addition, you’re twice as likely to add a 1 and a 3 than you are to add a 1 and a 1. If you write out the full table of all combinations, you’ll see that the result 0 has 4 possible combinations and the other even numbers have 3 possible combinations, leading to a total of 16 combinations. Therefore the probability distribution should be:
0 = 4/16
2,4,6,8 = 3/16
Your graph confirms this.
7 = -3 (mod 10)
How do you think that error could have been avoided?
(Perhaps by being more systematic about making that list, or by writing that list with an “open mind” instead of using it to verify a conjecture, so that the fact that your answer was 2/10 instead of something/16 would have tipped you off?)
You could have realized that your answer 1/5 has to be wrong without ever computing what the correct answer is. Here’s how.
As soon as you determine that the units digit in the powers of 3 and 7 repeats every four steps, then, since 4 divides 100, it’s no longer necessary to choose at random from 0 to 99, it suffices to choose from 0 to 3 (or 1 to 4). This reduces the number of combinations from 10,000 to 16, all equally likely, some number of which — lets call it k — have the desired property. It doesn’t really matter what this desired property is, the probability of its occurrence is k/16. And here’s the rub: There’s no way a fraction with denominator 16 can reduce to one with denominator 5. (The fancy name for this is the Fundamental Theorem of Arithmetic.)
Note, this doesn’t tell you *where* you went wrong in your solution, it merely tells you that you *did* go wrong.
I see I should have read the other comments more carefully before making my own. I basically restated what svat had to say.
If you’re interested, here’s a variation on the original problem that’s a little trickier:
Let a, b, c, etc. be randomly (and independently) chosen from the set {1,2,3,…,100}. What is the probability that
1^a + 2^b + 3^c + 4^d + 5^e + 6^f + 7^g + 8^h + 9^i
has units digit equal to 5? Is there some units digit that’s more likely than any other (and if so, which one)?
Note that the powers here are drawn between 1 and 100, not between 0 and 99. Your Mathematica simulation should have no problem estimating the probability, but it may have a hard time answering the follow-up question.
The “easier way” that I would use is a generating series for each term’s “bag”.
1: x^1
2: (x^2 + x^4 + x^6 + x^8)
3: …
Multiply them all together, and set x^10 = 1 since you only care about the last digit. The coeffiecient of each number is the weighting for that value. Of course, this basically does the same as your “[Tally [ Flatten [ Outer [ PlusMod …”, but you can do it by hand, if you’re comfortable multiplying polynomials with 5 terms :)
204x + 204x^3 + 205x^5 + 206x^7 + 205x^9
Zac’s generating function is a *great* way to organize the calculation, much better than the ad hoc approach I originally took.
A little finagling with the various terms, using the identity x^10 = 1, shows that the final polynomial is
x^3 (1+x^2)^6 (1+x^4)^4
From Pascal’s triangle one gets
(1+x^2)^6 = 7 + 7x^2 + 15x^4 + 20x^6 + 15x^8
and
(1+x^4)^4 = 1 + 4x^2 + 4x^4 + x^6 + 6x^8
Everything else can now be done with fairly simple arithmetic. The coefficient of x^3 in the final result, for example, is
7 + 42 + 15 + 80 + 60 = 204.
However, this seems to only make it more mysterious as to why the final numbers come out so nearly equal!