Taxation without rationalization

I am the child of a bookkeeper, and I’ve inherited the habit of double-checking receipts and balancing accounts. My friends make fun of me when I carefully note down the dime that I put in a parking meter, but lately I’ve been fretting over even smaller sums—charges that come to less than a penny. It’s all about sales tax.

For the benefit of those who live outside the U.S. (or in New Hampshire) I should explain how American sales tax works. The price marked on an item in a store excludes the tax, which is added at the cash register when you pay for your purchases. Where I live, in North Carolina, the tax rate for most merchandise is 7 percent. Thus if I buy a magazine with a cover price of $3.95, I’ll actually pay $3.95 + (0.07 × $3.95), or $4.2265. Fractions of a cent are rounded to the nearest penny, making the final price $4.23. The rounding is what I’ve been worrying about.
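
The magazine arithmetic can be checked with a minimal sketch like the one below. The function name `price_with_tax` is made up for illustration, and the code assumes that an exact half cent is rounded upward.

```python
from decimal import Decimal, ROUND_HALF_UP

def price_with_tax(price, rate=Decimal("0.07")):
    """Return (exact total, total rounded to the nearest cent).

    Assumption: an exact half cent rounds upward (ROUND_HALF_UP).
    """
    exact = price + rate * price
    rounded = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return exact, rounded

exact, rounded = price_with_tax(Decimal("3.95"))
print(exact, rounded)  # 4.2265 4.23
```

Using `Decimal` rather than floating point keeps the fractions of a cent exact, which matters when the whole question is which way a half cent falls.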

If rounding is done fairly, one might expect that the tax would be rounded up and rounded down with roughly equal frequency, and everything would come out even in the end. But for a long time I’ve been noticing that the sales tax on my purchases is almost always rounded up; very seldom, it seems, does my shopping basket produce a total for which the tax amount needs to be rounded down. Is this bias real, or do I just imagine that numbers are conspiring against me? As I was gathering up papers for a different kind of tax—the annual income-tax ritual—I decided to find out.

For any tax rate that’s an integer percentage, all possible rounding situations are covered by considering prices modulo $1. In other words, if the tax on $0.xy rounds in a certain way, then $1.xy will get exactly the same treatment, and likewise $2.xy, $3.xy, and so on. Thus for purposes of tax rounding, we can pretend that all prices lie between $0.00 and $0.99. In the sales-tax tables (PDF) issued by the state of North Carolina, the tax on 50 of these amounts is rounded up; in 49 cases it is rounded down; the one remaining case needs no rounding in either direction. This rounding protocol introduces a very slight upward bias, amounting to $0.00005 when averaged over all possible prices. Apart from this tiny asymmetry, the system looks like it ought to be fair if purchase prices are uniformly distributed among the 100 possibilities.
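
The 50/49/1 split can be reproduced by brute force. This is only a sketch: `rounding_census` is a hypothetical name, and the code assumes that a tax ending in exactly half a cent rounds up, which is what the counts above imply.

```python
def rounding_census(rate_percent=7):
    """Count how the tax rounds for each price from $0.00 to $0.99.

    Works in hundredths of a cent so that everything stays an integer.
    Assumption: a tax ending in exactly half a cent is rounded up.
    Returns (rounded up, rounded down, exact, net roundage in hundredths of a cent).
    """
    up = down = exact = net = 0
    for cents in range(100):
        tax_x100 = cents * rate_percent              # exact tax, hundredths of a cent
        rounded_x100 = ((tax_x100 + 50) // 100) * 100
        diff = rounded_x100 - tax_x100               # positive when rounded up
        if diff > 0:
            up += 1
        elif diff < 0:
            down += 1
        else:
            exact += 1
        net += diff
    return up, down, exact, net

print(rounding_census())  # (50, 49, 1, 50)
```

The net of 50 hundredths of a cent, spread over 100 prices, is 0.005 cent per price, or $0.00005.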

The problem, of course, is that the distribution of prices is far from uniform. I took a fat envelope full of sales receipts from 2005 and recorded the prices of all items subject to the 7 percent North Carolina tax. I was able to identify 471 items. Here is the distribution of their prices, modulo $1:

[Figure: bar graph of item prices modulo $1]

Red bars are prices for which the tax is rounded up, blue bars represent prices whose tax rounds down; the tax on $0.00 (black bar) is exact. The source of the rounding bias that I’ve suspected is immediately obvious: Almost two-thirds of the items I purchased over the course of the year had a price (modulo $1) of 99 cents, which happens to be an amount for which the tax rounds upward. I’ve always heard that the popularity of 99-cent pricing has a psychological premise: Shaving that penny off makes things seem cheaper and therefore encourages sales. Could there be another motive as well?

Before I start accusing shopkeepers of a nefarious plot to mulct American consumers, there’s a complicating factor to consider. Sales tax is not calculated separately on each item purchased; instead, when you dump your basketful of goods at the checkout counter, the prices are totalled, and the tax is calculated only once, on the sum. (In accounting lingo, the calculation is done “per invoice” not “per item.”) It turns out that my 471 taxable purchases were grouped into 195 invoices. (On a typical trip to the store I bought 2.4 items.) When I reanalyze my receipts on this basis, the 99-cent effect is somewhat softened:

[Figure: bar chart of number of invoices as a function of invoice total modulo $1]

Nevertheless, the invoice totals are still strongly clustered in the region between $0.93 and $0.99, where tax amounts are all rounded upward.

With these data in hand I can now calculate the total “roundage,” which I’m going to define as N(T_rounded – T_exact), where N is the number of transactions at a given invoice amount, T_exact is the exact tax due on that amount, and T_rounded is the tax after rounding to the nearest cent. With this convention, the roundage is positive when it favors the merchant and negative when it favors me.
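
As a rough sketch of the bookkeeping, the function below totals the roundage for each invoice amount modulo $1, using the sign convention just stated (positive favors the merchant). The name `roundage_by_residue` and the four sample invoice totals are invented for illustration; they are not taken from my receipts.

```python
from collections import defaultdict

def roundage_by_residue(invoice_totals_cents, rate_percent=7):
    """Total roundage per invoice amount (mod $1), in hundredths of a cent.

    Positive values favor the merchant. Assumes half cents round up.
    """
    totals = defaultdict(int)
    for total in invoice_totals_cents:
        tax_x100 = total * rate_percent                  # exact tax
        rounded_x100 = ((tax_x100 + 50) // 100) * 100    # tax actually charged
        totals[total % 100] += rounded_x100 - tax_x100
    return dict(totals)

# Hypothetical invoices of $3.99, $12.99, $2.50 and $1.20.
print(roundage_by_residue([399, 1299, 250, 120]))
# {99: 14, 50: 50, 20: -40}
```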

[Figure: bar chart of total roundage]

An interesting statistical question is whether this pattern offers any evidence of a deliberate policy of setting prices in order to maximize roundage for the merchant. Obviously the prevalence of prices just under a dollar has this effect, but since prices in that range have other possible justifications, they can’t support any firm conclusion. Are the sharp upward spikes at 79 cents and 50 cents a result of careful strategizing, or are they just random accidents? I have not tried to answer that question, and I don’t think I have enough data to do so. It should be noted that if you exclude the region of the graph between $0.93 and $0.99, the net roundage of the remaining transactions is slightly in my favor.

How much money are we talking about here? For calendar year 2005, based on the 195 transactions I was able to document, the skewed rounding of sales tax cost me 13.43 cents. This deficit is probably not the biggest hole in my pocket. On the other hand, if my case is typical, and if we extrapolate my loss to the entire U.S. population, that comes to $40 or $50 million slipping through the cracks.

Where is the insidious excess of roundage going to? That’s the question. If merchants are required to remit to the state the exact amount of tax actually collected, then the bias in roundage merely raises the effective tax rate a tad. No big deal. As far as I can tell, however, that’s not what happens—at least not here in North Carolina. The form on which sales tax is reported asks for total receipts (exclusive of tax collected) and then multiplies this amount by the appropriate percentage. In this way all of a merchant’s sales over an entire month or quarter are lumped together before any rounding is done, and any bias in the roundage of individual transactions will not be taken into account. The form does include a line for “excess collections,” and so a conscientious merchant could keep track of the roundage and report it there. But the instructions that accompany the form do not mention this possibility, and my supposition is that the excess roundage usually stays with the retailer.

We can fight back, though. Using my sample of 471 item prices, I ran a Monte Carlo simulation to estimate the average roundage as a function of the number of items purchased on each shopping trip:

[Figure: roundage per year as a function of number of items per shopping trip]

The model assumes that I buy exactly k items, selected at random (with replacement) from the set of 471 prices in the 2005 data set. The sales-tax roundage on this shopping basket is calculated, and then is multiplied by 471/k, to give the total expected roundage for the year. The graph shows averages of 10⁶ repetitions of this process for each value of k from 1 through 25. The lesson is clear: If I bought 9 or 10 items every time I went to the store, I would come out slightly ahead in the roundage game.
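
A simulation along these lines might look like the following sketch. It is not the program that produced the graph: the function name, the default of 10,000 trials and the fixed seed are all assumptions made for illustration, and the caller has to supply the list of prices.

```python
import random

def simulate_yearly_roundage(prices_cents, k, trials=10_000, rate_percent=7, seed=0):
    """Estimate a year's roundage (in cents) when every trip buys k items.

    Each trial draws k prices with replacement, computes the roundage on
    the basket total, and the mean is scaled by len(prices_cents) / k trips.
    Positive results favor the merchant. Assumes half cents round up.
    """
    rng = random.Random(seed)
    total_x100 = 0  # accumulated roundage in hundredths of a cent
    for _ in range(trials):
        basket = sum(rng.choices(prices_cents, k=k))
        tax_x100 = basket * rate_percent
        total_x100 += ((tax_x100 + 50) // 100) * 100 - tax_x100
    mean_per_basket_cents = total_x100 / trials / 100
    return mean_per_basket_cents * len(prices_cents) / k
```

Calling it for each k from 1 through 25 on a list of recorded prices would give one point per k on a curve like the one described above.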

However, I have a better solution to propose, one that could make the system more resistant to manipulation by either the seller or the buyer. Any integer tax rate brings the curse of commensurability: a pattern of roundage that repeats every dollar, rewarding strategies such as setting all prices to $x.99. Commerce might be more interesting with an irrational tax rate. For example, if the tax percentage were not 7 but rather the square root of 50 (equal to 7.0710678118654755…), then it would be a little harder to rig prices so that the tax consistently rounds upward. Besides, it would help with a problem discussed in my February 20th post: Pundits could no longer claim that algebra has no role in ordinary life. Every time you buy a cup of coffee, you’d have to solve the quadratic equation x² – 50 = 0.
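
To see the effect on $x.99 pricing, here is a small sketch that counts how many of the prices $0.99, $1.99, …, $99.99 have their tax rounded up at a given rate. The helper `count_rounded_up` is hypothetical, and the irrational rate is necessarily approximated in floating point.

```python
import math

def count_rounded_up(rate_percent, dollars=range(100)):
    """Count the prices $d.99 (d in `dollars`) whose tax rounds upward."""
    ups = 0
    for d in dollars:
        price_cents = 100 * d + 99
        tax_cents = price_cents * rate_percent / 100
        if math.floor(tax_cents + 0.5) > tax_cents:   # round half up
            ups += 1
    return ups

print(count_rounded_up(7.0))            # 100: every $x.99 price rounds up
print(count_rounded_up(math.sqrt(50)))  # only some of them round up
```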

Further notes and questions.

The Monte Carlo model discussed above chooses k items at random. If you could plan a year’s worth of shopping in advance, you could list all N items you intend to buy and search for the best possible partitioning of these goods into k-item batches. How hard a computational problem is that? The brute-force approach (trying all possible partitionings) is exponential in N, but some such problems turn out to be easier than they look. What if you remove the constraint that each batch must have exactly k items?
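
For the unconstrained variant (batches of any size), the brute-force search can be sketched as below. The names `partitions` and `best_partition` are made up for illustration. The number of partitions is the Bell number of N, so this is practical only for a handful of items.

```python
def partitions(items):
    """Yield every way of splitting a list into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` in a group of its own
        for i in range(len(part)):                  # or added to an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def total_tax(part, rate_percent=7):
    """Tax in cents when each group is one invoice (half cents round up)."""
    return sum((sum(group) * rate_percent + 50) // 100 for group in part)

def best_partition(prices_cents, rate_percent=7):
    """Return a grouping of the items that minimizes the total rounded tax."""
    return min(partitions(list(prices_cents)),
               key=lambda part: total_tax(part, rate_percent))
```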

Playing the game from the point of view of the vendor, what pricing policy will maximize roundage gains, given a buyer who is determined (at all costs!) to minimize them? Under a fixed, integer tax rate, is there any set of prices that guarantees a win for the merchant, no matter how the shopper assembles purchases into market baskets? Setting all prices to $x.00 enforces a trivial tie. Can the seller do better than that?

Instead of an irrational tax rate, we might try giving the dollar a prime number of cents. The obvious choice is 101 (in which case we might call them not cents but centunos). How would this affect roundage calculations?

The state of Ohio has recently changed its sales-tax regulations, allowing merchants to calculate tax either on a per-invoice or a per-item basis. The per-item option foils all the market-basket strategies for manipulating roundage, and it gives the merchant the potential of earning up to an extra $0.005 per item sold. It will be interesting to see whether pricing patterns change in Ohio.
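
The difference between the two methods is easy to see in a sketch. The function names are hypothetical, the 7 percent default is North Carolina's rate rather than Ohio's, and half cents are assumed to round up.

```python
def tax_per_invoice(prices_cents, rate_percent=7):
    """Tax in cents, rounded once on the basket total."""
    return (sum(prices_cents) * rate_percent + 50) // 100

def tax_per_item(prices_cents, rate_percent=7):
    """Tax in cents, rounded separately on every item."""
    return sum((p * rate_percent + 50) // 100 for p in prices_cents)

basket = [99] * 10               # ten items at 99 cents each
print(tax_per_invoice(basket))   # 69 (exact tax is 69.3 cents, rounded down)
print(tax_per_item(basket))      # 70 (6.93 cents rounded up ten times)
```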

One Response to Taxation without rationalization

Barry Cipra says:

27 February 2006 at 1:27 pm

Assuming you can buy your N items in batches of any size (not strictly k at a time), here’s a recursive algorithm that I think will minimize the total sales tax when the tax rate r divides the unit of currency exactly, with a quotient that is odd. (I’ll explain below why I need those assumptions.) For the dollar there are only two such rates: 4% and 20%. (I recall 4% as the sales tax in Illinois in the early 1960s; the 20% rate, or something like it, has been proposed as a national sales tax to replace the federal income tax.) I will do the 4% example for simplicity of explanation, although the general principle should be fairly clear.

The first thing is to compute the taxable residue mod 25 (=1/.04) of your N items, and sort them by size:

0 ≤ r₁ ≤ r₂ ≤ … ≤ r_N ≤ 24

(This is why I need to assume the tax rate divides the unit of currency exactly; otherwise there are “dislocations” in the residue structure.) Actually you can ignore any residue-0 items; they don’t affect the roundoff no matter what you do with them. Next, if there are any items with residue between 1 and 12 (inclusive), buy each of them SEPARATELY — the sales tax on each of these purchases will be rounded down, and you can’t do better than that. Therefore we may assume at this point that all of the residues are between 13 and 24. (This is why I assume the quotient is odd; otherwise you have to decide whether tax at the halfway residue gets rounded up or down.)

The final (pre-recursion) step is to bundle together the items with the smallest and largest residues, r₁ and r_N (both now greater than or equal to 13), treat them as ONE item with residue r₁′ = (r₁ + r_N) mod 25, and re-run the algorithm with the smaller set of residues r₁′, r₂, …, r_{N–1} (re-ordered to put r₁′ in its proper place).

For example, suppose you have N = 4 items costing 13, 14, 23, and 24 cents. The algorithm will have you bundle the 13 and 24 cent items together, paying 1 cent tax on the 37 (= 12 mod 25) cent total, and then bundle the 14 and 23 cent items, paying another 1 cent tax, for 2 cents total sales tax. Any other way of combining the items results in either 3 or 4 cents tax (the latter if you buy each item separately).
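
Here is one way the procedure might be written out, as a sketch of my reading of the steps above rather than a verified implementation. The names `cipra_bundles` and `tax_cents` are invented for illustration, and the defaults (modulus 25, threshold 13) correspond to the 4% case.

```python
def tax_cents(total_cents, rate_percent=4):
    """Tax in whole cents on one purchase, rounded to the nearest cent."""
    return (total_cents * rate_percent + 50) // 100

def cipra_bundles(prices_cents, modulus=25, threshold=13):
    """Group items into purchases using the smallest-plus-largest rule.

    A bundle whose residue (total mod `modulus`) is below `threshold` is
    bought as it stands; the rest stay in a pool, where the bundles with
    the smallest and largest residues are repeatedly merged.
    """
    done, pool = [], []

    def place(bundle):
        residue = sum(bundle) % modulus
        (pool if residue >= threshold else done).append(bundle)

    for price in prices_cents:
        place([price])
    while len(pool) >= 2:
        pool.sort(key=lambda b: sum(b) % modulus)
        smallest = pool.pop(0)
        largest = pool.pop()
        place(smallest + largest)
    return done + pool   # a leftover pooled bundle is bought on its own

bundles = cipra_bundles([13, 14, 23, 24])
print(bundles)                                 # [[13, 24], [14, 23]]
print(sum(tax_cents(sum(b)) for b in bundles)) # 2
```

This reproduces the 2-cent result of the example. It says nothing about whether the rule is optimal in general.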

Come to think of it, this approach probably works even when the quotient is even (e.g., 5% tax). All that really matters is that there is a threshold below which the tax is rounded down and above (or at) which the tax is rounded up; the threshold doesn’t even have to be at or near the halfway point. The key idea is to buy items below the threshold separately and then bundle together the least and greatest residues above the threshold.

I don’t have a rigorous proof that this algorithm minimizes the total tax; in fact I have lingering doubts that it always does. Maybe some other loyal reader can supply a proof — or a counterexample.