The Bug That Ate Thursday

As I was finishing up the previous post here at bit-player.org, I noticed something off-kilter about the appearance of a few mathematical expressions. Here’s an enlarged example:

Notice the spacing around the minus sign. It’s too close to the argument on its left, whereas the plus sign lies right in the middle. The proper rendering of this expression looks like this:

Closely comparing the two images, I realized that spacing isn’t the only issue. In the malformed version the minus sign is also a little too long, too low, and too skinny.

For typesetting mathematics, I rely on MathJax, an amazing JavaScript program created by Davide Cervone of Union College. It works like magic: I write in standard TeX (math mode only), and the typeset output appears beautifully formatted in your web browser, with no need to bother about installing fonts or downloading plugins. For the past few years MathJax has been totally reliable, so this spacing glitch came as an annoying surprise.

The notes that follow record both what I did and what I thought as I tried to track down the cause of this problem. If anyone else ever bumps into the bug, the existence of this document might save them some angst and agita. Besides, everybody likes a detective story—even if the detective turns out to be more bumbling than brilliant. (If you just want to know how it comes out, skip to the end.)

Hypothesis: My first thought on seeing the wayward minus sign was that I must have typed something wrong. The TeX source code for the expression shown above is so simple (just a + b - c) that there’s not much room for error, but accidents happen. Maybe one of those space characters is not an ordinary word space (ASCII 0x20) but a non-breaking space (HTML &nbsp;). Or maybe the hyphen that represents a minus sign is not really a hyphen (ASCII 0x2D) but an en-dash (HTML &ndash; or &#8211;) or a discretionary hyphen (HTML &shy; or &#0173;). Experiment 1: Try typing the expression again, very carefully. Result: No change. Experiment 2: Copy the original source text into an editor that shows raw hexadecimal byte values. Result: Nothing exotic. Experiment 3: Copy the source text into a different TeX system (Pierre-Yves Chatelier’s LaTeXiT). Result: Typesets correctly. Conclusion: Probably not a typo.
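Experiment 2 can also be automated. Here is a minimal Python sketch, assuming the TeX source is available as a string; the function name `find_suspects` is hypothetical and not part of any tool mentioned in this post. It reports every character outside the printable ASCII range, along with its Unicode name, so a stray non-breaking space, en-dash, or soft hyphen would stand out.

```python
import unicodedata


def find_suspects(source):
    """Return (index, code point, Unicode name) for each character
    that is not printable ASCII (0x20 through 0x7E)."""
    suspects = []
    for index, char in enumerate(source):
        if not 0x20 <= ord(char) <= 0x7E:
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(char, "UNKNOWN")
            suspects.append((index, f"U+{ord(char):04X}", name))
    return suspects


clean = "a + b - c"
tainted = "a + b \u2013 c"  # en-dash where the hyphen should be

print(find_suspects(clean))    # nothing to report
print(find_suspects(tainted))  # flags the en-dash
```

Run against the text as typed in the editor, a check of this kind comes back empty, which matches the result of the experiment.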

Question: Could it be a browser bug? Tests: Try it in Chrome, Firefox, Safari, Opera. Results: Same appearance in all of them. Conclusion: It’s not the browser.

Internet interlude: The most important debugging tools today are Google and Stack Overflow. Most likely the answer is already out there. But searches for “minus sign spacing MathJax” and “minus sign spacing TeX” turn up nothing useful. The most promising leads take me to discussions of the binary subtraction operator \(a - b\) vs. the unary negation operator \(-b\). That’s not the issue here, so I am thrown back on my own resources.

Question: Is it just my machine? Test: Try opening the same page on another laptop. Result: Same appearance. However, these two computers are very similar. In particular, they have the same fonts installed. Test: Try a third machine, with different fonts. Result: No change.

Question: Is the problem confined to the one article I’m currently writing, or does it show up in earlier blog posts as well? Research: Page back through the bit-player archives. I find several more instances of the bug. Followup question: Was the minus-sign spacing in those earlier articles already botched when I wrote and published them? Or were they correct then, and the bug was introduced by some later change in the software environment?

Clue: In the course of rummaging through old blog posts, I discover that the spacing anomaly appears only in “inline” math expressions (those that appear within the flow of a paragraph), not in “display” equations (which are set off on a line of their own). The two rendering modes are invoked by surrounding an expression with different sets of delimiters: \(...\) for inline and \[...\] for display. By merely toggling between round and square brackets, I find I can turn the bug on and off. This discovery leads me to suppose there really might be something awry within MathJax. If it formats an expression correctly in one mode, why does it fail on the same input text in another mode?

Investigation: Using browser developer tools, I examine the HTML markup that MathJax writes into the document. In display mode (where the spacing is correct), here’s the coding for the minus sign:

<span class="mo" id="MathJax-Span-15" style="font-family: STIXGeneral-Regular; padding-left: 0.228em;">-</span>

The padding-left: 0.228em declaration is the crucial bit of styling that sets the spacing on the left side of the minus operator. Here’s the corresponding markup for the minus sign in the inline version of the same expression:

<span class="mo" id="MathJax-Span-22" style="font-family: STIXGeneral-Regular;">–</span>

The padding-left statement is absent. This is the proximate cause of the incorrect spacing. But why does MathJax supply the appropriate spacing in display mode but omit it in inline mode? That’s the puzzle.

Inquiry: I turn to the MathJax source-code repository on GitHub, and browse the issues database. Nothing relevant turns up. Likewise the MathJax user group forum. Baffling. If the problem really is a MathJax bug, someone would surely have reported it, unless it’s quite new. I consider opening a new issue, but decide to wait until I know more.

Question: The bug seems to be everywhere on bit-player.org, but what about the rest of the web? On MathOverflow (which I know uses MathJax) it doesn’t take long to find an inline equation that includes a minus sign. It is formatted perfectly. David Mumford’s blog is another MathJax site; I poke around there and find another inline equation with a correctly spaced minus sign. Uh oh. The finger of blame is pointing back toward me and away from MathJax.

Question: Am I using the same version of MathJax as those other sites, and the same configuration file? Not exactly, but when I try several other versions (including older ones, in case this is a recently introduced bug), there’s no change.

Pause for reflection: MathJax seems to be behaving differently on bit-player than it does on other sites. What could account for that difference? There are dozens of possible factors, but I have a leading candidate: bit-player is built on the WordPress blogging platform, and the other sites I’m looking at are not. I have no idea how the interaction of WordPress and MathJax could lead to this particular outcome, but they are both complicated software systems, with lots going on behind the curtains.

Experiment: I can test the WordPress hypothesis by setting up a web page that has everything in common with the bit-player site—the same server hardware and software, and the same MathJax processor—but that lives outside the WordPress system. I do exactly that, and find that minus signs are correctly formatted in both display and inline equations. Conclusion: It sure looks like WordPress is messing with my TeX!

Revelation: Throughout this diagnostic adventure, I’ve been relying heavily on the developer tools in the Chrome and Firefox browsers. These tools provide a peek into a page’s HTML encoding as it is displayed by the browser, after MathJax and any other JavaScript programs have worked their transformations on the source text. Now, for sheer lack of any better ideas, I decide to try the View Source command, which shows the HTML as received from the server, before any JavaScript programs run, and in particular before MathJax has converted TeX source code into typeset mathematical output. Instantly, the root of the problem is staring me in the face. The display-mode TeX is exactly as I wrote it: \[a + b - c\]. But the inline-mode markup is this: \(a + b &#8211; c\). The HTML entity &#8211; specifies an en-dash. Where did that come from? Actually, I’m pretty sure I know where; what I don’t know is why. WordPress has built-in functions to “prettify” text, converting typewriter quote marks ('', "") to typographer’s quotes (‘ ’, “ ”). More to the point, the program also replaces a double hyphen (--) with an en-dash (–) and a triple hyphen (---) with an em-dash (—). Although I haven’t been typing double hyphens in the math expressions, I still suspect that the WordPress character substitution process has something to do with those troublesome en-dashes.
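The View Source check lends itself to a script. What follows is a rough Python sketch, not a tool I actually used: it assumes the raw HTML has already been saved to a string, and that the page marks inline math with backslash-parentheses and display math with backslash-square-brackets (the round and square brackets mentioned earlier). The names `MATH_SPAN`, `DASHES`, and `dashes_in_math` are made up for illustration.

```python
import re

# Inline \( ... \) and display \[ ... \] math, as delimited in the page source.
MATH_SPAN = re.compile(r"\\\((.+?)\\\)|\\\[(.+?)\\\]", re.DOTALL)

# Entities and literal characters that TeX will not read as a minus sign.
DASHES = re.compile(r"&#8211;|&ndash;|&#8212;|&mdash;|[\u2013\u2014]")


def dashes_in_math(raw_html):
    """Return the math spans in raw_html that contain an en- or em-dash."""
    hits = []
    for match in MATH_SPAN.finditer(raw_html):
        # Exactly one of the two groups is set, depending on the delimiter.
        body = match.group(1) if match.group(1) is not None else match.group(2)
        if DASHES.search(body):
            hits.append(match.group(0))
    return hits


page = r"<p>Display: \[a + b - c\] and inline: \(a + b &#8211; c\).</p>"
print(dashes_in_math(page))  # only the inline span is reported
```

The important point is the input: the scan has to run on the HTML as served, not on the DOM after MathJax has rewritten it, because by then the TeX source is gone.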

Confirmation: Before investing more effort in this hypothesis, I try to make sure I’m on the right track. Typing my test expression with an en-dash instead of a hyphen produces output identical to the buggy version, in display mode as well as inline mode. Performing the same experiment in LaTeXiT yields a very similar result.

The culprit exposed: Searching for #8211 in the WordPress source code takes me to the file formatting.php, where I find a function called wptexturize. PHP is not my favorite programming language, but it’s easy enough to guess what these lines are about (I have simplified and abbreviated the statements for clarity):

$static_characters = array( '---', ' -- ', '--', ' - ' );
$static_replacements = array( $em_dash, ' ' . $em_dash . ' ', $en_dash, ' ' . $en_dash . ' ' );


Note the fourth element of the $static_characters array: a hyphen surrounded by spaces. The corresponding element of $static_replacements is an en-dash surrounded by spaces. I call that a smoking gun. MathJax, like other TeX processors, expects an ASCII hyphen as a minus sign; if you feed it an en-dash, it’s not going to recognize it as a mathematical operator. (When Knuth was developing TeX, circa 1980, no standard character encoding existed beyond 7-bit ASCII, with its 95 printable characters.)

The fix: It could be as simple as writing a+b-c instead of a + b - c! When I make that minor change to the text, it works like a charm. Why didn’t I think of trying that sooner? I guess because TeX in math mode promises to ignore whitespace in the source code, and it never occurred to me that WordPress doesn’t have to honor that promise. Thus I can solve the immediate problem just by removing spaces around minus signs. As a permanent remedy, however, changing my writing habits is not appealing. Nor is sifting through all my earlier posts to remove those spaces. The fact is, I don’t want hyphens to magically become en-dashes while I’m not looking. It may be a feature for some people, but for me it’s a bug.
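To see why dropping the spaces works, here is a small Python imitation of the substitution. It is a sketch under assumptions, not the WordPress code: it supposes the two simplified arrays quoted above are applied as successive literal search-and-replace passes (which is how PHP's str_replace treats array arguments), and that the em-dash and en-dash variables hold the entities &#8212; and &#8211;. The names `REPLACEMENTS` and `texturize_dashes` are invented for the example.

```python
EM_DASH = "&#8212;"  # assumed value of $em_dash
EN_DASH = "&#8211;"  # the entity seen in the page source

# Pairs taken from the simplified arrays quoted above, in the same order.
REPLACEMENTS = [
    ("---", EM_DASH),
    (" -- ", " " + EM_DASH + " "),
    ("--", EN_DASH),
    (" - ", " " + EN_DASH + " "),
]


def texturize_dashes(text):
    """Apply each substitution in turn to the output of the previous one."""
    for search, replacement in REPLACEMENTS:
        text = text.replace(search, replacement)
    return text


print(texturize_dashes("a + b - c"))  # the spaced hyphen is rewritten
print(texturize_dashes("a+b-c"))      # the unspaced hyphen is left alone
```

The fourth pair matches only a hyphen with a space on each side, so the unspaced expression passes through untouched, while the spaced one acquires the en-dash entity that MathJax then fails to treat as an operator.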

What I did. The first commandment of WordPress development is “Thou shalt not modify the core files.” But in that respect I’m already a sinner, and unrepentant. Yeah, I edited those two arrays in the formatting.php file, and it felt good.

Lessons learned. In hindsight, I see that I missed several opportunities to root out the problem more quickly. Next time I’ll remember View Source. And if I had done a better job of early-stage analysis, I would have been able to find help more efficiently. I am not the only one to confront this glitch, but I needed better search terms to follow the breadcrumbs of those who went before. Also, along the way I misinterpreted some important clues. When I discovered that the bug affects only inline mode and not display mode, I was quite sure that fact implicated MathJax, but I was wrong. (As it happens, I still don’t really understand why display mode is immune to the bug. Why is the hyphen converted to an en-dash when I enclose it in slashed round brackets, but not when it appears in slashed square brackets? Evidently the wptexturizing treatment is skipped in the latter case, but I lack the stamina to slog through all that PHP to figure out why.)

The big picture: I’m not mad at WordPress. I still believe it is a wonder of the age, making millions of people into instant, pushbutton publishers. According to some reports, it powers a quarter of all web sites. In this respect it may well be the most important application-layer software for fulfilling the original promise of the World Wide Web: allowing all of us to be contributors and creators rather than merely consumers of mass media. But there’s a cost: Keeping WordPress easy on the outside seems to require a dense thicket of thorns and briers on the inside. As the years go by I find I spend too much time fighting against its automation, which is a joyless task. I would prefer something simpler. I have Jekyll envy.

Yet my main takeaway after this episode is gratitude for open-source software. If MathJax and WordPress had been sealed, blackbox applications, I would have been helpless to help myself, unable to do anything about the problem beyond whining and pleading.


5 Responses to The Bug That Ate Thursday

1. Jack Rusher says:

Should you decide to make your escape from WordPress, may I encourage you to consider Pollen over Jekyll? It’s similar software coded in Racket (a Scheme dialect) with a LaTeX-inspired markup/extension system that seems, in the estimation of a reader who has never met you personally, as if it would be just the sort of tool you would enjoy.

• Brian Hayes says:

Thanks for the suggestion. As a Schemer since the days of R2RS, I find the idea instantly appealing.

2. David Eisner says:

A gripping read, actually. I enjoy seeing others’ troubleshooting philosophies. I also appreciated your post-mortem meta-analysis: what could you have done better?

As to the specific problem, there might be an unsinful way to get WP to leave your LaTeX unmolested: “Text enclosed in the tags <pre>, <code>, <kbd>, <style>, <script>, and <tt> will be skipped.” That might be more extra typing than you want, though (more than omitting the white space in the first place, certainly). You’d still have to modify old entries, too.

3. Christopher F. Chiesa says:

Your observation that “Keeping WordPress easy on the outside seems to require a dense thicket of thorns and briers on the inside” is absolutely on point. To the extent that I’m a philosopher/savant, I noticed several decades ago that anything that was to appear simple to the user had to be complex internally. This applies noticeably to software systems: think of the degree to which, when personal computers were new, humans had to adapt their ways of doing things, to the way the computer needed them to be done. The burden of complexity was on humans. Now think of how much of that has gone away, for instance the degree to which applications on our smartphones are able to interact: I can take a photo with my phone (camera app), and from within that same app click e.g. a Facebook icon to post it on Facebook, i.e. using a completely different Facebook app. From firsthand experience I can tell you that it takes a lot more work to accomplish that “simple” feat, if you don’t have the Facebook app installed on which to offload the complexity of the job.

This is true in the real/physical world, too, but people are so accustomed to it after three million years of evolution, and 40,000 years of civilization, that they think everything in it is “simple.” Consider water. If you’re a Stone Age caveman, you have to know a lot about geography, geology, hydrology, animal behavior, chemistry, and probably a few other things, just to determine which body of water is safe to drink out of, and when. Is it muddy? Does it smell funny? Do other animals come to drink there? Dangerous animals? At what time of day? Do they poop in it? Etc. You may not be aware that you know all this, but it’s all on you to get your drink safely. Today, though, in America anyway, anyone can turn a handle in any number of different places right in his own home, and nearly-guaranteed clean, potable water appears, travels eight inches in plain sight, and disappears forever. No muss, no fuss — but all happening because of an enormous and complicated infrastructure behind the scenes.

Even something as basic as clay can be considered inordinately complex if you think about it: a human thinks, “mush it up into a shape you want; heat (fire) it and it hardens. Done.” Well, the ability to be mushed up into an infinite number of arbitrary shapes arises from the fact that the human-scale mass of clay is comprised of uncountable vigintillions (?) of molecules bound together by just the right combinations of electromagnetic forces, and then somehow (I don’t really know) treating it with heat, and letting it cool again (a step most people overlook; potters know about it, though!) changes those interrelationships to what we think of as “solid.” All of that complexity is shouldered by the uncountable molecules of the clay itself, so humans don’t have to think about it.

And so it goes, in everything.

4. We would need to see more of your attitude: post-mortem and analysis of the causes of why it took so long to solve a problem. This is the way to start being engineers for real.