This weekend I’ll be attending a workshop at the University of Vermont, “Kummer Classes and Anabelian Geometry: An introduction to concepts involved in Mochizuki’s work on the ABC conjecture, intended for non-experts.” It was the last phrase in this title that gave me the courage to sign up for the meeting, but I have no illusions. There are degrees of non-expertness, and mine is really quite advanced. The best I can hope for is to go home a little less ignorant than I arrived.

I’m glad to be here all the same. The ABC conjecture is one of those beguiling artifacts in number theory that seem utterly simple one moment and utterly baffling the next. As for Shinichi Mochizuki’s 500-page treatise on the conjecture, that’s baffling from start to finish, and not just for me. Four years after the manuscript was released, it remains a proof-on-probation because even the non-non-experts have yet to fully digest it. The workshop here in Burlington is part of the community’s digestive effort. I’m grateful for an opportunity to see the process at work, and naturally I’m curious about the eventual outcome. Will we finally have a proof, or will a gap be discovered? Or, as has happened in other sad cases, will the question remain unresolved? For me the best result will be not just a proof certified by a committee of experts but a proof I can understand, at least in outline, if I make the effort—a proof I might be able to explain to a broader audience.

The drive up to Burlington gave me a few hours of solitude to think about the conjecture and how it fits in with more familiar ideas in number theory. One notable connection is summed up as “ABC implies FLT.” Proving the ABC conjecture will bring us a new proof of Fermat’s Last Theorem, independent of the celebrated Andrew Wiles–Richard Taylor proof published 20 years ago. Interesting. But how are the two problems linked? As I cruised through the chlorophyll-soaked hills of the Green Mountain state, I noodled away at this question.

[Correction: ABC actually implies only that FLT has no more than a finite number of counterexamples, and only for exponent \(n \ge 4.\)]

I have written about the ABC conjecture twice before (2007 and 2012). Here’s a third attempt to explain what it’s all about.

The basic ingredients are three distinct positive integers, \(a\), \(b\), and \(c\), that satisfy the equation \(a + b = c\). Given this statement alone, the problem is so simple it’s silly. Even I can solve that equation. Pick any \(a\) and \(b\) you please, and I’ll give you the value of \(c\).

To make the exercise worth bothering with, we need to put some constraints on the values of \(a\), \(b\), and \(c\). One such constraint is that the three integers should have no factors in common. In other words they are relatively prime, or in still other words their greatest common divisor is 1. Excluding common factors doesn’t really make it any harder to find solutions to the equation, but it eliminates redundant solutions. Suppose that \(a\), \(b\), and \(c\) are all multiples of 7; then dividing out this common factor yields a set of smaller integers that also satisfy the equation: \(a\,/\,7 + b\,/\,7 = c\,/\,7\).

The ABC conjecture imposes a further constraint, and this is where the arithmetic finally gets interesting. We are to restrict our attention to triples of distinct positive integers that pass a certain test. First we find the prime factors of \(a\), \(b\), and \(c\), and cast out any duplicates. For example, given the triple \(a = 4, b = 45, c = 49\), the prime factors are \(2, 2, 3, 3, 5, 7, 7\); eliminating duplicates leaves us with the set \(\{2, 3, 5, 7\}\). Now we multiply all the distinct primes in the set, and call the product the radical, \(R\), of \(abc\). Here’s the punchline: The solution is admissible—it is an “\(abc\)-hit”—only if \(R \lt c\). For the example of \(a = 4, b = 45, c = 49\), this condition is not met: \(2 \times 3 \times 5 \times 7 = 210\), which of course is not less than 49.

The ABC conjecture holds that \(abc\)-hits are rare, in some special sense. Hits do exist; try working out the radical of \(a = 5, b = 27, c = 32\) to see an example. In fact, there are infinitely many \(abc\)-hits, with constructive algorithms for generating endless sequences of them. Yet, it’s part of the maddening charm of modern mathematics that objects can be both infinitely abundant and vanishingly rare at the same time. The particular kind of rareness at issue here says that \(R\) can be less than \(c\), but seldom by very much. As a measure of how much, define the power \(P\) of an \(abc\)-hit as \(\log(c) \,/\, \log(R)\). Then one version of the ABC conjecture states that there are only finitely many \(abc\)-hits with \(P \gt (1 + \epsilon)\) for any \(\epsilon \gt 0\).
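
Here is a small Julia sketch of my own of the test just described (the names radical, is_abc_hit, and power are just labels for this illustration, and the radical is found by plain trial division rather than any particular factoring library; outputs are shown rounded):

# radical(n): product of the distinct prime factors of n,
# found by simple trial division
function radical(n)
    r = 1
    p = 2
    while p * p <= n
        if n % p == 0
            r *= p
            while n % p == 0
                n = div(n, p)
            end
        end
        p += 1
    end
    return n > 1 ? r * n : r    # any leftover factor greater than 1 is itself prime
end

# is_abc_hit(a, b): does the triple (a, b, a + b) qualify as an abc-hit?
is_abc_hit(a, b) = gcd(a, b) == 1 && radical(a * b * (a + b)) < a + b

# power(a, b): log(c) / log(R) for the triple (a, b, a + b)
power(a, b) = log(a + b) / log(radical(a * b * (a + b)))

is_abc_hit(5, 27) ⇒ true     # the radical of 5 × 27 × 32 is 30, which is less than 32
power(1, 8) ⇒ 1.226          # the 1 + 8 = 9 hit discussed below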

On first acquaintance, all this rigmarole about radicals seems arbitrary and baroque. Who came up with that, and why? The answer to the who question is Joseph Oesterlé of the University of Paris and David W. Masser of the University of Basel, in 1985. If you play around long enough with integers and their prime factors, you can find all sorts of curious relations, so what’s so special about this one? Whether or not the conjecture is true—and whether or not it’s provable—why should we care?

As I tooled along the Interstate, I tried to answer this question to my own satisfaction. I made a little progress by thinking about what kinds of numbers we might expect to produce \(abc\)-hits. Are they big or small, nearly equal or of very different magnitudes? Are they primes or composites? Do we find squares or other perfect powers among them?

Primes are always a good place to start. Can we have an \(abc\)-hit with \(a\), \(b\), and \(c\) all prime? One complication is that either \(a\) or \(b\) will have to be equal to 2, because 2 is the only even prime, and we can’t have \(a + b = c\) with all three numbers odd. But that’s all right; there are still lots of triples (and conjecturally infinitely many) of the form \(2 + b = c\) with \(b\) and \(c\) prime; they are called twin primes. However, none of them are \(abc\)-hits. It’s easy to see why: If \(a\), \(b\), and \(c\) are all prime, then their radical is simply the product \(abc\), which for numbers larger than 1 is always going to be greater than \(a + b\).

We can extend this reasoning from the primes to all squarefree numbers, that is, numbers that have no repeated prime factors. (They are called squarefree because they are not divisible by any square.) For example, \(10, 21,\) and \(31\) form a squarefree \(abc\) triple, with prime factorizations \(2 \times 5\), \(3 \times 7\), and \(31\). But they do not produce an \(abc\)-hit, because their radical \(2 \times 3 \times 5 \times 7 \times 31 = 6510\) is clearly larger than \(a + b = c = 31\). And the same argument that rules out all-prime \(abc\)-hits applies here to exclude all-squarefree hits.

These results suggest that we look in the opposite direction, at squarefull numbers—and even cubefull numbers, and so on. We want lots of repeated factors. This strategy immediately pays off in the search for \(abc\)-hits. It maximizes the sum \(a + b = c\) without overly enlarging the radical—the product of the distinct primes. The very first of all \(abc\)-hits (in any reasonable ordering) offers an example. It is \(1 + 8 = 9\), or in factored form \(1 + 2^3 = 3^2\). This is a high-power hit, with \(\log(9) \,/\, \log(6) \approx 1.23\). The triple with highest known power is \(2 + (3^{10} \times 109) = 23^5\), yielding \(\log(6436343) \,/\, \log(15042) \approx 1.63.\)

Let’s look more closely at that first \(abc\)-hit, \(1 + 8 = 9\). Note that 8 and 9 are not just squarefull numbers; they are perfect powers. This triple is the subject of another famous conjecture. Eugène Catalan asked if \(8\) and \(9\) are the only consecutive perfect powers. Preda Mihailescu answered affirmatively in 2002. Thus we know that the equation \(1 + x^m = y^n\) has only this single solution. However, if we relax the rules just a little bit, we can find solutions to \(a + x^m = y^n\) where \(a\) has some value greater than 1. For example, there’s \(3 + 125 = 128\) (or \(3 + 5^3 = 2^7\)), which is another high-power \(abc\)-hit.

Suppose we tighten the rules instead of relaxing them and ask for solutions to \(x^n + y^n = z^n\), where the three members of the triple are all nth powers of integers. If we could find solutions of this equation with large values of n, they would surely be a rich ore for high-power \(abc\)-hits. But alas, that’s the equation that Fermat’s Last Theorem tells us has no solutions in integers for \(n \gt 2\). The ABC conjecture turns this implication on its head. It says (if it can be proved) that the rarity of \(abc\)-hits implies there are no solutions to the Fermat equation.

That’s as far as I was able to get while musing behind the wheel—a vague intuition about the balance between addition and multiplication, a tradeoff between increasing the sum and reducing the radical, a hint of a connection between ABC and FLT. Not much, but a better sense of why it’s worth focusing some attention on this particular relation among numbers.

Now morning breaks over Burlington. Time to go learn something from those who are less non-expert.


The secret life of tweets

On Twitter you can say anything you want as long as it fits in 140 characters. The length limit is one of those frozen accidents of history, like QWERTY and the genetic code. In olden days (2006), tweets had to fit into cell-phone text messages, which imposed a limit of 160 characters. (Twitter reserves 20 characters for the sender’s @handle.) Back then, resources were so scarce the company had to squeeze the vowels out of its name: “twttr,” they called it. Now, we have bandwidth to burn. On the other hand, human attention is still a constraint.

Tweet being composed. The text reads: 'To my prolific tweeps: I dote upon every precious character you send my way, which is why I am sometimes grateful you can send me no more than 140.' This message exceeds the 140-character limit by 5 characters.

Last year, a proposal to raise the limit to 10,000 characters was shouted down in a storm of very terse but intense tweets.

The 140-character limit is enforced by the Twitter software. When you compose a tweet, a counter starts at 140 and is decremented with each character you type; if the number goes negative, the Tweet button is disabled (as in the screen capture above). Based on this observation, I had long believed that every tweet was indeed a little snippet of pure text composed of no more than 140 characters. Was I naïve, or what?

My belated enlightenment began earlier this week, when I began having trouble with links embedded in tweets. Clicking on a link opened a new browser tab, but the requested page failed to load. The process got stuck waiting to connect to a t.co URL. That domain name gave me a clue to the source of the problem. A long URL can use up your 140-character quota in a hurry, and so twitterers long ago turned to URL-shortening services such as TinyURL, which allow you to substitute an abbreviated URL for the original web address. The shortening services work by redirection. When your browser issues a GET request for the shortened URL, what comes back is not the web page you’re seeking but a REDIRECT message pointing to another address. The browser then automatically issues a second GET request to that destination address.

In 2011 Twitter introduced its own shortening service, t.co. Use of this service is automatic and inescapable. That is, any link included in a tweet will be converted into a 23-character t.co URL, whether you want it to be or not, and even if it’s already shorter than 23 characters. The displayed link may appear to refer to the original URL, but when you click on it, the browser will go first to a t.co address and only afterwards to the true target. Embedded images also get t.co URLs.

A drawback of all redirection services is that they become a bottleneck and a potential point of failure for the sites that depend on them. If t.co goes down, every link posted on Twitter becomes unreachable, and every image disappears. Is that what happened earlier this week when I was having trouble following Twitter links? Probably not; a disruption of that scale would have been widely noted. Indeed, I soon discovered that the problem was quite localized: It plagued all browsers on my computer, but other machines in the household were unaffected.

When I did a web search for “t.co broken links,” I quickly discovered a long discussion of the issue in the Twitter developer forum, with 87 messages going back to 2012. Grouchy complaints are interspersed with a welter of conflicting diagnoses and inconsistent remedies. Much attention focused on Apple hardware and software (which I use). A number of contributors argued that the problem is not in the browser but somewhere upstream—in the operating system, the router, the cable interface, or even the internet service provider.

After a day or two, my problem with Twitter links went away, and I never learned the exact cause. I hate it when that happens, although I hate it more when the problem doesn’t go away. However, that’s not why I’m writing this. What I want to talk about is something I stumbled upon in the course of my troubleshooting. I found a plugin for the Google Chrome browser, Goodbye t.co, that promised to bypass t.co and thereby fix the problem. How could it do that? If t.co is not responding, or if the response is not getting through to the browser, how can code running in the browser make any difference? It seems like tinkering with your television set when the broadcaster is off the air.

The source code for Goodbye t.co is on GitHub, so I took a look. The program is just a couple dozen lines of JavaScript. What I saw there sent me running back to my Twitter feed, to examine the web page using the browser’s developer tools.

Here’s a tweet I posted a few days ago, as it is displayed by the Twitter web site. Note the link to an arXiv paper:

Beckett tweet

And here’s the HTML that encodes that tweet in the web page:

Beckett tweet HTML

The text of the tweet (“A problem in coding theory that comes from a Samuel Beckett play:”) amounts to 66 characters, plus 25 more for the link. But that’s not all that Twitter is sending out to my followers. Far from it. The block of HTML shown above is 751 characters, and the complete markup for this one tweet comes to just under 7,000 characters, or 50 times the nominal limit.

Take a closer look at the anchor tag in that HTML block:

Beckett tweet HTML anchor tag

The href attribute of the anchor tag is a t.co URL; that’s where the browser will go when you click the link. But, reading on, we come to a data-expanded-url attribute that gives the final destination link in full. And then that same final destination URL appears again in the title attribute. This explains immediately how Goodbye t.co can “bypass” the t.co service. It simply retrieves the data-expanded-url and sends the browser there, without making the detour through t.co.

I have two questions. First, if you’re going to use a shortened, redirected URL, why also include the full-length URL in the page markup? The apparent answer is: So that the web browser can show the user the true destination. This is clearly the point of the title attribute. When you hover over a link, the content of the title attribute is displayed in a “tooltip.” I’m not so sure about the purpose of the data-expanded-url attribute. It’s surely not there to help the author of Goodbye t.co; Twitter presumably has some JavaScript of its own that accesses that field.

The second question is the inverse of the first: If you’re going to include the full-length URL, why bother with the shortening-and-redirecting rigmarole? Twitter could shut down the t.co servers and doubtless save a pile of money. Those servers have to deal with all the links and images in some 200 billion tweets per year. The use of redirection doubles the number of requests and responses—that’s a lot of internet bandwidth—and introduces delays of a few hundred milliseconds (even when the service works correctly). Note that Twitter could still display a shortened URL within the text of the tweet, without requiring redirection.

Twitter’s own developer documents offer an answer to the second question:

Tens of millions of links are tweeted on Twitter each day. Wrapping these shared links helps Twitter protect users from malicious content while offering useful insights on engagement.

The promise to “protect users from malicious content” presumably means that if I link to a sufficiently sleazy site, Twitter will refuse to redirect readers there, or perhaps will just warn them of the danger. (I don’t know which because I’ve never encountered this behavior.) As for “offering useful insights on engagement,” I believe that phrase could be translated as “helping us target advertising and collect data with potential market value.” In other words, t.co is not just a cost center but also a revenue center. Every time you click on a link within a tweet, Twitter knows exactly where you’re going and can add that information to your profile.

A few months ago, Twitter announced a slight change to the 140-character rule. @handles included in the text will no longer count toward the character total, and neither will images or other media attachments. Some press reports suggested that links would also be excluded from the count, but the official announcement made no mention of links. And redirection is clearly here to stay.

I can suggest two takeaway messages from this little episode in my life as an internaut.

If you want to limit the “insights on engagement” that Twitter accumulates about your activities, you might consider installing a plugin to bypass redirection. There’s an ongoing argument about the wisdom and morality of such actions, focused in particular on ad-blocking software. I have my own views on this issue, but I’m not going to air them here and now.

The other small lesson I’ve learned is that using alternative URL-shortening services with Twitter is worse than pointless. Pre-shrinking the URL has no effect on the character count. It also obscures the true destination from the reader, since the title attribute then shows only the intermediate shortened URL rather than the final target. Most important, it interposes two layers of redirection, with two delays, two potential points of failure, and two opportunities to collect saleable data. Yet I still see lots of pre-shortened links in tweets. Am I missing or misunderstanding something?


Bertrand Russell, Donald Trump, and Archimedes

A habit of finding pleasure in thought rather than in action is a safeguard against unwisdom and excessive love of power, a means of preserving serenity in misfortune and peace of mind among worries. A life confined to what is personal is likely, sooner or later, to become unbearably painful; it is only by windows into a larger and less fretful cosmos that the more tragic parts of life become endurable.

—Bertrand Russell, “Useless” Knowledge, 1935

For Russell, mathematics was one of those windows opening on a calmer universe. So it is for me too, and for many others. When you are absorbed in solving a problem, understanding a theorem, or writing a computer program, the world’s noisy bickering is magically muted. For a little while, at least, you can hold back life’s conflicts, heartaches, and disappointments.

But Russell was no self-absorbed savant, standing aloof from the issues of his time. On the contrary, he was deeply engaged in public discourse. During World War I he took a pacifist position (and went to prison for it), and he continued to speak his peace into the Vietnam era. This blog is meant to be a little corner of Russell’s less fretful cosmos, both for me and, I hope, for my readers. In this space I would prefer to shut out the clamor of the hustings and the marketplace. And yet there comes a time to look up from bit-playing and listen to what’s going on outside the window.

A candidate for the U.S. presidency is goading his followers to murder his opponent. Here are his words (in the New York Times transcription):

Hillary wants to abolish—essentially abolish—the Second Amendment. By the way, and if she gets to pick—if she gets to pick her judges, nothing you can do folks. Although the Second Amendment people—maybe there is, I don’t know.

A day later, Donald Trump said he was merely suggesting that gun owners might be roused to come out and vote, not that they might assassinate a president. Yeah, sure. And when Henry II of England mused, “Who will rid me of this troublesome priest?” he was just asking an idle question. But soon enough Thomas Becket was hacked to death on the floor of Canterbury Cathedral.

This is not the first time Trump has strayed beyond tasteless buffoonery into reckless incitement. But this instance is so egregious I just cannot keep quiet. His words are vile and dangerous. I have to speak out against them. We face a threat to the survival of democracy and civil society.

Mathematics, after all, is one of those luxuries we can afford only so long as the thugs do not come crashing through the door. Another great mathematician offers a lesson here, in a tale told by Plutarch. During the sack of Syracuse, according to one version of the legend, Archimedes was puzzling out a mathematical problem. He was staring at a diagram sketched in the sand when Roman soldiers came upon him. Deep in thought, he refused to turn away from his work until he had finished the proof of his theorem. One of the soldiers drew a sword and ran him through.

Death of Archimedes. Engraving after a painting by Gustave Courtois (1853-1923).


The 39th Root of 92

My new office space suffers from a shortage of photons, so I’ve been wiring up light fixtures. As I was snaking 14-gauge cable through the ceiling cavity, I began to wonder: Why is it called 14-gauge? I know that the gauge specifies the size of the copper conductors, but how exactly? The number can’t be a simple measure of diameter or cross-sectional area, because thicker wires have smaller gauge numbers. Twelve-gauge is heavier than 14-gauge, and 10-gauge is even beefier. Going in the other direction, larger numbers denote skinnier wires: 20-gauge for doorbells and thermostats, 24-gauge for telephone wiring.

Stripped end of 14/2 Romex cable: two insulated 14-gauge conductors with bare ground wire.

Why do the numbers run backwards? Could there be a connection with shotguns, whose sizes also seem to go the wrong way? A 20-gauge shotgun has a smaller bore than a 12-gauge, which in turn is smaller than a 10-gauge gun. Mere coincidence?

Answers are not hard to find. The Wikipedia article on American Wire Gauge (AWG) is a good place to start. And there’s a surprising bit of mathematical fun along the way. It turns out that American wire sizes make essential use of the 39th root of 92, a somewhat frillier number than I would have expected to find in this workaday, blue-collar context.

Wire is made by pulling a metal rod through a die—a block of hard material with a hole in it. In cross section, the hole is shaped something like a rocket nozzle, with conical walls that taper down to a narrow throat. As the rod passes through the die, the metal deforms plastically, reducing the diameter while increasing the length. But there’s a limit to this squeezing and stretching; you can’t transform a short, fat rod into a long, thin wire all in one go. On each pass through a die, the diameter is only slightly reduced—maybe by 10 percent or so. To make a fine wire, you need to shrink the thickness in stages, drawing the wire through several dies in succession. And therein lies the key to wire gauge numbers: The gauge of a wire is the number of dies it must pass through to reach its final diameter. Zero-gauge is the thickness of the original rod, without any drawing operations. Fourteen-gauge wire has been pulled through 14 dies in series.

Sketch of a wire-drawing die, with copper wire moving left to right through a round aperture with a double-cone profile.

Or at least that was how it worked back when wire-drawing was a hand craft, and nobody worried too much about exact specifications. If two wires had both been pulled through 14 dies, they would both be labeled 14-gauge, but they might well have different diameters if the dies were not identical. By the middle of the 19th century this sort of variation was becoming troublesome; it was time to adopt some standards.

The AWG standard keeps the traditional sequence of gauge numbers but changes their meaning. The gauge is no longer a count of drawing operations; instead each gauge number corresponds to a specific wire diameter. Even so, there’s an effort to keep the new standardized sizes reasonably close to what they were under the old die-counting system.

Wire gauge illustration (credit: Wikipedia).

The mapping from gauge numbers to diameters has two fixed reference points. At the thin end of the scale, \(36\)-gauge wire is defined as having a diameter of exactly \(0.005\) inch. At the stout end, a wire size designated \(0000\)-gauge is assigned a diameter of \(0.46\) inch. This quadruple-zero gauge is three steps larger than \(0\)-gauge (and might more sensibly be named \(–3\)-gauge). Thus there are \(40\) integer-valued gauge numbers, and \(39\) steps between them. The extreme values are known, and we need to devise some interpolation process to assign diameters to all the gauges between \(36\) and \(0000\).

The wire-drawing process itself suggests how to do this. Each pass through a die reduces the wire diameter to some fraction of its former size, but the value of the fraction might vary a little from one die to the next. The standard simply decrees that the fraction is exactly the same in all cases. In other words, for every pair of adjacent gauge numbers, the corresponding wire diameters have the same ratio, \(R\).

What remains is to work out the value of \(R\). If we start with \(d_{36} = 0.005\) and multiply by \(R\), we’ll get \(d_{35}\); then, multiplying \(d_{35}\) by \(R\) yields \(d_{34}\), and so on. Continuing in this way, after multiplying by \(R\) \(39\) times, we should arrive at \(d_{-3} = 0.46.\) This iterative process can be summarized as:

\[\frac{d_{-3}}{d_{36}} = R^{39}.\]

Filling in the numeric values, we get:

\[\frac{0.46}{0.005} = 92 = R^{39}, \quad \textrm{and thus}\quad R = \sqrt[39]{92}.\]

And there the number lies before us, the \(39\)th root of \(92\). The numerical value is about \(1.122932\), with \(1/R \approx 0.890526\).

With this fact in hand we can now write down a formula that gives the AWG gauge number \(G\) as a function of wire diameter \(d\) in inches:

\[G(d) = -39 \log_{92} \frac{d}{0.005} + 36.\]

That’s a fairly bizarre-looking formula, with base-92 logarithms and a bunch of arbitrary constants floating around. On the other hand, at least it’s a genuine mathematical function, with a domain covering all the positive real numbers. It’s also smooth and invertible. That’s more than you can say for some other standards, such as the British Imperial Wire Gauge, which pastes together several piecewise linear segments.
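
As a sanity check, here is a small Julia sketch of these relations (my own illustration, not part of any standard; the names awg_diameter and awg_gauge are just labels, and the outputs are shown rounded):

# A quick numerical check of the AWG relations described above

R = 92^(1/39)                          # ratio of adjacent gauge diameters, ≈ 1.122932

# diameter in inches of gauge g (g = -3 stands for 0000)
awg_diameter(g) = 0.005 * R^(36 - g)

# gauge number as a function of diameter d in inches
awg_gauge(d) = -39 * log(92, d / 0.005) + 36

awg_diameter(36) ⇒ 0.005
awg_diameter(-3) ⇒ 0.46
awg_diameter(14) ⇒ 0.0641    # the familiar house-wiring size
awg_gauge(0.46) ⇒ -3.0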

Who came up with the rule of \(\sqrt[39]{92}\)? As far as I can tell it was Lucian Sharpe, of Brown and Sharpe, a maker of precision instruments and machine tools in Providence, Rhode Island. A history of the company published in 1949 gives this account:

Another activity begun in the [1850s] was the production of accurate gages. The brass business of Connecticut, centered in the Naugatuck Valley, required sheet metal and wire gauges for measuring their products. Mr. Sharpe, with his methodical mind, conceived the idea of producing sizes of wire in a regular progression, choosing a geometric series as best adapted to these needs. Such gages as were in use prior to this time were the product of English manufacture and were very irregular in their sizes.

The first Brown and Sharpe wire gauge was produced in 1857 and later became the basis of an American standard, which is now administered by ASTM.

Wire gauges are not the only numbers defined by a weird-and-wonderful root-taking procedure. The equal-tempered scale of music theory is based on the 12th root of 2. A musical octave represents a doubling of frequency, and the scale divides this interval into 12 semitones. In the equal-tempered version of the scale, any two adjacent semitones differ by a ratio of \(\sqrt[12]{2}\), or about \(1.05946\). It’s worth noting that instruments were being tuned to this scale well before the invention of logarithms. I assume it was done by ear or perhaps by geometry, not by algebra. Around 1600 Simon Stevin did attempt to calculate numerical values for the pitch intervals by decomposing 12th roots into combinations of square and cube roots; his results were not flawless. What would he have done with 39th roots?
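
For comparison with the wire-gauge arithmetic, here is the same kind of calculation for the equal-tempered scale, a sketch of my own that takes the A440 reference pitch as an assumed starting point (outputs rounded):

semitone = 2^(1/12)                        # ≈ 1.059463
a4 = 440.0                                 # frequency of A above middle C, in hertz
octave = [a4 * semitone^k for k in 0:12]   # thirteen pitches spanning one octave

octave[1] ⇒ 440.0
octave[13] ⇒ 880.0    # twelve equal steps double the frequency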

Another example of a backward-running logarithmic progression is the magnitude scale for the brightness of stars and other celestial objects. For the astronomers, the magic number is the fifth root of \(100\), or about \(2.511886\); if two stars differ by one unit of magnitude, this is their brightness ratio. A difference of five magnitudes therefore works out to a hundredfold brightness ratio. Brighter bodies have smaller magnitudes. The star Vega defines magnitude \(0\); the sun has magnitude \(-27\); the faintest stars visible without a telescope are at magnitude \(6\) or \(7\).
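
In code, the Pogson rule is a one-liner (again a sketch of my own, with brightness_ratio as an ad hoc name and outputs rounded):

brightness_ratio(Δm) = 100^(Δm / 5)    # brightness ratio for a magnitude difference Δm

brightness_ratio(1) ⇒ 2.511886
brightness_ratio(5) ⇒ 100.0
brightness_ratio(27) ⇒ 6.3e10    # roughly how much brighter the sun is than Vega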

The idea of stellar magnitudes is ancient, but the numerical scheme in current use was developed by the British/Indian astronomer N. R. Pogson in 1856. That was just a year before Sharpe came up with his wire gauge scale. Could there be a connection? It would make a nice story if we could find some timely account of Pogson’s work that Sharpe might plausibly have read (maybe in Scientific American, founded 1845), but that’s a pure flight of fancy for now.

And what about those shotguns? Are their gauges also governed by some sort of logarithmic law? No, the numerical similarity of gauges for wires and shotguns really is nothing but coincidence. The shotgun law is not logarithmic but reciprocal. Wikipedia explains:

The gauge of a firearm is a unit of measurement used to express the diameter of the barrel. Gauge is determined from the weight of a solid sphere of lead that will fit the bore of the firearm, and is expressed as the multiplicative inverse of the sphere’s weight as a fraction of a pound, e.g., a one-twelfth pound ball fits a 12-gauge bore. Thus there are twelve 12-gauge balls per pound, etc. The term is related to the measurement of cannon, which were also measured by the weight of their iron round shot; an 8 pounder would fire an 8 lb (3.6 kg) ball.
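
As a rough check on that reciprocal rule, here is a short calculation of my own; the density of lead (about 11.34 g/cm³) and the unit conversions are constants I am supplying for the illustration, not part of the quoted definition:

# bore diameter in inches of an n-gauge shotgun: the diameter of a
# lead sphere weighing 1/n pound
function shotgun_bore(gauge)
    grams = 453.59 / gauge               # mass of one ball, in grams
    volume = grams / 11.34               # volume in cm³, using the density of lead
    radius = (3 * volume / (4π))^(1/3)   # radius of the sphere, in cm
    return 2 * radius / 2.54             # diameter converted to inches
end

shotgun_bore(12) ⇒ 0.73    # the nominal 12-gauge bore is 0.729 inch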

Addendum 2016-08-08: Leon Harkleroad has brought to my attention his excellent article on “Tuning with Triangles” (College Mathematics Journal, Vol. 39, No. 5 (Nov. 2008), pp. 367–373). He describes a simple geometric procedure that Vincenzo Galilei (father of Galileo) used for fretting stringed instruments. In essence it takes \(18/17 \approx 1.05882\) as an approximation to \(\sqrt[12]{2} \approx 1.05946\).


Getting to Know Julia

When I first started playing with computers, back in the Pleistocene, writing a few lines of code and watching the machine carry out my instructions was enough to give me a little thrill. “Look! It can count to 10!” Today, learning a new programming language is my way of reviving that sense of wonder.

Lately I’ve been learning Julia, which describes itself as “a high-level, high-performance dynamic programming language for technical computing.” I’ve been dabbling with Julia for a couple of years, but this spring I completed my first serious project—my first attempt to do something useful with the language. I wrote a bunch of code to explore correlations between consecutive prime numbers mod m, inspired by the recent paper of Lemke Oliver and Soundararajan. The code from that project, wrapped up in a Jupyter notebook, is available on GitHub. A bit-player article published last month presented the results of the computation. Here I want to say a few words about my experience with the language.

I also discussed Julia a year ago, on the occasion of JuliaCon 2015, the annual gathering of the Julia community. Parts of this post were written at JuliaCon 2016, held at MIT June 21–25.

This document is itself derived from a Jupyter notebook, although it has been converted to static HTML—meaning that you can only read it, not run the code examples. (I’ve also done some reformatting to save space and improve appearance.) If you have a working installation of Julia and Jupyter, I suggest you download the fully interactive notebook from GitHub, so that you can edit and run the code. If you haven’t installed Julia, download the notebook anyway. You can upload it to a hosted Julia notebook service and run it online.

What Is Julia?

A 2014 paper by the founders of the Julia project sets an ambitious goal. To paraphrase: There are languages that make programming quick and easy, and there are languages that make computation fast and efficient. The two sets of languages have long been disjoint. Julia aims to fix that. It offers high-level programming, where algorithms are expressed succinctly and without much fussing over data types and memory allocation. But it also strives to match the performance of lower-level languages such as Fortran and C. Achieving these dual goals requires attention both to the design of the language itself and to the implementation.

The Julia project was initiated by a small group at MIT, including Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah, and has since attracted about 500 contributors. The software is open source, hosted at GitHub.

Who Am I?

In my kind of programming, the end product is not the program itself but the answers it computes. As a result, I favor an incremental and interactive style. I want to write and run small chunks of code—typically individual procedures—without having to build a lot of scaffolding to support them. For a long time I worked mainly in Lisp, although in recent years I’ve also written a fair amount of Python and JavaScript. I mention all this because my background inevitably colors my judgment of programming languages and environments. If you’re a software engineer on a large team building large systems, your needs and priorities are surely different from mine.

What Julia Code Looks Like

Enough generalities. Here’s a bit of Julia code, plucked from my recent project:

# return the next prime after x

# define the function...
function next_prime(x)
    x += iseven(x) ? 1 : 2
    while !isprime(x)
        x += 2
    end
    return x
end

# and now invoke it
next_prime(10000000) ⇒ 10000019

Given an integer x, we add 1 if x is even and otherwise add 2, thereby ensuring that the new value of x is odd. Now we repeatedly check the isprime(x) predicate, or rather its negation !isprime(x). As long as x is not a prime number, we continue incrementing by 2; when x finally lands on a prime value (it must do so eventually, according to Euclid), the while loop terminates and the function returns the prime value of x.

In this code snippet, note the absence of punctuation: No semicolons separate the statements, and no curly braces delimit the function body. The syntax owes a lot to MATLAB—most conspicuously the use of the end keyword to mark the end of a block, without any corresponding begin. The resemblance to MATLAB is more than coincidence; some of the key people in the Julia project have a background in numerical analysis and linear algebra, where MATLAB has long been a standard tool. One of those people is Alan Edelman. I remember asking him, 15 years ago, “How do you compute the eigenvalues of a matrix?” He explained: “You start up MATLAB and type eig.” The same incantation works in Julia.

The sparse punctuation and the style of indentation also make Julia code look a little like Python, but in this case the similarity is only superficial. In Python, indentation determines the scope of statements, but in Julia indentation is merely an aid to the human reader; the program would compute the same result even if all the lines were flush left.

More Basics

The next_prime function above shows a while loop. Here’s a for loop:

# for-loop factorial

function factorial_for(n)
    fact = 1
    for i in 1:n         # i takes each value in 1, 2, ..., n
        fact *= i        # equivalent to fact = fact * i
    end
    return fact
end

factorial_for(10) ⇒ 3628800

The same idea can be expressed more succinctly and idiomatically:

# factorial via range product

function factorial_range(n)
    return prod(1:n)
end

factorial_range(11) ⇒ 39916800

Here 1:n denotes the numeric range \(1, 2, 3, …, n\), and the prod function returns the product of all the numbers in the range. Still too verbose? Using an abbreviated style of function definition, it becomes a true one-liner:

fact_range(n) = prod(1:n)

fact_range(12) ⇒ 479001600


As an immigrant from the land of Lisp, I can’t in good conscience discuss the factorial function without also presenting a recursive version:

# recursive definition of factorial

function factorial_rec(n)
    if n < 2
        return 1
    else
        return n * factorial_rec(n - 1)
    end
end

factorial_rec(13) ⇒ 6227020800

You can save some keystrokes by replacing the if... else... end statement with a ternary operator. The syntax is

      predicate ? consequent : alternative.

If predicate evaluates to true, the expression returns the value of consequent; otherwise it returns the value of alternative. With this device, the recursive factorial function looks like this:

# recursive factorial with ternary operator

function factorial_rec_ternary(n)
    n < 2 ? 1 : n * factorial_rec_ternary(n - 1)
end

factorial_rec_ternary(14) ⇒ 87178291200

One sour note on the subject of recursion: The Julia compiler does not perform tail-call conversion, which endows recursive procedure definitions with the same space and time efficiency as iterative structures such as while loops. Consider this pair of daisy-plucking procedures:

function she_loves_me(petals)
    petals == 0 ? true : she_loves_me_not(petals - 1)
end

function she_loves_me_not(petals)
    petals == 0 ? false : she_loves_me(petals - 1)
end

she_loves_me(1001) ⇒ false

These are mutually recursive definitions: Control passes back and forth between them until one or the other terminates when the petals are exhausted. The function works correctly on small values of petals. But let’s try a more challenging exercise:


LoadError: StackOverflowError:
while loading In[8], in expression starting on line 1

 in she_loves_me at In[7]:2 (repeats 80000 times)

The stack overflows because Julia is pushing a procedure-call record onto the stack for every invocation of either she_loves_me or she_loves_me_not. These records will never be needed; when the answer (true or false) is finally determined, it can be returned directly to the top level, without having to percolate up through the cascade of stack frames. The technique for eliminating such unneeded stack frames has been well known for more than 30 years. Implementing tail-call optimization has been discussed by the Julia developers but is “not a priority.” For me this is not a catastrophic defect, but it’s a disappointment. It rules out a certain style of reasoning that in some cases is the most direct way of expressing mathematical ideas.
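
In practice the workaround is to fall back on explicit iteration. Here is a sketch of my own (not from the original notebook) of the same daisy-plucking logic written as a loop, which runs in constant stack space:

function she_loves_me_loop(petals)
    answer = true              # zero petals means "she loves me"
    while petals > 0
        petals -= 1
        answer = !answer       # each plucked petal flips the answer
    end
    return answer
end

she_loves_me_loop(1001) ⇒ false
she_loves_me_loop(80000000) ⇒ true    # no stack overflow this time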

Array Comprehensions

Given Julia’s heritage in linear algebra, it’s no surprise that there are rich facilities for describing matrices and other array-like objects. One handy tool is the array comprehension, which is written as a pair of square brackets surrounding a description of what the compiler should compute to build the array.

A = [x^2 for x in 1:5] ⇒ [1, 4, 9, 16, 25]

The idea of comprehensions comes from the set-builder notation of mathematics. It was adopted in the programming language SETL in the 1960s, and it has since wormed its way into several other languages, including Haskell and Python. But Julia’s comprehensions are unusual: Whereas Python comprehensions are limited to one-dimensional vectors or lists, Julia comprehensions can specify multidimensional arrays.

B = [x^y for x in 1:3, y in 1:3] ⇒

3x3 Array{Int64,2}:
 1  1   1
 2  4   8
 3  9  27

Multidimensionality comes at a price. A Python comprehension can have a filter clause that selects some of the array elements and excludes others. If Julia had such comprehension filters, you could generate an array of prime numbers with an expression like this: [x for x in 1:100 if isprime(x)]. But adding filters to multidimensional arrays is problematic; you can’t just pluck values out of a matrix and keep the rest of the structure intact. Nevertheless, it appears a solution is in hand. After three years of discussion, a fix has recently been added to the code base. (I have not yet tried it out.)
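
In the meantime, the usual workaround is to filter after the fact. A minimal sketch (isprime lives in Base for the Julia version discussed in this post, and in the Primes package in later versions; the variable name is mine):

primes_to_100 = filter(isprime, collect(1:100))

length(primes_to_100) ⇒ 25
primes_to_100[1:5] ⇒ [2, 3, 5, 7, 11]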

Multiple Dispatch

This sounds like something the 911 operator might do in response to a five-alarm fire, but in fact multiple dispatch is a distinctive core idea in Julia. Understanding what it is and why you might care about it requires a bit of context.

In almost all programming languages, an expression along the lines of x * y multiplies x by y, whether x and y are integers, floating-point numbers, or maybe even complex numbers or matrices. All of these operations qualify as multiplication, but the underlying algorithms—the ways the bits are twiddled to compute the product—are quite different. Thus the * operator is polymorphic: It’s a single symbol that evokes many different actions. One way to handle this situation is to write a single big function—call it mul(x, y)—that has a chain of if... then... else statements to select the right multiplication algorithm for each x, y pair. If you want to multiply all possible combinations of \(n\) kinds of numbers, the if statement needs \(n^2\) branches. Maintaining that monolithic mul procedure can become a headache. Suppose you want to add a new type of number, such as quaternions. You would have to patch in new routines throughout the existing mul code.

When object-oriented programming (OOP) swept through the world of computing in the 1980s, it offered an alternative. In an object-oriented setting, each class of numbers has its own set of multiplication procedures, or methods. Instead of one universal mul function, there’s a mul method for integers, another mul for floats, and so on. To multiply x by y, you don’t write mul(x, y) but rather something like x.mul(y). The object-oriented programmer thinks of this protocol as sending a message to the x object, asking it to apply its mul method to the y argument. You can also describe the scheme as a single-dispatch mechanism. The dispatcher is a behind-the-scenes component that keeps track of all methods with the same name, such as mul, and chooses which of those methods to invoke based on the data type of the single argument x. (The x object still needs some internal mechanism to examine the type of y to know which of its own methods to apply.)

Multiple dispatch is a generalization of single dispatch. The dispatcher looks at all the arguments and their types, and chooses the method accordingly. Once that decision is made, no further choices are needed. There might be separate mul methods for combining two integers, for an integer and a float, for a float and a complex, etc. Julia has 138 methods linked to the multiplication operator *, including a few mildly surprising combinations. (Quick, what’s the value of false * 42?)

# Typing "methods(f)" for any function f, or for an operator
# such as "*", produces a list of all the methods and their
# type signatures. The links take you to the source code.

138 methods for generic function *:
  • *(x::Bool, y::Bool) at bool.jl:38
  • *{T<:Unsigned}(x::Bool, y::T<:Unsigned) at bool.jl:53
  • *(x::Bool, z::Complex{Bool}) at complex.jl:122
  • *(x::Bool, z::Complex{T<:Real}) at complex.jl:129
  • *{T<:Number}(x::Bool, y::T<:Number) at bool.jl:49
  • *(x::Float32, y::Float32) at float.jl:211
  • *(x::Float64, y::Float64) at float.jl:212
  • *(z::Complex{T<:Real}, w::Complex{T<:Real}) at complex.jl:113
  • *(z::Complex{Bool}, x::Bool) at complex.jl:123
  • *(z::Complex{T<:Real}, x::Bool) at complex.jl:130
  • *(x::Real, z::Complex{Bool}) at complex.jl:140
  • *(z::Complex{Bool}, x::Real) at complex.jl:141
  • *(x::Real, z::Complex{T<:Real}) at complex.jl:152
  • *(z::Complex{T<:Real}, x::Real) at complex.jl:153
  • *(x::Rational{T<:Integer}, y::Rational{T<:Integer}) at rational.jl:186
  • *(a::Float16, b::Float16) at float16.jl:136
  • *{N}(a::Integer, index::CartesianIndex{N}) at multidimensional.jl:50
  • *(x::BigInt, y::BigInt) at gmp.jl:256
  • *(a::BigInt, b::BigInt, c::BigInt) at gmp.jl:279
  • *(a::BigInt, b::BigInt, c::BigInt, d::BigInt) at gmp.jl:285
  • *(a::BigInt, b::BigInt, c::BigInt, d::BigInt, e::BigInt) at gmp.jl:292
  • *(x::BigInt, c::Union{UInt16,UInt32,UInt64,UInt8}) at gmp.jl:326
  • *(c::Union{UInt16,UInt32,UInt64,UInt8}, x::BigInt) at gmp.jl:330
  • *(x::BigInt, c::Union{Int16,Int32,Int64,Int8}) at gmp.jl:332
  • *(c::Union{Int16,Int32,Int64,Int8}, x::BigInt) at gmp.jl:336
  • *(x::BigFloat, y::BigFloat) at mpfr.jl:208
  • *(x::BigFloat, c::Union{UInt16,UInt32,UInt64,UInt8}) at mpfr.jl:215
  • *(c::Union{UInt16,UInt32,UInt64,UInt8}, x::BigFloat) at mpfr.jl:219
  • *(x::BigFloat, c::Union{Int16,Int32,Int64,Int8}) at mpfr.jl:223
  • *(c::Union{Int16,Int32,Int64,Int8}, x::BigFloat) at mpfr.jl:227
  • *(x::BigFloat, c::Union{Float16,Float32,Float64}) at mpfr.jl:231
  • *(c::Union{Float16,Float32,Float64}, x::BigFloat) at mpfr.jl:235
  • *(x::BigFloat, c::BigInt) at mpfr.jl:239
  • *(c::BigInt, x::BigFloat) at mpfr.jl:243
  • *(a::BigFloat, b::BigFloat, c::BigFloat) at mpfr.jl:379
  • *(a::BigFloat, b::BigFloat, c::BigFloat, d::BigFloat) at mpfr.jl:385
  • *(a::BigFloat, b::BigFloat, c::BigFloat, d::BigFloat, e::BigFloat) at mpfr.jl:392
  • *{T<:Number}(x::T<:Number, D::Diagonal{T}) at linalg/diagonal.jl:89
  • *(x::Irrational{sym}, y::Irrational{sym}) at irrationals.jl:72
  • *(y::Real, x::Base.Dates.Period) at dates/periods.jl:55
  • *(x::Number) at operators.jl:74
  • *(y::Number, x::Bool) at bool.jl:55
  • *(x::Int8, y::Int8) at int.jl:19
  • *(x::UInt8, y::UInt8) at int.jl:19
  • *(x::Int16, y::Int16) at int.jl:19
  • *(x::UInt16, y::UInt16) at int.jl:19
  • *(x::Int32, y::Int32) at int.jl:19
  • *(x::UInt32, y::UInt32) at int.jl:19
  • *(x::Int64, y::Int64) at int.jl:19
  • *(x::UInt64, y::UInt64) at int.jl:19
  • *(x::Int128, y::Int128) at int.jl:456
  • *(x::UInt128, y::UInt128) at int.jl:457
  • *{T<:Number}(x::T<:Number, y::T<:Number) at promotion.jl:212
  • *(x::Number, y::Number) at promotion.jl:168

  • *{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},S} (A::Union{DenseArray{T<:Union{Complex{Float32}, Complex{Float64},Float32,Float64},2}, SubArray{T<:Union{Complex{Float32}, Complex{Float64}, Float32,Float64},2, A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, x::Union{DenseArray{S,1}, SubArray{S,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/matmul.jl:82
  • *(A::SymTridiagonal{T}, B::Number) at linalg/tridiag.jl:86
  • *(A::Tridiagonal{T}, B::Number) at linalg/tridiag.jl:406
  • *(A::UpperTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:454
  • *(A::Base.LinAlg.UnitUpperTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:457
  • *(A::LowerTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:454
  • *(A::Base.LinAlg.UnitLowerTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:457
  • *(A::Tridiagonal{T}, B::UpperTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Tridiagonal{T}, B::Base.LinAlg.UnitUpperTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Tridiagonal{T}, B::LowerTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Tridiagonal{T}, B::Base.LinAlg.UnitLowerTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}, B::Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:975
  • *{TA,TB}(A::Base.LinAlg.AbstractTriangular{TA,S<:AbstractArray{T,2}}, B:: Union{DenseArray{TB,1},DenseArray{TB,2}, SubArray{TB,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}, SubArray{TB,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/triangular.jl:989
  • *{TA,TB}(A:: Union{DenseArray{TA,1},DenseArray{TA,2}, SubArray{TA,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}, SubArray{TA,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, B::Base.LinAlg.AbstractTriangular{TB,S<:AbstractArray{T,2}}) at linalg/triangular.jl:1016
  • *{TA,Tb}(A::Union{Base.LinAlg.QRCompactWYQ{TA,M<:AbstractArray{T,2}}, Base.LinAlg.QRPackedQ{TA,S<:AbstractArray{T,2}}}, b:: Union{DenseArray{Tb,1}, SubArray{Tb,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/qr.jl:166
  • *{TA,TB}(A::Union{Base.LinAlg.QRCompactWYQ{TA,M<:AbstractArray{T,2}}, Base.LinAlg.QRPackedQ{TA,S<:AbstractArray{T,2}}}, B:: Union{DenseArray{TB,2},SubArray{TB,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/qr.jl:178
  • *{TA,TQ,N}(A:: Union{DenseArray{TA,N}, SubArray{TA,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, Q::Union{Base.LinAlg.QRCompactWYQ{TQ,M<:AbstractArray{T,2}}, Base.LinAlg.QRPackedQ{TQ,S<:AbstractArray{T,2}}}) at linalg/qr.jl:253
  • *(A::Union{Hermitian{T,S},Symmetric{T,S}}, B::Union{Hermitian{T,S},Symmetric{T,S}}) at linalg/symmetric.jl:117
  • *(A::Union{DenseArray{T,2}, SubArray{T,2, A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, B::Union{Hermitian{T,S},Symmetric{T,S}}) at linalg/symmetric.jl:118
  • *{T<:Number}(D::Diagonal{T}, x::T<:Number) at linalg/diagonal.jl:90
  • *(Da::Diagonal{T}, Db::Diagonal{T}) at linalg/diagonal.jl:92
  • *(D::Diagonal{T}, V::Array{T,1}) at linalg/diagonal.jl:93
  • *(A::Array{T,2}, D::Diagonal{T}) at linalg/diagonal.jl:94
  • *(D::Diagonal{T}, A::Array{T,2}) at linalg/diagonal.jl:95
  • *(A::Bidiagonal{T}, B::Number) at linalg/bidiag.jl:192
  • *(A::Union{Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}, Bidiagonal{T}, Diagonal{T}, SymTridiagonal{T}, Tridiagonal{T}}, B::Union{Base.LinAlg.AbstractTriangular{T, S<:AbstractArray{T,2}}, Bidiagonal{T},Diagonal{T}, SymTridiagonal{T},Tridiagonal{T}}) at linalg/bidiag.jl:198
  • *{T}(A::Bidiagonal{T}, B::AbstractArray{T,1}) at linalg/bidiag.jl:202
  • *(B::BitArray{2}, J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:122
  • *{T,S}(s::Base.LinAlg.SVDOperator{T,S}, v::Array{T,1}) at linalg/arnoldi.jl:261
  • *(S::SparseMatrixCSC{Tv,Ti<:Integer}, J::UniformScaling{T<:Number}) at sparse/linalg.jl:23
  • *{Tv,Ti}(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixCSC{Tv,Ti}) at sparse/linalg.jl:108
  • *{TvA,TiA,TvB,TiB}(A::SparseMatrixCSC{TvA,TiA}, B::SparseMatrixCSC{TvB,TiB}) at sparse/linalg.jl:29
  • *{TX,TvA,TiA}(X:: Union{DenseArray{TX,2}, SubArray{TX,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, A::SparseMatrixCSC{TvA,TiA}) at sparse/linalg.jl:94
  • *(A::Base.SparseMatrix.CHOLMOD.Sparse {Tv<:Union{Complex{Float64},Float64}}, B:: Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Union{Complex{Float64}, Float64}}) at sparse/cholmod.jl:1157
  • *(A::Base.SparseMatrix.CHOLMOD.Sparse {Tv<:Union{Complex{Float64},Float64}}, B:: Base.SparseMatrix.CHOLMOD.Dense {T<:Union{Complex{Float64},Float64}}) at sparse/cholmod.jl:1158
  • *(A::Base.SparseMatrix.CHOLMOD.Sparse {Tv<:Union{Complex{Float64},Float64}}, B:: Union{Array{T,1},Array{T,2}}) at sparse/cholmod.jl:1159
  • *{Ti}(A::Symmetric{Float64,SparseMatrixCSC{Float64,Ti}}, B::SparseMatrixCSC{Float64,Ti}) at sparse/cholmod.jl:1418
  • *{Ti}(A::Hermitian{Complex{Float64}, SparseMatrixCSC{Complex{Float64},Ti}}, B::SparseMatrixCSC{Complex{Float64},Ti}) at sparse/cholmod.jl:1419
  • *{T<:Number}(x::AbstractArray{T<:Number,2}) at abstractarraymath.jl:50
  • *(B::Number, A::SymTridiagonal{T}) at linalg/tridiag.jl:87
  • *(B::Number, A::Tridiagonal{T}) at linalg/tridiag.jl:407
  • *(x::Number, A::UpperTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:464
  • *(x::Number, A::Base.LinAlg.UnitUpperTriangular{T, S<:AbstractArray{T,2}}) at linalg/triangular.jl:467
  • *(x::Number, A::LowerTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:464
  • *(x::Number, A::Base.LinAlg.UnitLowerTriangular{T, S<:AbstractArray{T,2}}) at linalg/triangular.jl:467
  • *(B::Number, A::Bidiagonal{T}) at linalg/bidiag.jl:193
  • *(A::Number, B::AbstractArray{T,N}) at abstractarraymath.jl:54
  • *(A::AbstractArray{T,N}, B::Number) at abstractarraymath.jl:55
  • *(s1::AbstractString, ss::AbstractString…) at strings/basic.jl:50
  • *(this::Base.Grisu.Float, other::Base.Grisu.Float) at grisu/float.jl:138
  • *(index::CartesianIndex{N}, a::Integer) at multidimensional.jl:54
  • *{T,S}(A::AbstractArray{T,2}, B::Union{DenseArray{S,2}, SubArray{S,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/matmul.jl:131
  • *{T,S}(A::AbstractArray{T,2}, x::AbstractArray{S,1}) at linalg/matmul.jl:86
  • *(A::AbstractArray{T,1}, B::AbstractArray{T,2}) at linalg/matmul.jl:89
  • *(J1::UniformScaling{T<:Number}, J2::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:121
  • *(J::UniformScaling{T<:Number}, B::BitArray{2}) at linalg/uniformscaling.jl:123
  • *(A::AbstractArray{T,2}, J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:124
  • *{Tv,Ti}(J::UniformScaling{T<:Number}, S::SparseMatrixCSC{Tv,Ti}) at sparse/linalg.jl:24
  • *(J::UniformScaling{T<:Number}, A::Union{AbstractArray{T,1},AbstractArray{T,2}}) at linalg/uniformscaling.jl:125
  • *(x::Number, J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:127
  • *(J::UniformScaling{T<:Number}, x::Number) at linalg/uniformscaling.jl:128
  • *{T,S}(R::Base.LinAlg.AbstractRotation{T}, A::Union{AbstractArray{S,1},AbstractArray{S,2}}) at linalg/givens.jl:9
  • *{T}(G1::Base.LinAlg.Givens{T}, G2::Base.LinAlg.Givens{T}) at linalg/givens.jl:307
  • *(p::Base.DFT.ScaledPlan{T,P,N}, x::AbstractArray{T,N}) at dft.jl:262
  • *{T,K,N}(p::Base.DFT.FFTW.cFFTWPlan{T,K,false,N}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:621
  • *{T,K}(p::Base.DFT.FFTW.cFFTWPlan{T,K,true,N}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:628
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Float32,-1,false,N}, x:: Union{DenseArray{Float32,N}, SubArray{Float32,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:698
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Complex{Float32},1,false,N}, x::Union{DenseArray{Complex{Float32},N}, SubArray{Complex{Float32},N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:705
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Float64,-1,false,N}, x:: Union{DenseArray{Float64,N}, SubArray{Float64,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:698
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Complex{Float64},1,false,N}, x:: Union{DenseArray{Complex{Float64},N}, SubArray{Complex{Float64},N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:705
  • *{T,K,N}(p::Base.DFT.FFTW.r2rFFTWPlan{T,K,false,N}, x:: Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:866
  • *{T,K}(p::Base.DFT.FFTW.r2rFFTWPlan{T,K,true,N}, x:: Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:873
  • *{T}(p::Base.DFT.FFTW.DCTPlan{T,5,false}, x:: Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/dct.jl:188
  • *{T}(p::Base.DFT.FFTW.DCTPlan{T,4,false}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/dct.jl:191
  • *{T,K}(p::Base.DFT.FFTW.DCTPlan{T,K,true}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/dct.jl:194
  • *{T}(p::Base.DFT.Plan{T}, x::AbstractArray{T,N}) at dft.jl:221
  • *(α::Number, p::Base.DFT.Plan{T}) at dft.jl:264
  • *(p::Base.DFT.Plan{T}, α::Number) at dft.jl:265
  • *(I::UniformScaling{T<:Number}, p::Base.DFT.ScaledPlan{T,P,N}) at dft.jl:266
  • *(p::Base.DFT.ScaledPlan{T,P,N}, I::UniformScaling{T<:Number}) at dft.jl:267
  • *(I::UniformScaling{T<:Number}, p::Base.DFT.Plan{T}) at dft.jl:268
  • *(p::Base.DFT.Plan{T}, I::UniformScaling{T<:Number}) at dft.jl:269
  • *{P<:Base.Dates.Period}(x::P<:Base.Dates.Period, y::Real) at dates/periods.jl:54
  • *(a, b, c, xs…) at operators.jl:103

Before Julia came along, the best-known example of multiple dispatch was the Common Lisp Object System, or CLOS. As a Lisp guy, I’m familiar with CLOS, but I’ve seldom found it useful; it’s too heavy a hammer for most of my little nails. But whereas CLOS is an optional add-on to Lisp, multiple dispatch is deeply embedded in the infrastructure of Julia. You can’t really ignore it.

Multiple dispatch encourages a style of programming in which you write an abundance of short, single-purpose methods that handle specific combinations of argument types. This is surely an improvement over writing one huge function with an internal tangle of spaghetti logic to handle dozens of special cases. The Julia documentation offers this example:

# roll-your-own method dispatch

function norm(A)
    if isa(A, Vector)
        return sqrt(real(dot(A,A)))
    elseif isa(A, Matrix)
        return max(svd(A)[2])
    else
        error("norm: invalid argument")
    end
end

# let the compiler do method dispatch

norm(x::Vector) = sqrt(real(dot(x,x)))
norm(A::Matrix) = max(svd(A)[2])

# Note: The x::y syntax declares that variable x will have
# a value of type y.

Splitting the procedure into two methods makes the definition clearer and more concise, and it also promises better performance. There’s no need to crawl through the if... else... chain at runtime; the appropriate method is chosen at compile time.

That example seems pretty compelling. Edelman gives another illuminating demo, defining tropical arithmetic in a few lines of code. (In tropical arithmetic, plus is min and times is plus.)
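
Here is roughly what such a definition looks like. This is my own sketch, not Edelman's code; the wrapper type Trop is a name I made up, and I'm writing it in the newer struct syntax:

# Tropical numbers: a thin wrapper around a Float64 (Trop is a made-up name)
struct Trop
    val::Float64
end

Base.:+(a::Trop, b::Trop) = Trop(min(a.val, b.val))   # tropical plus is min
Base.:*(a::Trop, b::Trop) = Trop(a.val + b.val)       # tropical times is plus

Trop(2.0) + Trop(5.0) ⇒ Trop(2.0)

Trop(2.0) * Trop(5.0) ⇒ Trop(7.0)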

Unfortunately, my own attempts to structure code in this way have not been so successful. Take the next_prime(x) function shown at the beginning of this notebook. The argument x might be either an ordinary integer (some subtype of Int, such as Int64) or a BigInt, a numeric type accommodating integers of unbounded size. So I tried writing separate methods next_prime(x::Int) and next_prime(x::BigInt). The result, however, was annoying code duplication: The bodies of those two methods were identical. Furthermore, splitting the two methods yielded no performance gain; the compiler was able to produce efficient code without any type annotations at all.
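
The cleaner way out, I eventually realized, is a single method whose type annotation covers both cases. Something like this (a sketch with a stand-in body, not the notebook's actual definition):

using Primes   # isprime now lives in the Primes package; it was in Base in 0.4

# one method serves Int64, Int32, BigInt, and every other subtype of Integer
function next_prime(x::Integer)
    candidate = x + 1
    while !isprime(candidate)
        candidate += 1
    end
    return candidate
end

next_prime(10) ⇒ 11

next_prime(big(10)^20)   # returns a BigInt, no separate method needed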

I remain somewhat befuddled about how best to exploit multiple dispatch. Suppose you are creating a graphics program that can plot either data from an array or a mathematical function. You could define a generic plot procedure with two methods:

function plot(data::Array)
    # ... code for array plotting
end

function plot(fn::Function)
    # ... code for function plotting
end

With these definitions, calls to plot([2, 4, 6, 8]) and plot(sin) would automatically be routed to the appropriate method. However, making the two methods totally independent is not quite right either. They both need to deal with scales, grids, tickmarks, labels, and other graphics accoutrements. I’m still learning how to structure such code. No doubt it will become clearer with experience; in the meantime, advice is welcome.
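
One arrangement I've been tinkering with is to put the shared chores in an ordinary helper function that both methods call. This is just a sketch (the helper name and keyword arguments are my own inventions):

# shared graphics chores live in a plain function...
function setup_axes(xmin, xmax)
    # ... scales, grid, tickmarks, labels
end

# ...and each plot method handles only what is specific to its argument type
function plot(data::Array)
    setup_axes(1, length(data))
    # ... draw the points
end

function plot(fn::Function; xmin=0.0, xmax=1.0)
    setup_axes(xmin, xmax)
    # ... sample fn over [xmin, xmax] and draw the curve
end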

Multiple dispatch is not a general pattern-matching mechanism. It discriminates only on the basis of the type signature of a function’s arguments, not on the values of those arguments. In Haskell—a language that does have pattern matching—you can write a function definition like this:

factorial 0 = 1
factorial n = n * factorial (n - 1)

That won’t work in Julia. (Actually, there may be a baroque way to do it, but it’s not recommended.) (And I’ve just learned there’s a pattern-matching package.) (And another.)

Multiple dispatch vs. OOP

In CLOS, multiple dispatch is an adjunct to an object-oriented programming system; in Julia, multiple dispatch is an alternative to OOP, and perhaps a repudiation of it. OOP is a noun-centered protocol: Objects are things that own both data and procedures that act on the data. Julia is verb-centered: Procedures (or functions) are the active elements of a program, and they can be applied to any data, not just their own private variables. The Julia documentation notes:

Multiple dispatch is particularly useful for mathematical code, where it makes little sense to artificially deem the operations to “belong” to one argument more than any of the others: does the addition operation in x + y belong to x any more than it does to y? The implementation of a mathematical operator generally depends on the types of all of its arguments. Even beyond mathematical operations, however, multiple dispatch ends up being a powerful and convenient paradigm for structuring and organizing programs.

I agree about the peculiar asymmetry of asking x to add y to itself. And yet that’s only one small aspect of what object-oriented programming is all about. There are many occasions when it’s really handy to bundle up related code and data in an object, if only for the sake of tidiness. If I had been writing my primes program in an object-oriented language, I would have created a class of objects to hold various properties of a sequence of consecutive primes, and then instantiated that class for each sequence I generated. Some of the object fields would have been simple data, such as the length of the sequence. Others would have been procedures for calculating more elaborate statistics. Some of the data and procedures would have been private—accessible only from within the object.

No such facility exists in Julia, but there are other ways of addressing some of the same needs. Julia’s user-defined composite types look very much like objects in some respects; they even use the same dot notation for field access that you see in Java, JavaScript and Python. A type definition looks like this:

type Location
    latitude::Float64
    longitude::Float64
end

Given a variable of this type, such as Paris = Location(48.8566, 2.3522), you can access the two fields as Paris.latitude and Paris.longitude. You can even have a field of type Function inside a composite type, as in this declaration:

type Location
    latitude::Float64
    longitude::Float64
    distance_to_pole::Function
end

Then, if you store an appropriate function in that field, you can invoke it as Paris.distance_to_pole(). However, the function has no special access to the other fields of the type; in particular, it doesn’t know which Location it’s being called from, so it doesn’t work like an OOP method.

Julia also has modules, which are essentially private namespaces. Only variables that are explicitly marked as exported can be seen from outside a module. A module can encapsulate both code and data, and so it is probably the closest approach to objects as a mechanism of program organization. But there’s still nothing like the this or self keyword of true object-oriented languages.
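
A minimal module, just to show the shape of the thing (the module and function names here are my own inventions):

module PrimeStats
export mean_gap

const history = Int[]     # not exported: reachable only as PrimeStats.history

# average gap between consecutive entries of a sorted list of primes
mean_gap(ps) = sum(diff(ps)) / (length(ps) - 1)

end

PrimeStats.mean_gap([2, 3, 5, 7, 11]) ⇒ 2.25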

Is it brave or foolish, at this point in the history of computer science, to turn your back on the object-oriented paradigm and try something else? It’s surely bold. Two or three generations of programmers have grown up on a steady diet of OOP. On the other hand, we have not yet found the One True Path to software wisdom, so there’s surely room for exploring other corners of the ecosystem.


A language for “technical computing” had better have strong numerical abilities. Julia has the full gamut of numeric types, with integers of various fixed sizes (from 8 bits to 128 bits), both signed and unsigned, and a comparable spectrum of floating-point precisions. There are also BigInts and BigFloats that expand to accommodate numbers of any size (up to the bounds of available memory). And you can do arithmetic with complex numbers and exact rationals.

This is a well-stocked toolkit, suitable for many kinds of computations. However, the emphasis is on fast floating-point arithmetic with real or complex values, approximated at machine precision. If you want to work in number theory, combinatorics, or cryptography—fields that require exact integers and rational numbers of unbounded size—the tools are provided, but using them requires some extra contortions.

To my taste, the most elegant implementation of numeric types is found in the Scheme programming language, a dialect of Lisp. The Scheme numeric “tower” begins with integers at its base. The integers are a subset of the rational numbers, which in turn are a subset of the reals, which are a subset of the complex numbers. Alongside this hierarchy, and orthogonal to it, Scheme also distinguishes between exact and inexact numbers. Integers and rationals can be represented exactly in a computer system, but many real and complex values can only be approximated, usually by floating-point numbers. In doing basic arithmetic, the Scheme interpreter or compiler preserves exactness. This is easy when you’re adding, subtracting, or multiplying integers, since the result is also invariably an integer. With division of integers, Scheme returns an integer when possible (e.g., \(15 \div 3 = 5\)) and otherwise an exact rational (\(4 \div 6 = 2/3\)). Some Schemes go a step further and return exact results for operations such as square root, when it’s possible to do so (\(\sqrt{4/9} = 2/3\); but \(\sqrt{5} = 2.236068\); the latter is inexact). These are numbers you can count on; they mostly obey well-known principles of mathematics, such as \(\frac{m}{n} \times \frac{n}{m} = 1\).

Julia has the same tower of numeric types, but it is not so scrupulous about exact and inexact values. Consider this series of results:

15 / 3 ⇒ 5.0

4 / 6 ⇒ 0.6666666666666666

(15 / 17) * (17 / 15) ⇒ 0.9999999999999999

Even when the result of dividing two integers is another integer, Julia converts the quotient to a floating-point value (signaled by the presence of a decimal point). We also get floats instead of exact rationals for noninteger quotients. Because exactness is lost, mathematical identities become unreliable.
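
To be fair, the exactness is there if you ask for it explicitly, through the rational-number operator //:

15 // 3 ⇒ 5//1

4 // 6 ⇒ 2//3

(15 // 17) * (17 // 15) ⇒ 1//1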

I think I know why the Julia designers chose this path. It’s all about type stability. The compiler can produce faster code if the output type of a method is always the same. If division of integers could yield either an integer or a float, the result type would not be known until runtime. Using exact rationals rather than floats would impose a double penalty: the result type would still be unstable (an integer when the division comes out even, a rational otherwise), and rational arithmetic is much slower than (hardware-assisted) floating point.

Given the priorities of the Julia project, consistent floating-point quotients were probably the right choice. I’ll concede that point, but it still leaves me grumpy. I’m tempted to ask, “Why not just make all numbers floating point, the way JavaScript does?” Then all arithmetic operations would be type stable by default. (This is not a serious proposal. I deplore JavaScript’s all-float policy.)

Julia does offer the tools to build your own procedures for exact arithmetic. Here’s one way to conquer divide:

function xdiv(x::Int, y::Int)
    q = x // y                 # yields rational quotient
    den(q) == 1 ? num(q) : q   # return int if possible
end

xdiv(15, 5) ⇒ 3

xdiv(15, 9) ⇒ 5//3

My Int Overfloweth

Another numerical quirk gives me the heebie-jeebies. Let’s look again at one of those factorial functions defined above (it doesn’t matter which one).

# run the factorial function f on integers from 1 to limit

function test_factorial(f::Function, limit::Int)
    for n in 1:limit
        @printf("n = %d, f(n) = %d\n", n, f(n))
    end
end

test_factorial(factorial_range, 25) ⇒

n = 1, f(n) = 1
n = 2, f(n) = 2
n = 3, f(n) = 6
n = 4, f(n) = 24
n = 5, f(n) = 120
n = 6, f(n) = 720
n = 7, f(n) = 5040
n = 8, f(n) = 40320
n = 9, f(n) = 362880
n = 10, f(n) = 3628800
n = 11, f(n) = 39916800
n = 12, f(n) = 479001600
n = 13, f(n) = 6227020800
n = 14, f(n) = 87178291200
n = 15, f(n) = 1307674368000
n = 16, f(n) = 20922789888000
n = 17, f(n) = 355687428096000
n = 18, f(n) = 6402373705728000
n = 19, f(n) = 121645100408832000
n = 20, f(n) = 2432902008176640000
n = 21, f(n) = -4249290049419214848
n = 22, f(n) = -1250660718674968576
n = 23, f(n) = 8128291617894825984
n = 24, f(n) = -7835185981329244160
n = 25, f(n) = 7034535277573963776

Everything is copacetic through \(n = 20\), and then suddenly we enter an alternative universe where multiplying a bunch of positive integers gives a negative result. You can probably guess what’s happening here. The computation is being done with 64-bit, two’s-complement signed integers, with the most-significant bit representing the sign. When the magnitude of the number exceeds \(2^{63} - 1\), the bit pattern is interpreted as a negative value; then, at \(2^{64}\), it crosses zero and becomes positive again. Essentially, we’re doing arithmetic modulo \(2^{64}\), with an offset.
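
You can see the wraparound in miniature at the REPL: the largest 64-bit signed integer, plus one, lands on the smallest.

typemax(Int64) ⇒ 9223372036854775807

typemax(Int64) + 1 ⇒ -9223372036854775808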

When a number grows beyond the space allotted for it, something has to give. In some languages the overflowing integer is gracefully converted to a bignum format, so that it can keep growing without constraint. Most Lisps are in this family; so is Python. JavaScript, with its all-floating-point arithmetic, gives approximate answers beyond \(20!\), and beyond \(170!\) all results are deemed equal to Infinity. In several other languages, programs halt with an error message on integer overflow. C is one language that has the same wraparound behavior as Julia. (Actually, the C standard says that signed-integer overflow is “undefined,” which means the compiler can do anything it pleases; but the C compiler I just tested does a wraparound, and I think that’s the common policy.)

The Julia documentation includes a thorough and thoughtful discussion of integer overflow, defending the wraparound strategy by showing that all the alternatives are unacceptable for one reason or another. But even if you go along with that conclusion, you might still feel that wraparound is also unacceptable.

Casual disregard of integer overflow has produced some notably stinky bugs, causing everything from video-game glitches in Pac Man and Donkey Kong to the failure of the first Ariane 5 rocket launch. Dietz, Li, Regehr, and Adve have surveyed bugs of this kind in C programs, finding them even in a module called SafeInt that was specifically designed to protect against such errors. In my little prime-counting project with Julia, I stumbled over overflow problems several times, even after I understood exactly where the risks lay.

In certain other contexts Julia doesn’t play quite so fast and loose. It does runtime bounds checking on array references, throwing an error if you ask for element five of a four-element vector. But it also provides a macro (@inbounds) that allows self-confident daredevils to turn off this safeguard for the sake of speed. Perhaps someday there will be a similar option for integer overflows.

Until then, it’s up to us to write in any overflow checks we think might be prudent. The Julia developers themselves have done so in many places, including their built-in factorial function. See error message below.

test_factorial(factorial, 25) ⇒

n = 1, f(n) = 1
n = 2, f(n) = 2
n = 3, f(n) = 6
n = 4, f(n) = 24
n = 5, f(n) = 120
n = 6, f(n) = 720
n = 7, f(n) = 5040
n = 8, f(n) = 40320
n = 9, f(n) = 362880
n = 10, f(n) = 3628800
n = 11, f(n) = 39916800
n = 12, f(n) = 479001600
n = 13, f(n) = 6227020800
n = 14, f(n) = 87178291200
n = 15, f(n) = 1307674368000
n = 16, f(n) = 20922789888000
n = 17, f(n) = 355687428096000
n = 18, f(n) = 6402373705728000
n = 19, f(n) = 121645100408832000
n = 20, f(n) = 2432902008176640000

LoadError: OverflowError()
while loading In[19], in expression starting on line 1

 in factorial_lookup at combinatorics.jl:29
 in factorial at combinatorics.jl:37
 in test_factorial at In[18]:5
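
If you want the same protection in your own code, the checked-arithmetic functions in Base.Checked are one place to get it. Here's a sketch (safe_factorial is my name, not a library function):

# a factorial that refuses to wrap around: checked_mul throws an
# OverflowError instead of silently overflowing
function safe_factorial(n::Int)
    result = 1
    for k in 2:n
        result = Base.Checked.checked_mul(result, k)
    end
    return result
end

safe_factorial(20) ⇒ 2432902008176640000

safe_factorial(21)   # throws an OverflowError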

Julia is gaining traction in scientific computing, in finance, and other fields; it has even been adopted as the specification language for an aircraft collision-avoidance system. I do hope that everyone is being careful.

Batteries Installed

“Batteries included” is a slogan of the Python community, boasting that everything you need is provided in the box. And indeed Python comes with a large standard library, supplemented by a huge variety of user-contributed modules (the PyPI index lists about 85,000 of them). On the other hand, although batteries of many kinds are included, they are not installed by default. You can’t accomplish much in Python without first importing a few modules. Just to take a square root, you have to import math. If you want complex numbers, they’re in a different module, and rationals are in a third.

In contrast, Julia is refreshingly self-contained. The standard library has a wide range of functions, and they’re all immediately available, without tracking down and loading external files. There are also about a thousand contributed packages, but you can do quite a lot of useful computing without ever glancing at them. In this respect Julia reminds me of Mathematica, which also tries to put all the commonly needed tools at your fingertips.

A Julia sampler: you get sqrt(x) of course. Also cbrt(x) and hypot(x, y) and log(x) and exp(x). There’s a full deck of trig functions, both circular and hyperbolic, with a variant subset that take their arguments in degrees rather than radians. For combinatorialists we have the factorial function mentioned above, as well as binomial, and various kinds of permutations and combinations. Number theorists can test numbers for primality, and factor those that are composite, calculate gcds and lcms, and so on. Farther afield we have beta, gamma, digamma, eta, and zeta functions, Airy functions, and a dozen kinds of Bessel functions.
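
A few of these, as they appear at the REPL:

hypot(3, 4) ⇒ 5.0

binomial(10, 3) ⇒ 120

gcd(48, 180) ⇒ 12

factorial(10) ⇒ 3628800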

For me Julia is like a well-equipped playground, where I can scamper from the swings to the teeter-totter to the monkey bars. At the recent JuliaCon Stefan Karpinski mentioned a plan to move some of these functions out of the automatically loaded Base module into external libraries. My request: Please don’t overdo it.

Incremental Programming

Julia works splendidly as a platform for incremental, interactive, exploratory programming. There’s no need to write and compile a program as a monolithic structure, with a main function as entry point. You can write a single procedure, immediately compile it and run it, test it, modify it, and examine the assembly code generated by the compiler. It’s ad hack programming at its best.

To support this style of work and play, Julia comes with a built-in REPL, or read-eval-print loop, that operates from a command-line interface. Type an expression at the prompt, press enter, and the system executes the code (compiling if necessary); the value is printed, followed by a new prompt for the next command:

[A snippet from the Julia REPL]

The REPL is pretty good, but I prefer working in a Jupyter notebook, which runs in a web browser. There’s also a development environment called Juno, but I haven’t yet done much with it.

An incremental and iterative style of development can be a challenge for the compiler. When you process an entire program in one piece, the compiler starts from a blank slate. When you compile individual functions, which need to interact with functions compiled earlier, it’s all too easy for the various parts of the system to get out of sync. Gears may fail to mesh.

Problems of this kind can come up with any language, but multiple dispatch introduces some unique quirks. Suppose you’re writing Julia code in a Jupyter notebook. You write and compile a function f(x), which the system accepts as a “generic function with one method.” Later you realize that f should have been a function of two arguments, and so you go back to edit the definition. The modified source code now reads f(x, y). In most languages, this new definition of f would replace the old one, which would thereafter cease to exist. But in Julia you haven’t replaced anything; instead you now have a “generic function with 2 methods.” The first definition is still present and active in the internal workspace, and so you can call both f(x) and f(x, y). However, the source code for f(x) is nowhere to be found in the notebook. Thus if you end the Jupyter session and later reload the file, any calls to f(x) will produce an error message.
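
Here is the scenario in miniature:

f(x) = x^2            # f (generic function with 1 method)

f(x, y) = x^2 + y^2   # f (generic function with 2 methods); the old f(x) lives on

f(3) ⇒ 9

f(3, 4) ⇒ 25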

A similar but subtler problem with redefinitions has been a subject of active discussion among the Julia developers for almost five years. There’s hope for a fix in the next major release.

If redefining a function during an interactive session can lead to confusion, trying to redefine a custom type runs into an impassable barrier. Type definitions are immutable: once a type has been defined, it can’t be modified in any way. If you create a type T with two fields a and b, then later decide you also need a field c, you’re stuck. The only way to redefine the type is to shut down the session and restart, losing all computations completed so far. How annoying. There’s a workaround: If you put the type definition inside a module, you can simply reload the module.
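
For instance (in the 0.4-era type syntax), the attempt to add a field is rebuffed with an error along these lines:

type T
    a
    b
end

type T      # try to sneak in a third field
    a
    b
    c
end
⇒ ERROR: invalid redefinition of constant T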


Sometimes it’s the little things that arouse the most heated controversy.

In counting things, Julia generally starts with 1. A vector with n elements is indexed from 1 through n. This is also the default convention in Fortran, MATLAB, Mathematica, and a few other languages; the C family, Python, and JavaScript count from 0 through n-1. The wisdom of 1-based indexing has been a subject of long and spirited debate on the Julia issues forum, on the developer mailing list, and on StackOverflow.

Programs where the choice of indexing origin makes any difference are probably rare, but as it happens my primes study was one of them. I was looking at primes reduced modulo \(m\), and counting the number of primes in each of the \(m\) possible residue classes. Since the residues range from \(0\) to \(m - 1\), it would have been convenient to store the counts in an array with those indices. However, I did not find it terribly onerous to add \(1\) to each index.
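
The shift looks like this (a toy version, not the code from my notebook):

m = 4
counts = zeros(Int, m)            # slots 1 through m stand for residues 0 through m-1
for p in [3, 5, 7, 11, 13]        # a few small odd primes
    counts[p % m + 1] += 1        # residue r is tallied in slot r + 1
end

counts ⇒ [0, 2, 0, 3]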

Blessedly, the indexing war may soon be over. Tim Holy, a biologist and intrepid Julia contributor, discovered a way to give users free choice of indexing with only a few lines of code and little cost in efficiency. An experimental version of this new mechanism is in the 0.5 developer release.

After the indexing truce, we can still fight over the storage order for arrays and matrices. Julia again follows the precedent of Fortran and MATLAB, reading arrays in column-major order. Given the matrix

\[\begin{pmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23}
\end{pmatrix}\]

Julia will serialize the elements as \(x_{11}, x_{21}, x_{12}, x_{22}, x_{13}, x_{23}\). C and many other recent languages read across the rows rather than down the columns. This is another largely arbitrary convention, although it may be of some practical importance that most graphic formats are row-major.
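
You can watch it happen with vec, which flattens a matrix into a vector:

A = [11 12 13; 21 22 23]

vec(A) ⇒ [11, 21, 12, 22, 13, 23]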

Finally there’s the pressing question of what symbol to choose for the string-concatenation operator. Where Python has "abc" + "def" ⇒ "abcdef", Julia does "abc" * "def" ⇒ "abcdef". The reason cited is pretty doctrinaire: In abstract algebras, ‘+’ generally denotes a commutative operation, and string concatenation is not commutative. If you’d like to know more, see the extensive, long-running discussion.


The official online documentation is reasonably complete, consistent, and clearly written, but it gives too few examples. Moreover, the organization is sometimes hard to fathom. For example, there’s a chapter on “Multidimensional Arrays” and another titled “Arrays”; I’m often unsure where to look first.

Some of the contributed packages have quite sparse documentation. I made heavy use of a graphics package called Gadfly, which is a Julia version of the R package called ggplot2. I had a hard time getting started with the supplied documentation, until I realized that Gadfly is close enough to ggplot2 that I could rely on the manual for the latter program.

The simple advice when you’re stumped is: Read the source, Luke. It’s all readily available and easily searched. And almost everything is written in Julia, so you’ll not only find the answers to your questions but also see instructive examples of idiomatic usage.

A few other online resources:

As far as I know, Julia has no standard or specification document, which is an interesting absence. For a long time, programming languages had rather careful, formal descriptions of syntax and semantics, laying out exactly which expressions were legal and meaningful in a program. Sometimes the standard came first, before anyone tried to write a compiler or interpreter; Algol 60 was the premier example of this discipline. Sometimes the standard came after, as a device to reconcile diverging implementations; this was the situation with Lisp and C. For now, at least, Julia is apparently defined only by its evolving implementation. Where’s the BNF?

The Future

The rise and fall of programming languages seems to be governed by a highly nonlinear dynamic. Why is Python so popular? Because it’s so popular! With millions of users, there are more add-on packages, more books and articles, more conferences, more showcase applications, more jobs for Python programmers, more classes where Python is the language of instruction. Success breeds success. It’s a bandwagon effect.

If this rich-get-richer economy were the whole story, the world would have only one viable programming language. But there’s a price to be paid for popularity: The overcrowded bandwagon gets trapped in its own wheel ruts and becomes impossible to steer. Think of those 85,000 Python packages. If an improvement in the core language is incompatible with too much of that existing code, the change can be made only with great effort. For example, persuading Python users to migrate from version 2 to version 3 has taken eight years, and the transition is not complete yet.

Julia is approaching the moment when these contrary imperatives—to gain momentum but to stay maneuverable—both become powerful. At the recent JuliaCon, Stefan Karpinski gave a talk on the path to version 1.0 and beyond. The audience listened with the kind of studious attention that Janet Yellen gets when she announces Fed policy on inflation and interest rates. Toward the end, Karpinski unveiled the plan: to release version 1.0 within a year, by the time of the 2017 JuliaCon. Cheers and applause.

The 1.0 label connotes maturity—or at least adulthood. It’s a signal that the language is out of school and ready for work in the real world. And thus begins the struggle to gain momentum while continuing to innovate. For members of Julia’s developer group, there might be a parallel conflict between making a living and continuing to have fun. I’m fervently hoping all these tensions can be resolved.

Will Julia attract enough interest to grow and thrive? I feel I can best address that question by turning it back on myself. Will I continue to write code in Julia?

I’ve said that I do exploratory programming, but I also do expository programming. Computational experiments are an essential part of my work as a science writer. They help me understand what I’m writing about, and they also aid in the storytelling process, through simulations, computer-generated illustrations, and interactive gadgets. Although only a small subset of my readers ever glance at the code I publish, I want to maximize their number and do all I can to deepen their engagement—enticing them to read the code, run it, modify it, and ideally build on it. At this moment, Julia is not the most obvious choice for reaching a broad audience of computational dabblers, but I think it has considerable promise. In some respects, getting started with Julia is easier than it is with Python, if only because there’s less prior art—less to install and configure. I expect to continue working in other languages as well, but I do want to try some more Julia experiments.
