Bertrand Russell, Donald Trump, and Archimedes

A habit of finding pleasure in thought rather than in action is a safeguard against unwisdom and excessive love of power, a means of preserving serenity in misfortune and peace of mind among worries. A life confined to what is personal is likely, sooner or later, to become unbearably painful; it is only by windows into a larger and less fretful cosmos that the more tragic parts of life become endurable.

—Bertrand Russell, “Useless” Knowledge, 1935

For Russell, mathematics was one of those windows opening on a calmer universe. So it is for me too, and for many others. When you are absorbed in solving a problem, understanding a theorem, or writing a computer program, the world’s noisy bickering is magically muted. For a little while, at least, you can hold back life’s conflicts, heartaches, and disappointments.

But Russell was no self-absorbed savant, standing aloof from the issues of his time. On the contrary, he was deeply engaged in public discourse. During World War I he took a pacifist position (and went to prison for it), and he continued to speak his peace into the Vietnam era. is meant to be a little corner of Russell’s less fretful cosmos, both for me and, I hope, for my readers. In this space I would prefer to shut out the clamor of the hustings and the marketplace. And yet there comes a time to look up from bit-playing and listen to what’s going on outside the window.

A candidate for the U.S. presidency is goading his followers to murder his opponent. Here are his words (in the New York Times transcription):

Hillary wants to abolish—essentially abolish—the Second Amendment. By the way, and if she gets to pick—if she gets to pick her judges, nothing you can do folks. Although the Second Amendment people—maybe there is, I don’t know.

A day later, Donald Trump said he was merely suggesting that gun owners might be roused to come out and vote, not that they might assassinate a president. Yeah, sure. And when Henry II of England mused, “Who will rid me of this troublesome priest?” he was just asking an idle question. But soon enough Thomas Becket was hacked to death on the floor of Canterbury Cathedral.

This is not the first time Trump has strayed beyond tasteless buffoonery into reckless incitement. But this instance is so egregious I just cannot keep quiet. His words are vile and dangerous. I have to speak out against them. We face a threat to the survival of democracy and civil society.

Mathematics, after all, is one of those luxuries we can afford only so long as the thugs do not come crashing through the door. Another great mathematician offers a lesson here, in a tale told by Plutarch. During the sack of Syracuse, according to one version of the legend, Archimedes was puzzling out a mathemetical problem. He was staring at a diagram sketched in the sand when Roman soldiers came upon him. Deep in thought, he refused to turn away from his work until he had finished the proof of his theorem. One of the soldiers drew a sword and ran him through.

Death of Archimedes. Engraving after a painting by Gustave Courtois (1853-1923).Death of archimedes courtois 075dpi

Posted in off-topic | 2 Comments

The 39th Root of 92

My new office space suffers from a shortage of photons, so I’ve been wiring up light fixtures. As I was snaking 14-gauge cable through the ceiling cavity, I began to wonder: stripped end of 14/2 Romex cable: two insulated 14-gauge conductors with bare ground wireWhy is it called 14-gauge? I know that the gauge specifies the size of the copper conductors, but how exactly? The number can’t be a simple measure of diameter or cross-sectional area, because thicker wires have smaller gauge numbers. Twelve-gauge is heavier than 14-gauge, and 10-gauge is even beefier. Going in the other direction, larger numbers denote skinnier wires: 20-gauge for doorbells and thermostats, 24-gauge for telephone wiring.

Why do the numbers run backwards? Could there be a connection with shotguns, whose sizes also seem to go the wrong way? A 20-gauge shotgun has a smaller bore than a 12-gauge, which in turn is smaller than a 10-gauge gun. Mere coincidence?

Answers are not hard to find. The Wikipedia article on American Wire Gauge (AWG) is a good place to start. And there’s a surprising bit of mathematical fun along the way. It turns out that American wire sizes make essential use of the 39th root of 92, a somewhat frillier number than I would have expected to find in this workaday, blue-collar context.

Wire is made by pulling a metal rod through a die—a block of hard material with a hole in it. In cross section, the hole is shaped something like a rocket nozzle, with conical walls that taper down to a narrow throat.Sketch of a wire drawing die, with copper wire moving left to right through a round aperture with a double-cone profile. As the rod passes through the die, the metal deforms plastically, reducing the diameter while increasing the length. But there’s a limit to this squeezing and stretching; you can’t transform a short, fat rod into a long, thin wire all in one go. On each pass through a die, the diameter is only slightly reduced—maybe by 10 percent or so. To make a fine wire, you need to shrink the thickness in stages, drawing the wire through several dies in succession. And therein lies the key to wire gauge numbers: The gauge of a wire is the number of dies it must pass through to reach its final diameter. Zero-gauge is the thickness of the original rod, without any drawing operations. Fourteen-gauge wire has been pulled through 14 dies in series.

Or at least that was how it worked back when wire-drawing was a hand craft, and nobody worried too much about exact specifications. If two wires had both been pulled through 14 dies, they would both be labeled 14-gauge, but they might well have different diameters if the dies were not identical. By the middle of the 19th century this sort of variation was becoming troublesome; it was time to adopt some standards.

The AWG standard keeps the traditional sequence of gauge numbers but changes their meaning. The gauge is no longer a count of drawing operations; instead each gauge number corresponds to a specific wire diameter. Even so, there’s an effort to keep the new standardized sizes reasonably close to what they were under the old die-counting system.

Wire gauge PSF WikipediaCredit: WikipediaThe mapping from gauge numbers to diameters has two fixed reference points. At the thin end of the scale, \(36\)-gauge wire is defined as having a diameter of exactly \(0.005\) inch. At the stout end, a wire size designated \(0000\)-gauge is assigned a diameter of \(0.46\) inch. This quadruple-zero gauge is three steps larger than \(0\)-gauge (and might more sensibly be named \(–3\)-gauge). Thus there are \(40\) integer-valued gauge numbers, and \(39\) steps between them. The extreme values are known, and we need to devise some interpolation process to assign diameters to all the gauges between \(36\) and \(0000\).

The wire-drawing process itself suggests how to do this. Each pass through a die reduces the wire diameter to some fraction of its former size, but the value of the fraction might vary a little from one die to the next. The standard simply decrees that the fraction is exactly the same in all cases. In other words, for every pair of adjacent gauge numbers, the corresponding wire diameters have the same ratio, \(R\).

What remains is to work out the value of \(R\). If we start with \(d_{36} = 0.005\) and multiply by \(R\), we’ll get \(d_{35}\); then, multiplying \(d_{35}\) by \(R\) yields \(d_{34}\), and so on. Continuing in this way, after multiplying by \(R\) \(39\) times, we should arrive at \(d_{-3} = 0.46.\) This iterative process can be summarized as:

\[\frac{d_{-3}}{d_{36}} = R^{39}.\]

Filling in the numeric values, we get:

\[\frac{0.46}{0.005} = 92 = R^{39}, \quad \textrm{and thus}\quad R = \sqrt[39]{92}.\]

And there the number lies before us, the \(39\)th root of \(92\). The numerical value is about \(1.122932\), with \(1/R \approx 0.890526\).

With this fact in hand we can now write down a formula that gives the AWG gauge number \(G\) as a function of wire diameter \(d\) in inches:

\[G(d) = -39 \log_{92} \frac{d}{0.005} + 36.\]

That’s a fairly bizarre-looking formula, with base-92 logarithms and a bunch of arbitrary constants floating around. On the other hand, at least it’s a genuine mathematical function, with a domain covering all the positive real numbers. It’s also smooth and invertible. That’s more than you can say for some other standards, such as the British Imperial Wire Gauge, which pastes together several piecewise linear segments.

Who came up with the rule of \(\sqrt[39]{92}\)? As far as I can tell it was Lucian Sharpe, of Brown and Sharpe, a maker of precision instruments and machine tools in Providence, Rhode Island. A history of the company published in 1949 gives this account:

Another activity begun in the [1850s] was the production of accurate gages. The brass business of Connecticut, centered in the Naugatuck Valley, required sheet metal and wire gauges for measuring their products. Mr. Sharpe, with his methodical mind, conceived the idea of producing sizes of wire in a regular progression, choosing a geometric series as best adapted to these needs. Such gages as were in use prior to this time were the product of English manufacture and were very irregular in their sizes.

The first Brown and Sharpe wire gauge was produced in 1857 and later became the basis of an American standard, which is now administered by ASTM.

Wire gauges are not the only numbers defined by a weird-and-wonderful root-taking procedure. The equal-tempered scale of music theory is based on the 12th root of 2. A musical octave represents a doubling of frequency, and the scale divides this interval into 12 semitones. In the equal-tempered version of the scale, any two adjacent semitones differ by a ratio of \(\sqrt[12]{2}\), or about \(1.05946\). It’s worth noting that instruments were being tuned to this scale well before the invention of logarithms. I assume it was done by ear or perhaps by geometry, not by algebra. Around 1600 Simon Stevin did attempt to calculate numerical values for the pitch intervals by decomposing 12th roots into combinations of square and cube roots; his results were not flawless. What would he have done with 39th roots?

Another example of a backward-running logarithmic progression is the magnitude scale for the brightness of stars and other celestial objects. For the astronomers, the magic number is the fifth root of \(100\), or about \(2.511886\); if two stars differ by one unit of magnitude, this is their brightness ratio. A difference of five magnitudes therefore works out to a hundredfold brightness ratio. Brighter bodies have smaller magnitudes. The star Vega defines magnitude \(0\); the sun has magnitude \(-27\); the faintest stars visible without a telescope are at magnitude \(6\) or \(7\).

The idea of stellar magnitudes is ancient, but the numerical scheme in current use was developed by the British/Indian astronomer N. R. Pogson in 1856. That was just a year before Sharpe came up with his wire gauge scale. Could there be a connection? It would make a nice story if we could find some timely account of Pogson’s work that Sharpe might plausibly have read (maybe in Scientific American, founded 1845), but that’s a pure flight of fancy for now.

And what about those shotguns? Are their gauges also governed by some sort of logarithmic law? No, the numerical similarity of gauges for wires and shotguns really is nothing but coincidence. The shotgun law is not logarithmic but reciprocal. Wikipedia explains:

The gauge of a firearm is a unit of measurement used to express the diameter of the barrel. Gauge is determined from the weight of a solid sphere of lead that will fit the bore of the firearm, and is expressed as the multiplicative inverse of the sphere’s weight as a fraction of a pound, e.g., a one-twelfth pound ball fits a 12-gauge bore. Thus there are twelve 12-gauge balls per pound, etc. The term is related to the measurement of cannon, which were also measured by the weight of their iron round shot; an 8 pounder would fire an 8 lb (3.6 kg) ball.

Addendum 2016-08-08: Leon Harkleroad has brought to my attention his excellent article on “Tuning with Triangles” (College Mathematics Journal, Vol. 39, No. 5 (Nov. 2008), pp. 367–373). He describes a simple geometric procedure that Vincenzo Galilei (father of Galileo) used for fretting stringed instruments. In essence it takes \(18/17 \approx 1.05882\) as an approximation to \(\sqrt[12]{2} \approx 1.05946\).

Posted in computing, mathematics, technology | 12 Comments

Getting to Know Julia

When I first started playing with computers, back in the Pleistocene, writing a few lines of code and watching the machine carry out my instructions was enough to give me a little thrill. “Look! It can count to 10!” Today, learning a new programming language is my way of reviving that sense of wonder.

Lately I’ve been learning Julia, which describes itself as “a high-level, high-performance dynamic programming language for technical computing.” I’ve been dabbling with Julia for a couple of years, but this spring I completed my first serious project—my first attempt to do something useful with the language. I wrote a bunch of code to explore correlations between consecutive prime numbers mod m, inspired by the recent paper of Lemke Oliver and Soundararajan. The code from that project, wrapped up in a Jupyter notebook, is available on GitHub. A bit-player article published last month presented the results of the computation. Here I want to say a few words about my experience with the language.

I also discussed Julia a year ago, on the occasion of JuliaCon 2015, the annual gathering of the Julia community. Parts of this post were written at JuliaCon 2016, held at MIT June 21–25.

This document is itself derived from a Jupyter notebook, although it has been converted to static HTML—meaning that you can only read it, not run the code examples. (I’ve also done some reformatting to save space and improve appearance.) If you have a working installation of Julia and Jupyter, I suggest you download the fully interactive notebook from GitHub, so that you can edit and run the code. If you haven’t installed Julia, download the notebook anyway. You can upload it to and run it online.

What Is Julia?

A 2014 paper by the founders of the Julia project sets an ambitious goal. To paraphrase: There are languages that make programming quick and easy, and there are languages that make computation fast and efficient. The two sets of languages have long been disjoint. Julia aims to fix that. It offers high-level programming, where algorithms are expressed succinctly and without much fussing over data types and memory allocation. But it also strives to match the performance of lower-level languages such as Fortran and C. Achieving these dual goals requires attention both to the design of the language itself and to the implementation.

The Julia project was initiated by a small group at MIT, including Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah, and has since attracted about 500 contributors. The software is open source, hosted at GitHub.

Who Am I?

In my kind of programming, the end product is not the program itself but the answers it computes. As a result, I favor an incremental and interactive style. I want to write and run small chunks of code—typically individual procedures—without having to build a lot of scaffolding to support them. For a long time I worked mainly in Lisp, although in recent years I’ve also written a fair amount of Python and JavaScript. I mention all this because my background inevitably colors my judgment of programming languages and environments. If you’re a software engineer on a large team building large systems, your needs and priorities are surely different from mine.

What Julia Code Looks Like

Enough generalities. Here’s a bit of Julia code, plucked from my recent project:

# return the next prime after x

# define the function...
function next_prime(x)
    x += iseven(x) ? 1 : 2
    while !isprime(x)
        x += 2
    return x

# and now invoke it
next_prime(10000000) ⇒ 10000019

Given an integer x, we add 1 if x is even and otherwise add 2, thereby ensuring that the new value of x is odd. Now we repeatedly check the isprime(x) predicate, or rather its negation !isprime(x). As long as x is not a prime number, we continue incrementing by 2; when x finally lands on a prime value (it must do so eventually, according to Euclid), the while loop terminates and the function returns the prime value of x.

In this code snippet, note the absence of punctuation: No semicolons separate the statements, and no curly braces delimit the function body. The syntax owes a lot to MATLAB—most conspicuously the use of the end keyword to mark the end of a block, without any corresponding begin. The resemblance to MATLAB is more than coincidence; some of the key people in the Julia project have a background in numerical analysis and linear algebra, where MATLAB has long been a standard tool. One of those people is Alan Edelman. I remember asking him, 15 years ago, “How do you compute the eigenvalues of a matrix?” He explained: “You start up MATLAB and type eig.” The same incantation works in Julia.

The sparse punctuation and the style of indentation also make Julia code look a little like Python, but in this case the similarity is only superficial. In Python, indentation determines the scope of statements, but in Julia indentation is merely an aid to the human reader; the program would compute the same result even if all the lines were flush left.

More Basics

The next_prime function above shows a while loop. Here’s a for loop:

# for-loop factorial

function factorial_for(n)
    fact = 1
    for i in 1:n         # i takes each value in 1, 2, ..., n
        fact *= i        # equivalent to fact = fact * i
    return fact

factorial_for(10) ⇒ 3628800

The same idea can be expressed more succinctly and idiomatically:

# factorial via range product

function factorial_range(n)
    return prod(1:n)

factorial_range(11) ⇒ 39916800

Here 1:n denotes the numeric range \(1, 2, 3, …, n\), and the prod function returns the product of all the numbers in the range. Still too verbose? Using an abbreviated style of function definition, it becomes a true one-liner:

fact_range(n) = prod(1:n)

fact_range(12) ⇒ 479001600


As an immigrant from the land of Lisp, I can’t in good conscience discuss the factorial function without also presenting a recursive version:

# recursive definition of factorial

function factorial_rec(n)
    if n < 2
        return 1
        return n * factorial_rec(n - 1)

factorial_rec(13) ⇒ 6227020800

You can save some keystrokes by replacing the if... else... end statement with a ternary operator. The syntax is

      predicate ? consequent : alternative.

If predicate evaluates to true, the expression returns the value of consequent; otherwise it returns the value of alternative. With this device, the recursive factorial function looks like this:

# recursive factorial with ternary operator

function factorial_rec_ternary(n)
    n < 2 ? 1 : n * factorial_rec_ternary(n - 1)

factorial_rec_ternary(14) ⇒ 87178291200

One sour note on the subject of recursion: The Julia compiler does not perform tail-call conversion, which endows recursive procedure definitions with the same space and time efficiency as iterative structures such as while loops. Consider this pair of daisy-plucking procedures:

function she_loves_me(petals)
    petals == 0 ? true : she_loves_me_not(petals - 1)

function she_loves_me_not(petals)
    petals == 0 ? false : she_loves_me(petals - 1)

she_loves_me(1001) ⇒ false

These are mutually recursive definitions: Control passes back and forth between them until one or the other terminates when the petals are exhausted. The function works correctly on small values of petals. But let’s try a more challenging exercise:


LoadError: StackOverflowError:
while loading In[8], in expression starting on line 1

 in she_loves_me at In[7]:2 (repeats 80000 times)

The stack overflows because Julia is pushing a procedure-call record onto the stack for every invocation of either she_loves_me or she_loves_me_not. These records will never be needed; when the answer (true or false) is finally determined, it can be returned directly to the top level, without having to percolate up through the cascade of stack frames. The technique for eliminating such unneeded stack frames has been well known for more than 30 years. Implementing tail-call optimization has been discussed by the Julia developers but is “not a priority.” For me this is not a catastrophic defect, but it’s a disappointment. It rules out a certain style of reasoning that in some cases is the most direct way of expressing mathematical ideas.

Array Comprehensions

Given Julia’s heritage in linear algebra, it’s no surprise that there are rich facilities for describing matrices and other array-like objects. One handy tool is the array comprehension, which is written as a pair of square brackets surrounding a description of what the compiler should compute to build the array.

A = [x^2 for x in 1:5] ⇒ [1, 4, 9, 16, 25]

The idea of comprehensions comes from the set-builder notation of mathematics. It was adopted in the programming language SETL in the 1960s, and it has since wormed its way into several other languages, including Haskell and Python. But Julia’s comprehensions are unusual: Whereas Python comprehensions are limited to one-dimensional vectors or lists, Julia comprehensions can specify multidimensional arrays.

B = [x^y for x in 1:3, y in 1:3] ⇒

3x3 Array{Int64,2}:
 1  1   1
 2  4   8
 3  9  27

Multidimensionality comes at a price. A Python comprehension can have a filter clause that selects some of the array elements and excludes others. If Julia had such comprehension filters, you could generate an array of prime numbers with an expression like this: [x for x in 1:100 if isprime(x)]. But adding filters to multidimensional arrays is problematic; you can’t just pluck values out of a matrix and keep the rest of the structure intact. Nevertheless, it appears a solution is in hand. After three years of discussion, a fix has recently been added to the code base. (I have not yet tried it out.)

Multiple Dispatch

This sounds like something the 911 operator might do in response to a five-alarm fire, but in fact multiple dispatch is a distinctive core idea in Julia. Understanding what it is and why you might care about it requires a bit of context.

In almost all programming languages, an expression along the lines of x * y multiplies x by y, whether x and y are integers, floating-point numbers, or maybe even complex numbers or matrices. All of these operations qualify as multiplication, but the underlying algorithms—the ways the bits are twiddled to compute the product—are quite different. Thus the * operator is polymorphic: It’s a single symbol that evokes many different actions. One way to handle this situation is to write a single big function—call it mul(x, y)—that has a chain of if... then... else statements to select the right multiplication algorithm for each x, y pair. If you want to multiply all possible combinations of \(n\) kinds of numbers, the if statement needs \(n^2\) branches. Maintaining that monolithic mul procedure can become a headache. Suppose you want to add a new type of number, such as quaternions. You would have to patch in new routines throughout the existing mul code.

When object-oriented programming (OOP) swept through the world of computing in the 1980s, it offered an alternative. In an object-oriented setting, each class of numbers has its own set of multiplication procedures, or methods. Instead of one universal mul function, there’s a mul method for integers, another mul for floats, and so on. To multiply x by y, you don’t write mul(x, y) but rather something like x.mul(y). The object-oriented programmer thinks of this protocol as sending a message to the x object, asking it to apply its mul method to the y argument. You can also describe the scheme as a single-dispatch mechanism. The dispatcher is a behind-the-scenes component that keeps track of all methods with the same name, such as mul, and chooses which of those methods to invoke based on the data type of the single argument x. (The x object still needs some internal mechanism to examine the type of y to know which of its own methods to apply.)

Multiple dispatch is a generalization of single dispatch. The dispatcher looks at all the arguments and their types, and chooses the method accordingly. Once that decision is made, no further choices are needed. There might be separate mul methods for combining two integers, for an integer and a float, for a float and a complex, etc. Julia has 138 methods linked to the multiplication operator *, including a few mildly surprising combinations. (Quick, what’s the value of false * 42?)

# Typing "methods(f)" for any function f, or for an operator
# such as "*", produces a list of all the methods and their
# type signatures. The links take you to the source code.

138 methods for generic function *:
  • *(x::Bool, y::Bool) at bool.jl:38
  • *{T<:Unsigned}(x::Bool, y::T<:Unsigned) at bool.jl:53
  • *(x::Bool, z::Complex{Bool}) at complex.jl:122
  • *(x::Bool, z::Complex{T<:Real}) at complex.jl:129
  • *{T<:Number}(x::Bool, y::T<:Number) at bool.jl:49
  • *(x::Float32, y::Float32) at float.jl:211
  • *(x::Float64, y::Float64) at float.jl:212
  • *(z::Complex{T<:Real}, w::Complex{T<:Real}) at complex.jl:113
  • *(z::Complex{Bool}, x::Bool) at complex.jl:123
  • *(z::Complex{T<:Real}, x::Bool) at complex.jl:130
  • *(x::Real, z::Complex{Bool}) at complex.jl:140
  • *(z::Complex{Bool}, x::Real) at complex.jl:141
  • *(x::Real, z::Complex{T<:Real}) at complex.jl:152
  • *(z::Complex{T<:Real}, x::Real) at complex.jl:153
  • *(x::Rational{T<:Integer}, y::Rational{T<:Integer}) at rational.jl:186
  • *(a::Float16, b::Float16) at float16.jl:136
  • *{N}(a::Integer, index::CartesianIndex{N}) at multidimensional.jl:50
  • *(x::BigInt, y::BigInt) at gmp.jl:256
  • *(a::BigInt, b::BigInt, c::BigInt) at gmp.jl:279
  • *(a::BigInt, b::BigInt, c::BigInt, d::BigInt) at gmp.jl:285
  • *(a::BigInt, b::BigInt, c::BigInt, d::BigInt, e::BigInt) at gmp.jl:292
  • *(x::BigInt, c::Union{UInt16,UInt32,UInt64,UInt8}) at gmp.jl:326
  • *(c::Union{UInt16,UInt32,UInt64,UInt8}, x::BigInt) at gmp.jl:330
  • *(x::BigInt, c::Union{Int16,Int32,Int64,Int8}) at gmp.jl:332
  • *(c::Union{Int16,Int32,Int64,Int8}, x::BigInt) at gmp.jl:336
  • *(x::BigFloat, y::BigFloat) at mpfr.jl:208
  • *(x::BigFloat, c::Union{UInt16,UInt32,UInt64,UInt8}) at mpfr.jl:215
  • *(c::Union{UInt16,UInt32,UInt64,UInt8}, x::BigFloat) at mpfr.jl:219
  • *(x::BigFloat, c::Union{Int16,Int32,Int64,Int8}) at mpfr.jl:223
  • *(c::Union{Int16,Int32,Int64,Int8}, x::BigFloat) at mpfr.jl:227
  • *(x::BigFloat, c::Union{Float16,Float32,Float64}) at mpfr.jl:231
  • *(c::Union{Float16,Float32,Float64}, x::BigFloat) at mpfr.jl:235
  • *(x::BigFloat, c::BigInt) at mpfr.jl:239
  • *(c::BigInt, x::BigFloat) at mpfr.jl:243
  • *(a::BigFloat, b::BigFloat, c::BigFloat) at mpfr.jl:379
  • *(a::BigFloat, b::BigFloat, c::BigFloat, d::BigFloat) at mpfr.jl:385
  • *(a::BigFloat, b::BigFloat, c::BigFloat, d::BigFloat, e::BigFloat) at mpfr.jl:392
  • *{T<:Number}(x::T<:Number, D::Diagonal{T}) at linalg/diagonal.jl:89
  • *(x::Irrational{sym}, y::Irrational{sym}) at irrationals.jl:72
  • *(y::Real, x::Base.Dates.Period) at dates/periods.jl:55
  • *(x::Number) at operators.jl:74
  • *(y::Number, x::Bool) at bool.jl:55
  • *(x::Int8, y::Int8) at int.jl:19
  • *(x::UInt8, y::UInt8) at int.jl:19
  • *(x::Int16, y::Int16) at int.jl:19
  • *(x::UInt16, y::UInt16) at int.jl:19
  • *(x::Int32, y::Int32) at int.jl:19
  • *(x::UInt32, y::UInt32) at int.jl:19
  • *(x::Int64, y::Int64) at int.jl:19
  • *(x::UInt64, y::UInt64) at int.jl:19
  • *(x::Int128, y::Int128) at int.jl:456
  • *(x::UInt128, y::UInt128) at int.jl:457
  • *{T<:Number}(x::T<:Number, y::T<:Number) at promotion.jl:212
  • *(x::Number, y::Number) at promotion.jl:168

  • *{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},S} (A::Union{DenseArray{T<:Union{Complex{Float32}, Complex{Float64},Float32,Float64},2}, SubArray{T<:Union{Complex{Float32}, Complex{Float64}, Float32,Float64},2, A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, x::Union{DenseArray{S,1}, SubArray{S,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/matmul.jl:82
  • *(A::SymTridiagonal{T}, B::Number) at linalg/tridiag.jl:86
  • *(A::Tridiagonal{T}, B::Number) at linalg/tridiag.jl:406
  • *(A::UpperTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:454
  • *(A::Base.LinAlg.UnitUpperTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:457
  • *(A::LowerTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:454
  • *(A::Base.LinAlg.UnitLowerTriangular{T,S<:AbstractArray{T,2}}, x::Number) at linalg/triangular.jl:457
  • *(A::Tridiagonal{T}, B::UpperTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Tridiagonal{T}, B::Base.LinAlg.UnitUpperTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Tridiagonal{T}, B::LowerTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Tridiagonal{T}, B::Base.LinAlg.UnitLowerTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:969
  • *(A::Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}, B::Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:975
  • *{TA,TB}(A::Base.LinAlg.AbstractTriangular{TA,S<:AbstractArray{T,2}}, B:: Union{DenseArray{TB,1},DenseArray{TB,2}, SubArray{TB,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}, SubArray{TB,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/triangular.jl:989
  • *{TA,TB}(A:: Union{DenseArray{TA,1},DenseArray{TA,2}, SubArray{TA,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}, SubArray{TA,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, B::Base.LinAlg.AbstractTriangular{TB,S<:AbstractArray{T,2}}) at linalg/triangular.jl:1016
  • *{TA,Tb}(A::Union{Base.LinAlg.QRCompactWYQ{TA,M<:AbstractArray{T,2}}, Base.LinAlg.QRPackedQ{TA,S<:AbstractArray{T,2}}}, b:: Union{DenseArray{Tb,1}, SubArray{Tb,1,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/qr.jl:166
  • *{TA,TB}(A::Union{Base.LinAlg.QRCompactWYQ{TA,M<:AbstractArray{T,2}}, Base.LinAlg.QRPackedQ{TA,S<:AbstractArray{T,2}}}, B:: Union{DenseArray{TB,2},SubArray{TB,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/qr.jl:178
  • *{TA,TQ,N}(A:: Union{DenseArray{TA,N}, SubArray{TA,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, Q::Union{Base.LinAlg.QRCompactWYQ{TQ,M<:AbstractArray{T,2}}, Base.LinAlg.QRPackedQ{TQ,S<:AbstractArray{T,2}}}) at linalg/qr.jl:253
  • *(A::Union{Hermitian{T,S},Symmetric{T,S}}, B::Union{Hermitian{T,S},Symmetric{T,S}}) at linalg/symmetric.jl:117
  • *(A::Union{DenseArray{T,2}, SubArray{T,2, A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, B::Union{Hermitian{T,S},Symmetric{T,S}}) at linalg/symmetric.jl:118
  • *{T<:Number}(D::Diagonal{T}, x::T<:Number) at linalg/diagonal.jl:90
  • *(Da::Diagonal{T}, Db::Diagonal{T}) at linalg/diagonal.jl:92
  • *(D::Diagonal{T}, V::Array{T,1}) at linalg/diagonal.jl:93
  • *(A::Array{T,2}, D::Diagonal{T}) at linalg/diagonal.jl:94
  • *(D::Diagonal{T}, A::Array{T,2}) at linalg/diagonal.jl:95
  • *(A::Bidiagonal{T}, B::Number) at linalg/bidiag.jl:192
  • *(A::Union{Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}, Bidiagonal{T}, Diagonal{T}, SymTridiagonal{T}, Tridiagonal{T}}, B::Union{Base.LinAlg.AbstractTriangular{T, S<:AbstractArray{T,2}}, Bidiagonal{T},Diagonal{T}, SymTridiagonal{T},Tridiagonal{T}}) at linalg/bidiag.jl:198
  • *{T}(A::Bidiagonal{T}, B::AbstractArray{T,1}) at linalg/bidiag.jl:202
  • *(B::BitArray{2}, J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:122
  • *{T,S}(s::Base.LinAlg.SVDOperator{T,S}, v::Array{T,1}) at linalg/arnoldi.jl:261
  • *(S::SparseMatrixCSC{Tv,Ti<:Integer}, J::UniformScaling{T<:Number}) at sparse/linalg.jl:23
  • *{Tv,Ti}(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixCSC{Tv,Ti}) at sparse/linalg.jl:108
  • *{TvA,TiA,TvB,TiB}(A::SparseMatrixCSC{TvA,TiA}, B::SparseMatrixCSC{TvB,TiB}) at sparse/linalg.jl:29
  • *{TX,TvA,TiA}(X:: Union{DenseArray{TX,2}, SubArray{TX,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, A::SparseMatrixCSC{TvA,TiA}) at sparse/linalg.jl:94
  • *(A::Base.SparseMatrix.CHOLMOD.Sparse {Tv<:Union{Complex{Float64},Float64}}, B:: Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Union{Complex{Float64}, Float64}}) at sparse/cholmod.jl:1157
  • *(A::Base.SparseMatrix.CHOLMOD.Sparse {Tv<:Union{Complex{Float64},Float64}}, B:: Base.SparseMatrix.CHOLMOD.Dense {T<:Union{Complex{Float64},Float64}}) at sparse/cholmod.jl:1158
  • *(A::Base.SparseMatrix.CHOLMOD.Sparse {Tv<:Union{Complex{Float64},Float64}}, B:: Union{Array{T,1},Array{T,2}}) at sparse/cholmod.jl:1159
  • *{Ti}(A::Symmetric{Float64,SparseMatrixCSC{Float64,Ti}}, B::SparseMatrixCSC{Float64,Ti}) at sparse/cholmod.jl:1418
  • *{Ti}(A::Hermitian{Complex{Float64}, SparseMatrixCSC{Complex{Float64},Ti}}, B::SparseMatrixCSC{Complex{Float64},Ti}) at sparse/cholmod.jl:1419
  • *{T<:Number}(x::AbstractArray{T<:Number,2}) at abstractarraymath.jl:50
  • *(B::Number, A::SymTridiagonal{T}) at linalg/tridiag.jl:87
  • *(B::Number, A::Tridiagonal{T}) at linalg/tridiag.jl:407
  • *(x::Number, A::UpperTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:464
  • *(x::Number, A::Base.LinAlg.UnitUpperTriangular{T, S<:AbstractArray{T,2}}) at linalg/triangular.jl:467
  • *(x::Number, A::LowerTriangular{T,S<:AbstractArray{T,2}}) at linalg/triangular.jl:464
  • *(x::Number, A::Base.LinAlg.UnitLowerTriangular{T, S<:AbstractArray{T,2}}) at linalg/triangular.jl:467
  • *(B::Number, A::Bidiagonal{T}) at linalg/bidiag.jl:193
  • *(A::Number, B::AbstractArray{T,N}) at abstractarraymath.jl:54
  • *(A::AbstractArray{T,N}, B::Number) at abstractarraymath.jl:55
  • *(s1::AbstractString, ss::AbstractString…) at strings/basic.jl:50
  • *(this::Base.Grisu.Float, other::Base.Grisu.Float) at grisu/float.jl:138
  • *(index::CartesianIndex{N}, a::Integer) at multidimensional.jl:54
  • *{T,S}(A::AbstractArray{T,2}, B::Union{DenseArray{S,2}, SubArray{S,2,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at linalg/matmul.jl:131
  • *{T,S}(A::AbstractArray{T,2}, x::AbstractArray{S,1}) at linalg/matmul.jl:86
  • *(A::AbstractArray{T,1}, B::AbstractArray{T,2}) at linalg/matmul.jl:89
  • *(J1::UniformScaling{T<:Number}, J2::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:121
  • *(J::UniformScaling{T<:Number}, B::BitArray{2}) at linalg/uniformscaling.jl:123
  • *(A::AbstractArray{T,2}, J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:124
  • *{Tv,Ti}(J::UniformScaling{T<:Number}, S::SparseMatrixCSC{Tv,Ti}) at sparse/linalg.jl:24
  • *(J::UniformScaling{T<:Number}, A::Union{AbstractArray{T,1},AbstractArray{T,2}}) at linalg/uniformscaling.jl:125
  • *(x::Number, J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:127
  • *(J::UniformScaling{T<:Number}, x::Number) at linalg/uniformscaling.jl:128
  • *{T,S}(R::Base.LinAlg.AbstractRotation{T}, A::Union{AbstractArray{S,1},AbstractArray{S,2}}) at linalg/givens.jl:9
  • *{T}(G1::Base.LinAlg.Givens{T}, G2::Base.LinAlg.Givens{T}) at linalg/givens.jl:307
  • *(p::Base.DFT.ScaledPlan{T,P,N}, x::AbstractArray{T,N}) at dft.jl:262
  • *{T,K,N}(p::Base.DFT.FFTW.cFFTWPlan{T,K,false,N}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:621
  • *{T,K}(p::Base.DFT.FFTW.cFFTWPlan{T,K,true,N}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:628
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Float32,-1,false,N}, x:: Union{DenseArray{Float32,N}, SubArray{Float32,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:698
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Complex{Float32},1,false,N}, x::Union{DenseArray{Complex{Float32},N}, SubArray{Complex{Float32},N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:705
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Float64,-1,false,N}, x:: Union{DenseArray{Float64,N}, SubArray{Float64,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:698
  • *{N}(p::Base.DFT.FFTW.rFFTWPlan{Complex{Float64},1,false,N}, x:: Union{DenseArray{Complex{Float64},N}, SubArray{Complex{Float64},N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:705
  • *{T,K,N}(p::Base.DFT.FFTW.r2rFFTWPlan{T,K,false,N}, x:: Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:866
  • *{T,K}(p::Base.DFT.FFTW.r2rFFTWPlan{T,K,true,N}, x:: Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/FFTW.jl:873
  • *{T}(p::Base.DFT.FFTW.DCTPlan{T,5,false}, x:: Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/dct.jl:188
  • *{T}(p::Base.DFT.FFTW.DCTPlan{T,4,false}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/dct.jl:191
  • *{T,K}(p::Base.DFT.FFTW.DCTPlan{T,K,true}, x::Union{DenseArray{T,N}, SubArray{T,N,A<:DenseArray{T,N}, I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}) at fft/dct.jl:194
  • *{T}(p::Base.DFT.Plan{T}, x::AbstractArray{T,N}) at dft.jl:221
  • *(?::Number, p::Base.DFT.Plan{T}) at dft.jl:264
  • *(p::Base.DFT.Plan{T}, ?::Number) at dft.jl:265
  • *(I::UniformScaling{T<:Number}, p::Base.DFT.ScaledPlan{T,P,N}) at dft.jl:266
  • *(p::Base.DFT.ScaledPlan{T,P,N}, I::UniformScaling{T<:Number}) at dft.jl:267
  • *(I::UniformScaling{T<:Number}, p::Base.DFT.Plan{T}) at dft.jl:268
  • *(p::Base.DFT.Plan{T}, I::UniformScaling{T<:Number}) at dft.jl:269
  • *{P<:Base.Dates.Period} (x::P<:Base.Dates.Period, y::Real) at dates/periods.jl:54
  • *(a, b, c, xs…) at operators.jl:103

Before Julia came along, the best-known example of multiple dispatch was the Common Lisp Object System, or CLOS. As a Lisp guy, I’m familiar with CLOS, but I’ve seldom found it useful; it’s too heavy a hammer for most of my little nails. But whereas CLOS is an optional add-on to Lisp, multiple dispatch is deeply embedded in the infrastructure of Julia. You can’t really ignore it.

Multiple dispatch encourages a style of programming in which you write an abundance of short, single-purpose methods that handle specific combinations of argument types. This is surely an improvement over writing one huge function with an internal tangle of spaghetti logic to handle dozens special cases. The Julia documentation offers this example:

# roll-your-own method dispatch

function norm(A)
    if isa(A, Vector)
        return sqrt(real(dot(A,A)))
    elseif isa(A, Matrix)
        return max(svd(A)[2])
        error("norm: invalid argument")

# let the compiler do method dispatch

norm(x::Vector) = sqrt(real(dot(x,x)))
norm(A::Matrix) = max(svd(A)[2])

# Note: The 'x::y syntax declares that variable x will have
# a value of type y.

Splitting the procedure into two methods makes the definition clearer and more concise, and it also promises better performance. There’s no need to crawl through the if... else... chain at runtime; the appropriate method is chosen at compile time.

That example seems pretty compelling. Edelman gives another illuminating demo, defining tropical arithmetic in a few lines of code. (In tropical arithmetic, plus is min and times is plus.)

Unfortunately, my own attempts to structure code in this way have not been so successful. Take the next_prime(x) function shown at the beginning of this notebook. The argument x might be either an ordinary integer (some subtype of Int, such as Int64) or a BigInt, a numeric type accommodating integers of unbounded size. So I tried writing separate methods next_prime(x::Int) and next_prime(x::BigInt). The result, however, was annoying code duplication: The bodies of those two methods were identical. Furthermore, splitting the two methods yielded no performance gain; the compiler was able to produce efficient code without any type annotations at all.

I remain somewhat befuddled about how best to exploit multiple dispatch. Suppose you are creating a graphics program that can plot either data from an array or a mathematical function. You could define a generic plot procedure with two methods:

function plot(data::Array)
    # ... code for array plotting

function plot(fn::Function)
    # ... code for function plotting

With these definitions, calls to plot([2, 4, 6, 8]) and plot(sin) would automatically be routed to the appropriate method. However, making the two methods totally independent is not quite right either. They both need to deal with scales, grids, tickmarks, labels, and other graphics accoutrements. I’m still learning how to structure such code. No doubt it will become clearer with experience; in the meantime, advice is welcome.

Multiple dispatch is not a general pattern-matching mechanism. It discriminates only on the basis of the type signature of a function’s arguments, not on the values of those arguments. In Haskell—a language that does have pattern matching—you can write a function definition like this:

factorial 0 = 1
factorial n = n * factorial (n - 1)

That won’t work in Julia. (Actually, there may be a baroque way to do it, but it’s not recommended.) (And I’ve just learned there’s a pattern-matching package.) (And another.)

Multiple dispatch vs. OOP

In CLOS, multiple dispatch is an adjunct to an object-oriented programming system; in Julia, multiple dispatch is an alternative to OOP, and perhaps a repudiation of it. OOP is a noun-centered protocol: Objects are things that own both data and procedures that act on the data. Julia is verb-centered: Procedures (or functions) are the active elements of a program, and they can be applied to any data, not just their own private variables. The Julia documentation notes:

Multiple dispatch is particularly useful for mathematical code, where it makes little sense to artificially deem the operations to “belong” to one argument more than any of the others: does the addition operation in x + y belong to x any more than it does to y? The implementation of a mathematical operator generally depends on the types of all of its arguments. Even beyond mathematical operations, however, multiple dispatch ends up being a powerful and convenient paradigm for structuring and organizing programs.

I agree about the peculiar asymmetry of asking x to add y to itself. And yet that’s only one small aspect of what object-oriented programming is all about. There are many occasions when it’s really handy to bundle up related code and data in an object, if only for the sake of tidyness. If I had been writing my primes program in an object-oriented language, I would have created a class of objects to hold various properties of a sequence of consecutive primes, and then instantiated that class for each sequence I generated. Some of the object fields would have been simple data, such as the length of the sequence. Others would have been procedures for calculating more elaborate statistics. Some of the data and procedures would have been private—accessible only from within the object.

No such facility exists in Julia, but there are other ways of addressing some of the same needs. Julia’s user-defined composite types look very much like objects in some respects; they even use the same dot notation for field access that you see in Java, JavaScript and Python. A type definition looks like this.

type Location

Given a variable of this type, such as Paris = Location(48.8566, 2.3522), you can access the two fields as Paris.latitude and Paris.longitude. You can even have a field of type Function inside a composite type, as in this declaration:

type Location

Then, if you store an appropriate function in that field, you can invoke it as Paris.distance_to_pole(). However, the function has no special access to the other fields of the type; in particular, it doesn’t know which Location it’s being called from, so it doesn’t work like an OOP method.

Julia also has modules, which are essentially private namespaces. Only variables that are explicitly marked as exported can be seen from outside a module. A module can encapsulate both code and data, and so it is probably the closest approach to objects as a mechanism of program organization. But there’s still nothing like the this or self keyword of true object-oriented languages.

Is it brave or foolish, at this point in the history of computer science, to turn your back on the object-oriented paradigm and try something else? It’s surely bold. Two or three generations of programmers have grown up on a steady diet OOP. On the other hand, we have not yet found the One True Path to software wisdom, so there’s surely room for exploring other corners of the ecosystem.


A language for “technical computing” had better have strong numerical abilities. Julia has the full gamut of numeric types, with integers of various fixed sizes (from 8 bits to 128 bits), both signed and unsigned, and a comparable spectrum of floating-point precisions. There are also BigInts and BigFloats that expand to accommodate numbers of any size (up to the bounds of available memory). And you can do arithmetic with complex numbers and exact rationals.

This is a well-stocked toolkit, suitable for many kinds of computations. However, the emphasis is on fast floating-point arithmetic with real or complex values, approximated at machine precision. If you want to work in number theory, combinatorics, or cryptography—fields that require exact integers and rational numbers of unbounded size—the tools are provided, but using them requires some extra contortions.

To my taste, the most elegant implementation of numeric types is found in the Scheme programming language, a dialect of Lisp. The Scheme numeric “tower” begins with integers at its base. The integers are a subset of the rational numbers, which in turn are a subset of the reals, which are a subset of the complex numbers. Alongside this hierarchy, and orthogonal to it, Scheme also distinguishes between exact and inexact numbers. Integers and rationals can be represented exactly in a computer system, but many real and complex values can only be approximated, usually by floating-point numbers. In doing basic arithmetic, the Scheme interpreter or compiler preserves exactness. This is easy when you’re adding, subtracting, or multiplying integers, since the result is also invariably an integer. With division of integers, Scheme returns an integer when possible (e.g., \(15 \div 3 = 5\)) and otherwise an exact rational (\(4 \div 6 = 2/3\)). Some Schemes go a step further and return exact results for operations such as square root, when it’s possible to do so (\(\sqrt{4/9} = 2/3\); but \(\sqrt{5} = 2.236068\); the latter is inexact). These are numbers you can count on; they mostly obey well-known principles of mathematics, such as \(\frac{m}{n} \times \frac{n}{m} = 1\).

Julia has the same tower of numeric types, but it is not so scrupulous about exact and inexact values. Consider this series of results:

15 / 3 ⇒ 5.0

4 / 6 ⇒ 0.6666666666666666

(15 / 17) * (17 / 15) ⇒ 0.9999999999999999

Even when the result of dividing two integers is another integer, Julia converts the quotient to a floating-point value (signaled by the presence of a decimal point). We also get floats instead of exact rationals for noninteger quotients. Because exactness is lost, mathematical identities become unreliable.

I think I know why the Julia designers chose this path. It’s all about type stability. The compiler can produce faster code if the output type of a method is always the same. If division of integers could yield either an integer or a float, the result type would not be known until runtime. Using exact rationals rather than floats would impose a double penalty: Rational arithmetic is much slower than (hardware-assisted) floating point.

Given the priorities of the Julia project, consistent floating-point quotients were probably the right choice. I’ll concede that point, but it still leaves me grumpy. I’m tempted to ask, “Why not just make all numbers floating point, the way JavaScript does?” Then all arithmetic operations would be type stable by default. (This is not a serious proposal. I deplore JavaScript’s all-float policy.)

Julia does offer the tools to build your own procedures for exact arithmetic. Here’s one way to conquer divide:

function xdiv(x::Int, y::Int)
    q = x // y                 # yields rational quotient
    den(q) == 1 ? num(q) : q   # return int if possible

xdiv(15, 5) ⇒ 3

xdiv(15, 9) ⇒ 5//3

My Int Overfloweth

Another numerical quirk gives me the heebie-jeebies. Let’s look again at one of those factorial functions defined above (it doesn’t matter which one).

# run the factorial function f on integers from 1 to limit

function test_factorial(f::Function, limit::Int)
    for n in 1:limit
        @printf("n = %d, f(n) = %d\n", n, f(n))

test_factorial(factorial_range, 25) ⇒

n = 1, f(n) = 1
n = 2, f(n) = 2
n = 3, f(n) = 6
n = 4, f(n) = 24
n = 5, f(n) = 120
n = 6, f(n) = 720
n = 7, f(n) = 5040
n = 8, f(n) = 40320
n = 9, f(n) = 362880
n = 10, f(n) = 3628800
n = 11, f(n) = 39916800
n = 12, f(n) = 479001600
n = 13, f(n) = 6227020800
n = 14, f(n) = 87178291200
n = 15, f(n) = 1307674368000
n = 16, f(n) = 20922789888000
n = 17, f(n) = 355687428096000
n = 18, f(n) = 6402373705728000
n = 19, f(n) = 121645100408832000
n = 20, f(n) = 2432902008176640000
n = 21, f(n) = -4249290049419214848
n = 22, f(n) = -1250660718674968576
n = 23, f(n) = 8128291617894825984
n = 24, f(n) = -7835185981329244160
n = 25, f(n) = 7034535277573963776

Everything is copacetic through \(n = 20\), and then suddenly we enter an alternative universe where multiplying a bunch of positive integers gives a negative result. You can probably guess what’s happening here. The computation is being done with 64-bit, twos-complement signed integers, with the most-significant bit representing the sign. When the magnitude of the number exceeds \(2^{63} - 1\), the bit pattern is interpreted as a negative value; then, at \(2^{64}\), it crosses zero and becomes positive again. Essentially, we’re doing arithmetic modulo \(2^{64}\), with an offset.

When a number grows beyond the space allotted for it, something has to give. In some languages the overflowing integer is gracefully converted to a bignum format, so that it can keep growing without constraint. Most Lisps are in this family; so is Python. JavaScript, with its all-floating-point arithmetic, gives approximate answers beyond \(20!\), and beyond \(170!\) all results are deemed equal to Infinity. In several other languages, programs halt with an error message on integer overflow. C is one language that has the same wraparound behavior as Julia. (Actually, the C standard says that signed-integer overflow is “undefined,” which means the compiler can do anything it pleases; but the C compiler I just tested does a wraparound, and I think that’s the common policy.)

The Julia documentation includes a thorough and thoughtful discussion of integer overflow, defending the wraparound strategy by showing that all the alternatives are unacceptable for one reason or another. But even if you go along with that conclusion, you might still feel that wraparound is also unacceptable.

Casual disregard of integer overflow has produced some notably stinky bugs, causing everything from video-game glitches in Pac Man and Donkey Kong to the failure of the first Ariane 5 rocket launch. Dietz, Li, Regehr, and Adve have surveyed bugs of this kind in C programs, finding them even in a module called SafeInt that was specifically designed to protect against such errors. In my little prime-counting project with Julia, I stumbled over overflow problems several times, even after I understood exactly where the risks lay.

In certain other contexts Julia doesn’t play quite so fast and loose. It does runtime bounds checking on array references, throwing an error if you ask for element five of a four-element vector. But it also provides a macro (@inbounds) that allows self-confident daredevils to turn off this safeguard for the sake of speed. Perhaps someday there will be a similar option for integer overflows.

Until then, it’s up to us to write in any overflow checks we think might be prudent. The Julia developers themselves have done so in many places, including their built-in factorial function. See error message below.

test_factorial(factorial, 25) ⇒

n = 1, f(n) = 1
n = 2, f(n) = 2
n = 3, f(n) = 6
n = 4, f(n) = 24
n = 5, f(n) = 120
n = 6, f(n) = 720
n = 7, f(n) = 5040
n = 8, f(n) = 40320
n = 9, f(n) = 362880
n = 10, f(n) = 3628800
n = 11, f(n) = 39916800
n = 12, f(n) = 479001600
n = 13, f(n) = 6227020800
n = 14, f(n) = 87178291200
n = 15, f(n) = 1307674368000
n = 16, f(n) = 20922789888000
n = 17, f(n) = 355687428096000
n = 18, f(n) = 6402373705728000
n = 19, f(n) = 121645100408832000
n = 20, f(n) = 2432902008176640000

LoadError: OverflowError()
while loading In[19], in expression starting on line 1

 in factorial_lookup at combinatorics.jl:29
 in factorial at combinatorics.jl:37
 in test_factorial at In[18]:5

Julia is gaining traction in scientific computing, in finance, and other fields; it has even been adopted as the specification language for an aircraft collision-avoidance system. I do hope that everyone is being careful.

Batteries Installed

“Batteries included” is a slogan of the Python community, boasting that everything you need is provided in the box. And indeed Python comes with a large standard library, supplemented by a huge variety of user-contributed modules (the PyPI index lists about 85,000 of them). On the other hand, although batteries of many kinds are included, they are not installed by default. You can’t accomplish much in Python without first importing a few modules. Just to take a square root, you have to import math. If you want complex numbers, they’re in a different module, and rationals are in a third.

In contrast, Julia is refreshingly self-contained. The standard library has a wide range of functions, and they’re all immediately available, without tracking down and loading external files. There are also about a thousand contributed packages, but you can do quite a lot of useful computing without ever glancing at them. In this respect Julia reminds me of Mathematica, which also tries to put all the commonly needed tools at your fingertips.

A Julia sampler: you get sqrt(x) of course. Also cbrt(x) and hypot(x, y) and log(x) and exp(x). There’s a full deck of trig functions, both circular and hyperbolic, with a variant subset that take their arguments in degrees rather than radians. For combinatorialists we have the factorial function mentioned above, as well as binomial, and various kinds of permutations and combinations. Number theorists can test numbers for primality, and factor those that are composite, calculate gcds and lcms, and so on. Farther afield we have beta, gamma, digamma, eta, and zeta functions, Airy functions, and a dozen kinds of Bessel functions.

For me Julia is like a well-equipped playground, where I can scamper from the swings to the teeter-totter to the monkey bars. At the recent JuliaCon Stefan Karpinsky mentioned a plan to move some of these functions out of the automatically loaded Base module into external libraries. My request: Please don’t overdo it.

Incremental Programming

Julia works splendidly as a platform for incremental, interactive, exploratory programming. There’s no need to write and compile a program as a monolithic structure, with a main function as entry point. You can write a single procedure, immediately compile it and run it, test it, modify it, and examine the assembly code generated by the compiler. It’s ad hack programming at its best.

To support this style of work and play, Julia comes with a built-in REPL, or read-eval-print loop, that operates from a command-line interface. Type an expression at the prompt, press enter, and the system executes the code (compiling if necessary); the value is printed, followed by a new prompt for the next command:

A snippet from the Julia REPL

The REPL is pretty good, but I prefer working in a Jupyter notebook, which runs in a web browser. There’s also a development environment called Juno, but I haven’t yet done much with it.

An incremental and iterative style of development can be a challenge for the compiler. When you process an entire program in one piece, the compiler starts from a blank slate. When you compile individual functions, which need to interact with functions compiled earlier, it’s all too easy for the various parts of the system to get out of sync. Gears may fail to mesh.

Problems of this kind can come up with any language, but multiple dispatch introduces some unique quirks. Suppose you’re writing Julia code in a Jupyter notebook. You write and compile a function f(x), which the system accepts as a “generic function with one method.” Later you realize that f should have been a function of two arguments, and so you go back to edit the definition. The modified source code now reads f(x, y). In most languages, this new definition of f would replace the old one, which would thereafter cease to exist. But in Julia you have haven’t replaced anything; instead you now have a “generic function with 2 methods.” The first definition is still present and active in the internal workspace, and so you can call both f(x) and f(x, y). However, the source code for f(x) is nowhere to be found in the notebook. Thus if you end the Jupyter session and later reload the file, any calls to f(x) will produce an error message.

A similar but subtler problem with redefinitions has been a subject of active discussion among the Julia developers for almost five years. There’s hope for a fix in the next major release.

If redefining a function during an interactive session can lead to confusion, trying to redefine a custom type runs into an impassable barrier. Types are immutable. Once defined, they can’t be modified in any way. If you create a type T with two fields a and b, then later decide you also need a field c, you’re stuck. The only way to redefine the type is to shut down the session and restart, losing all computations completed so far. How annoying. There’s a workaround: If you put the type definition inside a module, you can simply reload the module.


Sometimes it’s the little things that arouse the most heated controversy.

In counting things, Julia generally starts with 1. A vector with n elements is indexed from 1 through n. This is also the default convention in Fortran, MATLAB, Mathematica, and a few other languages; the C family, Python, and JavaScript count from 0 through n-1. The wisdom of 1-based indexing has been a subject of long and spirited debate on the Julia issues forum, on the developer mailing list, and on StackOverflow.

Programs where the choice of indexing origin makes any difference are probably rare, but as it happens my primes study was one of them. I was looking at primes reduced modulo \(m\), and counting the number of primes in each of the \(m\) possible residue classes. Since the residues range from \(0\) to \(m - 1\), it would have been convenient to store the counts in an array with those indices. However, I did not find it terribly onerous to add \(1\) to each index.

Blessedly, the indexing war may soon be over. Tim Holy, a biologist and intrepid Julia contributor, discovered a way to give users free choice of indexing with only a few lines of code and little cost in efficiency. An experimental version of this new mechanism is in the 0.5 developer release.

After the indexing truce, we can still fight over the storage order for arrays and matrices. Julia again follows the precedent of Fortran and MATLAB, reading arrays in column-major order. Given the matrix

x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\

Julia will serialize the elements as \(x_{11}, x_{21}, x_{12}, x_{22}, x_{13}, x_{23}\). C and many other recent languages read across the rows rather than down the columns. This is another largely arbitrary convention, although it may be of some practical importance that most graphic formats are row-major.

Finally there’s the pressing question of what symbol to choose for the string-concatenation operator. Where Python has "abc" + "def" ⇒ "abcdef", Julia does "abc" * "def" ⇒ "abcdef". The reason cited is pretty doctrinaire: In abstract algebras, ‘+’ generally denotes a commutative operation, and string concatenation is not commutative. If you’d like to know more, see the extensive, long-running discussion.


The official online documentation is reasonably complete, consistent, and clearly written, but it gives too few examples. Moreover, the organization is sometimes hard to fathom. For example, there’s a chapter on “Multidimensional Arrays” and another titled “Arrays”; I’m often unsure where to look first.

Some of the contributed packages have quite sparse documentation. I made heavy use of a graphics package called GadFly, which is a Julia version of the R package called ggplot2. I had a hard time getting started with the supplied documentation, until I realized that GadFly is close enough to ggplot2 that I could rely on the manual for the latter program.

The simple advice when you’re stumped is: Read the source, Luke. It’s all readily available and easily searched. And almost everything is written in Julia, so you’ll not only find the answers to your questions but also see instructive examples of idiomatic usage.

A few other online resources:

As far as I know, Julia has no standard or specification document, which is an interesting absence. For a long time, programming languages had rather careful, formal descriptions of syntax and semantics, laying out exactly which expressions were legal and meaningful in a program. Sometimes the standard came first, before anyone tried to write a compiler or interpreter; Algol 60 was the premier example of this discipline. Sometimes the standard came after, as a device to reconcile diverging implementations; this was the situation with Lisp and C. For now, at least, Julia is apparently defined only by its evolving implementation. Where’s the BNF?

The Future

The rise and fall of programming languages seems to be governed by a highly nonlinear dynamic. Why is Python so popular? Because it’s so popular! With millions of users, there are more add-on packages, more books and articles, more conferences, more showcase applications, more jobs for Python programmers, more classes where Python is the language of instruction. Success breeds success. It’s a bandwagon effect.

If this rich-get-richer economy were the whole story, the world would have only one viable programming language. But there’s a price to be paid for popularity: The overcrowded bandwagon gets trapped in its own wheel ruts and becomes impossible to steer. Think of those 85,000 Python packages. If an improvement in the core language is incompatible with too much of that existing code, the change can be made only with great effort. For example, persuading Python users to migrate from version 2 to version 3 has taken eight years, and the transition is not complete yet.

Julia is approaching the moment when these contrary imperatives—to gain momentum but to stay maneuverable—both become powerful. At the recent JuliaCon, Stefan Karpinsky gave a talk on the path to version 1.0 and beyond. The audience listened with the kind of studious attention that Janet Yellin gets when she announces Fed policy on inflation and interest rates. Toward the end, Karpinsky unveiled the plan: to release version 1.0 within a year, by the time of the 2017 JuliaCon. Cheers and applause.

The 1.0 label connotes maturity—or at least adulthood. It’s a signal that the language is out of school and ready for work in the real-world. And thus begins the struggle to gain momentum while continuing to innovate. For members of Julia’s developer group, there might be a parallel conflict between making a living and continuing to have fun. I’m fervently hoping all these tensions can be resolved.

Will Julia attract enough interest to grow and thrive? I feel I can best address that question by turning it back on myself. Will I continue to write code in Julia?

I’ve said that I do exploratory programming, but I also do expository programming. Computational experiments are an essential part of my work as a science writer. They help me understand what I’m writing about, and they also aid in the storytelling process, through simulations, computer-generated illustrations, and interactive gadgets. Although only a small subset of my readers ever glance at the code I publish, I want to maximize their number and do all I can to deepen their engagement—enticing them to read the code, run it, modify it, and ideally build on it. At this moment, Julia is not the most obvious choice for reaching a broad audience of computational dabblers, but I think it has considerable promise. In some respects, getting started with Julia is easier than it is with Python, if only because there’s less prior art—less to install and configure. I expect to continue working in other languages as well, but I do want to try some more Julia experiments.

Posted in computing | 12 Comments

Prime After Prime

Are the prime numbers sprinkled randomly along the number line, like windblown seeds?Of course not: Primality is not a matter of chance but a product of simple arithmetic. A number is prime if and only if no smaller positive integer other than 1 divides it evenly.

Yet that’s not the end of the story. The distribution of the primes looks random, with irregular gaps and clusters that seem quite haphazard. If there’s a pattern, it’s inscrutable. As a matter of fact, the primes look random enough that you could play dice with them. Make a list of consecutive prime numbers (perhaps starting with 11, 13, 17, 19, . . . ) and reduce them modulo 7. In other words, divide each prime by 7 and keep only the remainder. The result is a sequence of integers drawn from the set {1, 2, 3, 4, 5, 6} that looks much like the outcome of repeatedly rolling a fair die.

11 \bmod 7 & \rightarrow 4 \qquad 47 \bmod 7 \rightarrow 5 \\
13 \bmod 7 & \rightarrow 6 \qquad 53 \bmod 7 \rightarrow 4 \\
17 \bmod 7 & \rightarrow 3 \qquad 59 \bmod 7 \rightarrow 3 \\
19 \bmod 7 & \rightarrow 5 \qquad 61 \bmod 7 \rightarrow 5 \\
23 \bmod 7 & \rightarrow 2 \qquad 67 \bmod 7 \rightarrow 4 \\
29 \bmod 7 & \rightarrow 1 \qquad 71 \bmod 7 \rightarrow 1 \\
31 \bmod 7 & \rightarrow 3 \qquad 73 \bmod 7 \rightarrow 3 \\
37 \bmod 7 & \rightarrow 2 \qquad 79 \bmod 7 \rightarrow 2 \\
41 \bmod 7 & \rightarrow 6 \qquad 83 \bmod 7 \rightarrow 6 \\
43 \bmod 7 & \rightarrow 1 \qquad 89 \bmod 7 \rightarrow 5 \\

Working with a larger sample (the first million primes greater than \(10^7\)), I have tallied up the number of primes with each of the six possible remainders mod 7 (otherwise known as the six possible congruence classes mod 7). I have also simulated a million rolls of a six-sided die. Looking at the results of these two exercises, can you tell which is which?

               1        2        3        4        5        6
         166,787  166,569  166,714  166,573  166,665  166,692
             120      -98       47      -94       -2       25

               1        2        3        4        5        6
         166,768  166,290  166,412  166,638  167,282  166,610
             101     -377     -255      -29      615      -57

In each table the first line counts the number of outcomes, \(x\), in each of the six classes; the second line shows the difference \(x - \bar{x}\), where \(\bar{x}\) is the mean value 1,000,000 / 6 = 166,667. In both cases the numbers seem to be spread pretty evenly, without any obvious biases. The first table represents the prime residues mod 7. They have a somewhat flatter distribution than the simulated die, with smaller departures from the mean; the standard deviations of the two samples are 84 and 346 respectively. On the evidence of these tables it looks like either process could supply the randomness needed for a casual game of dice.

There’s more to randomness, however, than just ensuring that the results are evenly distributed across the allowed range. Individual events in the series must also be independent of one another. One roll of a die should have no effect on the outcome of the next roll. As a test of independence, we can look at pairs of successive events. How many times is a 1 followed by another 1, by a 2, by a 3, and so on? A 6 × 6 matrix serves to record the counts of the 36 possible pairs. If the process is truly random, all 36 pairs should be equally frequent, apart from small statistical fluctuations. We can turn the matrix into a color-coded “heatmap,” where cells with higher-than-average counts are shown in warm shades of pink and red, while those below the mean are filled with cooler shades of blue. (The quantity plotted is not the actual count \(x\) but a normalized variable \(w = (x_{i,\,j} - \bar{x})\, /\, \bar{x}\), where \(\bar{x}\) is again the mean value—in this case 1,000,000 / 36 = 27,778.) Here is the heatmap for the simulated fair die:

Figure 1.

Heatmap fair die

Not much going on there. Almost all the counts are so close to the mean value that the matrix cells appear as a neutral gray; a few are very pale pink or blue. It’s just what you would expect if consecutive rolls of a die are uncorrelated, and all possible pairs are equally likely.

Now for the corresponding matrix of consecutive primes mod 7:

Figure 2.

Heatmap 8 digit primes 1000001 mod 7

Well! I guess we’re not in Randomland anymore; this is where the old gray movie turns into Technicolor. The heatmap has a blue streak along the main diagonal (upper left to lower right), indicating that consecutive pairs of primes that have the same value mod 7 are strongly suppressed. In other words, the pairs \((1, 1), (2, 2), \ldots (6, 6)\) appear less often than they would in a truly random sequence. The superdiagonal (just above the main diagonal) is a lighter blue, meaning that \((i, j)\) pairs with \(j=i+1\) are seen at a little less than average frequency; for example, \((2, 3)\) and \((5, 6)\) have slightly negative values of normalized frequency. On the other hand, the subdiagonal (below the main diagonal) is all pink and red; pairs such as \((3, 2)\) or \((5, 4)\), with \(j=i-1\), occur with higher than average frequency. Away from the diagonal, in the upper right and lower left corners, we see a pastel checkerboard pattern.

If you’d prefer to squint at numbers rather than colored squares, here’s the underlying matrix:

                 pairs of consecutive primes mod 7
                   1      2      3      4      5      6
          1    15656  24376  33891  29964  33984  28916
          2    37360  15506  22004  32645  25095  33959
          3    25307  41107  14823  22747  32721  30009
          4    32936  26183  37129  14681  21852  33791
          5    24984  34207  26231  41154  15560  24529
          6    30543  25190  32636  25382  37453  15488

The departures from uniformity are anything but subtle. The third row, for example, shows that if you have just seen a 3 in the sequence of primes mod 7, the next number is much more likely to be a 2 than another 3. If you were wagering on a game played with prime-number dice, this bias could make a huge difference in the outcome. The prime dice are rigged!

These remarkably strong correlations in pairs of consecutive primes were discovered by Robert J. Lemke Oliver and Kannan Soundararajan of Stanford University, who discuss them in a preprint posted to the arXiv in March. What I find most surprising about the discovery is that no one noticed these patterns long ago. They are certainly conspicuous enough once you know how to look for them.

I suppose we can’t fault Euclid for missing them; ideas about randomness and probability were not well developed in antiquity. But what about Gauss? He was a connoisseur of prime tables, and he compiled his own lists of thousands of primes. In his youth, he wrote, “one of my first projects was to turn my attention to the decreasing frequency of primes, to which end I counted the primes in several chiliads . . . .” Furthermore, Gauss more less invented the idea of congruence classes and modular arithmetic. But apparently he never suspected there might be anything odd lurking in the congruences of pairs of consecutive primes.

In the 1850s the Russian mathematician Pafnuty Lvovich Chebyshev pointed out a subtle bias in the primes. Reducing the odd primes modulo 4 splits them into two subsets. All primes in the sequence 5, 13, 17, 29, 37, . . . are congruent to 1 mod 4; those in the sequence 3, 7, 11, 19, 23, 31, . . . are congruent to 3 mod 4. Chebyshev observed that primes in the latter category seem to be more abundant. Among the first 10,000 odd primes, for example, there are 4,943 congruent to 1 and 5,057 congruent to 3. However, the effect is tiny compared with the disparities seen in pairs of consecutive primes.

In modern times a few authors have reported glimpses of the consecutive-primes phenomenon; Lemke Oliver and Soundararajan mention three such sightings. (See references at the end of this article.) In the 1950s and 60s, Stanislaw Knapowski and Paul Turán investigated various aspects of prime residues mod m; in one paper, published posthumously in 1977, they discuss consecutive primes mod 4, with residues of 1 or 3. They “guess” that consecutive pairs with the same residue and those with different residues “are not equally probable.” In 2002 Chung-Ming Ko looked at sequences of consecutive primes (not just pairs of them) and constructed elaborate fractal patterns based on their varying frequencies. Then in 2011 Avner Ash and colleagues published an extended analysis of “Frequencies of Successive Pairs of Prime Residues,” including some matrices in which the diagonal depression is clearly evident.

Given these precedents, are Lemke Oliver and Soundararajan really the discoverers of the consecutive prime correlations? In my view the answer is yes. Although others may have seen the patterns before, they did not describe them in a way that registered on the consciousness of the mathematical community. As a matter of fact, when Lemke Oliver and Soundararajan announced their findings, the response was surprise verging on incredulity. Erica Klarreich, writing in Quanta, cited the reaction of James Maynard, a number theorist at Oxford:

When Soundararajan first told Maynard what the pair had discovered, “I only half believed him,” Maynard said. “As soon as I went back to my office, I ran a numerical experiment to check this myself.”

Evidently that was a common reaction. Evelyn Lamb, writing in Nature, quotes Soundararajan: “Every single person we’ve told this ends up writing their own computer program to check it for themselves.”

Well, me too! For the past few weeks I’ve been noodling away at lots of code to analyze primes mod m. What follows is an account of my attempts to understand where the patterns come from. My methods are computational and visual more than mathematical; I can’t prove a thing. Lemke Oliver and Soundararajan take a more rigorous and analytical approach; I’ll say a little more about their results at the end of this article.

If you would like to launch your own investigation, you’re welcome to use my code as a starting point. It is written in the Julia programming language, packed up in a Jupyter notebook, and available on GitHub. (Incidentally, this program was my first nontrivial experiment with Julia. I’ll have more to say about my experience with the language in a later post.)

All the examples presented above concern primes taken modulo 7, but there’s nothing special about the number 7 here. I chose it merely because the six possible remainders {1, 2, 3, 4, 5, 6} match the faces of an ordinary cubical die. Other moduli give similar results. Lemke Oliver and Soundararajan do much of their analysis with primes modulo 3, where there are only two congruence classes: A prime greater than 3 must leave a remainder of either 1 or 2 when divided by 3. This is the matrix of pair counts for the first million primes above \(10^7\):

                             1       2
                    1   218578  281453
                    2   281454  218514

Figure 3.

Heatmap 8 digit primes 1000001 mod 3

The pattern is rather minimalist but still recognizable: The off-diagonal entries for sequences \((1, 2)\) and \((2, 1)\) are larger than the on-diagonal entries for \((1, 1)\) and \((2, 2)\).

Primes modulo 10 have four congruence classes: 1, 3, 7, 9. Working in decimal notation, we don’t even need to do any arithmetic to see this. When numbers are written in base 10, every prime greater than 5 has a trailing digit of 1, 3, 7, or 9. Here are the frequency counts for the 16 pairs of consecutive final digits:

                          1      3      7      9
                 1    43811  76342  78170  51644
                 3    58922  41148  71824  78049
                 7    64070  68542  40971  76444
                 9    83164  63911  59063  43924

Figure 4.

Heatmap 8 digit primes 1000001 mod 10

The blue stripe along the main diagonal is clearly present, although elsewhere in the matrix the pattern is somewhat muted and muddled.

I have found that the correlations between successive primes show through most clearly when the modulus itself is a prime and also is not too small. Take a look at the heatmaps for consecutive primes mod 13 and mod 17:

Figure 5.

Heatmap 8 digit primes 1000001 mod 13

Figure 6.

Heatmap 8 digit primes 1000001 mod 17

Or how about mod 31?

Figure 7.

Heatmap 8 digit primes 1000001 mod 31

These would make great patterns for a quilt or a tiled floor, no? And there are interesting regularities visible in all of them. Diagonal stripes are prominent not just on the main corner-to-corner diagonal but throughout the matrix. Those stripes also generate a checkerboard motif; along any row or column, cells tend to alternate between red and blue. A subtler feature is an approximate bilateral symmetry across the antidiagonal (which runs from lower left to upper right). If you were to fold the square along this line, the cells brought together would be closely matched in color. (This is a fact noticed by Ash and his co-authors.)

As a focus of further analysis I have settled on looking at consecutive primes mod 19, a modulus large enough to yield clearly differentiated stripes but not so large as to make the matrix unweildy.

Figure 8.

Heatmap 8 digit primes 1000001 mod 19

How to make sense of what we’re seeing? A starting point is the observation that all the primes in our sample are odd numbers, and hence all the intervals between those primes are even numbers. For any given prime \(p\), the next candidates for primehood are \(p+2, p+4, p+6, \ldots\). Could this have something to do with the checkerboard pattern? If the steps between primes must be multiples of 2, that could certainly create correlations between every second cell in a given column or row. (Indeed, the every-other-cell correlations would be starkly obvious—all even-numbered entries would be exactly zero—if the modulus were an even number. It is only by “wrapping around” the edge of the matrix at an odd boundary that any of the even-numbered cells can be populated.)

The diagonal stripes in the matrix suggest strong correlations between all pairs of primes separated by a certain numerical interval. For example, the deepest blue diagonal and the brightest red diagonal are formed by cells separated by six places along the j axis. In the first row are cells 1 and 7, then 2 and 8, 3 and 9, and so on. It occurred to me that this relationship would be easier to perceive if I could “twist” the matrix, so that diagonals became columns. The idea is to apply a cyclic shift to each row; all the values in the row slide to the left, and those that fall off the left edge are reinserted on the right. The first row shifts by zero positions, the next row by one position, and so on. (Is there a name for this transformation? I’m just calling it a twist.)

When I wrote the code to apply this transformation, the result was not quite what I expected:

Figure 9.

Twisted nz heatmap 8 digit primes 1000001 mod 19

What are those zigzags all along the antidiagonal? I guessed that I must have an off-by-one error. Indeed this was the nature of the problem, though the bug lay in the data, not the algorithm. The matrices I’ve displayed in all the figures above are only partial; they suppress empty congruence classes. In particular, the matrix for pairs of primes modulo 19 ignores all primes congruent to 0 mod 19—on the sensible-sounding grounds that there are no such primes. After all, if \(p > 19\) and \(p \equiv 0 \bmod 19\), then \(p\) cannot be prime because it is divisible by 19. Nevertheless, a row and a column for \(p \equiv 0 \bmod 19\) are properly part of the matrix. When they are included, the color-coded tableau looks like this:

Figure 10.

Heatmap z 8 digit primes 1000001 mod 19

The presence of the zero row and column makes the definition of the twist transformation somewhat tidier: For each row \(i\), apply a leftward cyclic shift of \(i\) places. The resulting twisted matrix is also tidier:

Figure 11.

Twisted z heatmap 8 digit primes 1000001 mod 19

What do those vertical stripes tell us? In the original matrix, entry \(i, j\) represents the frequency with which \(i \bmod 19\) is followed by \(j \bmod 19\). Here, the color in cell \(i, j\) indicates the frequency with which \(i \bmod 19\) is followed by \((i + j) \bmod 19\). In other words, each column brings together entries with same interval mod 19 between two primes. For example, the leftmost column includes all pairs separated by an interval of length \(0 \bmod 19\), and the bright red column at \(j = 6\) counts all the cases where successive primes are separated by \(6 \bmod 19\).

The color coding gives a qualitative impression of which intervals are seen more or less commonly. For a more precise quantitative measure, we can sum along the columns and display the totals in a bar chart:

Figure 12.

Column sums twisted 8 digit primes 1000001 mod 19

Three obervations:

  • The even-odd disparity noted above is clearly visible in this graphic as well. With the exception of 0, every even interval stands out above its odd neighbors.
  • An inter-prime interval of 6 mod 19 is the most popular of all. Multiples of 6 (namely 12 and 18) are also favored but a little less so.
  • The interval 0 mod 19 is remarkably unpopular. These are the entries along the main diagonal of the original matrix, so it’s no surprise their sum is on the low side, but the deficit is deeper than I had guessed.

I wanted to understand the origin of these patterns. What makes interval 6 such a magnet for pairs of consecutive primes, and why do almost all the primes shun poor interval 0?

For the popularity of 6, I already had an inkling. In the 1990s Andrew Odlyzko, Michael Rubinstein, and Marek Wolf undertook a computational study of prime “jumping champions”:

An integer D is called a jumping champion if it is the most frequently occurring difference between consecutive primes ≤ x for some x.

Among the smallest primes (x less than about 600), the jumping champion is usually 2, but then 6 takes over and dominates for quite a long stretch of the number line. Somewhere around \(x = 10^{35}\), 6 cedes the championship to 30, which eventually gives way to 210. Odlyzko et al. estimate that the latter transition takes place near \(x = 10^{425}\). The numbers in this sequence of jumping champions—2, 6, 30, 210, . . . —are the primorials; the nth primorial is the product of the first n primes.

Why should primorials be the favored intervals between consecutive primes? If \(p\) is a large enough prime, then \(p + 2\) cannot be divisible by 2, \(p + 6\) cannot be divisible by either 2 or 3, \(p + 30\) cannot be divisible by 2, 3, or 5, and in general \(p + P_{n}\), where \(P_{n}\) is the nth primorial, cannot be divisible by any of the first n primes. Of course \(p + P_{n}\) might still be divisible by some larger prime, or there might be another prime between \(p\) and \(p + P_{n}\), so that the interprime interval is certainly not guaranteed to be a primorial. But these intervals have an edge over other contenders.

We can see this reasoning in action by taking the differences between successive elements in our list of a million eight-digit primes, then plotting their frequencies:

Figure 13.

Diffs 8 digit primes 1000001 color

Again interval 6 is the clear standout, accounting for 13.7 percent of the total; higher multiples of 6 also poke above their immediate neighbors. And note the overall shape of the distribution: a lump at the left (with a peak at 6), followed by a steady decline. The trend looks a little like a Poisson distribution, and indeed this is thought to be the correct description.

The color scheme slices the data set into tranches of 19 values each. The blue tranche, which includes inter-prime intervals of length 0 to 18, accounts for 68 percent of all the intervals present in the sample of a million primes; the gold tranche adds another 23 percent. The remaining 9 percent are spread widely and thinly. Not all of the intervals are shown in the graph; the spectrum extends as far as 210. (A single pair of consecutive primes in the sample has a separation of 210, namely 20,831,323 and 20,831,533.)

Figure 13 seems to reveal a great deal about the patterns of consecutive primes mod 19. I can make the graph even more informative with a simple rearrangement. Slide each 19-element tranche to the left until it aligns with the 0 tranche, stacking up bars that fall in the same column. Thus the second (gold) tranche moves left until bar 19 lines up with bar 0, and the third (rose) tranche brings together bar 38 with bar 0. Physically, this process can be imagined as wrapping the graph around a cylinder of circumference 19; mathematically, it amounts to reducing the inter-prime intervals modulo 19.

Figure 14.

Diffs 8 digit primes 1000001 color stacked mod 19

If you ignore the garish colors, Figure 14 is identical to Figure 12: All the bar heights match up. This should not be a surprise. In Figure 12 we reduce the primes modulo 19 and then take the differences between successive reduced primes; in Figure 14 we take the differences between the primes themselves and then reduce those differences modulo 19. The two procedures are equivalent:

\[(p \bmod 19 - q \bmod 19) \bmod 19 = (p - q) \bmod 19.\]

Looking at the colors now, the pieces of the puzzle fall into place. Why are primes mod 19 so often separated by an interval of 6? Well, “mod 19” has very little to do with it; 6 itself is by far the most common interval between primes in this sample. The only other nonnegligible contribution to \(\delta \equiv 6 \bmod 19\) comes from the third tranche, specifically a few pairs of primes at a distance of 44.

The predominance of the first tranche also explains the disparity between odd and even intervals. All the intervals in the first tranche are necessarily even; odd intervals (mod 19) begin to appear only with the second tranche (intervals 19 to 37) and for that reason alone they are less well populated. For the eight-digit primes in this sample, more than two-thirds of consecutive pairs are closer than 19 units and thus wind up in the first tranche. (The median spacing between the primes is 12. The mean interval is 16.68, in close accord with the theoretical prediction of 16.72.)

Finally, Figure 14 also has something to say about the rarity of 0 intervals mod 19. No two consecutive primes can fall into the same congruence class mod 19 unless they are separated by a distance of 38 or some multiple of 38. Thus such pairs don’t enter the scene until the beginning of the third tranche, and there can’t be very many of them. The million-prime sample has 8,384 consecutive pairs at a distance of 38—less than 1 percent of the total. This is the main reason that a prime-number die so rarely shows the same face twice in a row. It’s the origin of the blue diagonal streak in all the matrices.

I find it interesting that we can explain so much about the pattern of consecutive primes mod m without delving into any of the deep and distinctive properties of prime numbers. In fact, we can replicate much of the pattern without introducing primes at all.

Two hundred years ago, Gauss and Legendre observed that in the neighborhood of a number \(x\), the fraction of all integers that are prime is about \(1 / \log x\). In 1936 the Swedish mathematician Harald Cramér suggested that we interpret this fraction as a probability. The idea is to go through the integers in sequence, accepting each \(x\) with probability \(1 / \log x\). The numbers in the accepted set will not be prime except by coincidence, but they will have the same large-scale distribution as the primes. Here are the first few entries in a list of a million such “Cramér primes,” where the random selection process started with \(x = 10^7\):


Now suppose we put these numbers through the same machinery we applied to the primes. We’ll reduce each Carmér prime mod 19 and then construct the 19 × 19 matrix of successors:

Figure 15.

Heatmap 8 digit cramers 1000007 mod 19

The prominent diagonal features look familiar, but they are much simpler than those in the corresponding prime diagrams. For any Cramér prime p mod 19, the most likely successor is p + 1 mod 19, and the least likely is p + 19 mod 19. Between these extremes there’s a smooth gradient in frequency or probability, with just a few small fluctuations that can probably be attributed to statistical noise.

One thing that’s missing from this matrix is the checkerboard motif. We can restore some of that structure by generating a new set of numbers I call Cramér semiprimes. They are formed by the same kind of probabilistic sieving of the integers, but this time we consider only the odd numbers as candidates, and adjust the probability to \(2\, / \log x\) to keep the overall density the same:

Figure 16.

Heatmap 8 digit semicramers 1000001 mod 19

That’s more like it! With all even numbers excluded from the sequence, the minimum interval between semicramers is 2, and that is also the likeliest spacing.

With one further modification, we get an even closer imitation of the true prime matrix. In addition to excluding all integers divisible by 2, we knock out those divisible by 3, and adjust the probability of selection accordingly. Call the resulting numbers Cramér demisemiprimes.

Figure 17.

Heatmap 8 digit demisemicramers 1000001 mod 19

Note that 6 mod 19 is the likeliest interval between Cramér demisemiprimes, just as it is between true primes, and there are the same echoes at intervals of 12 and 18. Indeed, the only conspicuous difference between this matrix and Figure 10 (the corresponding diagram for true primes) is in the column and the row for numbers congruent to 0 mod 19. There can be no such numbers among the primes. If we eliminate them also from the Cramér numbers, the two matrices become hard to distinguish. Here they are side by side:

Figure 18.

Primes cramers comparison z

If you look closely, there are differences to be found—check out the diagonal extending southeast from row 1, column 15—but overall these modified Cramer numbers are shockingly good fake primes. Even the symmetry about the antidiagonal is visible in both diagrams. And keep in mind that the two sets have only about 19 percent of their values in common; the Cramérs include 189,794 genuine primes.

I have one more twist to add to the tale. All the examples above are based on primes (or prime analogues) of eight decimal digits, or in other words numbers in the vicinity of \(10^7\). Do the same conclusions hold up with larger primes? Consider the tableau created by consecutive pairs of a million 40-digit primes, taken mod 19. The pattern is familiar but faded:

Figure 19.

Heatmap 40 digit primes 9166432 mod 19

Going on to primes of 400 digits each, again reduced mod 19, we find the colors have been bleached almost to oblivion:

Figure 20.

Heatmap 400 digit primes 6567379 mod 19

The blue streak on the main diagonal is barely discernible, and other features amount to mere random mottling.

Thus it seems that size matters when it comes to pairs of consecutive primes. For a hint as to why, take a look at the tally of differences between successive primes for the 40-digit sample:

Figure 21.

Diffs 40 digit primes 9166432

Compared with the distribution of intervals for eight-digit primes (Figure 13), the spectrum is much broader and flatter. In this representation the graph is truncated at interval 240; the long tail actually stretches far to the right, with the largest gap between consecutive primes at 1,328. Also, as predicted by Odlyzko and his colleagues, the most frequent interval between 40-digit primes is not 6 but 30.

Because of the wider distribution of intervals, the first tranche cannot dominate the behavior of the system in the way it does with eight-digit primes. When we stack up the tranches mod 19 (Figure 22, below), the first six or eight tranches all make substantial contributions. The odd-even alternation is still present, but the amplitude of these oscillations is much reduced. The leftmost bar in the graph, representing intervals congruent to 0 mod 19, is stunted but not as severely.

Figure 22.

Diffs 40 digit primes 9166432 stacked mod 19

The flattening of the spectrum becomes even more pronounced in the sample of a million 400-digit primes:

Figure 23.

Diffs 400 digit primes 6567379

Now the gaps between primes extend all the way out to 15,080, creating almost 800 tranches mod 19 (though only 13 are shown). And there’s a lot of intriguing, comblike structure in the array. In general, bars at multiples of 6 stand out at almost double the height of their near neighbors, showing the continuing influence of the smallest prime factors 2 and 3. Multiples of 6 that are also multiples of 30 reach even greater heights. Values in the sequence 42, 84, 126, 168, 210, . . . are also enhanced; these numbers are multiples of 42 = 2 × 3 × 7. And notice that 210, which is a multiple of 6 and 30 and 42, is the new champion interval, again supporting an Odlyzko prediction.

Despite all this intricate structure, when the bars are stacked up mod 19, the mixing of the 800 tranches is so thorough that the heights are almost uniform. All that’s left is a tiny bit of even-odd disparity.

Figure 24.

Diffs 400 digit primes 6567379 stacked mod 19

And the chronically unpopular class of intervals congruent to 0 mod 19 has finally caught up with its peers. Most of the height of the bars comes not from the dozen early tranches but from the hundreds of later ones representing intervals between 228 and 15,080 (all lumped together in the teal green area of the graph).

The experiments with large primes suggest a plausible surmise: As the size of the primes goes to infinity, all traces of correlations will fade to gray, and consecutive pairs of primes will be as random as rolls of an ideal die. But is it so? There are several reasons to be skeptical of this hypothesis. First, if we scale the modulus m along with the size of the primes—making it comparable in magnitude to the median gap between primes—the correlations may still show through. For my 40-digit sample, the median gap between primes is 66, so let’s look at the successive-pairs matrix mod 61. (To limit statistical noise, I did this computation with a sample of 10 million 40-digit primes rather than 1 million.)

Figure 25.

Heatmap 40 digit primes 5053744 mod61

The stripes are back! Indeed, in addition to the familiar bright red stripes at intervals of 6, there’s a more diffuse pink-and-blue undulation with a period of 30. I would love to see a matrix for primes of 400 digits, which might well have even more complex features, with interacting waves at periods of 6, 30, and 210. Regrettably, I can’t show you that figure. The median gap between 400-digit primes is about 640, so we’d need to set m equal to a prime in this range, say 641. Filling that 641 × 641 matrix would require about a billion consecutive 400-digit primes, which is more than I’m prepared to calculate.

There are other reasons to doubt that the correlations disappear entirely as the primes grow larger. The comblike structure seen so clearly in Figures 21 and 23 suggests that rules of divisibility by small primes have a major influence on the distribution of large primes mod m—and this influence does not wane when the primes grow larger still. Furthermore, even when m is much smaller than the median inter-prime interval, the blue streak remains faintly visible. Here is the matrix for pairs of consecutive 400-digit primes mod 3:

                             1       2
                    1   248291  251128
                    2   251127  249453

Differences between on-diagonal and off-diagonal elements are much smaller than with eight-digit primes (compare Figure 3), but the discrepancies still don’t look like random variation.

To get a clearer picture of how the correlations vary as a function of the size of the primes, I set out to sample the sequence of primes over the entire range from 1-digit numbers to 400-digit numbers. In this project I decided to go Gauss one better: He tabulated primes by the chiliad (a group of 1,000), and I’ve been computing them by the myriad (a group of 10,000). To measure the correlations among primes mod m, I calculated the mean value of the diagonal elements of the matrix and the mean of the off-diagonal elements, then took the off/on ratio. If successive primes were totally uncorrelated, the ratio should converge to 1.

Figure 26 shows the result for 797 myriads of primes mod 3. The curve is concave upward, with a steep initial decline and then a much flatter segment. Starting at about 100 digits, there are samples with off/on ratios of less than 1, meaning that the diagonal is more densely populated than the off-diagonal regions. But even at 400 digits the majority of the ratios are still above 1. What are we looking at here? Does the curve slowly approach a ratio of 1, or is there a limiting value slightly greater than 1? Unfortunately, computational experiments will not give a definitive answer.

Figure 26.

Myriads diag ratios 3

The paper by Lemke Oliver and Soundararajan brings quite different tools to bear on this problem. Although they do some numerical exploration, their focus is on finding an analytic solution. The goal is a mathematical function or formula whose inputs are four positive integers: m is a modulus, a and b are congruence classes of primes mod m, and x is an upper limit on the size of the primes. The formula should tell us how often a is followed by b among all primes smaller than x. If we were in possession of such a formula, we could color in all the squares in the m × m successor matrix without ever having to compute the actual primes up to x.

Describing the behavior of all primes up to x is far more challenging than taking a sample of primes in the neighborhood of x. And the analytic approach is harder in other ways: It requires ideas rather than just cpu cycles. The reward is also potentially greater. The equation \(A = \pi r^2\) yields an exact truth about all circles, something no finite series of computations (with a finite approximation to \(\pi\)) can give us. There’s the promise not just of rigor but of insight.

Sadly, I’ve not yet been able to gain much insight from reading the analysis of Lemke Oliver and Soundararajan. The blame lies mainly with gaping holes in my knowledge of analytic number theory, but I think it’s also fair to say that the math gets pretty hairy at certain points in this discourse. The equation below constitutes the Main Conjecture of Lemke Oliver and Soundararajan. (I have made a minor change of notation and simplified one aspect of the equation: The original applies to sequences of r consecutive primes, but this version describes pairs only, i.e., \(r = 2\).)

\[\pi(x; m, a, b) = \frac{\mathrm{li}(x)}{\phi(m)^2}\left(1 + c_1(m; a, b)\frac{\log \log x}{\log x} + c_2(m; a, b) \frac{1}{\log x} + O\Big( \frac{1}{(\log x)^{7/4}} \Big) \right)\]

I think I understand enough of what’s going on here to at least offer a glossary. To the left of the equal sign, \(\pi(x; m, a, b)\) denotes a counting function; whereas \(\pi(x)\) counts the primes up to \(x\), \(\pi(x; m, a, b)\) is the number of pairs of consecutive primes mod \(m\) that fall into the congruence classes \(a\) and \(b\). To the right of the equal sign, the main coefficient \(\mathrm{li}(x) / \phi(m)^2\) is essentially the mean or expected number of pairs if the primes were distributed uniformly at random, with no correlations between successive primes; \(\mathrm{li}(x)\) is the logarithmic integral of \(x\), an approximation to \(\pi(x)\), and \(\phi(m)^2\) is the Euler totient function, which counts the square of the number of possible congruence classes for \(m\), or in other words the number of elements in the successor matrix.

The leading term inside the large parentheses is simply \(1\), and so it takes on the value of the main coefficient \(\mathrm{li}(x) / \phi(m)^2\); thus the mean number of pairs \((a, b)\) becomes the first approximation to the counting function. The three following terms act as corrections to this first approximation; for large \(x\) they should get progressively smaller, because \(\log \log x / \log x \gt 1 / \log x \gt 1 / (\log x)^{7/4}\) whenever \(x \gt e^e \approx 15\).

What about the coefficients of those three correction terms? The notation O(\cdot) for the smallest term indicates that we’re only going to worry about the term’s order of magnitude—which will be small for large \(x\). The coefficient \(c_1\) takes the following form in the case of \(r = 2\):

\[c_1(m; a, b) = \frac{1}{2} - \frac{\phi(m)}{2} (\#\{a \equiv b \bmod m\})\]

The expression \(\#\{a \equiv b \bmod m\}\) counts the number of cases where \(a\) and \(b\) lie in the same congruence class mod \(m\). Thus the effect of the term (if I understand correctly) is to reduce the overall count along the matrix diagonal, where \(a \equiv b \bmod m\).

As for coefficient \(c_2\), Lemke Oliver and Soundararajan remark that “in general, [it] seems complicated.” Indeed it does. And so this is the place where I should encourage those readers who want to know more to go read the original.

The complexity of the mathematical treatment leaves me feeling frustrated, but it’s hardly unusual for an easily stated problem to require a deep and difficult solution. I hang onto the hope that some of the technicalities will be brushed aside and the main ideas will emerge more clearly with further work. In the meantime, it’s still possible to explore a fascinating and long-hidden corner of number theory with the simplest of computational tools and a bit of graphics.

“God may not play dice with the universe, but something strange is going on with the prime numbers”—so said Paul Erdős and/or Mark Kac, though only with a little help from Carl Pomerance. The strangeness seems to be at its strangest when we play dice with the primes.

Addendum 2016-06-14. I noted above that the distribution of primes mod 7 seems flatter, or more nearly uniform, than the result of rolling a fair die. John D. Cook has taken a chi-squared test to the data and shows that the fit to uniform distribution is way too good to be the plausible outcome of a random process. His first post deals with the specific case of primes modulo 7; his second post considers other moduli.


Ash, Avner, Laura Beltis, Robert Gross, and Warren Sinnott. 2011. Frequencies of successive pairs of prime residues. Experimental Mathematics 20(4):400–411.

Chebyshev, Pafnuty Lvovich. 1853. Lettre de M. le Professeur Tchébychev à M. Fuss sur un nouveaux théorème relatif aux nombres premiers contenus dans les formes 4n + 1 et 4n + 3. Bulletin de la Class Physico-mathematique de l’Academie Imperiale des Sciences de Saint-Pétersbourg 11:208. Google Books

Cramér, Harald. 1936. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica 2:23–46. PDF

Derbyshire, John. 2002. Chebyshev’s bias.

Granville, Andrew. 1995. Harald Cramér and the distribution of prime numbers. Harald Cramér Symposium, Stockholm, 1993. Scandinavian Actuarial Journal 1:12–28. PDF

Granville, Andrew, and Greg Martin. 2004. Prime number races. arXiv

Hamza, Kais, and Fima Klebaner. 2012. On the statistical independence of primes. The Mathematical Scientist 37:97–105.

Klarreich, Erica. 2016. Mathematicians discover prime conspiracy. Quanta.

Knapowski, S., and P. Turán. 1977. On prime numbers ? 1 resp. 3 mod 4. In Number Theory and Algebra: Collected Papers Dedicated to Henry B. Mann, Arnold E. Ross, and Olga Taussky-Todd, pp. 157–165. Edited by Hans Zassenhaus. New York: Academic Press.

Ko, Chung-Ming. 2002. Distribution of the units digit of primes. Chaos Solitons Fractals 13(6):1295–1302.

Lamb, Evelyn. 2016. Peculiar pattern found in ‘random’ prime numbers. Nature doi:10.1038/nature.2016.19550.

Lemke Oliver, Robert J., and Kannan Soundararajan. 2016 preprint. Unexpected biases in the distribution of consecutive primes. arXiv

Odlyzko, Andrew, Michael Rubinstein, and Marek Wolf. 1999. Jumping champions. Experimental Mathematics 8(2):107–118.

Rubinstein, Michael, and Peter Sarnak. 1994. Chebyshev’s bias. Experimental Mathematics 3:173–197. Project Euclid

Tao, Terrence. Structure and randomness in the prime numbers. PDF

Posted in computing, featured, mathematics | 7 Comments

Where’s My Petabyte Disk Drive?

Fourteen years ago I noted that disk drives were growing so fast I couldn’t fill them up. Between 1997 and 2002, storage capacity doubled every year, allowing me to replace a 3 gigabyte drive with a new 120 gigabyte model. I wrote:

Extrapolating the steep trend line of the past five years predicts a thousandfold increase in capacity by about 2012; in other words, today’s 120-gigabyte drive becomes a 120-terabyte unit.

Extending that same growth curve into 2016 would allow for another four doublings, putting us on the threshold of the petabyte disk drive (i.e., \(10^{15}\) bytes).

None of that has happened. The biggest drives in the consumer marketplace hold 2, 4, or 6 terabytes. A few 8- and 10-terabyte drives were recently introduced, but they are not yet widely available. In any case, 10 terabytes is only 1 percent of a petabyte. We have fallen way behind the growth curve.

The graph below extends an illustration that appeared in my 2002 article, recording growth in the areal density of disk storage, measured in bits per square inch:

Graph of hard disk areal density in bits per square inch from 1956 to 2016

The blue line shows historical data up to 2002 (courtesy of Edward Grochowski of the IBM Almaden Research Center). The bright green line represents what might have been, if the 1997–2002 trend had continued. The orange line shows the real status quo: We are three orders of magnitude short of the optimistic extrapolation. The growth rate has returned to the more sedate levels of the 1970s and 80s.

What caused the recent slowdown? I think it makes more sense to ask what caused the sudden surge in the 1990s and early 2000s, since that’s the kink in the long-term trend. The answers lie in the details of disk technology. More sensitive read heads developed in the 90s allowed information to be extracted reliably from smaller magnetic domains. Then there was a change in the geometry of the domains: the magnetic axis was oriented perpendicular to the surface of the disk rather than parallel to it, allowing more domains to be packed into the same surface area. As far as I know, there have been no comparable innovations since then, although a new writing technology is on the horizon. (It uses a laser to heat the domain, making it easier to change the direction of magnetization.)

As the pace of magnetic disk development slackens, an alternative storage medium is coming on strong. Flash memory, a semiconductor technology, has recently surpassed magnetic disk in areal density; Micron Technologies reports a laboratory demonstration of 2.7 terabits per square inch. And Samsung has announced a flash-based solid-state drive (SSD) with 15 terabytes of capacity, larger than any mechanical disk drive now on the market. SSDs are still much more expensive than mechanical disks—by a factor of 5 or 10—but they offer higher speed and lower power consumption. They also offer the virtue of total silence, which I find truly golden.

Flash storage has replaced spinning disks in about a quarter of new laptops, as well as in all phones and tablets. It is also increasingly popular in servers (including the machine that hosts Do disks have a future?

open-case photo of a 15-year-old 2.5-inch disk drive

In my sentimental moments, I’ll be sorry to see spinning disks go away. They are such jewel-like marvels of engineering and manufacturing prowess. And they are the last link in a long chain of mechanical contrivances connecting us with the early history of computing—through Turing’s bombe and Babbage’s brass gears all the way back to the Antikythera mechanism two millennia ago. From here on out, I suspect, most computers will have no moving parts.

Maybe in a decade or two the spinning disk will make a comeback, the way vinyl LPs and vacuum tube amplifiers have. “Data that comes off a mechanical disk has a subtle warmth and presence that no solid-state drive can match,” the cogniscenti will tell us.

“You can never be too rich or too thin,” someone said. And a computer can never be too fast. But the demand for data storage is not infinitely elastic. If a file cabinet holds everything in the world you might ever want to keep, with room to spare, there’s not much added utility in having 100 or 1,000 times as much space.

In 2002 I questioned whether ordinary computer users would ever fill a 1-terabyte drive. Specifically, I expressed doubts that my own files would ever reach the million megabyte mark. Several readers reassured me that data will always expand to fill the space available. I could only respond “We’ll see.” Fourteen years later, I now have the terabyte drive of my dreams, and it holds all the words, pictures, music, video, code, and whatnot I’ve accumulated in a lifetime of obsessive digital hoarding. The drive is about half full. Or half empty. So I guess the outcome is still murky. I can probably fill up the rest of that drive, if I live long enough. But I’m not clamoring for more space.

One factor that has surely slowed demand for data storage is the emergence of cloud computing and streaming services for music and movies. I didn’t see that coming back in 2002. If you choose to keep some of your documents on Amazon or Azure, you obviously reduce the need for local storage. Moreover, offloading data and software to the cloud can also reduce the overall demand for storage, and thus the global market for disks or SSDs. A typical movie might take up 3 gigabytes of disk space. If a million people load a copy of the same movie onto their own disks, that’s 3 petabytes. If instead they stream it from Netflix, then in principle a single copy of the file could serve everyone.

In practice, Netflix does not store just one copy of each movie in some giant central archive. They distribute rack-mounted storage units to hundreds of internet exchange points and internet service providers, bringing the data closer to the viewer; this is a strategy for balancing the cost of storage against the cost of communications bandwidth. The current generation of the Netflix Open Connect Appliance has 36 disk drives of 8 terabytes each, plus 6 SSDs that hold 1 terabyte each, for a total capacity of just under 300 terabytes. (Even larger units are coming soon.) In the Netflix distribution network, files are replicated hundreds or thousands of times, but the total demand for storage space is still far smaller than it would be with millions of copies of every movie.

A recent blog post by Eric Brewer, Google’s vice president for infrastructure, points out:

The rise of cloud-based storage means that most (spinning) hard disks will be deployed primarily as part of large storage services housed in data centers. Such services are already the fastest growing market for disks and will be the majority market in the near future. For example, for YouTube alone, users upload over 400 hours of video every minute, which at one gigabyte per hour requires more than one petabyte (1M GB) of new storage every day or about 100x the Library of Congress.

Thus Google will not have any trouble filling up petabyte drives. An accompanying white paper argues that as disks become a data center specialty item, they ought to be redesigned for this environment. There’s no compelling reason to stick with the present physical dimensions of 2½ or 3½ inches. Moreover, data-center disks have different engineering priorities and constraints. Google would like to see disks that maximize both storage capacity and input-output bandwidth, while minimizing cost; reliability of individual drives is less critical because data are distributed redundantly across thousands of disks.

The white paper continues:

An obvious question is why are we talking about spinning disks at all, rather than SSDs, which have higher [input-output operations per second] and are the “future” of storage. The root reason is that the cost per GB remains too high, and more importantly that the growth rates in capacity/$ between disks and SSDs are relatively close . . . , so that cost will not change enough in the coming decade.

If the spinning disk is remodeled to suit the needs and the economics of the data center, perhaps flash storage can become better adapted to the laptop and desktop environment. Most SSDs today are plug-compatible replacements for mechanical disk drives. They have the same physical form, they expect the same electrical connections, and they communicate with the host computer via the same protocols. They pretend to have a spinning disk inside, organized into tracks and sectors. The hardware might be used more efficiently if we were to do away with this charade.

Or maybe we’d be better off with a different charade: Instead of dressing up flash memory chips in the disguise of a disk drive, we could have them emulate random access memory. Why, after all, do we still distinguish between “memory” and “storage” in computer systems? Why do we have to open and save files, launch and shut down applications? Why can’t all of our documents and programs just be everpresent and always at the ready?

In the 1950s the distinction between memory and storage was obvious. Memory was the few kilobytes of magnetic cores wired directly to the CPU; storage was the rack full of magnetic tapes lined up along the wall on the far side of the room. Loading a program or a data file meant finding the right reel, mounting it on a drive, and threading the tape through the reader and onto the take-up reel. In the 1970s and 80s the memory/storage distinction began to blur a little. Disk storage made data and programs instantly available, and virtual memory offered the illusion that files larger than physical memory could be loaded all in one go. But it still wasn’t possible to treat an entire disk as if all the data were all present in memory. The processor’s address space wasn’t large enough. Early Intel chips, for example, used 20-bit addresses, and therefore could not deal with code or data segments larger than \(2^{20} \approx 10^6\) bytes.

We live in a different world now. A 64-bit processor can potentially address \(2^{64}\) bytes of memory, or 16 exabytes (i.e., 16,000 petabytes). Most existing processor chips are limited to 48-bit addresses, but this still gives direct access to 281 terabytes. Thus it would be technically feasible to map the entire content of even the largest disk drive onto the address space of main memory.

In current practice, reading from or writing to a location in main memory takes a single machine instruction. Say you have a spreadsheet open; the program can get the value of any cell with a load instruction, or change the value with a store instruction. If the spreadsheet file is stored on disk rather than loaded into memory, the process is quite different, involving not single instructions but calls to input-output routines in the operating system. First you have to open the file and read it as a one-dimensional stream of bytes, then parse that stream to recreate the two-dimensional structure of the spreadsheet; only then can you access the cell you care about. Saving the file reverses these steps: The two-dimensional array is serialized to form a linear stream of bytes, then written back to the disk. Some of this overhead is unavoidable, but the complex conversions between serialized files on disk and more versatile data structures in memory could be eliminated. A modern processor could address every byte of data—whether in memory or storage—as if it were all one flat array. Disk storage would no longer be a separate entity but just another level in the memory hierarchy, turning what we now call main memory into a new form of cache. From the user’s point of view, all programs would be running all the time, and all documents would always be open.

Is this notion of merging memory and storage an attractive prospect or a nightmare? I’m not sure. There are some huge potential problems. For safety and sanity we generally want to limit which programs can alter which documents. Those rules are enforced by the file system, and they would have to be re-engineered to work in the memory-mapped environment.

Perhaps more troubling is the cognitive readjustment required by such a change in architecture. Do we really want everything at our fingertips all the time? I find it comforting to think of stored files as static objects, lying dormant on a disk drive, out of harm’s way; open documents, subject to change at any instant, require a higher level of alertness. I’m not sure I’m ready for a more fluid and frenetic world where documents are laid aside but never put away. But I probably said the same thing 30 years ago when I first confronted a machine capable of running multiple programs at once (anyone remember Multifinder?).

The dichotomy between temporary memory and permanent storage is certainly not something built into the human psyche. I’m reminded of this whenever I help a neophyte computer user. There’s always an incident like this:

“I was writing a letter last night, and this morning I can’t find it. It’s gone.”

“Did you save the file?”

“Save it? From what? It was right there on the screen when I turned the machine off.”

Finally the big questions: Will we ever get our petabyte drives? How long will it take? What sorts of stuff will we keep on them when the day finally comes?

The last time I tried to predict the future of mass storage, extrapolating from recent trends led me far astray. I don’t want to repeat that mistake, but the best I can suggest is a longer-term baseline. Over the past 50 years, the areal density of mass-storage media has increased by seven orders of magnitude, from about \(10^5\) bits per square inch to about \(10^{12}\). That works out to about seven years for a tenfold increase, on average. If that rate is an accurate predictor of future growth, we can expect to go from the present 10 terabytes to 1 petabyte in about 15 years. But I would put big error bars around that number.

I’m even less sure about how those storage units will be used, if in fact they do materialize. In 2002 my skepticism about filling up a terabyte of personal storage was based on the limited bandwidth of the human sensory system. If the documents stored on your disk are ultimately intended for your own consumption, there’s no point in keeping more text than you can possibly read in a lifetime, or more music than you can listen to, or more pictures than you can look at. I’m now willing to concede that a terabyte of information may not be beyond human capacity to absorb. But a petabyte? Surely no one can read a billion books or watch a million hours of movies.

This argument still seems sound to me, in the sense that the conclusion follows if the premise is correct. But I’m no longer so sure about the premise. Just because it’s my computer doesn’t mean that all the information stored there has to be meant for my eyes and ears. Maybe the computer wants to collect some data for its own purposes. Maybe it’s studying my habits or learning to recognize my voice. Maybe it’s gathering statistics from the refrigerator and washing machine. Maybe it’s playing go, or gossiping over some secret channel with the Debian machine across the alley.

We’ll see.

Posted in computing | 37 Comments