It was not quite a century ago that we got our first glimpse of molecules. William Lawrence Bragg, with a little help from his dad, figured out how to get molecules to sit still long enough for a portrait. First you had to crystallize the substance, then shoot x-rays through the carefully mounted crystal, then record the lace-doily pattern of diffracted rays on photographic film.
After all that lab work came the really hard part: analyzing the pattern of bright dots in order to reconstruct the positions of atoms in three-dimensional space. This was a difficult inverse problem, something like deducing the shape of a musical instrument from the sounds it emits. (The EDSAC, the first working stored-program computer, was put to work deciphering x-ray diffraction patterns circa 1950.)
Finally it came time to build a model of the molecule–and in those days a model occupied physical rather than virtual reality. In the 1970s I visited Max Perutz, who had by then spent more than 30 years working out the structure of the hemoglobin molecule. His offices were cluttered with modeling artifacts: stacked sheets of transparent plastic, marked with hand-drawn contour maps of electron density, lumpy clay and plaster extrusions showing the overall form of the protein at low resolution, and the now-familiar tinker-toy assemblies of balls and sticks.
It was hard-won knowledge, and I thought it heroic science. I still do. And yet everyone knew all along–Perutz more emphatically than anyone else–that those rigid, static models of proteins were highly misleading. In the living cell, biological macromolecules do not sit immobile like bronze statues. They are machines with moving parts; they continually flex and wiggle, mesh and then disengage, spin, flap, bend, stretch; all day long they do a hyperkinetic hokey-pokey.
I have now seen a remarkable performance of that molecular dance. In a talk at Harvard earlier this week David E. Shaw showed two videos, each portraying about a millisecond in the life of a single protein molecule. A millisecond may not sound like much, but the video was created by computing atomic motions at roughly one step per femtosecond. That’s 1012 steps in all. (If you included all the steps in the video, and displayed them at 60 frames per second, the show would go on for 500 years.)
Shaw was once a computer scientist at Columbia, then he went off to make some billions on Wall Street. (He was introduced to the Harvard audience as “King Quant.”) He has now turned to computational molecular biology, setting up his own lab and building a series of special-purpose computers designed for molecular-dynamics simulations. The machines are called Anton, in honor of Leeuwenhoek. Shaw’s group has built eight of them so far, each with 512 processors. A kiloprocessor model is expected to come on line in a few weeks.
The basic idea behind the computations is simple. Start with the initial positions and velocities of all the atoms. Calculate the force that each atom exerts on every other atom, and the resulting acceleration. Wash, rinse, repeat. For a system of N atoms, the naive version of this algorithm has performance proportional to N2; this quadratic growth is a bit of a problem, because the model includes not only several hundred atoms in the protein itself but also up to 50,000 atoms in the surrounding solvent. So Anton takes some shortcuts. The big one is to do a full accounting of pairwise interactions only for atoms within a limited radius; the distribution of more-distant atoms is remapped to a mesh of discrete points. But even after this winnowing of the problem, the calculation of pairwise forces remains the principal bottleneck. It is solved by throwing hardware at it: 32 × 512 parallel pipelines implemented on custom silicon. There’s more on Anton’s architecture and algorithms here; the Shaw Research web site lists lots of other publications as well, but most of them are not accessible without payment.
As far as I can tell, the videos of proteins in motion are not yet available anywhere, and that’s really too bad. They might well be the next dance sensation on YouTube. Watching them in the lecture hall, I was so bedazzled that I neglected to note the identity of the molecules. One was an ion channel, a protein that spans the width of a membrane and controls the passage of some specific ion (potassium, I think, in this case). We watched the six polypeptide strands twisting closed like the blades of a camera iris, shutting off the channel. Another simulation showed an even more dramatic reconfiguration. For many microseconds of biological time, and perhaps half a minute of wall-clock time, the protein sat nervously quivering and fidgeting, hunched up in a compact globule, with occasional minor adjustments to various loops and corners. And then suddenly the whole molecule opened up like a flower blooming; a moment later it closed again. If I understand correctly what Shaw was telling us, the existence of this alternative state had been known from experimental evidence, but the transformation had never been seen before. And, as he remarked early in the talk, “seeing what it looks like” brings a level of understanding that would be hard to achieve by more analytic methods.
Which brings me to my one gripe. The truth is, we still don’t know what a protein really looks like, and we never will, because “looking” is not a well-defined notion for objects smaller than the wavelength of light. Color, for example, is just not meaningful in this realm, and surface texture is also problematic. Thus schemes for depicting molecules are necessarily a matter of convention. It’s worth giving some careful thought to those conventions, choosing graphic forms that convey as much as possible about what we do know without inviting spurious inferences about what we don’t know (such as color and texture).
Some of Shaw’s illustrations use a ribbon-and-sheet scheme invented 30 years ago by Jane Richardson, which still seems to work well for showing the overall architecture of a protein. But other diagrams and videos use a ball-and-stick model to represent atomic detail, and this strikes me as a less-happy choice. Watching that jiggling assembly of balls and sticks (black for carbon, red for oxygen, etc.), I kept seeing a shiny, brittle, plastic model of a protein rather than the protein itself. Surely there are better graphic devices.
Update: Thanks to Ron Dror of D. E. Shaw Research for pointing out an error in my description of the Anton algorithm: Distant charges are not mapped to a continuum distribution but to a mesh of discrete points. (I’ve made a correction above.) Ron also notes that an article on Anton in Communications of the ACM is available in the CACM digital edition.