Wednesday, May 28, 2003

The Heart Grows Fonder

I've got Brookline Town Meeting all night this week, and then I'll be travelling for a couple of days. I probably won't be posting much. I'll be back on Monday.
Universal Hammer Equivalence

I've been reading a lot of things lately claiming that one process or another, usually at the molecular level, is Turing-complete, meaning that it meets Turing's definition of a universal machine, capable of carrying out any mechanical algorithm.

And it occurred to me that it's like taking one of those Dremels and calling it hammer-complete because you can bang on nails with it.

Why are we treating the Turing Machine like it's the ceiling, rather than the floor? There are models provably more powerful than the Turing Machine, and I fear that we're ignoring possible implementations as we experiment. If all you have is a hammer, everything does start looking like a nail.

Friday, May 23, 2003

Nameless

Here's a half-baked proposal for a programming language feature. Start with a regular structured language, and then remove the ability to refer to any other modules. Modules -- whatever modules look like, I'm not quite sure yet -- can communicate with other modules through two primitives: emit and accept. Emit and accept are functionally equivalent to sending and receiving messages, but they are not addressed to (or from) specific entities. Instead, like cells (and here the biological inspiration becomes clear), modules just put various signals out into the environment and fish things back out of it.

Now, a further refinement is to somehow specify the "shape" of the messages, so as to restrict what you're willing to accept. After all, you don't want to accept every output that every other module is producing. So perhaps the primitive would be used in something like "accept(text)" or even "accept(sorted text)." This begins to sound a little bit like Linda-style tuples (implemented in Java as JavaSpaces).

The idea is to get something a little bit like Unix pipes: something so incredibly low-to-the-ground that anything can implement it, and so general that you can hook up dozens of applications to produce really powerful processing. This would improve on pipes in two ways: it would be networked, not just a linear sequence, and it would be automatic, with producers and consumers of various properties of data coordinated simply by their matching emit and accept statements.
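Just to make the idea concrete, here's a minimal sketch in Python of what emit and accept might look like, with a shared environment standing in for the space between modules. Everything here -- the Environment class, the tag-based "shapes" -- is invented for illustration, not a real design.

    class Environment:
        def __init__(self):
            self.pool = []                             # signals floating in the environment

        def emit(self, value, *tags):
            self.pool.append((set(tags), value))

        def accept(self, *tags):
            wanted = set(tags)
            for i, (shape, value) in enumerate(self.pool):
                if wanted <= shape:                    # the signal has every tag we asked for
                    del self.pool[i]
                    return value
            return None                                # nothing matching yet

    # Two nameless "modules" cooperating without ever naming each other:
    env = Environment()
    env.emit("banana apple cherry", "text")                      # a producer of raw text
    raw = env.accept("text")                                     # a sorter fishes it out...
    env.emit(" ".join(sorted(raw.split())), "text", "sorted")    # ...and emits sorted text
    print(env.accept("text", "sorted"))                          # apple banana cherry

Notice that the sorter never names the producer, and the final consumer never names the sorter; the matching shapes do all the coordination.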

Thursday, May 22, 2003

The Oldest Computer

Okay, a couple of weeks ago we talked about what the fastest computer in the world is. Now, what's the oldest? As Neil Gaiman might say, no, older than that? Older than that?

A good candidate is the Antikythera Mechanism, a first-century B.C. Greek device for calculating the positions of the planets. (Also see simulations of it working and another description.)

But no, even older. A lot older.

The ciliate protozoans of the genera Oxytricha and Stylonychia do a peculiar thing: they rewrite their DNA, editing and rearranging it in the process. The end result is a clean, junk-free DNA sequence that correctly codes for all the various proteins that these little guys need. But what's bizarre and fascinating about it is that this rearranging process is provably equivalent to the Turing Machine. It's a universal computer, and it's millions of years old.
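As I understand the mechanism (and this is only a cartoon of it), the rearranging is guided by short "pointer" sequences that mark where one coding segment should be spliced onto the next. Here's a toy sketch in Python of that reassembly idea; the segments and sequences are invented for illustration.

    import random

    # Toy "gene unscrambling": each scrambled segment carries the pointer it
    # starts with and the pointer it hands off to, and reassembly is just
    # chaining segments by matching pointers. (All sequences are made up.)

    segments = [
        {"starts": "AC", "body": "GGT", "ends": "TT"},
        {"starts": "TT", "body": "CCA", "ends": "GA"},
        {"starts": "GA", "body": "ATG", "ends": None},   # final segment
    ]
    random.shuffle(segments)                  # the scrambled, micronuclear version

    by_start = {seg["starts"]: seg for seg in segments}
    current = by_start["AC"]                  # "AC" marks the start of the gene
    assembled = []
    while current is not None:
        assembled.append(current["body"])
        current = by_start.get(current["ends"])

    print("".join(assembled))                 # GGTCCAATG, the descrambled gene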

Wednesday, May 21, 2003

The Lie of Time

Returning to our theme of the mirrors of illusion that make programming a world of suffering.

Another limitation of the Turing Machine is that it has a linear view of time. One step at a time, it crunches through the computation, reading, writing, moving the tape, and then (maybe) halting. Even non-deterministic Turing Machines only move through time in one direction. We only move through time in one direction too, so it's not unreasonable, but it's unimaginative.

There is an exception, which I mentioned a week or so ago. Undo. A tiny way for you to roll back your last operation, removing the effects of whatever dumb thing you did. Sometimes, you can keep undoing, sliding backwards in time, destroying your actions. (Commercial databases often have a much more sophisticated idea of rollback.)
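For what it's worth, the machinery behind that little slice of time travel is usually nothing more than a stack of snapshots. A minimal sketch in Python (the Document class is invented):

    class Document:
        def __init__(self):
            self.text = ""
            self.history = []                    # the stack of past states

        def do(self, new_text):
            # note: doing something new after an undo simply forgets the abandoned future
            self.history.append(self.text)       # snapshot before the change
            self.text = new_text

        def undo(self):
            if self.history:
                self.text = self.history.pop()   # slide one step back in time

    doc = Document()
    doc.do("hello")
    doc.do("hello world")
    doc.undo()
    print(doc.text)                              # back to "hello"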

But even Undo is linear, and the fact that it destroys your actions when you do something new is frustrating and limiting. Why can't we reach into an arbitrary point in the execution of a program, change an assumption or an action, and then stand back and see what would have happened? David Patterson at Berkeley applies something like this in search of reliable computing; he calls it "Rewind, Repair, Replay."

That's a step forward (backward?) but it's still a fundamentally linear view of time. Let's expand our minds just a little bit more. What if many executions were happening simultaneously? Each possible decision, resolution of ambiguity, or branch represents a fork into two possible futures. Why not do them all? Another ten doublings of Moore's Law will give us a thousand-fold improvement in performance; certainly we can spare a hundred-fold just to run multiple versions.
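Here's a crude sketch, in Python, of what "do them all" might look like: wherever the program would have to commit to one branch, it forks and carries every possibility forward, so you end up with the whole set of possible futures instead of one. (The decision points are invented for illustration.)

    # Instead of committing to one branch at each decision point, carry all of
    # them forward: each "world" is one possible execution, and ambiguity just
    # multiplies the worlds instead of forcing a choice.

    def branch(worlds, options, apply_option):
        return [apply_option(world, opt) for world in worlds for opt in options]

    worlds = [{"balance": 100}]                                   # one initial universe

    # First ambiguous decision: which fee applies?
    worlds = branch(worlds, [5, 10],
                    lambda w, fee: {**w, "balance": w["balance"] - fee})

    # Second ambiguous decision: was the deposit 50 or 75?
    worlds = branch(worlds, [50, 75],
                    lambda w, dep: {**w, "balance": w["balance"] + dep})

    for world in worlds:
        print(world)      # four possible futures: balances 145, 170, 140, 165

A hundred-fold budget buys you six or seven binary decision points; not infinite, but enough to stop pretending there is only one future.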

Mix that in with undo/redo, and you've got something like four-dimensional programming: moving back and forth through time, hopping from potential universe to potential universe, intervening, responding, and changing the assumptions at the beginning of time. Alfonso X said, "Had I been present at the creation, I would have given some useful hints for the better ordering of the universe." Programmers make mistakes; why shouldn't they have the capability that Alfonso wished for?

Monday, May 19, 2003

More on Windmills

There were some comments on BoingBoing about the last post. I thought I'd restate the point. Any self-correcting system can be in homeostasis, or use a cybernetic control; IBM's autonomic systems are homeostatic, as is the canonical example of a thermostat-controlled room.

That's not the point. A windmill is homeostatic without recourse to an external monitor. We might call it isohomeostatic, as opposed to the metahomeostatic IBM system or thermostat. Of course, those are terrible words, and I wouldn't really use them in polite company.

I will point out, however, that evolution is an isohomeostat: the force of change (death of the unfit) is exactly the thing that imparts the change (change in the gene frequency of the surviving population).

Friday, May 16, 2003

Tilting at Windmills

There's a lot of talk in the air about autonomic computing, mostly but not entirely coming from IBM. The idea, which is compelling, is that more computer functions should work the way our autonomic nervous system works; that is, without any conscious intervention.

Leaving aside the criticism that the inspiration seems to be just the name, not the biology, there's something that sort of bothers me about their implementation. I won't be able to fully explain this; maybe someone can help me out in a comment.

But think for a moment how windmills work. The wind blows and turns the blades, sure. But if the wind shifts direction, then the tail of the windmill brings the blades back into alignment. Let's deconstruct that in autonomic terms: the windmill adapted its state to the new environment, using the external change as both the power and the alignment for the internal change.

That is, the windmill used the wind to respond to the wind. The two things--the external force and the internal response--are actually the same.

Now how do IBM-style autonomic systems work? They've got a separate monitor process, and if it sees its sub-process misbehaving, it runs an algorithm, decides what to do, and then does it. So a flood of traffic to a web server sets a trigger, which in turn starts a computation, and eventually the system is reconfigured. Not nearly as elegant.
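To caricature that in code: the monitor is a separate loop that observes, decides, and then acts. A minimal sketch in Python; every name and number here is invented, and this is the general shape of the pattern, not IBM's actual architecture.

    import time

    # Metahomeostasis, caricatured: a separate monitor watches the worker,
    # runs an algorithm, and then reconfigures it. Contrast the windmill,
    # where the disturbing force itself is the correcting force.

    def compute_plan(load):
        return {"servers": load // 50 + 1}            # invented policy, for illustration

    def monitor(read_load, reconfigure, threshold=100):
        while True:
            load = read_load()                        # 1. observe the sub-process
            if load > threshold:                      # 2. notice it misbehaving
                reconfigure(compute_plan(load))       # 3. decide what to do, then do it
            time.sleep(1)                             # the monitor lives outside what it governs

The windmill collapses all four steps into one physical fact; that's the elegance I can't quite name.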

I want a word, and eventually software, that captures the way windmills work.
More on Emptiness

It occurred to me later that Scott McCloud's excellent book, Understanding Comics, makes a similar point about emptiness. According to McCloud, it is the relative abstraction of comic-book figures that allows us to empathize with them, to, figuratively, enter them and the comic.

I'm feeling a lot of other thoughts buzzing around this idea of Applied Emptiness; it probably deserves a standalone essay instead of these fast-twitch blog entries.

Thursday, May 15, 2003

Use Comes From What Does Not

At David Weinberger's talk at the O'Reilly Emerging Technology Conference, he talked about the problem with "old-school" knowledge management software, and by extension, all kinds of failed social software. This is something David and I had talked about together some, and I've been meaning to put it down in text.

Since I've been in a quoting mood lately, I'll start with a verse from Lao Tse's Tao te Ching, from the Peter Merel version:

11. Tools
Thirty spokes meet at a nave;
Because of the hole we may use the wheel.
Clay is moulded into a vessel;
Because of the hollow we may use the cup.
Walls are built around a hearth;
Because of the doors we may use the house.
Thus tools come from what exists,
But use from what does not.


Let's think about the difference between tools and use in the context of software. Software tools are really just like tools in the physical world: artifacts with defined boundaries, crafted by an expert, designed for a specific purpose. Use, on the other hand, is negative space, an undefined purpose in which people can grow, customize, and live.

One more metaphor to finish warming us up. There's an industry process standard called ISO 9000. [Caveat: I'm nothing at all like an expert on this, other than having worked for a company while it went through certification.] Basically, ISO 9000 mandates that for everything a business does, the way the business handles it has to be written down as a formal process. Everything. If, for example, the company just wings it, then bang, it fails. It's not ISO 9000 compliant.

I won't comment on whether this makes sense for business. (Well, okay, I will: it's dumb.) But for software, and more specifically social software like knowledge management, it's fatal. It's like building a room with no doors, or even more, a solid room. The idea of traditional corporate KM is to completely define all possible interactions, yet this is completely antithetical to the idea of social interaction, and doomed to fail (as, in many cases, it did).

Getting involved in the whole is-social-software-real debate is probably not a good idea, but a man's posts should exceed his sense, else what's a blogger for? Let me throw my hat into the ring: social software is software that is what I would call Tao te Ching verse 11 (TTC 11) compliant: it offers emptiness in which social interactions can exist.

Wednesday, May 14, 2003

The Invisible User

A lot has been written on the importance of user-centric design, and considering the user interface as a core element of software design, rather than as a last-minute, after-the-fact process. I'm not an expert in this field, so I can't really add to it, but I'll just make a few observations.

First, note that the Turing machine model has no user; it is a self-contained entity reading and writing to a tape. The entire concept of interaction, perhaps key to models of computation beyond the Turing machine, is missing from the classic conception. This may be at the root of the design philosophy that exiles users to the outside of software, looking in.

Unfortunately, it gets even worse. Even today, software conferences are often filled with an ugly undertone of contempt for the user. While jokes about "lusers" seem, thankfully, to be dated now, the attitude that produced them is still with us. Is it any wonder the software industry produces such crap?

Tuesday, May 13, 2003

More from JvN

A couple of days ago I referred to a classic 1952 paper by John von Neumann, "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components." Here's a great quote from it.

"Error is viewed, therefore, not as an extraneous and misdirected or misdirecting accident, but as an essential part of the process under consideration -- its importance in the synthesis of automata being fully comparable to that of the factor which is normally considered, the intended and correct logical structure."


This isn't all that different from the usual rap on commercial software--more concerned with functionality than reliability. What can we do, in technology, in method, in culture, in the market, to reverse the situation?

Monday, May 12, 2003

The Lie of Modularity

One of the first things taught to a computer science student is the idea of modularity, that a problem can be cut into smaller problems, and those smaller problems can be cut into yet smaller problems, and so on, until the sub-sub-sub-problems are so incredibly simple that even you! can solve them. Now there are good reasons that software design is taught this way: it makes for easier design, and in theory it makes for less buggy software.

The basic idea was articulated by computer scientists like David Parnas back in the early 1970s. Modularity, also known as separation of concerns, was based on the idea that by dividing a program into separate pieces, the total number of interdependencies would drop. Otherwise, the complexity of programs would be too large to effectively design or debug.

The problem with this idea, however, is that it's basically flawed. The goal -- reducing dependencies -- is reasonable, but it's also an illusion. Those dependencies are there, like it or not, and as software is called upon to accomplish more complicated tasks, it will become easier to see through the illusion.

This illusion was most powerfully revealed to me by a computer scientist named Daniel Dvorak, who works at the NASA Jet Propulsion Laboratory, designing software for spacecraft. When a program running on a deep space probe wants a file, it reads it from a hard drive, pretty much like an earth-bound computer would. However, when that disk drive spins up, not only does it consume a significant amount of the spacecraft's available power, not only does it generate heat and vibration which might interfere with scientific instruments, but it actually imparts a gyroscopic stabilization along the axis of spin! Now all of these side effects are present on earth as well; we just don't need to worry about them, because our available resources -- electricity, cooling fans, the entire mass of the earth -- so overwhelmingly swamp the size of the side effects.

But as we build more complex software, responsible for handling larger and more significant tasks, we are becoming more like that spacecraft: the side effects of running our software will spill over into other modules. The object-oriented programming pillar of "encapsulation," the principle that state can be isolated and owned by a single object, will no longer accurately model the behavior of complex systems.
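Here's a cartoon of the problem in Python: the module's interface looks perfectly encapsulated, but every call quietly draws on a shared budget that no method signature mentions. (All the names and numbers are invented.)

    class Spacecraft:
        power_watts = 40              # shared, finite, and mentioned in no interface
        vibration = 0.0

    class DiskDrive:
        def read(self, path):
            Spacecraft.power_watts -= 12        # spinning up draws shared power...
            Spacecraft.vibration += 0.3         # ...and shakes the science instruments
            return "<contents of " + path + ">" # the only effect the interface admits to

    class Spectrometer:
        def measure(self):
            # a "completely unrelated" module, whose reading degrades anyway
            return 1.0 - Spacecraft.vibration

    drive, instrument = DiskDrive(), Spectrometer()
    drive.read("/data/image42.dat")
    print(instrument.measure())                 # 0.7 -- the interface never warned us

On earth we get away with pretending the Spacecraft class doesn't exist; on a probe, it is the most important module in the system.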

Fortunately, there's a set of technologies that are solving at least part of this problem, and I'll talk about them later in this series.

Friday, May 09, 2003

Hearse and Buggy, Part III

Finally we come to the first word in the title of this three-part series. The sad fact is that software programs don't last forever, and eventually it's time to have them be adopted by your uncle with the farm, so there's room for them to run around...

More bluntly, software sticks around too long. In fact, there are two major failure modes here:

1. Software that should die doesn't. This is exactly the cause of the Y2K crisis: software that was written decades earlier, by programmers who couldn't imagine it would still be in use in the year 2000.

2. Software that dies too young. This is how data gets stranded, when applications that are the only way to access a particular data format fade away, possibly because the company that produced them lost interest or disappeared.

As software creators (and buyers!), we need to start thinking about the role of a piece of software in an ecology, and what happens at the end of the lifecycle. We should be designing in graceful exit strategies, and we should be building in guarantees that the data will live beyond the application.

In a funny way, then, bugs may help out with this. As a software artifact becomes more complex, as it inevitably will, exponentially more bugs will be introduced. Eventually, the bugs will be so numerous and so irritating that the customer will demand to move to a new target, thus giving innovation a chance. Just like when a tree falls in a forest (whether anyone hears it or not), and the sunlight suddenly reaches the floor, illuminating all sorts of seedlings reaching for the light.

Thursday, May 08, 2003

Hearse and Buggy, Part II

So what are we supposed to do? MIT computer scientist Martin Rinard talks about the idea of the "acceptability envelope," which is based on the realization that for many programs, there's not just a single correct way to execute, but perhaps a range of acceptable responses. This might not necessarily mean different numerical responses (although for many applications it might), but could simply refer to the number of possible orderings of a series of actions. In short, our goal as software creators shouldn't be to guarantee one single trajectory through a large function space, but to define the constraints within which a program is still considered to be correct.
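A toy version of the idea, in Python: instead of asserting one exact output, we check that the result stays inside a set of constraints, and any answer inside the envelope counts as correct. (The particular constraints here are invented.)

    # Acceptability-envelope checking, in miniature: instead of demanding one
    # canonical answer, accept any result that satisfies the constraints that
    # actually matter.

    def within_envelope(result, inputs):
        return (
            sorted(result) == sorted(inputs)          # nothing lost, nothing invented
            and all(x >= 0 for x in result)           # domain constraint still holds
        )

    def sloppy_schedule(jobs):
        return list(reversed(jobs))                   # one ordering among many acceptable ones

    jobs = [3, 1, 4, 1, 5]
    print(within_envelope(sloppy_schedule(jobs), jobs))   # True: inside the envelope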

There are many other interesting proposals for making software more reliable, most of which have been around for years, if not decades. Of course, simpler code would be more likely to succeed. Here's a quote, courtesy of my father:

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."
-- C.A.R. Hoare


Reading that again, it occurs to me that computer science training, perhaps unlike that of any other field, takes place in a complete vacuum from the words of the founding masters. I'll do a longer writeup on that later.

Back to bug-awareness. The hardware reliability folks have an array of techniques, most based on kinds of redundancy and failover. Design-by-contract, assertions, preconditions, and postconditions are ways to constrain behavior, as are, in a small way, languages that support exception throwing (which are finally becoming mainstream).
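For concreteness, here's what that looks like as plain Python assertions, a poor man's design-by-contract; the withdraw function is invented for illustration.

    # Preconditions and postconditions as plain assertions: a lightweight way
    # to pin down the envelope of acceptable behavior and fail loudly when
    # execution strays outside it.

    def withdraw(balance, amount):
        assert amount > 0, "precondition: amount must be positive"
        assert amount <= balance, "precondition: cannot overdraw"
        new_balance = balance - amount
        assert new_balance >= 0, "postcondition: balance never goes negative"
        return new_balance

    print(withdraw(100, 30))     # 70
    # withdraw(100, 200)         # would raise AssertionError: cannot overdraw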

However, one potential way to deal with bugs is to do the same thing I know you do with Spider Solitaire: Undo. But that would require us to change the way we think about time in software, another one of the bricks of the Turing Prison. I'll talk about that in a couple of days.

Wednesday, May 07, 2003

Hearse and Buggy, Part I

One of the greatest lies that we tell ourselves about software is that we can find bugs. But bugs, or errors, or malfunctioning code, or whatever, are not only unavoidable, they are probably necessary. Today, buggy software inflicts $68 billion of damage on the U.S. economy per year, according to the National Institute of Standards and Technology; I think that is probably an underestimation by at least one order of magnitude, especially when we consider not just fatal bugs, but aggravating, annoying, inconvenient bugs that just cost users extra time, attention, or hassle.

Before I go on, I must provide yet another diversion and talk about Grace Murray Hopper. As you ought to know, Admiral Hopper is one of the great founders of computer science. She is often cited as the originator of the term "bug," due to a story that in 1947, when looking for a problem in the Mark II, one of the first computers, her team found a moth. This was taped into the lab notebook with the inscription "First actual case of bug being found." (read the Navy's official version, with pictures) Now the story is true as far as it goes, but the whole point is that it was a joke: if the term "bug" hadn't been in use already, it wouldn't have been funny. The term bug goes back at least to Edison. Hopper and her team may have been the first to use it in a computer context, but they hardly invented the term.

Anyway. Our mistake is in thinking that we can find these bugs, but the complexity of modern software means that we can't possibly find them all. We must accept as a basic fact of life, some kind of equivalent to a law of thermodynamics, that software will contain bugs. (Jon Bentley once observed that all programs contain at least one bug, and all programs can be reduced by one line. The corollary of these two laws is that all programs can be reduced to one non-working line.)

So what do you do? Modern software companies seem to have approached this problem by laying off their QA departments, which, while I applaud the willingness to embrace new realities, isn't really what I'm talking about. We need to start writing software with the assumption built in that it will fail.

This isn't a new idea; another founder of the field, John von Neumann, wrote a paper in 1952 called "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components"; but it's fifty years later and we still aren't following his advice.
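In his spirit, here's a toy Python sketch of getting a more reliable answer out of unreliable parts by brute redundancy and majority voting. The flaky component is simulated, and the numbers are invented.

    import random
    from collections import Counter

    # Von Neumann's move, in caricature: assume the component will sometimes be
    # wrong, run many copies, and let the majority outvote the errors.

    def unreliable_add(a, b, error_rate=0.2):
        result = a + b
        if random.random() < error_rate:              # simulated fault
            result += random.choice([-1, 1])
        return result

    def reliable_add(a, b, copies=15):
        votes = Counter(unreliable_add(a, b) for _ in range(copies))
        return votes.most_common(1)[0][0]             # majority wins

    print(reliable_add(2, 2))                         # almost always 4

Fifteen flaky adders to get one trustworthy sum: wasteful, but that is exactly the trade von Neumann was proposing.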

The final point is a suggestion that bugs may not be a bad thing. In evolution, it is the presence of tons of excess, non-coding, irrelevant, or even malfunctioning genes that acts as a repository for functional innovation. When the environment changes somehow, and the species must adapt, it is often the "bugs" or non-optimal designs that provide the basis for changes. As we try to make software more adaptive and autonomous, bugs may similarly become the wellspring of change and innovation.

Tuesday, May 06, 2003

The Unbearable Precision of Languages

Ask a programmer why people can't just program in English (or whatever natural language), and he's likely to tell you that it's because English is a naturally ambiguous language, and that computer programming requires precision. Perhaps he'll go into greater detail, and wax on about how literal computers are, and how they don't "know" what you mean, and only do exactly what you tell them to do.

As if this is some sort of bad thing! The irony comes in when we realize that we humans manage to get by using such a (by implication) terrible language, full of imprecision and ambiguity. But the ambiguity of human language is a coin with two sides: we manage despite it, and we succeed because of it.

We manage despite it because our brains, being the wonderful organs they are, can hold multiple simultaneous meanings long enough to resolve the ambiguity. If you start speaking and say "The running..." you might mean running as an adjective ("the running program", and technically it's a gerundive) or as a noun ("the running of the bulls," and a gerund). In fact, "running program" might be the program that is running now, or it might be a program that is about running, perhaps a jogging diary. The point is that when you say the first word, the hearer's brain doesn't know how to classify it yet, but manages, in real time, to hold it in place and resolve it when the additional information comes in.
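Here's a toy of that in Python: keep every reading alive, and let each new word prune the set. (The "grammar" is absurdly simplified, of course.)

    # Hold both readings of "The running..." at once, and let the next word
    # prune the set -- roughly what the hearer does in real time.

    hypotheses = {"noun reading: the activity of running",
                  "adjective reading: something that is running"}

    def hear(next_word, hypotheses):
        if next_word == "of":                         # "the running of the bulls"
            return {h for h in hypotheses if h.startswith("noun")}
        if next_word == "program":                    # "the running program"
            return {h for h in hypotheses if h.startswith("adjective")}
        return hypotheses                             # no help yet; keep every reading alive

    print(hear("program", hypotheses))                # only the adjective reading survives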

We succeed because of it because ambiguity is an essential element in communication. For one thing, if language weren't ambiguous, it would by necessity be incredibly long, as all possible meanings would have to be excluded. Ambiguity lets us slide by using shorthand, knowing that the recipient of the message will (most likely) be able to resolve the ambiguity correctly. Also, ambiguity goes hand in hand with the incredible expressiveness of language: a more formal and restricted language would be less prone to ambiguity, but would limit the kinds of utterances possible.

Now there's some more irony here. In the linguistics community, the idea of the Turing Machine and computation is the best thing since sliced phonemes. They assume that the brain is doing computation. From a computer scientist's point of view, however, what the brain's doing seems impossibly more advanced than the sorts of things our Turing Machines do.

So the point is that, in principle, there's nothing stopping programmers from employing ambiguity in a useful way. Now I'm not implying that full computer comprehension of natural language is right around the corner, nor do I think that natural languages really ought to be our programming language of choice. However, we've imprisoned ourselves in a straitjacket of non-ambiguity, when it might help us express ourselves so much more clearly.

If you're more interested in linguistics, a great book to pick up is Steven Pinker's The Language Instinct.

Friday, May 02, 2003

What's the Program?

Here's a little quiz you can take at home. When I talk about a computer program, what do you consider the definitive program? What's the real thing, and what's a secondary artifact? Specifically, do you think that the source code of a program is the real program, or is it the executable?

Okay, it was a bit of a trick question; I think it's neither. I think that the real program is the act of execution. Thinking back to the Turing Machine, we've inherited the idea of a string of symbols as a program. In fact, the Turing Machine was a step ahead, since it could dynamically rewrite its program as it executed. But the advent of the so-called Harvard architecture, in which data and code were separated, was the beginning of the process that froze programs into unchanging, unchangeable entities. And while a host of programming languages do offer the ability to dynamically change a program, today they are relegated to laboratories or museums. (Or laboratories in museums.)
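For a small taste of what treating the program as a process feels like, here's a trivial Python fragment that rewrites one of its own functions while it runs; the dynamic languages I'm talking about let you do this kind of thing to a whole running system, not just a toy.

    # A running process can redefine its own behavior: the "program" is not a
    # frozen artifact but whatever the process happens to be right now.

    def greet():
        return "hello"

    print(greet())                     # hello

    def patched():
        return "hello, but I changed my mind mid-run"

    greet = patched                    # the process just rewrote part of itself
    print(greet())                     # new behavior, no recompile, no restart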

We need to start thinking more dynamically: a program is a process, not a set of immutable laws. Once we start thinking about it as process, there are other things we can start to consider: the role of feedback; learning; more dynamic error recovery (more on this in a couple of days); debugging as a process of diagnosis and intervention into a complex dynamic, and potentially chaotic, system. In fact, the very idea that a program is a file is one so basic that we don't even think about it. But there's no physical law that says they have to be, and it's just one of the mirrors of illusion that imprison us in this world of suffering.

Thursday, May 01, 2003

The Fastest Computer in the World?

In 1998, two researchers published a report in Science (Y. Duan and P. A. Kollman, “Pathways to a Protein Folding Intermediate Observed in a 1-Microsecond Simulation in Aqueous Solution,” Science 282, No. 5389, 740–744 (1998)) that reported their efforts to simulate the folding of a protein. Briefly, one of the central questions in molecular biology is determining why (or how) a protein, made up of a given sequence of amino acids, folds into a specific three-dimensional shape. Right now, we're almost completely unable to predict that 3D shape, making it very difficult to do things like rational drug discovery or proteomics.

So these researchers borrowed a 256-node Cray T3E, one of the fastest computers in the world. They ran it for three months. And at the end of three months, they had simulated only a microsecond of the protein folding.

Now we can think of this problem in two ways. One, as IBM did, is to think about the opportunity for research in computer architecture to build a really, really fast computer, optimized for the message-passing characteristics of simulating protein folding.

Or, think about it this way: the protein, a single molecule, "figures out" how to fold in a little more than a microsecond! One little molecule! It just does it! That's something like a trillion-fold difference in performance, if you buy that protein folding is something like computation.
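A quick back-of-the-envelope check on that number: three months of machine time is roughly 8 x 10^6 seconds, spent simulating about 10^-6 seconds of folding, so the ratio is on the order of 10^13, i.e. several trillion-fold.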

What's going on? How can a simple little molecule accomplish a task in an instant that one of the fastest machines in the world struggles to do over a course of many months? It may have something to do with the inherent massive parallelism of nature: each atom in the protein molecule acts independently. It may have something to do with the "bus" of nature: every pair of atoms can affect each other (by electromagnetic forces, for example) at the same time. And it may have something to do with the fact that nature has infinite precision, while digital computers are by their nature quantized. [It's true that time and space may be quantized, but they may not be, and in any case if they are, it is at an incredibly tiny scale.]

In any case, we must recognize the potential that every protein in our bodies may in some way represent a computer a trillion times faster than one of the most powerful human-built computers in the world.