Monday, March 17, 2003

Email as a Platform

This whole spam problem is a symptom of a larger conceptual mistake. For thirty years, we've been thinking about email as if it were an application. We've got word processors to type text, spreadsheets to manipulate numbers, and email apps to read our email, right?

Email's been a huge success, of course. But it could be much more, if we stop limiting it in our minds to a single application, and start thinking of it as a platform for constructing many new applications.

There's one instance of an email-platform app so far: the so-called "meeting maker" that lets people schedule meetings. There are a couple of proprietary versions; Outlook and Notes both support it. But imagine if there were a standard, easy way for other apps to integrate with email. Email is the most standard of Internet applications, and its asynchrony is well matched to dial-up access (and to other sources of intermittent access, like wireless) and to many work patterns.

For example, let's say I wanted to play a game by email with a friend. Today, we'd use a regular email application and manually update the game with the results of the moves. But if it were really, really easy for application writers to build on an email platform, more apps would support this option. In many cases, email would fade into invisibility.

To accomplish this, however, we'd need to change the basic idea of email addressing. The current "one account, one address" form is exactly why spam is such a problem. Email addressing would have to become more like IP addressing: a destination machine (the address) plus a destination application (the port number). If these application-specific addresses had enough bits, which should be no problem, then we could use them as unforgeable capabilities. (This is something that Bob Frankston has been talking about for a while.)
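To make that concrete, here's a minimal sketch in Python of minting a per-application address whose random token is long enough to act as an unforgeable capability. It assumes a "plus addressing" mailbox (user+tag@domain); the function and names are hypothetical, not from any standard.

    # Sketch: an application-specific email address that doubles as a capability.
    # Assumes the mail system routes "user+tag@domain" to "user"; the helper
    # names here are hypothetical.
    import secrets

    def mint_app_address(mailbox, domain, app, bits=128):
        token = secrets.token_urlsafe(bits // 8)   # enough bits to be unguessable
        return f"{mailbox}+{app}.{token}@{domain}"

    # A friend who knows this address can reach the chess app by mail;
    # a spammer who doesn't know the token can't forge it.
    print(mint_app_address("geoff", "coherenceengine.com", "chess"))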

More later.

Saturday, March 15, 2003

View Source and Intellectual Property

For all of the HTML books, classes, and online help web sites, one of the most powerful ways that the knowledge of HTML programming spreads is "View Source." That is, the ability of any visitor -- by definition, anyone who can see the site -- to see how that site was built means that any new technique will quickly spread.

Unfortunately, there's no equivalent to "View Source" for most programs. Sure, there's open source, but I don't want to get into that right now. The point is that copyright law is built on the assumption that people can see the way an idea is put into form; in a book, people can read the words. In lawyer-speak, this is the embodiment of the idea, and you can only copyright embodiments, not ideas.

But for code, you can't really see the embodiment. You can see the results of the embodiment (that is, the executable program), but not what I'd consider to be the true embodiment, the source code. It's clear that copyright law, based as it is on centuries-old legal precedents, didn't really foresee the duality of code and program.

The whole idea of intellectual property law in the United States is that we grant a limited monopoly to the rights-holder in exchange for some public benefit. But without "view source," what's the benefit? A common claim is that people won't create without the guarantee of intellectual property protection, but this seems less useful for code; after all, it is (partly) protected by the compilation process.

With the recent upholding of the Bono Copyright Extension Act, we're stuck with a 95-year corporate copyright term. This is complete overkill for software; it's hard to imagine any software retaining any value after 95 years. (And here's an interesting, if academic, question: does a copyright holder have a duty to disclose the copyrighted material after the term expires? Will we have a right to view the source to Windows 3.0 in 2086?)

Intellectual property protection is important, and we need to find a system that protects the rights of holders as well as provides for public benefit. But I don't think that for software, copyright is the right tool. In a later entry, I'll start describing what I think is a better system.

Friday, March 14, 2003

Gigaware

What does it mean to program for a billion users? This thought occurred to me recently while meeting with some people from Nokia Research. If you work for Nokia and you write handset software, there's a good chance that, in a couple of years, your software could be used by a billion people. A billion! Not too long ago, there weren't that many people alive.

Other than the extremely remote possibility of getting every person in China to use your software, a billion users (or "gigaware") seems to demand a few things:

First, you need to be accessible to people who read and speak different languages (or, in fact, who are illiterate or only semi-literate). You might get away with using English, since well over a billion people can speak it (although not all as their first language), but people would be more likely to use your software if it spoke their native language.
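As a minimal sketch of that first step, here's message lookup keyed by locale with English as the fallback, in Python. The catalog and helper are made up for illustration, not a real internationalization library.

    # Locale-keyed message catalog with an English fallback (illustrative only).
    MESSAGES = {
        "en": {"greeting": "Welcome"},
        "zh": {"greeting": "欢迎"},
        "hi": {"greeting": "स्वागत है"},
    }

    def message(key, locale):
        catalog = MESSAGES.get(locale, MESSAGES["en"])   # unknown locale? use English
        return catalog.get(key, MESSAGES["en"][key])     # missing string? use English

    print(message("greeting", "zh"))   # 欢迎
    print(message("greeting", "sw"))   # falls back to "Welcome"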

Second, you need to be prepared for some pretty odd behavior. As the old joke has it, there's a thousand of those one-in-a-million folks pounding away at your app. You need serious robustness.

Third, you'll probably need to support many different interface modes: deaf or hard-of-hearing users, blind users, users with dyslexia, users for whom an icon is a religious symbol, and so on. It's a well-known (but unfortunately not often acted-upon) fact in education theory that different students have different learning styles. Why can't we translate that into advice for interface designers?

Fourth, you'll need a better way to upgrade. To borrow Steve Jobs' calculation style, if each user takes only five minutes to upgrade once a year, that's well over one hundred entire human lifetimes spent upgrading every year. And, of course, they won't all upgrade, so there will always be users of every version you ever released.
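The arithmetic behind that claim, sketched in Python (the five-minute figure is the post's assumption; the eighty-year lifetime is mine):

    # Back-of-the-envelope check of the "hundred lifetimes" claim.
    users = 1_000_000_000
    minutes_per_upgrade = 5
    total_minutes = users * minutes_per_upgrade      # 5 billion minutes per year
    years = total_minutes / (60 * 24 * 365)          # roughly 9,500 years
    lifetimes = years / 80                           # roughly 119 lifetimes
    print(f"{years:,.0f} years, or about {lifetimes:.0f} eighty-year lifetimes")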

Is this just an academic thought-exercise? Perhaps we won't really have gigaware, just an increasing number of apps in the hundred-million user range. In any case, though, it seems like these are good design principles, even if you just have a few users.

Thursday, March 13, 2003

Moving Off the Grid

What is it with the hype over grid computing? It's a perfectly interesting technology, but somehow the boosters seem to be ignoring some fundamental facts about distributed computing.

1. Not all problems are parallelizable. Making an algorithm run in parallel is not easy. Many problems are inherently sequential, so having a grid computing resource (or utility, or whatever metaphor you prefer) available won't help. Problems that are well suited to parallelization (a minimal sketch of the first case follows the list):
  • Problems where the dataset can be easily partitioned and there are few relationships between the partitions (e.g., SETI@home).
  • Problems where there's one core calculation and you want to plug in many possible input values (e.g., many types of simulations, especially evolutionary computation or hardware design).
  • Problems with very well-understood structure for which parallel algorithms have already been developed (e.g., the FFT).
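Here's the promised sketch of the first and easiest case, in Python: the dataset is split into chunks that are processed with no communication between workers. The work() function is a stand-in for whatever the real per-chunk computation would be.

    # Embarrassingly parallel processing of a partitioned dataset.
    from multiprocessing import Pool

    def work(chunk):
        # Independent per-chunk computation; no shared state, no messages.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i:i + 10_000] for i in range(0, len(data), 10_000)]
        with Pool() as pool:
            partials = pool.map(work, chunks)   # farm chunks out to worker processes
        print(sum(partials))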


2. Even when problems are parallelizable, the messaging profile might defeat the grid. Many parallel algorithms run well with 64 or 128 nodes merrily chugging away, but those nodes must constantly exchange the current state of the calculation. This is great for a massive 64-way server (or, more extremely, the thousand-way boxes the Department of Energy periodically orders from IBM), but not so much if you're running over Ethernet or, worse, the Internet.
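A toy cost model makes the point; the per-step compute time and the latencies below are illustrative assumptions, not measurements.

    # Rough model: each step's compute shrinks as nodes are added, but every
    # node must exchange state with every other node before the next step.
    def step_time(nodes, compute_per_step, latency):
        return compute_per_step / nodes + (nodes - 1) * latency

    for name, latency in [("shared-memory box", 1e-6),
                          ("Ethernet LAN",      1e-4),
                          ("Internet",          5e-2)]:
        t = step_time(nodes=64, compute_per_step=1.0, latency=latency)
        print(f"{name:>17}: {t:.3f} s per step (vs. 1.000 s on a single node)")

With these made-up numbers, 64 nodes over the Internet actually come out slower than a single machine.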

A slightly different version of this problem occurs when the data source is the bottleneck. Could you write a thousand-way parallel version of a database query? Well, probably, but you'd also have to replicate the database a thousand times to get the benefit; otherwise the database itself becomes the bottleneck.

3. Ever hear of something called Moore's Law? Look, every 18 months the bottom half of the grid market drops out, as processors become fast enough to solve problems previously only the grid could handle.

I don't mean to dismiss the technology entirely. It certainly has important, if niche, applications. But people who expect to just plug a computer into a mythical grid and, without changing anything else, suddenly have supercomputer resources at their disposal are in for some disappointment.

Wednesday, March 12, 2003

Programming Languages and the Whorfian Hypothesis

Along with Paul Feyerabend, another important 20th-century relativist was the linguist Benjamin Whorf. Whorf is best known for his part in framing the Whorfian Hypothesis (also known as the Sapir-Whorf Hypothesis, to give credit to his mentor, Edward Sapir). Briefly, the Whorfian hypothesis is that language is deterministic: your language determines the way in which you think. So if, for example, your language had no way of expressing time (as he (falsely) believed of Hopi), then you would be unable to conceptualize time.

Generally, this hypothesis is believed to be false, at least in its strong sense. Obviously, there are weak senses in which it may be true. I especially recommend George Lakoff's book Women, Fire, and Dangerous Things and its discussion on Whorf.

But the point here is not about natural languages, but programming languages. From my own personal experience, I'd insist that, in fact, the Whorfian Hypothesis is true in a strong sense. When I learned Prolog, I was a terrible Prolog programmer as long as I kept thinking in terms of traditional structured programming. For example, to add up the values of a list of integers, I would conceptualize the task as an iteration. It was not until I (fairly suddenly) began thinking in Prolog that I could conceptualize the task properly.
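Since this blog doesn't normally carry code, here's the contrast sketched in Python rather than Prolog: the first function thinks in a loop and mutable state, the second in the recursive, declarative style that Prolog's clauses push you toward (roughly: the sum of the empty list is zero, and the sum of a list is its head plus the sum of its tail).

    def sum_iterative(xs):
        total = 0
        for x in xs:          # structured-programming habit: loop and accumulate
            total += x
        return total

    def sum_declarative(xs):
        if not xs:            # sum([]) is 0
            return 0
        head, *tail = xs      # sum([H|T]) is H plus sum(T)
        return head + sum_declarative(tail)

    print(sum_iterative([1, 2, 3]), sum_declarative([1, 2, 3]))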

The various conceptual models of different languages (functional, structured, logic, object, and so on) all impose, or require, vastly different conceptual systems. In this sense, C++, C#, and Java are all essentially the same language. In fact, that structured-plus-object conceptual model seems to be the only game in town these days. It would be nice if there were a wider variety of languages available, ones that unlocked our brains from a single way of thinking.

Tuesday, March 11, 2003

Simulation and Its Discontents, Part III

At the turn of the century, a book came out that gathered quite a bit of attention. It said that it could explain the vast complicated mechanisms of life, and startlingly, it could do so with only a single, simple principle. It sold quite well, and was received well in the popular mind (although less so by the scientific community). Its most striking feature was the beautiful pictures it included, showing what complex and sophisticated forms could evolve given only this one, basic principle.

I'm speaking, <irony>of course</irony>, of Stephane Leduc's The Mechanism of Life, published in 1911. The principle he claimed was basic to all life? Osmosis. And the pictures really did look like the metabolic states of a living cell.

The point is, of course, that he was wrong. (And if you thought I was talking about another book, well, the jury's still out, but the parallels are eerie.) Just because you can construct a mechanism that produces results that superficially look like the large-scale behavior of the system you are trying to model doesn't mean you have a true model of that system.

In fact, we must recall statistician George Box's admonition: "All models are wrong, but some are useful." Does the model show the same fidelity to data as the real system? The same sensitivity to changes, to initial conditions? Does it capture scaling effects? Too many simulations claim congruence with a target system without demonstrating in a fundamental way that there is in fact an underlying similarity in behavior.

(Note: I am indebted to Evelyn Fox Keller's wonderful book Making Sense of Life for the story of Stephane Leduc.)

Monday, March 10, 2003

Onward! and the Feyerabend Project

A theme that I'll develop over the next few months in this blog is that computer science as a field is off track. That somewhere, in the past few decades, we made a few assumptions that, while they may have served a role then, limit our progress today.

A group of computer scientists, led by Richard Gabriel of Sun Microsystems, is trying to fix computer science. The effort is known as the Feyerabend Project, named after the philosopher of science Paul Feyerabend.

Feyerabend's fascinating thesis was that our beliefs about the "scientific method" are largely fictional. Science does not, according to Feyerabend, actually use the scientific method to make forward progress. Rather, it draws on myth, mistaken belief, and propaganda. One of his best known works, Against Method, claims that not only do scientists not use the scientific method, they shouldn't.

One way you can participate in the Feyerabend Project is to submit a paper to Onward!: Seeking New Paradigms and New Thinking. It's a special track at OOPSLA, one of the best of the computer science conferences. If you've got a wild or crazy idea on how to do software better or improve computer science, you should submit it to Onward. The deadline is March 21st, 2003, so hurry up. Disclaimer: I am a member of the program committee.

Saturday, March 08, 2003

Simulation and Its Discontents, Part II

So what's wrong with simulation? First, allow me to take a short detour through graduate-level computational complexity, for those of you smart or lucky enough not to have gone to computer science grad school. There's a class of problems known as NP whose hardest members are extremely expensive to solve: the best known methods take exponential time, which means, roughly, that each additional input element can double the time it takes to solve. For large inputs, this is impractical. The best-known instance is the so-called Travelling Salesman Problem (TSP), in which you try to find the shortest possible route that visits each of a number of cities exactly once.

A lesser-known attribute of NP is something called the certificate, which is essentially the answer. A certificate for TSP would be something like "Start in Boston, then go to Providence, then go to New York...". While it takes exponential time to find a certificate, a given certificate can be checked in polynomial time (which is usually much, much faster). So, if I handed you an answer for TSP and claimed that it was less than 10,000 miles, you could check that by simply walking through the list and adding up the distances. Done.
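Here's what that check looks like as a minimal Python sketch; the cities and mileages are made up for illustration.

    # Checking a TSP certificate in polynomial (in fact, linear) time.
    distances = {
        ("Boston", "Providence"): 50,
        ("Providence", "New York"): 180,
        ("New York", "Boston"): 215,
    }

    def dist(a, b):
        return distances.get((a, b)) or distances[(b, a)]

    def check_certificate(tour, bound):
        assert len(set(tour)) == len(tour), "a city is visited twice"
        total = sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
        return total <= bound   # one pass through the list is all it takes

    print(check_certificate(["Boston", "Providence", "New York"], bound=10_000))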

Simulations, however, by definition have no certificate. They are (generally) in another class, called PSPACE. There's no shortcut: to get the answer, you must run the entire program from beginning to end. This is the great weakness of simulations: once I get to the end, I have no convincing proof that my answer is correct other than the entire run of the program. Put another way, there's no compression of the information. This makes it very easy for those opposed to the outcome of a simulation (say, a finding that a new highway will not relieve traffic congestion) to simply dismiss the results.

Friday, March 07, 2003

Simulation and Its Discontents, Part I

There's been an increasing amount of attention paid to simulation lately. Partly, this is a function of the kinds of problems we're interested in solving (climate change, military action, policy interventions, business and pricing model changes), which aren't particularly amenable to closed-form analysis. It also has a supply-side component: the increasing availability of raw computing resources as well as more sophisticated modeling tools and techniques.

Chief among these is a particular simulation technology known as agent-based modeling, in which the model is constructed of a few hundred to a few thousand actors, each using a small number of simple rules and local information to decide on courses of action.
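As a minimal sketch of the idea (the rule and the numbers are illustrative, not any particular published model), here are a few hundred agents on a ring, each deciding its next state purely from local information:

    # Toy agent-based model: each agent adopts the majority opinion among
    # itself and its two immediate neighbors.
    import random

    N, STEPS = 500, 50
    agents = [random.randint(0, 1) for _ in range(N)]

    for _ in range(STEPS):
        new = []
        for i in range(N):
            neighborhood = [agents[i - 1], agents[i], agents[(i + 1) % N]]
            new.append(1 if sum(neighborhood) >= 2 else 0)   # local majority rule
        agents = new

    print(f"{sum(agents)} of {N} agents ended up in state 1")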

Simulation is an important technology, but there are a couple of challenges to successfully using simulations, and over the next few days, I want to discuss some of them.

Thursday, March 06, 2003

Welcome to Coherence Engine!

Welcome to Coherence Engine, Geoff Cohen's blog about (among other things) the future of software. We'll try to steer clear of politics, movies, books, and what I had for breakfast. Drop me a line at gac@coherenceengine.com with any comments.