Friday, September 19, 2003

Do We Need Scalars for Programming?

Scalar values -- regular old numbers like integers, floats, and doubles with no associated units -- are the mainstay of programming. Computer architectures are designed to move around integers or floating-point numbers pretty rapidly; the original Turing machine architecture assumed that the values of cells on the tape were atomic symbols. Even Claude Shannon's information theory says that the bits being communicated have no intrinsic meaning. This whole a-bit-is-a-bit metaphor for representing numbers has made the life of hardware designers easier, but I don't think it's the right thing for programming anymore.

This isn't to say that we should abandon numbers. No! This is not a hard-core object-oriented screed that says we must abandon poor old integers for the sake of heavyweight objects that encapsulate their own methods (not that there's anything wrong with that). Numbers are good, but the fact is that when we use numbers, those numbers have meaning. More specifically, those numbers have units.

What's 3? What's 104.2? In the absence of any context, those numbers are meaningless. Only in context can we find out that they mean three miles, or 104.2 degrees. Especially for scientific computing, engineering analysis, or financial applications (three fairly important classes of software), a number is almost always rooted in a unit. We all know what happened to the Mars probe when the units didn't match, and software that encoded unit types directly in numeric values would eliminate a large class of fatal but difficult-to-detect bugs. Two systems I know of take on the problem: Curl, a LISP-like language spun off from MIT that is positioned as a web client language to replace Java and HTML, and some work by Paul Morrison. But it's silly that standard languages like Java, C#, Perl, etc. don't have this built right into the guts.

Of course, there are countless ways to simulate this, such as creating object types with flags representing their unit types. But a robust system would know not only about specific units and measurement systems; it would also know when length is being compared to length (as in our Mars example) as opposed to, say, temperature, and that kind of mismatch could produce a compile-time error. Curl goes a step further and transforms your numbers to mks behind the scenes, so you can actually be fairly loose and sloppy, and it takes care of lining up the units correctly. If you multiply mass times acceleration, you properly end up with a variable that represents force.
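
To make the idea concrete, here is a rough sketch of the "simulate it with objects" approach in Python. It is not how Curl does it; every name here (Quantity, miles, kelvin, and so on) is made up for illustration. Each value is stored in mks base units along with a tuple of dimension exponents, so adding or comparing mismatched dimensions fails and multiplying combines dimensions. Python can only catch the mismatch at runtime, not at compile time as a language with units built in could.

```python
from dataclasses import dataclass

# Dimension exponents in (length, mass, time, temperature) order.
# These constants and all helper names below are hypothetical.
LENGTH = (1, 0, 0, 0)
MASS = (0, 1, 0, 0)
TEMPERATURE = (0, 0, 0, 1)
ACCELERATION = (1, 0, -2, 0)   # meters per second squared
FORCE = (1, 1, -2, 0)          # kilogram meters per second squared (newtons)


@dataclass(frozen=True)
class Quantity:
    value: float   # magnitude, always held in mks base units
    dim: tuple     # dimension exponents

    def _require_same_dim(self, other):
        # Length may only meet length, temperature only temperature, etc.
        if self.dim != other.dim:
            raise TypeError(f"unit mismatch: {self.dim} vs {other.dim}")

    def __add__(self, other):
        self._require_same_dim(other)
        return Quantity(self.value + other.value, self.dim)

    def __lt__(self, other):
        self._require_same_dim(other)
        return self.value < other.value

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        dim = tuple(a + b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, dim)


# Constructors convert to mks on the way in, so callers can mix units.
def meters(x):
    return Quantity(x, LENGTH)

def miles(x):
    return Quantity(x * 1609.344, LENGTH)

def kilograms(x):
    return Quantity(x, MASS)

def kelvin(x):
    return Quantity(x, TEMPERATURE)

def meters_per_second_squared(x):
    return Quantity(x, ACCELERATION)


total = miles(3) + meters(100)                           # both lengths: allowed
force = kilograms(2) * meters_per_second_squared(9.5)    # mass times acceleration
```

With this in place, force.dim comes out equal to FORCE, while something like meters(1) < kelvin(1) raises a TypeError instead of quietly comparing two bare floats.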

There are other applications for scalars, of course, most notably counting and iteration. But I suspect we can do away with all of those as well. That will have to wait for later.