Wednesday, February 4, 2009

Cells: The Secret Transcript

The Boss asked me to give the group fifteen minutes on Cells because I have been talking about it for a while as a future better mousetrap for us and then suddenly last week threatened actually to apply it to qooxdoo and the front end.

I forwarded to everyone a link to a reasonably complete yet relatively brief write-up which tells you everything you need to know about Cells. I know that if I were in your shoes I would not have read it so I presume no one has. But I would like to determine how many folks I will be boring to tears if I review said document, so I will first cut to the chase and ask if anyone has any questions based on what they read.

........silence..............

OK. Cells is at once the simplest and hardest thing in the world for programmers to understand. Simple because the idea is just to have slot values of objects work like cells in a spreadsheet, and everyone knows how spreadsheets work. What is hard is understanding that one can program computers this way.

I only have fifteen minutes so: Yes, you can. Program inputs are assigned by good old imperative code to input cells the same way a user types values into a spreadsheet when they are doing what-if analysis or recording, say, actual monthly expenditure into a budget spreadsheet. Intermediate, derived, and aggregate cells compute new values based on those new inputs from predefined rules just as user changes to a spreadsheet propagate to other spreadsheet cells. Observers on cells let the emergent working model manifest its decisions with more good old fashioned imperative code, usually by simply updating the screen or playing a sound or controlling some external device over a serial port or updating a database.

Boom, we're done. Why is it so great? What part of the superiority of functional and declarative paradigms should I explain first? As I outlined in the material you did not read, a lot of things have to happen when a program receives an input. The programmer coding the event handler has to look at the event and decide all the things that have to happen in light of that event, and any things that follow from those first things. Not only must they reliably see to all those things, but they must do them in the right order. The analogy to a real spreadsheet is quite strong, if you imagine hand-implementing a spreadsheet with old-fashioned pencil and paper.

So the first things Cells does is eliminate a lot of work and thus a lot of bugs. Because the work eliminated is tedious, Cells also makes programming a lot more fun. But there is more.

The declarative paradigm means I always know why a slot has a certain value, because all the logic appears in one place, in the rule assigned to that slot. Without Cells any number of lines of code may have assigned a value to a particular slot and a unified deriving rule certainly cannot be divined even if one were to track them all down.

There is more. Most people hate OO because it never quite panned out. Objects turned out not to be reusable. One of the nicest features of Cells is that two different instances can have different rules for the same slot. That makes objects reusable. Yayyyyyyy.

I did not use Cells in the Kleaner because that was more of a straight calculation running from start to finish. I like to decribe the role for Cells being in any situation where one has an unpredictable stream of data and one is keeping a model with a sufficiently large amount of internal state consistent with that stream of inputs. Two examples being a GUI and a RoboCup client. The Kleaner worked by compiling statistics from a fixed store and then translating exactly once dirty data into corrected data.

Where Cells would have been useful would have been in implementing Phil's ideas about an ongoing stream of data leading to rediscernment of things previously discerned. Cell rules would take new raw inputs and propagate them over to tables of probabilities which would then reach out to existing cleaned data and possibly redecide from the original raw state a new cleaned state.

So no, I do not use Cells for everything, but in this case it was only because the full functionality had not been addressed.

A fun note is that I have in the past applied Cells to a database, specifically the old AllegroStore persistent CLOS database. This works two ways. One is that a user can be looking at a screen and as the underlying data changes the screen changes. That may sound like old news but with Cells one doe not have to write any code to make it happen. One just says "this view shows this users overdue books" and when the date changes the overdue status on every book gets updated and a new book appears in the list on the screen if someone happens to be looking, simply by someone having written code to list overdue books on the screen as if it were an unchanging value.

The other thing that happens is hinted at above. Things like overdue books and amount of fines owed and paid can be calculated from scratch by reading a users entire history of checkouts and returns, but sometimes it is useful to record such derived values in the database and update them incrementally as books are checked out and returned. We can have code in programs do it and hope they run at the right time, or we can have the code in the database (as datapoints mediated by Cells) and be sure they run and run immediately. We get timeliness of data, efficiency, and we still get consistency even as we introduce redundancy.

I left something out. That is bad because the thing I left out is the thing I am planning to do with Cells on my own anyway and then apply to the FE. Cells makes it dead easy to drive a separate framework from Lisp. In my note I mentioned tcl/Tk and Gtk. These are two killer C GUI frameworks with their own homebrewed little object models. We want to program in Lisp, and we want our models driven by Cells for all the reasons above. No problem. We build a model out of instances of CLOS classes mapping isomorphically onto Tk or GTk classes and use Cell observers to pipe information (thru an FFI or even literally a pipe) to the C library or runtime to drive there the creation and animation of C instances.

Works great, and one amazing programmer Peter Hildebrandt pulled off a trifecta in which he had Cells driving and driven by both GTk and a C physics engine, name forgotten.

For a while I kinda marvelled at how Cells could be so useful for such disparate activities, and do so in the same application, the two activities being building an application model and having some other programming framework dance to that model's tune.

I figured it out in time for ECLM 2008, not they were able to understand me. I opened by telling them that Cells was the single most powerful library they could use, because Cells is about change and nothing is more fundamental than change.

Programming is hard because like someone doing a spreadsheet on paper we programmers end up with the burden of propagating change thoughout our models. It is tedious work, it must be done reliably, and there is a lot of it as internal program state multiplies, exponentially a lot. This exponential growth in interdependence of program state is what led Brooks to declare that a silver bullet was not only unlikely to be found but that it would be impossible to find; he felt the complexity was ineluctable because as states multiply there is nothing that can be done to avoid the explosion of interdependence.

I wrote to Dr Brooks recently and asked him if he had ever looked at dataflow. He said he was familiar with the concept, but no. Oops.