Friday, July 25, 2008

AA, BB, CC, and DD

I am not making it up. Those were the datanames in the code I had inherited from the Wunderkind. So this will be the most boring stating the obvious war story I ever write but that is not why I am writing it. I am writing it because the Lisp subset of the human race is at it again running factual red lights with their prior conviction feet to the pedal.

Yes, kiddies, we revisit today The Unbearable Impenetrability of the Lisper. The cool thing being that Arc is again involved albeit peripherally in this latest train wreck of human comprehension.

Back then the Maddening Crowd was utterly fascinated that I liked Arc which I did not but they were not to be denied, the best part being those people who responded to my objection to the crowd's misperception by saying yeah I saw you were flabbergasted by everyone thinking you liked Arc so tell me, you like Arc? 

No I am not making that up. Twice. 

This time it was my scrimshaw-ready datanames, sample below. Some months after my exploration of Arc I had a laugh as I whipped up a DSL for my Algebra software and found myself approaching Arcitude in the brevity of my names.  I posted something to comp.lang.lisp inadvertently loosing The Hounds of Lisp Density. A sample:

(hard
    (dsb (b x) (rp 2 (rv))
      (m/ (m^ b (m* (r2 7) x))(m^ (xqv b) (xqv x))))
    (dsb (b x) (rp 2 (rv))
      (m* (m^ b (ms* (r+ 7) x))(m^ (xqv b) (ms* (r+ 3) (xqv x)))))
    (dsb (b x) (rp 2 (rv))
      (m*eo (ms^ b (r+ 7))
        (m^ (xqv b) (ms* (r+ 3) (xqv x)))))
    (dsb (b x) (rp 2 (rv))
      (m/eo (ms^ b (r+ 7))
        (m^ (xqv b) (ms* (r+ 3) (xqv x)))))
    (dsb (n d) (rv 2)
      (w (k (r+ 12))
        (m/ (m^ k n) (m^ k d))))
  • dsb is short for the Lisp destructuring-bind.
  • m/ is short for make-fraction.
  • m*eo is short for make-product reordering the factors randomly (either-order)
  • etc etc
I gave a tip of the hat to Arc and explained that I had done this before (in C) and that this code (to generate randomly many varieties of Algebra problems) was a known PITA and even in C I had used the C preprocessor to likewise make the coding manageable.

Enter the Savages of Comp.lang.lisp. To a geek they lectured me on the importance of nice long meaningful names, or as His Sulzberbergerness edified me a ways back what Confucius called The Rectification of Names which is not quite the same thing but I can never resist name-dropping either Jay or Confucius.

I responded to the jackals nipping at my heels that yeah I know but in this case with vast incessant repetition of a small collection of opcodes that recognition would not be an issue and that it was much better to diminish the low information content (cue Shannon) of long names (what exactly does make-product add to m* in a context where a leading m is used only for makers?)  and learn a dozen opcodes which were mnemonically and predictably built anyway from atoms such as M and * and EO.

The universal response was that longer names were better. My universal response was that these special circumstances flipped the arrow on that otherwise sage rule.

The universal response was that longer names were better.

Trying again, my universal response was that I agreed, but would anyone like to address the salience of the special circumstances I had suggested were germane, perhaps explaining how they were not special enough or too special or the wrong damn color?

They all responded that longer names were better and I really started to enjoy things at that point. I was reminded of this desert spider that had a routine for burying any captured wasp and it involved positioning the wasp up just so and then going to dig a hole and then dragging in the dead wasp and these researchers would move the wasp a little while the spider was digging so the spider would be thrown off and start again by repositioning the wasp and no matter how many times they moved the wasp while the spider was digging the spider would just start right over readjusting the wasp and then digging the hole.

The topper was I myself was a staunch proponent of good long names. Tilton's Law of Programming:

  Spend more time on the names you choose than on the algorithm.

As for the specific quality of length, we have Tilton's Rule of Abbreviation:

  Abbreviate no name less than seven characters long and then only if a good abbreviation is no more than half as long. Rounding down.

Which brings us to AA, BB, CC, and DD. I was working as a body shop consultant in easily my most Tall Building job ever. One day this new guy came on board, totally not GQ, scrawny, smart, energetic. I had no idea he was a first round draft pick, destined for greatness. An employee, by the way. Tom. He dives in and starts churning out a front-end application, learning the HLL and OS and tools all at once, a man after my heart. 

At one point he mentions to me a problem. He wants users to be able to make discontinuous menu jumps, say, go sideways without backing up to the menu above (yeah, this was the good old days of modal interfaces) and the programming language would not go sideways. i.e., he was using HLL recursion to handle nested menus.

I suggested the obvious: no, you cannot call sideways in a structured language, you have to return from the called function with a "message" always checked by the caller to see if the user should be taken somewhere else. Tom yelled Great! and tore off to code it up.
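
One way to picture that suggestion, as a minimal Python sketch and not Tom's actual code (the HLL in the story goes unnamed, and the menu tree, `run_menu`, and the scripted input below are all invented for illustration): each menu is an ordinary recursive call, and a jump request the current menu cannot satisfy is returned to the caller as a message. Every caller checks what came back and either dispatches to one of its own submenus or hands the message further up.

```python
# Hypothetical menu tree: each menu name maps to its direct submenus.
ROOT = "main"
MENUS = {
    "main": ["files", "reports"],
    "files": ["open", "save"],
    "reports": ["daily"],
    "open": [],
    "save": [],
    "daily": [],
}


def run_menu(name, inputs, trail):
    """Run one menu level.

    `inputs` is an iterator of scripted user requests (a stand-in for
    real keyboard input); `trail` records every menu entered.
    Returns None for a plain "back", or the name of a menu this level
    could not reach, which the caller must check.
    """
    trail.append(name)
    message = None  # jump request handed back by a submenu, if any
    while True:
        # A message from below takes priority over fresh user input.
        request = message if message is not None else next(inputs, "back")
        message = None
        if request == "back":
            return None
        if request in MENUS[name]:
            # Normal call downward; always inspect what comes back.
            message = run_menu(request, inputs, trail)
        elif name != ROOT:
            # Not one of ours: return the message to our caller.
            return request
        # At the root, an unknown target is simply ignored.


trail = []
run_menu(ROOT, iter(["files", "open", "save"]), trail)
print(trail)  # ['main', 'files', 'open', 'save']
```

From the user's side the hop from `open` to `save` looks sideways; underneath, `open` returned to `files`, which checked the message and called `save`. This sketch only reaches menus that hang directly off some ancestor of the current one; a jump to anything deeper would need a path in the message.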

A couple of weeks later I have inherited the system. Tom was smarter than I: as soon as he had chalked up the win he told his boss to "get some consultant" to maintain it. Moi. So there I am working on the first RFE and I am perusing the code trying to figure out how the hell it works and I find myself slowed a bit by the data names that pretty much controlled everything: AA, BB, CC, and DD.  Come on, you think I could make up names that bad?

So over I wander to Tom's desk, clear my throat, Tom looks up.

"Tom," I said. "I was looking at the code. AA? BB? CC? DD?"

Tom burst out laughing.

"Those are just temporary variables!" he protested, laughing even more.

"Un-hunh," I replied. "And they control the entire program flow."

"Well, change them if you like."

As I said, Tom was a smart cookie, he was wiping his hands of the whole deal.

Now it turns out that one of the things I like to do when working on OPC (Other People's Code) is to stare at it and stare at it and when I see some crucial variable playing a big part in things and its name is getting in my way I pick a better name and do a global change, rinse, repeat until the damn code makes sense. The only reason I had gone to talk to Tom was the same reason we pay to see a two-headed sheep at the carnival.

Two hours into the renaming I had come up with decent names for AA, BB, CC, and DD (that last  one turned out to be three different variables) and I was making the global changes eyeballing each as I stepped through the source and after one such change forgive me I will never in my life remember the specifics but imagine you have just made the substitution and are now looking at a line of code that says:

  total-weight = total-weight + this-length

I did not feel completely comfortable with that edit so I called Tom over. Did I mention he was a smart guy? Two seconds into explaining how I had gotten to that point he yells out, "Great! I gave up on finding that bug!"

Smart guy. Tough bug? He just moved on and chalked up the win. Let some consultant take over the code and run into the bug and think they introduced it.

But there you have it. A bug so hard to find that a very smart programmer gave up on finding it and someone who did not even know the bug existed changed a few datanames and the bug positively jumped off the page. Explaining the corollary to Tilton's Law: 

  If the names are right the algorithm will write itself.

Now if only some walking fencepost from c.l.l will post a comment saying... well, that would ruin it, would it not?

15 comments:

  1. I don't suppose that you read Jeff Atwood's blog, do you? He was triggering a discussion about comments, as Jeff is wont to do. My first thought was hey, I hardly comment my lisp code, but my C++ needs them. Next thought was to check c.l.l and see if there was a discussion that justified my theory that lisp's culture of long variable names helps avoid the need for comments. Sure enough, a nice thread with the two pascals and kent pitman, and pretty much everyone agrees, long variable names and unit tests will limit most need for comments in lisp programs.

    Which made me wonder what you thought on the whole subject. It is delightful to see that you wrote a war story about exactly such a thing today. :)

    http://nightschool.near-time.net

  2. Haven't followed cll lately. I like your names. Sorry. :)

  3. This comment has been removed by the author.

  4. This comment has been removed by the author.

  5. Long descriptive names are great, but sometimes, the programmer comes up with the name, then changes the code so that the name no longer represents what the variable holds. Watch out for that one--it's worse than a meaningless name!

    As for comments, the best programmer I ever knew had an automated process for removing comments. Why? "The comments are always wrong." He claimed that with a comment there, you could be lulled into believing that the code did what the comment said. But if you look at the code instead of the comment, you KNOW what it does.

  6. Yes, always begin work on inherited code by removing comments. Even if they were maintained (they are not) they are natural language written by engineers who cannot be understood ordering coffee in a diner. Getting back to comments not being maintained, my saying on that one is, "Comments do not run."

    Late in life I have learned, however, that sometimes a bit of code exists for reasons not obvious, and in fact they are inserted later during debugging because there was a case I did not consider or because my data structures were a little too opaque even for myself and I messed up. Sometimes the comment is just "Make better" or "Don't ask" or "I know, I know", just enough to warn me that this is a hotspot should I come this way again. And one of the great joys of refactoring is coming across these and simply deleting all that code (and the comments) because the refactoring cleaned that mess up. Which brings me to Tilton's Law: "Every comment is a place the code could be better."

  7. Dare I say that I agree with you completely on this one. Generally, longer names are better. But good names are even better than long names, and sometimes good names are short. Particularly for a specific domain, as you mentioned with your DSL.

    There are acronyms that the military has used for going on a century that many people in the military probably couldn't tell you what the long name was, but they understand the acronym perfectly. To the outsider, it seems impenetrable. To the insider, it's crystal clear.

    As an aside, when I was a kid, hacking assembly language for fun, we'd have to come up with so many labels for various conditional jumps and loops that we started to get slap happy. After putting out "loop59" or so, my friends and I started having contests to come up with the funniest labels. We'd have loops named "banana4me" and such. Not very maintainable, but humorous nonetheless.

    Finally, if comp.lang.lisp has you down, check out LispForum.

  8. You may be talking about Sphex wasps and Caterpillars:
    http://everything2.com/e2node/Sphex

    Comments are often ways to say things that the programming language doesn't allow you to say. Future programming languages will allow you to express more of those comments as normal code.

  9. Has anyone heard of Python? The single best thing about the language is integrated doc-testing. You write a couple of examples of usage in the doc-string at the top of the function, and then you can test all such examples later, whenever your code changes.
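
For readers who have not met it: the feature described in this comment is Python's standard-library `doctest` module. A minimal sketch follows. `may_abbreviate` is a made-up function encoding one possible reading of the abbreviation rule quoted in the post, and `run_examples` is a hypothetical helper, there only so the snippet runs on its own.

```python
import doctest


def may_abbreviate(name, abbreviation):
    """One reading of the abbreviation rule (an assumption, not the author's code).

    A name qualifies only if it has at least seven characters and the
    abbreviation is no more than half as long, rounding down.

    >>> may_abbreviate("destructuring-bind", "dsb")
    True
    >>> may_abbreviate("weights", "wgts")
    False
    >>> may_abbreviate("total", "tot")
    False
    """
    return len(name) >= 7 and len(abbreviation) <= len(name) // 2


def run_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring, with func visible to the examples.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


print(run_examples(may_abbreviate))
```

In everyday use one would more likely run `python -m doctest somefile.py`, or call `doctest.testmod()`, and let it collect every docstring in the module. If the function's behaviour drifts away from its examples, the run reports the mismatch.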

  10. I stopped reading this post about three paragraphs in. Your writing is an indecipherable mess of garden-path and incomplete sentences with improper capitalization and ambiguous pronouns. Please send your next post to a friend to proof-read it.

  11. haha, my impenetrable style is a feature not a bug. I do not need to send it to anyone cuz what you see is deliberate and I am well aware of the start stop parsing required to get through it. The best you can hope for is sufficient tuning to eliminate the start stop deal. What you cannot hope for is punctuation or traditional sentence structure or any diminution of the sense of where the hell is this sentence going I wish I had put on my seat belt. what I think one needs to do is the koan thing and let go of the left brain compulsion to be presented with structured form.

    hth.

  12. An impenetrable lexicon is ok as long as everyone maintaining the code knows what it means. AA, BB, CC, DD are bad. But your M* crap was ok because you provided documentation to the reader. And you were doing it to have a consistent lexicon.

    Long Names suffer from inconsistent lexicon.
    :(

    See papers from ICSM 2007 (Binkley et al.), MSR 2008 (Hill et al.), ICPC 2008 (Binkley et al. and same people as MSR Hill et al.)

  13. Awesome! I have a sneaking suspicion I inherited some of "Tom's" code... Same variable names, same recycling... If it's not Tom (Tom in my case starts with a 'D'...) it's his twin brother.

  14. Cool! But Tom must have got it from your guy, I changed them all. Unless the guy after me changed them back...oh my. :)

  15. You're wrong about the "universal" parts. I agreed at least partially with you in that discussion (the wasp that learned!)...

    Leslie
