This was such a weird project. Scheduled for five days altogether. My friend from the clinical drug trial venture was also a tech recruiter who got me about half my tech jobs over the years and this one was a real throwaway.
What we had was a mid-80s start-up in the educational software game producing exactly the kind of mind-numbing drill and practice software that was supposed to revolutionize education because Look, Ma! We used computers!
Now they were stuck on some software problem and needed help fast. Their stack was Tandy, Cobol, and some micro database package. My skills were Apple, Cobol, and ISAM and in those days that was a deerskin glove fit so off I went for a mutual look-see.
I was on the beach, why not?
The next morning I am walking up to an apartment building where this enterprise had wedged itself into what was meant to be doctors' offices. Inside I sit down with the top guy in his office and the entire company joins us.
The staff unleashes a thirty minute nightmare tale of software crashes, dysfunctions, anomalies, and disrepair as each person takes turns reciting some utterly bizarre malfunction of the application, all with the database software as the likely culprit. It was a tag-team misery report, a through-the-looking-glass panoply of software non-determinism. It was wonderful.
A half dozen times I formulated "Explanatory Guess X" only to hear in the speaker's next sentence that they had thought it might be X, but no luck. I mean it was really wonderful and then finally it ended. My head was spinning.
"Have you worked with the Tandy OS?" the manager asked.
"Yes, but it does not sound like Cobol is your problem."
"No. I don't suppose you have worked with this DBMS?"
"Can you help us?" See straw. Clutch.
I have no idea what to tell them.
"Is the DBMS any good?" I recover enough to ask.
"I checked it out pretty well. It got great reviews, it is supposed to be the best."
I look down at my shoes.
The contract was for five days. The longest any single glitch had ever stopped me was five days. Do the arithmetic.
"Yes," I say.
It took seven. They paid up front for the first five, never paid for the last two probably because they did not have it or maybe because of the way things went. You'll see. And I am surprised it came to seven days, I only remember one or two. I never ran their software once and I do not remember even touching a computer. Here is what happened.
After signing on I took home the manuals for their DBMS and a listing of their schema definition. It took maybe a day to decide that everything looked right. The next day I ask Tom the programmer how hard it would be to just initialize an empty database and start over entering the data.
"Easy," says Tom.
Welcome to Tilton's Law: Solve the First Problem. They had described to me twenty distinct failures and that was too many for me, I am not smart like you guys, I cannot just figure these things out in the shower.
I wanted to turn the software off and turn it back on with a clean slate and see what went wrong first and stop right there. I just wanted to see what went wrong first and fix that. I suspect that needs no explanation, but what am I doing up on this soapbox if I am not going to explain these things?
Here goes. Once upon a time my sleazebag ward politician buddy and I were cruising the singles bars back when they had such things and he got nicely eviscerated by a woman we were chatting up. My buddy had said something cynical and she had challenged him on it.
"Oh, I have compromised my principles a few times," he conceded with a sly grin.
"You can only compromise your principles once," she replied. "After that you don't have any."
Software is the same. This stuff is hard enough to get right when things are working nominally, but once they go wrong we no longer have a system that even should work.
Back on the project, the next day I get a call.
"Bad news," Tom says. Uh-oh.
"Same thing. Mary was entering the 118th record and the program crashed."
I pretty much fell out of my chair. Somewhere in the thirty minute firestorm of issues I had heard the number 118.
"118 sounds familiar."
"Yep," Tom moaned inconsolably. "That's what happened before. Sorry, no difference."
I was doing cartwheels.
"Tom, how hard would it be to write a program to just write out a couple hundred records, just put in dummy data, 1-2-3-4-5...?"
"That would be easy."
"Awesome, do that and let's see what happens in batch mode," says me.
"And reinitialize the DB first, OK?"
The next day I hear from Tom. Sounds like he is calling from the morgue.
"Bad news, Kenny."
Oh, no. It worked.
"Same thing. The program wrote out 118 records and crashed. Sorry, Kenny."
Oh, yeah, I just hate easily reproducible errors. Not!
"Listen, Tom, let's try making the buffer allocation bigger."
The next day, "Bad news. Same thing."
I am icing the champagne; this is one solid, reproducible bug. But what about the others?
"Tom, remember the first time this thing crashed, before I came on board?"
"Did you start over from a fresh database or just resume working on the one that had been open when the DBMS had crashed?"
"We just continued working with the same DB."
Tilton's Law (Solve the First Problem) had been broken as badly as broken can be. A DBMS had failed while writing data and they had tried to continue using the same physical DB. This transgression is so severe it almost does not count.
Normally Tilton's Law refers to two or three observed issues that do not necessarily seem even to be in the same ballpark. The law says pick out the one that seems most firstish and work on that and only that until it is solved. The other problems might just go away and even if not the last thing we need to do while working on one problem is to be looking over our shoulders at possible collateral damage from some other problem.
Two minutes later I am on the phone to DBMS tech support.
"Hi, we're reliably crashing after adding 118 records in one sitting."
"Yes, that is a known problem."
Oh. My. God.
"Would you like us to send you the patch for that?" she asks.
"That would be lovely."
This being before the advent of the Interweb we confirmed our mailing address and asked for it to be sent out ASAP by overnight delivery. But we are not done yet. Tilton's Law or no, all I have solved is P1, the first problem.
"One more thing," I say.
"If we continue working with the DB after this crash..."
"Oh, no. Don't do that. It's hopelessly corrupted at that point."
Were some of the other issues unrelated to the first crash? I will let you know as soon as this test I have running to solve the halting problem finishes.
Meanwhile, the conversation had suggested how we might get them up and entering data now. Apparently we were crashing because of a bug that surfaced when more than so many records were being held in the buffer before being written out. We had tried making the buffer bigger, only making things worse.
"Tom, we can wait for the patch, but I have one last idea in mind that might get this thing working for you. Want to try one more thing?"
"Try making the buffer half the size it was when we started."
A few minutes later he comes back.
"It works now."
"I had it loop to one thousand. No problem."
"Cool. Let's tell the others and go get drunk."
Nope. Something is wrong. Tom is just standing in the doorway all deer and headlights.
"Can I ask you something?" Tom asks quietly.
"I do not understand why making the buffer smaller made the program work."
"Well there was this bug that had to do with being unable to keep more than so many records in memory and with a smaller buffer the software did not try to keep so many in memory."
"OK, but why does it work now?"
"Maybe 118 multiplied by the record size is more than 16,384 and somewhere in the DBMS logic there was an integer overflow so the problem does not come up if the cache is smaller and the software flushes the cache before it gets to 16,384."
"All right," says Tom. "But I do not understand why we make the buffer smaller and now the software works."
This was surreal. I try a different tack, a really dumb one, but sometimes when a grizzly bear has your back to the wall all you can do is tap dance.
"Look. There are multiple code paths in an application, right? Every conditional is a fork in the path. A bug exists in some branch or other out of all the code paths, right? By changing a fundamental parameter we send the code down a different code path. Avoiding the bug."
"I just don't understand why making the buffer smaller makes the program work."
Then it came to me. I was Dr. Chandra in 2010 trying to get Hal to fire the rockets, and Tom was Hal stuck in a Mobius loop unable to resolve my understanding of the confusion with his confusion of the understanding.
"I don't know, Tom," I say. "I don't know why it works now."
Suddenly Mike, the project lead, appears.
"Kenny, Tom. In my office. Now."
"OK, this has to stop. Kenny, I am paying you to solve this problem and you have Tom doing all your work. He has his own work to do. From now on you work on this problem and Tom you do what you are supposed to be doing. Have I made myself clear?"
Remember in Annie Hall when Woody Allen turns to the camera and asks, Why can't real life be like this?
"Actually...I think I'm done."
Leaving Mike and his facial expression frozen in spacetime, I turn to Tom with raised eyebrows for his assent and Tom nods. I turn back to Mike, who no longer knows where he is.
"It turns out this is a known bug. You'll have a patch tomorrow or the next day. In the meantime we found a workaround and you are up and running. Mary can start entering your data, um, now."
"So basically I am sitting here making a complete ass out of myself?"
Good for him. We all had a good laugh, shook hands and I was on my way and Tilton's Law of Programming was reaffirmed: Always solve the first problem. The corollary: there only ever is the first problem.
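
For the curious, the overflow guess offered to Tom can be made concrete with a toy model. This is only a sketch under loud assumptions, not the DBMS's actual logic, which the story never reveals: the 139-byte record size, the treatment of 16,384 bytes as a hard ceiling, the buffer sizes, and the names `write_records` and `BufferOverflowBug` are all invented so that the arithmetic lands on record 118.

```python
RECORD_SIZE = 139       # hypothetical: 117 * 139 = 16,263 but 118 * 139 = 16,402
OFFSET_LIMIT = 16_384   # hypothetical ceiling, taken from the guess given to Tom


class BufferOverflowBug(Exception):
    """Stands in for the DBMS crash."""


def write_records(count, buffer_size):
    """Write `count` dummy records; return the count if nothing crashes."""
    pending = 0  # bytes held in the buffer and not yet flushed to disk
    for n in range(1, count + 1):
        if pending + RECORD_SIZE > buffer_size:
            pending = 0  # buffer is full: flush it before adding more
        pending += RECORD_SIZE
        if pending > OFFSET_LIMIT:
            # The imagined bug: buffered bytes exceed what the counter can hold.
            raise BufferOverflowBug(f"crashed on record {n}")
    return count


# A small buffer flushes long before the ceiling, so the bug is never reached.
print(write_records(1000, buffer_size=8_192))

# A big buffer (or an even bigger one) lets 118 records pile up first.
for size in (32_768, 65_536):
    try:
        write_records(200, buffer_size=size)
    except BufferOverflowBug as err:
        print(size, err)
```

In this model a bigger buffer changes nothing, because the ceiling is hit before the first flush either way, while a smaller one forces a flush early enough that the ceiling is never reached.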