Friday, June 12, 2015

Tilton's Un-Law: Treasure the First Problem

Here is a good one from the trenches.

Once upon a time I said that when a multitude of problems present themselves we should decide which seems most like the first (independent of anything else, the first observed, etc.) and solve that first.

Perhaps I should have said identify the fault underlying the first problem.

This morning I am hacking away at Tilton's Algebra, getting ready for a big pilot test, and looking at several problems observed while trying to help the student factor -x^2-2x-1, which can be solved as -(x^2+2x+1), then -(x+1)(x+1), and finally -(x+1)^2.

  • The app will not accept -(x+1)^2 as the answer. It says I can do more work.
  • When then asked for a hint, it does not have one to offer. Wait, you said I could do more work!
  • When asked for a hint after -(x+1)(x+1) it says it does not have one to offer, either.
  • If I then ask if -(x+1)(x+1) is the answer, it says no (which is fine, but again inconsistent with not having a hint to offer).
  • Final observed oddity: on a different problem such as -x^2-5x-6 -> -(x+2)(x+3) it will accept that as the answer.
So what is the first observed problem? It is a tie:
  • If I first ask for a hint on -(x+1)(x+1), it has none to offer.
  • If I first say -(x+1)^2 is the answer, it says I can do more work.
So here I would say the failure to hint comes first, because I could reasonably ask for a hint before reaching what I think is the answer. Put another way, it is not so much the order actually encountered; it is more that I can see a problem exists as early as when I might ask for a hint.

And the underlying fault is quickly found:
The hint mechanism looks at where the student has gotten in a solution to decide what to suggest next. In this case, the engine thinks the student is just plain wrong, because it itself came up with a weak answer: (x+1)(-x-1). 
But why did it accept -(x+1)(x+1)? Aha! Big story begins: this is an educational app, designed mostly with struggling, unhappy students in mind. It works by way of an expert system engine programmed to be able to do Algebra. This engine sometimes goes wrong, as in this case. When that happens it tells the student they are wrong when they are not. I for one consider that a Deadly Sin.

It occurred to me that simple numerical methods would be able to determine that a "wrong" entry by the student was consistent at least with the original problem, so a failsafe mechanism was introduced to let slide anything that looked wrong but was not, in the hope that it would work out in the end.
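A minimal sketch of what such a numerical check could look like, not the app's actual code: evaluate the original problem and the student's entry at a handful of random points and see whether they agree. The function name `consistent`, the sample range, and the tolerance are all assumptions made for illustration.

```python
import random

def consistent(f, g, trials=20, tol=1e-9, seed=0):
    """Return True if f and g agree at `trials` random sample points.

    Hypothetical helper: agreement suggests (but does not prove) that
    the two expressions are algebraically equivalent.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10, 10)
        a, b = f(x), g(x)
        # Relative tolerance so large values do not cause false alarms.
        if abs(a - b) > tol * max(1.0, abs(a), abs(b)):
            return False
    return True

# The expressions from this post, written as plain Python functions.
problem = lambda x: -x**2 - 2*x - 1       # -x^2-2x-1
student = lambda x: -(x + 1) * (x + 1)    # -(x+1)(x+1)
engine = lambda x: (x + 1) * (-x - 1)     # (x+1)(-x-1), the weak answer
slip = lambda x: -(x - 1) * (x + 1)       # a genuinely wrong entry

print(consistent(problem, student))  # student entry vs. original problem
print(consistent(problem, engine))   # engine's answer vs. original problem
print(consistent(problem, slip))     # a real mistake vs. original problem
```

Sampling cannot prove equivalence, but a disagreement at any sample point proves the entry really is wrong, which is enough for a failsafe.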
That trick actually fools the hinting mechanism, which looks back to the last correct step to decide what to hint. Unfortunately the safeguard decided to say a step (mistakenly) considered wrong should be labelled correct. As we learned in the movies 2001 and 2010, it is not wise to deceive software. Better would have been to flag the step accurately as "wrong but consistent", so the hinting engine would not try to hint off it.
And in the case of -x^2-5x-6 the safeguard does succeed at least when it comes to accepting the answer. But in neither case is it able to offer a hint once the student enters, say, -(x+2)(x+3) because it thinks they are just wrong and as noted above the engine looks to see where they are in the solution. In this case it decides nowhere, so it has no hint to offer.
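The "wrong but consistent" idea can be sketched as a three-valued status instead of a correct/wrong flag. Everything here (`Status`, `classify`, `hint_anchor`, `student_is_blamed`) is a hypothetical illustration, not the engine's real design: the student is only told they are wrong when the step is truly wrong, while the hinting code only trusts steps the engine itself verified.

```python
from enum import Enum

class Status(Enum):
    CORRECT = "correct"                  # the engine recognized the step
    CONSISTENT = "wrong but consistent"  # engine rejected it, numbers agree
    WRONG = "wrong"                      # engine rejected it, numbers disagree

def classify(engine_accepts, numerically_consistent):
    """Combine the engine's verdict with the numerical failsafe."""
    if engine_accepts:
        return Status.CORRECT
    if numerically_consistent:
        return Status.CONSISTENT
    return Status.WRONG

def student_is_blamed(status):
    """Only a truly wrong step is reported to the student as wrong."""
    return status is Status.WRONG

def hint_anchor(steps):
    """Return the last expression the engine itself verified, or None.

    `steps` is a list of (expression, Status) pairs in the order entered.
    CONSISTENT steps are skipped, so hints are never based on a step the
    engine could not follow.
    """
    for expr, status in reversed(steps):
        if status is Status.CORRECT:
            return expr
    return None

steps = [
    ("-x^2-2x-1", Status.CORRECT),
    ("-(x^2+2x+1)", Status.CORRECT),
    ("-(x+1)(x+1)", classify(engine_accepts=False, numerically_consistent=True)),
]
print(hint_anchor(steps))               # the last engine-verified step
print(student_is_blamed(steps[-1][1]))  # the student is not told "wrong"
```

The point of the extra state is that nothing downstream is lied to: the student interface and the hinting code each read the same honest label and draw their own conclusion from it.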

Interestingly, when asked if -(x+1)(x+1) is the answer it says no because it sees it can be rewritten as -(x+1)^2! This then reveals a second fault:
The engine, when looking for a hint and when checking whether the student is done, looks in two different places, so it can say "not done" and then have no hint to offer. This sin is not as bad, but it could be deadly: the student will lose confidence in the software, at which point no matter what mistake they make they will think the app is at fault.
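One way to rule out that inconsistency, sketched here under the assumption that the engine can be viewed as a list of rewrite rules, is to derive both "done?" and "hint?" from a single lookup. The names `next_step`, `is_done`, `hint`, and the toy string-matching rule are invented for illustration and are far simpler than a real algebra engine.

```python
def next_step(expr, rules):
    """Return (hint_text, rewritten_expr) from the first rule that applies,
    or None when no rule has anything left to do."""
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    return None

def is_done(expr, rules):
    """Done means exactly: there is no next step."""
    return next_step(expr, rules) is None

def hint(expr, rules):
    """The hint is the text of that same next step, if there is one."""
    step = next_step(expr, rules)
    return None if step is None else step[0]

# Toy rule working on literal strings, purely for demonstration.
def square_repeated_factor(expr):
    if expr == "-(x+1)(x+1)":
        return ("Write the repeated factor as a square.", "-(x+1)^2")
    return None

rules = [square_repeated_factor]
print(is_done("-(x+1)(x+1)", rules), hint("-(x+1)(x+1)", rules))
print(is_done("-(x+1)^2", rules), hint("-(x+1)^2", rules))
```

With one source of truth, "not done" and "no hint to offer" cannot both be reported for the same expression.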
On that loss of confidence: a helluva war story from the enterprise trenches, when a single bug trashed insurance enrollment data significantly, but not so badly that it got noticed immediately. By the time the complaints had rolled in and enough had been investigated to reveal the corruption, reversion to a backup was not viable. Users had to deal with complaints for weeks while a fix was found, so once it was, they continued to question the data for an additional month or two before confidence was restored.
Now it may well be that fixing the original fault (the engine's answer of (x+1)(-x-1)) will eliminate all the other misbehavior, but remember the spirit behind the failsafe: if we can avoid abusing the user when the engine goes wrong (as it may) we should.

Now I have come this way before and simply fixed the engine and moved on, but then it was impossible later to design a safeguard! The engine flaw has to be authentic, because a fix for a contrived flaw might well not work on the real thing. And engine flaws are rare enough that I just forget the whole thing until a day like today comes along.

So this time I am not fixing the first problem. Go-live is approaching, and soon living, breathing, struggling, unhappy students will be at my app's mercy. I am treasuring this problem for the stress it puts on the rest of my app, because I accept that the expert engine will not always be so expert.

Only after all the downstream misbehavior has been addressed will the first problem be fixed.