Ok, just as a former programmer: the number of lines with errors isn't relevant. One wrong line can have catastrophic impacts.

50000 lines isn't that large either 👀

#LeavingCert
Oh golly. I should clarify that part of my background is in analytics, both as a programmer and as a product owner/manager.

This mistake is a big one. The dataset wasn't sanitised, and the algorithm didn't skip the variables it should have.

Someone messed up the QA badly here.
I am actually writing a talk at the moment about the dangers of sleepwalking into a world where we use predictive algorithms for everything, and this is a good example - we can't just blindly trust algorithms; they are only as good as the data they are fed.
Predictive algorithms are just blind black boxes - they are trained by giving them datasets to "teach" them what the result should be: whether, given input set A, the output should be 1 or 0. (This is a simplified explanation of classification; there are other kinds of algorithm.)
If you train an algorithm on a bad dataset, you'll get bad results. They are a garbage-in, garbage-out mechanism.
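To make that concrete, here's a minimal sketch (Python with scikit-learn; the features and labels are invented for illustration, nothing to do with the actual grading model) of training a classifier on a labelled dataset - the model learns whatever the data tells it, good or bad:

```python
from sklearn.linear_model import LogisticRegression

# Toy training data: each row is a "set A" of input features,
# each label is the 1/0 outcome we want the model to reproduce.
X_train = [
    [72, 0.8],   # hypothetical features, e.g. a prior score and an attendance rate
    [45, 0.3],
    [88, 0.9],
    [30, 0.2],
]
y_train = [1, 0, 1, 0]  # the "correct" answers the model is taught

model = LogisticRegression()
model.fit(X_train, y_train)

# The model happily learns from whatever it is given:
# if X_train or y_train is garbage, the predictions will be too.
print(model.predict([[60, 0.7]]))
```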

Most of the code around calling these algorithms is actually dataset processing, making sure the correct stuff is fed to your (hopefully) correctly trained model.
This sounds like a set of simple errors - variables which should have been excluded but weren't (e.g. the CSPE result from the Junior Cert), and a sorting error (results sorted lowest-first and the lowest values used, instead of sorted highest-first). A rough sketch of what those slips might look like is below.
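Here's a rough sketch of that kind of preprocessing step (pandas; the column names, subjects and scores are made up for illustration - this is not the actual Leaving Cert implementation):

```python
import pandas as pd

# Hypothetical Junior Cert results for one student.
jc = pd.DataFrame({
    "subject": ["Maths", "English", "CSPE", "Science"],
    "score":   [85, 70, 95, 60],
})

EXCLUDED = {"CSPE"}   # variables that should never feed the calculation

# If this filter is missing (or the exclusion list is incomplete),
# CSPE silently flows into the grade calculation downstream.
usable = jc[~jc["subject"].isin(EXCLUDED)]

# Requirement: the student's two *strongest* remaining results.
# ascending=True here would silently pick the weakest ones instead.
top_two = usable.sort_values("score", ascending=False).head(2)
print(top_two)
```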

It's bad not to have caught this through code reviews and automated testing.
They're going to make a lot now of it being "one line", but one line is all it takes. One call to a sort function with the wrong parameter, saying lowest instead of highest. It wouldn't even have to be a whole line of code.
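For illustration, in Python that "less than a line" can literally be one missing argument to sorted() (example values invented):

```python
jc_scores = [85, 70, 60, 55]

# Intended: the two strongest results.
best = sorted(jc_scores, reverse=True)[:2]   # [85, 70]

# The bug: the identical call minus one argument, so the two weakest are taken.
oops = sorted(jc_scores)[:2]                 # [55, 60]
```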

The size of the mistake isn't the issue. It's that it wasn't caught.
I have to wonder now about the product process - were the requirements unclear or was the testing bad?

Surely someone did some manual calculation examples so that you could test "student A" in your system and see that it matched? If not, this absolutely should have been done.
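A hand-worked "student A" case turned into an automated check can be as small as this (the grading function and its logic are placeholders, not the real algorithm):

```python
def calculated_grade(jc_scores, excluded=("CSPE",)):
    """Placeholder calculation: average of the two best non-excluded
    Junior Cert results (illustrative logic only)."""
    usable = [score for subject, score in jc_scores.items() if subject not in excluded]
    best_two = sorted(usable, reverse=True)[:2]
    return sum(best_two) / len(best_two)

def test_student_a_matches_manual_calculation():
    # Expected value worked out by hand on paper before any code ran.
    student_a = {"Maths": 85, "English": 70, "CSPE": 95, "Science": 60}
    assert calculated_grade(student_a) == 77.5   # (85 + 70) / 2
```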
Did the developers misunderstand the requirements, or were they not clear?

What was the QA? Was this tested manually? With automated tests? And with what dataset?

So many questions need to be asked now, I hope someone with enough understanding asks them!
Did our Department of Education provide the algorithm to be implemented? If so, they should have provided examples to validate against.

Or did the dept provide requirements and let Polymetrica figure out how best to work with the data?
If only they had published things in advance, as they were asked to...
I know how software gets from an idea to a released product. It's literally my bread and butter. There are so many opportunities to make sure you get it right, and I would have some pretty big questions about why it went all the way through that process without this being caught.
I hope they actually bring in someone who understands code and the software development lifecycle, from inception to delivery, to help with questioning Polymetrica, because you need someone who understands that the size of the error isn't related to the number of lines it took.