Randomness and normal people

People are bad at math. I mean, not the school subject (well, many are bad at that too), but rather its practical application. It is why we are so easily lured in by gambling or awed by "coincidence". It almost always boils down to a lack of true understanding of mathematics.

And NOWHERE are people worse at math than when it comes to comprehending randomness.

As a computer scientist, it took me a solid 30 minutes to bring myself to click on this news article in my Google feed. From the title alone I had a good idea exactly what the argument was going to be, and consequently that the article was going to be a total farce. Happily, by the time I got to it, the comments had restored my faith in humanity just a little: it seemed like everyone and their grandmother who commented tore the author a new one, on several grounds, including that the random function had been improved almost 10 years ago and that what the author was implying was downright silly either way.

So, here is my take on it. There is NO SUCH THING as a TRUE random number generator. People talk about getting seeds from the decay of radioactive atoms, etc., but the truth is, all we're talking about is a non-random source that is just sufficiently difficult for us to understand today. Decaying isotopes obey the rules of nature. The same goes for environmental noise, distortion and the other methods of generating better seeds. None of these methods generates a TRULY random seed.

And then, once you have a random seed, it goes into a randomization function. But, that function is anything but random.
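To make "anything but random" concrete, here is a minimal sketch of about the simplest PRN generator there is, a linear congruential generator (LCG). The multiplier and increment are the well-known constants from Numerical Recipes; nothing here claims that any spreadsheet uses an LCG or these constants. The point is only that each "random" value is plain arithmetic on the previous one.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x_next = (a * x + c) mod m.

    Each output is the integer state scaled into [0, 1). There is no
    randomness anywhere in this function: same seed, same sequence.
    """
    x = seed % m
    values = []
    for _ in range(count):
        x = (a * x + c) % m   # the entire "randomization" step
        values.append(x / m)  # scale the integer state into [0, 1)
    return values
```

Calling `lcg(42, 5)` twice returns the same five numbers both times, which is exactly the behavior described above.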

At the software level, starting from the seed, you can reproduce the same "random" data set, identically, as many times as you like. Hardware-based randomizers rely instead on noise, radioactive decay and the like, and don't feed a seed into a pseudo random number generator. But, if you have enough data, you could still make the same predictions. Today, that level of ability seems impossible, so we happily call it random and walk away. But during WW2, the Germans considered Enigma unbreakable for much of the war. Today, computers could crack it in seconds.
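In Python terms (chosen here purely for illustration), the contrast looks roughly like this: `random.Random` is a Mersenne Twister that replays identically from a given seed, while `random.SystemRandom` pulls from the operating system's randomness source via `os.urandom()` and ignores seeding entirely. Whether that OS source is fed by hardware noise depends on the platform, so treat it as a stand-in for the hardware case rather than an example of it.

```python
import random

def seeded_draws(seed, count=5):
    """Mersenne Twister: the whole sequence is determined by the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

def os_entropy_draws(count=5):
    """SystemRandom reads os.urandom(); calling seed() on it has no effect."""
    rng = random.SystemRandom()
    return [rng.random() for _ in range(count)]

# Replaying a seed gives the identical "random" data set every time.
assert seeded_draws(1234) == seeded_draws(1234)
```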

There is no such thing as a random number generator. Period.

The next amazingly naïve piece of the puzzle is the claim that, knowing a pseudo random number (PRN) generator had been used, a person could hack their way back to the seed value and then predict the following values.

Except... if this is so true and so trivial... why does it matter how the seed is even determined? It shouldn't. It would mean that any seed can be hacked trivially. But this isn't really the case. I mean, theoretically, it is absolutely possible. But it potentially requires an EXHAUSTIVE amount of information about the system.
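One reason the seed's source does matter: if it comes from something guessable, the search space collapses. A minimal sketch, assuming (hypothetically) that a program seeds Python's `random.Random` with the current Unix time in whole seconds, and that an attacker has seen the exact first output and roughly knows when the program ran. `seed_from_clock` and `recover_clock_seed` are illustrative names, not anything from the article.

```python
import random
import time

def seed_from_clock():
    """Hypothetical weak seeding: current Unix time, one-second resolution."""
    return int(time.time())

def recover_clock_seed(first_value, approx_time, window=3600):
    """Try every second within +/- window of approx_time.

    Returns each seed whose first random() output equals the observed value.
    Note what this needs: the EXACT float and a rough timestamp, i.e. detailed
    knowledge about the system being attacked.
    """
    return [
        s
        for s in range(approx_time - window, approx_time + window + 1)
        if random.Random(s).random() == first_value
    ]
```

With a one-hour window on either side, that is only 7,201 candidates. Seed the generator from 128 bits of `os.urandom` instead and the same loop would need 2**128 iterations.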

The simple problem with randomness, even pseudo randomness, is that the same short sequence of numbers can show up under any number of the possible seed values. You need a long enough string of values and, to save time and headaches, you need to know exactly which iteration of the PRN produced them. What makes this even harder is that our software random numbers are almost never output exactly as they are generated.
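A toy version of the seed search makes the ambiguity visible. Assume, for illustration only, a seed space of 10,000 values and outputs reduced to die rolls from 1 to 6. One observed roll is consistent with roughly a sixth of all seeds, and each additional roll filters the list further. `rolls` and `matching_seeds` are hypothetical helpers.

```python
import random

def rolls(seed, count):
    """First `count` die rolls (1..6) produced from a given seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(count)]

def matching_seeds(observed, seed_space):
    """Every seed whose first len(observed) rolls equal the observed rolls."""
    k = len(observed)
    return [s for s in seed_space if rolls(s, k) == observed]

SEED_SPACE = range(10_000)
secret = 4242
history = rolls(secret, 12)

for k in (1, 3, 6, 12):
    candidates = matching_seeds(history[:k], SEED_SPACE)
    print(f"{k:2d} rolls observed -> {len(candidates)} candidate seeds")
```

The secret seed is always among the candidates, but it only stands alone once the observed run is long enough: 12 rolls carry at most about 31 bits (12 × log2 6), comfortably more than the roughly 13.3 bits needed to single out one seed from 10,000, so accidental matches become unlikely.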

What I mean is, random numbers almost always reach you as a decimal value between 0 and 1, and almost always end up as a whole number. But even if you had the original number between 0 and 1, it would have been cast to some data type within the programming language, at which point it may have lost precision, perhaps a rather absurd amount of it. Without the EXACT original PRN output value, you've taken what was already a long shot and made it longer. And if you've gone and multiplied that value by a whole number and then rounded... you've just butchered the ever-loving shit out of your precision.
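Some back-of-the-envelope numbers for that precision loss, assuming Python's `random.random()`, which returns floats on a grid of 2**53 possible values, and a spreadsheet-style mapping like `INT(RAND()*n)+1` to a whole number from 1 to n. The mapping is an assumption; the article doesn't show the formula actually used.

```python
import math

FLOAT_BITS = 53  # random.random() yields multiples of 2**-53 in [0, 1)

def to_whole_number(x, n):
    """Map a float in [0, 1) to a whole number 1..n, spreadsheet style."""
    return math.floor(x * n) + 1

def bits_discarded(n):
    """Information about the raw float thrown away by mapping it to 1..n."""
    return FLOAT_BITS - math.log2(n)

# Distinct raw outputs collapse onto the same whole number...
assert to_whole_number(0.15, 10) == to_whole_number(0.19, 10) == 2

# ...and each draw mapped to 1..10 keeps only about 3.3 of the 53 bits.
print(f"{bits_discarded(10):.1f} bits thrown away per draw")  # prints 49.7 bits thrown away per draw
```

And the float is itself only a small slice of the generator's internal state, which for CPython's Mersenne Twister is 624 32-bit words.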

It all really boils down to how many possible seed values there are, how precise the values you have are relative to the original values, and whether or not you know where within the sequence of PRNs your values lie. Oh, and also whether or not you know the PRN generating function. But if you had all that, it wouldn't matter if your seed value was somehow truly, magically, totally random. We could still work out what it was.
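Not knowing where in the sequence your values sit multiplies the work. A sketch under toy assumptions (Python's `random.Random`, a seed space of only 500 values, outputs from 1 to 100), with hypothetical helpers `draws` and `find_seed_and_offset`: the search now has to try every seed at every plausible starting position.

```python
import random

def draws(seed, count, n_max):
    """First `count` whole-number draws (1..n_max) from a given seed."""
    rng = random.Random(seed)
    return [rng.randint(1, n_max) for _ in range(count)]

def find_seed_and_offset(observed, seed_space, max_offset, n_max):
    """Return every (seed, offset) whose sequence contains `observed` at `offset`.

    Work grows as len(seed_space) * (max_offset + 1): each unknown
    multiplies the search rather than adding to it.
    """
    k = len(observed)
    hits = []
    for seed in seed_space:
        seq = draws(seed, max_offset + k, n_max)
        for offset in range(max_offset + 1):
            if seq[offset:offset + k] == observed:
                hits.append((seed, offset))
    return hits

# Someone observed 8 draws, but not which seed or position produced them.
observed = draws(123, 37 + 8, 100)[37:]
print(find_seed_and_offset(observed, range(500), 50, 100))
```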

Now, beyond all of that comes the real hurdle: practicality. To exploit this, you would need to:
  • Know the seed value.
  • Know the current iteration of the PRN.
  • Know the function, or be able to reproduce its output, to find the values you're interested in.
  • Ensure the PRN generates no new values in the interim.
  • Guarantee an application is inserted into the spreadsheet at the exact moment needed to generate the desired outcome.
To top this off, those working with the spreadsheets were apparently working with IDs that couldn't directly identify who was even being assigned a randomized value.
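Even granting the hardest item on that list, the exact generator state, the "no new values in the interim" condition still bites. A minimal sketch, assuming Python's `random.Random` stands in for the spreadsheet's generator and that its full internal state has somehow leaked. `getstate()` and `setstate()` are real Python APIs; the scenario and the `draw_at` helper are hypothetical.

```python
import random

N_MAX = 100  # hypothetical: each application gets a whole number from 1 to 100

def draw_at(state, position):
    """Value the generator will produce at `position` (1-based) starting from `state`."""
    g = random.Random()
    g.setstate(state)  # clone the leaked state without touching the live generator
    value = None
    for _ in range(position):
        value = g.randint(1, N_MAX)
    return value

live = random.Random(2024)       # stand-in for the spreadsheet's generator
leaked_state = live.getstate()   # assumption: the exact internal state has leaked

# The attacker plans to be the 3rd application to receive a value.
predicted = draw_at(leaked_state, 3)

# If one other application is processed first, the attacker's draw is really
# the 4th value from the generator, which in general is a different number.
actual_if_interrupted = draw_at(leaked_state, 4)
print(predicted, actual_if_interrupted)
```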

With these measures in place, the order in which people are entered into the list is random enough on its own, for most people's purposes, to ensure a reasonably random selection.

As usual, I would simply argue that the people most able to pull this exploit off... have direct access to the computer. Why go through the effort? Just overwrite the value in the cell with the one you want.

As a computer scientist, I would say that the process yields enough randomness. Using any PRN generator at all, combined with effectively randomized input data that isn't identifiable, more than satisfies the needs of this sort of work. I might be more zealous if it were encrypting critical information or being used in some fashion to protect vital infrastructure. But, as a means of random assignment... come on, people.
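For the "encrypting critical information" case, the usual move in Python is not a better seed but a different source. A sketch contrasting the two, with illustrative helper names: a Mersenne Twister shuffle is reasonable for putting applicants in random order and can even be replayed from a seed, while the `secrets` module draws key material from the operating system's cryptographically secure source, with no seed to recover.

```python
import random
import secrets

def assign_random_order(applicant_ids, seed=None):
    """Random assignment: Mersenne Twister shuffle, replayable if a seed is given."""
    order = list(applicant_ids)
    random.Random(seed).shuffle(order)
    return order

def new_session_key(nbytes=32):
    """Key material from the OS CSPRNG via secrets; there is no seed to replay."""
    return secrets.token_bytes(nbytes)

ids = ["A-101", "A-102", "A-103", "A-104"]
print(assign_random_order(ids, seed=2024))
print(new_session_key().hex())
```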
