TED Talk on Algorithms: surprisingly bad...

I watched this talk (or at least as much of it as I could) at the recommendation of Adam Savage on Twitter... and I was shocked at just how bad it was.

Don't get me wrong, much of the premise is accurate. But many of the conclusions drawn are wildly inaccurate. At one point she says (paraphrasing) "suppose we want an algorithm to take over hiring, so we feed it the last 21 years' worth of hires, define success as working here for X years and getting a promotion, and apply machine learning," and then she concludes, "you have just eliminated all women".

This was roughly the point at which I stopped. I hate it when people spread FUD and disinformation. So while much of her premise is spot on, she has overgeneralized the living hell out of things and made assumptions that are generally going to be wildly inaccurate.

I want to be totally fair here, so I will try to attack this from both sides. But I really want to start with the glaring mistakes.

Firstly, you don't just "apply machine learning" to an arbitrary set of data. Machine learning isn't some sort of magic where you feed in arbitrary data and let it go wild. And while the generalizations she makes can be good for explaining things to a broader audience, leaning upon those generalizations when drawing more complex conclusions leads to horrendous errors. The data scientists building the machine learning model control which pieces of that massive pool of data are fed into the algorithm, and can also weight certain aspects of it differently.
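To make that concrete, here is a minimal sketch of what that control looks like in practice, using Python and scikit-learn. The data file, the column names, and the decay factor are all hypothetical; the point is only that feature selection and weighting are explicit choices made by a person.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical table of historical hires; columns invented for illustration.
hires = pd.read_csv("hires.csv")

# The data scientist decides which columns become features at all.
# Nothing forces the whole pool of data into the model, and protected
# attributes like gender would simply never make this list.
features = ["years_experience", "certifications", "interview_score"]
X = hires[features]
y = hires["was_successful"]  # however "success" was defined

# Aspects of the data can also be weighted differently, e.g. by
# down-weighting older records so stale hiring patterns matter less.
weights = 0.9 ** (hires["hire_year"].max() - hires["hire_year"])

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)
```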

Generally speaking, no one in their right mind would include gender data. If it were discovered that the algorithm discriminated based on legally protected characteristics, the company could be held liable for it.

In other words, while it is technically possible, it is almost never going to be the case that simply analyzing the data from the past X years would rule someone out based on gender.

And here is the point where I want to switch to an educating and supportive role in this argument. While it isn't true that gender would explicitly rule out women, it IS true that the algorithm may develop an unintentional bias against women. Remember, the algorithm, if built with any degree of responsibility, has no inherent knowledge of past or future applicants' genders.

But people exhibit biases, and society exhibits biases as well. So, let's say discrimination in society led men and women down traditionally different paths. With men dominating the positive stats due to things like personal and social biases as well as workplace discrimination, it is possible that an algorithm might associate success with traits more common in male applicants, whether or not those traits actually have an impact on final performance.

A simple example might be that a machine learning algorithm detects a correlation between receiving a promotion in a former career and positive performance in the new one. Since workplace inequality still has not been fully addressed, there may be a lot of women the algorithm is biased against simply because they were unfairly denied promotions in their prior jobs.
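Here is a toy simulation of that proxy effect, with entirely made-up numbers: gender is never given to the model, but because promotions were handed out unevenly, the model still learns to lean on promotion history.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
is_woman = rng.random(n) < 0.5

# Assume equal underlying ability across genders...
ability = rng.normal(size=n)
# ...but promotions in the prior job were awarded with a bias against women.
promoted = (ability + rng.normal(size=n) - 0.5 * is_woman) > 0

# Actual success at the new job depends only on ability.
successful = (ability + rng.normal(size=n)) > 0

# Train on a facially neutral feature set: promotion history only.
X = promoted.reshape(-1, 1).astype(float)
model = LogisticRegression().fit(X, successful)

# The model leans on promotions as a proxy for ability, so equally able
# women (who were under-promoted) score lower on average than men.
scores = model.predict_proba(X)[:, 1]
print("mean score, women:", scores[is_woman].mean())
print("mean score, men:  ", scores[~is_woman].mean())
```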

I use the word bias a lot, and I use it for a reason. These sorts of algorithms aren't looking for a hard-coded set of criteria, and they aren't trying to build one dynamically either. Otherwise you could wind up in a case where you have plenty of qualified applicants but all of them are rejected. For this reason, even if gender WERE considered, it would be unlikely to be the only criterion, and thus it wouldn't eliminate the possibility of a woman getting the offer; it would just bias things against her.
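To contrast the two, here's a tiny hypothetical sketch; the rule, the feature names, and the weights are all invented. A hard-coded filter eliminates a candidate outright, while a learned model combines weighted evidence into a score, so a weak signal on one feature can be offset by strength elsewhere.

```python
def hard_rule(candidate):
    # A literal filter: one failed criterion rejects the candidate outright.
    return bool(candidate["promoted"]) and candidate["years_experience"] >= 5

def learned_score(candidate):
    # A learned model instead produces a score; the weights here stand in
    # for whatever the training process actually settled on.
    return (1.2 * candidate["promoted"]
            + 0.4 * candidate["years_experience"]
            + 0.9 * candidate["interview_score"])

alice = {"promoted": 0, "years_experience": 8, "interview_score": 4.5}
print(hard_rule(alice))      # False: eliminated outright
print(learned_score(alice))  # 7.25: still competitive in a ranking
```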

This IS NOT an attempt to justify this sort of bias, or to call it OK. It is to correct the notion that applying machine learning to a historically misogynistic data set would automatically result in an algorithm seemingly hell-bent on destroying the lives of women.

I also agree that big data is not a panacea. And I agree that mistakes in algorithms can have disastrous effects. On those points the TED talk is 100% accurate.

That being said... algorithms are more impartial than human beings. The only biases they possess are those explicitly coded into them and whatever biases the data they analyze suggests. A human being, on the other hand, is inherently biased. There are tons of books on the subject. I read one a while back called "The Man Who Lied To His Laptop", which covers a lot of research on the conscious and unconscious biases humans exhibit. An algorithm has no subconscious and no unconscious tendencies; any initial bias must be explicitly placed there by the developer.

Yes, a machine learning algorithm analyzing data on humans will tend to develop certain biases as well, because our biases affect the data itself. But at the same time, a well-developed algorithm will only be minimally impacted by bias in the data, because it will be coded to prefer measurable and verifiable results.

From that last sentence you can arrive at another conclusion of mine: algorithms are really only well suited to cases where the conditions they are seeking out are well defined and based upon metrics that reflect what truly matters. Years of service and promotions aren't necessarily indicative of performance. Results of annual reviews might make a better generalized benchmark; that can still be biased, but it is *better* than what was proposed. For salespeople, it might measure stats like how much revenue was generated, or the ratio of deals closed. If you can't find a suitable metric or set of metrics, you probably shouldn't rely on an algorithm: either you don't understand the problem well enough, or your problem is inherently poorly suited to this sort of analysis.
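As a sketch of what that looks like for the sales example (the dataset and column names are hypothetical), the idea is to define "success" directly from verifiable results rather than from tenure or promotions:

```python
import pandas as pd

# Hypothetical per-rep sales records; column names invented.
sales = pd.read_csv("sales_reps.csv")

# Weak proxy label: tenure plus promotions, which says little about
# actual performance and inherits any past promotion bias.
# weak_label = (sales["years_of_service"] > 3) & sales["was_promoted"]

# Better: derive the training label from measurable, verifiable outcomes.
sales["close_ratio"] = sales["deals_closed"] / sales["deals_pitched"]
sales["successful"] = (
    (sales["revenue_generated"] > sales["revenue_generated"].median())
    | (sales["close_ratio"] > 0.3)
)
```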

As for mistakes, well, there are two kinds: when the algorithm doesn't do what it was intended to do, and when the algorithm does what it was intended to do but the intention is wrong. The solution to both is the same: review. That second failure condition can actually change over time; an algorithm which meets today's needs may not meet tomorrow's. Transparency is also important wherever an algorithm affects a human being. Knowing what an algorithm prefers, and on what basis, is critical in defending it as not exhibiting any "illegal" biases.
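A minimal sketch of what such a review might look like for a simple linear model (synthetic data, invented feature names): periodically dump what the model actually weighs, so a human can check that nothing amounts to an illegal proxy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
feature_names = ["interview_score", "certifications", "prior_promotion"]

# Synthetic stand-in for the real training data.
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=500)) > 0

model = LogisticRegression().fit(X, y)

# The audit step: report what the model prefers and how strongly.
# For more opaque models, tools like permutation importance serve
# the same purpose.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name:20s} {coef:+.3f}")
```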
