(Revised and updated Monday, 24th October 2016.)
Weapons of Math Destruction, Cathy O’Neil, published by Crown Random House, 2016.
This is a thoughtful and very approachable introduction to, and review of, the societal and personal consequences of data mining, data science, and machine learning practices which seem at times extraordinarily successful. While others have breached the barriers of this subject, Professor O’Neil is the first to deal with it in the call-to-action manner it deserves. This is a book you should definitely read this year, especially if you are a parent. It should be required reading for anyone who practices in the field before beginning work.
I have a few quibbles about the book’s observations based on its very occasional leaps of logic and some quick interpretations of history.
For example, while I wholeheartedly deplore the pervasive use of e-scores and a financing system which confounds absence of information with higher risk (that is, fails to posit and apply proper Bayesian priors), the sentence “But framing debt as a moral issue is a mistake”, while correct, ignores the widespread practice of debtors’ courts and prisons in the history of the United States. This is really not something new, only a new form. Perhaps it is more pervasive.
For a few of the cases used to illustrate WMDs, there are other social changes which exacerbate matters, rather than abused algorithms being a cause. For instance, the idea of individual home ownership was not such a Big Deal in the past, especially for people without substantial means. These less fortunate individuals resigned themselves to renting their entire lives. Having a society and a group of banks pushing home ownership onto people who can barely afford it sets them up for financial hardship, loss of home, and credit.
What will be interesting to see is where the movement to fix these serious problems will go. Protests are good and necessary but, eventually, engagement with the developers of actual or potential WMDs is required. An Amazon review is not a place to write more of this, nor give some of my ideas. Accordingly, I have written a full review at my blog for the purpose.
My primary recommendation is a plea for rigorous testing of anything which could become a WMD. It’s apparent these systems touch the lives of many people. Just as in the case of transportation systems, it seems to me that we as a society have every right to demand these systems be similarly tested, beyond the narrow goals of the companies who are building them. This will result in fewer being built, but, as Dr O’Neil has described, building fewer bad systems can only be a good thing.
(The above is the substance of a review I wrote at Amazon for the book.)
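The remark in the review about confounding absence of information with higher risk can be made concrete with a small sketch. It assumes a Beta-Binomial model of an individual's default rate, and the population rate of about 5% and the prior weight of 20 pseudo-observations are hypothetical numbers chosen only for illustration. The point is that, with a proper prior, someone with no history is scored at the population average rather than penalized.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    """Beta(alpha, beta) prior on a borrower's default probability (illustrative)."""
    alpha: float
    beta: float

    def posterior_mean(self, defaults: int, loans: int) -> float:
        """Posterior mean default rate after seeing `defaults` out of `loans`."""
        return (self.alpha + defaults) / (self.alpha + self.beta + loans)


# Hypothetical prior: mean 1/20 = 5%, worth 20 pseudo-observations.
population_prior = BetaPrior(alpha=1.0, beta=19.0)

# No credit history at all: the estimate is just the population average.
no_history = population_prior.posterior_mean(defaults=0, loans=0)

# Thirty loans, none defaulted: the estimate moves below the average.
clean_history = population_prior.posterior_mean(defaults=0, loans=30)

# Thirty loans, six defaulted: the estimate moves above the average.
bad_history = population_prior.posterior_mean(defaults=6, loans=30)

print(no_history, clean_history, bad_history)
```

Under these assumptions, missing data leaves the estimate where the prior put it. Only actual evidence, good or bad, moves it.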
Here are some of my ideas
While a social movement may be a good way to start, and raise consciousness, I think more specific steps are needed. In particular, codifying acceptable technical practice in an IEEE or ISO standard might be a way to identify those companies which take care in their use and application of this technology. I emphasize application because it seems to me the action side of the process needs to be constrained in addition to the data gathering side. While some regulatory lasso needs to be thrown around the froth and ferment of Web-scraping, data dredging companies and startups that deeply affect people’s lives, I also just don’t think social pressure on the exploiters of these practices to act more ethically will do it. A compliance procedure for an IEEE or ISO standard would make what was being done more transparent, as well as constrain it. Of course, proposing and negotiating such a standard could take a long time, and may fall short of ambition. Would government agencies be willing to undergo compliance assessment under these standards? If not, is that letting a wolf into the henhouse?
This book is also a call to statisticians to do a better job educating the general public about risk and variability. Some have tried, such as David Spiegelhalter, Stephen Fienberg (who coauthored an article in 1980 which gave stark warnings about designing police patrol experiments), the collection edited by Joseph Gastwirth published in 2000, and others. That some education officials failed so completely to understand basic ideas about variability when assessing “value-added scores” in education means these decision makers and managers missed something very key in their quantitative educations. There were calls for considering racial bias at the Bureau of Justice Statistics back in the 1990s (e.g., Langan, 1995). There are an increasing number of complaints by the statistical community, such as in the current issue of Significance, the joint publication of the Royal Statistical Society and the American Statistical Association, regarding turnkey software which purports to help automate policing. In particular, the recent issue features an article by Kristian Lum and William Isaac called “To predict and serve?”, which not only highlights a disturbing instance of abuse of “predictive policing” software in Oakland, CA, but also suggests a technique for demonstrating where such software falls down. It also gives a number of references, including citations of articles cautioning regarding misuse. Alas, they also point out that they were able to do this with but one popular software package, and the other vendors refused to cooperate. Wouldn’t it be appropriate to insist that if such software is being used to drive as socially powerful a force as policing it be subjected to independent review and assessment?
While there is evidence there have been concerns repeatedly expressed, perhaps it will take something like Weapons of Math Destruction and the attendant media focus to make progress. Clearly, drawbacks cited by other experts have not prevented abuse.
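The point above about variability in “value-added scores” can be illustrated with a small simulation. This is a sketch under a deliberately extreme assumption, not a model of any actual scoring system: every teacher is taken to be identical, so each observed score is nothing but the average of student-level noise over one class. The class size, number of teachers, and noise level are all hypothetical.

```python
import random
import statistics


def simulate_two_years(n_teachers=1000, class_size=25, student_sd=1.0, seed=0):
    """Observed scores for two years when all teachers have the same true effect (zero)."""
    rng = random.Random(seed)

    def one_year():
        # Each teacher's score is the mean of pure student-level noise.
        return [
            statistics.fmean([rng.gauss(0.0, student_sd) for _ in range(class_size)])
            for _ in range(n_teachers)
        ]

    return one_year(), one_year()


def percentile_ranks(scores):
    """Map each score to a 0-100 percentile rank within its year."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * position / (len(scores) - 1)
    return ranks


year1, year2 = simulate_two_years()
ranks1, ranks2 = percentile_ranks(year1), percentile_ranks(year2)

# How far does each (identical!) teacher's percentile move between years?
swings = [abs(a - b) for a, b in zip(ranks1, ranks2)]
mean_swing = statistics.fmean(swings)
big_swing_share = sum(s >= 50 for s in swings) / len(swings)

print(f"mean percentile swing: {mean_swing:.1f}")
print(f"share of teachers moving 50+ percentile points: {big_swing_share:.2f}")
```

Because the two years are independent noise here, theory says about a quarter of teachers should move by fifty percentile points or more, even though nothing about them changed. Real scores presumably contain some signal, but an evaluator who does not ask how much is noise cannot tell a dramatic swing from a coin flip.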
Additional comments, demurrals, and quibbles
- In the chapter “Shell Shocked”, regarding D. E. Shaw, the tendency to keep portions of process “need to know” illustrates the limitations of any classification system when dealing with highly technical matters and systems which benefit from many eyes. It reminds me of the report by the late Richard Feynman in his Surely You’re Joking, Mr Feynman on how he was prohibited (at least for a long while) from telling the engineers he supervised on the Manhattan Project what they were working on so they could use their physics knowledge to help keep their calculations correct, despite the protestations of Project management that they were not progressing quickly enough.
- Same chapter, regarding “Very few people had the expertise and the information required to know what was actually going on statistically, and most of the people who did lacked the integrity to speak up”: Those who remained silent in such circumstances, despite training which told them to know better, carry, in my opinion, most of the responsibility for the consequences.
- In the chapter regarding “stop and frisk”, regarding the statement “The Constitution, for example, presumes innocence and is engineered to value it”, I disagree that the Constitution presumes innocence. It presumes parties ought to be treated equitably. I think “innocence” is far too abstract a property for any legal system or process to determine, except when defined in the narrow sense of “Found not guilty of a specific formal charge.” That’s not “innocence” in the abstract sense. Indeed, a bit farther down, “The Constitution’s implicit judgment is that freeing someone who may well have committed a crime, for lack of evidence, poses less of a danger to our society than jailing or executing an innocent person” is that point exactly.
- Farther down, regarding “And the concept of fairness utterly escapes them. Programmers don’t know how to code for it, and few of their bosses ask them to”, in my opinion, it’s really not that hard, for it is an extension of the entropy measure. I think the problem is that this is not seen as important to specify. I also don’t know if we’d be much better off if there were a good measure of “fairness”.
- The problem cited in “The unquestioned assumption that locking away ‘high risk’ prisoners for more time makes society safer. It is true, of course, that prisoners don’t commit crimes against society while behind bars” is not new. Norbert Wiener observed in his book Cybernetics that killing difficult people makes society safer still, yet that is too brutal or honest a proposal for most to contemplate, even if it is the logical extension of the present system. He surely was not advocating that, and was, in fact, reacting most strongly against frontal lobotomy as a form of “treatment” for mental patients. His point was to highlight the hypocrisy of using convenience in managing them to justify treatment. Also, prisoners can commit crimes against society while behind bars, even if they only harm one another: Surely society has an interest in assuring that prisoners are safe, lest additional punishments be levied upon them without due process.
- Regarding “…for the benefit of both the prisoners and society at large”, society shows no common agreement regarding what the point of incarceration in standard prisons (not those for “white collar criminals”) is … Is it correction and rehabilitation? Or punishment? Or vengeance?
- In the chapter “Ineligible to Serve”, regarding “If his principal online contact happened to be Google’s Sergey Brin, or Palmer Luckey, founder of the virtual reality maker Oculus VR, Pedro’s social score would no doubt shoot through the roof”, of course, not all good candidates are online, and it’s a pretty strong constraint (and problem!) to assume they are.
- In the chapter “Sweating Bullets”, regarding Clifford’s drastic change in scores, I’m most amazed that the test administrators and interpreters don’t know about proper variability or how to consider it. It seems to me they could not possibly be qualified for the positions they have if they don’t. But, again, as mentioned above, this is a failure of statistical and mathematical education, or the appreciation of it by this society.
- In the chapter “No Safe Zone”, regarding “We’ve already discussed how the growing reliance on credit scores across the economy …”, a lot of this practice, too, is based upon an implicit assumption and tenet of faith that “the markets” will weed out practitioners of this kind of statistical voodoo. “The markets” have no way to understand this stuff, and whatever natural selection they might apply is horribly inefficient and has little statistical power. An appeal to “the markets” and to “competition” is a fig leaf covering sloppy policy, again in my opinion.
- In the same chapter, regarding “The model is fine-tuned to draw as much money as possible from this subgroup. Some of them, inevitably, fall too far, defaulting on their auto loans, credit cards, or rent. That further punishes their credit scores, which no doubt drops them into an even more forlorn microsegment”, well, that’s it, isn’t it? It depends upon your loss function, and the designers of this process, which can only laughably be called an optimization algorithm, did a piss poor job of doing that design.
- In the same chapter, regarding “This undermines the point of insurance, and the hits will fall especially hard on those who can least afford them”, unfortunately, I just don’t buy that most insurers are that good at what they are supposed to do, with apologies to statistical actuaries. Some may indulge in the kind of statistical fallacy which Dr O’Neil describes, but it seems many don’t even properly consider the risks they know about. For example, some insurers don’t properly consider increased losses at coasts from storms and sea level rise. I don’t know if this is a product of actuarial consideration, or if the actuaries are constrained by management and the companies’ policies on what they can consider, or if their results are filtered by the same. No doubt their reinsurers do, and some rely upon generous interpretations of “flood damage” to avoid paying out. Nevertheless, these are not behaviors associated with the fiendishly clever and discriminating inference engines, human or otherwise, which are implied by Dr O’Neil’s explanation and postulated mechanism. Accordingly, I fail to see a plausible mechanism for this kind of thing happening, as nefarious as it is. Moreover, credit agencies and the like have an organic and unchecked internal error rate, and these errors work to frustrate precise predictions of risk, as well as associations of individuals with clusters, even if such errors can by themselves cause harm. I think it’s even a fair question to ask if deterministic associations of individuals with any group is ever proper statistical practice: It should be an affinity score or membership number against each group. I’ve made that observation in my own professional practice, and a common response is, “Well, that algorithm doesn’t scale.” Therein lies, I believe, a lot of the problem.
- Same chapter, regarding the conclusion “If we don’t wrest back a measure of control, these future WMDs will feel mysterious and powerful. They’ll have their way with us, and we’ll barely know it’s happening”, there are some ways of “wresting control”, even if most people won’t engage in them. (Many people seem starkly unaware of their self-interest.) One way is to “jam” the signal being fed to the systems and deliberately increase the variance of their observations. This can be done by interfering with your location reported through cell phones, or simply mixing up what you do during the day, reducing consistency of patterns. The other way is to selectively lie. For instance, for years, in order to confound mail order catalogues, and other online solicitations, I have been misrepresenting my birth date. I acknowledge this kind of practice, even if widely adopted, won’t solve most of the problem.
- In the chapter “The Targeted Citizen”, regarding “I wouldn’t yet call Facebook or Google’s algorithms political WMDs … Still, the potential for abuse is vast. The drama occurs in code and behind imposing firewalls”, there’s nothing new in that view, long warned about by Lawrence Lessig in his book Code 2.0. In fact, some consider this a feature, keeping control of online things from governments and such. Lessig warns in his writings, however, that it is not turning out that way.
- In the chapter “Conclusion”, regarding “Dismantling a WMD doesn’t always offer such obvious payoff … For most of them, in fact, WMDs appear to be highly effective”, how the devil can they tell? I don’t see any evidence in the research presented that these companies and organizations do anything like a comprehensive testing program, that is, one that assures the (written) objectives are met (in the real world), not merely that the code implements the requirements. To use the example by Lum and Isaac in “To predict and serve?” cited above, many companies or even organizational units won’t open their algorithms to outside scrutiny. That could be because of a desire to protect something proprietary, or it could be that the algorithms really don’t work well, and they are trying to sell shoddy algorithms as if they do, even to other units of the same business.
- In the chapter “Conclusion”, regarding the Derman and Wilmott “oath”, I respectfully but strongly disagree with it. The same could be said of all of Physics. And I don’t know what “overly impressed with Mathematics” means. Apart from lip service to a goal, people could insist that these systems undergo a comprehensive and rigorous — and necessarily expensive — testing program like many other systems which interact with the physical world do, for instance, aircraft. As a colleague observed after a discussion about this, it could be just as well said that the Mathematics was done badly and no independent check on it was available.
- Finally, in the chapter “Conclusion”, regarding “Though economists may attempt to calculate costs for smog or agricultural runoff, or the extinction of the spotted owl, numbers can never express their value”, I have a couple of things to say. First, I agree that a one-dimensional characterization of any complicated system or process or person, like a spotted owl, is doomed to be woefully incomplete. Second, I agree that economic assessments of these, if honest, must be based upon behavioral economics, and not upon the pseudo-objective rantings of the Chicago School, or Austrian, and, so, they are highly contingent and, being so, unsuitable for policy. But, third, I do think it is possible to quantitatively characterize such complicated things, and, if well done, these can be of great use to society and in solving its problems. The placement of the new Hoover Dam bypass (chronicled by Henry Petroski) and assessments of ecosystem services are two small examples. As any casual reader of this blog will note, I continue to be very enthusiastic regarding the economic prospects of solar PV as a technology for good, not only to advance zero Carbon energy, but as a basis for a helpful and common discussion among members of this United States society who can’t seem to agree on much of anything, and also to advance the revolution championed by the late Hermann Scheer, that of bringing control of the energy supply back to the people and, thereby, control of their democracy. This is an area where putting quantitative measures on often intangible things happens systematically.
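The remark in the “No Safe Zone” comments above, that an individual should carry an affinity score against each group rather than a single deterministic label, can be sketched in a few lines. This is only an illustration under stated assumptions: the two groups, their centroids, the two-feature representation, and the softmax-over-squared-distance scoring are all hypothetical choices, not anything taken from the book or from any real scoring product.

```python
import math


def membership_scores(point, centroids, temperature=1.0):
    """Soft memberships (summing to 1) from squared distance to each group centroid."""
    logits = {
        name: -sum((p - c) ** 2 for p, c in zip(point, center)) / temperature
        for name, center in centroids.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    weights = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def hard_assignment(point, centroids):
    """The deterministic alternative: keep only the single most likely group."""
    scores = membership_scores(point, centroids)
    return max(scores, key=scores.get)


# Hypothetical groups in a made-up two-feature space.
centroids = {"low_risk": (0.0, 0.0), "high_risk": (2.0, 2.0)}

# A person sitting almost exactly between the two groups.
borderline = (1.05, 1.0)

scores = membership_scores(borderline, centroids)
label = hard_assignment(borderline, centroids)

print(label)   # the hard label hides how close the call was
print(scores)  # the soft scores show a near even split
```

For this borderline point the hard label comes out as “high_risk”, while the soft scores are close to an even split. Anything downstream that sees only the label treats a near coin flip as a certainty, which is the loss of information the comment above objects to.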
One thing I fear when faced with these kinds of issues, and it’s something I have seen elsewhere in this society, especially among my younger colleagues, is a devolution into insidious cynicism. This is sometimes wrapped in a mantra which argues “you can only control yourself and doing anything else is engaging in an immoral act”, possibly substantiated by an appeal to Buberian ethics. And, ironically or hypocritically, the same complainants will continue to work for companies with a deep investment in facilitating this kind of WMD engineering, even if the companies don’t build WMDs themselves. (How many companies profit from the existence and operations of Facebook?) Especially given the insights of behavioral economists like Daniel Kahneman, I hope the insights Dr O’Neil has don’t end with their merely being presented. My definition of a successful technology is one that does not depend upon people being good or morally perfect in order for it to “do no harm”. (I have been influenced a good deal towards this view by the lectures of Professor Sheila Widnall of MIT.) In fact, my standard is that every successful technology must assume people are imperfect, morally corruptible, and self-interested, and yet perform its function nonetheless. If it cannot work under those conditions, any device or technology is broken. And I continue to be heartened by both the successes of engineering and science, and the deep mathematics that underpin them, especially as exemplified in the talent and smarts of young people pursuing these to make the world a better place, for all of its beings and creatures.