By Jenny Jiao | 09/25/17
In 2013, Eric Loomis pleaded guilty to two charges related to a drive-by shooting in La Crosse, Wisconsin, “attempting to flee a traffic officer and operating a motor vehicle without the owner’s consent.” At his sentencing hearing, the judge relied, in part, on Loomis’s score on a risk assessment algorithm called COMPAS, which deemed him at high risk for recidivism (the tendency of an offender to reoffend). He was unable to contest the assessment’s result because the inputs and their weights were kept secret. Since COMPAS was privately developed by Northpointe Inc., the methodology is considered a proprietary trade secret and therefore is inaccessible to all parties in the court system. Loomis was sentenced to six years in prison and five years of extended supervision.
Loomis appealed, arguing that because the assessment’s methodology was kept secret, its usage violated his due process rights to an individualized sentence and his right to be sentenced on accurate information. Upon a second appeal to the Wisconsin Supreme Court, Justice Ann Walsh Bradley affirmed the lower court’s decision against Loomis. One part of the reasoning behind the decision was that “as COMPAS uses only publicly available data and data provided by the defendant, the court concluded that Loomis could have denied or explained any information that went into making the report and therefore could have verified the accuracy of the information used in sentencing.”
However, in another case across the country, a defendant found it difficult to do just that: to deny or explain the information that went into the making of the report. In 2016, in upstate New York at the Eastern Correctional Facility, Glenn Rodriguez was denied parole due to his “high-risk” COMPAS score. After consulting with other inmates and lawyers, he realized his score was wrong, due to a correctional officer mistakenly answering one of the survey’s questions about him (the COMPAS score is composed of answers from a 137-question survey about the individual). According to the Washington Monthly, at his second parole hearing in January 2017, his COMPAS score had not been updated. Since he had no access to the methodology of the assessment, he was unable to explain why the error would impact his score and resorted to arguing for parole in spite of his high-risk score.
Loomis’s and Rodriguez’s circumstances are not isolated cases, but rather indicative of the current landscape of the criminal justice system. Risk assessment scores, like COMPAS, are used across the country at various points during the criminal justice process: to shorten inmates’ sentences based on good behavior, to redirect defendants to rehabilitation instead of imprisonment, or even to determine the length of the sentence itself. In nine states (Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia, Washington, and Wisconsin), the assessments are used during criminal sentencing hearings. While some jurisdictions have developed their own algorithms or contracted with nonprofit organizations, the most widely used model remains Northpointe’s COMPAS.
Usage of these algorithms has highlighted a litany of issues ranging from accuracy to constitutionality to perpetuation of biases; however, one fundamental issue undermines our ability to even debate and explore these issues properly: the lack of transparency.
A Black Box
Developers of risk assessment tools use machine learning, a branch of computer science that allows computers to learn and recognize patterns from large sets of data. Essentially, computers use historical data on crime and recidivism rates to determine which inputs are related to recidivism, and how much weight should be given to each. For example, past criminal history is one of the most important predictors of recidivism, and therefore is given more weight in an algorithm. These algorithms, which are often extremely long and complex, are then packaged and sold to criminal justice jurisdictions for use.
This type of algorithm is called a “black box,” since the jurisdiction has no access to the source code itself; jurisdictions feed the inputs from the survey into the algorithm, which returns a risk assessment score. In the case of COMPAS, that score is a single number from 1 to 10.
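To make the weighted-input idea concrete, here is a minimal sketch of how a generic model of this kind could map survey answers to a 1-to-10 score. Everything in it is invented for illustration: COMPAS’s real inputs, weights, and scoring formula are proprietary and unknown, so the feature names and numbers below are hypothetical, not Northpointe’s.

```python
import math

# Hypothetical illustration only: COMPAS's actual inputs and weights are
# trade secrets. This sketch shows the general shape of a weighted-input
# risk model, not the real algorithm.

def risk_score(answers, weights, bias=0.0):
    """Weighted sum of survey answers, squashed to a probability
    with a logistic link, then bucketed into a 1-10 score."""
    z = bias + sum(weights[k] * answers[k] for k in weights)
    p = 1.0 / (1.0 + math.exp(-z))      # probability-like value in (0, 1)
    return min(10, int(p * 10) + 1)     # decile-style bucket: 1..10

# Invented example features and weights (not COMPAS's):
weights = {"prior_arrests": 0.30, "age_under_25": 0.80, "prior_violations": 0.25}
answers = {"prior_arrests": 3, "age_under_25": 1, "prior_violations": 2}
print(risk_score(answers, weights, bias=-2.0))  # → 6
```

In a transparent (“white box”) model, the `weights` table is published, so a defendant can see exactly how much each answer moved the score; in a black box, only the final number comes out.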
“The key to our product is the algorithms, and they’re proprietary,” said Jeffrey Harmon, Northpointe’s general manager to the New York Times. “We’ve created them, and we don’t release them because it’s certainly a core piece of our business. It’s not about looking at the algorithms. It’s about looking at the outcomes.”
Looking at the outcomes, however, is raising questions about whether the COMPAS algorithm is racially biased. An independent ProPublica study showed that “the formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.” Furthermore, white defendants faced the opposite result: they were more often mislabeled as low risk than black defendants. Northpointe has since disputed the study’s methodology and findings, and a number of legal scholars have offered up competing claims.
Assessing racial bias, or any type of bias, is difficult because it is not simply a matter of including race as a variable. Other factors, including socioeconomic status, neighborhood, family stability, and even criminal history, can impact an algorithm’s ability to predict risk for different racial groups. For example, having a criminal record is correlated with race, but that does not by itself make every algorithm that uses criminal history as a variable biased. Data scientists are continually debating and innovating different approaches to both identify and mitigate the effects of racial bias.
What makes COMPAS particularly difficult to debate, however, is the fact that “[no one] knows what the algorithm actually is,” explained Cynthia Rudin, Associate Professor of Computer Science and Electrical and Computer Engineering at Duke University. Rudin and her colleagues have created their own risk assessment models that are transparent (“white box”) and use publicly sourced data. Rudin says she has not further explored racial biases in her own models, but that it would be much easier to assess, considering all interested parties could see the source code of the algorithm.
“The fact that these models are transparent means you can argue about whether these models are biased,” Rudin said. “It’s much more difficult to argue about whether a black box model is biased than whether a transparent model is biased.”
Struggling for Accuracy
In addition to bias, lack of transparency has made the pursuit of accuracy difficult. Defenders of black box algorithms say that these algorithms have already been validated. Validation is the statistical process of measuring the effectiveness of an algorithm to ensure it is robust, reliable, and accurate for the population it is used on. However, in a 2013 study of 19 risk assessment tools used in the US, researchers Sarah Desmarais and Jay Singh found that “frequently, those [validation] investigations were completed by the same people who developed the instrument.”
Tim Brennan, co-founder of Northpointe and co-creator of COMPAS, conducted a study with colleagues that put the algorithm’s accuracy rate at 68%. However, since Northpointe uses nationwide data to develop the COMPAS algorithm, accuracy levels are likely to vary by state, and even by county. Northpointe has conducted some local validation studies, and “found no statistically significant deviations from the national norm group studies.” While it is suggested that jurisdictions conduct their own validation studies, many have not, including Wisconsin. This means that many jurisdictions using COMPAS do not know how effective the algorithm is for their populations.
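A local validation study of the kind described above can be sketched simply: given each person’s risk score and whether they actually reoffended during a follow-up period, a jurisdiction can compute a concordance statistic (AUC), the probability that a randomly chosen reoffender received a higher score than a randomly chosen non-reoffender. The function and the data below are invented for illustration; they are not COMPAS scores or real outcomes.

```python
# Sketch of a local validation check. Scores and outcomes are made up
# for illustration; a real study would use the jurisdiction's own records.

def auc(scores, outcomes):
    """Concordance (AUC): chance a reoffender outscores a non-reoffender.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores   = [8, 3, 6, 2, 9, 4, 7, 5]   # hypothetical 1-10 risk scores
outcomes = [1, 0, 1, 0, 1, 0, 0, 1]   # 1 = reoffended during follow-up
print(auc(scores, outcomes))  # → 0.875
```

An AUC of 0.5 is no better than chance and 1.0 is perfect ranking, which is why a jurisdiction would want to know this number for its own population. But as the next paragraph notes, even a jurisdiction that runs this check on a black box model cannot act on the result by adjusting the model’s variables or weights.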
The larger problem may be that even when jurisdictions do conduct validation studies, they are unable to make adjustments to improve effectiveness. Since these algorithms are black boxes, jurisdictions have no ability to customize the variables or the weighting of factors to best assess risk for their populations. Even if some predictive factors in urban New York City may not be as effective in rural Wisconsin and vice versa, these jurisdictions cannot access the algorithm itself and make those necessary adjustments.
Hope for Justice
These issues with black box models raise the question: why are they the tools of choice for so many jurisdictions? Conventional wisdom might be that they’re more accurate. However, that is simply not the case. In Desmarais and Singh’s study of 19 models, some of which are transparent and others of which are not, “there was no one instrument that emerged as systematically producing more accurate assessments than the others.”
One of Rudin’s algorithms, which is published and publicly accessible, was as accurate as or more accurate than black box models, including COMPAS. There are many other white box models out there, developed by nonprofits, universities, or criminal justice jurisdictions (e.g., Pennsylvania and Ohio).
At the end of the day, risk assessment algorithms should be tools to help criminal justice officials make important decisions, such as bail determinations, parole terms, and even sentence lengths. The difficult questions that arise alongside them should be vigorously debated both inside and outside the courtroom. Black box models such as COMPAS hinder that discussion. They hinder the ability of defendants to argue for their freedom, the ability of judges to exercise discretion, and the ability of jurisdictions to best serve their own populations.
“This is evidence that’s being used against you,” Rodriguez said to the Washington Monthly. “They are making a determination on a person’s life on the basis of this evidence. So you should have a right to challenge it.”
Jenny Jiao is a Trinity Sophomore studying economics and political science.