Data and Discretion: Why We Should Exercise Caution Around Using the COMPAS Algorithm in Court

This article was first published for the magazine Stanford Politics.

In 2013, the state of Wisconsin charged Eric Loomis with five criminal counts in connection with a drive-by shooting. Loomis eventually accepted a plea deal and pled guilty to two less severe charges: attempting to flee a traffic officer and operating a motor vehicle without owner’s consent.¹ Before Loomis’ sentencing, a Wisconsin Department of Corrections officer produced a presentence investigation report that included a Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) risk-assessment score to help predict Loomis’ risk of recidivism. Recidivism may broadly be understood as an offender’s relapse into crime; a recidivism score estimates how likely an offender is to re-offend in the future. COMPAS, a privately owned algorithmic system developed by Northpointe (the company now known as Equivant), produces recidivism predictions based on public data as well as answers to a 137-item interview questionnaire. The questionnaire builds a holistic profile of the defendant, gathering information ranging from past criminal involvement, relationships, lifestyle, and personality to family background, education level, and social exclusion.² COMPAS is widely considered a “black box” algorithm: although its basic inputs are known, how those inputs are weighted and combined inside the algorithm is largely kept a trade secret. COMPAS scores defendants on a scale of 1 to 10, grouped into risk levels: 1–4 low risk, 5–7 medium risk, and 8–10 high risk.³ Loomis’ COMPAS assessment identified him as high risk for violence, high risk for recidivism, and a high pretrial risk.⁴

Before a COMPAS score was introduced into Loomis’ case, the prosecution and defense had agreed on a plea deal of one year in county jail with probation.⁵ At sentencing, the trial court referred to the COMPAS-generated risk-assessment score as a procedural tool to help in its sentencing determination. Based in part on this score, the court classified Loomis as high risk of re-offending and sentenced him to six years of imprisonment and five years of extended supervision.⁶ Loomis then filed a motion for post-conviction relief on the grounds that the court’s reliance on the COMPAS score violated his due process rights, infringing on “both his right to an individualized sentence and his right to be sentenced on accurate information.”⁷ Loomis further argued, also on due process grounds, that the “court unconstitutionally considered gender at sentencing by relying on a risk assessment that took gender into account.”⁸ The trial court denied Loomis’ post-conviction motion, and the Wisconsin Supreme Court, which heard his appeal on certification from the Wisconsin Court of Appeals, affirmed that denial. Although the Wisconsin Supreme Court did not side with Loomis, its opinion closed by encouraging strong skepticism toward the value and accuracy of risk-assessment scores used as judicial aids in sentencing.

In the contemporary American criminal justice system, judges have increasingly turned to risk-assessment algorithms as data-driven tools that provide evidence-based results for use in sentencing. Risk-assessment algorithms, broadly, are machine-executed, automated decision-making procedures that estimate an offender’s risk of future offending. These predictive tools aim to give judges reliable support in sentencing and to make outcomes across the system more uniform, effective, and efficient. Yet while risk assessments aim to improve the accuracy of judicial decision-making, this form of automated justice raises various concerns. Risk assessments may fail to be adequately morally reflective, may not treat defendants as unique individuals, and may themselves carry a potential for bias. In particular, there are three reasons we should be hesitant to use risk-assessment algorithms like COMPAS in court: a lack of transparency, a lack of narrative intelligibility, and potentially unconstitutional practice. Before exploring these factors, let us learn a little more about COMPAS itself.

A Brief History of COMPAS

COMPAS is one of the most widely used risk-assessment algorithms in the U.S. criminal justice system and has been adopted or adapted by multiple states. First developed in 1998, the algorithm has assessed over one million offenders. COMPAS’ parent company, Equivant, describes COMPAS as an accurate, validated, fourth-generation risk-assessment tool that is easy to use and adaptable across jurisdictions. Despite its widespread use, however, the algorithm itself remains largely a trade secret: Equivant refuses to release the details of its internal software to the public.⁹ According to an Equivant executive, this black-box arrangement is necessary because “the key to [their] product is the algorithms, and they’re proprietary,” and, having created them, the company does not “release them because it’s certainly a core piece of [its] business.”¹⁰ Several problems emerge from this proprietary arrangement.

The first is COMPAS’ potential for bias. In 2016, the independent nonprofit newsroom ProPublica published an analysis of COMPAS scores assigned in Broward County, Florida, and found that “the formula was particularly likely to flag black defendants as future criminals, wrongly labeling them as such at almost twice the rate as white defendants… [who] were mislabeled as low risk more often than black defendants.”¹¹ The risk scores were also “unreliable in forecasting violent crime: only 20 percent of the people predicted to commit violent crimes actually went on to do so.”¹² ProPublica succinctly summarized the harm that algorithmic bias of this kind can produce: “If computers could accurately predict which defendants were likely to commit new crimes, the criminal justice system could be fairer and more selective about who is incarcerated and for how long… [yet, if wrong] it could result in someone unfairly receiving a harsher sentence or waiting longer for parole than is appropriate.”¹³ Eric Loomis’ experience in State v. Loomis compounds the concern ProPublica raised: because COMPAS is privately owned, the risk-assessment scores it produces are effectively binding on a defendant. Once a score is assigned to an individual, it is nearly impossible to re-assess, refute, or overturn. In this sense, COMPAS scores become largely unappealable markers attached to defendants entering or already within the criminal justice system. In terms of due process, this binding quality has enormous implications for a defendant’s ability to access non-incarceration opportunities and a fair appeal.
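The two ProPublica figures quoted above describe different kinds of error. “Wrongly labeling” defendants who did not go on to re-offend is a false positive rate, computed separately for each group, while “only 20 percent of the people predicted to commit violent crimes actually went on to do so” describes precision: the share of flagged people whose prediction came true. The short Python sketch below shows how each measure is calculated from a table of outcomes. The counts are invented purely for illustration and are not ProPublica’s data; the class and method names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupOutcomes:
    """Outcome counts for one group of defendants (hypothetical, illustrative numbers)."""

    flagged_reoffended: int        # labeled higher risk and did re-offend
    flagged_not_reoffended: int    # labeled higher risk but did not re-offend (false positives)
    unflagged_reoffended: int      # labeled low risk but did re-offend (false negatives)
    unflagged_not_reoffended: int  # labeled low risk and did not re-offend

    def false_positive_rate(self) -> float:
        # Of the people who did NOT re-offend, what share were labeled higher risk?
        did_not = self.flagged_not_reoffended + self.unflagged_not_reoffended
        return self.flagged_not_reoffended / did_not

    def false_negative_rate(self) -> float:
        # Of the people who DID re-offend, what share were labeled low risk?
        did = self.flagged_reoffended + self.unflagged_reoffended
        return self.unflagged_reoffended / did

    def precision(self) -> float:
        # Of the people labeled higher risk, what share actually re-offended?
        flagged = self.flagged_reoffended + self.flagged_not_reoffended
        return self.flagged_reoffended / flagged


# Invented counts, chosen only to mirror the pattern ProPublica described.
group_a = GroupOutcomes(300, 200, 100, 400)
group_b = GroupOutcomes(200, 100, 200, 500)

for name, group in [("Group A", group_a), ("Group B", group_b)]:
    print(
        f"{name}: false positive rate {group.false_positive_rate():.2f}, "
        f"false negative rate {group.false_negative_rate():.2f}, "
        f"precision {group.precision():.2f}"
    )
# Group A: false positive rate 0.33, false negative rate 0.25, precision 0.60
# Group B: false positive rate 0.17, false negative rate 0.50, precision 0.67
```

In this made-up example, Group A’s false positive rate is twice Group B’s, while Group B’s low-risk label misses more people who actually re-offend. Precision is roughly similar across the two groups, which illustrates how a tool can look reasonably accurate by one measure while its errors fall unevenly on different groups.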

Let us now turn to three characteristics of COMPAS that raise particular ethical concern about its use: its lack of transparency, its lack of narrative intelligibility, and its potential for unconstitutional practice.

Transparency

One of the strongest reasons to suspect that COMPAS is capable of bias, and can therefore produce unjust results, is its lack of transparency. According to Equivant, the key to the COMPAS product is its distinctive algorithm. As a privately owned company, Equivant is legally able to protect that algorithm as a trade secret, and it refuses to publish how the system works. Such secrecy is permitted because the company has the right to protect its intellectual property, a right the Wisconsin Supreme Court let stand in State v. Loomis when it rejected Loomis’ claims on appeal.¹⁴ Unless compelled by a court, Equivant is unlikely to disclose details about the system. Because the algorithm is proprietary, judges and the broader public are left to take Equivant’s claims about its accuracy on faith. As a result, little is publicly known about how COMPAS combines its input data (public data on the defendant plus responses to the 137-item questionnaire) to generate a risk-assessment score. Consequently, any judge who uses COMPAS as a judicial aid cannot currently understand fully how a COMPAS score is produced, nor how particular factors in the defendant’s profile were weighted to arrive at it.

This secrecy is concerning because it denies courtroom actors an answer to a basic question: “How is the algorithm weighting different data points, and why?” Both parts of that question bear on two core legal principles: due process and the ability to meaningfully appeal an adverse decision.¹⁵ Judicial proceedings are, for the most part, open to the public, and even after a jury has deliberated, judges are required to state the reasons for their rulings, particularly at sentencing. University of Maryland law professor Frank Pasquale has argued that when an algorithmic scoring process like COMPAS’ is kept secret, it becomes impossible to challenge key aspects of a score because the system’s internal workings remain unknown. As a result, every party to the case, from the judge to the defendant, is unable to question, challenge, or request a re-assessment of an algorithmically generated score. This not only prevents judges from properly understanding a tool meant to make their work more effective and efficient; it arguably also denies the defendant a recognizably fair outcome whenever the judge bases the final ruling, in any part, on a score that neither of them can fully comprehend.

Narrative Intelligibility

The secrecy surrounding COMPAS also causes it to fail the standard of narrative intelligibility. Narrative intelligibility, the “alignment between the author’s intended meaning and the one comprehended by the user,” depends on author and user arriving at the same conclusion about the meaning of a given piece of information.¹⁶ A narratively intelligible outcome, then, is one in which there is no confusion among the parties involved about what a decision is, where it came from, and how it was reached. Consider a jury verdict at trial. The verdict is narratively intelligible: the judge, the defense, and the prosecution all clearly understand the procedure the jury followed to reach it. As a result, they share a common understanding of what that verdict means and what evidence it drew from. Because this process is transparent and readily explained to the public, it does not plausibly put the defendant’s due process rights at risk: understanding how the verdict was reached, the defendant can pursue legal action to seek a different verdict, should they choose to.

COMPAS, by contrast, is neither transparent nor easily explained to a judge or defendant, let alone the broader public; it is therefore narratively unintelligible. Because the algorithm’s internal workings are a trade secret, the risk-assessment scores COMPAS generates leave author and user without an aligned understanding of what those scores mean. Everyone in the courtroom is subject to this confusion. The defendant, unable to learn how their life and future have been “risk-assessed,” cannot challenge the score if they disagree with it; State v. Loomis is an example of this scenario in action. The defendant’s risk score thus functions somewhat like an algorithmic brand: no matter the effort, the defendant cannot change or remove it from their profile for the offense at hand. The judge, having been given this risk assessment as a sentencing aid, is similarly unable to question its authenticity, its origin, or its weighting of specific data points. The judge’s inability is especially concerning because it can give rise to automation bias.

Automation bias describes a human limitation in decision-making with automated aids like COMPAS. Research in social psychology and human-computer interaction on algorithmic decision-making systems has found that “decision-makers regularly rate automated recommendations more positively than neutral despite being aware that such recommendations may be inaccurate, incomplete, or even wrong.”¹⁷ Under unchecked automation bias, it “becomes extremely burdensome for a human decision-maker [like a judge] to refute” a recommendation.¹⁸ This might lead a judge to defer to, or rely heavily on, a risk score without questioning its legitimacy, even when they have reason to be cautious about using it. Automation bias compounds COMPAS’ lack of narrative intelligibility because it encourages a judge to rely on a risk score without seeking to understand what it means. This possibility threatens the defendant’s right to due process and, as State v. Loomis shows, the right to a meaningful appeal. However, even setting aside automation bias, COMPAS in its current form still fails to be narratively intelligible.

Because COMPAS lacks narrative intelligibility, the judge and the defendant may well reach different conclusions about what a given COMPAS score means. Since the tool cannot be explained to the public under Equivant’s trade-secret protections, and since its scores can carry different meanings for different people in the same courtroom, COMPAS scores are not only secret; they also erode any communally understood process and method. More broadly, this failure of narrative intelligibility bears on the ethics of ranking and rating human beings at all, not just on predicting recidivism. Pasquale notes that companies like Equivant belong to a wider market in analytics that claim to “predict not only the likelihood of criminal recidivism, but also the chances that any given person will be mentally ill, a bad employee, a failing student, a criminal, or a terrorist,” whether in the present or the future.¹⁹

Acknowledging this, Pasquale puts it simply: if COMPAS scores “cannot be explained in a narratively intelligible way, perhaps they should not be used at all without the direct consent of the person they are evaluating.”²⁰ If a risk-assessment tool makes its predictions of future crime opaquely, and those predictions cannot be explained in a narratively intelligible way, the potential for illegitimate decision-making by the judge, whether intentional or not, appears high.

Potential for Unconstitutional Practice

It is also unclear whether COMPAS scores are constitutional at all. In particular, the potential for COMPAS to build bias into its results appears high, as it seems to weigh factors such as a defendant’s socioeconomic status, gender, or race disproportionately. Intentional discrimination on the basis of race or gender can violate the 14th Amendment’s Equal Protection Clause, which courts have interpreted to guard against such discrimination by the state. If shown to be intentional, COMPAS’ capacity to produce risk-assessment scores that discriminate against protected groups may be unconstitutional. Until COMPAS is transparent, we cannot entirely dismiss the possibility of intentional discrimination in its factor weighting. As ProPublica’s 2016 study and others like it show, this possibility is not merely abstract. Published findings that COMPAS generates scores reflecting bias with respect to socioeconomic status, gender, and race have been presented as largely accurate, raising the concern that Equivant may in fact weigh these factors intentionally.

COMPAS takes into account information that extends well beyond a specific criminal incident, considering personal factors such as gender, age, education level, family background, social capability, and more. (Race is not itself a questionnaire item, but, as Starr notes below, several of these factors are closely correlated with it.) In effect, COMPAS factors a holistic view of the defendant’s life into its risk-assessment score. University of Michigan law professor Sonja Starr has pointed out that as a result of this holistic approach, “judges and parole boards are told to consider risk scores that are based not only on criminal history, but also on socioeconomic and family-related disadvantages as well as demographic traits like gender and age.”²¹ Specifically, Starr contends that risk-assessment instruments like COMPAS deem defendants riskier based on indicators of socioeconomic disadvantage, deem men riskier than women, count crime victimization and living in a high-crime neighborhood as risk factors, and treat assessments of the defendant’s attitudes or mental health as risk factors.²²

Starr further notes that the socioeconomic, family, and neighborhood-related factors these instruments consider are highly correlated with race. In particular, she warns that gender-based sentencing driven by COMPAS scores could “exacerbate the extraordinary rates of incarceration of young and poor men of color.”²³

The U.S. Sentencing Guidelines explicitly state that race, sex, and socioeconomic status are not relevant in determining a sentence. By allowing judges, knowingly or not, to reach a ruling based in part on such factors, the state itself endorses a practice that allows certain groups of people to be considered “high-risk,” or more likely to engage in violent crime, because of factors they have no control over. These are judgments based on who a person is rather than on what they have done. In this respect, when a judge relies on a COMPAS score that seems to label defendants as higher risk based on identity characteristics like socioeconomic status, race, and gender, the judge is letting discriminatory factors into the legal arena. Such practice is, at root, unconstitutional. According to Starr, the technical language of a risk-assessment score can obscure discrimination of this kind, allowing unconstitutional considerations into sentencing in ways that would be unacceptable if stated outright.²⁴ COMPAS’ potential to produce biased and discriminatory results, and then to carry those results into adjudication as a formal risk score a judge may rely on, is ethically concerning. It is also outside the bounds of constitutional practice.

A Note on Reverse-Engineering

It is important to note that in 2018, risk-assessment scholars Megan Stevenson and Christopher Slobogin reverse-engineered the Violent Recidivism Risk Score (VRRS) component of COMPAS’ black-box system. After conducting a partial decomposition of the VRRS, Stevenson and Slobogin found that age alone accounts for 60% of the variation in the score, contributing more to it than criminal history, gender, or race.²⁵ This heavy reliance on age brings to the foreground what Stevenson and Slobogin term “decisional blindness”: even with COMPAS’ black box opened slightly, the composition of a given risk score may still not be conveyed transparently to the people relying on it. Stevenson and Slobogin argue that while judges using COMPAS may be told that an offender’s “youth is a risk factor, the relative weight of age in the overall score may not be fully explained or understood at the time of decision-making,” and further, that unless the judge “makes specific inquiries, they will not be informed of the variables that contributed most heavily to a particular defendant’s risk score.”²⁶ The potential for automation bias here therefore remains high, despite reverse-engineering efforts.

Reverse-engineering COMPAS’ VRRS therefore does not resolve the problems of transparency, narrative intelligibility, and constitutionally sound practice, because a judge using COMPAS must know to ask about specific variable weightings in order to understand what a VRRS score means. The judge must then actually make that inquiry and give its results appropriate weight in adjudication. Even then, this information may still shape the judge’s discretion in ways that advance bias or discriminatory practice.

Exercising Caution Around COMPAS’ Use in Court

A sound judicial aid should be transparent, narratively intelligible, and constitutionally sound. These qualities improve a judicial aid’s chance of helping a judge reach a morally and legally defensible decision. They also bolster the public’s confidence that a judge’s sentencing decision, especially at the federal level, is ethically sound. As it currently exists, COMPAS fails to meet these three standards, and at times departs from them entirely; it therefore does not qualify as a judicial tool capable of legitimately supporting a judge. This failure puts the fairness and justice of outcomes from the bench at risk.

Specifically, given the three characteristics of COMPAS evaluated here, the potential costs to ethical decision-making and court procedure of using a private algorithm like COMPAS currently disqualify it as a sound judicial aid in sentencing. Its use should therefore be limited, or at the very least heavily scrutinized by judges at sentencing, until these issues are properly addressed. In this vein, Stevenson and Slobogin’s work to make COMPAS more accessible represents one possible path toward its ethical use by judges in the future.

This conclusion raises concerns for a variety of actors in the criminal justice system. It is especially concerning for judges, who operate with little oversight: if a judge relies on an ethically troubling tool such as COMPAS, what happens to the moral and legal integrity of their decision-making? And what are the consequences of relying on a judicial aid that fails this test?

What is at stake here is the future of individual lives as well as the integrity of a judge’s unique discretion; whether a judge can make a sound, legitimate decision about another person’s livelihood depends on that discretion remaining intact. As Eric Loomis might be able to tell you, such stakes are too high to leave to chance when bias may be present in algorithmic models. Instead, courtroom actors should exercise strong caution when engaging with these tools until they have been shown to meet a standard of transparency, narrative intelligibility, and constitutionally sound practice. To use these tools before then is to hinder the accurate delivery of justice in sentencing.