Suppose I ask you 137 questions. How many times have you moved? Have you ever skipped class? Do you often become bored? I then feed your answers into a sophisticated algorithm and assign you a score from 1 to 10. If the score is high enough, a judge will rule you ineligible for release on bail or add five years to your prison sentence.
Welcome to the reality of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), a common American criminal risk assessment tool.1 COMPAS’s 137 questions probe a jailed individual’s personal and criminal history, ranging from questions about crime (“How many prior juvenile felony offense arrests?”) to seemingly mundane ones (“Do you live with friends?”; “Do you feel discouraged at times?”).2 The use of COMPAS in criminal sentencing has sparked controversy over alleged racial bias.3 The controversy centers on whether it is ethical for a government to use unexplainable technology to make decisions that could alter the lives of its citizens.
Unlike some scholars and advocacy groups in the field of cyber ethics, I argue that there is a place for algorithmic tools in the criminal justice system. However, their use should be redefined under a human rights-based framework in which the results of the algorithm are evaluated not merely by their macro-level statistics but by their micro-level human impact. Otherwise, the law loses transparency and accountability, becoming subject to the arbitrary design choices of Northpointe and other profit-seeking corporations. I make three recommendations for governments in future implementations of risk assessment tools: these tools should emphasize context rather than a raw score, they should be decoupled from corporations in order to increase transparency, and they should be paired with initiatives that generate oversight and raise awareness about bias.
The world grapples with a future in which long-established institutions, from the Internal Revenue Service (IRS) to the criminal justice system, must radically transform even their most basic operating functions. Governments now use personal financial and residential data to streamline public services and uncover previously indiscernible patterns. The IRS uses data analysis to expose tax evasion and fraud, and the National Institutes of Health uses big data to speed up biomedical research.4
Another institution ripe for reform is the U.S. criminal justice system, which holds the ignoble distinction of the world’s highest incarceration rate. Against the backdrop of overcrowded prisons, criminal justice reform has turned digital, with a growing focus on criminal risk assessment algorithms.5 Those who are optimistic about tools like COMPAS hope that big data will make the US criminal justice system more equitable, just as it has boosted the efficiency of the IRS and helped to diagnose climate change problems in the UK.6 Yet COMPAS, a 137-question tool intended to assess a given defendant’s likelihood of reoffending, soon became mired in controversy. An algorithmic tool that could reduce the incarceration rate and lend a sense of impartiality to the system might nonetheless make skewed judgements based on race. Although race is not among the factors assessed in the COMPAS survey, organizations like ProPublica have criticized it for assigning black defendants higher risk scores.7 Among individuals who ultimately did not reoffend, black defendants had a 42% chance of being assigned a high risk score, whereas white defendants were incorrectly labeled high-risk only 22% of the time.8 Northpointe rebuts that its tool has fair accuracy rates: among defendants assigned a score of 7 out of 10, nearly equal shares of white and black defendants reoffended (60% and 61%, respectively), which the company argues proves that COMPAS neutrally assesses the likelihood of recidivism.9
Civil society groups such as the Partnership on AI have called for “jurisdictions to cease using the tools in decisions,” arguing that the technology is simply not ready in its current form.10 However, this conclusion is somewhat jejune. It does not grapple with the fact that the criminal justice system remains badly in need of reform, and it ignores the untapped potential of algorithmic tools. Discrediting big data in criminal justice throws the baby out with the bathwater.
I instead propose a reframing of the COMPAS debate. We should avoid limiting our options to either freely using COMPAS or banning it; rather, we should interrogate how governments can appropriately and equitably employ algorithmic tools. The COMPAS debate, as it currently stands, is irresolvable because it relies on two mathematically contradictory definitions of fairness. It is impossible both to be “fair” in Northpointe’s sense (to have a score of 7 correspond to roughly a 60% reoffense rate regardless of race) and to be “fair” in ProPublica’s sense (to have the same rate of false positives across races). Due to existing biases in the criminal justice system, the recidivism rate of black defendants is higher than that of whites (52% versus 39%). This means that “a greater share of black defendants will be classified as high risk. And if a greater share of black defendants is classified as high risk, then…a greater share of black defendants who do not reoffend will also be classified as high risk,” only reinforcing the criminal justice system’s racial bias.11 In other words, no algorithm can satisfy both definitions of fairness at once when base rates differ. The United States must therefore reconsider what counts as a fair, ethical policy position in the context of algorithmic tools.
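The deadlock can be seen in a short worked derivation. The sketch below uses a standard confusion-matrix identity together with the base rates quoted above; the further assumption that the tool detects actual reoffenders at the same rate in both groups (equal true positive rate, TPR) is an illustrative simplification of mine, not a published Northpointe figure.

```latex
% For a group with recidivism base rate p, a classifier with positive
% predictive value PPV and true positive rate TPR has false positive rate:
\[
  \mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\mathrm{TPR}.
\]
% If PPV and TPR are held equal across groups (Northpointe's notion of
% fairness, plus the simplifying assumption above), the ratio of false
% positive rates is fixed entirely by the base rates:
\[
  \frac{\mathrm{FPR}_{\text{black}}}{\mathrm{FPR}_{\text{white}}}
  \;=\; \frac{0.52/0.48}{0.39/0.61} \;\approx\; 1.7,
\]
% so black defendants who do not reoffend must be flagged as high risk
% roughly 1.7 times as often as white defendants, close to the disparity
% ProPublica observed (42% versus 22%).
```

Under these assumptions, equalizing the false positive rates would require breaking the score-to-reoffense calibration that Northpointe calls fairness, which is exactly the impasse described above.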
Here, I draw from two recently proposed frameworks for the ethical use of artificial intelligence. Vidushi Marda describes a three-stage framework for AI (Data, Model, and Application) through a case study of India. Marda’s argument is applicable to AI in general, calling for an evaluation of factors such as training, data parity, and security within each stage.12 More recently, ARTICLE19, a British human rights organization, proposed the notion of a “human rights-based approach to AI,” drawing upon an international ethical and legal standard.13 Both frameworks can ground a better approach to AI in the criminal justice system.
As the direct implementers of algorithmic tools, governments have the obligation to make the ethical use of artificial intelligence a priority. ARTICLE19 recommends that governments “[h]old AI systems to accountability, responsibility, and constitutional standards without dilution or exception.”14 Three specific standards from Marda’s framework help to narrow the concerns relevant to criminal justice reform: System and Historical Bias, Feature Selection, and Fairness.
The first standard from Marda’s framework asks governments to consider whether data can “cement, formalize and imbibe” biases present in the algorithm’s greater environment.15 The second standard mandates oversight for data reidentification: careless algorithmic design can inadvertently reveal protected attributes, such as race, that are supposed to be shielded from the algorithm’s view.16 Many of the inputs used in the COMPAS algorithm, such as a defendant’s home address and education, are considered proxies for race. Finally, the third standard asks governments to pay close attention to the tradeoffs inherent in attempting to achieve a fair algorithm, making clear that fairness often erodes competing values; for instance, “[r]emoval of discrimination has been shown to reduce overall accuracy in a model.”17
Held up against these standards, COMPAS fails in several notable ways. Its numerical scores often erase nuances behind the data. As the Partnership on AI notes in its Report on Algorithmic Risk Assessment Tools:
The reasons for someone not appearing in court, getting re-arrested, and/or getting convicted of a future crime are all very distinct, so a high score…would group together people who are likely to have a less dangerous outcome…with [those likely to have] more dangerous outcomes.18
COMPAS produces the very risk that Marda warns of in the second standard. Even though race is not explicitly used as a feature, many of the features disproportionately relate to race—for instance, black defendants are likely to have more prior arrests, causing the algorithm to implicitly create a racial bias even when race is removed from the training data.19
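A minimal sketch makes this proxy effect concrete. The data, feature names, and parameters below are entirely synthetic and illustrative; they are not COMPAS’s actual inputs or model. The point is only that a classifier trained with the race column withheld still reproduces a racial gap in risk scores when a correlated feature, here prior arrest counts, remains:

```python
# Illustrative sketch with synthetic data; none of these numbers or features
# are Northpointe's. It shows that a model trained WITHOUT a race column can
# still produce racially skewed scores when a proxy feature remains.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
race = rng.integers(0, 2, size=n)  # 0 = white, 1 = black (synthetic labels)

# Assumption: prior-arrest counts differ by group because of historically
# biased policing, not because of any difference in underlying behavior.
prior_arrests = rng.poisson(lam=np.where(race == 1, 3.0, 1.5))

# Assumption: the true reoffense probability depends only on prior arrests.
p_reoffend = 1.0 / (1.0 + np.exp(-(0.4 * prior_arrests - 1.5)))
reoffended = rng.binomial(1, p_reoffend)

# Train with race withheld: the only feature is the proxy.
X = prior_arrests.reshape(-1, 1)
scores = LogisticRegression().fit(X, reoffended).predict_proba(X)[:, 1]

# The score gap between groups persists even though race was never an input.
print(f"mean risk score, black defendants: {scores[race == 1].mean():.3f}")
print(f"mean risk score, white defendants: {scores[race == 0].mean():.3f}")
```

On this synthetic data, the group with more recorded prior arrests receives markedly higher average scores even though the model never sees race, which is precisely the reidentification risk that Marda’s second standard flags.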
These considerations suggest that, while Northpointe’s definition of fairness (that its predictions of reoffense are equally accurate regardless of race) is legitimate on its face, it does not justify the obvious cost to fair outcomes. Evaluating COMPAS under a human rights-centric framework should therefore lead us to prioritize the quality of treatment of minority groups over the sanctity of a numerical fairness standard.
Unlike the Partnership on AI, I do not argue that algorithmic tools like COMPAS should be entirely abandoned. I advocate instead for a paradigm shift on the part of governments as they implement algorithmic decision-making tools in criminal justice. Algorithms are “more than simple mathematical problems: they are socio-technical systems that depend on the contextual setting in which they function.”20 As noted in the introduction, I make three broad recommendations to this end. First, criminal risk assessment tools should prioritize giving statistical insights over delivering a consolidated risk score. Second, these tools should be decoupled from corporations, since proprietary tools are difficult to scrutinize. Finally, the corrections departments introducing these tools should undertake initiatives to mitigate the risk of bias, for instance by establishing an ethical advisory board, conducting regular audits, and providing appropriate bias training for police officers.
My first recommendation stems from the need to shift away from a single numerical score. Beyond the mathematical impossibility of creating an unbiased score, handing police officers a number with no context invites automation bias, in which “information presented by a machine is viewed as inherently trustworthy and above skepticism.”21 COMPAS illustrates this effect: its current use suggests a tendency to trust the algorithm even in instances where it is contextually inappropriate. For example, COMPAS is typically used on a local or regional basis, yet the algorithm was trained on a nationwide sample, which may not be representative of local trends. Moreover, COMPAS was not originally developed for use in sentencing; it was intended merely to assess recidivism risk. Despite this, judges frequently use COMPAS scores to determine the length of criminal sentences.22
Additionally, numerical scores should not be presented in isolation. Companies like Northpointe should provide information about the source of the data, the potential for bias, and the original intended use of the tool. Officers who use the tools should be empowered to make decisions with a full understanding of the tool, rather than be encouraged to blindly trust the computer. Individuals whose lives are affected by an algorithm’s judgement also have a right to understand how and why the decision was reached. Even though current AI technology is far from able to provide a rationale for its calculations, algorithmic systems should disclose the specific measures that were used in the decision-making process. These explanation facilities would enable “individuals and users [to] assess whether a given output is justified, and whether they should seek a remedy through the courts.”23
Furthermore, algorithmic tools should prioritize delivering information about trends at the systemic level rather than over-generalizing group data to individual defendants. One yet-unexplored use of big data is to question long-standing assumptions in criminal justice: whether mental illness leads to violence; whether incarceration effectively decreases recidivism; whether community policing reduces crime.24
My second recommendation emerges from comparing COMPAS to its UK counterpart, the Harm Assessment Risk Tool (HART). Notably, the UK takes a far less commercial approach to developing such tools. Unlike COMPAS, which is owned by Northpointe and licensed to individual jurisdictions, HART was created through a collaboration between the police force for the county of Durham and the University of Cambridge.25 This collaborative approach has made the development of the algorithm much more transparent. By contrast, the COMPAS algorithm is proprietary software and cannot be investigated by independent researchers. As one group of computer science researchers explains, “Northpointe has refused to disclose the details of its proprietary algorithm. …That’s understandable. But it raises questions about relying on for-profit companies to develop risk assessment tools.”26 I recommend that court systems avoid using algorithmic tools that cannot be scrutinized for proprietary reasons.
Finally, I recommend establishing institutional oversight for algorithmic tools. One generally successful example of such oversight is the West Midlands Ethics Committee in the United Kingdom. Following the introduction of algorithmic decision-making for criminal cases, the West Midlands Police and Crime Commissioner established an ethics board to oversee how algorithms are used and evaluated. The Committee exemplifies proper engagement of civil society stakeholders: it is composed mostly of West Midlands residents who have a “diverse range of relevant skills and experiences,” with an equal gender balance.27
HART provides another example of oversight. Officers who use HART undergo substantial bias awareness training. The training materials make clear that HART is not meant to provide a comprehensive picture; “the custody officers… remain the decision makers and must ensure that the HART output is but one factor they consider alongside all of the many other factors they are statutorily obliged to consider.”28
These examples constitute important steps in the right direction. Certainly the UK’s approach is far from perfect: a June 2019 report by the Law Society of England and Wales found “a lack of explicit standards, best practice, and openness or transparency about the use of algorithmic systems in criminal justice across England and Wales.”29 Moreover, HART has raised its own set of controversies over bias and data privacy.30 Though we should laud these examples of involving civil society stakeholders and implementing anti-bias training, much work remains.
As governments decide how to incorporate big data into institutional reform, they should adopt a critical lens that acknowledges the risk that algorithms will link disparate features together, re-identifying sensitive demographic information and perpetuating the oppression of minority groups. Especially because algorithms appear unbiased on the surface, “it will often not be clear to a human operator that an algorithmic criminal justice tool needs reconsideration.”31 States that hope to invest in AI must recognize that these tools can entrench their problems as easily as solve them. When the tools are implemented, governments have a duty to enshrine the human rights-based approach to AI in policy. Per ARTICLE19’s recommended paradigm, states should conduct “human rights impact assessments…[and] continuous auditing. These are not systems that can simply be rolled out. They should instead be tailored to the exact context and use for which they are intended.”32
Ultimately, racial equity in the criminal justice system should be motivated by a human rights-based approach, in which issues such as reidentification risk and automation bias remain front and center. In a world where 137 questions can change one’s life, governments ought to orient themselves toward a more equitable future, rather than cling to a flawed past.