Explainability may be a neologism, but it has become a core concept for AI. Although the idea is not new, there are at least three reasons why it has become an important topic. Firstly, the main machine learning techniques used for AI today, particularly those built around artificial neural networks such as deep learning, produce “black box” models. They are so called because they do not operate according to explicitly programmed rules, as symbolic AI does, but on the basis of complex probabilistic calculations that are difficult to trace. More specifically, the term “black box” indicates that these models are opaque: their internal processes are difficult or impossible to understand. This opacity stems mainly from the mathematical complexity of these models’ calculations and the sheer number of parameters they use. For example, OpenAI’s GPT-4 is said to have 1.76 trillion parameters. Secondly, although these “black box” models produce good results in the learning phase, their performance in real-life contexts can be considerably worse. It is therefore important to understand how these models work in order to determine what causes this discrepancy. Finally, as “black box” models are being used in a growing number of sectors (for example, legal, insurance and healthcare), important ethical questions are coming to the fore, which also require an understanding of how these models work.
Explainability as an Ethical Principle of “Responsible” AI
Adopting explainability as an ethical principle is no simple matter. In the past, the concept was mainly explored in terms of the human-computer interaction issues raised by expert systems in symbolic AI. It is therefore worth examining why explainability has become such a key topic from an ethical point of view.
Firstly, some systems have been found to produce biased outcomes, raising concerns about fairness and social justice. For example, some facial recognition systems have been found to work better for people with lighter skin than for those with darker skin. Studies have shown that many speech recognition systems perform better for male voices. A recruitment system used by Amazon was scrapped because it was found to favor male candidates. These unexpected outcomes are primarily linked to the use of biased training data. Given the “discrimination” observed, understanding how and why an AI model produced a given outcome seems necessary to identify, correct and prevent such biases.
Secondly, black-box models are increasingly used in sensitive areas of society, such as the legal, healthcare, insurance and banking sectors. Where decisions taken on the basis of these models can have serious consequences for individuals (for example, being denied a bank loan or having an insurance premium raised), it must be possible to explain how the models work. The opacity of “black box” models thus contributes to what Maclure calls a “public reason deficit”: individuals cannot understand the “reasons” behind decisions that an institution (e.g. a medical institution) has made about them with the help of AI when that reasoning is not publicly accessible and justifiable. The challenge is to enable anyone, regardless of their level of AI expertise, to understand the AI-assisted decisions that directly affect them. This is known as a “right to explanation,” and it has become a key concept in the regulation of decision-making algorithms. It was incorporated into the EU General Data Protection Regulation (GDPR) in 2018. Comparable rights exist in other countries, for example in the United States, where the Code of Federal Regulations requires lenders to give applicants the specific reasons for adverse credit decisions. In Europe, explainability is now also addressed in the recent AI Act, which states that AI systems shall be developed in a way that allows explainability.
Explainability: Definitions and Methods
Although explainability is touted as a core principle of responsible AI today, it does not have a standard definition. In fact, it has several definitions, which overlap to varying degrees, and the terminology used to describe this principle varies. For example, according to some authors, it refers to the ability to explain algorithmic processes in terms that are understandable or intelligible to a human. Another approach links explainability to the notion of “cause”: from this perspective, explainability is the degree to which a human can understand the “causes” of an outcome from an AI system, and an explanation consists of providing those “causes.” While some authors use interpretability as a synonym for explainability, others distinguish the two. Rudin, for example, considers explainability to refer primarily to attempts to explain the outcomes of black box models after the fact, whereas interpretability applies to models that are inherently interpretable (unlike “black box” models). Inherently interpretable models are models whose outputs are not only explainable but can be explained without additional algorithms created specifically for this purpose. Some definitions also distinguish between transparency and explainability. Transparency is sometimes associated with an overall understanding of the model (for example, the set of parameters it uses and how they relate to one another), while explainability refers to post-hoc explanations of a specific outcome generated by the model. Finally, work is underway to standardize this concept. For example, the IEEE defines it in relation to transparency: “The extent to which the information made transparently available to a stakeholder can be readily interpreted and understood by a stakeholder.”
But, in more practical terms, what kind of explanations of “black box” models can we actually produce? Many algorithmic techniques exist for generating such explanations, and the field keeps evolving as new ones are developed. These techniques can be grouped into three broad categories. The first category covers techniques that provide what are known as “local” or “post-hoc” explanations, which concern a specific system output (for example, the detection of a pathology on an X-ray, or a credit rating). They rely on dedicated algorithms applied after the output has been produced. Typically, a local explanation indicates which variables (features), i.e. which input data, were the most influential in generating the output in question. Counterfactual explanation is another type of local explanation: it indicates the smallest change in variable values that would transform the output and lead to a different outcome. For example, if an AI system is used to estimate the probability that a customer will terminate a telephone subscription (churn) based on a number of variables (e.g. personal information and services subscribed to), the counterfactual explanation would identify which changes to those variables would alter that probability enough to change the outcome. The second category covers “global” techniques: algorithms specifically designed to explain how the “black-box” model operates as a whole, applied after the learning phase. The third category covers learning models that are self-explaining. These models are “inherently” explainable or interpretable (explainable by design) insofar as they are built to generate explanations from the start. In addition, some techniques are described as “agnostic” because they can be applied to any machine learning model; this property cuts across the categories above rather than forming a separate one.
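The churn example can be made more concrete with a small sketch. Everything in it is hypothetical: `churn_probability` stands in for an opaque trained model, and its features, weights, the `BASELINE` customer and the grid of candidate changes are invented for illustration. The local explanation uses a simple occlusion approach (reset one feature at a time to a baseline value and measure how the output changes), and the counterfactual is found by brute-force search over a small grid; both are generic illustrations, not the specific methods of any library. Because both functions only query the model’s outputs, they are also “agnostic” in the sense described above.

```python
import itertools
import math

# Hypothetical "black box" churn model. In practice this would be a trained
# model whose internals are opaque; the features and weights are invented.
def churn_probability(customer: dict) -> float:
    z = (
        -1.0
        + 0.8 * customer["support_calls"]
        - 0.05 * customer["months_subscribed"]
        + 1.5 * customer["price_increase"]
    )
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical reference values ("a typical customer") used for occlusion.
BASELINE = {"support_calls": 0, "months_subscribed": 24, "price_increase": 0}

def local_attributions(model, x: dict, baseline: dict) -> dict:
    """Local explanation by occlusion: for each feature, how much the output
    changes when that feature alone is reset to its baseline value."""
    reference = model(x)
    return {name: reference - model({**x, name: baseline[name]}) for name in x}

def counterfactual(model, x: dict, candidates: dict, threshold: float = 0.5):
    """Brute-force counterfactual over a small grid of candidate values:
    among inputs predicted below `threshold`, return the changes that touch
    the fewest features, breaking ties by total absolute change (a naive,
    unnormalized distance). Returns None if no candidate flips the outcome."""
    names = list(candidates)
    best = None
    for values in itertools.product(*(candidates[n] for n in names)):
        candidate = {**x, **dict(zip(names, values))}
        if model(candidate) >= threshold:
            continue
        changed = [n for n in names if candidate[n] != x[n]]
        cost = (len(changed), sum(abs(candidate[n] - x[n]) for n in changed))
        if best is None or cost < best[0]:
            best = (cost, {n: candidate[n] for n in changed})
    return None if best is None else best[1]

# Usage with a hypothetical customer and a hypothetical grid of changes.
customer = {"support_calls": 3, "months_subscribed": 6, "price_increase": 1}
GRID = {
    "support_calls": [0, 1, 2, 3],
    "months_subscribed": [6, 12, 24],
    "price_increase": [0, 1],
}
print(round(churn_probability(customer), 3))  # 0.931
attributions = local_attributions(churn_probability, customer, BASELINE)
print(max(attributions, key=attributions.get))  # support_calls
print(counterfactual(churn_probability, customer, GRID))
# {'support_calls': 1, 'price_increase': 0}
```

In this toy run, the attribution ranks the number of support calls as the most influential feature, and the counterfactual reads as: had this customer made one support call instead of three and not faced a price increase, the predicted probability of churn would have fallen below 0.5. Practical counterfactual methods typically also normalize distances across features and consider whether the suggested changes are plausible or actionable, which this sketch does not.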
But what are the explanations produced by these techniques used for, and to what extent do they really allow end users to understand an AI system’s output? In this respect, the structure and formulation of these explanations are not suitable in all contexts, particularly for lay users (who are not AI experts), because these explanations were originally developed by and for AI experts. Such aspects are important to consider in terms of explanation design, the user interface, human-computer interaction and the adaptation of explanations to different contexts of use. These contexts fall into several categories. The first covers debugging or optimizing a model, or identifying and removing biases from it; here, the users are data scientists, developers, technical staff, etc. The second covers auditing a system or analyzing its regulatory compliance. In this case, the users of the explanations might be auditors (potentially legal specialists) or inspectors (e.g. for safety or quality monitoring). The third covers the use of AI systems for professional activities (e.g. to assist with medical diagnoses), in which users with specific expertise in a given field (doctors, legal specialists, banking advisors, etc.) may need explanations. Finally, there are contexts in which users are members of the “general public,” such as bank customers, patients or the passengers of a driverless car. For example, a bank customer might want to understand why an AI system has assigned them a given credit rating. This categorization shows that there is a wide variety of situations in which producing explanations may prove useful or even crucial. These situations must be analyzed to determine what kinds of explanations their potential recipients need and how those explanations should be formatted.
In this regard, the user-centered design approach, which is common practice in ergonomics and human-computer interaction, seems well suited to the task, since it places the needs of users at the center of the design process.
Debates and Criticisms: Are Explanations of “Black-Box” Models Illusory?
While explainability appears at first glance to be key to developing “responsible AI,” it has sparked a number of important debates.
Beyond criticisms of the imprecise definitions of explainability and related concepts such as interpretability and transparency, one major debate concerns the use of “black box” models themselves. Some researchers are calling for these models to be withdrawn from domains where AI-driven decisions can have significant negative impacts on humans or the economy, for example the legal, healthcare, transport or banking sectors, because the reliability of these models cannot be ensured. Furthermore, according to some authors such as Rudin et al., explainability techniques generate, at best, only approximate explanations, which do not faithfully reflect the algorithmic process that actually produced an outcome. At worst, these explanations can be misleading, so much so that certain authors consider explainability techniques to entail a risk of rationalization, which can lead to “fairwashing”: the explanations generated could deceive people into believing that a model respects certain ethical values when it does not. For high-risk areas, these researchers therefore advocate inherently interpretable techniques (for example, decision trees or linear regression) that generate “transparent” models (“white-box” or “glass-box” models). Countering the popular belief that such models perform worse, Rudin et al. argue that in many use cases there is no difference in performance between these and “black box” models. In other words, accepting “inexplainability” as the price of greater performance is not necessarily the only option when choosing a machine learning technique. These authors go even further, arguing that explainability can serve as a pretext that perpetuates the use of “black box” models despite their problems.
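The point that post-hoc explanations are only approximations can be illustrated with a global surrogate, a common post-hoc technique in which a simple model is fitted to mimic a black box’s predictions. The sketch below is a toy under stated assumptions: the “black box” `black_box` (approving when the product of two normalized scores exceeds 0.25), the feature names, the evaluation grid and the one-rule surrogate `fit_stump` are all hypothetical. “Fidelity” here simply means the share of grid points on which the surrogate agrees with the black box.

```python
import itertools

# Hypothetical "black box": approves (1) when the product of two normalized
# scores exceeds 0.25, so its decision boundary is curved and uses both inputs.
def black_box(income: float, history: float) -> int:
    return int(income * history > 0.25)

# Hypothetical evaluation set: a regular 21 x 21 grid over [0, 1] x [0, 1].
GRID = [i / 20 for i in range(21)]
POINTS = list(itertools.product(GRID, GRID))
FEATURES = ("income", "history")

def fit_stump(points, labels):
    """Global surrogate: the single rule 'feature >= t' that agrees with the
    black box on the largest number of points. Returns (feature index,
    threshold, fidelity)."""
    best = None
    for feature in range(len(FEATURES)):
        for t in GRID:
            agree = sum(
                int(p[feature] >= t) == y for p, y in zip(points, labels)
            )
            if best is None or agree > best[0]:
                best = (agree, feature, t)
    agree, feature, t = best
    return feature, t, agree / len(points)

labels = [black_box(*p) for p in POINTS]
feature, threshold, fidelity = fit_stump(POINTS, labels)
print(f"surrogate rule: approve if {FEATURES[feature]} >= {threshold}")
print(f"fidelity to the black box: {fidelity:.3f}")
```

Because this black box relies on both features jointly, no single-feature threshold rule can reproduce it, so the fidelity stays below 1 and the surrogate’s rule, which mentions only one feature, does not describe what the model actually does. An inherently interpretable model avoids this gap: if the deployed model were itself such a rule, the rule would be the explanation, with a fidelity of 1 by construction.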
Another criticism of explainability concerns an argument by analogy. Its premise is that we do not always seek to explain human decisions, and we accept that it is not always possible to understand them, for example because the decision-making process is indescribable or elusive. Its conclusion is that we should adopt the same attitude toward “black box” models and accept their opacity. This view is shared by one of the world’s leading deep learning specialists, Geoffrey Hinton. However, the argument appears to be fallacious. It rests on a view that focuses on the individual and ignores the social and institutional dimensions of decision-making. In legal systems, for instance, rules and procedures govern court rulings so that litigants can understand the reasons for decisions that affect them. From this perspective, opacity becomes a problem because using “black box” models for decision-making precludes this type of justification: judges would be unable to explain the model-produced outcome on which they relied to reach a decision. This brings us back to the aforementioned “public reason deficit” problem.
While explainability is currently considered key to developing “responsible” AI, applying this principle is no simple matter given the many debates and criticisms it has prompted. These criticisms and debates should be viewed as warnings of the current limitations of explainability and, therefore, of what can be achieved with it in terms of AI ethics. It should also be noted that explainability is not in itself an ethical principle, but rather a means of aligning AI development with other, more fundamental ethical principles (such as fairness, informed consent and respect for privacy) when AI is being used in contexts where such principles are at stake (for example, the assessment of a bank customer’s borrowing capacity or medical decision-making). The debates on explainability give rise to another, more general question: whether AI techniques that create “black box” models should be used at all, given how hard it is to explain the outcomes they produce. Although the performance of these models is undeniable, there appear to be other AI techniques that perform just as well while remaining interpretable. It therefore seems more prudent to use interpretable models wherever they match the performance of “black box” models. Another key takeaway is that the explainability of an AI system should not be separated from the context in which it is used. The need for explanation should be determined by these contexts, because it can vary depending on the explanation’s recipients, its purposes, the activities concerned, etc. In other words, a contextual approach to explainability is needed.
“IEEE Standard for Transparency of Autonomous Systems,” IEEE Std 7001-2021, pp. 1-54, 4 March 2022.