%!TEX root = sl.tex
% The above command helps compiling in TexShop on a Mac. Hitting typeset compiles sl.tex directly instead of producing an error here.
\section{Counterfactual-Based Imputation For Selective Labels}
If decision maker \machine makes a positive decision for a case where decision maker \human had made a negative decision, how can we infer the outcome \outcome in the hypothetical case where \machine's decision had been followed?
%
Such questions fall straight into the realm of causal analysis and particularly the evaluation of counterfactuals -- an approach that we follow in this paper~\cite{coston2020counterfactual}.
The challenges we face are two-fold.
%
Firstly, we do not have direct observations for the outcome under \machine's positive decision.
%
A first thought, then, would be to simply {\it predict} the outcome based on the features of the case.
%
In the bail-or-jail scenario, for example, we could investigate whether certain features of the defendant (e.g., their age and marital status) are good predictors of whether they comply with the bail conditions -- and use them if they do.
%
However, not all features that are available to \human are available to \machine in the setting we consider, which forms our second major challenge.
%
These complications mean that making direct predictions based on the available features can be suboptimal and even biased.
However, important information regarding the unobserved features \unobservable can often be recovered via careful consideration of the decisions in the data~\cite{mccandless2007bayesian,Jung2}.
%
This is exactly what our counterfactual approach achieves.
For illustration, let us consider a defendant who received a negative decision by the human judge.
Suppose also that, among defendants with similar recorded features \obsFeatures who were released, none violated the bail conditions -- and therefore, judging from observations alone, the defendant should be considered safe to release based on \obsFeatures.
However, if the judge was both lenient and precise -- i.e., able to make exactly those positive decisions that lead to successful outcomes -- then it is quite possible that the negative decision is attributable to unfavorable non-recorded features \unobservable.
Therefore, if a positive decision were made, {\it the above reasoning suggests that a negative outcome is more likely than what would be predicted based on the recorded features \obsFeatures of released defendants alone}.
Our approach for evaluating the decisions of $\machine$ on cases where $\human$ made a negative decision unfolds over three steps: first, we learn a causal model over the dataset; then, we compute counterfactuals to predict the unobserved outcomes; and finally, we use these predictions to evaluate a set of decisions by \machine.
% \note{Michael}{Actually, the paragraph above describes a scenario where {\it labeled outcomes} and possibly {\it contraction} would fail. Specifically, create cases where:
% (i) Z has much larger coefficient than X, and (ii) the judge is good (the two logistic functions for judge decision and outcome are the same), and (iii) the machine is trained on labeled outcomes. The machine will see that the outcome is successful regardless of X, because Z will dominate the positive (and negative) decisions. So it will learn that everyone can be released. Labeled outcomes will evaluate the machine as good -- but our approach will uncover its true performance.}
Recall from Section~\ref{sec:setting} that Figure~\ref{fig:causalmodel} provides the structure of causal relationships for quantities of interest.
We use the following causal model over this structure, building on what is used by Lakkaraju et al.~\cite{lakkaraju2017selective} and others~\cite{mccandless2007bayesian,jung2018algorithmic}.
%
First, we assume that the
% the observed feature vectors \obsFeatures and
unobserved features \unobservable can be modeled as a one-dimensional risk factor~\cite{mccandless2007bayesian}, for example by using propensity scores~\cite{rosenbaum1983central,austin2011introduction}.
%
Moreover, we present our modeling approach for the case of a single observed feature \obsFeatures -- this is done only for simplicity of presentation, as it is straightforward to extend the model to multiple features \obsFeatures, as we do in the experiments (Section~\ref{sec:experiments}).
%
Motivated by the central limit theorem, we model both \obsFeatures and \unobservable with Gaussian distributions.
%
Furthermore, since $\unobservable$ is unobserved, we can assume its variance to be 1 without loss of generality, so that $\unobservable \sim N(0,1)$.
%
% (Any deviation from this can be achieved by adjusting intercepts and coefficients in the following).
In the setting we consider (Section~\ref{sec:setting}), a negative decision $\decision = 0$ leads to a successful outcome $\outcome = 1$.
%
When $\decision = 1$, the probability of success is given by a logistic regression model over the features \obsFeatures and \unobservable:
\begin{align}
\prob{\outcome=1~|~\decision, \obsFeaturesValue, \unobservableValue} & =
\begin{cases}
1, & \text{if}~\decision = 0\\
\invlogit(\alpha_\outcome + \beta_\obsFeatures \obsFeaturesValue + \beta_\unobservable \unobservableValue), & \text{if}~\decision = 1
\end{cases}
\label{eq:defendantmodel}
\end{align}
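For illustration, the outcome model of Equation~\ref{eq:defendantmodel} can be written as a short function.
The following is a minimal sketch in Python, assuming \texttt{scipy.special.expit} as the inverse logit; the argument names simply mirror the symbols above and are not part of our implementation.
\begin{verbatim}
import numpy as np
from scipy.special import expit  # inverse logit

def outcome_probability(t, x, z, alpha_y, beta_x, beta_z):
    """P(Y = 1 | T = t, X = x, Z = z) under the outcome model."""
    # A negative decision (t = 0) always yields a successful outcome;
    # under t = 1, success follows a logistic regression on x and z.
    return np.where(t == 0, 1.0, expit(alpha_y + beta_x * x + beta_z * z))
\end{verbatim}
For instance, \texttt{outcome\_probability(1, 0.3, -1.2, 0.0, 1.0, 1.0)} evaluates the success probability of a released defendant with the given (hypothetical) feature and parameter values.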
%
% Since the decisions are ultimately based on expected behaviour,
We model the decisions in the data similarly, with a logistic regression over the features:
\begin{align}
\prob{\decision = 1~|~\judgeValue, \obsFeaturesValue, \unobservableValue} & = \invlogit(\alpha_\judgeValue + \gamma_\obsFeatures \obsFeaturesValue + \gamma_\unobservable \unobservableValue) \label{eq:judgemodel}
\end{align}
Although we model the decision makers here probabilistically, we do not imply that their decisions are necessarily probabilistic (or include a random component). The probabilistic model arises from the unknown specific details of the reasoning employed by each decision maker $\human_\judgeValue$.
Note also that we make the simplifying assumption that the coefficients $\gamma_\obsFeatures, \gamma_\unobservable$ are the same for all $\human_\judgeValue$, while decision makers are allowed to differ in their intercepts $\alpha_\judgeValue$.
%
Parameter $\alpha_{\judgeValue}$ controls the leniency of a decision maker $\human_\judgeValue \in \humanset$.
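The decision model of Equation~\ref{eq:judgemodel} admits an equally short sketch.
Again, this is only an illustration in Python: the per-judge intercepts are stored in a hypothetical array \texttt{alpha\_j}, indexed by the judge's identity, while the slope coefficients are shared.
\begin{verbatim}
import numpy as np
from scipy.special import expit

def decision_probability(j, x, z, alpha_j, gamma_x, gamma_z):
    """P(T = 1 | judge j, X = x, Z = z) with shared slopes."""
    # alpha_j[j] is the intercept of judge j; larger values make
    # positive decisions more likely, i.e., the judge is more lenient.
    return expit(alpha_j[j] + gamma_x * x + gamma_z * z)
\end{verbatim}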
We take a Bayesian approach to learn the model over the dataset \dataset.
In particular, we consider the full probabilistic model defined in Equations~\ref{eq:defendantmodel} and~\ref{eq:judgemodel} and obtain the posterior distribution of its parameters $\parameters = \{ \alpha_\outcome, \beta_\obsFeatures, \beta_\unobservable, \gamma_\obsFeatures, \gamma_\unobservable\} \cup \bigcup_{\human_\judgeValue \in \humanset} \{\alpha_\judgeValue\}$, which includes intercepts $\alpha_\judgeValue$ for all decision makers $\human_\judgeValue$ employed in the data.
We use prior distributions given in Appendix~\ref{sec:priors} to ensure the identifiability of the parameters.
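To make the learning step concrete, the sketch below spells out the (unnormalized) log posterior implied by Equations~\ref{eq:defendantmodel} and~\ref{eq:judgemodel} in Python.
It is not our Stan implementation: the dictionary-based parameterization is chosen only for readability, and generic standard-normal priors stand in for the priors of Appendix~\ref{sec:priors}.
\begin{verbatim}
import numpy as np
from scipy.special import expit
from scipy.stats import norm

def log_posterior(params, z, x, t, y, judge):
    """Unnormalized log posterior of parameters and latent risk factors.

    params: dict with 'alpha_y', 'beta_x', 'beta_z', 'gamma_x',
            'gamma_z' (scalars) and 'alpha_j' (one intercept per judge).
    z:      latent risk factor, one value per data entry.
    x, t, y, judge: observed feature, decision, outcome, judge index
            (numpy arrays, one entry per case).
    """
    lp = norm.logpdf(z).sum()                              # Z ~ N(0, 1)
    # Decision model: T | judge, x, z.
    p_t = expit(params['alpha_j'][judge]
                + params['gamma_x'] * x + params['gamma_z'] * z)
    lp += np.where(t == 1, np.log(p_t), np.log1p(-p_t)).sum()
    # Outcome model: Y | T = 1, x, z (outcomes are observed only if T = 1).
    obs = (t == 1)
    p_y = expit(params['alpha_y']
                + params['beta_x'] * x + params['beta_z'] * z)
    lp += np.where(y[obs] == 1, np.log(p_y[obs]), np.log1p(-p_y[obs])).sum()
    # Generic N(0, 1) priors stand in for those of the appendix.
    for name in ('alpha_y', 'beta_x', 'beta_z', 'gamma_x', 'gamma_z'):
        lp += norm.logpdf(params[name])
    lp += norm.logpdf(params['alpha_j']).sum()
    return lp
\end{verbatim}
An MCMC sampler applied to this density yields posterior draws of \parameters (and of the latent \unobservable), which are used below.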
We remind the reader that the goal is to provide a solution to Problem~\ref{problem:the} -- and, to do that, we wish to address those cases where $\machine$ decides $\decision = 1$ while the data records a negative decision $\decision = 0$, for which evaluation cannot be performed directly.
In other words, we wish to answer a `what-if' question: for each specific case where a decision maker $\human_\judgeValue$ decided $\decision = 0$, what if we had intervened to alter the decision to $\decision = 1$?
In the formalism of causal inference~\cite{pearl2010introduction}, we wish to evaluate the counterfactual expectation
\begin{align}
\cfoutcome = \expectss{\decision \leftarrow 1}{\outcome~|~\obsFeaturesValue, \judgeValue, \decision = 0; \dataset} \label{eq:counterfactual}
\end{align}
The expression above concerns a specific entry in the dataset with features $\obsFeatures=x$, for which decision maker $\human_\judgeValue$ made a decision $\decision = 0$.
It expresses the probability that the outcome would have been positive ($\outcome = 1$) had the decision been positive ($\decision = 1$), conditional on what we know from the data entry ($\obsFeatures = \obsFeaturesValue$, $\decision = 0$, $\judge = \judgeValue$) as well as from the entire dataset \dataset.
Notice that the presence of \dataset in the conditional part of Equation~\ref{eq:counterfactual} gives us more information about the data entry than the entry-specific quantities alone, and is thus not redundant.
In particular, it provides information about the leniency and other parameters of the decision maker $\human_\judgeValue$, which in turn is important for inferring the unobserved variables \unobservable, as discussed at the beginning of this section.
For the model defined above, the counterfactual \cfoutcome can be computed following the approach of Pearl~\cite{pearl2000}.
For a fully defined model (with fixed parameters) the counterfactual expectation can be determined by the following expression:
\begin{align}
E_{\decision \leftarrow 1}(\outcome~|~\judgeValue, \decision=0, \obsFeaturesValue)
&= \int \prob{\outcome=1|\decision=1,\obsFeaturesValue,\unobservableValue}\, \prob{\unobservableValue|\judgeValue, \decision=0,\obsFeaturesValue} \diff{\unobservableValue} \label{eq:counterfactual_eq}
\end{align}
In essence, we determine the distribution of the unobserved features $\unobservable$ using the decision, observed features $\obsFeaturesValue$, and the leniency of the employed decision maker, and then determine the distribution of $\outcome$ conditional on all features, integrating over the unobserved features (see Appendix~\ref{sec:counterfactuals} for more details).
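As an illustration of Equation~\ref{eq:counterfactual_eq} for a fully defined model, note that the conditional distribution of \unobservable given a negative decision is proportional to its $N(0,1)$ prior multiplied by the probability of that negative decision; the integral can therefore be approximated, for example, by self-normalized importance sampling with the prior as proposal.
The sketch below is one such approximation in Python, with parameter names mirroring the symbols above; it is not the exact procedure of Appendix~\ref{sec:counterfactuals}.
\begin{verbatim}
import numpy as np
from scipy.special import expit

def counterfactual_success(x, alpha_j, alpha_y, beta_x, beta_z,
                           gamma_x, gamma_z, n_samples=10000, rng=None):
    """E_{T <- 1}[Y | judge j, T = 0, x] for fixed parameter values."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n_samples)   # proposals from the N(0, 1) prior
    # Weight: probability of the observed negative decision given z.
    w = 1.0 - expit(alpha_j + gamma_x * x + gamma_z * z)
    # Success probability had the decision been positive.
    f = expit(alpha_y + beta_x * x + beta_z * z)
    return np.sum(w * f) / np.sum(w)
\end{verbatim}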
Having obtained a posterior probability distribution for the parameters \parameters, we can estimate the counterfactual outcome value based on the data:
\begin{align}
\cfoutcome = \int \prob{\outcome=1|\decision=1,\obsFeaturesValue,\unobservableValue,\parameters}\, \prob{\unobservableValue|\judgeValue, \decision=0,\obsFeaturesValue,\parameters} \diff{\unobservableValue}\, \prob{\parameters | \dataset} \diff{\parameters} \label{eq:theposterior}
\end{align}
Note that, for all data entries other than those with $\decision = 0$, we trivially have $\cfoutcome = \outcome$.
%\spara{Implementation}
The result of Equation~\ref{eq:theposterior} can be computed numerically:
%
\begin{align}
\cfoutcome \approx \frac{1}{N} \sum_{k=1}^{N} \prob{\outcome = 1 | \decision = 1, \obsFeaturesValue, \unobservableValue_k, \parameters_k} \label{eq:expandcf}
\end{align}
where the sum is taken over $N$ samples of $\parameters$ and $\unobservable$ obtained from their respective posteriors.
In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain these samples.
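Concretely, if the sampler returns, for a given data entry, arrays of $N$ joint posterior draws of the latent \unobservable and of the outcome-model parameters, the estimate of Expression~\ref{eq:expandcf} is a plain average.
A minimal sketch in Python follows; the array names are illustrative only.
\begin{verbatim}
import numpy as np
from scipy.special import expit

def imputed_outcome(x, z_draws, alpha_y_draws, beta_x_draws, beta_z_draws):
    """Monte Carlo estimate of the counterfactual outcome for one T = 0 entry."""
    # One success probability per joint posterior draw, then average.
    p = expit(alpha_y_draws + beta_x_draws * x + beta_z_draws * z_draws)
    return p.mean()
\end{verbatim}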
Expression~\ref{eq:expandcf} gives us a direct way to evaluate the outcome of a positive decision for any data entry for which $\decision = 0$.
Note, though, that unlike $\outcome$, which takes binary values in $\{0, 1\}$, \cfoutcome may also take fractional values $\cfoutcome \in [0, 1]$.
Having obtained outcome estimates for all data entries, it is now straightforward to obtain an estimate for the failure rate $\failurerate$ of decision maker \machine: it is simply the average value of $1 - \cfoutcome$ over all data entries.
We will refer to it as \textbf{\cfbi}, for {\underline c}ounter{\underline f}actual-{\underline b}ased {\underline i}mputation.
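For completeness, a sketch of this final aggregation step, where \texttt{y\_hat} is a hypothetical array holding the evaluated outcome of every data entry (the observed outcome where available, $1$ for negative decisions by \machine, and the imputed value otherwise):
\begin{verbatim}
import numpy as np

def failure_rate(y_hat):
    """Estimated failure rate of M: average probability of failure (Y = 0)."""
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean(1.0 - y_hat)
\end{verbatim}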
\begin{figure}[t!]
\begin{center}
%\includegraphics[height=2in]{img/setting}
\includegraphics[width=0.95\columnwidth]{img/fig3_antti}
\end{center}
\caption{%
Negative decisions ($\decision = 0$) by decision maker $\machine$ are evaluated as successful ($\cfoutcome = 1$), shown with dashed arrows.
%
Positive decisions ($\decision = 1$) by decision maker $\machine$ for which the decision in the data was also positive ($\decision = 1$) are evaluated according to the outcome $\outcome$ in the data, as marked by the solid arrow.
%
For the remaining cases (the second and third), the evaluated outcomes $\cfoutcome$ are based on our counterfactual imputation technique. The failure rate of decision maker $\machine$ is $2.7/7 \approx 38.6\%$ here.
}
\label{fig:approach}
\end{figure}