$R$ is the leniency of the decision maker, $T$ is a binary decision, $Y$ is the outcome that is selectively labeled. Background features $X$ for a subject affect the decision and the outcome. Additional background features $Z$ are visible only to decision maker \human. }\label{fig:causalmodel}
\end{figure}
The setting we consider is described in terms of {\it two decision processes}.
%
In the first one, a decision maker \human considers a case described by a set of features $F$ and makes a binary decision $T = T_{_H}\in\{0, 1\}$, nominally referred to as {\it positive} ($T = 1$) or {\it negative} ($T = 0$).
%
...
...
Moreover, we assume that decision maker \human is associated with a leniency level $R$.
%
Formally, for leniency level $R = r\in[0, 1]$, we have
\begin{equation}
P(T = 1 | R = r) = \sum_{F} P(T = 1, F | R = r) = r .
\end{equation}
The product of this process is a record $(H, X, T, Y)$ that contains only a subset $X\subseteq F$ of the features of the case, the decision $T$ of the judge, and the outcome $Y$ -- but leaves no trace of the remaining features $Z = F - X$.
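%
To make the record structure concrete, the following is a minimal simulation sketch of this first decision process; the standard-normal features, the particular perceived-risk score, and the quantile-based decision rule are illustrative assumptions of ours, not the parametric model specified later (Equations~\ref{eq:judgemodel} and~\ref{eq:defendantmodel}).
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def simulate_human(n_cases, r, judge_id=0):
    """Simulate decision maker H with leniency r.

    Features x are recorded, features z are seen only by the judge.
    A positive decision goes to the r-fraction of cases the judge
    perceives as least risky, so P(T = 1 | R = r) = r by construction.
    """
    x = rng.normal(size=n_cases)          # recorded features X
    z = rng.normal(size=n_cases)          # features Z, visible only to H
    perceived_risk = logistic(x + z)      # assumed form of H's risk score
    t = (perceived_risk <= np.quantile(perceived_risk, r)).astype(int)
    # Outcome: successful with an (assumed) logistic probability, but it
    # is labeled only when T = 1; by convention Y = 1 whenever T = 0.
    y_if_positive = rng.binomial(1, logistic(-(x + z)))
    y = np.where(t == 1, y_if_positive, 1)
    return {"H": judge_id, "X": x, "T": t, "Y": y}  # Z leaves no trace

records = simulate_human(n_cases=1000, r=0.5)
print(records["T"].mean())  # close to the leniency r = 0.5
\end{verbatim}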
...
...
Figure~\ref{fig:causalmodel} shows the causal diagram that describes the operation of decision maker \human.
In the second decision process, a decision maker \machine considers a case from the dataset and makes its own binary decision $T = T_{_M}$ based on the recorded features $X$, followed by a binary outcome $Y = Y_{_M}$.
%
In our example, \machine corresponds to an automated decision system that is considered as a replacement for the human judge in bail-or-jail decisions.
%
% Notice that we assume \machine has access only to some of the features that were available to \human, to model cases where the system would use only the recorded features and not other ones that would be available to a human judge.
%
The definitions and semantics of decision $T$ and outcome $Y$ follow those of the first process.
%
Moreover, decision maker \machine is also associated with a leniency level $R$, defined as before for \human.
%
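Continuing the simulation sketch above, one simple way to obtain a decision maker \machine\ that operates at a given leniency $r$ is to give a positive decision to the $r$-fraction of cases with the lowest risk predicted from $X$ alone; both this quantile rule and the risk model below are hypothetical placeholders, used only to make the setting concrete.
\begin{verbatim}
def machine_decisions(records, r, predict_risk):
    """Decision maker M: uses only the recorded features X and gives a
    positive decision to the r-fraction of cases with lowest predicted
    risk, so that P(T_M = 1 | R = r) = r, as for the human judge."""
    risk = predict_risk(records["X"])
    return (risk <= np.quantile(risk, r)).astype(int)

# Hypothetical risk model over X alone, for illustration only.
t_machine = machine_decisions(records, r=0.5,
                              predict_risk=lambda x: 1.0 / (1.0 + np.exp(-x)))
\end{verbatim}
%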
...
...
Note that index $j$ refers to decision maker $H_j$ and \invlogit is the logistic function.
...
As stated in the equations above, we consider normalized features \features and ...
%
Moreover, the probability that the decision maker makes a positive decision takes the form of a logistic function (Equation~\ref{eq:judgemodel}).
%
Note that we are making the simplifying assumption that coefficients $\gamma$ are the same for all decision makers, who are allowed to differ only in coefficient $\alpha_j$, so as to model varying leniency levels among them (Equation~\ref{eq:leniencymodel}).
%
The probability that the outcome is successful conditional on a positive decision (Equation~\ref{eq:defendantmodel}) is also given by a logistic function, applied to the same features as the logistic function of Equation~\ref{eq:judgemodel}.
%
In general, these two logistic functions may differ in their coefficients.
%
However, in many settings, a decision maker would be considered good if the two functions were the same -- i.e., if the probability of making a positive decision equaled the probability of obtaining a successful outcome.
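%
For concreteness, a minimal sketch of these two logistic functions follows; the coefficient names and values are placeholders of ours, and the authoritative parameterization is the one given in Equations~\ref{eq:leniencymodel}, \ref{eq:judgemodel} and~\ref{eq:defendantmodel}.
\begin{verbatim}
import numpy as np
from scipy.special import expit   # the logistic function, i.e. \invlogit

# Placeholder coefficients for illustration only (not estimates).
gamma = np.array([0.7, -0.4])   # shared by all decision makers
alpha = {0: -0.3, 1: 0.4}       # judge-specific intercepts alpha_j
beta = np.array([0.5, -0.2])    # outcome coefficients

def p_positive_decision(j, features):
    """P(T = 1) for decision maker H_j: a logistic in the normalized
    features, with only the intercept alpha_j differing across judges."""
    return expit(alpha[j] + features @ gamma)

def p_successful_outcome(features, beta0=0.0):
    """P(Y = 1 | T = 1): a logistic applied to the same features, with
    its own coefficients; a decision maker is 'good' in the sense above
    when this function coincides with p_positive_decision."""
    return expit(beta0 + features @ beta)
\end{verbatim}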
\todo{Michael to Riku}{Define the full model above.}
...
...
Note, however, that a decision maker that always makes a negative decision $T=0$ would trivially achieve a failure rate of zero.
For comparisons to be meaningful, we compare decision makers at the same leniency level $R$.
The main challenge in estimating FR is that, in general, the dataset does not directly provide a way to evaluate it.
%
In particular, let us consider the case where we wish to evaluate decision maker \machine, and suppose that \machine makes a decision $T_{_M}$ for the case corresponding to record $(H, X, T_{_H}, Y_{_H})$.
%
Suppose also that the decision by \human was $T_{_H}=0$, in which case the outcome is always positive, $Y_{_H}=1$.
%
If the decision by \machine is $T_{_M}=1$, then it is not possible to tell directly from the dataset what the outcome $Y_{_M}$ would have been, had \machine's decision been followed.
%
The approach we take to deal with this challenge is to use counterfactual reasoning to infer $Y_{_M}$ (see Section~\ref{sec:imputation} below).
Ultimately, our goal is to obtain an estimate of the failure rate FR for a decision maker \machine.
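%
The following sketch, continuing the simulation code above, shows how such an estimate can be assembled once a counterfactual imputation procedure is available; it assumes that FR is the fraction of all cases that receive a positive decision followed by an unsuccessful outcome, and impute_outcome is merely a placeholder for the procedure of Section~\ref{sec:imputation}.
\begin{verbatim}
def failure_rate(t_machine, records, impute_outcome):
    """Estimate FR for decision maker M on selectively labeled data,
    assuming FR = (# cases with T_M = 1 and Y_M = 0) / (# cases).

    When T_H = 1 the recorded outcome applies; when T_H = 0 the record
    shows Y_H = 1 by convention, so if M decides T_M = 1 the outcome
    must be imputed counterfactually (impute_outcome stands in for
    that step)."""
    t_h, y_h, x = records["T"], records["Y"], records["X"]
    y_m = np.where(t_h == 1, y_h, impute_outcome(x))
    return float(np.mean((t_machine == 1) & (y_m == 0)))

# Example with a trivially optimistic placeholder imputation (Y = 1).
fr = failure_rate(t_machine, records,
                  impute_outcome=lambda x: np.ones_like(x, dtype=int))
\end{verbatim}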