\section{The Selective Labels Framework}
\acomment{We have to write this in terms of a defendant, bail, and jail for it to make sense. Maybe generalize later?}
\acomment{Can we first write it without X, Z, and R and then add them as modeling choices?}
\subsection{Model}
\begin{figure}
...
...
\caption{$R$ is the leniency of the decision maker, $T$ is a binary decision, and $Y$ is the outcome, which is observed only for some decisions. Background features $X$ of a subject affect both the decision and the outcome. Additional background features $Z$ are visible only to the decision maker in use.}
\end{figure}
The binary variable $T$ denotes the decision: when $T=0$ the defendant is jailed, and when $T=1$ the defendant is given bail. We assume that the decision is affected by the leniency $R$, by the background factors $X$ observed in the data, and by other background factors $Z$ not observed in the data.
The binary variable $Y$ measures the outcome: $Y=0$ if the defendant offended and $Y=1$ if the defendant did not. The outcome is affected by the observed background factors $X$ and the unobserved background factors $Z$. In addition, there may be other background factors that affect $Y$ but not $T$.
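
As a sketch, the dependencies above can be summarized as structural equations, reading $Y$ as the outcome the defendant would have if released. The functions $f_T$, $f_Y$ and the noise terms $\epsilon_T$, $\epsilon_Y$ are hypothetical notation introduced here for illustration; we assume the noise terms are mutually independent, with $\epsilon_Y$ standing for the other background factors that affect $Y$ but not $T$.
\[
  % Decision: leniency R, observed X and unobserved Z (f_T, \epsilon_T are hypothetical)
  T = f_T(R, X, Z, \epsilon_T), \qquad
  % Outcome if released: X and Z only (f_Y, \epsilon_Y are hypothetical)
  Y = f_Y(X, Z, \epsilon_Y).
\]
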
The selective labels issue is that in the observed data, whenever $T=0$ (i.e., the defendant is jailed), deterministically $Y=1$ (i.e., the defendant commits no offence), because a jailed defendant has no opportunity to offend.\footnote{Alternatively, we could view this as not observing the value of $Y$ when $T=0$, which induces a problem of selection bias.}
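
To make the mechanism concrete, the following sketch writes $Y^{\mathrm{obs}}$ (a hypothetical symbol introduced here for illustration) for the outcome recorded in the data and reads $Y$ as the outcome the defendant would have if released:
\[
  % Released (T=1): the outcome Y is recorded as is.
  % Jailed (T=0): the recorded value is forced to 1 regardless of Y.
  Y^{\mathrm{obs}} = T\,Y + (1 - T) =
  \left\{\begin{array}{ll}
    Y & \mbox{if } T = 1,\\
    1 & \mbox{if } T = 0,
  \end{array}\right.
\]
so a recorded value of $1$ for a jailed defendant carries no information about $Y$.
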
\subsection{Decision Makers}
\subsection{Evaluation}
Acceptance rate (AR) is the number of positive decisions ($T=1$) divided by the total number of decisions.
Failure rate (FR) is the number of undesired outcomes ($Y=0$) divided by the total number of decisions.
One special characteristic of FR in this setting is that a failure can only occur after a positive decision ($T=1$), because a jailed defendant cannot offend.
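
As a sketch, assume $n$ decisions indexed by $i = 1, \dots, n$, with decision $T_i$ and outcome $Y_i$ for the $i$th defendant (indexed notation introduced here for illustration). The two measures can then be written as
\[
  % AR: fraction of positive decisions (bail)
  \mathrm{AR} = \frac{1}{n} \sum_{i=1}^{n} T_i,
  \qquad
  % FR: T_i (1 - Y_i) equals 1 only if the defendant was bailed and offended
  \mathrm{FR} = \frac{1}{n} \sum_{i=1}^{n} T_i\,(1 - Y_i),
\]
where the factor $T_i$ encodes that only bailed defendants can contribute a failure. Note that the denominator of FR counts all decisions, not only the positive ones: for example, if 10 defendants are decided, 6 are bailed, and 2 of those offend, then $\mathrm{AR} = 0.6$ and $\mathrm{FR} = 0.2$.
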
%That means that a failure rate of zero can be achieved just by not giving any positive decisions but that is not the ultimate goal.
The goal is to estimate the FR at any given AR for any decision maker $D$. The difficulty occurs when $D$ decides to bail a defendant who was jailed in the data: we cannot directly observe whether that defendant would have offended.
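
To make the difficulty explicit, let $T_i$ be the decision recorded in the data for the $i$th of $n$ defendants, let $Y_i$ be the outcome that defendant would have if released (observed only when $T_i = 1$), and let $T_i^{D}$ be the decision that $D$ would make for the same defendant; this indexed notation is hypothetical and introduced for illustration. Splitting the failures of $D$ according to the recorded decision gives
\[
  % First sum: D and the data both bail, so Y_i is observed.
  % Second sum: D bails but the data jailed, so Y_i is unobserved.
  \mathrm{FR}(D) = \frac{1}{n} \sum_{i:\, T_i^{D}=1,\, T_i=1} (1 - Y_i)
  \;+\; \frac{1}{n} \sum_{i:\, T_i^{D}=1,\, T_i=0} (1 - Y_i).
\]
The first term can be computed from the data, whereas the second depends on outcomes that were never observed, so any estimate of $\mathrm{FR}(D)$ has to account for it. By contrast, $\mathrm{AR}(D) = \frac{1}{n}\sum_{i=1}^{n} T_i^{D}$ needs only the decisions of $D$.
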
% Given the selective labeling of data and the latent confounders present, our goal is to create an evaluator module that can output a reliable estimate of a given decider module's performance. We use acceptance rate and failure rate as measures against which we compare our evaluators because they have direct and easily understandable counterparts in the real world / applicable domains. The evaluator module should be able to accurately estimate the failure rate for all levels of leniency and all data sets.
%The "eventual goal" is to create such an evaluator module that it can outperform (have a lower failure on all levels of acceptance rate) the deciders in the data generating process. The problem is of course comparing the performance of the deciders. We try to address that.