%!TEX root = sl.tex
% The above command helps compiling in TexShop on a Mac. Hitting typeset compiles sl.tex directly instead of producing an error here.
    
    
    \section{Setting and problem statement}
    
    \label{sec:setting}
    
    %\note{Antti}{Lakkaraju had many decision makers. Can we have just one or do we run into trouble somewhere? Perhaps need to add judge(s) at places.}  
    
    
    
    %The setting we consider is described in terms of {\it two decision processes}.
    
    %We consider a setting where a set of decision makers $\humanset=\{\human_\judgeValue\}$ make decisions for a set of cases.
    %NO WE DO NOT CONSIDER THIS SETTING, THE SETTING IS THAT WE HAVE TO EVALUATE M BASED ON DATA
    
    We consider data recorded from a decision making process with the following characteristics~\cite{lakkaraju2017selective}.
    
Each case is decided by one decision maker, and we use $\judge$ as an index to the decision maker to whom the case is assigned.
    
    For each such assignment, a decision maker $\human_\judgeValue$ considers a case described by a set of features \allFeatures and makes a binary decision $\decision \in\{0, 1\}$, nominally referred to as {\it positive} ($\decision = 1$) or {\it negative} ($\decision = 0$).
    %
    Intuitively, in our bail-or-jail example of Section~\ref{sec:introduction}, $\human_\judgeValue$ corresponds to the human judge deciding whether to grant bail ($\decision = 1$) or not ($\decision = 0$).
    
The decision is followed by a binary outcome $\outcome$, which is nominally referred to as {\it successful} ($\outcome = 1$) or {\it unsuccessful} ($\outcome = 0$).
    
    An outcome can be {\it unsuccessful} ($\outcome = 0$) only if the decision that preceded it was positive ($\decision = 1$).
    
    If the decision was not positive ($\decision = 0$), then the outcome is considered by default successful ($\outcome = 1$).
    
Back in our example, the outcome is unsuccessful only if the judge grants bail ($\decision = 1$) and the defendant violates its terms ($\outcome = 0$).
    
    Otherwise, if the decision of the judge was to keep the defendant in jail ($\decision = 0$), the outcome is by default successful ($\outcome = 1$) since there can be no bail violation.
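Stated compactly, this convention (a restatement of the above, not an additional modeling assumption) reads
\begin{equation}
	P(\outcome = 1~|~\decision = 0) = 1 .
\end{equation}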
    
    %Moreover, we assume that decision maker \human is associated with a leniency level $\leniency$, which determines the fraction of cases for which they produce a positive decision, in expectation. 
    
    %Formally, for leniency level $\leniency = r\in [0, 1]$, we have
    %\begin{equation}
    %	P(\decision = 1 | \leniency = \leniencyValue) = \sum_{\allFeatures} P(\decision = 1, \allFeatures~|~\leniency = \leniencyValue) = \leniencyValue .
    %\end{equation}
    %Antti I think this formula is mostly misleading
    
    %
    % This is useful when we model decision makers with different leniency levels or want to refer to the subjects each makes a decision for.
    
For each case, a record $(\judgeValue, \obsFeaturesValue, \decisionValue, \outcomeValue)$ is produced that contains observations of only a subset $\obsFeatures\subseteq \allFeatures$ of the features of the case, together with the decision $\decision$ of the judge and the outcome $\outcome$ -- but leaves no trace of the remaining features $\unobservable = \allFeatures \setminus \obsFeatures$.
    
    Intuitively, in our example, $\obsFeatures$ corresponds to publicly recorded information about the bail-or-jail case decided by the judge (e.g., the gender and age of the defendant) and $\unobservable$ corresponds to features that are observed by the judge but do not appear on record (e.g., whether the defendant appeared anxious in court).
    
The set of records $\dataset = \{(\judgeValue, \obsFeaturesValue, \decisionValue, \outcomeValue)\}$ %produced by decision maker \human
comprises what we refer to as the {\bf dataset}.
    
    % -- and the dataset generally includes records from \emph{more than one} decision makers, indexed by $\judgeValue$.
    
    Figure~\ref{fig:causalmodel} shows the causal diagram of this decision making process.
    
    Based on the recorded data, we wish to evaluate a decision maker \machine that considers a case from the dataset -- and makes its own binary decision $\decision$ based on recorded features $\obsFeatures$.
    %, followed by a binary outcome $\outcome$.
    
In our example, \machine corresponds to a machine-based, automated decision system that is being considered as a replacement for the human judge in bail-or-jail decisions.
    
    % Notice that we assume \machine has access only to some of the features that were available to \human, to model cases where the system would use only the recorded features and not other ones that would be available to a human judge.
    
    For decision maker \machine, the definition and semantics of decision $\decision$ and outcome $\outcome$ are the same as for decision makers \humanset, described above.
    
    The quality of a decision maker $\machine$ is measured in terms of its {\bf failure rate} \failurerate -- i.e., the fraction of undesired outcomes ($\outcome=0$) out of all the cases for which a decision is made. 
    
A good decision maker achieves as low a failure rate \failurerate as possible.

Note, however, that a decision maker that always makes a negative decision ($\decision=0$) has failure rate $\failurerate = 0$, by definition.
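In terms of probabilities over cases, and using the convention that a negative decision always yields a successful outcome, the failure rate \failurerate equals the probability of a positive decision followed by an unsuccessful outcome,
\begin{equation}
	P(\outcome = 0) = P(\decision = 1, \outcome = 0) ,
\end{equation}
which makes the above observation immediate: if $P(\decision = 1) = 0$, the failure rate is zero regardless of how the cases would turn out under positive decisions.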
    
    %To produce sensible evaluation of decision maker at varying leniency levels.
    %Moreover, decision maker \machine is also associated with a leniency level $\leniency$, defined as before for \human.
    %
    
Thus, for the evaluation to be meaningful, we evaluate decision makers at different levels of {\it leniency} $\leniency$, i.e., the fraction of cases for which they make a positive decision, in expectation.
    
    %Ultimately, our goal is to obtain an estimate of the failure rate \failurerate for a decision maker \machine.
    \begin{problem}[Evaluation]
    
Given a dataset $\dataset = \{(\judgeValue, \obsFeaturesValue, \decisionValue, \outcomeValue)\}$ and a decision maker \machine, provide an estimate of the failure rate \failurerate at a given leniency level $\leniency = \leniencyValue$.
    
    \label{problem:the}
    \end{problem}
    \noindent
    
The main challenge in estimating \failurerate is that, in general, the dataset does not directly provide the information required to evaluate it.
    
In particular, suppose that we wish to evaluate decision maker \machine, and that \machine makes a decision for a case from the dataset
%$(\judgeValue, \obsFeaturesValue, \decisionValue, \outcomeValue)$,
based on its recorded features \obsFeaturesValue.
    
Suppose also that the decision in the data was negative, $\decision = 0$, in which case the recorded outcome is successful by default, $\outcome = 1$.
    
    If the decision by \machine is $\decision = 1$, then it is not possible to tell directly from the dataset what its outcome $\outcome$ would be.
    
The approach we take to deal with this challenge is to use counterfactual reasoning to infer the outcome $\outcome$ that would have been observed had the decision been $\decision = 1$, as detailed in Section~\ref{sec:imputation} below.
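As an illustration, in standard counterfactual notation, writing $\outcome_{\decision = 1}$ for the outcome under a (possibly contrary-to-fact) positive decision, the quantity that needs to be inferred for such a record is
\begin{equation}
	P(\outcome_{\decision = 1} = 0~|~\obsFeatures = \obsFeaturesValue, \decision = 0) ,
\end{equation}
which cannot be read off the data directly, since records with $\decision = 0$ never contain an observed outcome for a positive decision.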
    
    
    
    \begin{figure}
    
    %    \begin{tikzpicture}[->,>=stealth',node distance=1.5cm, semithick]
    %
    %  \tikzstyle{every state}=[fill=none,draw=black,text=black]
    %
    %  \node[state] (R)                    {$\judge$};
    %  \node[state] (X) [right of=R] {$\obsFeatures$};
    %  \node[state] (T) [below of=X] {$\decision$};
    %  \node[state] (Z) [rectangle, right of=X] {$\unobservable$};
    %  \node[state] (Y) [below of=Z] {$\outcome$};
    %
    %  \path (R) edge (T)
    %        (X) edge (T)
    %	     edge (Y)
    %        (Z) edge (T)
    %	     edge (Y)
    %        (T) edge (Y);
    
    %\end{tikzpicture}
    %    \begin{tikzpicture}[->,>=stealth',node distance=1.5cm, semithick]
    %
    %  \tikzstyle{every state}=[fill=none,draw=black,text=black]
    %
    %  \node[state] (R)  at (1,1)                  {$\judge$};
    %  \node[state] (X) at (3.5,1) {$\obsFeatures$};
    %  \node[state] (T) at (2,0) {$\decision$};
    %  \node[state] (Z) [rectangle] at (3.5,2) {$\unobservable$};
    %  \node[state] (Y) at (5,0) {$\outcome$};
    %
    %  \path (R) edge (T)
    %        (X) edge (T)
    %	     edge (Y)
    %        (Z) edge (T)
    %	     edge (Y)
    %        (T) edge (Y);
    
    %\end{tikzpicture}
    
        \begin{tikzpicture}[->,>=stealth',node distance=1.5cm, semithick]
    
      \tikzstyle{every state}=[fill=none,draw=black,text=black]
    
    
      \node[state] (R) [ellipse] at (0.3,1.5)                  {\hspace*{-4mm}$\judge$: {\small Decision maker index }\hspace*{-4mm}};
    
      \node[state] (X) [ellipse] at (3.5,1) {\hspace*{-3mm}$\obsFeatures$: {\small Observed features}\hspace*{-3mm}};
    
      \node[state] (T) [ellipse] at (2,0) {\hspace*{-2mm}$\decision$: {\small Decision }\hspace*{-2mm}};
      \node[state] (Z) [rectangle] at (3.5,2) {$\unobservable$: {\small Unobserved features}};
      \node[state] (Y) [ellipse] at (5,0) {\hspace*{-2mm}$\outcome$: {\small Outcome}\hspace*{-2mm}};
    
    
      \path (R) edge (T)
            (X) edge (T)
    	     edge (Y)
    
           % (Z) edge (T)
    
    	     edge (Y)
            (T) edge (Y);
    
            \draw [->] (2.2,1.6) to (1.8,0.4);
             \draw [->] (4.8,1.6) to (5.2,0.4);
    
    \end{tikzpicture}
    
    \caption{The causal diagram of decision making in the selective labels setting. $\decision$ is a binary decision, $\outcome$ is the outcome that is selectively labeled based on $\decision$. Background features  $\obsFeatures$ for a subject affect the decision and the outcome. $\judge$ specifies the decision maker assignment, allowing us to model several decision makers with varying leniency. Importantly, decisions and outcomes may depend on additional latent background features  $\unobservable$ not recorded in the data.
    
    %visible only to decision maker \human. 
    }\label{fig:causalmodel}
    \end{figure}
    
    
    
    %Sometimes, we may have control over the leniency level of the decision maker we evaluate.
    
    %In such cases, we would like to evaluate decision maker $\machine = \machine(\leniency = \leniencyValue)$ at various leniency levels $\leniency$.
    
    %Ideally, the estimate returned by the evaluation should also be accurate for all levels of leniency.