\section{Setting and problem statement}
    
    \begin{figure*}
    
    \includegraphics[height=2in]{img/setting}
    
\caption{Setting. Negative decisions by decision maker $M$ ($T_{_M} = 0$) are evaluated as successful ($Y_{_M} = 1$) -- this is shown with the dashed arrows. For negative decisions by decision maker $H$ ($T_{_H} = 0$), we perform counterfactual imputation -- this is shown with the solid arrows.
    }
    \end{figure*}
    
    The setting we consider is described in terms of two decision processes.
    %
    
In the first one, a decision maker $H$ considers a case described by a set of features $F$ and makes a binary decision $T = T_{_H} \in \{0, 1\}$, nominally referred to as {\it positive} ($T = 1$) or {\it negative} ($T = 0$).
    
    %
    Intuitively, in our bail-or-jail example of Section~\ref{sec:introduction}, $H$ corresponds to the human judge deciding whether to grant bail ($T = 1$) or not ($T = 0$).
    %
    
    The decision is followed with a binary outcome $Y = Y_{_H}$, which is nominally referred to as {\it successful} ($Y = 1$) or not ($Y = 0$).
    
    %
An outcome can be {\it unsuccessful} ($Y = 0$) only if the decision that preceded it was positive ($T = 1$).
    %
    If the decision was not positive ($T = 0$), then the outcome is considered by default successful ($Y = 1$).
    %
Back in our example, the decision of the judge is unsuccessful only if the judge grants bail ($T = 1$) but the defendant violates the bail terms ($Y = 0$).
    %
    Otherwise, if the decision of the judge was to keep the defendant in jail ($T = 0$), the outcome is by default successful ($Y = 1$) since there can be no violation.
    %
    Moreover, we assume that decision maker $H$ is associated with a leniency level $R$, which determines the fraction of cases for which they produce a positive decision, in expectation. 
    %
    Formally, for leniency level $R = r\in [0, 1]$, we have
    \begin{equation}
    	P(T = 1 | R = r) = \sum_{X, Z} P(T = 1, X, Z | R = r) = r .
    \end{equation}
    
The output of this process is a record $(X, T, Y)$ that contains only a subset $X \subseteq F$ of the features of the case, the decision $T$ of the decision maker, and the outcome $Y$ -- but leaves no trace of the remaining features $Z = F \setminus X$.
    
    Intuitively, in our example, $X$ corresponds to publicly recorded information about the bail-or-jail case decided by the judge (e.g., the gender and age of the defendant) and $Z$ corresponds to features that are observed by the judge but do not appear on record (e.g., whether the defendant appeared anxious).
    
The set of records $\{(H, X, T, Y)\}$ produced by decision maker $H$ becomes part of what we refer to as the {\bf dataset} -- and the dataset may include records from more than one decision maker.
    
    Figure~\ref{fig:model} shows the causal diagram that describes the operation of a single decision-maker $H$.
    
In the second decision process, a decision maker $M$ considers a case from the dataset, described by the set of recorded features $X$, and makes its own binary decision $T = T_{_M}$ based on those features, followed by a binary outcome $Y = Y_{_M}$.
    
    %
    In our example, $M$ corresponds to an automated-decision system that is considered for replacing the human judge in bail-or-jail decisions.
    %
    
    % Notice that we assume $M$ has access only to some of the features that were available to $H$, to model cases where the system would use only the recorded features and not other ones that would be available to a human judge.
    
    %
    The definitions and semantics of decision $T$ and outcome $Y$ follow those of the first process and are not repeated.
    %
    Moreover, decision maker $M$ is also associated with a leniency level $R$, defined as before for $H$.
    
    %
Figure~\ref{fig:machine_model} shows the causal diagram that describes the operation of decision maker $M$.
    
    \todo{Michael}{Show diagram for machine decider, also. The setting can be summarized in one figure.}
\note{Michael}{I changed the notation and now refer to the two decision makers as $H$ and $M$, for "human" and "system", respectively.}
    
    
    \subsection{Evaluating Decision Makers}
    
    
The goodness of a decision maker is measured in terms of its failure rate {\bf FR} -- i.e., the fraction of unsuccessful outcomes ($Y=0$) out of all the cases for which a decision is made.
    %
A good decision maker achieves as low a failure rate FR as possible.
    %
Note, however, that a decision maker that always makes a negative decision ($T=0$) has failure rate $FR = 0$, by definition.
    %
    For comparisons to be meaningful, we compare decision makers at the same leniency level $R$.
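For concreteness, and in the notation of the leniency equation above, the failure rate of a decision maker at leniency level $R = r$ can be written as
\begin{equation}
	P(Y = 0 | R = r) = \sum_{X, Z} P(Y = 0, T = 1, X, Z | R = r) ,
\end{equation}
where the sum involves only positive decisions ($T = 1$), since an unsuccessful outcome can occur only after a positive decision.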
    
    
The main challenge in estimating FR, however, is that, in general, the dataset does not directly provide a way to evaluate it.
    %
    
    In particular, let us consider the case where we wish to evaluate decision maker $M$ -- and suppose that $M$ is making a decision $T_{_M}$ for the case corresponding to record $(H, X, T_{_H}, Y_{_H})$.
    
    %
Suppose also that the decision by $H$ was $T_{_H} = 0$, in which case the outcome is by default successful, $Y_{_H} = 1$.
    %
    
If the decision by $M$ is $T_{_M} = 1$, then it is not possible to tell directly from the dataset what the outcome $Y_{_M}$ would have been in the hypothetical case where decision maker $M$'s decision had been followed in the first place.
    
    The approach we take to deal with this challenge is to use counterfactual reasoning to infer $Y_{_M}$.
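As a sketch in potential-outcome notation -- writing $Y_{T = 1}$ for the outcome that would follow a positive decision -- the quantity to be inferred is the counterfactual expectation
\begin{equation}
	\mathbb{E}\left[ Y_{T = 1} \,|\, X, T_{_H} = 0 \right] ,
\end{equation}
where conditioning on $T_{_H} = 0$ retains the information that the negative decision of $H$ carries about the unobserved features $Z$.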
    
    Ultimately, our goal is to obtain an estimate of the failure rate FR for a decision maker $M = M(r)$ that is associated with a given leniency level $R = r$:
    
    \begin{problem}[Evaluation]
    
Given a dataset $\{(H, X, T, Y)\}$, and a decision maker $M$, provide an estimate of the failure rate FR of $M$.
    
    \end{problem}
    \noindent
    
    \mcomment{I think that leniency does not need to be part of the problem formulation, since imputation allows us to evaluate a decision maker even if we do not know its leniency level.}
    
    Sometimes, we may have control over the leniency level of the decision maker we evaluate.
    %
    In such cases, we would like to evaluate decision maker $M$ at various leniency levels.
    
    Ideally, the estimate returned by the evaluation should also be accurate for all levels of leniency.
    
    \todo{Michael}{Create and use macros for all main terms and mathematical quantities, so that they stay consistent throughout the paper.}