Make the judge identity part of the dataset

Commit b686be12 authored 5 years ago by Michael Mathioudakis
--- a/paper/setting.tex
+++ b/paper/setting.tex
@@ -29,9 +29,9 @@ The product of this process is a record $(X, T, Y)$ that contains only a subset
 %
 Intuitively, in our example, $X$ corresponds to publicly recorded information about the bail-or-jail case decided by the judge (e.g., the gender and age of the defendant) and $Z$ corresponds to features that are observed by the judge but do not appear on record (e.g., whether the defendant appeared anxious).
 %
-The set of records $\{(X, T, Y)\}$ priduced by decision maker $H$ constitute what we refer to as the {\bf dataset}.
+The set of records $\{(H, X, T, Y)\}$ produced by decision maker $H$ becomes part of what we refer to as the {\bf dataset} -- and the dataset may include records from more than one decision maker.
 %
-Figure~\ref{fig:model} shows the causal diagram that describes the operation of decision-maker $H$.
+Figure~\ref{fig:model} shows the causal diagram that describes the operation of a single decision-maker $H$.



@@ -67,7 +67,7 @@ For comparisons to be meaningful, we compare decision makers at the same lenienc

 The main challenge in estimating FR, however, is that in general the dataset does not directly provide a way to evaluate FR.
 %
-In particular, let us consider the case where we wish to evaluate decision maker $M$ -- and suppose that $M$ is making a decision $T_{_M}$ for the case corresponding to record $(X, T_{_H}, Y_{_H})$.
+In particular, let us consider the case where we wish to evaluate decision maker $M$ -- and suppose that $M$ is making a decision $T_{_M}$ for the case corresponding to record $(H, X, T_{_H}, Y_{_H})$.
 %
 Suppose also that the decision by $H$ was $T_{_H} = 0$, in which case the outcome is always positive, $Y_{_H} = 1$.
 %
@@ -77,9 +77,12 @@ The approach we take to deal with this challenge is to use counterfactual reason

 Ultimately, our goal is to obtain an estimate of the failure rate FR for a decision maker $M = M(r)$ that is associated with a given leniency level $R = r$:
 \begin{problem}[Evaluation]
-Given a dataset $\{(X, T, Y)\}$, and a decision maker $M(r)$ with leniency $R = r$, provide an estimate of the failure rate FR.
+Given a dataset $\{(H, X, T, Y)\}$, and a decision maker $M$, provide an estimate of the failure rate FR.
 \end{problem}
 \noindent
+\mcomment{I think that leniency does not need to be part of the problem formulation, since imputation allows us to evaluate a decision maker even if we do not know its leniency level.}
+Typically, we would like to evaluate decision maker $M$ at various leniency levels.
+%
 Ideally, the estimate returned by the evaluation should also be accurate for all levels of leniency.

 \todo{Michael}{Create and use macros for all main terms and mathematical quantities, so that they stay consistent throughout the paper.}