@@ -9,7 +9,7 @@ In this paper we considered the overall setting as formulated by~\citet{lakkaraj
%In addition to Lakkaraju et al.~\citet{lakkaraju2017selective} which we build upon, several papers consider related problems to ours.
Note that our setting, which allows for unobserved confounding, does not fulfill the ignorability or missing at random (MAR) assumptions, preventing the use of methods that rely on them~\cite{lakkaraju2017selective,DBLP:conf/icml/DudikLL11,bang2005doubly,little2019statistical}.
De-Arteaga et al. also note the possibility of using decisions in the data to correct for selective labels, assuming expert consistency~\cite{dearteaga2018learning}. They directly impute decisions as outcomes and consider learning automatic decision makers from this augmented data. In contrast, our approach to decision maker evaluation is based on a rigorous causal model accounting for different leniencies and unobservables. Furthermore, our approach gives accurate results even with random decision makers that clearly violate the expert consistency assumption. \citet{kleinberg2018human} present a detailed discussion of employing contraction on real data.% and a particular type of imputation.
In reinforcement learning, a similar scenario is considered as offline policy evaluation, where the objective is to estimate, from data recorded under some policy, the goodness of other policies \cite{Jung2,DBLP:conf/icml/ThomasB16}. In particular, Jung et al.~\cite{Jung2,jung2018algorithmic} consider sensitivity analysis in a scenario similar to ours, but without directly modelling judges with multiple leniencies.
...
...
@@ -25,7 +25,8 @@ To properly assess decision procedures for their performance and fairness we nee
More applied work includes~\cite{murder,tolan2019why}.
\cite{madras2019fairness} learn fair and accurate treatment policies from biased data.
% \acomment{They cite Lakkaraju, De arteaga as conceptually similar work.}
\cite{coston2020counterfactual} propose counterfactual performance metrics together with doubly robust estimation of these metrics. Their approach assumes the absence of unobserved variables.