%
% The theory of causal inference~\cite{pearl2010introduction} gives us the tools to address this task in a principled manner.
There is a rich literature on problems that arise in similar settings, and our specific problem can be approached from several different viewpoints.
%
% COUNTERFACTUALS: FUNDAMENTAL PROBLEM
At its core, our task is to answer a `what-if' question: ``what would the outcome have been if a different decision had been made?'' The fact that such counterfactual outcomes can never be observed directly is often referred to as the `fundamental problem' of causal inference~\cite{holland1986statistics, bookofwhy}.
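%
To make the question concrete, consider standard potential-outcome notation (used here only for illustration; the notation is ours and not necessarily that of the cited works): for a subject $i$ who receives a binary decision $T_i \in \{0, 1\}$ and has potential outcomes $Y_i(0)$ and $Y_i(1)$, the data reveal only
\[
Y_i = Y_i(T_i),
\]
while the counterfactual outcome $Y_i(1 - T_i)$ is never observed for the same subject.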
% SELECTION BIAS
Settings where data samples are chosen through some intricate filtering mechanism are said to exhibit {\it selection bias} (see, for example, \citet{hernan2004structural}).
% MISSING DATA %IMPUTATION
Settings where some variables are not observed for all samples are said to have {\it missing data}.
%Research on selection bias has achieved results in recovery the structure of the generative model (i.e., the mechanism that results in bias) and estimating causal effects (e.g.,~\citet{pearl1995empirical} and~\citet{bareinboim2012controlling}).
%OFFLINE POLICY EVALUATION
Offline policy evaluation refers to assessing a decision policy over a dataset that was recorded under a different policy~\cite{Jung2}.
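%
As a simple illustration of that setting (given only as an example; it is not the method we propose, and the notation is ours), the standard inverse-propensity-scoring estimator evaluates a target policy $\pi$ on data $\{(x_i, a_i, y_i)\}_{i=1}^{n}$ logged under a policy $\mu$ as
\[
\hat{V}_{\mathrm{IPS}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)} \, y_i ,
\]
which reweights the logged outcomes by how likely each logged action would have been under the policy being evaluated.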
%CONFOUNDING AND SENSITIVITY ANALYSIS
Recently, \citet{lakkaraju2017selective} referred to the problem of evaluation in such settings as the `{\it selective labels problem}', emphasizing the fact that the outcomes in the data are only selectively labeled
(see also \cite{dearteaga2018learning,kleinberg2018human}).
%
\citet{lakkaraju2017selective} also presented {\it contraction}, a method for evaluating decision making mechanisms in a setting where subjects are randomly assigned to decision makers with varying leniency levels.
%
The {\it contraction} technique takes advantage of the assumed random assignment and variance in leniency: essentially it measures the performance of the evaluated system using the cases of the most lenient judge.
%
We note, however, that for contraction to work, the data must include lenient decision makers who have each made decisions on a large number of subjects.
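%
For completeness, contraction can be sketched as follows (our paraphrase and notation, not a verbatim restatement of~\citet{lakkaraju2017selective}): let $\mathcal{D}_q$ denote the subjects assigned to the most lenient judge $q$ and $\mathcal{R}_q \subseteq \mathcal{D}_q$ those the judge released; to evaluate a predictive model at acceptance rate $r \leq |\mathcal{R}_q| / |\mathcal{D}_q|$, the model ranks the subjects in $\mathcal{R}_q$ by predicted risk, keeps the set $\mathcal{R}_r$ of the $\lfloor r \cdot |\mathcal{D}_q| \rfloor$ lowest-risk ones, and the failure rate is estimated as
\[
\widehat{\mathrm{FR}}(r) = \frac{|\{ i \in \mathcal{R}_r : y_i = 0 \}|}{|\mathcal{D}_q|},
\]
where $y_i = 0$ denotes a negative outcome.
The requirement $r \leq |\mathcal{R}_q| / |\mathcal{D}_q|$ is what makes a lenient judge with many subjects necessary, as noted above.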
%
In another recent paper, \citet{jung2018algorithmic} studied unobserved confounding in the context of creating optimal decision policies.
%
They approached the problem with Bayesian modelling, but they did not consider the selective labels issue or the possibility that the decisions in the data are made by multiple decision makers with differing levels of leniency.
\spara{Our contributions}
In this paper, we build upon the problem setting used in~\citet{lakkaraju2017selective} and present a novel, modular framework to evaluate decision makers over selectively labeled data.
...
...
%
We experiment with synthetic data to highlight various properties of our approach.
%
We also perform an empirical evaluation in realistic settings, using real recidivism data from COMPAS~\cite{angwin2016machine,brennan2009evaluating}.
%
The results indicate that our method produces more accurate estimates with considerably less variation than the state of the art, and, unlike the contraction approach that is tailored to this setting~\cite{lakkaraju2017selective}, it does not depend on the existence of lenient decision makers in the data.