%!TEX root = sl.tex
% The above command helps compiling in TexShop on a Mac. Hitting typeset compiles sl.tex directly instead of producing an error here.
    
    
    \section{Related work}
    \label{sec:related}
    
    
De-Arteaga et al. also note the possibility of using the decisions in the data to correct for selective labels, assuming expert consistency~\cite{dearteaga2018learning}. They directly impute decisions as outcomes and consider learning automatic decision makers. In contrast, our approach to decision-maker evaluation is based on a rigorous probabilistic model that accounts for differing leniencies and unobservables. Furthermore, our approach gives accurate results even with random decision makers, which clearly violate the expert-consistency assumption. \acomment{We should refer to De-Arteaga somewhere early on; they have made the same discovery as we have, but presented it poorly.}
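The data-augmentation idea of De-Arteaga et al. can be sketched as follows. This is an illustrative reading of their description, not their implementation: the function names, the probability model \texttt{p\_accept}, and the threshold \texttt{eps} are hypothetical stand-ins.

```python
def augment(data, p_accept, eps=0.1):
    """Impute negative outcomes for consistently rejected cases.

    `data` is a list of (x, t, y) records: features x, human decision t
    (1 = accept), and outcome y, which is None whenever t == 0.
    `p_accept(x)` is an (assumed already fitted) model of the probability
    that a human accepts a case with features x.  Under expert consistency,
    cases that humans almost surely reject (p_accept(x) < eps) are taken
    to have a negative outcome, so we impute y = 0 for them.
    """
    out = []
    for (x, t, y) in data:
        if t == 0 and y is None and p_accept(x) < eps:
            out.append((x, t, 0))  # impute decision as (negative) outcome
        else:
            out.append((x, t, y))  # keep record as-is
    return out
```

The augmented data can then be used to train an automatic decision maker without discarding the rejected cases.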
    
    
    \subsection{Counterfactuals}
    
Recent research has shown the value of counterfactual reasoning in settings similar to ours, for fairness of decision making and for applications in online advertising~\cite{DBLP:journals/jmlr/BottouPCCCPRSS13,DBLP:conf/icml/Kusner0LS19,DBLP:conf/icml/NabiMS19,DBLP:conf/icml/JohanssonSS16,pearl2000}.
    
    
    
McCandless et al. perform a Bayesian sensitivity analysis on the priors of a model similar to ours, employing logistic regression~\cite{mccandless2007bayesian}.
    
    
    \subsection{Imputation}
    
    
    
    
    
    \subsection{Older}
    
    
    
    
    %
Although contraction is computationally simple and efficient, and estimates the true failure rate well, it has some limitations.
%
The first and perhaps most significant limitation is that the algorithm can only be applied when data from multiple decision makers is available.
%
Second, the performance of contraction depends on three quantities: the leniency of the most lenient decision maker, the number of decisions given by that decision maker, and the agreement rate.
%
The agreement rate is the rate at which decision makers \machine and \human agree; the higher the agreement rate, the better contraction performs.
%
Third, contraction is only suitable for evaluating binary outcomes, whereas the counterfactual approach readily extends to real-valued outcomes.
%
In addition, contraction can estimate the true failure rate only up to the leniency of the most lenient decision maker: if in some high-stakes application the greatest acceptance rate is only 50\%, the performance of a machine can be evaluated only up to leniency $0.5$.
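For concreteness, the contraction estimator can be sketched as follows. This is a hedged reconstruction from the description in \cite{lakkaraju2017selective}, with illustrative variable names and data layout; it is not the authors' implementation.

```python
def contraction(data, r):
    """Estimate a machine decision maker's failure rate at leniency r.

    `data` is a list of (judge, t, y, s) records: judge id, human decision
    t (1 = accept), outcome y (observed only when t == 1, else None), and
    the machine's risk score s (higher = riskier).
    """
    # q: the most lenient judge, i.e. the one with the highest acceptance rate
    judges = {j for (j, t, y, s) in data}

    def acceptance_rate(j):
        decisions = [t for (jj, t, y, s) in data if jj == j]
        return sum(decisions) / len(decisions)

    q = max(judges, key=acceptance_rate)

    # D_q: all cases seen by q; R_q: those q accepted (outcome observed)
    D_q = [(t, y, s) for (j, t, y, s) in data if j == q]
    R_q = [(y, s) for (t, y, s) in D_q if t == 1]

    # Sort accepted cases so the machine's riskiest cases come first
    R_sorted = sorted(R_q, key=lambda ys: ys[1], reverse=True)

    # Remove from the top as many cases as needed so that the machine's
    # acceptance rate within D_q equals r; the remainder R_B is the set
    # the machine would accept at leniency r.
    n_remove = int((1.0 - r) * len(D_q)) - (len(D_q) - len(R_q))
    R_B = R_sorted[n_remove:]

    # Estimated failure rate: failures (y == 0) among the machine's
    # acceptances, relative to all cases seen by judge q
    return sum(1 for (y, s) in R_B if y == 0) / len(D_q)
```

Because outcomes are only observed for cases the most lenient judge accepted, the subtraction in \texttt{n\_remove} becomes negative once $r$ exceeds that judge's acceptance rate, which is exactly the leniency ceiling discussed above.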
    
    \note{Riku}{How detailed must/should the description of contraction be? Limitation list is also quite long...}
    
    
    
    
    \note{Michael}{It is not clear to me if we'll have a separate section and where.
    I think the safest option is to place a Related Work section immediately after the introduction.
    The disadvantage of that is that it may delay the presentation of the main contributions.
On the other hand, we should make sure that competing methods like \citet{lakkaraju2017selective} are sufficiently described before they appear in the experiments.
    }
    
    
    Discuss this: 
    
    
    \begin{itemize}
    \item Lakkaraju and contraction. \cite{lakkaraju2017selective}
    	\item Contraction
    		\begin{itemize}
\item Algorithm by Lakkaraju et al. It assumes that subjects are assigned to judges at random and requires that the judges differ in leniency.
		\item Can estimate the true failure rate only up to the leniency of the most lenient decision-maker.
    		\item Performance is affected by the number of people judged by the most lenient decision-maker, the agreement rate and the leniency of the most lenient decision-maker. (Performance is guaranteed / better when ...)
    		\item Works only on binary outcomes
    		\item (We show that our method isn't constrained by any of these)
    		\item The algorithm goes as follows...
    %\begin{algorithm}[] 			% enter the algorithm environment
    %\caption{Contraction algorithm \cite{lakkaraju17}} 		% give the algorithm a caption
    %\label{alg:contraction} 			% and a label for \ref{} commands later in the document
    %\begin{algorithmic}[1] 		% enter the algorithmic environment
    %\REQUIRE Labeled test data $\D$ with probabilities $\s$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
    %\ENSURE
    %\STATE Let $q$ be the decision-maker with highest acceptance rate in $\D$.
    %\STATE $\D_q = \{(x, j, t, y) \in \D|j=q\}$
    %\STATE \hskip3.0em $\rhd$ $\D_q$ is the set of all observations judged by $q$
    %\STATE
    %\STATE $\RR_q = \{(x, j, t, y) \in \D_q|t=1\}$
    %\STATE \hskip3.0em $\rhd$ $\RR_q$ is the set of observations in $\D_q$ with observed outcome labels
    %\STATE
    %\STATE Sort observations in $\RR_q$ in descending order of confidence scores $\s$ and assign to $\RR_q^{sort}$.
    %\STATE \hskip3.0em $\rhd$ Observations deemed as high risk by the black-box model $\mathcal{B}$ are at the top of this list
    %\STATE
    %\STATE Remove the top $[(1.0-r)|\D_q |]-[|\D_q |-|\RR_q |]$ observations of $\RR_q^{sort}$ and call this list $\mathcal{R_B}$
    %\STATE \hskip3.0em $\rhd$ $\mathcal{R_B}$ is the list of observations assigned to $t = 1$ by $\mathcal{B}$
    %\STATE
    %\STATE Compute $\mathbf{u}=\sum_{i=1}^{|\mathcal{R_B}|} \dfrac{\delta\{y_i=0\}}{| \D_q |}$.
    %\RETURN $\mathbf{u}$
    %\end{algorithmic}
    %\end{algorithm}
    
    
    
    
    		\end{itemize}
    \item Counterfactuals/Potential outcomes. \cite{pearl2010introduction} (also Rubin)
    \item Approach of Jung et al for optimal policy construction. \cite{jung2018algorithmic}
    
    	\begin{itemize}
    	\item Task: They study unobserved confounding in the context of creating optimal decision policies. (Mentioned in the intro)
	\item Contributions: (1) a Bayesian model to evaluate decision algorithms in the presence of unmeasured confounding. (2) They show that the policy evaluation problem they consider is a generalization of estimating heterogeneous treatment effects in observational studies. (3) They show that one can construct near-optimal decision algorithms even if there is unmeasured confounding.
	\item In contrast: They consider a 'bail or no-bail' scenario and construct a trade-off curve of the proportion of defendants failing to appear at trial vs. the proportion released without bail.
    	\item They approached the problem with Bayesian modelling, but they don't consider the selective labels issue where decisions can deterministically define the outcome. (in intro)
    	\item Additionally they don't consider the effect of having multiple decision-makers with differing levels of leniency. (in intro)
    	\end{itemize}
    \item Discussions of latent confounders in multiple contexts.
    
    %\item Classical Bayesian sensitivity analysis of \citet{mccandless2007bayesian}
    %	\begin{itemize}
%	\item Task: Bayesian sensitivity analysis of the effect of an unmeasured binary confounder on a binary response with a binary exposure variable and other measured confounders.
%	\item Experiments: The writers consider the effect of different priors on the coefficient estimates of logistic regression in a beta blocker therapy study.
    %	\item The authors carry out a more classical analysis of the effect of priors on the estimates. There are similarities, but there are also a lot of differences, most notably lack of selective labeling and a different model structure where the observed independent variables affect both the unobserved confounder and the result. In their model the unobserved only affects the outcome.
    %	\end{itemize}
    
    \item Imputation methods and other approaches to selective labels
    
    %\item Data augmentation approach by \citet{dearteaga2018learning}
    %	\begin{itemize}
    %	\item Task: Training predictive models to perform better under selective labeling utilizing the homogeneity of human decision makers. They base their approach on the notion that if decision makers consistently make a negative decision to some subjects they must be dangerous.
    %	\item Contributions: They propose a method for augmenting the selectively labeled data with observations that have a selection probability under some threshold $\epsilon$. I.e. For observations with $\decision=0$, predict $\prob{\decision~|~\obsFeatures}$, augment data so that $\outcome = 0$ when $\prob{\decision~|~\obsFeatures} < \epsilon$ instead of having missing values.
    %	\item In contrast: The writers assume no unobservable confounders affecting the outcome and focus only on the similarity of the assigned decisions given the features. Writers do not address the issue of leniency in their analysis.
    %	\end{itemize}
    
    \item Doubly robust methods, propensity score and other matching techniques