\section{Related work}
\label{sec:related}
%
Although contraction is computationally simple and efficient, and estimates the true failure rate well, it has some limitations.
%
The first and perhaps most significant limitation of the algorithm is that it can only be applied in situations where data from multiple decision makers is available.
%
Secondly, the performance of contraction depends on three quantities: the leniency of the most lenient decision maker, the number of decisions made by the most lenient decision maker, and the agreement rate.
%
The agreement rate describes how often decision makers \machine and \human agree: the higher the agreement rate, the better contraction performs.
%
Thirdly, contraction is only suitable for evaluating binary outcomes, whereas the counterfactual approach is readily extendable to accommodate real-valued outcomes.
%
In addition, contraction can estimate the true failure rate only up to the leniency of the most lenient decision maker: if, in some high-stakes application area, the greatest acceptance rate is only 50\%, the performance of a machine can be evaluated only up to leniency $0.5$.
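%
As a toy illustration of these quantities (purely our own, not part of contraction itself), the agreement rate and the maximum evaluable leniency could be computed as follows, assuming decisions are coded as $0/1$ vectors:
\begin{verbatim}
import numpy as np

# Toy data: decisions by the human and the machine on the same set of cases.
human_decisions   = np.array([1, 0, 1, 1, 0, 1, 0, 1])
machine_decisions = np.array([1, 0, 0, 1, 0, 1, 1, 1])

# Agreement rate: fraction of cases where the two decisions coincide (6/8 here).
agreement_rate = np.mean(human_decisions == machine_decisions)

# The leniency (acceptance rate) of the most lenient human decision maker
# caps the acceptance rates r at which contraction can evaluate the machine.
max_evaluable_leniency = human_decisions.mean()
\end{verbatim}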
\note{Riku}{How detailed must/should the description of contraction be? Limitation list is also quite long...}
\note{Michael}{It is not clear to me if we'll have a separate section and where.
I think the safest option is to place a Related Work section immediately after the introduction.
The disadvantage of that is that it may delay the presentation of the main contributions.
On the other hand, we should make sure that competing methods like \citet{lakkaraju2017selective} are sufficiently described before they appear in the experiments.
}
Discuss this: \cite{DBLP:conf/icml/Kusner0LS19}
\begin{itemize}
\item Lakkaraju and contraction. \cite{lakkaraju2017selective}
\item Contraction
\begin{itemize}
\item Algorithm by Lakkaraju et al. Assumes that the subjects are assigned to the judges at random and requires that the judges differ in leniency.
\item Can estimate the true failure rate only up to the leniency of the most lenient decision-maker.
\item Performance is affected by the number of people judged by the most lenient decision-maker, the agreement rate and the leniency of the most lenient decision-maker. (Performance is guaranteed / better when ...)
\item Works only on binary outcomes
\item (We show that our method isn't constrained by any of these)
\item The algorithm goes as follows (see also the Python sketch after this list)...
%\begin{algorithm}[] % enter the algorithm environment
%\caption{Contraction algorithm \cite{lakkaraju2017selective}} % give the algorithm a caption
%\label{alg:contraction} % and a label for \ref{} commands later in the document
%\begin{algorithmic}[1] % enter the algorithmic environment
%\REQUIRE Labeled test data $\D$ with probabilities $\s$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
%\ENSURE
%\STATE Let $q$ be the decision-maker with highest acceptance rate in $\D$.
%\STATE $\D_q = \{(x, j, t, y) \in \D|j=q\}$
%\STATE \hskip3.0em $\rhd$ $\D_q$ is the set of all observations judged by $q$
%\STATE
%\STATE $\RR_q = \{(x, j, t, y) \in \D_q|t=1\}$
%\STATE \hskip3.0em $\rhd$ $\RR_q$ is the set of observations in $\D_q$ with observed outcome labels
%\STATE
%\STATE Sort observations in $\RR_q$ in descending order of confidence scores $\s$ and assign to $\RR_q^{sort}$.
%\STATE \hskip3.0em $\rhd$ Observations deemed as high risk by the black-box model $\mathcal{B}$ are at the top of this list
%\STATE
%\STATE Remove the top $[(1.0-r)|\D_q |]-[|\D_q |-|\RR_q |]$ observations of $\RR_q^{sort}$ and call this list $\mathcal{R_B}$
%\STATE \hskip3.0em $\rhd$ $\mathcal{R_B}$ is the list of observations assigned to $t = 1$ by $\mathcal{B}$
%\STATE
%\STATE Compute $\mathbf{u}=\sum_{i=1}^{|\mathcal{R_B}|} \dfrac{\delta\{y_i=0\}}{| \D_q |}$.
%\RETURN $\mathbf{u}$
%\end{algorithmic}
%\end{algorithm}
\end{itemize}
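For reference, a minimal Python sketch of the contraction estimator of \cite{lakkaraju2017selective}, following the steps of the (commented-out) algorithm above; the data-frame column names and the function signature are our own assumptions, not notation from the original paper:
\begin{verbatim}
import pandas as pd

def contraction(D: pd.DataFrame, r: float) -> float:
    """Sketch of the contraction estimate of the machine's failure rate
    at acceptance rate r.

    Assumed columns of D:
      'judge'    -- identifier of the human decision maker
      'decision' -- t in {0, 1}; the outcome is observed only when t = 1
      'outcome'  -- y in {0, 1}; y = 0 marks a failure
      'score'    -- machine risk score s (higher means riskier)
    """
    # q: the most lenient decision maker, i.e. the one with the highest
    # acceptance rate in the data.
    leniency = D.groupby('judge')['decision'].mean()
    q = leniency.idxmax()

    D_q = D[D['judge'] == q]            # all cases judged by q
    R_q = D_q[D_q['decision'] == 1]     # q's cases with observed outcomes

    # Put the cases the machine deems riskiest at the top of the list.
    R_sort = R_q.sort_values('score', ascending=False)

    # "Contract" the accepted cases: drop the riskiest ones so that the
    # machine ends up accepting a fraction r of all of q's cases.
    n_drop = int((1.0 - r) * len(D_q)) - (len(D_q) - len(R_q))
    R_B = R_sort.iloc[max(n_drop, 0):]

    # Failure rate: failures among the machine's acceptances, over all of
    # q's cases.
    return (R_B['outcome'] == 0).sum() / len(D_q)
\end{verbatim}
Note that the number of cases to drop becomes negative as soon as $r$ exceeds the leniency of $q$, which is exactly why contraction cannot evaluate acceptance rates above the leniency of the most lenient decision maker.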
\item Counterfactuals/Potential outcomes. \cite{pearl2010introduction} (also Rubin)
\item Approach of Jung et al for optimal policy construction. \cite{jung2018algorithmic}
\item Discussions of latent confounders in multiple contexts (e.g., McCandless).
\item Imputation methods and other approaches to selective labels, e.g., \cite{dearteaga2018learning}
\item Doubly robust methods, propensity score and other matching techniques