%\acomment{We should be careful with the word bias and unbiased, they may refer to statistical bias of estimator, some bias in the decision maker based on e.g. race, and finally selection bias.}
Nowadays, many decisions that affect the course of human lives are being automated. In addition to lowering costs, computational models could enhance the accuracy and fairness of the decision-making process.
The advantage of using models does not necessarily lie in pure performance, in the sense that a machine can make more decisions, but rather in that a machine can provide bounds for its uncertainty, can learn from a vast set of information, and, with care, can be made as unbiased as possible.
However, before deploying any decision-making algorithms, they should be evaluated to show that they actually improve on previous, often human, decision making. This evaluation is far from trivial.
%Although, evaluating algorithms in conventional settings is trivial, when (almost) all of the labels are available, numerous metrics have been proposed and are in use in multiple fields.
Specifically, `selective labels' settings arise in situations where data are the product of a decision mechanism that prevents us from observing outcomes for part of the data \cite{lakkaraju2017selective}. As a typical example, consider bail-or-jail decisions in judicial settings: a judge decides whether to grant bail to a defendant based on whether the defendant is considered likely to violate bail conditions while awaiting trial -- and therefore a violation can occur only if bail is granted. Naturally, similar scenarios are observed in many applications, from economics to medicine.
%For example, given a data set of bail violations and bail/jail decision according some background factors, there will never be bail violations on those subjects kept in jail by the current decision making mechanism, hence the evaluation of a decision bailing such subjects is left undefined.
Such settings give rise to questions about the effect of alternative decision mechanisms -- e.g., `how many defendants would violate bail conditions if more bail decisions were granted?'. In other words, one faces the challenge of estimating the performance of an alternative, potentially automated, decision policy that might make different decisions than the ones found in the existing data.
%In settings like judicial bail decisions, some outcomes cannot be observed due to the nature of the decisions.
This can be seen as a complicated missing data problem, where the missingness of an item is connected with its outcome and the available labels are not a random sample of the true population. Lakkaraju et al. recently named this the selective labels problem \cite{lakkaraju2017selective}. Pearl calls the missingness of data under an alternative decision the `fundamental problem' of causal inference.
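To make the setting concrete, below is a minimal simulation sketch (our own illustration, not taken from \cite{lakkaraju2017selective}; all variable names are hypothetical). Outcomes exist for every defendant, but they are recorded only when bail is granted, so the labeled cases are systematically less risky than the full population.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

x = rng.normal(size=n)  # recorded background factors
z = rng.normal(size=n)  # information seen only by the decider

# True probability that a defendant would violate bail conditions.
p_violate = 1.0 / (1.0 + np.exp(-(x + z)))

# Deciders differ in leniency: a higher threshold also releases
# defendants with a higher risk of violation.
leniency = rng.uniform(0.2, 0.8, size=n)
released = p_violate < leniency        # decision: bail granted

# The outcome exists for everyone but is observed only if released.
y = rng.random(n) < p_violate
y_obs = np.where(released, y, np.nan)  # selective labels

# A naive estimate from the labeled cases alone understates the
# violation rate of the full population.
print(np.nanmean(y_obs), y.mean())
\end{verbatim}
In such data, any metric computed on the labeled cases alone evaluates the current decision mechanism; it says nothing directly about the hypothetical decisions of an alternative policy.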
\begin{itemize}
\item What we study
\begin{itemize}
\begin{itemize}
% \item %Fairness has been discussed in the existing literature and numerous publications are available for interested readers. Our emphasis on this paper is on pure performance, getting the predictions accurate.
\item Before deploying any algorithms, they should be evaluated to show that they actually improve on human decision making.
\item Evaluating algorithms in conventional settings, where (almost) all of the labels are available, is straightforward: numerous metrics have been proposed and are in use in multiple fields.
\end{itemize}
\item Present the setting and challenge:
\begin{itemize}
\item Specifically, `Selective labels' settings arise in situations where data are the product of a decision mechanism that prevents us from observing outcomes for part of the data.
\item A typical example is that of bail-or-jail decisions in judicial settings: a judge decides whether to grant bail to a defendant based on whether the defendant is considered likely to violate bail conditions while awaiting trial -- and therefore a violation can occur only if bail is granted. Naturally, similar scenarios are observed throughout many walks of life, from banking to medicine.
\item Such settings give rise to questions about the effect of alternative decision mechanisms -- e.g., `how many defendants would violate bail conditions if more bail decisions were granted?'.
\item In other words, one faces the challenge of estimating the performance of an alternative, potentially automated, decision policy that might make different decisions than the ones found in the existing data.
\item Characteristically, in many of these settings the decisions hiding the outcomes are made by different deciders.
\item Labels are missing non-randomly, and the decisions might be made by deciders who differ in leniency.
\item This can lead to a situation where subjects with the same characteristics are given different decisions due to the differing leniency (see the sketch after this list).
\item Of course, the differing decisions might also be attributable to some unobserved information that the decision maker had available due to meeting the subject.
\item The explainability of black-box models has been discussed in X. We don't discuss fairness.
\item In settings like judicial bail decisions, some outcomes cannot be observed due to the nature of the decisions. This results in a complicated missing data problem, where the missingness of an item is connected with its outcome and the available labels are not a random sample of the true population. Recently, this problem has been named the selective labels problem \cite{lakkaraju2017selective}.
\end{itemize}
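The varying leniency noted in the list above suggests one way to evaluate an alternative policy. The sketch below follows, in broad strokes, the contraction idea of \cite{lakkaraju2017selective}: take the cases handled by the most lenient decider, let the candidate policy keep only its lowest-risk released cases down to a target acceptance rate, and count the observed violations among them. The function and variable names are ours; this is a sketch under the stated assumptions, not a reference implementation.
\begin{verbatim}
import numpy as np

def contraction_failure_rate(risk, released, violated, target_rate):
    """Estimate a policy's failure rate at a target acceptance rate,
    using only the cases handled by the most lenient decider.

    Assumes target_rate does not exceed that decider's own
    acceptance rate, so that enough labeled cases are available.
    """
    n = len(risk)
    keep = int(target_rate * n)       # cases the policy would release
    order = np.argsort(risk)         # ascending predicted risk
    released_order = order[released[order]]  # released, lowest risk first
    kept = released_order[:keep]      # the policy's released set
    # Violations among the kept cases are observed, since the
    # decider released all of them.
    return violated[kept].sum() / n
\end{verbatim}
Here, \texttt{risk} could be the predictions of a model trained on the labeled cases, and \texttt{target\_rate} the leniency level at which the policy is compared with the human deciders.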
\item Related work