Commit b5eeb755 authored by Antti Hyttinen

intro.

parent 9025ae5c
%\newcommand{\ourtitle}{Working title: From would-have-beens to should-have-beens: Counterfactuals in model evaluation}
\newtheorem{problem}{Problem}
\newcommand{\ourtitle}{Evaluating Decision Makers over Selectively Labeled Data}
\input{macros}
\section{Introduction}
%\acomment{'Decision maker' sounds and looks much better than 'decider'! Can we use that?}
\acomment{We should be careful with the word bias and unbiased, they may refer to statistical bias of estimator, some bias in the decision maker based on e.g. race, and finally selection bias.}
Nowadays, many decisions that affect the course of human lives are being automated. In addition to lowering costs, computational models could enhance the decision-making process in both accuracy and fairness.
% \item
The advantage of using models does not necessarily lie in raw throughput (a machine can simply make more decisions), but rather in that a machine can provide bounds for its uncertainty, learn from a vast set of information and, with care, be made as unbiased as possible.
\begin{itemize}
\item What we study
\end{itemize}
\item Motivation for the study
\begin{itemize}
% \item %Fairness has been discussed in the existing literature and numerous publications are available for interested readers. Our emphasis on this paper is on pure performance, getting the predictions accurate.
\item Before deployment, algorithms should be evaluated to show that they actually improve on human decision-making.
\item Evaluating algorithms in conventional settings, where (almost) all of the labels are available, is straightforward: numerous metrics have been proposed and are in use in multiple fields.
\end{itemize}
\subsection{Decision Makers}
A decision maker $D(r)$ makes the decision $T$ based on the characteristics of the subject. We assume the decision maker is given a leniency level $r$, which defines the percentage of subjects for whom it makes a positive decision. A decision maker may be a human or a machine learning system. It seeks to predict the outcome $Y$ based on what it knows and then decides $T$ according to this prediction: a negative decision $T=0$ is preferred for subjects predicted to have a negative outcome $Y=0$, and a positive decision $T=1$ for subjects whose outcome is predicted to be positive, $Y=1$.
% We especially consider machine learning system that need to use similar data as used for the evaluation; they also need to take into account the selective labels issue.
In the bail-or-jail example, a decision maker seeks to jail ($T=0$) all dangerous defendants who would violate their bail ($Y=0$), but release those who would not. The leniency $r$ is then the proportion of bail decisions.
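As an illustration, a leniency-$r$ decision maker that ranks subjects by a predicted risk score could be sketched as follows. This is a minimal sketch only: the function \texttt{decide} and the example risk scores are hypothetical, not part of our formal setup.

```python
import numpy as np

def decide(risk_scores, r):
    """Toy leniency-r decision maker (illustrative sketch).

    risk_scores: predicted probability of the undesired outcome (Y=0)
    for each subject. r: fraction of subjects given a positive
    decision (T=1, e.g. bail).
    """
    risk_scores = np.asarray(risk_scores)
    n_positive = int(round(r * len(risk_scores)))
    order = np.argsort(risk_scores)      # safest subjects first
    decisions = np.zeros(len(risk_scores), dtype=int)
    decisions[order[:n_positive]] = 1    # bail the safest r fraction
    return decisions

# Four subjects with predicted risks; leniency r = 0.5 bails two of them.
T = decide([0.9, 0.1, 0.4, 0.2], r=0.5)  # -> array([0, 1, 0, 1])
```

Ranking by risk and thresholding at the $r$-quantile is one simple way to satisfy the constraint that exactly an $r$ fraction of decisions are positive.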
\subsection{Evaluating Decision Makers}
The goodness of a decision maker can be examined as follows.
%Acceptance rate (AR) is the number of positive decisions ($T=1$) divided by the number of all decisions.
%DO WE NEED ACCEPTANCE RATE ANY MORE
Failure rate (FR) is the number of undesired outcomes ($Y=0$) divided by the number of all decisions.
% One special characteristic of FR in this setting is that a failure can only occur with a positive decision ($T=1$).
%That means that a failure rate of zero can be achieved just by not giving any positive decisions but that is not the ultimate goal.
A good decision maker achieves as low a failure rate (FR) as possible at any given leniency level.
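With fully labeled data, the definition above amounts to a one-line computation; a minimal sketch with hypothetical outcomes:

```python
import numpy as np

# Hypothetical recorded outcomes, one per decision; Y = 0 is undesired.
Y = np.array([1, 0, 1, 1, 0, 1])
FR = np.mean(Y == 0)   # failure rate: #(Y = 0) / total decisions
# FR == 2/6
```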
However, the data we have does not directly provide a way to evaluate FR. If a decision maker decides $T=1$ for a subject that had $T=0$ in the data, the outcome $Y$ recorded in the data is based on the decision $T=0$, and hence $Y=1$ regardless of the decision taken by $D$. The number of negative outcomes $Y=0$ for these decisions therefore needs to be estimated in some non-trivial way.
In the example, the difficulty occurs when a decision maker decides to bail ($T=1$) a defendant who was jailed ($T=0$) in the data: we cannot directly observe whether the defendant would have offended or not.
Therefore, the aim here is to give an estimate of the FR of any decision maker $D$ at any given leniency $r$, formalized as follows:
\begin{problem}
Given selectively labeled data, and a decision maker $D(r)$, give an estimate of the failure rate FR for any leniency $r$.
\end{problem}
\noindent
This estimate is vital for deploying machine learning and AI systems into everyday use.
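The selective labels issue behind this problem can be demonstrated with a small simulation. The data-generating process below is entirely hypothetical and only illustrates why a naive FR estimate fails; it is not the estimator proposed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: each subject has an (unobserved) risk of the
# undesired outcome Y = 0, and a true outcome drawn from that risk.
risk = rng.uniform(size=n)                       # P(Y = 0)
y = np.where(rng.uniform(size=n) < risk, 0, 1)   # true outcome if bailed

# Decision maker in the data: bails (T=1) the half with lowest risk.
T_data = (risk < np.quantile(risk, 0.5)).astype(int)

# Selective labels: jailed subjects (T=0) cannot violate bail, so the
# data records Y = 1 for them regardless of their true outcome.
y_obs = np.where(T_data == 1, y, 1)

# Evaluate a more lenient decision maker D(r = 0.7) on this data.
T_eval = (risk < np.quantile(risk, 0.7)).astype(int)

fr_naive = np.mean((T_eval == 1) & (y_obs == 0))  # from recorded labels
fr_true = np.mean((T_eval == 1) & (y == 0))       # needs unobserved Y
# fr_naive misses the failures among subjects that D bails but the
# data jailed, so it underestimates the true failure rate.
```

In this sketch `fr_naive` is strictly below `fr_true`, which is exactly why the failure rate must be estimated in a non-trivial way.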