From 8fa4e1a91adab38387408e9b3a7f9648c92e6aa8 Mon Sep 17 00:00:00 2001
From: Riku-Laine <28960190+Riku-Laine@users.noreply.github.com>
Date: Mon, 5 Aug 2019 09:34:14 +0300
Subject: [PATCH] Outline to latex and some writing

---
 analysis_and_scripts/notes.tex |  2 +-
 paper/sl.tex                   | 12 ++++++++----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/analysis_and_scripts/notes.tex b/analysis_and_scripts/notes.tex
index d65ed4e..317c327 100644
--- a/analysis_and_scripts/notes.tex
+++ b/analysis_and_scripts/notes.tex
@@ -180,7 +180,7 @@ The decider module in the data step has unobservable information available for m
 
 \subsection{Modelling step} Once the data is available, it is given to another decider in the modelling step. In general, a decider can give a ranking, a binary decision or a real-valued prediction as an output. We used conventional predictive models from the machine learning literature to output real-valued predictions in the interval [0, 1] for the probability of a negative outcome for a defendant. We split the data set randomly into two equal halves, trained our model on one half and created predictions with the trained model for the observations in the other half. Later we might refer to the outputs from the decider in the modelling step as values from $\B$. (?)
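
A minimal sketch of this modelling step, assuming scikit-learn and a logistic regression model (the text does not fix the predictive model, and the data below are a synthetic stand-in for the defendant data):

# Sketch of the modelling step: random 50/50 split, train a predictive model
# on one half, predict probabilities of a negative outcome for the other half.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the defendant data (features X, binary outcome y).
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# Split the data set randomly into two equal halves.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train on one half and create real-valued predictions in [0, 1] for the other half.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions = model.predict_proba(X_test)[:, 1]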
 
-\subsection{Evaluation step} The purpose of the evaluation step is to output a reliable estimate of a decider module's performance. The estimate is created by the evaluator module and it should be precise and unbiased and it should have a low variance. The output of the decider module should also be as robust as possible to slight changes in the data generation step. The estimate of the evaluator should also be accurate for all levels of leniency of the deciders.
+\subsection{Evaluation step} The purpose of the evaluation step is to output a reliable estimate of a decider module's performance. The estimate is created by the evaluator module; it should be precise, unbiased and have low variance. The output of the evaluator module should also be as robust as possible to slight changes in the data generation step. The estimate of the evaluator should also be accurate for all levels of leniency of the deciders.
 
 \textbf{Example} In the judicial setting, Nature is our data-generating module. Nature defines the characteristics of each defendant. The deciders in the data step are the judges deciding whom to bail and whom to put behind bars. In the modelling step, we try to predict each defendant's probability of violating bail. A judge has unwritten information available concerning the defendant from seeing their behaviour in court. We can fully define the decider used in the modelling step; for example, it can be a neural net or a regression model. Finally, we want to evaluate the performance of the human judges and of the decider in the modelling step to see if using the machine would improve on human decision-making. That is, the evaluator module should output a reliable estimate of the rate of bail violations at a given rate of bail decisions.
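
One way to make the evaluator's target concrete, consistent with the example above (the notation $T_i$, $Y_i$ and $\mathrm{FR}$ is introduced here only for illustration and may differ from the paper's own definitions): write $T_i = 1$ if defendant $i$ is granted bail under a decision rule that releases a fraction $r$ of defendants, and $Y_i = 1$ if a bail violation occurs. The evaluator should then estimate the failure rate
\[
	\mathrm{FR}(r) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{T_i = 1 \wedge Y_i = 1\},
\]
i.e. the rate of bail violations at a given rate $r$ of positive bail decisions. The difficulty is that in the observed data $Y_i$ is available only when $T_i = 1$.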
 
diff --git a/paper/sl.tex b/paper/sl.tex
index dbb3637..b365c1b 100755
--- a/paper/sl.tex
+++ b/paper/sl.tex
@@ -78,18 +78,19 @@
 	\begin{itemize}
 		\item Many decisions are being made that affect the course of human lives
 		\item Using computational models could enhance the decision-making process in terms of accuracy and fairness.
-		\item The advantage of using models does not necessarily lie in pure performance, that a machine can make multiple decisions, but rather in that a machine can learn from a vast set of information and that with care, a machine can be made a unbiased as possible.
-		\item The explainability of black-box models has been discussed in X
+		\item The advantage of using models does not necessarily lie in raw performance, i.e. that a machine can make many decisions quickly, but rather in that a machine can learn from a vast amount of information and that, with care, it can be made as unbiased as possible.
+		\item The explainability of black-box models has been discussed in X.
+		\item We don't discuss fairness.
 		\item Selective labeling is an issue in multiple fields where machine learning algorithms could be deployed. (Loans, medicine, justice, insurance, ...)
 		\item Before any algorithms are deployed, they should be audited to show that they actually improve on human decision-making.
 		\item Auditing algorithms in conventional settings is trivial: (almost) all of the labels are available, and numerous metrics have been proposed and are in use in multiple fields.
 	\end{itemize}
 \item Present the setting and challenge:
 	\begin{itemize}
-		\item `Selective labels' settings arise in situations where data are the product of a decision mechanism that prevents us from observing certain variables for part of the data.
+		\item `Selective labels' settings arise in situations where data are the product of a decision mechanism that prevents us from observing outcomes for part of the data.
 		\item A typical example is that of bail-or-jail decisions in judicial settings: a judge decides whether to grant bail to a defendant based on whether the defendant is considered likely to violate bail conditions while awaiting trial -- and therefore a violation can occur only if bail is granted.
 		\item Such settings give rise to questions about the effect of alternative decision mechanisms  -- e.g., `how many defendants would violate bail conditions if more bail decisions were granted?'.
-		\item In other words, one faces the challenge to estimate the performance of an alternative, potentially automated, decision policy that might make different decisions than the one found in the judicial data.
+		\item In other words, one faces the challenge to estimate the performance of an alternative, potentially automated, decision policy that might make different decisions than the one found in the existing data.
 		\item Missing labels and decisions made by different deciders
 		\item Labels are missing non-randomly, and decisions might be made by different deciders who differ in leniency.
 		\item (Note: our approach doesn't require multiple deciders)
@@ -98,6 +99,7 @@
 \item Related work
 	\begin{itemize}
 		\item Lakkaraju presented contraction, which performed well compared to other methods previously presented in the literature. We wanted to benchmark our approach against it and show that we can improve on their algorithm in terms of restrictions and accuracy.
+		\item By restrictions we mean that our method does not rely on as many assumptions (random assignment of cases, an agreement rate, etc.) and can estimate performance at all levels of leniency, not only up to the leniency of the most lenient judge. See Fig. 5 in Lakkaraju
 		\item Jung et al. presented a method for constructing an optimal policy; we show that their approach can be applied to the selective labels setting.
 		\item Their setting did not involve selective labeling, nor did they consider that judges would differ in leniency.
 		\item Selection bias has been extensively discussed in the causal inference literature (Pearl, Bareinboim etc.)
@@ -113,6 +115,8 @@
 
 \section{Framework}
 
+In this section, we define the key terms used in this paper, present the modular framework for selective labels problems, and state our problem.
+
 \begin{itemize}
 \item Definitions \\
 	In this paper we apply our approach to binary outcomes, but it is readily modifiable to accommodate continuous or categorical responses. In that case we could use, e.g., the sum of squared errors or another appropriate metric as the measure of performance.
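
For continuous responses, a minimal example of such a metric (standard notation, not taken from the paper): with observed outcomes $y_i$ and predictions $\hat{y}_i$, the sum of squared errors is
\[
	\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 ,
\]
where lower values indicate better performance.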
-- 
GitLab