From fd96922f69a10f6ec9074c1f2a57659c6a469637 Mon Sep 17 00:00:00 2001
From: Riku-Laine <28960190+Riku-Laine@users.noreply.github.com>
Date: Fri, 14 Jun 2019 10:40:15 +0300
Subject: [PATCH] First version of fw definition

---
 analysis_and_scripts/notes.tex | 92 ++++++++++++++++++----------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/analysis_and_scripts/notes.tex b/analysis_and_scripts/notes.tex
index 955b35d..7d4b242 100644
--- a/analysis_and_scripts/notes.tex
+++ b/analysis_and_scripts/notes.tex
@@ -19,6 +19,13 @@
 \renewcommand{\algorithmicensure}{\textbf{Procedure:}}
 \renewcommand{\algorithmicreturn}{\textbf{Return}}
 
+\newcommand{\pr}{\mathbb{P}} % probability symbol
+\newcommand{\D}{\mathcal{D}} % data set
+\newcommand{\s}{\mathcal{S}} % "fancy S"
+\newcommand{\M}{\mathcal{M}} % "fancy M"
+\newcommand{\B}{\mathcal{B}} % "fancy B"
+\newcommand{\RR}{\mathcal{R}} % R in the contraction algorithm
+
 \renewcommand{\descriptionlabel}[1]{\hspace{\labelsep}\textnormal{#1}}
 
 \makeatletter
@@ -64,7 +71,7 @@
 \graphicspath{ {../figures/} }
 
 \title{Notes}
-\author{RL, 13 June 2019}
+\author{RL, 14 June 2019}
 %\date{}                                           % Activate to display a given date or no date
 
 \begin{document}
@@ -109,35 +116,36 @@ One of the concepts to denote when reading the Lakkaraju paper is the difference
 
 On the formalisation of R: We discussed how Lakkaraju's paper treats the variable R in a seemingly nonsensical way: it is as if a judge would have to let someone go today in order to detain some other defendant tomorrow, just to keep their acceptance rate at some $r$. A more intuitive way of thinking about $r$ would be the ``threshold perspective''. That is, if a judge sees that a defendant has probability $p_x$ of committing a crime if let out, the judge detains the defendant whenever $p_x > r$, i.e. when the defendant would be too dangerous to let out. The problem in this case is that we cannot observe this innate $r$; we can only observe the decisions given by the judges. This is how Lakkaraju avoids computing $r$ twice: by forcing the ``acceptance threshold'' to be an ``acceptance rate'', the effect of changing $r$ can be computed from the data directly.
 
-\section{Separation of decision makers and evaluation -- as discussed 13 June}
-
-Below we define some terms:
+\section{Framework definition -- 13 June discussion}
 
-First, data is generated through a \textbf{data generating process (DGP)}. DGP comprises of generating the private features, assigning the subjects their decisions (called \textbf{labeling}), splitting the data and so on. Two different DGPs are presented in algorithms \ref{alg:data_without_Z} and \ref{alg:data_with_Z}. They will be referred to in the fashion of "creating the data without unobservables" and "creating the data with unobservables" respectively. This part is presented in the first node of Figure \ref{fig:separation}.
+First, data is generated through a \textbf{data generating process (DGP)}. The DGP comprises generating the private features and the acceptance rates for the judges. \textbf{Acceptance rate (AR)} is defined as the ratio of positive decisions to all decisions. As a formula, \[ AR = \dfrac{\#\{Positive~decisions\}}{\#\{Decisions\}}. \] The data generation process is depicted in the first box of Figure \ref{fig:separation}.
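+
+As a minimal illustration (a Python sketch; the function and variable names are ours and not taken from the analysis scripts), AR can be computed directly from a vector of binary decisions:
+\begin{verbatim}
+def acceptance_rate(decisions):
+    """decisions: iterable of 0/1 values, 1 = positive decision."""
+    decisions = list(decisions)
+    return sum(decisions) / len(decisions)
+
+# acceptance_rate([1, 0, 1, 1]) == 0.75
+\end{verbatim}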
 
-Next, the created data is given to the \textbf{decision-making process (DMP)}. The DMP takes as input some features from the instances created by the DGP and outputs either a binary decision (yes/no), a probability (i.e. a real number in interval [0, 1]) or a metric for ordering for all the instances. The outputs are generated by a model called $\mathcal{M}_1$.
+Next, the generated data goes to the \textbf{labeling process}. The labeling process determines which instances of the data will have an outcome label available. It is carried out by human decision-makers and is presented in lines 5--7 of algorithm \ref{alg:data_without_Z} and lines 5--8 of algorithm \ref{alg:data_with_Z}. % This can be seen in the data-generating algorithms on lines X-Y and A-B.
 
-From the DMP we can derive \textbf{failure rate (FR)} metric. FR is defined as the ratio of the number undesired outcomes to the number of decisions made. A special characteristic of FR in this setting is that a negative decision will prevent a failure. More explicitly \[ FR = \dfrac{\#\{Failures\}}{\#\{Decisions\}}. \] From the definition it is clear that FR can also be computed for the labelling process. 
+In the third step, the labeled data is given to a machine that will make either decisions or predictions using some features of the data. The machine will output either binary decisions (yes/no), probabilities (real numbers in the interval $[0, 1]$) or a metric for ordering all the instances. The machine will be denoted by $\M$.
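+
+As an example of one possible machine $\M$ (a sketch only; a scikit-learn logistic regression is assumed here for illustration and is not prescribed by the framework), trained on the labeled instances and outputting probabilities of $Y=0$, which also induce an ordering:
+\begin{verbatim}
+from sklearn.linear_model import LogisticRegression
+
+def fit_machine(X_labeled, y_labeled):
+    """Fit a predictive model on instances with available outcome labels."""
+    return LogisticRegression().fit(X_labeled, y_labeled)
+
+def predict_prob_y0(model, X):
+    """Return P(Y = 0 | x) for each instance in X."""
+    return model.predict_proba(X)[:, list(model.classes_).index(0)]
+\end{verbatim}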
 
-Finally, the data from DGP and decisions made in DMP are given to the \textbf{evaluation algorithms}. The evaluation algorithms output an estimate of failure rate with the before-mentioned input.
+Finally, the decisions and/or predictions made by the machine $\M$ and the human judges (see the dashed arrow in Figure \ref{fig:separation}) will be evaluated using an \textbf{evaluation algorithm}. Evaluation algorithms take the decisions, probabilities or ordering generated in the previous steps as input and output an estimate of the failure rate. \textbf{Failure rate (FR)} is defined as the ratio of the number of undesired outcomes to the number of decisions made. One special characteristic of FR in this setting is that a failure can only occur with a positive decision. More explicitly, \[ FR = \dfrac{\#\{Failures\}}{\#\{Decisions\}}. \] A second characteristic of FR is that the number of positive decisions, and therefore FR itself, can be controlled through the acceptance rate defined above.
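+
+As a small illustrative sketch (again Python with made-up variable names), FR can be computed from paired decisions and outcomes; only positive decisions can contribute failures:
+\begin{verbatim}
+def failure_rate(decisions, outcomes):
+    """decisions: 1 = positive, 0 = negative; outcome y = 0 is undesired.
+    A failure is an undesired outcome following a positive decision."""
+    failures = sum(1 for t, y in zip(decisions, outcomes) if t == 1 and y == 0)
+    return failures / len(decisions)
+\end{verbatim}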
 
-Given the above set of definitions, the goal is to create an evaluation algorithm that can predict the failure rate of true evaluation given only selectively labeled data.
+Given the above framework, the goal is to create an evaluation algorithm that can accurately estimate the failure rate of any model $\M$ if it were to replace the human decision-makers in the labeling process. The estimates have to be made using only data that the human decision-makers have labeled, and the failure rate has to be estimated accurately for various levels of acceptance rate. The accuracy of the estimates can be compared by computing e.g. the mean absolute error w.r.t.\ the estimates given by the \nameref{alg:true_eval} algorithm.
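+
+For example, the comparison could be summarised as follows (a sketch; the exact error metric is a choice and is not fixed by the framework):
+\begin{verbatim}
+def mean_absolute_error(fr_estimated, fr_true):
+    """Compare estimated failure rates to the true-evaluation failure
+    rates over the same grid of acceptance rates."""
+    assert len(fr_estimated) == len(fr_true)
+    return sum(abs(a - b) for a, b in zip(fr_estimated, fr_true)) / len(fr_true)
+\end{verbatim}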
 
 \begin{figure} [H]
 \centering
-\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=6cm,
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=1.5cm,
                     semithick]
 
-  \tikzstyle{every state}=[fill=none,draw=black,text=black, rectangle, minimum width=3cm]
+  \tikzstyle{every state}=[fill=none,draw=black,text=black, rectangle, minimum width=6cm]
 
-  \node[state] (data) {\begin{tabular}{c} $\mathcal{D}$ \\ Data \end{tabular}};
-  \node[state] (decision) [right of=data] {\begin{tabular}{c} $\mathcal{M}_1$ \\ Decision-maker \end{tabular}};
-  \node[state] (evaluation) [right of=decision] {\begin{tabular}{c} $\mathcal{M}_2$ \\ Evaluation \end{tabular}};
+  \node[state] (D) {Data generation};
+  \node[state] (J) [below of=D] {Labeling process (human)};
+  \node[state] (MP) [below of=J] {$\mathcal{M}$ Machine decisions / predictions};
+  \node[state] (EA)  [below of=MP] {Evaluation algorithm};
 
-  \path (data) edge  (decision)
-        (decision) edge node {\begin{tabular}{c} Decision \\ Probability \\ Ordering \end{tabular}} (evaluation);
+  \path (D) edge (J)
+        (J) edge (MP)
+             edge [bend right=81, dashed] (EA)
+        (MP) edge (EA);
 \end{tikzpicture}
-\caption{The selective labels framework. }
+\caption{The selective labels framework. The dashed arrow indicates how the human decisions are evaluated without machine intervention using the \nameref{alg:human_eval} algorithm.}
 \label{fig:separation}
 \end{figure}
 
@@ -227,7 +235,7 @@ The plotted curves are constructed using pseudo code presented in algorithm \ref
 	\FOR{i = 1 \TO $N_{iter}$}
 		\STATE Create data using either Algorithm \ref{alg:data_without_Z} or \ref{alg:data_with_Z}.
 		\STATE Train a logistic regression model using observations in the training set with available outcome labels and assign to $f$.
-		\STATE Using $f$, estimate probabilities $\mathcal{S}$ for Y=0 in both test sets (labeled and full) for all observations and attach them to the respective data sets.
+		\STATE Using $f$, estimate probabilities $\s$ for Y=0 in both test sets (labeled and full) for all observations and attach them to the respective data sets.
         		\STATE Compute failure rate of true evaluation with leniency $r$ and full test data using algorithm \ref{alg:true_eval}.
         		\STATE Compute failure rate of labeled outcomes approach with leniency $r$ and labeled test data using algorithm \ref{alg:labeled_outcomes}.
         		\STATE Compute failure rate of human judges with leniency $r$ and labeled test data using algorithm \ref{alg:human_eval}.
@@ -246,12 +254,12 @@ The plotted curves are constructed using pseudo code presented in algorithm \ref
 \caption{True evaluation} 		% give the algorithm a caption
 \label{alg:true_eval} 			% and a label for \ref{} commands later in the document
 \begin{algorithmic}[1] 		% enter the algorithmic environment
-\REQUIRE Full test data $\mathcal{D}$ with probabilities $\mathcal{S}$ and \emph{all outcome labels}, acceptance rate r
+\REQUIRE Full test data $\D$ with probabilities $\s$ and \emph{all outcome labels}, acceptance rate r
 \ENSURE
-\STATE Sort the data by the probabilities $\mathcal{S}$ to ascending order.
+\STATE Sort the data by the probabilities $\s$ into ascending order.
 \STATE \hskip3.0em $\rhd$ Now the most dangerous subjects are last.
-\STATE Calculate the number to release $N_{free} = |\mathcal{D}| \cdot r$.
-\RETURN $\frac{1}{|\mathcal{D}|}\sum_{i=1}^{N_{free}}\delta\{y_i=0\}$
+\STATE Calculate the number to release $N_{free} = |\D| \cdot r$.
+\RETURN $\frac{1}{|\D|}\sum_{i=1}^{N_{free}}\delta\{y_i=0\}$
 \end{algorithmic}
 \end{algorithm}
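+
+For concreteness, a Python/NumPy sketch of the procedure above (variable names are ours: \texttt{prob\_y0} corresponds to $\s$ and \texttt{y} to the outcome labels):
+\begin{verbatim}
+import numpy as np
+
+def true_evaluation(prob_y0, y, r):
+    """Full test data: prob_y0 = P(Y=0 | x) and y observed for everyone."""
+    prob_y0, y = np.asarray(prob_y0), np.asarray(y)
+    order = np.argsort(prob_y0)      # ascending: most dangerous subjects last
+    n_free = int(len(y) * r)         # number of subjects to release
+    released = y[order][:n_free]
+    return np.sum(released == 0) / len(y)
+\end{verbatim}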
 
@@ -259,13 +267,13 @@ The plotted curves are constructed using pseudo code presented in algorithm \ref
 \caption{Labeled outcomes} 		% give the algorithm a caption
 \label{alg:labeled_outcomes} 			% and a label for \ref{} commands later in the document
 \begin{algorithmic}[1] 		% enter the algorithmic environment
-\REQUIRE Labeled test data $\mathcal{D}$ with probabilities $\mathcal{S}$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
+\REQUIRE Labeled test data $\D$ with probabilities $\s$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
 \ENSURE
-\STATE Assign observations with observed outcomes to $\mathcal{D}_{observed}$.
-\STATE Sort $\mathcal{D}_{observed}$ by the probabilities $\mathcal{S}$ to ascending order.
+\STATE Assign observations with observed outcomes to $\D_{observed}$.
+\STATE Sort $\D_{observed}$ by the probabilities $\s$ into ascending order.
 \STATE \hskip3.0em $\rhd$ Now the most dangerous subjects are last.
-\STATE Calculate the number to release $N_{free} = |\mathcal{D}_{observed}| \cdot r$.
-\RETURN $\frac{1}{|\mathcal{D}|}\sum_{i=1}^{N_{free}}\delta\{y_i=0\}$
+\STATE Calculate the number to release $N_{free} = |\D_{observed}| \cdot r$.
+\RETURN $\frac{1}{|\D|}\sum_{i=1}^{N_{free}}\delta\{y_i=0\}$
 \end{algorithmic}
 \end{algorithm}
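+
+A corresponding sketch (same conventions as the earlier sketches; \texttt{t} marks whether the outcome was observed):
+\begin{verbatim}
+import numpy as np
+
+def labeled_outcomes(prob_y0, y, t, r):
+    """Labeled test data; the outcomes y are valid only where t == 1."""
+    prob_y0, y, t = map(np.asarray, (prob_y0, y, t))
+    observed = t == 1
+    order = np.argsort(prob_y0[observed])   # most dangerous last
+    n_free = int(observed.sum() * r)        # |D_observed| * r
+    released = y[observed][order][:n_free]
+    return np.sum(released == 0) / len(y)   # normalised by |D| as in the pseudocode
+\end{verbatim}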
 
@@ -273,12 +281,12 @@ The plotted curves are constructed using pseudo code presented in algorithm \ref
 \caption{Human evaluation} 		% give the algorithm a caption
 \label{alg:human_eval} 			% and a label for \ref{} commands later in the document
 \begin{algorithmic}[1] 		% enter the algorithmic environment
-\REQUIRE Labeled test data $\mathcal{D}$ with probabilities $\mathcal{S}$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
+\REQUIRE Labeled test data $\D$ with probabilities $\s$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
 \ENSURE
 \STATE Assign judges with leniency in $[r-0.05, r+0.05]$ to $\mathcal{J}$
-\STATE $\mathcal{D}_{released} = \{(x, j, t, y) \in \mathcal{D}~|~t=1 \wedge j \in  \mathcal{J}\}$
+\STATE $\D_{released} = \{(x, j, t, y) \in \D~|~t=1 \wedge j \in  \mathcal{J}\}$
 \STATE \hskip3.0em $\rhd$ Subjects judged \emph{and} released by judges with correct leniency.
-\RETURN $\frac{1}{|\mathcal{J}|}\sum_{i=1}^{\mathcal{D}_{released}}\delta\{y_i=0\}$
+\RETURN $\frac{1}{|\mathcal{J}|}\sum_{i=1}^{|\D_{released}|}\delta\{y_i=0\}$
 \end{algorithmic}
 \end{algorithm}
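+
+A sketch following the pseudocode (it assumes each observation carries its judge's identifier and leniency; the normalisation by $|\mathcal{J}|$ mirrors the pseudocode above):
+\begin{verbatim}
+import numpy as np
+
+def human_evaluation(y, t, judge_id, leniency, r):
+    """Failure rate of judges whose leniency is within 0.05 of r."""
+    y, t, judge_id, leniency = map(np.asarray, (y, t, judge_id, leniency))
+    judges = np.unique(judge_id[np.abs(leniency - r) <= 0.05])   # the set J
+    released = (t == 1) & np.isin(judge_id, judges)  # judged and released by J
+    return np.sum(y[released] == 0) / len(judges)    # normalised by |J|
+\end{verbatim}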
 
@@ -286,22 +294,22 @@ The plotted curves are constructed using pseudo code presented in algorithm \ref
 \caption{Contraction algorithm \cite{lakkaraju17}} 		% give the algorithm a caption
 \label{alg:contraction} 			% and a label for \ref{} commands later in the document
 \begin{algorithmic}[1] 		% enter the algorithmic environment
-\REQUIRE Labeled test data $\mathcal{D}$ with probabilities $\mathcal{S}$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
+\REQUIRE Labeled test data $\D$ with probabilities $\s$ and \emph{missing outcome labels} for observations with $T=0$, acceptance rate r
 \ENSURE
-\STATE Let $q$ be the decision-maker with highest acceptance rate in $\mathcal{D}$.
-\STATE $\mathcal{D}_q = \{(x, j, t, y) \in \mathcal{D}|j=q\}$
-\STATE \hskip3.0em $\rhd$ $\mathcal{D}_q$ is the set of all observations judged by $q$
+\STATE Let $q$ be the decision-maker with highest acceptance rate in $\D$.
+\STATE $\D_q = \{(x, j, t, y) \in \D|j=q\}$
+\STATE \hskip3.0em $\rhd$ $\D_q$ is the set of all observations judged by $q$
 \STATE
-\STATE $\mathcal{R}_q = \{(x, j, t, y) \in \mathcal{D}_q|t=1\}$
-\STATE \hskip3.0em $\rhd$ $\mathcal{R}_q$ is the set of observations in $\mathcal{D}_q$ with observed outcome labels
+\STATE $\mathcal{R}_q = \{(x, j, t, y) \in \D_q|t=1\}$
+\STATE \hskip3.0em $\rhd$ $\mathcal{R}_q$ is the set of observations in $\D_q$ with observed outcome labels
 \STATE
-\STATE Sort observations in $\mathcal{R}_q$ in descending order of confidence scores $\mathcal{S}$ and assign to $\mathcal{R}_q^{sort}$.
+\STATE Sort observations in $\mathcal{R}_q$ in descending order of confidence scores $\s$ and assign to $\mathcal{R}_q^{sort}$.
 \STATE \hskip3.0em $\rhd$ Observations deemed as high risk by the black-box model $\mathcal{B}$ are at the top of this list
 \STATE
-\STATE Remove the top $[(1.0-r)|\mathcal{D}_q |]-[|\mathcal{D}_q |-|\mathcal{R}_q |]$ observations of $\mathcal{R}_q^{sort}$ and call this list $\mathcal{R_B}$
+\STATE Remove the top $[(1.0-r)|\D_q |]-[|\D_q |-|\mathcal{R}_q |]$ observations of $\mathcal{R}_q^{sort}$ and denote the remaining list by $\mathcal{R_B}$
 \STATE \hskip3.0em $\rhd$ $\mathcal{R_B}$ is the list of observations assigned to $t = 1$ by $\mathcal{B}$
 \STATE
-\STATE Compute $\mathbf{u}=\sum_{i=1}^{|\mathcal{R_B}|} \dfrac{\delta\{y_i=0\}}{| \mathcal{D}_q |}$.
+\STATE Compute $\mathbf{u}=\sum_{i=1}^{|\mathcal{R_B}|} \dfrac{\delta\{y_i=0\}}{| \D_q |}$.
 \RETURN $\mathbf{u}$
 \end{algorithmic}
 \end{algorithm}
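+
+A sketch of contraction in the same style (judge identifiers and decisions are assumed to be available per observation; the removal count follows the pseudocode):
+\begin{verbatim}
+import numpy as np
+
+def contraction(prob_y0, y, t, judge_id, r):
+    """Contraction estimate of the failure rate at acceptance rate r."""
+    prob_y0, y, t, judge_id = map(np.asarray, (prob_y0, y, t, judge_id))
+    # q = decision-maker with the highest acceptance rate
+    judges = np.unique(judge_id)
+    q = max(judges, key=lambda j: t[judge_id == j].mean())
+    D_q = judge_id == q                      # all observations judged by q
+    R_q = D_q & (t == 1)                     # observations released by q
+    order = np.argsort(-prob_y0[R_q])        # descending: most dangerous first
+    n_remove = max(0, int((1.0 - r) * D_q.sum()) - int(D_q.sum() - R_q.sum()))
+    R_B = y[R_q][order][n_remove:]           # remaining = released by the black box
+    return np.sum(R_B == 0) / D_q.sum()
+\end{verbatim}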
@@ -310,11 +318,11 @@ The plotted curves are constructed using pseudo code presented in algorithm \ref
 \caption{Causal model, empirical performance (ep, see also section \ref{causal_cdf})} 		% give the algorithm a caption
 \label{alg:causal_model} 			% and a label for \ref{} commands later in the document
 \begin{algorithmic}[1] 		% enter the algorithmic environment
-\REQUIRE Labeled test data $\mathcal{D}$ with probabilities $\mathcal{S}$ and \emph{missing outcome labels} for observations with $T=0$, predictive model $f$, pdf $P_X(x)$ for features $x$, acceptance rate r
+\REQUIRE Labeled test data $\D$ with probabilities $\s$ and \emph{missing outcome labels} for observations with $T=0$, predictive model $f$, pdf $P_X(x)$ for features $x$, acceptance rate r
 \ENSURE
-\STATE For all $x_0 \in \mathcal{D}$ evaluate $F(x_0) = \int_{x\in\mathcal{X}} P_X(x)\delta(f(x)<f(x_0)) ~dx$ and assign to $\mathcal{F}_{predictions}$ 
+\STATE For all $x_0 \in \D$ evaluate $F(x_0) = \int_{x\in\mathcal{X}} P_X(x)\delta(f(x)<f(x_0)) ~dx$ and assign to $\mathcal{F}_{predictions}$ 
 \STATE Create boolean array $T_{causal} = \mathcal{F}_{predictions} < r$.
-\RETURN $\frac{1}{|\mathcal{D}|}\sum_{i=1}^{\mathcal{D}} \mathcal{S} \cdot T_{causal}$ which is equal to $\frac{1}{|\mathcal{D}|}\sum_{x\in\mathcal{D}} f(x)\delta(F(x) < r)$
+\RETURN $\frac{1}{|\D|}\sum_{i=1}^{|\D|} \s_i \cdot T_{causal,i}$ which is equal to $\frac{1}{|\D|}\sum_{x\in\D} f(x)\delta(F(x) < r)$
 \end{algorithmic}
 \end{algorithm}
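+
+A sketch of the empirical version (as an illustrative simplification, $F(x_0)$ is approximated with the empirical distribution of the predictions themselves instead of integrating over $P_X$):
+\begin{verbatim}
+import numpy as np
+
+def causal_estimate(prob_y0, r):
+    """prob_y0 = f(x) = P(Y=0 | x) for every subject in the test data."""
+    f = np.asarray(prob_y0)
+    # Empirical F(x_0): share of subjects predicted to be less risky than x_0
+    F = np.array([np.mean(f < f0) for f0 in f])
+    t_causal = F < r                       # released under the causal rule
+    return np.sum(f * t_causal) / len(f)
+\end{verbatim}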
 
-- 
GitLab