@@ -116,9 +116,9 @@ One of the concepts to denote when reading the Lakkaraju paper is the difference
On the formalisation of R: We discussed how Lakkaraju's paper treats the variable R in a seemingly nonsensical way: it is as if a judge would have to release someone today in order to detain some other defendant tomorrow, just to keep their acceptance rate at some $r$. A more intuitive way of thinking about $r$ is the "threshold perspective": if a judge assesses that a defendant has probability $p_x$ of committing a crime if released, the judge detains the defendant whenever $p_x > r$, i.e. when the defendant is deemed too dangerous to release. The problem is that we cannot observe this innate $r$; we can only observe the decisions given by the judges. This is how Lakkaraju avoids computing $r$ twice: by forcing the "acceptance threshold" to be an "acceptance rate", the effect of changing $r$ can be computed directly from the data.
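The two perspectives can be contrasted with a small simulation. The uniform risk scores, the population size, and the specific value of $r$ below are illustrative assumptions, not part of Lakkaraju's setup.

```python
import random

random.seed(0)

# Hypothetical risk scores p_x for 1000 defendants, drawn uniformly on
# [0, 1] purely for illustration.
p = [random.random() for _ in range(1000)]
r = 0.3

# Threshold perspective: detain whenever p_x > r, i.e. release iff p_x <= r.
released_threshold = [px <= r for px in p]

# Acceptance-rate perspective: release exactly the fraction r of defendants
# with the lowest risk scores.
n_release = int(r * len(p))
lowest = set(sorted(range(len(p)), key=lambda i: p[i])[:n_release])
released_rate = [i in lowest for i in range(len(p))]

# For a fixed risk distribution the two rules (nearly) coincide: the
# threshold r induces an acceptance rate, and vice versa.
agreement = sum(a == b for a, b in zip(released_threshold, released_rate)) / len(p)
print(f"acceptance rates: {sum(released_threshold) / len(p):.2f} vs {n_release / len(p):.2f}")
print(f"agreement between the two rules: {agreement:.2f}")
```

With a known risk distribution the two views are interchangeable; the paper's point is that only the decisions, not the innate threshold, are observable.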
\section{Framework definition -- 13 June discussion}\label{sec:framework}
First, data is generated through a \textbf{data generating process (DGP)}. The DGP comprises generating the private features for the subjects, generating the acceptance rates for the judges, and assigning the subjects to the judges. The \textbf{acceptance rate (AR)} is defined as the ratio of positive decisions to all decisions that a judge gives. As a formula, \[ AR =\dfrac{\#\{Positive~decisions\}}{\#\{Decisions\}}. \] The data generating process is depicted in the first box of Figure \ref{fig:framework}.
Next, the generated data goes through the \textbf{labeling process}, which determines which instances of the data will have an outcome label available. The labeling is done by humans and is presented in lines 5--7 of algorithm \ref{alg:data_without_Z} and in lines 5--8 of algorithm \ref{alg:data_with_Z}.
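As a concrete, heavily simplified sketch, the DGP and labeling steps described above could look as follows. The feature distributions, the leniency-based decision rule, and the outcome model are all assumptions made here for illustration, not the algorithms of the paper.

```python
import math
import random

random.seed(0)

N_SUBJECTS, N_JUDGES = 1000, 10

# DGP step 1: private features. X is observable; Z is an unobservable
# seen only by the judge (both standard normal here by assumption).
subjects = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_SUBJECTS)]

# DGP step 2: an acceptance rate for each judge.
judge_ar = [random.uniform(0.1, 0.9) for _ in range(N_JUDGES)]

# DGP step 3: assign subjects to judges at random.
assignment = [random.randrange(N_JUDGES) for _ in range(N_SUBJECTS)]

# Labeling process: each judge gives a positive decision to roughly an AR
# fraction of their cases; the outcome Y is recorded only for accepted
# subjects, which is what makes the labels selective.
records = []
for (x, z), j in zip(subjects, assignment):
    accepted = random.random() < judge_ar[j]  # stylised leniency-based decision
    y = None
    if accepted:
        # Probability of a negative outcome grows with x + z (assumption).
        p_neg = 1 / (1 + math.exp(-(x + z)))
        y = 0 if random.random() < p_neg else 1
    records.append({"x": x, "judge": j, "accepted": accepted, "y": y})

labeled = [rec for rec in records if rec["accepted"]]
print(len(labeled), "of", N_SUBJECTS, "subjects received an outcome label")
```

The key property the sketch preserves is that rejected subjects have no outcome label at all, so any downstream evaluation sees only a selectively labeled sample.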
...
@@ -146,7 +146,7 @@ Given the above framework, the goal is to create an evaluation algorithm that ca
(MP) edge (EA);
\end{tikzpicture}
\caption{The selective labels framework. The dashed arrow indicates how human decisions are evaluated without machine intervention using the \nameref{alg:human_eval} algorithm.}
\label{fig:framework}
\end{figure}
...
@@ -403,7 +403,7 @@ In this part, Gaussian noise with zero mean and 0.1 variance was added to the pr
In this section, the predictive model was switched to a random forest classifier to examine the effect of changing the predictive model. The results are practically identical to those presented previously in figure \ref{fig:results} and are shown in figure \ref{fig:random_forest}.
@@ -425,13 +425,24 @@ In this section the predictive model was switched to random forest classifier to
The predictions were checked by drawing a graph of predicted Y versus X; the results are presented in figure \ref{fig:sanity_check}. The figure indicates that the predicted class labels and their probabilities are consistent with the ground truth.
\caption{Predicted class label and probability of $Y=1$ versus X. Prediction was done with a logistic regression model. Colors of the points denote ground truth (yellow = 1, purple = 0). Data set was created with the unobservables.}
\label{fig:sanity_check}
\end{figure}
\subsection{Fully random model}
Given the framework defined in section \ref{sec:framework}, the results presented next were obtained with a model $\M$ that outputs the probability 0.5 for every instance $x$. The labeling process is still as presented in algorithm \ref{alg:data_with_Z}.
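For concreteness, a minimal sketch of such a model follows. Framing the constant prediction as a Python function, and breaking the resulting ties by shuffling, are our own illustrative choices.

```python
import random

# The fully random model of this section: it ignores the features and
# outputs P(Y = 0 | X = x) = 0.5 for every instance x.
def model_prob_y0(x):
    return 0.5

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(100)]
probs = [model_prob_y0(x) for x in xs]

# All predicted risks tie at 0.5, so ranking subjects by this model is
# arbitrary: accepting the "least risky" fraction r amounts to selecting
# a random subset (tie-break by shuffling, an assumption on our part).
order = list(range(len(xs)))
random.shuffle(order)
accepted = set(order[: len(xs) // 2])  # acceptance rate r = 0.5
print(len(accepted), "subjects accepted out of", len(xs))
```

Such a model is a useful baseline: any evaluation algorithm should report a clearly worse failure rate for it than for an informed predictor.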
\caption{Failure rate vs. acceptance rate with different levels of leniency. Data was generated with unobservables and $N_{iter}$ was set to 15. Machine predictions were made with a completely random model, that is, $P(Y=0|X=x)=0.5$ for all $x$.}