Alg 3 specisifies S and f, also other minor tweaks

32fb16c7 · Riku-Laine · e5c300f2 · 32fb16c7
Commit 32fb16c7 authored 5 years ago by Riku-Laine
--- a/analysis_and_scripts/notes.tex
+++ b/analysis_and_scripts/notes.tex
@@ -99,6 +99,8 @@ The general idea of the SL paper is to train some predictive model with selectiv
 One of the concepts to denote when reading the Lakkaraju paper is the difference between the global goal of prediction and the goal in this specific setting. The global goal is to have a low failure rate with high acceptance rate, but at the moment we are not interested in it. The goal in this setting is to estimate the true failure rate of the model with unseen biased data. That is, given only selectively labeled data and an arbitrary black-box model $\mathcal{B}$ we are interested in estimating performance of model $\mathcal{B}$ in the whole data set with all ground truth labels.
+On the formalisation of R: We discussed how Lakkaraju's paper treats variable R in a seemingly non-sensical way, it is as if a judge would have to let someone go today in order to detain some other defendant tomorrow to keep their acceptance rate at some $r$. A more intuitive way of thinking $r$ would be the "threshold perspective". That is, if a judge sees that a defendant has probability $p_x$ of committing a crime if let out, the judge would detain the defendant if $p_x > r$. The problem in this case is that we cannot observe this innate $r$, we can only observe the decisions given by the judges. This is how Lakkaraju avoids computing $r$ twice by forcing the "acceptance threshold" to be an "acceptance rate" and then the effect of changing $r$ can be computed from the data directly.
 \section{Data generation}
 Both of the data generating algorithms are presented in this chapter. 
@@ -167,7 +169,7 @@ The following quantities are estimated from the data:
 \item Labeled outcomes: The "traditional"/vanilla estimate of model performance. See algorithm \ref{alg:labeled_outcomes}.
 \item Human evaluation: The failure rate of human decision-makers who have access to the latent variable Z. Decision-makers with similar values of leniency are binned and treated as one hypothetical decision-maker. See algorithm \ref{alg:human_eval}.
 \item Contraction: See algorithm 1 of \cite{lakkaraju17}
-\item Causal model: In essence, the empirical performance is calculated over the test set as $$\dfrac{1}{n}\sum_{(x, y)\in D}f(x)\delta(F(x) < r)$$ where $$f(x) = P(Y=0|T=1, X=x)$$ is a logistic regression model (see \ref{sec:model_fitting}) predicing Y from X trained on the labeled data and $$ F(x_0) = \int_{x\in\mathcal{X}} P(x)\delta(f(x) < f(x_0)) ~ dx.$$ All observations, even ones with missing outcome labels, can be used since empirical performance doesn't depend on them. $P(x)$ is Gaussian pdf from scipy.stats package and it is integrated over interval [-15, 15] with 40000 steps using si.simps function from scipy.integrate which uses Simpson's rule in estimating the value of the integral. (docs: \url{https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html}) \label{causal_cdf}
+\item Causal model: In essence, the empirical performance is calculated over the test set as $$\dfrac{1}{n}\sum_{(x, y)\in D}f(x)\delta(F(x) < r)$$ where $$f(x) = P(Y=0|T=1, X=x)$$ is a logistic regression model (see \ref{sec:model_fitting}) predicting Y from X trained on the labeled data and $$ F(x_0) = \int_{x\in\mathcal{X}} P(x)\delta(f(x) < f(x_0)) ~ dx.$$ All observations, even ones with missing outcome labels, can be used since empirical performance doesn't depend on them. $P(x)$ is Gaussian pdf from scipy.stats package and it is integrated over interval [-15, 15] with 40000 steps using si.simps function from scipy.integrate which uses Simpson's rule in estimating the value of the integral. (docs: \url{https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html}) \label{causal_cdf}
 \end{itemize}
 The plotted curves are constructed using pseudo code presented in algorithm \ref{alg:perf_comp}.