Commit 245069b0 authored by Riku-Laine's avatar Riku-Laine

Begin framework definition

parent 1c013b74
@@ -9,6 +9,10 @@
\usepackage[hidelinks, colorlinks=true]{hyperref}
%\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\usepackage{pgf}
\usepackage{tikz}
\usetikzlibrary{arrows,automata}
\usepackage{algorithm}% http://ctan.org/pkg/algorithms
\usepackage{algorithmic}% http://ctan.org/pkg/algorithms
\renewcommand{\algorithmicrequire}{\textbf{Input:}}
@@ -60,7 +64,7 @@
\graphicspath{ {../figures/} }
\title{Notes}
\author{RL, 11 June 2019}
\author{RL, 13 June 2019}
%\date{} % Activate to display a given date or no date
\begin{document}
@@ -105,6 +109,40 @@ One of the concepts to note when reading the Lakkaraju paper is the difference
On the formalisation of $R$: we discussed how Lakkaraju's paper treats the variable $R$ in a seemingly nonsensical way: it is as if a judge had to let someone go today in order to detain some other defendant tomorrow, just to keep their acceptance rate at some $r$. A more intuitive way of thinking about $r$ is the ``threshold perspective''. That is, if a judge sees that a defendant has probability $p_x$ of committing a crime if released, the judge detains the defendant whenever $p_x > r$, i.e. when the defendant is too dangerous to release. The problem with this view is that we cannot observe this innate $r$; we can only observe the decisions the judges make. By forcing the ``acceptance threshold'' to be an ``acceptance rate'', Lakkaraju avoids having to compute $r$ twice, and the effect of changing $r$ can then be computed directly from the data.
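The two readings of $r$ can be contrasted in a small sketch. This is illustrative only: the uniform risk scores and the value of `r` are assumptions, not taken from the paper. The threshold rule detains anyone with risk above $r$, while the rate rule releases exactly a fraction $r$ of defendants, detaining the riskiest remainder.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)  # hypothetical P(crime if released) per defendant
r = 0.6

# "Threshold" reading: detain anyone whose risk exceeds r.
detain_threshold = p > r

# "Acceptance rate" reading: release exactly a fraction r of defendants,
# detaining the riskiest 1 - r fraction.
order = np.argsort(p)                  # least risky first
n_release = int(r * len(p))
detain_rate = np.ones(len(p), dtype=bool)
detain_rate[order[:n_release]] = False
```

With uniformly distributed risk scores the two rules nearly coincide; with a skewed risk distribution the detention fractions they produce can differ substantially.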
\section{Separation of decision makers and evaluation -- as discussed 13 June}
Below we define some terms:
First, data is generated through a \textbf{data generating process (DGP)}. The DGP comprises generating the private features, assigning the subjects their decisions (called \textbf{labeling}), splitting the data and so on. Two different DGPs are presented in Algorithms \ref{alg:data_without_Z} and \ref{alg:data_with_Z}; they will be referred to as ``creating the data without unobservables'' and ``creating the data with unobservables'', respectively. This part is presented in the first node of Figure \ref{fig:separation}.
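A minimal sketch of such a DGP, with and without unobservables, might look as follows. The function name, the logistic link, and the normal features are illustrative assumptions, not the exact algorithms referenced above.

```python
import numpy as np

def generate_data(n, with_unobservables=True, seed=0):
    """Illustrative DGP sketch: private feature X, optional
    unobservable Z, and a latent outcome Y (1 = no failure)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n) if with_unobservables else np.zeros(n)
    p_fail = 1 / (1 + np.exp(-(x + z)))   # P(failure | X, Z)
    y = rng.binomial(1, 1 - p_fail)       # 1 = desired outcome, 0 = failure
    return x, z, y
```

Setting `with_unobservables=False` reduces the failure probability to a function of $X$ alone, mirroring the two algorithm variants.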
Next, the created data is given to the \textbf{decision-making process (DMP)}. The DMP takes as input some features of the instances created by the DGP and outputs, for all instances, either a binary decision (yes/no), a probability (i.e. a real number in the interval $[0, 1]$) or an ordering. The outputs are generated by a model called $\mathcal{M}_1$.
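A toy $\mathcal{M}_1$ producing all three output types could be sketched as below; the logistic risk model and the meaning of a positive decision are assumptions for illustration.

```python
import numpy as np

def decision_maker(x, r=0.5):
    """Sketch of M_1: maps features to the three output types
    (binary decision, probability, ordering)."""
    prob = 1 / (1 + np.exp(-x))         # risk as a probability in [0, 1]
    decision = (prob <= r).astype(int)  # 1 = positive (accept) decision
    ordering = np.argsort(prob)         # indices from least to most risky
    return decision, prob, ordering
```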
From the DMP we can derive the \textbf{failure rate (FR)} metric. FR is defined as the ratio of the number of undesired outcomes to the number of decisions made. A special characteristic of FR in this setting is that a negative decision prevents a failure. More explicitly \[ FR = \dfrac{\#\{Failures\}}{\#\{Decisions\}}. \] From the definition it is clear that FR can also be computed for the labeling process.
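The definition above translates directly into code. The encoding (decision 1 = positive, outcome 0 = failure) is an assumption; the key property is that failures are only counted among positive decisions, since a negative decision prevents the failure.

```python
def failure_rate(decisions, outcomes):
    """FR = #failures / #decisions. A negative decision (0) prevents
    failure, so failures occur only among positive decisions."""
    failures = sum(1 for d, y in zip(decisions, outcomes)
                   if d == 1 and y == 0)
    return failures / len(decisions)
```

For example, with decisions `[1, 1, 0, 1]` and outcomes `[0, 1, 1, 0]`, two of the four decisions end in failure, giving FR = 0.5.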
Finally, the data from the DGP and the decisions made in the DMP are given to the \textbf{evaluation algorithms}. The evaluation algorithms output an estimate of the failure rate from the aforementioned inputs.
Given the above definitions, the goal is to create an evaluation algorithm that, given only selectively labeled data, predicts the failure rate that would be observed under true evaluation.
\begin{figure} [H]
\centering
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=6cm,
semithick]
\tikzstyle{every state}=[fill=none,draw=black,text=black, rectangle, minimum width=3cm]
\node[state] (data) {\begin{tabular}{c} $\mathcal{D}$ \\ Data \end{tabular}};
\node[state] (decision) [right of=data] {\begin{tabular}{c} $\mathcal{M}_1$ \\ Decision-maker \end{tabular}};
\node[state] (evaluation) [right of=decision] {\begin{tabular}{c} $\mathcal{M}_2$ \\ Evaluation \end{tabular}};
\path (data) edge (decision)
(decision) edge node {\begin{tabular}{c} Decision \\ Probability \\ Ordering \end{tabular}} (evaluation);
\end{tikzpicture}
\caption{The selective labels framework. }
\label{fig:separation}
\end{figure}
\section{Data generation}
Both data generating algorithms are presented in this section.