diff --git a/analysis_and_scripts/notes.tex b/analysis_and_scripts/notes.tex index 28144054a628b8733421bd35eba73676beeb6847..9896a6484b67dcf888f59b4328c4b458122e0f63 100644 --- a/analysis_and_scripts/notes.tex +++ b/analysis_and_scripts/notes.tex @@ -11,7 +11,7 @@ \usepackage{pgf} \usepackage{tikz} -\usetikzlibrary{arrows,automata} +\usetikzlibrary{arrows,automata, positioning} \usepackage{algorithm}% http://ctan.org/pkg/algorithms \usepackage{algorithmic}% http://ctan.org/pkg/algorithms @@ -133,22 +133,23 @@ Counterfactual inference. Counterfactual inference techniques have been used ext \section{Framework definition -- 13 June discussion} \label{sec:framework} -First, data is generated through a \textbf{data generating process (DGP)}. DGP comprises of generating the private features for the subjects, generating the acceptance rates for the judges and assigning the subjects to the judges. \textbf{Acceptance rate (AR)} is defined as the ratio of positive decisions to all decisions that a judge will give. As a formula \[ AR = \dfrac{\#\{Positive~decisions\}}{\#\{Decisions\}}. \] Data generation process is depicted in the first box of Figure \ref{fig:framework}. +\emph{In this section we define key terms and concepts and derive a more precise framework for the selective labels problem. The framework is presented in writing and as a picture in Figures \ref{fig:framework} and \ref{fig:framework_data_flow}.} -Next, the generated data goes to the \textbf{labeling process}. In the labeling process, it is determined which instances of the data will have an outcome label available. This is done by humans and is presented in lines 5--7 of algorithm \ref{alg:data_without_Z} and 5--8 of algorithm \ref{alg:data_with_Z}. +First, data is generated through a \textbf{data generating process (DGP)}. The DGP comprises generating the private features for the subjects, generating the acceptance rates for the judges, and assigning the subjects to the judges.
\textbf{Acceptance rate (AR)} is defined as the ratio of positive decisions to all decisions that a judge will give. As a formula, \[ AR = \dfrac{\#\{Positive~decisions\}}{\#\{Decisions\}}. \] The data generating process is depicted in the first box of Figure \ref{fig:framework}. -In the third step, the labeled data is given to a machine that will either make decisions or predictions using some features of the data. The machine will output either binary decisions (yes/no), probabilities (a real number in interval $[0, 1]$) or a metric for ordering all the instances. The machine will be denoted with $\M$. +Next, all of the generated data goes to the \textbf{labeling process}. In the labeling process, it is determined which instances of the data will have an outcome label available. This is done by humans and is presented in lines 5--7 of algorithm \ref{alg:data_without_Z} and 5--8 of algorithm \ref{alg:data_with_Z}. The data is then split randomly into training and test datasets, $\D_{train}$ and $\D_{test}$, respectively. + +In the third step, the labeled data is given to a machine that will either make decisions or predictions using some features of the data. The machine will be trained on the training data set. Then, the machine will output either binary decisions (yes/no), probabilities (a real number in the interval $[0, 1]$), or a metric for ordering all the instances in the test data set. The machine will be denoted with $\M$. Finally, the decisions and/or predictions made by the machine $\M$ and the human judges (see the dashed arrow in Figure \ref{fig:framework}) will be evaluated using an \textbf{evaluation algorithm}. Evaluation algorithms take the decisions, probabilities, or orderings generated in the previous steps as input and then output an estimate of the failure rate. \textbf{Failure rate (FR)} is defined as the ratio of undesired outcomes to given decisions.
One special characteristic of FR in this setting is that a failure can only occur with a positive decision. More explicitly, \[ FR = \dfrac{\#\{Failures\}}{\#\{Decisions\}}. \] A second characteristic of FR is that the number of positive decisions, and therefore FR itself, can be controlled through the acceptance rate defined above. Given the above framework, the goal is to create an evaluation algorithm that can accurately estimate the failure rate of any model $\M$ if it were to replace the human decision-makers in the labeling process. The estimates have to be made using only data that human decision-makers have labeled. The failure rate has to be accurately estimated for various levels of acceptance rate. The accuracy of the estimates can be compared by computing e.g.\ the mean absolute error w.r.t.\ the estimates given by the \nameref{alg:true_eval} algorithm. - \begin{figure} [H] \centering \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=1.5cm, semithick] - \tikzstyle{every state}=[fill=none,draw=black,text=black, rectangle, minimum width=6cm] + \tikzstyle{every state}=[fill=none,draw=black,text=black, rectangle, minimum width=7.0cm] \node[state] (D) {Data generation}; \node[state] (J) [below of=D] {Labeling process (human)}; @@ -157,13 +158,36 @@ \path (D) edge (J) (J) edge (MP) - edge [bend right=81, dashed] (EA) + edge [bend right=82, dashed] (EA) (MP) edge (EA); \end{tikzpicture} \caption{The selective labels framework.
The dashed arrow indicates how the human decisions are evaluated without machine intervention using the \nameref{alg:human_eval} algorithm.} \label{fig:framework} \end{figure} +\begin{figure} [H] +\centering +\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=1.5cm, + semithick] + + \tikzstyle{every state}=[fill=none,draw=black,text=black, rectangle, minimum width=7.0cm] + + \node[state] (DG) {Data generation}; + \node[state] (LP) [below of = DG] {Labeling process (human)}; + \node[state] (MT) [below left=1.0cm and -4cm of LP] {Model training}; + \node[state] (MD) [below=1.0cm of MT] {$\mathcal{M}$ Machine decisions / predictions}; + \node[state] (EA) [below right=0.75cm and -4cm of MD] {Evaluation algorithm}; + + \path (DG) edge (LP) + (LP) edge [bend left=-15] node [right, pos=0.6] {$\D_{train}$} (MT) + edge [bend left=45] node [right] {$\D_{test}$} (MD) + edge [bend left=70, dashed] node [right] {$\D_{test}$} (EA) + (MT) edge node {$\M$} (MD) + (MD) edge (EA); + \end{tikzpicture} +\caption{The selective labels framework with explicit data flow. The dashed arrow indicates how the human decisions are evaluated without machine intervention using the \nameref{alg:human_eval} algorithm. The evaluations are performed over the test set.} +\label{fig:framework_data_flow} +\end{figure} \section{Data generation}
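+To make the DGP, the labeling process, and the FR computation from the framework section concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the logistic link between the private feature and the failure probability, the per-judge leniency ordering of caseloads, and the function names (\texttt{generate\_data}, \texttt{label}, \texttt{failure\_rate}) are all hypothetical and are not the notation used in these notes.

```python
import math
import random

def generate_data(n_subjects=2000, n_judges=20, seed=0):
    """Toy DGP: a private feature x per subject, an acceptance rate per
    judge, and a round-robin assignment of subjects to judges."""
    rng = random.Random(seed)
    subjects = []
    for i in range(n_subjects):
        x = rng.gauss(0, 1)
        # Assumed logistic link: higher x means higher failure probability.
        p_fail = 1 / (1 + math.exp(-x))
        subjects.append({"x": x, "p_fail": p_fail, "judge": i % n_judges})
    # Acceptance rate r for each judge, drawn uniformly from [0.1, 0.9].
    rates = [rng.uniform(0.1, 0.9) for _ in range(n_judges)]
    return subjects, rates

def label(subjects, rates, seed=1):
    """Labeling process: each judge gives a positive decision (T = 1) to the
    r-fraction of its own caseload with the lowest failure probability; the
    outcome y is observed only when T = 1 (the selective labels setting)."""
    rng = random.Random(seed)
    for j, r in enumerate(rates):
        caseload = [s for s in subjects if s["judge"] == j]
        caseload.sort(key=lambda s: s["p_fail"])
        n_accept = round(r * len(caseload))
        for k, s in enumerate(caseload):
            s["T"] = 1 if k < n_accept else 0
            # y = 0 is a failure; y stays unobserved (None) when T = 0.
            s["y"] = (0 if rng.random() < s["p_fail"] else 1) if s["T"] == 1 else None
    return subjects

def failure_rate(subjects):
    """FR = #failures / #decisions; a failure requires a positive decision."""
    failures = sum(1 for s in subjects if s["T"] == 1 and s["y"] == 0)
    return failures / len(subjects)

subjects, rates = generate_data()
subjects = label(subjects, rates)
print(round(failure_rate(subjects), 3))
```

+Note that lowering the judges' acceptance rates mechanically lowers the FR, which is exactly why estimates must be compared at matched levels of AR.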