Commit d9e7d158 authored by Riku-Laine

Modules started

parent 95f8d1f3
@@ -9,8 +9,14 @@
\usepackage[hidelinks, colorlinks=true]{hyperref}
%\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\usepackage[normalem]{ulem} %tables
\useunder{\uline}{\ul}{}
\usepackage{wrapfig} % wrap figures
\usepackage{booktabs}% http://ctan.org/pkg/booktabs
\newcommand{\tabitem}{~~\llap{\textbullet}~~}
\usepackage{pgf}
\usepackage{tikz}
\usepackage{tikz-cd}
@@ -70,6 +76,13 @@
\def\l@subsection{\@tocline{2}{0pt}{2.5pc}{5pc}{}}
\makeatother
%\makeatletter
%\def\listofalgorithms{\@starttoc{loa}\listalgorithmname}
%\def\l@algorithm{\@tocline{0}{3pt plus2pt}{0pt}{1.9em}{}}
%\renewcommand{\ALG@name}{AlGoRiThM}
%\renewcommand{\listalgorithmname}{List of \ALG@name s}
%\makeatother
\usepackage{subcaption}
\graphicspath{ {../figures/} }
@@ -85,6 +98,8 @@
\tableofcontents
%\listofalgorithms
\begin{abstract}
This document presents the implementations of RL at pseudocode level. First, I present most of the nomenclature used in these notes. Then I give my personal views and comments on the motivation behind the selective labels paper. In chapter 2, I define the framework for this problem and give the required definitions. In the following sections, I present the data generating algorithms and the algorithms for obtaining failure rates with different methods. Finally, in the last section, I present results using multiple different settings.
\end{abstract}
@@ -230,6 +245,7 @@ Given the above framework, the goal is to create an evaluation algorithm that ca
& Y &
\end{tikzcd}
\caption{$\M$}
\label{fig:dgm}
\end{wrapfigure}
\emph{Below is the framework as it was written on the whiteboard; RL then presents his own remarks on how he understood it.}
@@ -622,6 +638,181 @@ Given our framework defined in section \ref{sec:framework}, the results presente
\label{fig:random_predictions}
\end{figure}
\section{Modules}
Different types of modules are presented in this section. A summary table is presented last.
\subsection{Data generation modules}
Data generation modules usually take only the generative parameters, such as the sample size and the coefficients, as input.
\begin{algorithm}[H] % enter the algorithm environment
\caption{Data generation module: "results by threshold" with unobservables} % give the algorithm a caption
%\label{alg:} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Total number of subjects $N_{total},~\beta_X=1,~\beta_Z=1$ and $\beta_W=0.2$.
\ENSURE
\FORALL{$i$ in $1, \ldots, N_{total}$}
\STATE Draw $x_i, z_i$ and $w_i$ independently from standard Gaussians.
\STATE Set $y_i$ to 0 if $P(Y = 0 \mid x_i, z_i, w_i) = \sigma(\beta_X x_i+\beta_Z z_i+\beta_W w_i) \geq 0.5$ and \\ to 1 otherwise.
\STATE Attach $x_i, z_i, w_i$ and $y_i$ to the data.
\ENDFOR
\RETURN data
\end{algorithmic}
\end{algorithm}
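Below is a minimal Python sketch of this generator, assuming a NumPy-based implementation; the function and variable names are illustrative and not taken from the project code.

\begin{verbatim}
import numpy as np

def generate_data_threshold(n_total, beta_X=1.0, beta_Z=1.0,
                            beta_W=0.2, seed=None):
    rng = np.random.default_rng(seed)
    # Draw X and the unobservables Z and W independently
    # from standard Gaussians.
    X = rng.standard_normal(n_total)
    Z = rng.standard_normal(n_total)
    W = rng.standard_normal(n_total)
    # P(Y = 0 | X, Z, W) via the logistic function.
    p_y0 = 1 / (1 + np.exp(-(beta_X * X + beta_Z * Z + beta_W * W)))
    # Deterministic thresholding: y = 0 when P(Y = 0 | X, Z, W) >= 0.5.
    Y = np.where(p_y0 >= 0.5, 0, 1)
    return {"X": X, "Z": Z, "W": W, "Y": Y}
\end{verbatim}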
\begin{algorithm}[H] % enter the algorithm environment
\caption{Data generation module: "coin-flip results" with unobservables} % give the algorithm a caption
%\label{alg:} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Total number of subjects $N_{total},~\beta_X=1,~\beta_Z=1$ and $\beta_W=0.2$.
\ENSURE
\FORALL{$i$ in $1, \ldots, N_{total}$}
\STATE Draw $x_i, z_i$ and $w_i$ independently from standard Gaussians.
\STATE Draw $y_i$ from Bernoulli$(\sigma(\beta_X x_i+\beta_Z z_i+\beta_W w_i))$.
\STATE Attach $x_i, z_i, w_i$ and $y_i$ to the data.
\ENDFOR
\RETURN data
\end{algorithmic}
\end{algorithm}
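A sketch of the coin-flip variant differs only in the last step: the outcome is sampled from the stated Bernoulli distribution instead of being thresholded. The names are again illustrative.

\begin{verbatim}
import numpy as np

def generate_data_coinflip(n_total, beta_X=1.0, beta_Z=1.0,
                           beta_W=0.2, seed=None):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n_total)
    Z = rng.standard_normal(n_total)
    W = rng.standard_normal(n_total)
    sigma = 1 / (1 + np.exp(-(beta_X * X + beta_Z * Z + beta_W * W)))
    # Draw y_i from Bernoulli(sigma(.)) as in step 3 of the algorithm.
    Y = rng.binomial(1, sigma)
    return {"X": X, "Z": Z, "W": W, "Y": Y}
\end{verbatim}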
\subsection{Decider modules}
%For decider modules, input as terms of knowledge and parameters should be as explicitly specified as possible.
\begin{algorithm}[H] % enter the algorithm environment
\caption{Decider module: human judge as specified by Lakkaraju et al.} % give the algorithm a caption
%\label{alg:} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Data with features $X, Z$ of size $N_{total}$, knowledge that both of them affect the outcome $Y$ and that they are independent, $\beta_X=1, \beta_Z=1$.
\ENSURE
\STATE Sample an acceptance rate for each of the $M$ judges from $U(0.1, 0.9)$ and round it to one decimal place.
\STATE Assign each observation to a judge at random.
\STATE Calculate $P(T=0|X, Z) = \sigma(\beta_X X+\beta_Z Z)$ for each observation and attach to data.
\STATE Sort the data (1) by judge and (2) by the probabilities $P(T=0|X, Z)$ in descending order.
\STATE \hskip3.0em $\rhd$ Now the most dangerous subjects for each judge are at the top.
\STATE If a subject belongs to the top $(1-r) \cdot 100\%$ of the observations assigned to their judge, set $T=0$, else set $T=1$.
\RETURN data with decisions
\end{algorithmic}
\end{algorithm}
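The following is a Python/pandas sketch of the judge decider above; the DataFrame interface and the column names (X, Z, judge, r, p\_T0, T) are assumptions for illustration, not the project's actual interface.

\begin{verbatim}
import numpy as np
import pandas as pd

def decide_judges(data, n_judges=100, beta_X=1.0, beta_Z=1.0, seed=None):
    rng = np.random.default_rng(seed)
    df = data.copy()
    # Acceptance rate for each judge from U(0.1, 0.9),
    # rounded to one decimal place.
    rates = np.round(rng.uniform(0.1, 0.9, n_judges), 1)
    # Assign each observation to a judge uniformly at random.
    df["judge"] = rng.integers(0, n_judges, len(df))
    df["r"] = rates[df["judge"].values]
    # P(T = 0 | X, Z), the perceived risk of a negative outcome.
    df["p_T0"] = 1 / (1 + np.exp(-(beta_X * df["X"] + beta_Z * df["Z"])))

    def decide(group):
        # Within one judge: detain (T = 0) the top (1 - r) fraction by risk.
        group = group.sort_values("p_T0", ascending=False).copy()
        n_detain = int(round((1 - group["r"].iloc[0]) * len(group)))
        group["T"] = 1
        group.iloc[:n_detain, group.columns.get_loc("T")] = 0
        return group

    return df.groupby("judge", group_keys=False).apply(decide)
\end{verbatim}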
\begin{algorithm}[H] % enter the algorithm environment
\caption{Decider module: "coin-flip decisions"} % give the algorithm a caption
%\label{alg:} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Data with features $X, Z$ of size $N_{total}$, knowledge that both of them affect the outcome $Y$ and that they are independent, $\beta_X=1, \beta_Z=1$.
\ENSURE
\FORALL{$i$ in $1, \ldots, N_{total}$}
\STATE Draw $t_i$ from Bernoulli$(\sigma(\beta_X x_i+\beta_Z z_i))$.
\STATE Attach $t_i$ to the data.
\ENDFOR
\RETURN data with decisions
\end{algorithmic}
\end{algorithm}
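A sketch of the coin-flip decider, mirroring step 2 literally: each $t_i$ is drawn from a Bernoulli distribution with parameter $\sigma(\beta_X x_i+\beta_Z z_i)$. The names are illustrative.

\begin{verbatim}
import numpy as np

def decide_coinflip(data, beta_X=1.0, beta_Z=1.0, seed=None):
    rng = np.random.default_rng(seed)
    df = data.copy()
    sigma = 1 / (1 + np.exp(-(beta_X * df["X"] + beta_Z * df["Z"])))
    # Draw t_i from Bernoulli(sigma(.)) for each observation.
    df["T"] = rng.binomial(1, sigma)
    return df
\end{verbatim}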
\subsection{Evaluator modules}
\begin{algorithm}[H] % enter the algorithm environment
\caption{Evaluator module: Contraction algorithm \cite{lakkaraju17}} % give the algorithm a caption
%\label{alg:} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Data $\D$ with properties $\{x_i, t_i, y_i\}$, acceptance rate $r$, knowledge that $X$ affects $Y$
\ENSURE
\STATE Split the data into a training set and a test set.
\STATE Train a predictive model $\B$ on the training data.
\STATE Estimate probability scores $\s$ using $\B$ for all observations in the test data and attach them to the test data.
\STATE Let $q$ be the decision-maker with the highest acceptance rate in $\D$.
\STATE $\D_q = \{(x, j, t, y) \in \D|j=q\}$
\STATE \hskip3.0em $\rhd$ $\D_q$ is the set of all observations judged by $q$
\STATE
\STATE $\RR_q = \{(x, j, t, y) \in \D_q|t=1\}$
\STATE \hskip3.0em $\rhd$ $\RR_q$ is the set of observations in $\D_q$ with observed outcome labels
\STATE
\STATE Sort observations in $\RR_q$ in descending order of confidence scores $\s$ and assign to $\RR_q^{sort}$.
\STATE \hskip3.0em $\rhd$ Observations deemed as high risk by the black-box model $\mathcal{B}$ are at the top of this list
\STATE
\STATE Remove the top $[(1.0-r)|\D_q|]-[|\D_q|-|\RR_q|]$ observations of $\RR_q^{sort}$ and call this list $\mathcal{R_B}$
\STATE \hskip3.0em $\rhd$ $\mathcal{R_B}$ is the list of observations assigned to $t = 1$ by $\mathcal{B}$
\STATE
\STATE Compute $\mathbf{u}=\sum_{i=1}^{|\mathcal{R_B}|} \dfrac{\delta\{y_i=0\}}{| \D_q |}$.
\RETURN $\mathbf{u}$
\end{algorithmic}
\end{algorithm}
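A sketch of the contraction estimator, assuming the test split is a DataFrame to which the predictive model's scores have already been attached as a column S (higher score = higher estimated risk of $y = 0$), as in steps 1--3; the column names judge, T, Y and S are assumptions.

\begin{verbatim}
def contraction(test, r):
    # q: the judge with the highest acceptance rate in the test data.
    q = test.groupby("judge")["T"].mean().idxmax()
    D_q = test[test["judge"] == q]           # all cases judged by q
    R_q = D_q[D_q["T"] == 1]                 # released cases, labels observed
    # Sort released cases by score, most risky first.
    R_sort = R_q.sort_values("S", ascending=False)
    # Remove the cases the model would also have detained at rate r.
    k = int(round((1.0 - r) * len(D_q))) - (len(D_q) - len(R_q))
    R_B = R_sort.iloc[max(k, 0):]
    # Estimated failure rate, normalised by |D_q| as in the last step.
    return (R_B["Y"] == 0).sum() / len(D_q)
\end{verbatim}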
\begin{algorithm}[] % enter the algorithm environment
\caption{Evaluator module: True evaluation} % give the algorithm a caption
%\label{alg:true_eval} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Data $\D$ with properties $\{x_i, t_i, y_i\}$ and \emph{all outcome labels}, acceptance rate $r$, knowledge that $X$ affects $Y$
\ENSURE
\STATE Split the data into a training set and a test set.
\STATE Train a predictive model $\B$ on the training data.
\STATE Estimate probability scores $\s$ using $\B$ for all observations in the test data and attach them to the test data.
\STATE Sort the data by the probabilities $\s$ in ascending order.
\STATE \hskip3.0em $\rhd$ Now the most dangerous subjects are last.
\STATE Calculate the number to release, $N_{free} = |\D| \cdot r$.
\RETURN $\frac{1}{|\D|}\sum_{i=1}^{N_{free}}\delta\{y_i=0\}$
\end{algorithmic}
\end{algorithm}
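A sketch of the true evaluation module under the same assumptions (scores S already attached to the test data as in steps 1--3, all outcome labels available):

\begin{verbatim}
def true_evaluation(test, r):
    # Ascending order of score: the least risky subjects come first.
    ordered = test.sort_values("S", ascending=True)
    n_free = int(len(test) * r)            # number of subjects to release
    released = ordered.iloc[:n_free]
    # Failure rate among the released, normalised by |D|.
    return (released["Y"] == 0).sum() / len(test)
\end{verbatim}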
\begin{algorithm}[] % enter the algorithm environment
\caption{Evaluator module: Labeled outcomes} % give the algorithm a caption
%\label{alg:labeled_outcomes} % and a label for \ref{} commands later in the document
\begin{algorithmic}[1] % enter the algorithmic environment
\REQUIRE Data $\D$ with properties $\{x_i, t_i, y_i\}$, acceptance rate $r$, knowledge that $X$ affects $Y$
\ENSURE
\STATE Split the data into a training set and a test set.
\STATE Train a predictive model $\B$ on the training data.
\STATE Estimate probability scores $\s$ using $\B$ for all observations in the test data and attach them to the test data.
\STATE Assign the observations in the test data with observed outcomes ($T=1$) to $\D_{observed}$.
\STATE Sort $\D_{observed}$ by the probabilities $\s$ in ascending order.
\STATE \hskip3.0em $\rhd$ Now the most dangerous subjects are last.
\STATE Calculate the number to release, $N_{free} = |\D_{observed}| \cdot r$.
\RETURN $\frac{1}{|\D|}\sum_{i=1}^{N_{free}}\delta\{y_i=0\}$
\end{algorithmic}
\end{algorithm}
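A sketch of the labeled outcomes evaluator; it is identical to true evaluation except that it is restricted to the cases whose outcome was observed ($T = 1$). The normalisation by $|\D|$ follows the pseudocode above. Column names are again assumptions.

\begin{verbatim}
def labeled_outcomes(test, r):
    observed = test[test["T"] == 1]              # D_observed
    ordered = observed.sort_values("S", ascending=True)
    n_free = int(len(observed) * r)
    released = ordered.iloc[:n_free]
    return (released["Y"] == 0).sum() / len(test)
\end{verbatim}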
\subsection{Summary}
%\begin{table}[H]
%\centering
%\begin{tabular}{l | l | l}
%\multicolumn{3}{c}{ \textbf{Module}} \\
%\textbf{Data generator} & \textbf{Decider} & \textbf{Evaluator} \\ \hline
% With unobservables, see \ref{fig:dgm} & independent decisions & Contraction algorithm, input: \\
% Without unobservables & & \tabitem jotain \\
% & & \tabitem lisaaa
%\end{tabular}
%\caption{Types of evaluation algorithms}
%\label{tab:jotain}
%\end{table}
\begin{table}
\centering
\begin{tabular}{lll}
\toprule
\multicolumn{3}{c}{Module type} \\[.5\normalbaselineskip]
\textbf{Data generator} & \textbf{Decider} & \textbf{Evaluator} \\
\midrule
With unobservables (figs TBA) & Independent decisions & {\ul Labeled outcomes} \\
Without unobservables & \tabitem $P(T=0|X, Z)$ & \tabitem Data $\D$ with properties $\{x_i, t_i, y_i\}$ \\
 & \tabitem ``threshold rule'' & \tabitem acceptance rate $r$ \\
 & & \tabitem knowledge that $X$ affects $Y$ \\[.5\normalbaselineskip]
 & & {\ul True evaluation} \\
 & & \tabitem Data $\D$ with properties $\{x_i, t_i, y_i\}$ \\
 & & and \emph{all outcome labels} \\
 & & \tabitem acceptance rate $r$ \\
 & & \tabitem knowledge that $X$ affects $Y$ \\[.5\normalbaselineskip]
 & & {\ul Contraction algorithm} \\
 & & \tabitem Data $\D$ with properties $\{x_i, t_i, y_i\}$ \\
 & & \tabitem acceptance rate $r$ \\
 & & \tabitem knowledge that $X$ affects $Y$ \\[.5\normalbaselineskip]
\bottomrule
\end{tabular}
\caption{Summary table of modules (under construction)}
\label{tab:jotain}
\end{table}
\begin{thebibliography}{9}
\bibitem{dearteaga18} \bibitem{dearteaga18}
@@ -631,5 +822,4 @@ Given our framework defined in section \ref{sec:framework}, the results presente
\end{thebibliography}
\end{document}
\ No newline at end of file