We compare \cfbi especially with the recent \contraction technique of \citet[KDD]{lakkaraju2017selective}.
The implementation uses Python 3.6.9 and PyStan v.2.19.0.0 with cmdstanpy 0.4.3 -- and will be made available online upon publication.
\subsection{Synthetic Data}
\label{sec:syntheticsetting}
We begin our experiments with synthetic data, in order to demonstrate various properties of the evaluation methods.
%
To set up the experimentation, we follow the setting of \citet{lakkaraju2017selective}.
We create 10 synthetic data sets.
%
Each data set consists of $\datasize=5{,}000$ randomly generated cases.
%
The features \obsFeatures and \unobservable of each case are drawn independently from standard Gaussians.
%
Each case is assigned to one of the decision makers in \humanset, whose leniency \leniencyValue is drawn
from Uniform$(0.1,~0.9)$.
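For concreteness, a minimal Python sketch of this generation step follows; the number of decision makers ($50$) and all variable names are illustrative assumptions, not taken from our implementation.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, n_dm = 5000, 50            # n is fixed in the text; 50 decision
                              # makers is an assumed value
x = rng.normal(size=n)        # observed features X ~ N(0, 1)
z = rng.normal(size=n)        # unobserved features Z ~ N(0, 1)
dm = rng.integers(0, n_dm, size=n)        # random assignment of cases
leniency = rng.uniform(0.1, 0.9, n_dm)    # leniency ~ Uniform(0.1, 0.9)
r = leniency[dm]              # leniency of each case's decision maker
\end{verbatim}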
A decision $\decision$ is made for each case by the assigned decision maker.
%
The exact way a decision is made by each type of decision maker is specified in the next subsection (Sec.~\ref{sec:dm_exps}).
%
If the decision is positive, then a binary outcome is sampled for the case from a Bernoulli distribution such that
\begin{equation}
\prob{\outcome = 0~|~\decision=1,~\obsFeaturesValue, \unobservableValue} = \invlogit(a_\outcome + b_\obsFeatures \obsFeaturesValue + b_\unobservable \unobservableValue + e_\outcome) \label{eq:Ysampling}
\end{equation}
%
with $a_\outcome=0$ and $b_\obsFeatures = b_\unobservable = 1$.
%
Note that the intercept $a_\outcome$ determines the base probability of a negative result -- and the choice $a_\outcome = 0$ means that positive and negative outcomes are equally likely among cases with positive decisions.
%
Additional noise is added to the outcome of each case via $e_\outcome$, which is drawn from a zero-mean Gaussian with small variance, $e_\outcome\sim \gaussian{0}{0.1}$; it represents factors unobserved by the decision maker, events happening after the decision is made, or simply chance.
%
The data set is split in half into training and test sets, such that each decision maker appears in only one of them.
%
The evaluated decision maker \machine is trained on the training set, while all evaluation is based on the test set.
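As an illustration, the outcome sampling of Equation~\ref{eq:Ysampling} could be implemented as below (continuing the sketch above; \texttt{t} holds the decisions $\decision$ produced as described in Sec.~\ref{sec:dm_exps}).
\begin{verbatim}
from scipy.special import expit   # expit(u) = 1 / (1 + exp(-u))

a_y, b_x, b_z = 0.0, 1.0, 1.0
e_y = rng.normal(0.0, np.sqrt(0.1), size=n)  # Var(e_y) = 0.1
p_y0 = expit(a_y + b_x * x + b_z * z + e_y)  # P(Y = 0 | T = 1, x, z)
y = np.where(t == 1, rng.binomial(1, 1.0 - p_y0), np.nan)
# the outcome is recorded only for positive decisions (t == 1)
\end{verbatim}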
Our experimentation involves two categories of decision makers: (i) the set of decision makers \humanset that have made the decisions recorded in the data, and (ii) the evaluated decision maker \machine.
We describe both of them below.
\mpara{Decisions by \humanset.} %\newline
The decisions of decision makers \humanset are based on their perception of the dangerousness of a case, which we refer to as the `{\it risk score}'.
%
Higher values of the risk score indicate that a negative outcome is more likely.
The risk score is a function $f(\obsFeatures, \unobservable)$ of the features \obsFeatures and \unobservable.
%
For the synthetic data, the risk score function is
\begin{equation} \label{eq:risk}
f(\obsFeatures, \unobservable) = \invlogit(b_\obsFeatures \obsFeatures + b_\unobservable \unobservable).
\end{equation}
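In code, Equation~\ref{eq:risk} is simply the following (a sketch; \texttt{expit} is the inverse logit from \texttt{scipy}, and \texttt{risk\_score} is an illustrative name).
\begin{verbatim}
def risk_score(x, z, b_x=1.0, b_z=1.0):
    # the risk score f(x, z) of eq:risk; higher values = riskier cases
    return expit(b_x * x + b_z * z)
\end{verbatim}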
For the {\it first} type of decision makers we consider, we assume that decisions are rational and well-informed, and that a decision maker with leniency \leniencyValue makes a positive decision only for the \leniencyValue fraction of cases that are most likely to lead to a positive outcome.
%
Specifically, we assume that the decision makers know the cumulative distribution function $F$ that the risk scores $s = f(\obsFeaturesValue, \unobservableValue)$ of the cases follow.
%
This is a reasonable assumption to make in settings where decision makers have accurate knowledge of the joint feature distribution, since such knowledge allows one to calculate $F$.
%
For example, an experienced judge who has tried a large volume and variety of defendants may have a good idea of the cases that appear at court and which of them pose higher risk.
%
Considering a decision maker with leniency $\leniency = \leniencyValue$ who decides a case with risk score $s$, a positive decision is made only if $s$ is in the \leniencyValue portion of the lowest scores according to $F$, i.e. if
\begin{equation}
s \leq F^{-1}(\leniencyValue).
\end{equation}
%
See Appendix~\ref{sec:independent} for more details.
%
Since in our setting the distribution $F$ is given and fixed, such decisions for different cases happen independently based on their risk score.
Because of this, we refer to this type of decision makers as \independent.
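A sketch of such a decision maker follows. Under the parameter values above ($b_\obsFeatures = b_\unobservable = 1$, independent standard Gaussian features), the linear score $\obsFeaturesValue + \unobservableValue$ follows $\gaussian{0}{2}$, and since the inverse logit is monotone, $F^{-1}(\leniencyValue) = \invlogit(\sqrt{2}\,\Phi^{-1}(\leniencyValue))$ in closed form.
\begin{verbatim}
from scipy.stats import norm

def decide_independent(x, z, r):
    # positive decision iff the risk score is below the r-quantile
    # of F; here F^{-1}(r) = expit(sqrt(2) * norm.ppf(r))
    threshold = expit(np.sqrt(2.0) * norm.ppf(r))
    return (risk_score(x, z) <= threshold).astype(int)
\end{verbatim}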
In addition, we experiment with a different type of decision maker, namely \batch, also used by \citet{lakkaraju2017selective}.
%
Decision makers of this type are assumed to consider all cases assigned to them at once, as a batch; sort them by the risk score of Equation~\ref{eq:risk}; and, for leniency $\leniency = \leniencyValue$, release the $\leniencyValue$ portion of the batch with the lowest risk scores.
%
Such decision makers still have good knowledge of the relative risk that the cases assigned to them pose, but they are also shortsighted, as they make decisions for a case \emph{depending} on the other cases in their batch.
%
For example, if a decision maker is randomly assigned a batch of cases that are mostly low-risk, some of those cases will nevertheless receive a negative decision, simply because of the batch in which they appear.
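A corresponding sketch for \batch decision makers, applied to one batch at a time, is given below; in the full data-generation sketch it would be called once per decision maker, on the subset of cases assigned to them.
\begin{verbatim}
def decide_batch(x, z, r):
    # release the fraction r of the batch with the lowest risk scores
    s = risk_score(x, z)
    k = int(np.floor(r * len(s)))   # number of positive decisions
    t = np.zeros(len(s), dtype=int)
    t[np.argsort(s)[:k]] = 1        # ascending risk: release the lowest
    return t
\end{verbatim}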
%
Finally, we consider a third type of decision maker, namely \random, which simply makes a positive decision with probability \leniencyValue.
%
We include this type in order to test the evaluation methods in settings where some of their assumptions may be violated.
%
Note that, while \random decision makers may make poor decisions, they do not introduce selection bias, as their decisions are not correlated with the possible outcome.
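The \random decision maker, as a one-line sketch in the same style:
\begin{verbatim}
def decide_random(n_cases, r, rng):
    # positive decision with probability r, independently of x and z
    return (rng.random(n_cases) < r).astype(int)
\end{verbatim}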
For \machine, we consider the same three types of decision makers as for \humanset.
Their definitions are adapted in the obvious way -- i.e., for \independent and \batch, the risk score depends only on the values of the features \obsFeatures.
Risk scores are computed with a logistic regression model trained on the training set.
%
For the \independent decision maker \machine, the cumulative distribution function $F$ of the risk scores is constructed from the empirical distribution of the risk scores over all observations in the test data.
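A sketch of this construction, under the assumption that only cases with a positive decision carry an observed outcome; variable names such as \texttt{x\_train} and \texttt{r\_machine} are illustrative, not from our implementation.
\begin{verbatim}
from sklearn.linear_model import LogisticRegression

lab = t_train == 1            # only positively decided cases are labelled
clf = LogisticRegression().fit(x_train[lab].reshape(-1, 1), y_train[lab])
s_test = clf.predict_proba(x_test.reshape(-1, 1))[:, 0]   # P(Y = 0 | x)

# independent M: empirical F^{-1}(r) from the test-set risk scores
threshold = np.quantile(s_test, r_machine)
t_machine = (s_test <= threshold).astype(int)
\end{verbatim}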
\begin{figure}
\includegraphics[width=1.1\linewidth,trim={0 0 0 1.8cm},clip]{./img/with_epsilon_deciderH_independent_deciderM_batch_maxR_0_9coefZ1_0_all}