We use the following causal model over the observed data.
According to the selective labels setting, we have $Y=1$ whenever $T=0$. When $T=1$, the subject's behaviour is modeled by logistic regression over the features and a noise term:
\begin{eqnarray}
P(\outcome = 0 \mid \decision, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue) & = & \begin{cases}
0,~\text{if}~\decision = 0\\
\invlogit(\alpha_\outcome + \beta_\obsFeatures^T \obsFeaturesValue + \beta_\unobservable\unobservableValue),~\text{o/w}\label{eq:defendantmodel}
\end{cases}
\end{eqnarray}
Here \invlogit is the standard logistic function, and the observed features $\obsFeatures$ form a vector of feature values.
Since the decisions are ultimately based on the expected behaviour of the subject, we model the decisions in the data similarly, as a logistic regression over the features and a noise term:
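% The decision model (Eq.~\ref{eq:judgemodel}) referred to above is reconstructed here as a sketch
% from the surrounding text; its exact form, including whether an explicit noise term appears, is an assumption.
\begin{eqnarray}
P(\decision = 1 \mid \human = \humanValue, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue) & = & \invlogit(\alpha_\humanValue + \gamma_\obsFeatures^T \obsFeaturesValue + \gamma_\unobservable\unobservableValue) \label{eq:judgemodel}
\end{eqnarray}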
Note that we make the simplifying assumption that the coefficients $\gamma_\obsFeatures, \gamma_\unobservable$ are shared by all decision makers, who are allowed to differ only in their intercepts $\alpha_\humanValue$.
The intercept $\alpha_{\humanValue}$ controls the leniency of decision maker $\humanValue$.
%The decision makers in the data differ from each other only by leniency.
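To make the data-generating process concrete, the following Python sketch simulates data from the model; the feature dimension, the parameter values, and the random assignment of subjects to decision makers are illustrative choices for this example only, not part of the model specification.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, k, n_judges = 5000, 3, 10                 # illustrative sizes
alpha_y, beta_x, beta_z = -1.0, rng.normal(size=k), 1.0   # outcome model
gamma_x, gamma_z = rng.normal(size=k), 1.0                # decision model
alpha_j = rng.normal(size=n_judges)          # per-judge intercepts (leniency)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = rng.normal(size=(n, k))                  # observed features X
z = rng.normal(size=n)                       # unobserved feature Z
j = rng.integers(n_judges, size=n)           # decision maker handling each case

# Decision T: logistic in the features; the intercept sets the judge's leniency.
t = rng.binomial(1, sigmoid(alpha_j[j] + x @ gamma_x + gamma_z * z))

# Outcome Y under selective labels: Y = 1 whenever T = 0;
# when T = 1, P(Y = 0) = sigmoid(alpha_y + beta_x'x + beta_z z).
p_y0 = sigmoid(alpha_y + x @ beta_x + beta_z * z)
y = np.where(t == 0, 1, rng.binomial(1, 1.0 - p_y0))
\end{verbatim}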
%\noindent
...
...
%\spara{Parameter estimation}
We take a Bayesian approach to learn the model from the dataset \dataset.
%
In particular, we consider the full probabilistic model defined in Equations \ref{eq:judgemodel}--\ref{eq:defendantmodel} and obtain the posterior distribution of its parameters $\parameters=\{\alpha_\outcome, \beta_\obsFeatures, \beta_\unobservable, \gamma_\obsFeatures, \gamma_\unobservable\}\cup\{\alpha_\humanValue\}_\humanValue$, which includes an intercept $\alpha_\humanValue$ for every decision maker $\humanValue$ employed in the data.
%
%Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
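As a concrete illustration of this estimation step, the sketch below (reusing the arrays from the simulation above) expresses the posterior in PyMC; the weakly informative priors, the standard-normal model for the unobserved feature, and the choice of tool are assumptions made for this example rather than part of the model definition.
\begin{verbatim}
import numpy as np
import pymc as pm

labeled = np.flatnonzero(t == 1)   # outcomes are informative only when T = 1

with pm.Model():
    # Illustrative weakly informative priors.
    alpha_y = pm.Normal("alpha_y", 0.0, 5.0)
    beta_x  = pm.Normal("beta_x", 0.0, 5.0, shape=k)
    beta_z  = pm.Normal("beta_z", 0.0, 5.0)
    gamma_x = pm.Normal("gamma_x", 0.0, 5.0, shape=k)
    gamma_z = pm.Normal("gamma_z", 0.0, 5.0)
    alpha_j = pm.Normal("alpha_j", 0.0, 5.0, shape=n_judges)  # one intercept per decision maker
    z_latent = pm.Normal("z_latent", 0.0, 1.0, shape=n)       # latent unobserved feature Z

    # Decision model: every decision is observed.
    p_t = pm.math.invlogit(alpha_j[j] + pm.math.dot(x, gamma_x) + gamma_z * z_latent)
    pm.Bernoulli("t_obs", p=p_t, observed=t)

    # Outcome model: the equation gives P(Y = 0), so Y = 1 with probability 1 - p_y0;
    # only cases with T = 1 contribute observed labels (selective labels).
    p_y0 = pm.math.invlogit(alpha_y + pm.math.dot(x, beta_x) + beta_z * z_latent)
    pm.Bernoulli("y_obs", p=1.0 - p_y0[labeled], observed=y[labeled])

    idata = pm.sample()   # posterior draws for all parameters
\end{verbatim}
Sampling in this way yields posterior draws for every element of $\parameters$, including one intercept $\alpha_\humanValue$ per decision maker, as well as for the latent values of the unobserved feature.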