To make inferences, we have to learn a parametric model from the data instead of using the fixed functions of the previous section. We can define the model probabilistically thanks to the simplification of the counterfactual expression derived in the previous section.
We assume that the feature vectors $\obsFeaturesValue$ and $\unobservableValue$ representing risk can be condensed into one-dimensional risk factors, for example via propensity scores. Furthermore, we assume these risk factors are Gaussian distributed. Since $Z$ is unobserved, we can fix its variance to 1 without loss of generality.
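A minimal sketch of these distributional assumptions, written with $\mu_{\obsFeaturesValue}$ and $\sigma_{\obsFeaturesValue}^2$ as hypothetical placeholders for the mean and variance of the observed risk factor (the unobserved factor is additionally taken to have zero mean, which is also without loss of generality for an unobserved variable), is
\[
\obsFeaturesValue \sim N(\mu_{\obsFeaturesValue}, \sigma_{\obsFeaturesValue}^2), \qquad \unobservableValue \sim N(0, 1).
\]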
Following the selective labels setting, we have $Y=0$ whenever $T=0$. When $T=1$, the subject's behaviour is modeled as a logistic regression over the risk factors and a noise term.
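A plausible form of this outcome model, sketched here with the parameter names introduced below (the exact parametrization in the full model may differ), is
\[
P(Y = 1 \mid T = 1, \obsFeaturesValue, \unobservableValue) = \invlogit(\alpha_\outcomeValue + \beta_\obsFeaturesValue \obsFeaturesValue + \beta_\unobservableValue \unobservableValue + \epsilon_\outcomeValue),
\]
where $\epsilon_\outcomeValue$ denotes the noise term.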
The parameters of the model are $\parameters=\{\alpha_\outcomeValue, \alpha_j, \beta_\obsFeaturesValue, \beta_\unobservableValue, \gamma_\obsFeaturesValue, \gamma_\unobservableValue\}$. \acomment{Where are the variance parameters?} Our estimate is obtained simply by integrating over the posterior of these parameters.
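Schematically, for a generic quantity of interest $g$ (a placeholder symbol, not notation from this paper), the estimate takes the form
\[
\hat{g} = \int g(\parameters)\, p(\parameters \mid \text{data})\, \mathrm{d}\parameters,
\]
which in practice is approximated by averaging $g(\parameters)$ over samples drawn from the posterior.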
Here $\invlogit(a) = (1 + e^{-a})^{-1}$ is the standard logistic function, i.e., the inverse of the logit function.
Since the decisions are ultimately based on the risk factors for behaviour, we model the decisions analogously as a logistic regression over the risk factors, the leniency of the decision maker, and a noise term.
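A plausible form of this decision model, again sketched with the parameter names above (the exact parametrization may differ), is
\[
P(T = 1 \mid \obsFeaturesValue, \unobservableValue) = \invlogit(\alpha_j + \gamma_\obsFeaturesValue \obsFeaturesValue + \gamma_\unobservableValue \unobservableValue + \epsilon_T),
\]
where $\epsilon_T$ denotes the noise term.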
Note that the index $j$ refers to decision maker $\human_j$. The parameter $\alpha_{j}$ captures the leniency of decision maker $j$ through $\logit(\leniencyValue_j)$. The decision makers in the data differ from each other only in their leniency.
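To make the generative assumptions above concrete, the following Python sketch simulates data from one possible instantiation of the model. All numeric choices (numbers of subjects and decision makers, coefficient values, noise scales, leniency levels) are illustrative assumptions and not values from this paper.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def invlogit(a):
    # Standard logistic function, the inverse of the logit.
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative sizes and parameter values (assumptions, not from the paper).
n_subjects, n_judges = 10000, 10

# One-dimensional Gaussian risk factors; Z is standardized to variance 1.
x = rng.normal(0.0, 1.0, size=n_subjects)  # observed risk factor
z = rng.normal(0.0, 1.0, size=n_subjects)  # unobserved risk factor

# Leniency of each decision maker; alpha_j = logit(leniency_j).
leniency = rng.uniform(0.2, 0.8, size=n_judges)
alpha_j = np.log(leniency / (1.0 - leniency))

# Assign subjects to decision makers uniformly at random.
judge = rng.integers(0, n_judges, size=n_subjects)

# Decision model: logistic regression over the risk factors,
# the decision maker's leniency, and a noise term.
gamma_x, gamma_z = 1.0, 1.0
eps_t = rng.normal(0.0, 0.1, size=n_subjects)
t = rng.binomial(1, invlogit(alpha_j[judge] - gamma_x * x - gamma_z * z + eps_t))

# Outcome model: logistic regression over the risk factors and a noise term,
# observed only for accepted subjects (selective labels: Y = 0 whenever T = 0).
alpha_y, beta_x, beta_z = 0.0, 1.0, 1.0
eps_y = rng.normal(0.0, 0.1, size=n_subjects)
y_if_accepted = rng.binomial(1, invlogit(alpha_y - beta_x * x - beta_z * z + eps_y))
y = np.where(t == 1, y_if_accepted, 0)

print("acceptance rate:", t.mean(), "mean observed outcome:", y.mean())
\end{verbatim}
The negative signs on the risk-factor coefficients encode the illustrative convention that higher risk lowers the probability of a positive decision and of a positive outcome. Under these assumptions, learning the model amounts to inferring $\parameters$ from $(\obsFeaturesValue, j, T, Y)$ alone, since $Z$ and the outcomes of rejected subjects are never observed.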