We use the following causal model over the observed data.
According to the selective labels setting, we have $Y=1$ whenever $T=0$. When $T=1$, the subject's behaviour is modeled by logistic regression over the features and a noise term:
\begin{eqnarray}
P(\outcome = 0 \mid \decision, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue) & = & \begin{cases}
0,~\text{if}~\decision = 0\\
\invlogit(\alpha_\outcome + \beta_\obsFeatures^T \obsFeaturesValue + \beta_\unobservable\unobservableValue),~\text{o/w}\label{eq:defendantmodel}
\end{cases}
\end{eqnarray}
Here \invlogit is the standard logistic function, and the observed features $\obsFeatures$ form a vector of feature values.
Since the decisions are ultimately based on the expected behaviour of the subject, we model the decisions in the data similarly, as a logistic regression over the features and a noise term:
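% The decision model (Eq.~\ref{eq:judgemodel}) referred to above is reconstructed here as a sketch
% from the surrounding text; its exact form, including whether an explicit noise term appears, is an assumption.
\begin{eqnarray}
P(\decision = 1 \mid \human = \humanValue, \obsFeatures = \obsFeaturesValue, \unobservable = \unobservableValue) & = & \invlogit(\alpha_\humanValue + \gamma_\obsFeatures^T \obsFeaturesValue + \gamma_\unobservable\unobservableValue) \label{eq:judgemodel}
\end{eqnarray}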
Note that we make the simplifying assumption that the coefficients $\gamma_\obsFeatures, \gamma_\unobservable$ are shared by all decision makers, who are allowed to differ only in their intercepts $\alpha_\humanValue$.
The intercept $\alpha_{\humanValue}$ controls the leniency of decision maker $\humanValue$.
%The decision makers in the data differ from each other only by leniency.
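To make the data-generating process concrete, the following Python sketch simulates data from the model; the feature dimension, the parameter values, and the random assignment of subjects to decision makers are illustrative choices for this example only, not part of the model specification.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, k, n_judges = 5000, 3, 10                 # illustrative sizes
alpha_y, beta_x, beta_z = -1.0, rng.normal(size=k), 1.0   # outcome model
gamma_x, gamma_z = rng.normal(size=k), 1.0                # decision model
alpha_j = rng.normal(size=n_judges)          # per-judge intercepts (leniency)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = rng.normal(size=(n, k))                  # observed features X
z = rng.normal(size=n)                       # unobserved feature Z
j = rng.integers(n_judges, size=n)           # decision maker handling each case

# Decision T: logistic in the features; the intercept sets the judge's leniency.
t = rng.binomial(1, sigmoid(alpha_j[j] + x @ gamma_x + gamma_z * z))

# Outcome Y under selective labels: Y = 1 whenever T = 0;
# when T = 1, P(Y = 0) = sigmoid(alpha_y + beta_x'x + beta_z z).
p_y0 = sigmoid(alpha_y + x @ beta_x + beta_z * z)
y = np.where(t == 0, 1, rng.binomial(1, 1.0 - p_y0))
\end{verbatim}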
%\noindent
...
...
%\spara{Parameter estimation}
We take a Bayesian approach to learn the model from the dataset \dataset.
%
In particular, we consider the full probabilistic model defined in Equations \ref{eq:judgemodel}--\ref{eq:defendantmodel} and obtain the posterior distribution of its parameters $\parameters=\{\alpha_\outcome, \beta_\obsFeatures, \beta_\unobservable, \gamma_\obsFeatures, \gamma_\unobservable\}\cup\{\alpha_\humanValue\}_\humanValue$, which includes an intercept $\alpha_\humanValue$ for every decision maker $\humanValue$ employed in the data.
%
%Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
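As a concrete illustration of this estimation step, the sketch below (reusing the arrays from the simulation above) expresses the posterior in PyMC; the weakly informative priors, the standard-normal model for the unobserved feature, and the choice of tool are assumptions made for this example rather than part of the model definition.
\begin{verbatim}
import numpy as np
import pymc as pm

labeled = np.flatnonzero(t == 1)   # outcomes are informative only when T = 1

with pm.Model():
    # Illustrative weakly informative priors.
    alpha_y = pm.Normal("alpha_y", 0.0, 5.0)
    beta_x  = pm.Normal("beta_x", 0.0, 5.0, shape=k)
    beta_z  = pm.Normal("beta_z", 0.0, 5.0)
    gamma_x = pm.Normal("gamma_x", 0.0, 5.0, shape=k)
    gamma_z = pm.Normal("gamma_z", 0.0, 5.0)
    alpha_j = pm.Normal("alpha_j", 0.0, 5.0, shape=n_judges)  # one intercept per decision maker
    z_latent = pm.Normal("z_latent", 0.0, 1.0, shape=n)       # latent unobserved feature Z

    # Decision model: every decision is observed.
    p_t = pm.math.invlogit(alpha_j[j] + pm.math.dot(x, gamma_x) + gamma_z * z_latent)
    pm.Bernoulli("t_obs", p=p_t, observed=t)

    # Outcome model: the equation gives P(Y = 0), so Y = 1 with probability 1 - p_y0;
    # only cases with T = 1 contribute observed labels (selective labels).
    p_y0 = pm.math.invlogit(alpha_y + pm.math.dot(x, beta_x) + beta_z * z_latent)
    pm.Bernoulli("y_obs", p=1.0 - p_y0[labeled], observed=y[labeled])

    idata = pm.sample()   # posterior draws for all parameters
\end{verbatim}
Sampling in this way yields posterior draws for every element of $\parameters$, including one intercept $\alpha_\humanValue$ per decision maker, as well as for the latent values of the unobserved feature.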