However, not all features that are available to $\human_\judgeValue$ are available to \machine in the setting we consider, which forms our second major challenge.
%
These complications mean that making direct predictions based on the available features can be suboptimal and even biased.
However, important information regarding the unobserved features \unobservable can often be recovered via careful consideration of the decisions in the data~\cite{mccandless2007bayesian,Jung2}.
%
This is exactly what our counterfactual approach achieves.
For illustration, let us consider a defendant who received a negative decision from a human judge $\human_\judgeValue$.
...
...
%
Motivated by the central limit theorem, we model both \obsFeatures and \unobservable with Gaussian distributions.
%
Furthermore, since $\unobservable$ is unobserved, we can assume its variance to be 1 without loss of generality, so that $\unobservable \sim N(0,1)$; any deviation from this can be absorbed by adjusting the intercepts and coefficients introduced below.
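%
% Sketch of the assumed feature distributions; the mean and variance of \obsFeatures, written $\mu_\obsFeatures$ and $\sigma^2_\obsFeatures$ here, are illustrative placeholders not fixed in this excerpt.
In symbols,
\begin{equation*}
\obsFeatures \sim N(\mu_\obsFeatures, \sigma^2_\obsFeatures),
\qquad
\unobservable \sim N(0, 1).
\end{equation*}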
In the setting we consider (Section~\ref{sec:setting}), a negative decision $\decision=0$ leads to a successful outcome $\outcome=1$.
%
When $\decision=1$, the outcome is modeled by a logistic regression over the features $\obsFeatures$ and $\unobservable$:
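%
% Sketch of the outcome model; the coefficient symbols $\beta_\obsFeatures$ and $\beta_\unobservable$, and the choice to parameterize the probability of success, are illustrative assumptions made here.
\begin{equation*}
P(\outcome = 1 \mid \decision = 1, \obsFeatures, \unobservable)
= \invlogit(\beta_\obsFeatures \obsFeatures + \beta_\unobservable \unobservable)
\end{equation*}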
Here \invlogit is the standard logistic function.
%
Since the decisions are ultimately based on expected behaviour, we model the decisions in the data similarly, with a logistic regression over the features:
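%
% Sketch of the decision model, consistent with the description below; it uses the decision-maker-specific intercept $\alpha_\judgeValue$ and the shared coefficients $\gamma_\obsFeatures, \gamma_\unobservable$, though the exact parameterization is illustrative.
\begin{equation*}
P(\decision = 1 \mid \judgeValue, \obsFeatures, \unobservable)
= \invlogit(\alpha_\judgeValue + \gamma_\obsFeatures \obsFeatures + \gamma_\unobservable \unobservable)
\end{equation*}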
%
Although we model the decision makers here probabilistically, we do not imply that their decisions are necessarily probabilistic (or include a random component). Rather, the probabilistic model arises from the unknown specific details of the reasoning employed by each decision maker $\human_\judgeValue$.
Note also that we make the simplifying assumption that the coefficients $\gamma_\obsFeatures,\gamma_\unobservable$ are the same for all $\human_\judgeValue$, while decision makers are allowed to differ in their intercept $\alpha_\judgeValue$.
%
Parameter $\alpha_{\judgeValue}$ controls the leniency of a decision maker $\human_\judgeValue\in\humanset$.
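%
A minimal simulation sketch of the model just described is given below; the numerical coefficient values, the sample sizes, and the outcome-model coefficients ($\beta_\obsFeatures$, $\beta_\unobservable$) are illustrative assumptions rather than values used in this work.
\begin{verbatim}
# Illustrative simulation of the modelling assumptions described above.
# Coefficient values and the outcome-model parameterization are assumptions
# of this sketch; only the model structure follows the text.
import numpy as np

rng = np.random.default_rng(0)

def invlogit(a):
    # standard logistic function
    return 1.0 / (1.0 + np.exp(-a))

n = 10_000        # number of cases
n_judges = 10     # number of decision makers

# Observed and unobserved features, both Gaussian; Z has unit variance w.l.o.g.
x = rng.normal(0.0, 1.0, size=n)
z = rng.normal(0.0, 1.0, size=n)

# Each case is handled by one decision maker; alpha_j encodes leniency.
judge = rng.integers(0, n_judges, size=n)
alpha = rng.normal(0.0, 1.0, size=n_judges)   # judge-specific intercepts
gamma_x, gamma_z = 1.0, 1.0                   # shared coefficients (illustrative)

# Decision model: logistic regression with a judge-specific intercept.
p_positive = invlogit(alpha[judge] + gamma_x * x + gamma_z * z)
t = rng.binomial(1, p_positive)

# Outcome model: a negative decision (T = 0) leads to a successful outcome
# (Y = 1); otherwise Y follows a logistic regression on x and z.
beta_x, beta_z = 1.0, 1.0                     # illustrative values
p_success = invlogit(beta_x * x + beta_z * z)
y = np.where(t == 0, 1, rng.binomial(1, p_success))

print("positive-decision rate:", t.mean())
print("success rate:", y.mean())
\end{verbatim}
Such a simulation can serve as a simple sanity check of the modelling assumptions before fitting the full model to data.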