@@ -96,9 +96,10 @@ In the formalism of causal inference~\cite{pearl2010introduction}, we wish to ev
\end{align}
The expression above concerns a specific entry in the dataset with features $\obsFeatures=\obsFeaturesValue$, for which decision maker $\human_j$ made the decision $\decision_{\human_j}=0$.
%
It expresses the probability that the outcome would have been positive ($\outcome=1$) had the decision been positive ($\decision_{\human_j}=1$), conditional on what we know from the data entry ($\obsFeatures=\obsFeaturesValue$, $\decision_{\human_j}=0$) as well as from the entire dataset \dataset.
%
Notice that the presence of \dataset in the conditional part of expression~(\ref{eq:counterfactual}) provides information about the data entry beyond the entry-specific quantities ($\obsFeatures=\obsFeaturesValue$, $\decision_{\human_j}=0$), and is thus not redundant.
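To make the role of \dataset concrete: schematically, and under the model specified below, the expression marginalizes over the unobserved features \unobservable and the model parameters \parameters. The following is only a sketch of this structure, with entry-specific conditioning (e.g., the decision maker's leniency) suppressed for readability:
\begin{align*}
P(\outcome = 1 \mid \text{do}(\decision_{\human_j} = 1), \obsFeatures = \obsFeaturesValue, \decision_{\human_j} = 0, \dataset)
 = \iint & P(\outcome = 1 \mid \decision = 1, \obsFeatures = \obsFeaturesValue, z, \parameters) \\
 & \times p(z \mid \obsFeatures = \obsFeaturesValue, \decision_{\human_j} = 0, \parameters) \, p(\parameters \mid \dataset) \, \mathrm{d}z \, \mathrm{d}\parameters ,
\end{align*}
so that \dataset enters precisely through the posterior $p(\parameters \mid \dataset)$; the exact factors for our model are given in the expansion derived below.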
%
...
...
@@ -204,7 +205,7 @@ In particular, we consider the full probabilistic model defined in Equations \re
%
%Notice that by ``parameters'' here we refer to all quantities that are not considered as known with certainty from the input, and so parameters include unobserved features \unobservable.
%
We use prior distributions, given in Appendix~X, to ensure the identifiability of the parameters.
The inference itself is derived in detail in Appendix~X: in essence, we first determine the distribution of the unobserved features \unobservable using the decision, the observed features, and the leniency of the employed decision maker, and then determine the distribution of the outcome \outcome conditional on all features.
Having obtained a posterior probability distribution for parameters \parameters, we can now expand expression~(\ref{eq:counterfactual}) as follows:
%\begin{align}
...
...
@@ -228,22 +229,18 @@ Having obtained a posterior probability distribution for parameters \parameters:
%The value of the first factor in the integrand of the expression above is provided by the model in Equation~\ref{eq:defendantmodel}, while the second is sampled by MCMC, as explained above.
%
Note that, for all data entries other than the ones with $\decision_{\human_j}=0$ and $\decision_\machine=1$, we trivially have $\cfoutcome = \outcome$, where \outcome is the outcome recorded in the dataset \dataset.
\spara{Implementation}
The result of Equation~\ref{eq:theposterior} is computed numerically over the posterior sample:
@@ -255,9 +252,10 @@ The result is computed numerically over the sample.
\end{equation}
where the sums are taken over samples of $\parameters$ and $z$ obtained from their respective posteriors.
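As an illustration, the following is a minimal sketch of this computation for a single data entry, assuming that posterior draws of \parameters and $z$ are available as arrays and assuming, purely for concreteness, a logistic outcome model; the coefficient layout and all names are hypothetical rather than those of our implementation.
\begin{verbatim}
import numpy as np

def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def counterfactual_outcome_prob(x_i, theta_draws, z_draws):
    """Monte Carlo estimate of the expression above for one entry:
    average P(Y = 1 | T = 1, x_i, z, theta) over posterior draws of
    (theta, z).  Assumes, for concreteness only, a logistic outcome
    model with hypothetical coefficient layout (intercept, beta_x,
    beta_z); the decision is fixed to T = 1, its effect absorbed
    into the intercept for this sketch.
    """
    intercept = theta_draws[:, 0]
    beta_x = theta_draws[:, 1]
    beta_z = theta_draws[:, 2]
    # One outcome probability per posterior draw; the mean
    # approximates the double sum over theta- and z-samples.
    p = sigmoid(intercept + beta_x * x_i + beta_z * z_draws)
    return p.mean()
\end{verbatim}
Here \texttt{theta\_draws} has one row per posterior draw, and \texttt{z\_draws} holds the matching draws of \unobservable for the entry in question.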
%
In practice, we use the MCMC functionality of Stan\footnote{\url{https://mc-stan.org/}} to obtain a sample \sample of this posterior distribution, where each element of \sample contains one instance of parameters \parameters.
%
Sample \sample can now be used to compute various probabilistic quantities of interest, including a (posterior) distribution of \unobservable for each entry in dataset \dataset.
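For concreteness, the following is a minimal sketch of how sample \sample could be obtained and then used for the imputation above, via the CmdStanPy interface to Stan; the Stan program file, variable names, and data layout are hypothetical, and \texttt{counterfactual\_outcome\_prob} refers to the sketch given earlier.
\begin{verbatim}
import numpy as np
from cmdstanpy import CmdStanModel

def posterior_sample(stan_data):
    """Fit the (hypothetical) Stan program and return posterior draws:
    the sample S of parameters theta and the per-entry draws of Z."""
    model = CmdStanModel(stan_file="decision_model.stan")  # placeholder name
    fit = model.sample(data=stan_data, chains=4)
    return fit.stan_variable("theta"), fit.stan_variable("z")

def impute_outcomes(y_recorded, t_human, t_machine, x,
                    theta_draws, z_draws):
    """Return imputed outcomes: the recorded Y everywhere except the
    entries with T_H = 0 and T_M = 1, which get the Monte Carlo
    counterfactual estimate."""
    y_imputed = y_recorded.astype(float)
    for i in np.flatnonzero((t_human == 0) & (t_machine == 1)):
        y_imputed[i] = counterfactual_outcome_prob(
            x[i], theta_draws, z_draws[:, i])
    return y_imputed
\end{verbatim}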