@@ -960,7 +960,7 @@ With this knowledge, it can be stated that if we observed $T=0$ with some $x$ an
\begin{equation}\label{eq:bounds}
\invlogit(x + z) \geq F^{-1}(r) \Leftrightarrow x+z \geq logit(F^{-1}(r)) \Leftrightarrow z \geq logit(F^{-1}(r)) - x
\end{equation}
as the logit and its inverse are strictly increasing functions and hence preserve ordering for all pairs of values in their domains. From equations \ref{eq:posterior_Z}, \ref{eq:Tprob} and \ref{eq:bounds} we can conclude that $\pr(Z < logit(F^{-1}(r)) - x | T=0, X=x, R=r)=0$ and that elsewhere the distribution of $Z$ follows a standard Gaussian; conditional on $T=0$, $X=x$ and $R=r$, the distribution of $Z$ is therefore a standard Gaussian truncated from below at $logit(F^{-1}(r)) - x$. The expectation of $Z$ can then be computed analytically from the formulas for truncated Gaussians. The same reasoning applies analogously to cases with $T=1$, with the relevant inequalities reversed.
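For reference, writing $a = logit(F^{-1}(r)) - x$ and letting $\phi$ and $\Phi$ denote the standard Gaussian density and distribution function, the standard truncated-Gaussian formulas give
\begin{equation}
\mathrm{E}[Z | T=0, X=x, R=r] = \frac{\phi(a)}{1 - \Phi(a)},
\end{equation}
and, with the inequality reversed for $T=1$, $\mathrm{E}[Z | T=1, X=x, R=r] = -\phi(a)/\Phi(a)$.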
In practice, in lines 1--3 and 10--13 of algorithm \ref{alg:eval:mc} we proceed as in the True evaluation evaluator algorithm, with the distinction that some of the values of $Y$ are imputed with the corresponding counterfactual probabilities. In line 4 we compute the bounds as motivated above. In the for-loop (lines 5--8) we compute the expectation of $Z$ given the decision, using the fact that the distribution of $Z$ follows a truncated Gaussian. The equation
\begin{equation}
...
...
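To make lines 4--8 concrete, the following is a minimal sketch of the bound-and-expectation step, assuming $Z \sim N(0,1)$ a priori and the decision rule of equation \ref{eq:bounds}; the helper name, its arguments and the use of SciPy are illustrative rather than a description of our implementation.
\begin{verbatim}
import numpy as np
from scipy.special import logit
from scipy.stats import truncnorm

def expected_z(x, t, F_inv_r):
    # Truncation bound from eq. (bounds): T = 0 implies
    # z >= logit(F^{-1}(r)) - x.
    a = logit(F_inv_r) - x
    if t == 0:
        # Z truncated from below at a:
        # E[Z | Z >= a] = phi(a) / (1 - Phi(a)).
        return truncnorm.mean(a, np.inf)
    # For T = 1 the inequality is reversed, so Z is
    # truncated from above at a.
    return truncnorm.mean(-np.inf, a)
\end{verbatim}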
@@ -997,7 +997,7 @@ computes the correct expectation automatically. Using the expectation, we then c
%\item
%\end{itemize}
In the future, we should utilize a more fully Bayesian approach so that priors for the different $\beta$ coefficients can be included in the model; such priors are needed for learning the values of the coefficients.
The following hierarchical model was used as an initial approach to the problem. Data was generated with unobservables, and both the outcome $Y$ and the decision $T$ were drawn from Bernoulli distributions. The $\beta$ coefficients were systematically overestimated, as shown in figure \ref{fig:posteriors}.
...
...
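As a rough illustration of the data-generating process described above, the following sketch draws both $T$ and $Y$ from Bernoulli distributions that depend on an observed feature and an unobservable; the coefficient values and sample size are illustrative assumptions, not the settings used in the experiments.
\begin{verbatim}
import numpy as np
from scipy.special import expit  # inverse logit

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)  # observed feature
z = rng.normal(size=n)  # unobservable

# Illustrative coefficients (assumptions, not the values used here).
beta_xt, beta_zt = 1.0, 1.0  # decision model
beta_xy, beta_zy = 1.0, 1.0  # outcome model
t = rng.binomial(1, expit(beta_xt * x + beta_zt * z))  # decision T
y = rng.binomial(1, expit(beta_xy * x + beta_zy * z))  # outcome Y
\end{verbatim}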
@@ -1012,20 +1012,20 @@ The following hierarchical model was used as an initial approach to the problem.