\antti{Here one can drop the do operator already on the first line by rule 2 of do-calculus, i.e. $P(Y=0|do(R=r))=P(Y=0|R=r)$. However, do-calculus formulas should be computed by first learning a graphical model and then computing the marginals from that model; this gives a more accurate result. Michael's complicated formula essentially does this, including forcing $P(Y=0|T=0,X)=0$ (the model supports the context-specific independence $Y \perp X \mid T=0$).}
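As a sketch of the marginalization the comment describes (assuming the causal structure in which $R$ and $X$ are the parents of $T$, and $T$ and $X$ are the parents of $Y$), one would compute
\begin{align*}
P(Y=0 \mid do(R=r))
  &= \sum_{x} \sum_{t} P(Y=0 \mid T=t, X=x)\, P(T=t \mid R=r, X=x)\, P(x) \\
  &= \sum_{x} P(Y=0 \mid T=1, X=x)\, P(T=1 \mid R=r, X=x)\, P(x),
\end{align*}
where the second line uses the constraint $P(Y=0 \mid T=0, X=x)=0$ noted in the comment, so the $t=0$ term vanishes.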
Expanding the above derivation for the model \score{\featuresValue} learned from the data:
The decision variable $\decision$ was set to 0 if the probability $\prob{\outcome=0| \features=\featuresValue}$ was in the top $(1-\leniencyValue)\cdot100\%$ among the subjects assigned to that judge.\antti{How was the final $Y$ determined? I assume $Y=1$ if $T=0$, and if $T=1$, $Y$ was randomly sampled from $\prob{\outcome| \features=\featuresValue}$ above? Delete this comment when handled.}
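The decision rule above can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the function name \texttt{assign\_decisions} and the use of NumPy are assumptions for exposition:

```python
import numpy as np

def assign_decisions(risk_scores, leniency):
    """Set T = 0 for subjects whose predicted risk P(Y=0 | X=x) lies in the
    top (1 - leniency) * 100% of this judge's caseload; T = 1 otherwise.

    risk_scores : array of P(Y=0 | X=x) for the subjects assigned to the judge
    leniency    : the judge's leniency r in [0, 1]
    """
    n = len(risk_scores)
    n_negative = int(round((1 - leniency) * n))  # number of T = 0 decisions
    decisions = np.ones(n, dtype=int)
    if n_negative > 0:
        # indices of the n_negative highest risk scores get T = 0
        top = np.argsort(risk_scores)[-n_negative:]
        decisions[top] = 0
    return decisions
```

With leniency $\leniencyValue = 0.5$, half of the judge's subjects (those with the highest risk scores) receive $\decision = 0$.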
Results for estimating the causal quantity $\prob{\outcome=0 | \doop{\leniency=\leniencyValue}}$ with various levels of leniency $\leniencyValue$ under this model are presented in Figure~\ref{fig:without_unobservables}.