%We show how a causality-based approach can be used to estimate the performance of prediction algorithms in `selective labels' settings -- with particular application to `bail-or-jail' judicial decisions.
An increasing number of important decisions affecting people's lives are being made by machine learning and AI systems.
We study the problem of evaluating the quality of such decision makers.
The major difficulty in such evaluation is that the existing decision makers in use, whether AI or human, influence the data on which the evaluation is based. For example, when
deciding whether a defendant should be given bail or kept in jail, we cannot directly observe the possible offences by defendants whom the decision-making system in use chooses to keep in jail. To evaluate decision makers in these difficult settings, we derive a flexible Bayesian approach that utilizes counterfactual-based imputation. Compared to the previous state of the art, the approach gives more accurate estimates of decision quality with lower variance. The approach is also robust to variations in the mechanisms that produce the data.