diff --git a/paper/sl.tex b/paper/sl.tex
index 6996511dd5a55f2831c32dbe3f3a1a8a237f0b54..6e1853402f934dc16694b268fbe8e62a41cbf008 100755
--- a/paper/sl.tex
+++ b/paper/sl.tex
@@ -62,11 +62,25 @@ \begin{abstract}
-%We show how a causality-based approach can be used to estimate the performance of prediction algorithms in `selective labels' settings -- with particular application to `bail-or-jail' judicial decisions.
-Increasing number of important decision affecting people's lives are being made by machine learning and AI systems.
-We study evaluating the quality of such decision makers.
-The major difficulty in such evaluation is that existing decision makers in use, whether AI or human, influence the data the evaluation is based on. For example, when
-deciding whether of defendant should be given bail or kept in jail, we are not able to directly observe the possible offences by defendants that the decision making system in use decides to keep in jail. To evaluate decision makers in these difficult settings, we derive a flexible Bayesian approach, that utilizes counterfactual-based imputation. Compared to previous state-of-the-art, the approach gives more accurate predictions on the decision quality with lower variance. The approach is also shown to be robust to different variations in the decision mechanisms in the data.
+As an increasing number of decisions affecting people's lives are made by AI systems, automating the evaluation of such systems becomes increasingly important.
+%
+One major challenge in such evaluation is that the inner workings of the system are not known, with the system used as a `black box'.
+%
+Another challenge is that often decisions skew the data on which the evaluation is performed.
+%
+% For example, when deciding whether a defendant should be granted bail or rather be led to jail, a decision is deemed successful if it grants bail to defendants who would honor the conditions of the bail and leads to jail ones who would violate them.
+%
+% However, in such cases, we are only able to directly evaluate the mechanism when it grants bail, while we cannot observe the potential bail violations by defendants who were led to jail.
+%
+For example, when a bank uses a credit rating system to decide whether a customer should be granted a loan or not, a decision is deemed successful if it grants a loan to customers who would honor its conditions, but not to ones who would violate them.
+%
+However, in such cases, we are only able to directly evaluate the decision when it grants the loan, while we cannot observe whether customers who were not granted the loan would indeed violate its conditions.
+%
+To evaluate decision systems in such settings, we derive a Bayesian approach that (i) learns a probabilistic model for the system decisions, and (ii) uses counterfactual-based imputation to evaluate its performance in the presence of unobserved quantities of interest.
+%
+Compared to the previous state of the art, the quality of decisions is estimated more accurately and with lower variance.
+%
+The approach is also shown to be robust to different variations in the decision mechanisms in the data.
 \end{abstract}