\todo{Michael}{Create and use macros for all main terms and mathematical quantities, so that they stay consistent throughout the paper. Already done for previous sections}
We thoroughly tested the accuracy, variability, and robustness of counterfactual-based imputation (CFBI), our proposed method for evaluating decision-maker performance. We employed both synthetic and real data, including decisions made by several different kinds of decision makers. In particular, we compared its performance with that of the state-of-the-art contraction technique of \citet{lakkaraju2017selective}. We used Python 3.6.9, PyStan 2.19.0.0, and cmdstanpy 0.4.3 for all experiments.
Our approach makes use of causal modeling to represent our assumptions about the process that generated the data and uses counterfactual reasoning to impute unobserved outcomes in the data.
We experiment with synthetic data to highlight various properties of our approach.
We also perform an empirical evaluation in realistic settings, using real data from COMPAS.
The results indicate that our method yields more accurate evaluations, with considerably less variation, than the state of the art, and thereby enables evaluation in settings where it was previously not possible.