\section{Setting and problem statement}
The setting we consider is described in terms of two decision processes.
In the first one, a decision maker $H$ considers a case described by a set of features $F$ and makes a binary decision $T\in\{0, 1\}$, nominally referred to as {\it positive} ($T = 1$) or not ($T = 0$).
Intuitively, in our bail-or-jail example of Section~\ref{sec:introduction}, $H$ corresponds to the human judge deciding whether to grant bail ($T = 1$) or not ($T = 0$).
The decision is followed by a binary outcome $Y$, which is nominally referred to as {\it successful} ($Y = 1$) or not ($Y = 0$).
An outcome can be {\it unsuccessful} ($Y = 0$) only if the decision that preceded it was positive ($T = 1$).
If the decision was not positive ($T = 0$), then the outcome is considered by default successful ($Y = 1$).
Back in our example, the decision of the judge is unsuccessful only if the judge grants bail ($T = 1$) but the defendant violates its terms ($Y = 0$).
Otherwise, if the decision of the judge was to keep the defendant in jail ($T = 0$), the outcome is by default successful ($Y = 1$) since there can be no violation.
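Stated compactly, and only as a restatement of the convention above in the notation already introduced, a non-positive decision fixes the outcome:
\[
	P(Y = 0 \mid T = 0) = 0 \quad \mbox{or, equivalently,} \quad P(Y = 1 \mid T = 0) = 1 .
\]
The outcome $Y$ is therefore uncertain only when $T = 1$.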
Moreover, we assume that decision maker $H$ is associated with a leniency level $R$, which determines the expected fraction of cases in which $H$ makes a positive decision.
Formally, for leniency level $R = r\in [0, 1]$, we have
\begin{equation}
	P(T = 1 \mid R = r) = \sum_{X, Z} P(T = 1, X, Z \mid R = r) = r .
\end{equation}
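As a small worked consequence, assume for illustration only that $n$ cases with decisions $T_1, \ldots, T_n$ are all decided at the same leniency level $r$ (the symbols $n$, $T_i$, and $\mathrm{E}$ for expectation are introduced here and are not part of the setting above).
Linearity of expectation then gives the expected number of positive decisions:
\[
	\mathrm{E}\Big[\sum_{i=1}^{n} T_i \,\Big|\, R = r\Big] = \sum_{i=1}^{n} P(T_i = 1 \mid R = r) = n r .
\]
For example, a decision maker with $r = 0.3$ would be expected to make $30$ positive decisions out of $n = 100$ cases.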
The outcome of this process is a record $(X, T, Y)$ that contains only a subset $X\subseteq F$ of the features of the case, the decision of the judge, and the outcome, but leaves no trace of the remaining features $Z = F - X$.
Intuitively, in our example, $X$ corresponds to publicly recorded information about the bail-or-jail case decided by the judge (e.g., the gender and age of the defendant) and $Z$ corresponds to features that are observed by the judge but do not appear on record (e.g., whether the defendant appeared stressed).
The set of records $\{(X, T, Y)\}$ produced by decision maker $H$ constitutes what we refer to as the {\bf dataset}.
Figure~\ref{fig:model} shows the causal diagram that describes the operation of decision-maker $H$.
In the second decision process, a decision maker $M$ considers a case described by a set of features $X$ and makes a binary decision $T$ based on those features, followed by a binary outcome $Y$.
In our example, $M$ corresponds to an automated-decision system that is considered for replacing the human judge in bail-or-jail decisions.
Notice that $M$ has access only to some of the features that were available to $H$, to model cases where the system would use only the recorded features and not other ones that would be available to a human judge.
The definitions and semantics of decision $T$ and outcome $Y$ follow those of the first process and are not repeated.
Moreover, decision maker $M$ is also associated with a leniency level $R$, defined as before for $H$.
\todo{Michael}{Create and use macros for all main terms and mathematical quantities, so that they stay consistent throughout the paper.}
