Analysis_07MAY2019_new.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Data-sets\" data-toc-modified-id=\"Data-sets-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Data sets</a></span><ul class=\"toc-item\"><li><span><a href=\"#Synthetic-data-with-unobservables\" data-toc-modified-id=\"Synthetic-data-with-unobservables-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Synthetic data with unobservables</a></span></li><li><span><a href=\"#Data-without-unobservables\" data-toc-modified-id=\"Data-without-unobservables-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Data without unobservables</a></span></li></ul></li><li><span><a href=\"#Algorithms\" data-toc-modified-id=\"Algorithms-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Algorithms</a></span><ul class=\"toc-item\"><li><span><a href=\"#Contraction-algorithm\" data-toc-modified-id=\"Contraction-algorithm-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Contraction algorithm</a></span></li><li><span><a href=\"#Causal-approach---metrics\" data-toc-modified-id=\"Causal-approach---metrics-2.2\"><span class=\"toc-item-num\">2.2&nbsp;&nbsp;</span>Causal approach - metrics</a></span></li></ul></li><li><span><a href=\"#Performance-comparison\" data-toc-modified-id=\"Performance-comparison-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Performance comparison</a></span><ul class=\"toc-item\"><li><span><a href=\"#With-unobservables-in-the-data\" data-toc-modified-id=\"With-unobservables-in-the-data-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>With unobservables in the data</a></span></li><li><span><a href=\"#Without-unobservables\" data-toc-modified-id=\"Without-unobservables-3.2\"><span class=\"toc-item-num\">3.2&nbsp;&nbsp;</span>Without unobservables</a></span></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- ##  Causal model\n",
    "\n",
    "Our model is defined by the probabilistic expression \n",
    "\n",
    "\\begin{equation} \\label{model_disc}\n",
    "P(Y=0 | \\text{do}(R=r)) = \\sum_x \\underbrace{P(Y=0|X=x, T=1)}_\\text{1} \n",
    "\\overbrace{P(T=1|R=r, X=x)}^\\text{2} \n",
    "\\underbrace{P(X=x)}_\\text{3}\n",
    "\\end{equation}\n",
    "\n",
    "which is equal to \n",
    "\n",
    "\\begin{equation}\\label{model_cont}\n",
    "P(Y=0 | \\text{do}(R=r)) = \\int_x P(Y=0|X=x, T=1)P(T=1|R=r, X=x)P(X=x)\n",
    "\\end{equation}\n",
    "\n",
    "for continuous $x$. In the model Z is a latent, unobserved variable, and can be excluded from the expression with do-calculus by showing that $X$ is admissible for adjustment. Model as a graph:\n",
    "\n",
    "![Model as picture](../figures/intervention_model.png \"Intervention model\")\n",
    "\n",
    "For predicting the probability of negative outcome the following should hold because by Pearl $P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0 | R=r, X=x)$ when $X$ is an admissible set:\n",
    "\n",
    "\\begin{equation} \\label{model_pred}\n",
    "P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0|X=x, T=1)P(T=1|R=r, X=x).\n",
    "\\end{equation}\n",
    "\n",
    "Still it should be noted that this prediction takes into account the probability of the individual to be given a positive decision ($T=1$), see second term in \\ref{model_pred}.\n",
    "\n",
    "----\n",
    "\n",
    "### Notes\n",
    "\n",
    "* Equations \\ref{model_disc} and \\ref{model_cont} describe the whole causal effect in the population (the causal effect of changing $r$ over all strata $X$).\n",
    "* Prediction should be possible with \\ref{model_pred}. Both terms can be learned from the data. NB: the probability $P(Y=0 | \\text{do}(R=r), X=x)$ is lowest when the individual $x$ is the most dangerous or the least dangerous. How could we infer/predict the counterfactual \"what is the probability of $Y=0$ if we were to let this individual go?\" has yet to be calculated.\n",
    "* Is the effect of R learned/estimated correctly if it is just plugged in to a predictive model (e.g. logistic regression)? **NO**\n",
    "* $P(Y=0 | do(R=0)) = 0$ only in this application. <!-- My predictive models say that when $r=0$ the probability $P(Y=0) \\approx 0.027$ which would be a natural estimate in another application/scenario (e.g. in medicine the probability of an adverse event when a stronger medicine is distributed to everyone. Then the probability will be close to zero but not exactly zero.) -->\n",
    "\n",
    "Imports and settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Imports\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from datetime import datetime\n",
    "import matplotlib.pyplot as plt\n",
    "import scipy.stats as scs\n",
    "import scipy.integrate as si\n",
    "import seaborn as sns\n",
    "import numpy.random as npr\n",
    "from sklearn.preprocessing import OneHotEncoder\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Settings\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "plt.rcParams.update({'font.size': 20})\n",
    "plt.rcParams.update({'figure.figsize': (14, 7)})\n",
    "\n",
    "# Suppress deprecation warnings.\n",
    "\n",
    "import warnings\n",
    "\n",
    "\n",
    "def fxn():\n",
    "    warnings.warn(\"deprecated\", DeprecationWarning)\n",
    "\n",
    "\n",
    "with warnings.catch_warnings():\n",
    "    warnings.simplefilter(\"ignore\")\n",
    "    fxn()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data sets\n",
    "\n",
    "### Synthetic data with unobservables\n",
    "\n",
    "In the chunk below, we generate the synthetic data as described by Lakkaraju et al. The default values and definitions of $Y$ and $T$ values follow their description.\n",
    "\n",
    "**Parameters**\n",
    "\n",
    "* M = `nJudges_M`, number of judges\n",
    "* N = `nSubjects_N`, number of subjects assigned to each judge\n",
    "* betas $\\beta_i$ = `beta_i`, where $i \\in \\{X, Z, W\\}$ are coefficients for the respected variables\n",
    "\n",
    "**Columns of the data:**\n",
    "\n",
    "* `judgeID_J` = judge IDs as running numbering from 0 to `nJudges_M - 1`\n",
    "* R = `acceptanceRate_R`, acceptance rates\n",
    "* X = `X`, invidual's features observable to all (models and judges)\n",
    "* Z = `Z`, information observable for judges only\n",
    "* W = `W`, unobservable / inaccessible information\n",
    "* T = `decision_T`, bail-or-jail decisions where $T=0$ represents jail decision and $T=1$ bail decision.\n",
    "* Y = `result_Y`, result variable, if $Y=0$ person will or would recidivate and if $Y=1$ person will or would not commit a crime.\n",
    "\n",
    "The generated data will have M\\*N rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sigmoid(x):\n",
    "    '''Return value of sigmoid function (inverse of logit) at x.'''\n",
    "\n",
    "    return 1 / (1 + np.exp(-1*x))\n",
    "\n",
    "\n",
    "def dataWithUnobservables(nJudges_M=100,\n",
    "                          nSubjects_N=500,\n",
    "                          beta_X=1.0,\n",
    "                          beta_Z=1.0,\n",
    "                          beta_W=0.2):\n",
    "\n",
    "    df = pd.DataFrame()\n",
    "\n",
    "    # Assign judge IDs as running numbering from 0 to nJudges_M - 1\n",
    "    df = df.assign(judgeID_J=np.repeat(range(0, nJudges_M), nSubjects_N))\n",
    "\n",
    "    # Sample acceptance rates uniformly from a closed interval\n",
    "    # from 0.1 to 0.9 and round to tenth decimal place.\n",
    "    acceptance_rates = np.round(npr.uniform(.1, .9, nJudges_M), 10)\n",
    "\n",
    "    # Replicate the rates so they can be attached to the corresponding judge ID.\n",
    "    df = df.assign(acceptanceRate_R=np.repeat(acceptance_rates, nSubjects_N))\n",
    "\n",
    "    # Sample the variables from standard Gaussian distributions.\n",
    "    df = df.assign(X=npr.normal(size=nJudges_M * nSubjects_N))\n",
    "    df = df.assign(Z=npr.normal(size=nJudges_M * nSubjects_N))\n",
    "    df = df.assign(W=npr.normal(size=nJudges_M * nSubjects_N))\n",
    "\n",
    "    # Calculate P(Y=0|X, Z, W)\n",
    "    probabilities_Y = sigmoid(beta_X * df.X + beta_Z * df.Z + beta_W * df.W)\n",
    "\n",
    "    df = df.assign(probabilities_Y=probabilities_Y)\n",
    "\n",
    "    # Result is 0 if P(Y = 0| X = x; Z = z; W = w) >= 0.5 , 1 otherwise\n",
    "    df = df.assign(result_Y=np.where(df.probabilities_Y >= 0.5, 0, 1))\n",
    "\n",
    "    # For the conditional probabilities of T we add noise ~ N(0, 0.1)\n",
    "    probabilities_T = sigmoid(beta_X * df.X + beta_Z * df.Z)\n",
    "    probabilities_T += np.sqrt(0.1) * npr.normal(size=nJudges_M * nSubjects_N)\n",
    "\n",
    "    df = df.assign(probabilities_T=probabilities_T)\n",
    "\n",
    "    # Sort by judges then probabilities in decreasing order\n",
    "    # Most dangerous for each judge are at the top.\n",
    "    df.sort_values(by=[\"judgeID_J\", \"probabilities_T\"],\n",
    "                   ascending=False,\n",
    "                   inplace=True)\n",
    "\n",
    "    # Iterate over the data. Subject will be given a negative decision\n",
    "    # if they are in the top (1-r)*100% of the individuals the judge will judge.\n",
    "    # I.e. if their within-judge-index is under 1 - acceptance threshold times\n",
    "    # the number of subjects assigned to each judge they will receive a\n",
    "    # negative decision.\n",
    "    df.reset_index(drop=True, inplace=True)\n",
    "\n",
    "    df['decision_T'] = np.where((df.index.values % nSubjects_N) <\n",
    "                                ((1 - df['acceptanceRate_R']) * nSubjects_N),\n",
    "                                0, 1)\n",
    "\n",
    "    # Halve the data set to test and train\n",
    "    train, test = train_test_split(df, test_size=0.5)\n",
    "\n",
    "    train_labeled = train.copy()\n",
    "    test_labeled = test.copy()\n",
    "\n",
    "    # Set results as NA if decision is negative.\n",
    "    train_labeled.loc[train_labeled.decision_T == 0, 'result_Y'] = np.nan\n",
    "    test_labeled.loc[test_labeled.decision_T == 0, 'result_Y'] = np.nan\n",
    "\n",
    "    return train_labeled, train, test_labeled, test, df"