Analysis_07MAY2019_new.ipynb 200 KiB
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Causal-model\" data-toc-modified-id=\"Causal-model-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Causal model</a></span><ul class=\"toc-item\"><li><span><a href=\"#Notes\" data-toc-modified-id=\"Notes-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Notes</a></span></li></ul></li><li><span><a href=\"#Data-sets\" data-toc-modified-id=\"Data-sets-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Data sets</a></span><ul class=\"toc-item\"><li><span><a href=\"#Synthetic-data-with-unobservables\" data-toc-modified-id=\"Synthetic-data-with-unobservables-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Synthetic data with unobservables</a></span></li><li><span><a href=\"#Data-without-unobservables\" data-toc-modified-id=\"Data-without-unobservables-2.2\"><span class=\"toc-item-num\">2.2&nbsp;&nbsp;</span>Data without unobservables</a></span></li></ul></li><li><span><a href=\"#Algorithms\" data-toc-modified-id=\"Algorithms-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Algorithms</a></span><ul class=\"toc-item\"><li><span><a href=\"#Contraction-algorithm\" data-toc-modified-id=\"Contraction-algorithm-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>Contraction algorithm</a></span></li><li><span><a href=\"#Causal-approach---metrics\" data-toc-modified-id=\"Causal-approach---metrics-3.2\"><span class=\"toc-item-num\">3.2&nbsp;&nbsp;</span>Causal approach - metrics</a></span></li></ul></li><li><span><a href=\"#Performance-comparison\" data-toc-modified-id=\"Performance-comparison-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Performance comparison</a></span><ul class=\"toc-item\"><li><span><a href=\"#With-unobservables-in-the-data\" data-toc-modified-id=\"With-unobservables-in-the-data-4.1\"><span class=\"toc-item-num\">4.1&nbsp;&nbsp;</span>With unobservables in the data</a></span><ul class=\"toc-item\"><li><span><a href=\"#Predictive-model\" data-toc-modified-id=\"Predictive-model-4.1.1\"><span class=\"toc-item-num\">4.1.1&nbsp;&nbsp;</span>Predictive model</a></span></li><li><span><a href=\"#Visual-comparison\" data-toc-modified-id=\"Visual-comparison-4.1.2\"><span class=\"toc-item-num\">4.1.2&nbsp;&nbsp;</span>Visual comparison</a></span></li></ul></li><li><span><a href=\"#Without-unobservables\" data-toc-modified-id=\"Without-unobservables-4.2\"><span class=\"toc-item-num\">4.2&nbsp;&nbsp;</span>Without unobservables</a></span><ul class=\"toc-item\"><li><span><a href=\"#Predictive-model\" data-toc-modified-id=\"Predictive-model-4.2.1\"><span class=\"toc-item-num\">4.2.1&nbsp;&nbsp;</span>Predictive model</a></span></li><li><span><a href=\"#Visual-comparison\" data-toc-modified-id=\"Visual-comparison-4.2.2\"><span class=\"toc-item-num\">4.2.2&nbsp;&nbsp;</span>Visual comparison</a></span></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Causal model\n",
    "\n",
    "Our model is defined by the probabilistic expression \n",
    "\n",
    "\\begin{equation} \\label{model_disc}\n",
    "P(Y=0 | \\text{do}(R=r)) = \\sum_x \\underbrace{P(Y=0|X=x, T=1)}_\\text{1} \n",
    "\\overbrace{P(T=1|R=r, X=x)}^\\text{2} \n",
    "\\underbrace{P(X=x)}_\\text{3}\n",
    "\\end{equation}\n",
    "\n",
    "which, for continuous $x$, takes the form\n",
    "\n",
    "\\begin{equation}\\label{model_cont}\n",
    "P(Y=0 | \\text{do}(R=r)) = \\int_x P(Y=0|X=x, T=1)P(T=1|R=r, X=x)P(X=x)\\,dx,\n",
    "\\end{equation}\n",
    "\n",
    "where $P(X=x)$ is read as the density of $X$. The model as a graph ($Z$ is a latent variable and can be excluded from the expression with do-calculus by showing that $X$ is admissible for adjustment):\n",
    "\n",
    "![Model as picture](../figures/intervention_model.png \"Intervention model\")\n",
    "\n",
    "For predicting the probability of a negative outcome, the following should hold, because by Pearl $P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0 | R=r, X=x)$ when $X$ is an admissible set:\n",
    "\n",
    "\\begin{equation} \\label{model_pred}\n",
    "P(Y=0 | \\text{do}(R=r), X=x) = P(Y=0|X=x, T=1)P(T=1|R=r, X=x).\n",
    "\\end{equation}\n",
    "\n",
    "Note, however, that this prediction takes into account the probability that the individual is given a positive decision ($T=1$); see the second term in \\ref{model_pred}.\n",
    "\n",
    "----\n",
    "\n",
    "### Notes\n",
    "\n",
    "* Equations \\ref{model_disc} and \\ref{model_cont} describe the whole causal effect in the population (the causal effect of changing $r$ over all strata $X$).\n",
    "* Prediction should be possible with \\ref{model_pred}. Both terms can be learned from the data. NB: the probability $P(Y=0 | \\text{do}(R=r), X=x)$ is lowest when the individual $x$ is either the most dangerous or the least dangerous: the most dangerous are rarely given a positive decision (small second term), and the least dangerous rarely recidivate (small first term). How to infer or predict the counterfactual \"what is the probability of $Y=0$ if we were to let this individual go?\" has yet to be worked out.\n",
    "* Is the effect of $R$ learned/estimated correctly if it is simply plugged into a predictive model (e.g. logistic regression)?\n",
    "* $P(Y=0 | \\text{do}(R=0)) = 0$ holds only in this application. My predictive models give $P(Y=0) \\approx 0.027$ when $r=0$, which would be a natural estimate in another application or scenario (e.g. in medicine, the probability of an adverse event when a stronger medicine is distributed to everyone: that probability will be close to zero but not exactly zero)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Imports\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from datetime import datetime\n",
    "import matplotlib.pyplot as plt\n",
    "import scipy.stats as scs\n",
    "import scipy.integrate as si\n",
    "import seaborn as sns\n",
    "import numpy.random as npr\n",
    "from sklearn.preprocessing import OneHotEncoder\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "\n",
    "# Settings\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "plt.rcParams.update({'font.size': 16})\n",
    "plt.rcParams.update({'figure.figsize': (14, 7)})\n",
    "\n",
    "# Suppress deprecation warnings.\n",
    "\n",
    "import warnings\n",
    "\n",
    "\n",
    "# Ignore DeprecationWarning for the rest of the session. A filter set inside\n",
    "# warnings.catch_warnings() is discarded as soon as the with-block exits.\n",
    "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data sets\n",
    "\n",
    "### Synthetic data with unobservables\n",
    "\n",
    "In the chunk below, we generate the synthetic data as described by Lakkaraju et al. The default values and the definitions of $Y$ and $T$ follow their description.\n",
    "\n",
    "**Parameters**\n",
    "\n",
    "* M = `nJudges_M`, number of judges\n",
    "* N = `nSubjects_N`, number of subjects assigned to each judge\n",
    "* betas $\\beta_i$ = `beta_i`, where $i \\in \\{X, Z, W\\}$, are coefficients for the respective variables\n",
    "\n",
    "**Columns of the data:**\n",
    "\n",
    "* `judgeID_J` = judge IDs as running numbering from 0 to `nJudges_M - 1`\n",
    "* R = `acceptanceRate_R`, acceptance rates\n",
    "* X = `X`, individual's features observable to all (models and judges)\n",
    "* Z = `Z`, information observable for judges only\n",
    "* W = `W`, unobservable / inaccessible information\n",
    "* T = `decision_T`, bail-or-jail decisions, where $T=0$ represents a jail decision and $T=1$ a bail decision.\n",
    "* Y = `result_Y`, result variable: if $Y=0$ the person will or would recidivate, and if $Y=1$ the person will or would not commit a crime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>result_Y</th>\n",
       "      <th>0.0</th>\n",
       "      <th>1.0</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>decision_T</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17263</td>\n",
       "      <td>7585</td>\n",
       "      <td>24848</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7931</td>\n",
       "      <td>17221</td>\n",
       "      <td>25152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <td>25194</td>\n",
       "      <td>24806</td>\n",
       "      <td>50000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "result_Y      0.0    1.0    All\n",
       "decision_T                     \n",
       "0           17263   7585  24848\n",
       "1            7931  17221  25152\n",
       "All         25194  24806  50000"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Set seed for reproducibility\n",
    "#npr.seed(0)\n",
    "\n",
    "\n",