
Causal approach to selective labels

Repository for the research project "Causal approach to selective labels", including the required data sets. This repository was originally forked from the ProPublica repository for the COMPAS analysis.

Structure of the repository

The contents of the repository are divided into four main folders:

  • analysis_and_scripts contains the scripts and notebooks for performing the analysis. The folder also contains the notes.tex file, which documents much of the research done.
  • data contains the original data sets from the ProPublica analysis (see below for more information).
  • figures contains the figures used in the notes file mentioned above. This folder also holds the figures for the BSocSc thesis.
  • paper contains the draft of a research publication.

Original README:

                 +---------+
Didn't Reoffend  |____|____|
Reoffended       |    |    |
                 +---------+


This repository contains a Jupyter notebook and data for the ProPublica story "Machine Bias."

Story:
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing/

Methodology:
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/

Notebook (you'll probably want to follow along in the methodology):
https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb

Main Dataset:
compas.db - a sqlite3 database containing criminal history, jail and prison time, demographics and COMPAS risk scores for defendants from Broward County.

Other files as needed for the analysis.
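
Since compas.db is a sqlite3 database, a quick way to get oriented is to list its tables before following the notebook. The snippet below is a minimal sketch using Python's standard sqlite3 module. The helper name list_tables is hypothetical, and no table names are assumed here; they are read from the database itself.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    # Open read-only through a URI so that a wrong path raises an error
    # instead of silently creating a new, empty database file.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple; unpack it into a plain list of names.
    return [name for (name,) in rows]
```

Calling list_tables("data/compas.db") would then return the table names, assuming the database sits in the data folder described above; adjust the path if it is stored elsewhere.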