# Implementation of the Double/Debiased Machine Learning Approach in Python

By Class of Summer Term 2019 | June 18, 2019

# Double Machine Learning Implementation

Christopher Ketzler*, Guillermo Morishige*

Abstract: The aim of this paper is to replicate and apply the approach of Chernozhukov et al. (2016) to estimate the causal estimand of interest, the average treatment effect (ATE) $\ \theta_0$, using Neyman orthogonality and cross-fitting. Using observational data, we estimate the causal effect of eligibility for and participation in the 401(k) on net financial assets. We also apply the approach to other datasets to find the effect of the Pennsylvania Reemployment Bonus on unemployment duration and the effect of smoking on medical costs. Following the Double/Debiased Machine Learning (DML) framework of Chernozhukov et al., we estimate the causal effect of a binary treatment on an outcome, which is the regression parameter in a partially linear regression model. Machine learning (ML) methods are used to estimate the nuisance parameters $\ \eta_0$: the dependence of the outcome and of the treatment assignment on the confounding factors (controls).

Keywords: Double machine learning, average treatment effect, Neyman-orthogonality, cross-fitting, partially linear regression model.

\* School of Business and Economics, Humboldt Universität zu Berlin, Spandauer Str. 1, 10178, Berlin, Germany.

## 1) Introduction

Researchers in econometrics, epidemiology, and philosophy, to name just a few fields, have long been interested in modelling causality: drawing conclusions through statistical analysis from associations between measurements. Drawing inferences from such analyses can be tricky, however, since association (correlation) does not imply causation. The notion of causation first appeared in the setting of randomized experiments with Neyman (1923). Fisher (1935) stressed the importance of randomization as the basis for inference. Rubin (1974) extended the framework to non-random assignment mechanisms, so that it applies not only to experimental but also to observational data.
As computational power increased, innovations in statistical inference followed. New statistical approaches were developed, and these robust models can now handle large datasets with an extensive number of covariates in a semiparametric setting.

In 2016, Victor Chernozhukov et al. introduced “Double/Debiased Machine Learning for Treatment and Structural Parameters” to solve the classic semiparametric problem of inference on a low-dimensional parameter $\ \theta_0$ in the presence of a high-dimensional nuisance parameter $\ \eta_0$. A nuisance parameter is not of direct interest, but it has to be estimated as an intermediate step for computing the parameter of interest. In this case, the parameter of interest is the treatment effect on a certain outcome variable, denoted by $\ \theta_0$. Chernozhukov et al. estimate the nuisance parameters with machine learning estimators.

Only machine learning methods that can handle high-dimensional settings are applicable. Here, high-dimensional means that the entropy of the parameter space increases with the sample size, which goes beyond the traditional framework, where the complexity of the parameter space is assumed to be sufficiently small. The following predictors are employed: Random Forest, Lasso, Ridge, Deep Neural Networks, Boosted Trees, and ensembles that include at least one of them. These methods rely on regularization to limit overfitting, which reduces variance at the cost of some bias. By cross-fitting and using Neyman-orthogonal moment (score) functions, Double/Debiased Machine Learning reduces this bias and obtains a closer estimate of the treatment effect $\ \theta_0$. Neyman-orthogonal score functions are less sensitive to errors in the nuisance parameters when estimating the treatment effect $\ \theta_0$.
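To make the two ingredients concrete, the following is a minimal sketch of a cross-fitted estimator with an orthogonal (partialling-out) score. It is an illustration under stated assumptions, not the implementation used in this blog: the helper names `features`, `ridge_fit_predict` and `dml_partialling_out` are hypothetical, the simulated data are invented for the example, and a ridge regression on polynomial features stands in for the machine learning methods listed above.

```python
import numpy as np

def features(X):
    """Intercept, linear and squared terms (a stand-in for a flexible ML learner)."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Fit a ridge regression on the training fold and predict on the test fold."""
    A, B = features(X_train), features(X_test)
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_train)
    return B @ coef

def dml_partialling_out(Y, D, X, n_folds=2, seed=0):
    """Cross-fitted estimate of theta_0 from out-of-fold residuals."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), n_folds)
    res_Y = np.empty(len(Y))
    res_D = np.empty(len(Y))
    for k, test in enumerate(folds):
        # Nuisance functions are learned on all folds except the k-th one.
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        res_Y[test] = Y[test] - ridge_fit_predict(X[train], Y[train], X[test])
        res_D[test] = D[test] - ridge_fit_predict(X[train], D[train], X[test])
    # Orthogonal score: regress the outcome residual on the treatment residual.
    return np.sum(res_D * res_Y) / np.sum(res_D ** 2)

# Hypothetical simulated data with a known treatment effect of 0.5.
rng = np.random.default_rng(42)
n, theta0 = 4000, 0.5
X = rng.normal(size=(n, 3))
D = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
Y = D * theta0 + X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)

theta_hat = dml_partialling_out(Y, D, X)
print("cross-fitted estimate of theta_0:", theta_hat)
```

Each observation's residuals come from nuisance fits that never saw that observation, which is the point of cross-fitting. Pooling the residuals over the folds before solving for the estimate is one of several possible aggregation choices.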

This blog post provides a rough overview of the Double Machine Learning approach of Chernozhukov et al. We do not aim to clarify all aspects; for further explanations, refer to the paper by Chernozhukov et al. (2016). The objective of our work is the implementation of the Double Machine Learning approach in Python. The blog is structured as follows: Section 2) refers to developments in the machine learning field for the estimation of average treatment effects. Section 3) provides a deeper insight into DML. Section 4) contains the empirical test of our code and the interpretation of the results.

## 2) Literature Review

Under unconfounded treatment assignment, a number of approaches to treatment effect estimation have been used throughout the development of statistical inference. Weighting by the inverse of nonparametric estimates of the propensity score was introduced by Hirano et al. (2003). Stuart (2010) considered a wide range of matching methods that compare treated and control groups with similar covariates, to obtain an unbiased comparison. Knaus et al. (2018) used machine learning to simulate data generation processes (DGPs).

Machine learning estimators have been used to estimate heterogeneous causal effects across different disciplines. Approaches based on different machine learning methods include: regression trees by Su et al. (2009), random forests by Wager and Athey (2018), lasso by Qian and Murphy (2011), support vector machines by Imai and Ratkovic (2013), boosting by Powers et al. (2018), and neural networks by Johansson et al. (2016).

Focusing specifically on Double Machine Learning, there is an applied study by Knaus (2018): A Double Machine Learning Approach to Estimate the Effects of Musical Practice on Student’s Skills. He used data from the German National Educational Panel Study (NEPS) (Blossfeld and von Maurice, 2011).

Chernozhukov et al. (2016) also provide extensions of the model that we do not implement. They propose using instrumental variables (IV) in the partially linear model. They also estimate the average treatment effect on the treated (ATTE) and the local average treatment effect (LATE).

## 3) Double/Debiased Machine Learning

### 3.1) Partially Linear Model

The estimation problem is described by a partially linear model, as suggested by Robinson (1988). It is assumed that the treatment effects are fully heterogeneous and the treatment variable is binary, $\ D \in \{0,1\}$. We consider the vector $\ (Y,D,X)$, where $\ Y$ is the outcome variable, $\ D$ the treatment variable, and $\ X$ the covariates:

$$Y = D\theta_0 + g_0(X) + U, \quad E[U \mid X, D] = 0,$$

$$D = m_0(X) + V, \quad E[V \mid X] = 0.$$

$\ U$ and $\ V$ are disturbances. Our parameter of interest is $\ \theta_0$, the average treatment effect. The nuisance parameter is $\ \eta_0 = (m_0,g_0)$. The nuisance parameters are estimated with machine learning methods because the covariates enter the model nonparametrically, through the unknown functions $\ g_0$ and $\ m_0$.
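As an illustration of how the pieces of the model fit together, the sketch below draws data from a partially linear model with a binary treatment. The function name `simulate_plr` and the specific choices of $\ m_0$, $\ g_0$ and $\ \theta_0 = 0.5$ are assumptions made for this example only; they are not taken from the datasets analysed in this blog.

```python
import numpy as np

def simulate_plr(n=2000, p=5, theta0=0.5, seed=0):
    """Draw (Y, D, X) from a partially linear model with binary treatment."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                       # covariates (controls)
    # Hypothetical propensity m0(X) = P(D = 1 | X): logistic in two covariates.
    m0 = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
    D = rng.binomial(1, m0).astype(float)             # D = m0(X) + V
    # Hypothetical confounding function g0(X), nonlinear in the covariates.
    g0 = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
    U = rng.normal(size=n)                            # outcome disturbance
    Y = D * theta0 + g0 + U                           # Y = D*theta0 + g0(X) + U
    return Y, D, X, m0, g0

Y, D, X, m0, g0 = simulate_plr()
V = D - m0                                            # treatment disturbance
print("share treated:", D.mean())
print("mean of V (close to zero by construction):", V.mean())
```

Because $\ X$ drives both the treatment (through $\ m_0$) and the outcome (through $\ g_0$), a comparison of treated and untreated outcomes that ignores $\ X$ is confounded in this design.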

### 3.2) Naïve Estimator

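A simple way to estimate the treatment effect is to construct a sophisticated machine learning estimator, e.g. a Random Forest, to learn the regression function $\ D\theta_0+g_0(X)$. The data are split into two parts: a main part of size $\ n$, with observations indexed by $\ i\in I$, and an auxiliary part of size $\ N-n$. The estimate $\ \hat{g}_0$ is obtained from the auxiliary part, and one then solves the following equation on the main part to get the treatment effect:

$$\hat{\theta}_0 = \left(\frac{1}{n}\sum_{i\in I} D_i^2\right)^{-1} \frac{1}{n}\sum_{i\in I} D_i\left(Y_i - \hat{g}_0(X_i)\right)$$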

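By decomposing the scaled estimation error of the naive estimator $\ \hat{\theta}_0$ of the treatment effect $\ \theta_0$, one can see the impact of the bias that arises from learning $\ g_0$ with an ML estimator $\ \hat{g}_0$:

$$\sqrt{n}(\hat{\theta}_0 - \theta_0) = \underbrace{\left(\frac{1}{n}\sum_{i\in I} D_i^2\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i\in I} D_i U_i}_{a} + \underbrace{\left(\frac{1}{n}\sum_{i\in I} D_i^2\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i\in I} D_i\left(g_0(X_i) - \hat{g}_0(X_i)\right)}_{b}$$

Here the term $\ a$ satisfies, under mild conditions, $\ a \to N(0, \bar{\Sigma})$ in distribution, and $\ b$ is the regularization bias term.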
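The decomposition can be checked numerically in a simulation where $\ g_0$ and $\ U$ are known. The sketch below is a hedged illustration: the data-generating process is invented for the example, and a ridge regression on polynomial features (with an arbitrarily chosen penalty `lam`) stands in for the machine learning estimator of $\ g_0$. It computes the naive estimator on the main part and the two terms $\ a$ and $\ b$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, theta0 = 4000, 0.5

# Hypothetical design: g0 and m0 are chosen for illustration only.
X = rng.normal(size=(N, 3))
g0 = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
m0 = 1.0 / (1.0 + np.exp(-X[:, 0]))
D = rng.binomial(1, m0).astype(float)
U = rng.normal(size=N)
Y = D * theta0 + g0 + U

def features(X):
    """Intercept, linear and squared terms of the covariates."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])

# Split into a main part I (size n) and an auxiliary part (size N - n).
idx = rng.permutation(N)
main, aux = idx[: N // 2], idx[N // 2:]
n = len(main)

# Learn D*theta + g(X) on the auxiliary part with a ridge penalty (stand-in learner).
Z_aux = np.column_stack([D[aux], features(X[aux])])
lam = 50.0
coef = np.linalg.solve(Z_aux.T @ Z_aux + lam * np.eye(Z_aux.shape[1]), Z_aux.T @ Y[aux])
g_hat = features(X[main]) @ coef[1:]          # drop the coefficient on D

# Naive estimator on the main part.
D_m, Y_m = D[main], Y[main]
scale = 1.0 / np.mean(D_m ** 2)
theta_naive = scale * np.mean(D_m * (Y_m - g_hat))

# Terms of the decomposition of sqrt(n) * (theta_naive - theta0).
a = scale * np.sum(D_m * U[main]) / np.sqrt(n)
b = scale * np.sum(D_m * (g0[main] - g_hat)) / np.sqrt(n)

print("naive estimate:", theta_naive)
print("scaled error:", np.sqrt(n) * (theta_naive - theta0))
print("a (noise term):", a, " b (bias term):", b)
```

The identity $\ \sqrt{n}(\hat{\theta}_0 - \theta_0) = a + b$ holds exactly in the sample, so the printed scaled error equals the sum of the two terms. Term $\ b$ depends on how well $\ \hat{g}_0$ approximates $\ g_0$ on the treated observations, which is where the regularization bias enters.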