Special Article
Causal Claims in Health Sciences and Medicine: a Difference-in-Differences Method
Kyoung-Nam Kim, MD, PhD1,2
Cardiovascular Prevention and Pharmacotherapy 2020;2(3):99-102.
DOI: https://doi.org/10.36011/cpp.2020.2.e13
Published online: July 31, 2020

1Division of Public Health and Preventive Medicine, Seoul National University Hospital, Seoul, Korea

2Department of Preventive Medicine, Seoul National University College of Medicine, Seoul, Korea

Correspondence to Kyoung-Nam Kim, MD, PhD Division of Public Health and Preventive Medicine, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul 03080, Korea. E-mail: kkn002@snu.ac.kr
• Received: June 6, 2020   • Accepted: July 27, 2020

Copyright © 2020 Korean Society of Cardiovascular Disease Prevention; International Society of Cardiovascular Pharmacotherapy, Korea Chapter.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The difference-in-differences (DID) method is a useful tool for making causal claims using observational data. The key idea is to compare the difference between the exposure and control groups before and after an event. The potential outcome of the exposure group during the post-exposure period is estimated by adding the observed outcome change of the control group between the pre- and post-exposure periods to the observed outcome of the exposure group during the pre-exposure period. Because the effect of exposure is evaluated by comparing the observed and potential outcomes of the same exposure group, unmeasured potential confounders can be cancelled out by design. For this approach to be appropriate, the difference between the exposure and control groups must remain relatively stable in the absence of exposure. Despite the strengths of the DID method, its assumptions, such as parallel trends and proper comparison groups, need to be carefully considered before application. If used properly, this method can be a useful tool for epidemiologists and clinicians to make causal claims with observational data.
Randomized controlled trials are considered the gold standard for establishing causal relationships.1) However, random allocation of an intervention is often impossible in human studies for reasons such as ethical constraints. Therefore, several methodologies for inferring causal relationships from observational data have been developed and used in the human sciences. The essence of these methods is to consider the factors affecting intervention assignment and to identify appropriate controls in the absence of controlled experiments.2)
The difference-in-differences (DID) method is one of the most popular of these “quasi-experimental” approaches.3) The method is widely used because its key concept is intuitive and because unknown or unmeasured confounding factors can also be controlled.
Therefore, we briefly describe issues regarding the DID method, including its key concepts and outline, assumptions, model specification, and further considerations.
The key idea of the DID method is to compare the differences between the exposure and control groups before and after the exposure period. To estimate the potential outcome of the exposure group during the post-exposure period had exposure not occurred (the counterfactual outcome of the group during the post-exposure period in the absence of exposure), the observed outcome change of the control group between the pre- and post-exposure periods is added to the observed outcome of the exposure group during the pre-exposure period (Figure 1).
In Table 1, the observed outcome of the exposure group during the post-exposure period is A+B+C+D, whereas the potential outcome in the absence of exposure during the post-exposure period is A+B+C. Therefore, the effect of exposure is estimated as D (the difference in differences between the pre- and post-exposure periods).
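As a minimal numerical sketch of this subtraction (the cell values below are invented for illustration and do not come from the source), the DID estimate D can be computed directly from the four cell means in Table 1:

```python
# Hypothetical cell means corresponding to Table 1 (values invented for illustration).
pre_exposure, post_exposure = 12.0, 18.0   # exposure group: A+B and A+B+C+D
pre_control, post_control = 10.0, 13.0     # control group:  A and A+C

change_exposure = post_exposure - pre_exposure   # C + D
change_control = post_control - pre_control      # C

did_estimate = change_exposure - change_control  # D, the estimated effect of exposure
print(did_estimate)  # 3.0
```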
Because the effect of exposure is evaluated by comparing the observed and potential outcomes of the same exposure group, unmeasured potential confounders can be cancelled out by the design, which is considered one of the strengths of this method.
The DID method can be applied to data with observations before and after the exposure in both the exposure and control groups. Although the method can be applied to data with only one observation before and one after the exposure in each group (Figure 1, Table 1), it can also accommodate multiple time points and multiple exposures.
The method is appropriate when the difference between the exposure and control groups would have remained relatively stable had no exposure occurred. This parallel-trends assumption (i.e., the same time trend in both groups) cannot be tested directly, but it can be evaluated by visually inspecting whether the two groups show similar trends during the pre-exposure period.
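One common way to carry out this visual inspection is to plot pre-exposure group means over time and check that the two lines are roughly parallel. The sketch below assumes a hypothetical long-format dataset; the file name and column names (time, group, outcome) are illustrative assumptions, not from the source:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data with columns "time", "group" (0 = control,
# 1 = exposure), and "outcome"; names and file are illustrative only.
df = pd.read_csv("did_data.csv")

exposure_start = 0  # hypothetical time at which exposure begins
pre = df[df["time"] < exposure_start]

# Plot mean outcome by time for each group; roughly parallel lines during the
# pre-exposure period support (but do not prove) the parallel-trends assumption.
for g, label in [(0, "Control"), (1, "Exposure")]:
    means = pre[pre["group"] == g].groupby("time")["outcome"].mean()
    plt.plot(means.index, means.values, marker="o", label=label)

plt.xlabel("Time (pre-exposure period)")
plt.ylabel("Mean outcome")
plt.legend()
plt.show()
```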
Although the causal relationship between exposure and outcome can be estimated by simple subtraction (Table 1), regression models are more commonly used because they can provide estimates adjusted for potential confounders. In addition, such models provide statistical tests and confidence intervals.
Regression models for estimating the causal relationship between exposure and outcome using the DID approach can be summarized as follows:
Ygp = b0 + b1×G + b2×P + b3×G×P + b4×Cgp + e
where Ygp is the outcome of interest for group G (control group, 0; exposure group, 1) in period P (pre-exposure period, 0; post-exposure period, 1), Cgp denotes potential confounders for group G in period P, and e is an error term. The causal relationship between exposure and outcome can be evaluated by testing the interaction between the period (pre vs. post) and group (control vs. exposure) variables. Therefore, the regression coefficient of the interaction term, b3, is the coefficient of interest. If b3 is statistically different from 0, this can be interpreted as evidence of a causal relationship between exposure and outcome.
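A minimal sketch of fitting this model with statsmodels is shown below; the data frame, file name, and column names (Y, G, P, and a single measured confounder X) are assumptions for illustration, not from the source:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: Y = outcome, G = group (0 = control, 1 = exposure),
# P = period (0 = pre-exposure, 1 = post-exposure), X = a measured confounder.
df = pd.read_csv("did_data.csv")  # illustrative file name

# "G * P" expands to G + P + G:P, so the interaction coefficient G:P
# corresponds to b3, the DID estimate of the exposure effect.
model = smf.ols("Y ~ G * P + X", data=df).fit()
print(model.summary())
print("DID estimate (b3):", model.params["G:P"])
```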
The following model can be used for DID analysis with multiple periods and groups:
Ygp = b0 + b1×Sgp + δg + γp + b2×Cgp + e
where Ygp and Sgp are the outcome and exposure for group G in period P, respectively, δg is a group fixed effect, γp is a period fixed effect, Cgp denotes potential confounders for group G in period P, and e is an error term. The regression coefficient of the exposure variable Sgp, b1, is the coefficient of interest. If b1 is statistically different from 0, this can be interpreted as evidence of a causal relationship between exposure and outcome.
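Under the same hypothetical setup, this multi-period, multi-group model can be sketched by including group and period fixed effects as categorical terms; clustering standard errors by group is a common, though optional, choice with repeated observations. The column and file names below are illustrative assumptions:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format panel: S = exposure indicator for each group-period cell,
# "group" and "period" identify the fixed effects, X = a measured confounder.
df = pd.read_csv("did_panel.csv")  # illustrative file name

# C(group) and C(period) add group and period fixed effects (delta_g and gamma_p);
# the coefficient on S corresponds to b1, the exposure effect of interest.
# Standard errors are clustered by group, a common choice in DID settings.
model = smf.ols("Y ~ S + C(group) + C(period) + X", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["group"]}
)
print("Exposure effect (b1):", model.params["S"])
```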
Until now, only one control group has been considered. However, causal inference can be strengthened with additional control groups. For example, if a health policy aims to lower mortality in low-income groups, additional controls can be drawn from people whose income is not low, both in the regions where the policy is applied and in those where it is not. Considering additional sources of variation in this way is known as difference-in-difference-in-differences (DDD). In most cases, results from DDD analyses are presented as sensitivity analyses complementing the main DID analyses because of the difficulty of justifying the adequacy of two control groups.
Regression models for the DDD analysis can be summarized as follows:
Y = b0 + b1×G + b2×V + b3×P + b4×G×V + b5×V×P + b6×P×G + b7×G×V×P + b8×C + e
where Y is the outcome of interest, G is the group (control group, 0; exposure group, 1), V is the additional source of variation (control, 0; exposure, 1), P is the period (pre-exposure period, 0; post-exposure period, 1), C denotes potential confounders, and e is an error term. The causal relationship between exposure and outcome can be evaluated by testing the three-way interaction term G×V×P; therefore, the coefficient of interest is b7. If b7 is statistically different from 0, this can be interpreted as evidence of a causal relationship between exposure and outcome.
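A corresponding sketch for the DDD model tests the three-way interaction; again, the data frame, file name, and column names (G, V, P, X) are illustrative assumptions, not from the source:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: G = group (0/1), V = additional source of variation (0/1),
# P = period (0/1), X = a measured confounder.
df = pd.read_csv("ddd_data.csv")  # illustrative file name

# "G * V * P" expands to all main effects and two-way interactions plus G:V:P;
# the three-way interaction coefficient corresponds to b7, the DDD estimate.
model = smf.ols("Y ~ G * V * P + X", data=df).fit()
print("DDD estimate (b7):", model.params["G:V:P"])
```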
This article explained the key concepts, outline, assumptions, model specification, and further considerations of the DID method. Despite the strengths of the method, its assumptions, such as parallel trends and proper comparison groups, need to be carefully considered before application.4) If used properly, this method can be a useful tool for epidemiologists and clinicians to make causal claims based on observational data.

Conflict of Interest

The author has no financial conflicts of interest.

Author Contributions

Conceptualization: Jeong NY, Choi NK; Investigation: Jeong NY, Kim SH, Lim E, Choi NK; Supervision: Choi NK; Writing - original draft: Jeong NY, Kim SH, Lim E; Writing - review & editing: Jeong NY, Kim SH, Lim E, Choi NK.

Figure 1.
Key concepts of the difference-in-differences method.
Table 1.
Quantities of exposure effects estimated from the DID method

                   Pre-exposure period    Post-exposure period    Difference
Exposure group     A+B                    A+B+C+D                 C+D
Control group      A                      A+C                     C
DID                                                               D

DID = difference-in-differences.

  • 1. Rubin DB. For objective causal inference, design trumps analysis. Ann Appl Stat 2008;2:808–40.
  • 2. Butsic V, Lewis DJ, Radeloff VC, Baumann M, Kuemmerle T. Quasi-experimental methods enable stronger inferences from observational data in ecology. Basic Appl Ecol 2017;19:1–10.
  • 3. Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press; 2008.
  • 4. St. Clair T, Cook TD. Difference-in-differences methods in public finance. Natl Tax J 2015;68:319–38.
