The Strengths and Weaknesses of Directed Acyclic Graphs (DAGs) as Cognitive, Analytical and Educational Tools for Medical Statistics

Ellison, George ORCID: 0000-0001-8914-6812 (2022) The Strengths and Weaknesses of Directed Acyclic Graphs (DAGs) as Cognitive, Analytical and Educational Tools for Medical Statistics. (Submitted)

The origins of directed acyclic graphs (DAGs) date back to the emergence of ‘graph theory’ in the early 1700s (Biggs et al., 1986). DAGs are conceptual or literal diagrammatic representations of causal paths between variables, constructed, as their name suggests, on the basis of two overriding principles: first, that all causal paths are ‘directed’ (i.e. for each pair of variables joined by a path, only one can represent the cause, while the other must be its consequence); and second, that neither direct cyclical paths nor indirect cyclical pathways (comprising sequences of consecutive paths) are allowed, such that no consequence can be considered its own direct or indirect cause (hence ‘acyclic’; Law et al., 2012). As such, DAGs reflect the knowledge, presumptions, assumptions and/or speculation of the analyst(s) concerned regarding the causal relationships between the variables they include. Current convention dictates that variables are represented as nodes (or vertices), and that any causal paths between variables are represented as directed arcs, edges or lines, often drawn as arrows (see Figure 1). Although each arc indicates the presence and direction of a known, presumed, assumed or speculative causal relationship between the two variables concerned, drawing an arc does not require the sign, magnitude, precision or shape of that relationship to be known or declared (Tennant et al., 2021). In this respect, DAGs provide a simple, accessible and entirely nonparametric approach for postulating causal relationships amongst any variables of interest, even when these are uncertain, unknown or entirely speculative (Ellison, 2020).
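To make the two construction principles concrete, the following minimal Python sketch (not drawn from the RESS3 materials) stores a small, hypothetical DAG as a mapping from each variable to the variables it is assumed to cause. In keeping with the nonparametric character described above, each arc records only the presence and direction of a relationship, and nothing about its sign or magnitude. The helper `topological_order` is an illustrative name; it uses Kahn's algorithm to check the ‘acyclic’ principle, returning an ordering in which every cause precedes its consequences, or `None` if some variable would be its own direct or indirect cause. The variable names are assumptions chosen purely for illustration.

```python
from collections import deque

# Hypothetical example DAG (illustrative variables only): each key is a node
# and each value lists the nodes it points to, i.e. its direct consequences.
# Arcs carry no sign or magnitude, only presence and direction.
dag = {
    "age": ["smoking", "blood_pressure"],
    "smoking": ["blood_pressure", "stroke"],
    "blood_pressure": ["stroke"],
    "stroke": [],
}


def topological_order(graph):
    """Return a causal ordering of the nodes, or None if the graph has a cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with no remaining incoming arcs.
    """
    # Count incoming arcs for every node, including nodes that only appear as targets.
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1

    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for t in graph.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)

    # Any node never removed lies on (or downstream of) a directed cycle.
    return order if len(order) == len(indegree) else None


print(topological_order(dag))  # ['age', 'smoking', 'blood_pressure', 'stroke']

# Adding the arc stroke -> age would make age its own indirect cause.
cyclic = dict(dag, stroke=["age"])
print(topological_order(cyclic))  # None
```

Because a valid ordering exists only when there are no cycles, the same check can flag a diagram that inadvertently makes a consequence a cause of its own antecedents.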
Nonetheless, because the presence or absence of each possible arc within any given DAG imposes parametric constraints, DAGs also support a number of more sophisticated statistical applications. These make it possible to use DAGs to inform the design of multivariable statistical models that reflect the causal structure(s) involved, albeit without the need to know or understand the mathematical technicalities on which these applications are based (Lewis and Kuerbis, 2016). These features make DAGs attractive cognitive, educational and analytical tools for strengthening the epistemological, theoretical and empirical basis of causal inference; there has been a recent proliferation in the use of DAGs across a range of applied scientific disciplines (e.g. Knight and Winship, 2013), and an associated upsurge in analytical methods training (e.g. Elwert, 2011; Gilthorpe, 2017; Hernán, 2018; Roy, 2021; Hünermund, 2021). This chapter reflects on a decade of delivering medical statistics training to undergraduate medical students at the University of Leeds between 2012 and 2021, during which the third-year research, evaluation and special studies module (‘RESS3’) has used DAGs to support the development of applied statistical skills relevant to the extended student-selected research and evaluation projects (ESREP) that students undertake in their fourth and final years (Ellison, 2021; Ellison et al., 2014a,b). Drawing on successive iterations of the structure and content of the RESS3 module, together with notes made during formal and informal planning and review meetings with module leads, lecturers, tutors and students, we use the claims and criticisms made of DAGs in the epidemiological literature to identify a number of explicit strengths (and associated, often implicit, weaknesses) that are central to their use in prediction and causal inference modelling.
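How the arcs in a DAG can inform the design of a multivariable model may be illustrated in the same spirit. The sketch below again uses a hypothetical DAG and illustrative helper names (`descendants`, `parent_adjustment_set`), and applies one well-established result from the causal inference literature: provided the outcome is not itself a direct cause of the exposure, adjusting for all of the exposure's direct causes (its parents) is sufficient, though not necessarily minimal, to block confounding paths when estimating the exposure's total effect. It also lists the exposure's descendants, such as mediators, which should not be adjusted for when that total effect is the target. This is a simplified illustration rather than the procedure used in the chapter; it assumes every parent is measured and makes no attempt to find minimal adjustment sets.

```python
# Hypothetical DAG stored as parent lists: each node maps to its direct causes.
parents = {
    "age": [],
    "deprivation": [],
    "smoking": ["age", "deprivation"],
    "blood_pressure": ["age", "smoking"],
    "stroke": ["age", "smoking", "blood_pressure"],
}


def descendants(node, parents):
    """Return every node reachable from `node` by following arcs forwards."""
    # Invert the parent lists into child lists.
    children = {n: [] for n in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    # Depth-first search over children.
    found, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in found:
                found.add(c)
                stack.append(c)
    return found


def parent_adjustment_set(exposure, outcome, parents):
    """Direct causes of the exposure: a sufficient (not necessarily minimal)
    adjustment set for its total effect, valid only if the outcome is not a parent."""
    if outcome in parents[exposure]:
        raise ValueError("outcome is a direct cause of the exposure")
    return set(parents[exposure])


exposure, outcome = "smoking", "stroke"
print(sorted(parent_adjustment_set(exposure, outcome, parents)))  # ['age', 'deprivation']
# Descendants of the exposure (here a mediator and the outcome itself)
# should not be adjusted for when estimating the total effect.
print(sorted(descendants(exposure, parents)))  # ['blood_pressure', 'stroke']
```

In this example `deprivation` enters the adjustment set even though, in this hypothetical DAG, it affects `stroke` only through `smoking`; that is why the set is described as sufficient rather than minimal.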
While using DAGs requires (and benefits from) a clear understanding of their nonparametric nature and parametric implications, the weaknesses of DAGs seem likely to reflect both the challenges inherent in modelling data-generating processes that are imperfectly understood, and the troublesome cognitive and heuristic tendencies common to all analytical tools, whereby the tool facilitates the task in hand by reducing the necessity (and the benefits) of exploring uncertainties and identifying assumptions. These more epistemological considerations appear particularly challenging for medical undergraduates to grasp (Ellison, 2021), but also appear to be poorly understood by many established analysts and clinical epidemiologists (Ellison, 2020).
