  • Open access
  • Published: 17 February 2021

Temporal bias in case-control design: preventing reliable predictions of the future

  • William Yuan 1 ,
  • Brett K. Beaulieu-Jones   ORCID: orcid.org/0000-0002-6700-1468 1 ,
  • Kun-Hsing Yu   ORCID: orcid.org/0000-0001-9892-8218 1 ,
  • Scott L. Lipnick 1 , 2 , 3 ,
  • Nathan Palmer   ORCID: orcid.org/0000-0002-4361-207X 1 ,
  • Joseph Loscalzo   ORCID: orcid.org/0000-0002-1153-8047 4 ,
  • Tianxi Cai 1 , 5 , 6 &
  • Isaac S. Kohane 1  

Nature Communications volume 12, Article number: 1107 (2021)

  • Epidemiology
  • Machine learning

One of the primary tools that researchers use to predict risk is the case-control study. We identify a flaw, temporal bias, that is specific to and uniquely associated with these studies that occurs when the study period is not representative of the data that clinicians have during the diagnostic process. Temporal bias acts to undermine the validity of predictions by over-emphasizing features close to the outcome of interest. We examine the impact of temporal bias across the medical literature, and highlight examples of exaggerated effect sizes, false-negative predictions, and replication failure. Given the ubiquity and practical advantages of case-control studies, we discuss strategies for estimating the influence of and preventing temporal bias where it exists.

Introduction

The ability to predict disease risk is a foundational aspect of medicine, and is instrumental for early intervention, clinician decision support, and improving patient outcomes. One of the main tools utilized by researchers for identifying predictive associations or constructing models from observational data is the case-control study 1 . By measuring differing exposure patterns between the case and control groups, exposures can be interpreted as predictors or risk factors for case status 2 , 3 . With the proliferation of observational datasets and novel machine learning techniques, the potential for these studies to play a direct role in personalized medicine has begun to be explored 4 . However, we have identified a structural flaw, seen widely in basic case-control study designs, which we call temporal bias. At its core, temporal bias represents a mismatch between the data used in the study and the data that a clinician would have access to when making a diagnostic decision. A clinician must evaluate all patients in real time, without the luxury of knowing that they have been pre-selected according to their future status. Case-control studies, as popularly implemented, are uniquely unable to make prospectively valid predictions. This temporal bias not only amplifies reported effect sizes relative to what would be observed in practice, but also obfuscates the prospective use of findings.

A classic example of temporal bias and its impacts can be seen through the initial discovery of Lyme disease, a tick-borne bacterial infection. Lyme disease is characterized by (i) an initial bite, (ii) an expanding ring rash, and (iii) arthritic symptoms, in that order 5 . However, the original 1976 discovery of Lyme disease (then termed Lyme arthritis) focused exclusively on patients who manifested with arthritic symptoms 6 . This enabled researchers to definitively identify the prognostic value of a ring rash towards arthritis, but not tick bites, due to the latter symptom’s temporal distance from the researchers’ focus. By focusing on predictive features immediately prior to the event in question, researchers capture a biased representation of the full trajectory from healthy-to-diseased. A contemporaneous doctor aware of Lyme arthritis examining a patient presenting with a tick bite would miss the possibility of disease until further symptoms developed. Similarly, a predictive model for Lyme arthritis focused on ring rashes would report false negatives if it were deployed in practice: patients who had yet to develop ring rashes would contract arthritis at a future time. These errors stem from the incomplete picture of symptoms that was captured.

However, temporal bias is not a problem of the past. The central flaw, an overemphasis on features collected near the case event, still occurs in the literature today. Within the medical domain, there are numerous examples of temporal bias in both clinical medicine and machine learning 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 . Despite increasing interest in machine learning risk prediction, few tools for use on individual patients have become standard practice 17 , 18 . As algorithms trained using large datasets and advanced machine learning methods become more popular, understanding limitations in the way they were generated is critical. In this article, we describe the basis for temporal bias and examine three representative instances of temporal bias in the medical, machine learning, and nutritional literature to identify the impact that this phenomenon has on effect sizes and predictive power.

Of interest is the expansive set of studies that focus on predicting future events in real time and obey the following general conditions. First, events to be predicted take the form of state transitions (healthy-to-diseased, stable-to-failed, control-to-case, etc.). This implies that there exists a bulk population of controls, from which cases differentiate themselves. Soon-to-be cases progress along a trajectory away from the control population at varying speeds. This trajectory terminates at the occurrence of the case event, but the position of control individuals along this trajectory cannot be reliably determined.

Second, we consider that the risk-of-event is equivalent to measuring progress along a control-to-case trajectory in time. Because risk prediction utilizes features from the present to assess the chance of a future event occurring, an event that is truly random would not be appropriate for a risk prediction algorithm. The trajectory represents the ground truth progression along a pathway towards the event in question and is defined relative to the specific populations chosen for the study. This assumes that the researchers have taken the exchangeability 19 of their case and control populations into account: if members of the control population are chosen poorly and cannot experience the case event, then there can be no trajectory.

Third, at the population level, the trajectory commences when the to-be-diseased population first begins to diverge from the non-diseased population and reaches a maximum when the disease event actually occurs. This requires that the trajectory is aligned to the event in question. Diseased individuals must consequently be referred to using terms such as days to disease, while control individuals exist in an undefined point along this timeline, because their days to disease is unknown. This is only required due to the retrospective nature of these studies and is a major departure from prospective deployment.

Finally, the features actually measured by a study represent proxies for an individual’s position along the trajectory. Regardless of their positive or negative association with the event, features subject to temporal bias will tend to diverge between cases and controls with a continuous trajectory, and become better at differentiating the controls from cases as case individuals get closer to their event. This divergence provides the mechanism of action for temporal bias to act. If a model does not possess time varying features (such as a GWAS), temporal bias cannot occur, but predicted risk will also be static with respect to time-to-case-event.

As a result, we can distill prediction studies into a common structure (Fig.  1 ): the members of the diseased population begin as controls at a point in the past, and progress along a trajectory until the disease occurs. Most case-control studies apply a dichotomous framework over this continuous trajectory.

Figure 1

Red and green zones represent positions on the trajectory corresponding to outward definitions of diseased and non-diseased status. Vertical arrows represent sampling of population at a particular point of a trajectory. A The (single-class) case-control paradigm often imposes a dichotomous (binary) framework onto a continuous trajectory. B Experiments utilizing observations of cases that are concentrated at the time when the case event occurs cannot capture any information regarding the transition trajectory, resulting in temporal bias. C In order to predict a patient’s position along the trajectory, experiments capturing the entire transition from non-diseased to diseased are necessary.

Temporal bias occurs when cases are sampled unevenly in time across this trajectory (Fig.  1B ). (A theoretical basis for temporal bias is presented in Supplementary Note  1 .) This is a separate but analogous effect compared to selection bias: the control population may be exchangeable with the diseased population but must tautologically exist at a prior point along the disease trajectory compared to cases. Rather than operating over the selection of which patients to include in the study, temporal bias acts over the selection of when each subject is observed.

This important temporal feature yields two implications:

  • If the features of diseased subjects are evaluated based on a point or window that is defined relative to the case event (a future event, from the perspective of the feature measurements), features in the end of the trajectory will be oversampled. For example, a study that compares individuals one year prior to disease diagnosis to healthy controls will oversample the trajectory one year prior to disease, and undersample the trajectory further out.
  • The resulting model cannot be prospectively applied because the study design implicitly leaked information from the future: a prospective evaluator has no way of knowing if a particular subject is within the observation window defined by the study. It cannot be known if an individual is one year away from a disease diagnosis in real time.
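The oversampling described above can be made concrete with a minimal sketch. It assumes a purely hypothetical linear trajectory that rises from 0 (indistinguishable from controls) to 1 (the case event) over 24 months; the function names and the 24-month onset are illustrative assumptions, not quantities from the study.

```python
def trajectory(months_to_event, onset=24.0):
    """Hypothetical control-to-case signal: 0 at divergence, 1 at the event."""
    if months_to_event >= onset:
        return 0.0  # not yet diverged from the control population
    return 1.0 - months_to_event / onset


def event_indexed_signal(window_months):
    """Signal seen when every case is observed a fixed time before its event."""
    return trajectory(window_months)


def prospective_signal(horizon_months=24):
    """Average signal when future cases are seen at evenly spread times to event."""
    points = range(horizon_months + 1)  # 0, 1, ..., horizon months before the event
    return sum(trajectory(m) for m in points) / len(points)


print(event_indexed_signal(3))  # cases indexed three months before the event
print(prospective_signal())     # cases met at arbitrary points on the trajectory
```

Under these assumptions, the event-indexed window sees a much stronger signal than an observer who meets future cases at arbitrary points along the trajectory, which is the gap that the first implication describes.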

Temporal bias is intuitively understood within certain epidemiological circles. In fact, recall bias, caused by the tendency for survey respondents to remember recent events at a higher rate relative to past events, can be interpreted as a specific instance of temporal bias. Similarly, it is understood that case-control studies represent a lower level of evidence relative to other study designs 20 . Methodologies have been proposed that, while not explicitly designed to address temporal bias, happen to be immune to it (density-based sampling, among others 21 ). However, these tend to focus on point exposures or necessitate impractically exact sampling strategies. Despite this important shortcoming, the ease of the case-control framework has allowed temporal bias to proliferate across many fields. We examine three examples, in cardiology, medical machine learning, and nutrition, below.

Temporal bias can inflate observed associations and effect sizes

The INTERHEART study 22 examined the association between various risk factors and myocardial infarction (MI) using a matched case-control design among a global cohort. Individuals presenting at hospitals with characteristic MI were defined as cases, and subjected to interviews and blood tests, while matched controls were identified from relatives of MI patients or healthy cardiovascular individuals presenting with unrelated disorders. One risk factor of interest included lipoprotein (a) [Lp(a)], a blood protein 23 , 24 . While Lp(a) levels are thought to be influenced by inheritance, significant intra-individual biological variance with time has been reported 25 , 26 .

One recent analysis utilized data from this study to examine the positive association between blood levels of Lp(a) and MI across different ethnicities and evaluate the possible efficacy of Lp(a) as a risk prediction feature 27 . However, because cases were only sampled at the time of the MI event, the resulting effect sizes are difficult to interpret prospectively. Indexing case patients by their case status leaks information regarding their status that a physician prospectively examining a patient would not have access to. Intuitively, if Lp(a) was static until a spike immediately prior to an MI event, it could not be used as a prospective risk predictor, even though a significant association would be observed given this experimental design. This limitation cannot be overcome using only the data that was collected, as information regarding the dynamics of Lp(a) over time is missing. To evaluate the influence of temporal bias, we estimated the size of the Lp(a)-MI association had the experiment been done prospectively. This analysis was done by simulating control-to-case trajectories using INTERHEART case/control population Lp(a) distributions by imputing the missing data. We conducted extensive sensitivity testing over different possible trajectories to evaluate the range of possible effect sizes. This approach allowed for the recalculation of the association strength as if the study had been conducted in a prospective manner from the beginning.

Table  1 summarizes the observed effect size in the simulated prospective trials compared to the reported baseline. In all cases, the simulated raw odds ratio between Lp(a) and MI was significantly lower than the observed raw odds ratio due to temporal bias present in the latter measurement. This is intuitive, since case individuals as a group will be more similar to controls (healthier) when sampled at random points in time rather than when they experience an MI event (Fig.  1B ). Although it cannot be proven that prospective effect sizes would be smaller, as this would require longitudinal data that do not exist, this experiment suggests that the degree of temporal bias scales with area under the imputed trajectory. In order to observe the reported odds ratio, the underlying trajectory would need to resemble a Heaviside step function in which cases spontaneously experience a spike in Lp(a) levels at the point of their divergence from the controls, an assumption that is neither explicitly made in the study nor has a basis in biology. We repeated the imputation process with Heaviside step function-based trajectories, varying the position of the impulse in the trajectory (Table  1 ). As the impulse location approaches the beginning of the trajectory, the effect size relative to the baseline approaches 1. This observation illustrates the assumption intrinsic in the original INTERHEART experimental design: that MI individuals had static Lp(a) measurements during the runup to their hospitalizations.
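The dependence of the effect size on the area under the trajectory can be illustrated with a simplified calculation. This is a sketch under stated assumptions and not the imputation procedure used in the analysis: exposure is reduced to a binary "elevated Lp(a)" flag, the prevalences (0.2 in controls, 0.3 in cases at the MI event) are hypothetical, and case prevalence is assumed to move from the control level to the event level in proportion to the trajectory height, so that sampling cases uniformly in time yields a prevalence proportional to the area under the trajectory.

```python
def odds_ratio(p_case, p_control):
    """Odds ratio for a binary exposure given its prevalence in each group."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def prospective_odds_ratio(p_control, p_case_at_event, area):
    """Odds ratio when cases are sampled uniformly along the trajectory.

    `area` is the area under a trajectory scaled to the unit square
    (0 = cases look like controls throughout, 1 = cases always look
    as they do at the event).
    """
    p_case = p_control + (p_case_at_event - p_control) * area
    return odds_ratio(p_case, p_control)


def step_area(step_position):
    """Area under a step trajectory that jumps at `step_position` in [0, 1]."""
    return 1.0 - step_position


baseline = odds_ratio(0.3, 0.2)  # cases sampled only at the event
for label, area in (("linear", 0.5), ("step at 0.5", step_area(0.5)),
                    ("step at 0.0", step_area(0.0))):
    ratio = prospective_odds_ratio(0.2, 0.3, area) / baseline
    print(label, round(ratio, 3))
```

In this toy setting, only a step placed at the very start of the trajectory reproduces the event-indexed odds ratio; every other shape yields a smaller prospective effect.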

To characterize these findings in a real-world dataset, we examined the Lp(a) test values and MI status of 7128 patients seen at hospitals and clinics within the Partners Healthcare System (representing Brigham and Women’s Hospital and Massachusetts General Hospital, among others) who had indications of more than one Lp(a) reading over observed records. This dataset included 28,313 individual Lp(a) tests and 2587 individuals with indications of myocardial infarction. We identified significant intra-individual variation in Lp(a) values in this population: the mean intra-individual standard deviation between tests was 12.2 mg/dl, compared to a mean test result of 49.4 mg/dl. These results are consistent with literature findings of significant intra-individual variance of Lp(a) values 25 , 28 , 29 , challenging the assumption that individuals could have static levels in the runup to MI. Furthermore, in this dataset, biased Lp(a) measurement selection among case exposure values varied the observed association strength between Lp(a) and MI from 51.9% (preferential selection of lower values) to 137% (preferential selection of higher values) of what would have been observed with random timepoint selection. On the upper end, this is a conservative estimate: we would expect the deviation to increase upon correcting for ascertainment bias in the dataset. Control individuals would be less healthy than true controls, while cases would typically not be sampled immediately prior to an MI, and consequently appear to be healthier than INTERHEART cases. These findings suggest that temporal bias was likely to act in this study design as executed, in a manner that would reduce the observed utility of Lp(a) as a risk predictor for future MI.
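The effect of choosing which of a case's repeated readings enters the comparison can be sketched as follows. The readings are hypothetical values chosen for illustration, not data from the Partners dataset, and `association_gap` is an illustrative helper that uses a simple difference in means rather than the association measure used in the analysis.

```python
from statistics import mean


def association_gap(case_readings, control_readings, pick):
    """Case-minus-control difference when `pick` chooses one value per case."""
    case_values = [pick(readings) for readings in case_readings]
    control_values = [mean(readings) for readings in control_readings]
    return mean(case_values) - mean(control_values)


# Hypothetical repeated Lp(a) readings in mg/dl, one inner list per patient.
cases = [[40, 55, 70], [30, 45, 60]]
controls = [[35, 45, 55], [25, 40, 55]]

for name, pick in (("lowest", min), ("average", mean), ("highest", max)):
    print(name, association_gap(cases, controls, pick))
```

With intra-individual variation of this size, the same patients can appear to carry a negative, weak, or strong association depending only on when the case measurement is taken.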

Prospective prediction failure due to temporal bias

As the availability of observational data has skyrocketed, event prediction has become a popular task in machine learning. Because of this focus on prediction, many methods utilize the idea of a prediction window: a gap between when an event is observed and when features are collected 12 , 13 . A model that differentiates patients six months prior to MI onset from healthy matched controls may be said to detect MI six months in advance. However, because the window is defined relative to a case event, it represents an uneven sampling of the disease trajectory. Consequently, this prediction requires unfounded assumptions regarding the trajectory of MI onset. For example, if the trajectory is such that patients’ risk in the year prior to the MI is approximately uniform and significantly elevated from the control risk, a model trained in this way would provide many false positive 6-month MI predictions by falsely implicating patients more than 6 months away from an MI. Because window sizes are often chosen without respect to the underlying transition trajectory, significant potential for temporal bias still exists, driven by factors such as differential diagnosis periods or missed exposures.

To illustrate the impact of temporal bias in this case, we constructed predictors for childbirth: a phenotype that was chosen because of its well-defined trajectory. While the trajectory for delivery is a rare example of a step function, we demonstrated in the previous section that the use of case-control effectively imposes a step-function regardless of the true shape of the underlying trajectory. Rather than presenting a toy example, this is intended to represent the extreme case of the potential consequences of releasing a predictive model trained in this manner.

In this system, cases and controls are significantly more difficult to distinguish more than nine or ten months prior to delivery compared to later in pregnancy because the case population is not yet pregnant. Features collected while the case population is pregnant are far more informative regarding delivery status. A case-control study that uses a window defined three months prior to delivery will capture these informative, pregnancy related features. In contrast, a cohort study examining all patients in January of a given year will capture largely uninformative features when the case individual’s delivery takes place late in the year (Fig.  2A ).

Figure 2

A The ground truth trajectory for delivery (orange) is composed of two parts: an informative period, 9–10 months prior to the delivery, and a largely uninformative period prior. Case-control windows (blue) are indexed to delivery/baseline date, and so only sample a single (informative) slice of the trajectory. Cohort windows (green) always occur in January, and so uniformly sample the trajectory. B Model performance (Validation AUROC) for deep recurrent neural networks and logistic regression for each study design. Error bars represent the 95% confidence intervals. Each box represents the results of 10 independently trained models. Box bounds represent upper quartile, lower quartile, and mean. Whiskers represent maxima and minima. C Comparison of confusion matrices for CC-CC (left) and CC-Cohort (right) models. Color intensity corresponds to matrix value. D CC-Cohort validation model confidence distributions for late (Oct/Nov/Dec) deliveries given January features.

Using 2015 data from a de-identified nationwide medical insurance claims dataset, we simulated three studies:

CC-CC: models trained and evaluated under the case-control (CC) paradigm: one month of records, three months prior to the delivery date (cases) or matched baseline date (controls) are used.

CC-Cohort: models trained under the case-control paradigm, but evaluated under the cohort paradigm, where records from January are used to predict delivery in 2015.

Cohort-Cohort: models trained and evaluated under the cohort paradigm.

For each simulated study, records within the observation window of diagnoses, procedures, and prescriptions ordered were fed into both deep recurrent neural nets (RNN) and logistic regression (LR) models.

The significant difference in performance (Fig.  2B ) between CC-CC and CC-Cohort models illustrates a central trait of temporally-biased sampling. Uneven sampling across the transition trajectory improves validation AUC under artificial validation conditions, but model performance collapses when deployed in a prospective manner. In contrast, models designed with the prospective task from the outset (Cohort-Cohort) had intermediate performance that reflected the inherent ambiguity of the available observations. These findings were robust across both RNN and LR-based models. In fact, while the more complex RNN performed better than the logistic regression model for the CC-CC task, it performed worse than the LR on the CC-Cohort task. In this case, methodological improvements on an unrealistic task led to more significant declines in performance on a more realistic task.
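The contrast between the three designs can be reproduced in a deliberately small toy setting. The sketch below is not the RNN or LR pipeline used in the study: it assumes a single binary feature (a pregnancy-related record is visible or not), a nine-month informative period, one delivery per calendar month, controls who never show the feature, and a counting-based model. All function names are hypothetical.

```python
def pregnant_at(observation_month, delivery_month, gestation_months=9):
    """True if the observation falls inside the informative (pregnant) period."""
    months_before_delivery = delivery_month - observation_month
    return 0 <= months_before_delivery < gestation_months


def feature(delivery_month, observation_month):
    """1 if a pregnancy-related record would be visible, else 0 (controls: None)."""
    if delivery_month is None:
        return 0
    return int(pregnant_at(observation_month, delivery_month))


def case_control_rows(delivery_months, n_controls, window=3):
    """Cases observed `window` months before delivery; controls at a baseline date."""
    rows = [(feature(d, d - window), 1) for d in delivery_months]
    return rows + [(0, 0)] * n_controls


def cohort_rows(delivery_months, n_controls, observation_month=1):
    """Everyone observed in the same calendar month (January = 1)."""
    rows = [(feature(d, observation_month), 1) for d in delivery_months]
    return rows + [(0, 0)] * n_controls


def train(rows):
    """Estimate P(case | feature value) by simple counting."""
    model = {}
    for value in (0, 1):
        labels = [y for x, y in rows if x == value]
        model[value] = sum(labels) / len(labels) if labels else 0.0
    return model


def sensitivity(model, rows, threshold=0.5):
    """Fraction of true cases that the model flags as cases."""
    case_features = [x for x, y in rows if y == 1]
    return sum(model[x] >= threshold for x in case_features) / len(case_features)


deliveries = list(range(1, 13))  # one delivery per calendar month
cc = case_control_rows(deliveries, n_controls=12)
cohort = cohort_rows(deliveries, n_controls=12)
cc_model, cohort_model = train(cc), train(cohort)

print(sensitivity(cc_model, cc))      # CC-CC
print(sensitivity(cc_model, cohort))  # CC-Cohort
print(cc_model[0], cohort_model[0])   # P(case | no pregnancy record)
```

In this toy, the case-control model is perfect on case-control data but misses the late-year deliveries when applied to January records, and it assigns them zero probability of being cases, whereas the cohort-trained model retains a non-zero probability for individuals without a visible record.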

For women with October/November/December deliveries, claims data from January are mostly uninformative, and a reliable prediction at that point is not possible at the population level, especially when using features trained during pregnancy. The confusion matrices produced by CC-CC and CC-Cohort models revealed that much of the performance collapse can be traced to false negatives (Fig.  2C ). We examined the confidence that the deep recurrent networks assigned to October/November/December deliveries when evaluated on cohort-structured data (Fig.  2D ). Models trained using case-control sampling incorrectly label these individuals as high confidence controls, while models trained using cohorts more appropriately capture the intrinsic ambiguity of the prediction task. Clinicians do not have the luxury of examining only patients three months/six months/one year prior to disease incidence: they must assess risk in real time. These studies are common in the machine learning literature: one study even described the act of aligning patients by disease diagnosis time as a feature, and a major reason why their framework was better able to stratify risk 14 . However, aligning patients in this way requires waiting until disease diagnosis, and so the superior risk stratification comes too late to be useful.

It is critical to note that this is a problem that cannot be solved methodologically. As evidenced by the comparison of the performance of the RNN and LR models, novel or exotic machine learning techniques cannot compensate for the fact that the data fed into the models represent a distorted view of the actual population distribution that would be encountered prospectively. Even with perfect measurement and modeling, temporal bias and the issues that result would still be present: the underlying trajectory would still be unobserved.

Temporal bias-induced replication failure

Studies that identify disease risk factors through nutrition data enjoy a particularly high profile among the public 30 . As an example, the Mediterranean diet (characterized by consumption of olive oil, fruits, vegetables, among other factors) has been implicated as a protective factor against coronary heart disease, but the mechanism for this association is unclear. One paper set out to examine whether olive oil consumption specifically was associated with MI using patients from a Spanish hospital 31 . MI patients and matched controls were interviewed regarding their olive oil consumption over the past year, and a protective effect against MI was observed among the highest quintile of olive oil consumers. In response, another group analyzed data from an Italian case-control study and were unable to identify the same association between the upper quintile of olive oil consumption and MI 32 . Crucially, these analyses differed in the size of the observation window used: one year and two years respectively. As a result, not only were these studies sampling the MI trajectory unevenly, they sampled different parts of the MI trajectory. To examine the degree to which differing amounts of temporal bias present in each study could have influenced the results of the study, we utilized longitudinal data from nearly 100,000 individuals from the Nurses’ Health Study (NHS) regarding olive oil consumption patterns and MI to provide a baseline ground truth. We simulated retrospective case-control studies that considered different lookback periods to determine if the presence or magnitude of a protective effect was sensitive to the manner in which an experiment was conducted. Figure  3A details the simulation setup: longitudinal records (Fig.  3A ) were used to identify case (red) and control (green) individuals. MI dates were identified for cases, and baseline dates for controls were selected to match the age distribution of the cases. For each patient, exposures during the lookback time were recorded. The association between MI and the observed exposures was then calculated and the influence of the lookback time on association strength was assessed.

Figure 3

A Over a particular time period, longitudinal data of olive oil consumption is continuous for all cohort members with time. Circles represent MI events, while diamonds represent matched, but otherwise arbitrarily chosen baseline points for controls. B Case-control studies arbitrarily align MI patients at the date of the MI. As a result, the time dimension is inverted and anchored to the MI date, the position of controls is consequently lost. C Strength of olive oil consumption-MI association given years of consumption prior to baseline considered. Effect size is normalized to the average 1-year association strength. Points are colored based on statistical significance after FDR correction. Each box plot represents 200 repeated trials. Box bounds represent upper quartile, lower quartile, and mean. Whiskers represent maxima and minima.

The simulated studies that examined one year of past olive oil consumption relative to the MI/baseline date detected a protective effect, as originally observed. However, the magnitude and statistical significance of this effect decayed as the size of the lookback period was increased, consistent with the results of the failed replication. When a two-year lookback period was used, only 41% of simulated studies observed a statistically significant result (Fig.  3C ). The observed protective effect in these cases is an artifact of methodology, rather than medicine, physiology, or society. The act of looking back from the MI date/matched baseline has the effect of inverting the time axis to time-from-MI and aligning the case individuals (Fig.  3B ). However, no such treatment is possible for control individuals, and their position along the new temporal axis is unknown. As a result, there is no functional basis for comparing healthy individuals to individuals artificially indexed to a future event (MI) because these represent groups that can only be identified retrospectively, after the MI has already occurred. A mismatch exists between the information utilized in the study and the information that patients or physicians would have access to when making dietary decisions. While there may indeed be a prospective association between olive oil and MI, protective or otherwise, the data to observe such an effect were not collected. Because both olive oil consumption and MI risk are time-varying features, the strength of the instantaneous association between the two will naturally depend on when each feature is measured.
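The sensitivity of an event-indexed association to the lookback length can be shown with a toy calculation. The intake values below are hypothetical and in arbitrary units, not values from the Nurses' Health Study, and the sketch assumes (purely for illustration) that cases differ from controls only in the final year before the MI.

```python
def mean_intake(yearly_intake, lookback_years):
    """Average intake over the most recent `lookback_years` (index 0 = latest year)."""
    window = yearly_intake[:lookback_years]
    return sum(window) / len(window)


def intake_gap(case_intake, control_intake, lookback_years):
    """Case-minus-control difference in mean intake for a given lookback."""
    return (mean_intake(case_intake, lookback_years)
            - mean_intake(control_intake, lookback_years))


# Hypothetical intakes (arbitrary units), most recent year first.
case_intake = [8.0, 10.0, 10.0, 10.0]      # lower only in the year before the MI
control_intake = [10.0, 10.0, 10.0, 10.0]  # flat before the matched baseline date

for years in (1, 2, 4):
    print(years, intake_gap(case_intake, control_intake, years))
```

Under this assumption the apparent difference halves when the lookback doubles, so two studies that differ only in their observation window would report different effect sizes from the same underlying population.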

Temporal bias can be thought of as a flaw present in the application of case-control experiments to the real-world diagnostic or prognostic task. Because these experiments do not uniformly sample the control-to-case trajectory, features and observations in certain parts of the trajectory are oversampled and assigned disproportionate weight. These observations also do not match the observations that physicians or patients have when assessing risk in real time. Because the case observations that are model-applicable can only be identified after the case event actually occurs, the resulting experimental findings are impossible to use prospectively. Temporal bias serves to amplify differences between the healthy and diseased populations, improving apparent predictive accuracy and exaggerating effect sizes of predictors. In prospective cases, it may also result in researchers failing to discover predictive signals that were outside the window considered. Because the magnitude of its effects is a function of an often-unobserved trajectory, temporal bias is poorly controlled for and can lead to replication failure between studies. The relative impact of temporal bias will scale with the dynamic range of the trajectory: a trajectory that contains large, dramatic changes is susceptible to bias, while trajectories composed of static features (genotype, demographics, etc.) will largely be immune.

Temporal bias has existed alongside case-control studies from when they were first utilized. The first documented case-control study in the medical literature was Reverend Henry Whitehead’s follow-up 33 to John Snow’s famous report 34 on the Broad Street cholera outbreak. Whitehead aimed to evaluate Snow’s hypothesis that consuming water from the Broad Street pump led to infection. Whitehead surveyed both families of infected and deceased as well as individuals without cholera regarding their consumption of pump water during the time deaths were observed 35 , 36 .

The outbreak began on August 31st, 1854 34 , with deaths occurring in the days that immediately followed. Whitehead’s efforts in identifying pump-water exposure among outbreak victims focused on the time period between August 30th and September 8th, corresponding to a lookback time between 1 and 10 days, depending on when the victim died. This would normally result in temporal bias towards the end of the cholera trajectory. Although Whitehead’s conclusions were ultimately correct, the brief incubation period (2 h to 5 days 37 ) of cholera contributed to the success of the experiment and Whitehead’s later ability to identify the index patient. The rapid transition from healthy to diseased ensured that Whitehead’s chosen lookback time would have uniformly sampled the disease trajectory, but this is also something Whitehead could not have known at the time. Had Whitehead instead been faced with an outbreak of another waterborne disease such as typhoid fever, which can have an incubation period as long as 30 days 38 , Whitehead’s chosen window would oversample exposure status in the runup to death, leading to temporal bias that would overemphasize features in the latter portion of the disease trajectory (Fig.  4A ). Because the disease etiology and trajectory were unknown at the time, the association between Broad Street water and death is much less clear in the case of a hypothetical typhoid fever epidemic. (In another instance with unclear etiology, a recent survey of COVID-19 predictive algorithms found a significant number utilizing case-control sampling 39 .) Figure  4B summarizes hypothetical interview data given Whitehead’s study design in the case of both a cholera and a typhoid fever outbreak. In the unshaded columns, which represent information he would have access to, the association between pump water consumption and mortality is only clear in the case of cholera.

figure 4

A Whitehead’s cholera study benefited from the short period between infection and death. Had Whitehead been faced with an outbreak of typhoid fever, his sampling strategy would oversample late-stage features. B Hypothetical interview data from Whitehead’s case-control study. Lacking underlying knowledge regarding disease etiology, Whitehead’s experimental design would have experienced temporal bias given a disease with a longer incubation period. Shaded columns represent information hidden to the investigator. C Randomizing the lookback window among case patients can uniformly sample the trajectory, if the lookback times go far back enough. D Evaluating person-days, person-weeks, or person-months can allow for the entire trajectory to be considered. E Conducting a cohort study by creating a well-defined date from which a look forward window is deployed does not uniformly sample the trajectory in all individuals, but is still prospectively implementable since the starting date can be determined in real time.

Many factors have contributed to the unconscious adoption of bias-susceptible experimental designs. From a data efficiency perspective, case-control studies are often motivated by large class imbalances. A case-control experiment is one of the few ways to take efficient advantage of all minority class observations in a model. The analogous cohort experiment would require identifying a starting alignment date common to all study subjects. Furthermore, longitudinal observational data are often expensive or difficult to acquire, compared to the ease of one-shot, non-temporal case-control datasets. Without the use of retrospective observations, a case-control study is one of the few study types that can be conducted immediately after the study is conceived, rather than waiting for observations to be generated, as in prospective studies.

More concerningly, publication biases towards larger effect sizes and higher accuracy may have driven researchers towards methods that accentuate the differences between cases and controls. Temporal bias can be interpreted as a relatively invisible symptom of this subconscious aversion towards ambiguity in prognostic models. Strong predictive models (in terms of accuracy) are naturally easier to create when structural differences between the two groups are used to provide additional signal. The increasing popularity of large data sets and difficult-to-interpret deep learning techniques facilitates this strategy.

This is not to say that case-control studies should be abandoned wholesale. For practical reasons (data efficiency, cost, ease of deployment), these studies have contributed countless discoveries across fields. However, a systematic understanding of where and why temporal bias exists is critical in the transition of research findings to applications in the clinic and beyond. There are several strategies to minimize temporal bias where it exists and to evaluate its effects otherwise (Fig. 4C–E; examples are provided in Supplementary Note 2).

Assuming that a suitable control population can be identified, the following two conditions can enable uniform sampling of the control-to-case trajectory: i) the use of a randomized lookback time, and ii) the length of the maximum lookback time plus the length of the observation window is longer than the transition period.
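The two conditions above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of integer day numbers, and the one-day observation window in the usage note are assumptions made for illustration only.

```python
import random


def covers_trajectory(max_lookback, window_len, transition_len):
    """Condition (ii): max lookback plus window length exceeds the transition period."""
    return max_lookback + window_len > transition_len


def sample_case_windows(index_days, max_lookback, window_len, seed=0):
    """Draw one observation window per case using a uniformly random lookback.

    index_days: day number of the outcome (e.g. diagnosis) for each case.
    Returns (start_day, end_day) pairs. Each window ends `lookback` days
    before the outcome and spans `window_len` days.
    """
    rng = random.Random(seed)
    windows = []
    for index_day in index_days:
        lookback = rng.randint(0, max_lookback)  # condition (i): randomized lookback
        end = index_day - lookback
        windows.append((end - window_len, end))
    return windows
```

Using the figures quoted earlier for the Broad Street example, and assuming a one-day window, `covers_trajectory(10, 1, 5)` holds for a cholera-like transition of up to 5 days, whereas `covers_trajectory(10, 1, 30)` does not for a typhoid-like transition of up to 30 days.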

Person-time classification or prediction tasks, in which multiple windows are drawn from sufficiently extended case observations, can also uniformly sample the trajectory in question. This approach takes the form of sampling case trajectories more than once and weighting them according to prevalence. This can be facilitated through careful control criteria definitions, as the selection of sicker controls can shorten the trajectory considered in the experiment, likely at the cost of model discriminative ability.
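One way to picture the person-time approach is the sketch below, which cuts a single individual's record into consecutive windows and labels each one. The function name, the day-number representation, and the `horizon` parameter are hypothetical; weighting by prevalence is not shown.

```python
def person_time_windows(first_day, last_day, window_len, event_day=None, horizon=30):
    """Split one individual's record into consecutive, non-overlapping windows.

    Each window is labelled 1 if the outcome occurs within `horizon` days
    after the window ends, else 0. Windows that would extend past the
    outcome (or past the end of the record) are not emitted.
    """
    stop = last_day if event_day is None else min(last_day, event_day)
    rows = []
    start = first_day
    while start + window_len <= stop:
        end = start + window_len
        label = int(event_day is not None and end <= event_day <= end + horizon)
        rows.append((start, end, label))
        start = end
    return rows
```

Under these assumptions, a case contributes several rows, most of them labelled 0, so early and late parts of the trajectory are both represented rather than only the run-up to the outcome.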

The use of well-defined baseline dates in cohort studies can eliminate temporal bias. Assessing exposure after a particular birthday, at the start of a particular month/year, or after a well-defined event makes the prospective deployment population easier to identify.
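The contrast between the two anchoring schemes can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the handling of 29 February birthdays is an assumption.

```python
from datetime import date, timedelta


def case_control_window(index_date, lookback_days, window_days):
    """Window anchored on the outcome date, so it is only definable in retrospect."""
    end = index_date - timedelta(days=lookback_days)
    return end - timedelta(days=window_days), end


def cohort_window(baseline, window_days):
    """Look-forward window anchored on a baseline date known in real time."""
    return baseline, baseline + timedelta(days=window_days)


def birthday_baseline(birth_date, age):
    """Baseline at a given birthday (29 Feb birthdays fall back to 28 Feb)."""
    try:
        return birth_date.replace(year=birth_date.year + age)
    except ValueError:
        return birth_date.replace(year=birth_date.year + age, day=28)
```

The design point is that `cohort_window` needs no knowledge of the future outcome date, which is what makes the same window definition usable at deployment time.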

Finally, sensitivity analyses combined with researchers’ background domain knowledge regarding the state transition trajectory in question can be used to estimate effects of prospective deployment. An increasing focus on considering the deployability of a given model, the nature of the underlying trajectory, or even whether a particular feature can realistically be predicted from features at hand can also serve to prevent temporal bias from infiltrating a study.

While temporal bias is common and has far-reaching implications, it is unique among experimental or epistemological flaws in that, once understood, it is fairly easy to detect. As experiments grow broader in scope, transparency regarding the extent to which temporal bias influences findings is key to ensuring the consistency of associations and predictions.

Lipoprotein(a) trajectory imputation

Centiles of lipoprotein(a) [Lp(a)] values for myocardial infarction (MI) in 4441 Chinese patients (cases) and healthy matched controls (controls) published by Paré et al. 27 were used to construct log-normal distributions of Lp(a) values for each cohort. One hundred fifty thousand case and control measurements were drawn and a linear model was fit to establish the baseline coefficient of association between Lp(a) and MI in the presence of temporal bias. For trajectory imputation, for each case patient, a starting Lp(a) value was generated using one of three methods: (i) random sampling from the control distribution such that the drawn value is smaller than the case value, (ii) percentile matching (if the case value fell in the Nth percentile of the case distribution, the Nth percentile value from the control distribution was drawn), and (iii) a uniform shift of 15% (representing the observation that the median control value was 15% lower than the median case value). This starting value is understood to represent the Lp(a) measurement of the case patient in the distant past, at the point when they were cardiovascularly healthy. The case-ending value was drawn directly from the published distributions. For each pair of case-starting and case-ending values, a linear/logarithmic/logistic/step function was fit using the two values as starting and ending points. New case observations were generated by randomly selecting a point along the generated trajectory, allowing for the computation of a prospective effect size. All individual experiments were repeated 100 times with newly drawn sample cohorts.
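The trajectory step can be sketched as below. The source does not give the functional forms, so the parameterizations here (the `steepness` value, the step placed at the midpoint, and rescaling so that every curve passes through both endpoints) are assumptions, as are the function names.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def trajectory_value(start, end, t, shape="linear", steepness=10.0):
    """Value at fraction t in [0, 1] of a trajectory running from `start` to `end`.

    Every shape is rescaled so that t=0 returns `start` and t=1 returns `end`.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if shape == "linear":
        frac = t
    elif shape == "logarithmic":
        frac = math.log1p(steepness * t) / math.log1p(steepness)
    elif shape == "logistic":
        lo, hi = _sigmoid(-steepness / 2), _sigmoid(steepness / 2)
        frac = (_sigmoid(steepness * (t - 0.5)) - lo) / (hi - lo)
    elif shape == "step":
        frac = 0.0 if t < 0.5 else 1.0  # assumed: the jump happens at the midpoint
    else:
        raise ValueError(f"unknown shape: {shape}")
    return start + (end - start) * frac


def sample_prospective_value(start, end, shape="linear", rng=random):
    """Pick a uniformly random point along the trajectory, as a new case observation."""
    return trajectory_value(start, end, rng.random(), shape)
```

Drawing observations with `sample_prospective_value` instead of always using the case-ending value is what moves the simulated measurement away from the end of the trajectory.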

To examine the potential impact of inadvertent selection bias on the observed association between Lp(a) and MI, the Lp(a) values and MI status for all patients with more than one Lp(a) observation prior to the first recorded MI event were extracted from the Partners Research Patient Data Registry database in a deidentified manner. This work was approved by the Partners Institutional Review Board (Protocol #2018P000016). Case and control patients were defined based on MI status, and for each patient in each cohort, the (i) largest available, (ii) smallest available, and (iii) mean Lp(a) values were computed and used to identify the observed effect size under each selection scheme by fitting a logistic regression model. All calculations were conducted in R (version 3.44) using the glmnet package, version 2.0-16.

Delivery prediction from sequential claims data

Records of health insurance claims in 2015 from a deidentified national database from Aetna, a commercial managed health care company, were utilized for this study. The Harvard Medical School Institutional Review Board waived the requirement for patient consent for analysis of this database as it was deemed to not be human subjects research. Delivery events were identified based on International Classification of Diseases (ICD9/10) diagnostic code, Current Procedural Terminology (CPT) code, or the birth year of newly born members linked by subscriber-parent annotations. Cases were defined as individuals who experienced a delivery between February and December, 2015, while controls were defined as individuals who did not experience a delivery during any of 2015. Thirty thousand cases were randomly selected and matched to 30,000 controls based on age and ZIP code. For each individual, case-control and cohort feature windows were defined. Case-control windows were set as the month of records that was three months prior to the delivery/matched baseline date for cases and controls respectively. Cohort windows were set as the month of records from January, 2015. Three studies were simulated: (1) The CC-CC study consisted of model training using case-control windows and model evaluation using case-control windows. (2) The CC-Cohort study consisted of model training using case-control windows and model evaluation using cohort windows. (3) The Cohort-Cohort study consisted of model training using cohort windows and model evaluation using cohort windows. For each study, deep recurrent neural networks and logistic regression models were trained over the features present in each window. 
For deep recurrent neural network-based models, the linear sequence of features inside the window was provided in the form of International Classification of Diseases (ICD9) codes for diagnoses, Current Procedural Terminology (CPT) codes for procedures, and National Drug Codes (NDC) for prescriptions. The sequence length was set to 20 events; individual sequences were either padded or clipped to meet this requirement. Logistic regression models utilized binary occurrence matrices for all events as features. Both models contained demographic information in the form of age. Sex was excluded as a feature because all cohort members were female. All calculations were conducted in Python 2.7.3 using the Keras 2.2.0 and scikit-learn 0.18.1 packages.
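The two feature encodings described above can be sketched in plain Python. This is not the study's code: the function names and the padding token are hypothetical, and padding at the front while keeping the most recent events is an assumption (it happens to match the default behaviour of Keras `pad_sequences`, but the source does not say which convention was used).

```python
def pad_or_clip(seq, length=20, pad_token="<PAD>"):
    """Force a sequence of event codes to exactly `length` entries."""
    clipped = seq[-length:]  # assumed: keep the most recent events
    return [pad_token] * (length - len(clipped)) + clipped


def binary_occurrence(sequences):
    """Binary occurrence matrix: one row per individual, one column per code."""
    vocab = sorted({code for seq in sequences for code in seq})
    matrix = []
    for seq in sequences:
        present = set(seq)
        matrix.append([int(code in present) for code in vocab])
    return vocab, matrix
```

For example, with placeholder tokens, `pad_or_clip(["DX_A", "PX_B"], length=4)` yields two padding tokens followed by the two events, while `binary_occurrence` discards order and repetition and records only whether each code appeared.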

Simulation of olive oil/myocardial infarction case-control study

Data from the Nurses’ Health Study (NHS) was used for this analysis. All nutrition and disease incidence surveys between 1994 and 2010 were considered. Internal NHS definitions of first MI were utilized to define the case population. Case individuals were only considered if they had at least two consecutive nutritional surveys with answers to all olive oil related questions prior to the first MI event. Individuals with any history of cardiovascular disease including MI and angina were excluded from the control population. Control individuals were only considered if they had at least two consecutive nutritional surveys with answers to all olive oil related questions. In total, 3188 qualifying MI individuals and 94,893 controls were identified. A baseline date for each control individual was defined based on the availability of consecutive nutrition surveys. For each case, a matched control was identified based on age at baseline and sex. For all individuals, total cumulative yearly olive oil consumption was computed by summing olive oil added to food and olive oil salad dressing consumption, as validated by Guasch-Ferré et al. 40 . For each experiment, a lookback time between 1 and 4 years was selected, and the cumulative total olive oil consumed during the lookback time relative to the MI date/baseline was calculated. For each lookback time, the effect size between the top quintile (based on total consumption) and the remaining population and statistical significance were calculated using a two-sided t -test. Each experiment, including case-control matching, was repeated 200 times. All calculations were conducted in R (version 3.44) using the glmnet package, version 2.0-16.
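The lookback and quintile steps can be sketched roughly as follows. The data layout (a mapping from calendar year to intake), the function names, and the rank-based quintile rule are assumptions for illustration; the matching and t-test steps are not shown.

```python
def cumulative_intake(yearly_intake, index_year, lookback_years):
    """Sum intake over the `lookback_years` calendar years before `index_year`.

    yearly_intake: dict mapping year -> intake; missing years count as 0.
    """
    return sum(
        yearly_intake.get(index_year - k, 0.0)
        for k in range(1, lookback_years + 1)
    )


def top_quintile_flags(totals):
    """Flag individuals in the top fifth by rank (ties at the cutoff are all flagged)."""
    k = max(1, len(totals) // 5)
    cutoff = sorted(totals, reverse=True)[k - 1]
    return [t >= cutoff for t in totals]
```

Repeating the calculation for lookback times of 1 to 4 years shows how the exposure measure changes as the window reaches further back from the MI date or baseline.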

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available from Aetna Insurance, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Please contact N. Palmer ([email protected]) for inquiries about the Aetna dataset. Summary data are, however, available from the authors upon reasonable request and with permission of Aetna Insurance. All data utilized in the study from the Nurses’ Health Study (NHS) is available upon request with the permission of the NHS and can be accessed at https://www.nurseshealthstudy.org/researchers . All data utilized in the study from the Partners Research Patient Data Registry is available upon request with the permission of Partners Healthcare and can be accessed at https://rc.partners.org/research-apps-and-services/identify-subjects-request-data#research-patient-data-registry .

Code availability

Auxiliary code is available at https://github.com/william-yuan/temporalbias

Song, J. W. & Chung, K. C. Observational studies: cohort and case-control studies. Plast. Reconstructive Surg. 126 , 2234–2242 (2010).

Marshall, T. What is a case-control study? Int. J. Epidemiol. 33 , 612–613 (2004).

Lewallen, S. & Courtright, P. Epidemiology in practice: case-control studies. Community Eye Health 11 , 57–58 (1998).

Weiss, J. C., Natarajan, S., Peissig, P. L., McCarty, C. A. & Page, D. Machine learning for personalized medicine: predicting primary myocardial infarction from electronic health records. AI Mag. 33 , 33 (2012).

Steere, A. C. et al. Lyme borreliosis. Nat. Rev. Dis. Prim. 2 , 16090 (2016).

Steere, A. C. et al. Lyme arthritis: an epidemic of oligoarticular arthritis in children and adults in three connecticut communities. Arthritis Rheum. 20 , 7–17 (1977).

Norgeot, B. et al. Assessment of a deep learning model based on electronic health record data to forecast clinical outcomes in patients with rheumatoid arthritis. JAMA Netw. Open 2 , e190606 (2019).

Chou, R. C., Kane, M., Ghimire, S., Gautam, S. & Gui, J. Treatment for rheumatoid arthritis and risk of Alzheimer’s disease: a nested case-control analysis. CNS Drugs 30 , 1111–1120 (2016).

Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer et al. Assessment of lung cancer risk on the basis of a biomarker panel of circulating proteins. JAMA Oncol. 4 , e182078 (2018).

Himes, B. E., Dai, Y., Kohane, I. S., Weiss, S. T. & Ramoni, M. F. Prediction of chronic obstructive pulmonary disease (COPD) in asthma patients using electronic medical records. J. Am. Med. Inform. Assoc. 16 , 371–379 (2009).

Rand, L. I. et al. Multiple factors in the prediction of risk of proliferative diabetic retinopathy. N. Engl. J. Med. 313 , 1433–1438 (1985).

Choi, E., Schuetz, A., Stewart, W. F. & Sun, J. Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 24 , 361–370 (2017).

Wang, X., Wang, F., Hu, J. & Sorrentino, R. Exploring joint disease risk prediction. AMIA Annu. Symp. Proc. 2014 , 1180–1187 (2014).

Ranganath, R., Perotte, A., Elhadad, N. & Blei, D. Deep survival analysis; Proceedings of the 1st Machine Learning for Healthcare Conference, PMLR 56 , 101–114 (2016).

Masino, A. J. et al. Machine learning models for early sepsis recognition in the neonatal intensive care unit using readily available electronic health record data. PLoS One 14 , e0212665 (2019).

Mayhew, M. B. et al. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. Nat. Commun. 11 , 1177 (2020).

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).

Rajkomar, A., Dean, J. & Kohane, I. Machine learning in medicine. N. Engl. J. Med. 380 , 1347–1358 (2019).

Hernan, M. A. Estimating causal effects from epidemiological data. J. Epidemiol. Community Health 60 , 578–586 (2006).

Burns, P. B., Rohrich, R. J. & Chung, K. C. The levels of evidence and their role in evidence-based medicine. Plast. Reconstr. Surg. 128 , 305–310 (2011).

Rothman, K. J. Epidemiology: an introduction (Oxford University Press, 2012).

Yusuf, S. et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet 364 , 937–952 (2004).

Jacobson, T. A. Lipoprotein(a), Cardiovascular Disease, and Contemporary Management. Mayo Clin. Proc. 88 , 1294–1311 (2013).

Hippe, D. S. et al. Lp(a) (Lipoprotein(a)) levels predict progression of carotid atherosclerosis in subjects with atherosclerotic cardiovascular disease on intensive lipid therapy: an analysis of the AIM-HIGH (Atherothrombosis intervention in metabolic syndrome with low HDL/high triglycerides: impact on global health outcomes) carotid magnetic resonance imaging substudy-brief report. Arterioscler. Thromb. Vasc. Biol. 38 , 673–678 (2018).

Garnotel, R., Monier, F., Lefèvre, F. & Gillery, P. Long-term variability of serum lipoprotein(a) concentrations in healthy fertile women. Clin. Chem. Lab. Med. 36 , 317–321 (1998).

Nazir, D. J., Roberts, R. S., Hill, S. A. & McQueen, M. J. Monthly intra-individual variation in lipids over a 1-year period in 22 normal subjects. Clin. Biochem. 32 , 381–389 (1999).

Paré, G. et al. Lipoprotein(a) levels and the risk of myocardial infarction among 7 ethnic groups. Circulation 139 , 1472–1482 (2019).

Hoffmann, M. M., Schäfer, L., Winkler, K. & König, B. Intraindividual variability of lipoprotein(a) and implications for the decision-making process for lipoprotein(a) lowering therapy. Atherosclerosis 263 , e27 (2017).

Nazir, D. J. & McQueen, M. J. Monthly intra-individual variation in lipoprotein(a) in 22 normal subjects over 12 months. Clin. Biochem. 30 , 163–170 (1997).

Goldberg, J. P. & Hellwig, J. P. Nutrition research in the media: the challenge facing scientists. J. Am. Coll. Nutr. 16 , 544–550 (1997).

Fernández-Jarne, E. et al. Risk of first non-fatal myocardial infarction negatively associated with olive oil consumption: a case-control study in Spain. Int. J. Epidemiol. 31 , 474–480 (2002).

Bertuzzi, M., Tavani, A., Negri, E. & La Vecchia, C. Olive oil consumption and risk of non-fatal myocardial infarction in Italy. Int. J. Epidemiol. 31 , 1274–1277 (2002). author reply 1276–7.

Paneth, N., Susser, E. & Susser, M. Origins and early development of the case-control study: Part 1, Early evolution. Soz. Praventivmed. 47 , 282–288 (2002).

Snow, J. On the mode of communication of cholera. Edinb. Med. J. 1 , 668–670 (1856).

Whitehead, H. The broad street pump: an episode in the cholera epidemic of 1854 , 113–122 (Macmillan’s Magazine, 1865).

Newsom, S. W. B. Pioneers in infection control: John Snow, Henry Whitehead, the Broad Street pump, and the beginnings of geographical epidemiology. J. Hospital Infect. 64 , 210–216 (2006).

Centers for Disease Control and Prevention. Cholera – Vibrio cholerae infection. Information for Public Health & Medical Professionals, https://www.cdc.gov/cholera/healthprofessionals.html . (2020).

Mintz, E., Slayton, R. & Walters, M. Typhoid fever and paratyphoid fever. Control of Communicable Diseases Manual (2015) https://doi.org/10.2105/ccdm.2745.149 .

Wynants, Laure et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 369 , m1328 (2020).

Guasch-Ferré, M. et al. Olive oil consumption and risk of type 2 diabetes in US women. Am. J. Clin. Nutr. 102 , 479–486 (2015).

Acknowledgements

W.Y. was supported by the NVIDIA Graduate Fellowship, the T32HD040128 from the NICHD/NIH, and received support from the AWS Cloud Credits for Research and NVIDIA GPU Grant Program.

Author information

Authors and affiliations.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

William Yuan, Brett K. Beaulieu-Jones, Kun-Hsing Yu, Scott L. Lipnick, Nathan Palmer, Tianxi Cai & Isaac S. Kohane

Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA

Scott L. Lipnick

Center for Assessment Technology and Continuous Health, Massachusetts General Hospital, Boston, MA, USA

Scott L. Lipnick

Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA

Joseph Loscalzo

Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA

Tianxi Cai

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA

Tianxi Cai

Contributions

Conceptualization: W.Y., B.K.B-J., K-H.Y., T.C., I.S.K.; Methodology, Investigation, Writing – Original Draft: W.Y.; Writing – Review and Editing: All authors; Supervision: I.S.K.

Corresponding authors

Correspondence to William Yuan or Isaac S. Kohane.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Augusto Di Castelnuovo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Yuan, W., Beaulieu-Jones, B.K., Yu, KH. et al. Temporal bias in case-control design: preventing reliable predictions of the future. Nat Commun 12 , 1107 (2021). https://doi.org/10.1038/s41467-021-21390-2

Received : 12 September 2020

Accepted : 22 January 2021

Published : 17 February 2021

DOI : https://doi.org/10.1038/s41467-021-21390-2

A Brief History of Public Health

John Snow - The Father of Epidemiology

Cholera is an infectious disease that became a major threat to health during the 1800s. The story has been elegantly told in The Ghost Map by Steven Johnson, who describes the conditions in London in the 1800s in the brief video below.

Cholera continues to be a problem throughout the world today (see Cholera in Haiti ). The next video describes the cholera epidemic in Haiti in 2010.

In the 1800s there were large epidemics of cholera in Europe and America that killed thousands of people. John Snow (shown below) was a physician in London who spent several decades studying cholera in a systematic way. He is most often credited with solving an outbreak of cholera that occurred in London in 1854 (the outbreak is described below), but his studies of cholera were much more extensive than that. The first cholera epidemic in London struck in 1831, when Snow was still an apprentice. Another large epidemic occurred in 1848 and lasted through 1849.

The prevailing opinion was that cholera was spread either by miasmas or by person-to-person contact. Snow began examining the victims and found that their initial symptoms were always related to the gastrointestinal tract. Snow reasoned that, if cholera were spread by bad air, it should cause pulmonary symptoms, but since the symptoms were gastrointestinal, perhaps it was transmitted by water or food consumption. In fact, cholera is caused by the bacterium Vibrio cholerae, which is transmitted by the fecal-oral route, that is, by ingestion of water or food that is contaminated with sewage.

In August 1849 Snow published a paper entitled "On the Mode of Communication of Cholera" in which he presented his theory that the disease was acquired by ingestion of contaminated water, but his theory did not get much traction with the medical establishment. The epidemic ended in 1849, but Snow continued to collect data on the pattern of disease and began finding evidence that linked cholera to specific sources of water.

Many Londoners received their water from hand pump wells (below) that were located throughout the city.

However, increasing numbers of businesses and homes had water piped from the Thames River by private companies. Snow learned from municipal records that two private companies supplied piped in water to the areas that were primarily affected by cholera. Some consumers were supplied by the Lambeth Company, while others were supplied by Southwark & Vauxhall. The map of London below shows the areas of London supplied by these two water companies.

Southwark & Vauxhall pumped water from a more downstream location that was clearly contaminated, and the rates of cholera were clearly higher in their customers compared to those supplied by the Lambeth Company. Nevertheless, many were unconvinced by his findings, since Snow had not actually demonstrated that the water contained something that could cause cholera.

In late August of 1854, cholera broke out in the Broad Street area; the residents panicked and many began to flee. A hand pump was located right on Broad Street, and Snow was immediately suspicious. Water samples did not reveal gross contamination, but Snow persisted and began to collect detailed information on where the victims had gotten their drinking water. He obtained the names and the addresses of the first 83 victims who had died by the end of the first week. He went to their homes and learned from relatives that the vast majority of them had obtained their water from the Broad St. pump.

On Sept. 6 Snow appeared at the meeting of the local Board of Guardians and presented his evidence that the pump was the source of the outbreak. He argued that the pump handle should be removed in order to prevent further contamination. The board was not convinced, but agreed to remove the pump handle as a precaution. The epidemic quickly subsided.

The investigation continued. Ultimately, Snow was able to track down 197 victims, the vast majority of whom lived within walking distance of the pump. It was also noted that there was an extremely low incidence of cholera at a nearby work house and also at the Lion Brewery, and both of these businesses had their own water supply. The workers at another large business used water from the Broad St. pump, and they had a substantial incidence of cholera.

The map below shows the location of the pump, and the home or business location of the victims is shown by stacks of small dark marks that are clearly clustered around the pump. This type of map, which marks the location of disease cases, is now referred to as a "spot map."

An initial examination of the well failed to show any problems, casting doubt on Snow's conclusions, and the pump was reopened without incident. However, some months later an associate of Snow's stumbled upon the records of an infant who had died of diarrhea at the very beginning of the outbreak. The timing of her death indicated that she had been the first cholera case. Upon questioning, the mother said that she had emptied a pail of the infant's diarrhea into a cesspool in front of their house immediately adjacent to the water pump. The cesspool and the pump well were then excavated, revealing that the cesspool, which was within three feet of the well, was leaking, and the wall of the well was decayed, allowing the contamination from the cesspool to seep in. In retrospect, it appeared that once the child died, there was no further contamination of the well, and the epidemic ended.

This graph shows the number of cholera deaths over time. There is an abrupt increase in cholera deaths at the very end of August. Deaths peak on September 2, when there were about 130 deaths, and the cholera death rate gradually declines to near zero over the next three weeks.

With knowledge of the incubation period for the disease, the shape of an epidemic curve can sometimes provide clues regarding the source of the epidemic. Cholera has an incubation period of only 1-3 days, and this graph indicates that new cases occurred over a period of about 10 days. This suggests a "continuous source" epidemic of cholera exposure because new cases continue to occur for more than one incubation period, suggesting an ongoing source of contamination.
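The comparison described above amounts to checking whether new cases keep appearing for longer than one incubation period. A minimal sketch, with a hypothetical function name and day numbers as inputs, is given below; it is a crude heuristic rather than a formal classification of epidemic curves.

```python
def exceeds_one_incubation_period(first_onset_day, last_onset_day, max_incubation_days):
    """True when new cases keep appearing for longer than one incubation period."""
    return (last_onset_day - first_onset_day) > max_incubation_days
```

With the figures quoted above (new cases over about 10 days, incubation of at most 3 days), `exceeds_one_incubation_period(0, 10, 3)` is true, which is consistent with an ongoing source of exposure.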

In retrospect, Snow made several important contributions to the development of epidemiologic thinking:

  • He proposed a new hypothesis for how cholera was transmitted.
  • He tested this hypothesis systematically by making comparisons between groups of people.
  • He provided evidence for an association between drinking from the Broad St. well and getting cholera.
  • He argued for an intervention which prevented additional cases (removal of the pump handle).

Content ©2015. All Rights Reserved. Date last modified: October 1, 2015. Office of Teaching & Digital Learning Boston University School of Public Health

Incidence

The rate of new cases of a condition, symptom, death, or injury that develop during a specific time period. For example, incidence can be calculated by dividing the number of new cases during a given observation period by the number of people at risk of developing the condition.
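
The example calculation in this definition can be sketched in Python. This is a minimal illustration: the function name `incidence` and the counts used below are hypothetical and are not taken from any outbreak described here.

```python
def incidence(new_cases: int, population_at_risk: int) -> float:
    """New cases during the observation period divided by the people at risk."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    if new_cases < 0:
        raise ValueError("new_cases cannot be negative")
    return new_cases / population_at_risk


# Hypothetical counts, for illustration only.
rate = incidence(new_cases=25, population_at_risk=5_000)
print(f"{rate * 1_000:.1f} new cases per 1,000 people at risk")
```

Reporting the result per 1,000 (or per 10,000) people is a common convention that makes small proportions easier to compare.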

Continuous Source epidemic

An epidemic in which the causal agent (e.g. polluted drinking water, spoiled food) is infecting people who come into contact with it, over an extended period of time, i.e., over more than one incubation period. The incubation period for cholera is 1-3 days, but the Broad Street epidemic extended over a period of 3-4 weeks, so there was a "continuous source".

John Snow: A Legacy of Disease Detectives

John Snow, known as the father of epidemiology, was born on March 15, 1813. This week, we honor the birthday of the first true disease detective.

The Story of the Broad Street Pump

London, 1854: A Soho neighborhood teems with people and animals living in cramped and dirty quarters. A deadly outbreak of cholera is spreading. Doctors and scientists believe it’s caused by “miasma,” or bad air. They theorize that particles from rotting matter and waste are getting into the air and making people sick.

Enter John Snow. An accomplished physician, he becomes convinced that something other than the air might be responsible for the illness. Through carefully mapping the outbreak, he finds that everyone affected has a single connection in common: they have all retrieved water from the local Broad Street pump.

On September 8, 1854, Snow tests his theory by removing the pump’s handle, effectively stopping the outbreak, proving his theory, and opening the door to modern epidemiology.

Valuable Lessons for a Modern Age

In 1854, John Snow was the first to use maps and records to track the spread of a disease back to its source. Today, his ideas provide the foundation for how we find and stop disease all over the world.

We have better, more modern tools now for identifying and tracking illness, like access to state-of-the-art labs and computer systems. We have in-depth knowledge of germs and how they spread. But when we train today’s disease detectives, we still return to the basics. CDC disease detectives are trained to look for clues by asking:

  • WHO is sick?
  • WHAT are their symptoms?
  • WHEN did they get sick?
  • WHERE could they have been exposed to the cause of the illness?

We live in a world where disease can travel across the globe in a matter of hours. This means we must not only apply these basic lessons of epidemiology, but we must constantly be looking for ways to find better answers, faster.

Disease Detectives Make a Difference

When outbreaks or other threats emerge, CDC’s disease detectives, some of whom are trained through our Epidemic Intelligence Service (EIS), are on the scene. These boots-on-the-ground staff, called EIS officers, support over 100 public health investigations (Epi-Aids) each year in the U.S. and worldwide.

CDC’s disease detectives have been instrumental in tracking down threats like:

Anthrax: During the 2001 anthrax outbreak among U.S. postal workers, disease detectives investigated the route of contaminated envelopes and how workers became infected.

E. coli: For the first time, disease detectives conclusively showed that flour was the source of a 2016 E. coli outbreak. Millions of pounds of flour were taken off the shelves, including flour-containing products like bread, cake, and muffin mixes.

Seoul virus: Disease detectives have been working to track and stop an outbreak of Seoul virus, an emerging rodent-borne hantavirus, involving home-based rat breeders this year. The outbreak was first identified after two Wisconsin rat breeders became ill in December and, as of March 13, the investigation has so far included rat-breeding facilities in 15 states, with 17 people infected in seven states.

Like Snow’s map that revealed cases of cholera congregated around the Broad Street pump, we must keep tabs on where and how disease is spreading. Once the source of disease is identified, it is crucial to develop and implement interventions to help prevent people from getting sick. We must remain innovative and creative, like Snow when he removed the handle of the Broad Street pump to stop disease at the source.

  • MMWR: 150th Anniversary of John Snow and the Pump Handle
  • CDC: Epidemic Intelligence Service (EIS)
  • CDC: Epidemic Intelligence Service Conference
  • Public Library of Science: John Snow – The First Epidemiologist
  • TED Talk: How the “ghost map” helped end a killer disease

7 comments on “John Snow: A Legacy of Disease Detectives”

As a public health officer JS’s method is still valid in low income countries.However, technology has gone milestones but can not do without JS’s expertise. Now on a separate note l am interested to embark on an EIS course. ANY ADVISE PLEASE

Also didn’t John Snow discover that in one neighborhood of Soho the people did not have the same outbreak because they drank more beer and less water. The brewery was in that same neighborhood.

Not trying to be funny here, I believe this is a fact? Please correct me if I am wrong. This came to me from an MBA study on epidemiology many years ago.

thanks for sharing

Very informative for Public Health Students.

Eventually we know that JS’s method are used long time ago and we have advanced techniques of diseases detection but still the origin of the methods still valid as it was invented by JS’s !!! I am a field epidemiologist who greatly inspired with his contributions for public health and I believe no scientific knowledge will be evident today if he is not been there some 200+ years ago !

Distillations magazine

John Snow Hunts the Blue Death

In showing that cholera spreads through tainted water, an English doctor helped lay epidemiology’s foundations.

John Snow left his office at a run. The streets were empty, London a ghost town, for cholera had returned.

Three-quarters of the population had fled, while many without means to leave lay sick or dying in their homes. Though he was a doctor, Snow was not running to treat the sick; he was rushing to the office of the registrar general, which held the records of those who had recently died from cholera.

Then he started walking.

Snow spent the next two days making on-the-ground inspections and interviewing locals and doctors and two nights combing through government records. Snow went door to door, retracing his steps again and again, trying to fit death counts and sudden illnesses into a picture that made sense. Finally, after walking past the same water pump for what must have felt like the hundredth time, he took everything he had found and plotted it all on a map of the Soho area.

Today, we would call the result a Voronoi diagram, but when people saw it in 1854 they dubbed it a ghost map, one that showed Soho overwhelmed by stacks of black lines, each representing a death by cholera. Several water pumps used by residents are clearly marked on the map, though it takes a moment to notice them; instead, the eye is immediately drawn toward the thick black lines clustering around the epicenter, the location of the Broad Street water pump.

Snow’s research in the preceding decade had indicated that cholera spread through water, but time and again his findings had been ignored and dismissed. He knew the local parish government was unlikely to listen to him. Even so, he had to try. If the commission agreed to remove the pump’s handle—if he could show beyond a shadow of a doubt that the water beneath the Broad Street pump was the source of sickness—then maybe he could prevent further deaths.

Armed with his research and a stubborn resolve to make the medical world listen, Snow, the not-yet father of epidemiology, set out to make his case.

Born in 1813, Snow was inquisitive from the start. Or, as biographer Benjamin Ward Richardson would later put it, he was “very reserved and peculiar—a clever man at bottom perchance, but not easy to be understood and very peculiar.”

Still, his mother insisted that he do something with his intellect, and at age 14 Snow began a medical apprenticeship in Newcastle. He took to his studies with all the fervor and excitement he gave to any challenge. He was a teetotaler, not entirely unusual for the time, and a vegetarian, which was virtually unheard of among the people he knew, including his medical colleagues. Richardson later described how Snow’s vegetarianism “puzzled the housewives, shocked the cooks, and astonished the children.”

Snow’s first significant encounter with cholera occurred in 1832. The second cholera pandemic was sweeping through England, leaving hospitals severely understaffed. Snow, then a 19-year-old apprentice, was sent to care for the workers in the mining village of Killingworth.

At the time, the accepted theory among both doctors and laypeople was that cholera was spread through miasmas—noxious vapors often caused by rotting organic compounds that would, when inhaled, cause sickness. While in Killingworth, Snow kept a journal documenting what he saw, and according to his entries such a theory made no sense. If miasma theory was correct, then workers near the area’s sewage dumps should be getting sick, rather than the miners in the coal pits, as was so often the case. Given that miasmas were inhaled, Snow reasoned that a sickness caused by miasma would most likely begin in the lungs or the throat, but cholera was felt first in the intestines. It was far more likely, he argued, that the “poison” was found in something the locals had ingested.

In 1837, a year after he arrived in London, 24-year-old Snow earned his medical license. The following year he became a member of the Royal College of Surgeons. By 1854, when cholera again struck the city, Snow was a respected and seasoned doctor.

In writing for the Canadian Public Health Journal, J. L. Little paints a remarkable picture of Snow: “If one could combine in a sketch something of an ascetic, a total abstainer, a vegetarian, a bachelor, a lover of children, a devotee of the open heath, and an enthusiast for all things of good report, then we would be presenting John Snow as a man.” Richardson, a colleague and close friend of Snow, offers a similar, albeit drier, description of Snow. Richardson presents a man dedicated to helping members of the lower class and ascribes his lack of wealthy patients to the fact that he “was an earnest man with not the least element of quackery in all his composition, with a retiring manner and a solid scepticism in relation to that routine malpractice which the people love.”

King Cholera

They called it the blue death. As dehydration racked the body, blood would begin to thicken in patients’ veins; starved of oxygen, their skin would turn a sickly shade of blue. Cholera could hit hard and work fast; in severe cases apparently healthy Londoners would drop in the middle of the street and were often dead by the end of the day.

Today cholera is treated with hydration (given either orally or intravenously), electrolytes, and antibiotics. But in the 19th century, treatment was more likely to consist of a vacation to the seaside (which sometimes appeared to work, but in reality did very little), purgatives (the last thing cholera patients needed), leeches, or opium. Because cholera was believed to spread through miasmas, the usual treatment was to remove the patient from the “bad air.” This approach occasionally worked, though not because the patient had been moved to a place with healthier air; rather, they were moved to a place with cleaner water. But only those with milder cases benefited.

While not wholly reliable, a change of scenery worked just enough to reinforce miasmas as the leading suspect in cholera’s proliferation.

At the time, an alternative theory of disease-causing, self-replicating “poisons” had entered medical discourse, but such an idea was still theoretical. Most doctors held to the theory that bodies of water from which Londoners drank, such as the River Thames, would dilute a “poison” past the point of potency, and thus such poisons could not be major sources of disease. One consequence was that sewage was removed from people’s homes and sent straight into the Thames. The river, however, was tidal, its flow intermittent, and London was full of cesspits located dangerously close to wells.

A Broader Look

The story of the Broad Street pump is famously associated with Snow, and it appears in every account of his life. Yet Snow’s story is not that of a man single-handedly ending an epidemic in one glorious act of gritty detective work; it is the story of a doctor’s pursuit of answers over the course of years.

Scattered reports of a strange smell at the Broad Street pump offered the first hint. Snow, however, would have to prove that the problem was not the smell of the water, as miasma theory would have it, but something in the water that was swallowed. After all, his ghost map could be used to argue that living near a pump that emitted a foul odor caused cholera. Snow needed to prove that the water had to be ingested for cholera to spread, and to do that he needed to answer two main questions: why were there such large clusters of people living near the Broad Street pump who weren’t getting sick, and why were there people outside the parameters who were?

Snow interviewed patients, doctors, and local residents and cross-referenced the results with government records. Of those cholera victims who at first glance shouldn’t have drunk from the Broad Street pump, he found that most were schoolchildren or workers who stopped by the pump as part of their daily routine. In the case of two particularly out-of-the-way deaths, the lady of the house had grown fond of Broad Street water when living in the area and after moving had the water delivered to her new house, whereupon it eventually killed her and her visiting niece. As for nearby households that were never affected, all had pumps on the premises, or, in the case of an alehouse, were staffed by workers who almost never drank water during the day thanks to a daily allotment of beer.

Ultimately, it was an impressive enough case to convince the parish government to remove the pump’s handle. But once the epidemic had ended, the handle was returned and Snow’s theories of waterborne disease were left to languish.

A group of local scientists and public figures had attempted to determine the source of the epidemic, with Snow among their number, but most did not favor his theories. One such doubter was Henry Whitehead, a local Anglican priest whose personal mission since the beginning of the epidemic had been to dismiss all notable theories about cholera’s spread—including Snow’s—in favor of his belief that cholera was sent as a punishment from God. But the more Whitehead investigated Snow’s theory, the less he doubted.

It was Whitehead who eventually tracked down the source of the outbreak around Broad Street. The earliest known case, he found, was that of the infant daughter of one of his congregants, who had washed the child’s soiled linens and emptied the dirty water into her house’s cesspit, which was situated in a basement right next to the Broad Street pump; the cesspit was old and its foundation was cracked, causing it to leak directly into the pump’s well.

Water Troubles

Snow first published his thoughts on the disease in 1849 in “On the Mode of Communication of Cholera,” though he was reluctant to make any definitive claims about cholera’s spread without sufficient proof; such was Snow’s way in everything he published. He studied two cholera outbreaks in the London area and in both cases traced the contaminated water back to wells fed from certain stretches of the Thames. Even so, the idea that cholera was largely waterborne was still speculation—highly educated and well-backed speculation, but not a watertight case.

Then, in 1854, sandwiched around his famous work on Broad Street, came the study that helped lay the statistical foundations of epidemiology: the Southwark and Vauxhall and the Lambeth water companies.

Sometime between the epidemics of 1849 and 1853, the Lambeth Waterworks Company changed its water intake from the center of town to Thames Ditton, which was upstream of London’s sewage. Meanwhile, the Southwark and Vauxhall Water Company continued to source its water from the Thames at Battersea Fields, downstream of the city’s sewage. Snow dug through water company records and found that cholera struck customers serviced by Lambeth less frequently than those serviced by Southwark and Vauxhall. He set out to measure the difference.

It was, at first glance, the perfect scientific experiment, with clearly defined parameters unmuddied by extraneous factors. Snow describes how “each water company supplied alike both rich and poor, and thus there was a population of 300,000 persons, of various conditions and occupations, intimately mixed together, and divided into two groups by no other circumstance than the difference of water supply.”

The experiment proved to be decidedly less than perfect when Snow realized Southwark and Vauxhall’s organizational system was as inferior as its water source. In a British Medical Journal article published in 1857, Snow described the Lambeth operations and the clearly labeled ledger of customers listed by location. Southwark and Vauxhall, in contrast, had “a kind of alphabetical arrangement” that was no use to Snow; its list was in no real geographical order and could only narrow a customer’s location to within an area of 10 to 15 square miles. Throwing darts at a map of London would be roughly as effective.

As on Broad Street, Snow took to knocking on doors. He and the colleagues working with him double- and triple-checked the water supplier for each cholera death on record. Once Snow had accounted for any possible misreporting, the results proved staggering: mortality rates in Southwark and Vauxhall houses were six times greater than in Lambeth houses.
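
A comparison of this kind comes down to a ratio of two mortality rates. The sketch below shows the arithmetic in Python. The function names are hypothetical, and the counts are invented for illustration so that the ratio comes out at six; they are not Snow's recorded figures.

```python
def deaths_per_10k_houses(deaths: int, houses: int) -> float:
    """Mortality rate expressed as deaths per 10,000 houses supplied."""
    if houses <= 0:
        raise ValueError("houses must be positive")
    return 10_000 * deaths / houses


def rate_ratio(deaths_a: int, houses_a: int, deaths_b: int, houses_b: int) -> float:
    """How many times greater the mortality rate in group A is than in group B."""
    rate_b = deaths_per_10k_houses(deaths_b, houses_b)
    if rate_b == 0:
        raise ValueError("group B has no deaths; the ratio is undefined")
    return deaths_per_10k_houses(deaths_a, houses_a) / rate_b


# Invented counts, NOT Snow's data: supplier A has 120 deaths in 20,000 houses,
# supplier B has 10 deaths in 10,000 houses.
print(rate_ratio(deaths_a=120, houses_a=20_000, deaths_b=10, houses_b=10_000))
```

Dividing by the number of houses supplied matters because the two companies served different numbers of customers; comparing raw death counts alone would mix up the size of a supplier with the risk of its water.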

Snow and his colleagues went over the results repeatedly, ensuring the data was as accurate as possible, to the point where even Snow, far from one to hail his own achievements, hazarded that “it probably supplies a greater amount of statistical evidence than was ever brought to bear on a medical subject.”

Epidemiologist Wade Hampton Frost would later call Snow “a nearly perfect model” for studying epidemiological situations because his analysis “led him to the confident conclusion that the specific cause of the disease was a parasitic micro-organism, conforming in all essentials of its natural history to what is now known of the Vibrio cholerae.”

Snow did not live to see germ theory take hold within the scientific community, nor did he live to see London’s overhaul of its sewer systems and the end of cesspits, but he did get to see some of the short-term results of his epidemiological research.

In 1857, the year before he died, he wrote “On the Origin of the Recent Outbreak of Cholera at West Ham,” in which he noted that “no water company [now] draws its supply from any part of the Thames which is within reach of pollution by the shipping, or the sewers of the town.” Death rates in London decreased significantly in the second half of the pandemic after changes in water supply became more widespread. Within a few years of Snow’s untimely death by stroke, cholera would become a rarity in England.

Miriam Reid was a staff writer for Distillations as part of Drexel University’s co-op program.

John Snow and the Birth of Epidemiology

Even though this physician pre-dated germ theory, he was able to track a London outbreak of cholera to one particular water pump.

An 1854 cholera outbreak in London confounded those who thought the disease was caused by miasma, or foul air. Enter John Snow, who had already made a name for himself by administering chloroform to Queen Victoria during childbirth. Snow was skeptical of the reigning miasmatic theory of disease because of his own experiences fighting cholera. Even though he pre-dated germ theory and didn’t know that a bacterium caused cholera, he nonetheless tracked the outbreak of the disease.

According to evolutionary biologist Susan Bandoni Muench, in the mid-nineteenth century, London had a population density greater than Manhattan’s today. Sanitary conditions weren’t particularly good even for the upper classes, while at the bottom rungs at least 100,000 people scavenged rags, bones, coal scraps, and night soil.

During an earlier cholera outbreak in London, Snow wondered how the “blue death”—so called because of the blue tinge of the victims’ skin—spread. In an 1849 monograph, he postulated that cesspools might be spreading human waste to drinking water. This idea was met with scorn. For several years he mapped past incidences of the disease, compared neighborhoods and neighbors, and virtually invented epidemiology.

When the 1854 epidemic hit, killing 700 people in a matter of weeks, Snow was ready. He knocked on doors and interviewed families with cholera. What united all the cases? They got their water at this Broad Street pump in the Soho neighborhood. Snow’s research was reinforced by the Reverend Henry Whitehead, who initially doubted Snow’s thesis. But Whitehead found the same Broad Street pump connection by interviewing those locals who hadn’t gotten cholera; these people hadn’t used that pump.

The water pump’s handle was removed to render it inoperable. Cases of cholera plummeted afterwards. Snow himself noted that the epidemic was probably petering out by then anyway, as epidemics tend to do. But shutting the pump down clearly had an effect on the mortality of the epidemic.

Snow thought of cholera’s spread as analogous to a gas’s diffusion, but in the medium of water, not the air as the miasmatists had it. And Snow really knew his gasses, since he had been experimenting for years on chloroform, ether, ethyl nitrate, carbon disulphide, benzene, and several other potential anesthetics. These experiments were performed on animals and… himself.

Epidemiologist A.R. Mawson suggests that “extensive and prolonged self-experimentation with anaesthetics over a 9-year period led to Snow’s renal failure, swollen fingers and early death from stroke.” Snow was only 45 when he died. An early bout with tuberculosis and a probable vitamin D deficiency from his vegetarian diet (since the age of 17) wouldn’t have helped.

Snow followed an exemplary lifestyle by today’s standards—he didn’t drink, didn’t eat meat, and exercised vigorously. He even distilled his own water while he lived in the heart of the Soho epidemic. His work ethic seems admirable, too. But he couldn’t know his very work was hazardous to health. Exposure to anesthetic gasses damaged kidneys and livers, as well as the nervous and reproductive systems.

For public health, Snow sacrificed his own.

Biology LibreTexts

Case Study: John Snow and the Origin of Epidemiology

This page is a draft and is under active development. 

John Snow and the Origin of Epidemiology; “You Know Nothing, John Snow.”

Part I: Beginnings

John Snow was born in York, England, in 1813, the first of 9 children of a working-class family. Snow’s wealthy and well-connected uncle arranged an apprenticeship for his nephew with a surgeon-apothecary, one of the two types of health care providers in 19th century London. Physicians were graduates of the medical programs at Oxford or Cambridge while surgeon-apothecaries went through a longer apprenticeship, attending classes part-time at smaller medical schools. John Snow moved to Newcastle at the age of 14 to apprentice with William Hardcastle. It was in Newcastle, near the end of Snow’s apprenticeship, that he first encountered cholera as it arrived in England in 1831.

In his medical studies, Snow learned the prevailing humoral model of disease, which held that health depended on the balance of four humors: blood, phlegm, black bile, and yellow bile. Diseases resulted from an excess or deficit in one of these four humors. To correct the problem, physicians would use leeches to bleed patients or purgatives to cause diarrhea or vomiting. The humoral model eventually was replaced with the miasma model of disease, which suggested that diseases were caused by pollution or “bad air.” At the time, the Germ Theory of Disease had not been established, and physicians didn’t fully understand the nature of disease and its transmission.

After being released from his apprenticeship, John Snow was one of the first physicians to study and calculate dosages for ether and chloroform as surgical anaesthetics. It was his work with anesthesia and gases that made him doubt the miasma model of disease.

1. Compare the two different types of medical professions of the 19th century. What modern professions would compare to these?

2. *Search cholera and list the major symptoms of the disease. Why is it sometimes called “The Blue Death?”

3. Compare the humoral model of disease (part 1) to the miasma model of disease. Which do you think is closest to our modern understanding of disease?

Part II: Sanitation in the 19th Century

London in the middle of the 19th century contained 2.5 million people, housed in 30 square miles, a population density greater than present-day Manhattan. The Soho district of London had a serious problem with filth due to the large influx of people and a lack of proper sanitary services: the London sewer system had not reached Soho. Many cellars had cesspools underneath their floorboards.

A cesspit (cesspool) was an underground holding tank used for the storage of feces. Some pits were emptied when they became full; cleaned out by tradesmen using shovels and horse-drawn wagons. Some cesspits were designed to allow liquid to leach into the soil. Because of the population density in London, many of these cesspits were overflowing; waste accumulated in basements, courtyards, and even the streets. Since the cesspools were overrunning, the London government decided to dump the waste into the River Thames.

Because of the problems of waste disposal, few Londoners had a source of drinking water uncontaminated by human sewage. At that time, a total of nine different water companies supplied Londoners with water, obtained from either shallow wells or the Thames River. Some companies had their intake pipes farther upstream than others. Water obtained from pipes downstream was more likely to be contaminated with human waste.

4. What was it like to live in London in the early 19th century?

5. Examine the diagram showing a cesspit. Before houses installed cesspits, chamber pots were dumped into the streets. What were some advantages to having a cesspit? What were the disadvantages?

6. How does changing the location of the pipes (either upstream or downstream) improve water quality? Sketch an image of the river and the intake pipes and sewer pipes to show the ideal location.

Part III: Outbreak

Germ theory had not been developed at this point (Louis Pasteur would not propose it until 1861), so Snow was unaware of the mechanism by which the disease was transmitted, but the evidence led him to believe that it was not due to breathing foul air, as the miasma model would suggest. He first published his theory in 1849 in an essay, On the Mode of Communication of Cholera, which proposed that cholera was transmitted in water. The essay received negative reviews in the Lancet and the London Medical Gazette. However, a reviewer made a helpful suggestion about what evidence would be compelling: the crucial natural experiment would be to find people living side by side with lifestyles similar in all respects except for the water source.

Snow sought ways of strengthening his argument by carrying out the crucial experiment sought by the Medical Gazette’s reviewer. He went door to door interviewing families of cholera victims.

Snow began marking cholera deaths on city maps, and patterns began to emerge. He mapped out the locations of individual water pumps and generated cells which represented all the points on his map which were closest to each pump.
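The "cells" described above, where every point is assigned to its closest pump, are what is now usually called a Voronoi diagram. The following is a minimal Python sketch of the idea, not a reconstruction of Snow's map: the pump names "Pump B" and "Pump C", all coordinates, and the use of straight-line distance are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical pump locations (x, y) on an arbitrary grid; NOT Snow's real data.
pumps = {
    "Broad Street": (5.0, 5.0),
    "Pump B": (1.0, 8.0),
    "Pump C": (9.0, 1.0),
}

# Hypothetical locations of recorded deaths; illustrative only.
deaths = [(4.5, 5.2), (5.5, 4.8), (6.0, 6.0), (2.0, 7.5), (8.0, 2.0), (5.0, 3.9)]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` by straight-line distance."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


# Tally deaths by the cell (closest pump) in which each one falls.
deaths_per_pump = Counter(nearest_pump(d, pumps) for d in deaths)

for name, count in deaths_per_pump.most_common():
    print(f"{name}: {count}")
```

With these made-up points most deaths fall in the Broad Street cell. On a real map, the pump whose cell holds a disproportionate share of the deaths is the first one to investigate.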

7. Based on the data shown in the map, which pump is the most likely source of the cholera infection? Circle all pumps that might also be suspect in this investigation.

8. The cluster of cases near Saville Row might be considered an outlier as they have their own pump nearby. What questions would Snow want to ask family members in this area?

Part IV: Snow Makes His Case

In Snow's own words:

On proceeding to the spot, I found that nearly all the deaths had taken place within a short distance of the [Broad Street] pump. There were only ten deaths in houses situated decidedly nearer to another street-pump. In five of these cases the families of the deceased persons informed me that they always sent to the pump in Broad Street, as they preferred the water to that of the pumps which were nearer. In three other cases, the deceased were children who went to school near the pump in Broad Street...

The result of the inquiry, then, is, that there has been no particular outbreak or prevalence of cholera in this part of London except among the persons who were in the habit of drinking the water of the above-mentioned pump well.

—John Snow, letter to the editor of the Medical Times and Gazette

Although Snow's chemical and microscopic examination of a sample of the Broad Street pump water was not able to conclusively prove its danger, his studies of the pattern of the disease were convincing enough to persuade the St James parish authorities to disable the well pump by removing its handle. At this point, John Snow had partnered with Reverend Henry Whitehead who assisted with interviewing families and tracking the disease. Whitehead succeeded in identifying an earlier case, an infant living in a house a few feet from the Broad Street pump who died from diarrhea two days before the cholera outbreak was officially recognized.

After excavation of the Broad Street well, it was found that it had been dug only three feet from an old cesspit that had begun to leak fecal bacteria. The mother of the baby who had contracted cholera had washed its diapers into this cesspit, which was likely the source of the original infection.

12. What was John Snow’s original hypothesis and how did it conflict with prevailing models of health and disease?

13. Why would evidence of cholera in people living side by side, differing only in water supply, provide critical evidence?

14. Snow found that none of the monks in the adjacent monastery contracted cholera. They drank only beer, which they brewed themselves. Does this mean that beer made with contaminated water is safe to drink? How could you test this?

Part V: The Aftermath

Although many continued to reject Snow’s explanation, some began to give it grudging acceptance, often without acknowledging his contribution. Snow’s vindication came at a meeting of the Medical Society where a member stood up after such a presentation insisting that Snow be given credit. The pump is now a historic site in London and is located in front of the John Snow Pub.

15. The basic questions of epidemiology focus on time and place: “why here” and “why now”. What are the answers to these questions for the Broad Street outbreak?

16. Why was the death of the baby a significant observation for this study?

17. Epidemiology relies on non-experimental tests of hypotheses. What was Snow’s hypothesis and how did he test it without performing experiments?

18. Consider the term “non-experimental.” Given you had no ethical concerns with testing on humans, how would you test the hypothesis in an “experimental” way?

London Map: str.llnl.gov/str/September02/Hall.html

Science Stories: Using Case Studies to Teach Critical Thinking By Clyde Freeman Herreid, Nancy A. Schiller, Ky F. Herreid books.google.com

"Septic tank EN" by Olek Remesz (wiki-pl: Orem, commons: Orem) - Own work, based on this picture by Zielu20. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Septic_tank_EN.svg#mediaviewer/File:Septic_tank_EN.svg

http://www.ph.ucla.edu/epi/snow/snowbook.html - “On the Mode of Communication of Cholera”

John Snow and the Origin of a New Medicine in the Time of Cholera

London was the centre of the world in 1850, but also a pit of filth and disease. Overcrowded and without a complete sewerage system, people living in areas that are today central and exclusive used to throw their waste into the street or into the River Thames. The continuous outbreaks of cholera also plagued the London populace until John Snow managed to find the connection between these two events, although he had to risk his prestige as an eminent doctor in Victorian England to do so. His great success helped to tear down the scientific theories of the day about infections and marked the birth of modern epidemiology.

During his childhood, he himself had suffered from the squalid sanitary conditions in the rapidly growing cities in the throes of the Industrial Revolution. The son of a coal worker, John Snow (15 March 1813 – 16 June 1858) was born in one of the poorest neighbourhoods in the city of York and was the first of nine children. As a child, he was noted for his mathematical skills and in 1827, at the age of 14, he became an apprentice to a surgeon and apothecary in Newcastle. He combined this post with studies at a newly established medical school; it was then that he had his first contact with cholera, during the second cholera pandemic, which came to Europe from Asia. Snow began investigating these epidemic outbreaks, working with surgeon Thomas M. Greenhow.

Portrait of John Snow, circa 1856. Credit: Wikimedia Commons

After a decade of schooling and medical experience, in 1836 John Snow moved to London to finally study medicine at the university. The following year, he began an internship at Westminster Hospital, where he stood out for his sense of observation: during his night shift, he designed experiments to study the origin of the diseases that so affected the students who performed autopsies at the hospital. He found that the common cause was poisoning from the arsenic fumes used to preserve bodies. That changed the practice of anatomical dissections and ended the use of arsenic in candle making.

An apprentice that became Queen Victoria’s surgeon

In 1844 he was finally able to complete his studies, add the title of doctor to his name and open a practice in the now vibrant London district of Soho. Dr. Snow gained prestige by applying scientific experiments to demonstrate the validity of his medical innovations, especially in relation to anesthesia. This new technique to avoid pain in operations and deliveries was still very unsafe, due to the lack of precise knowledge of the properties of the substances used as anaesthetics. Snow was one of the first to learn how to calculate the proper doses of chloroform and ether; he also designed devices and masks to apply them safely to patients and wrote a medical guide for their use. His renown was such that he was chosen to personally administer chloroform to Queen Victoria during the birth of her penultimate child, Leopold, in 1853. This contributed to the public acceptance of anesthesia.

However, John Snow is remembered today for another achievement, which put his reputation as a doctor in jeopardy, right when he was at the pinnacle of his career. During his long years as an apprentice and student, he had been taught the miasma theory: that “bad airs” caused infectious diseases such as cholera or bubonic plague, according to the consensus of the scholars of the day. But there was something about this theory that did not make sense to Snow. He thought that if cholera was caused by harmful fumes, patients would exhibit some kind of respiratory symptom, which they did not. In addition, during the 1849 outbreaks, he conducted a case study and found that the incidence and death rate were much higher in South London, where the waters of the Thames were much more polluted than those drunk by the inhabitants of the rest of the British capital. In his article On the Mode of Communication of Cholera, he concluded that the cause was “morbid matter” invisible to the human eye, which patients ingested and which caused severe diarrhoea.

That hypothesis, which today is pure common sense, was then a challenge to established knowledge. Since the theory that microbes cause infections had not yet prevailed, Snow could not explain what this invisible and infectious matter was —just as Austrian physician and scientist Ignaz Semmelweis could not explain why doctors had to wash their hands to avoid spreading diseases from one patient to another.

A convincing proof that was not enough

Without being able to resort to his experimental demonstrations, the opportunity to act came in 1854 when a new and more serious cholera epidemic struck the United Kingdom. John Snow thoroughly investigated each case, talking to the sick and their families and pinpointing them on a map of London, searching for a correlation with the places from which the patients had obtained their drinking water. He was able to identify a water pump on Broad Street as the source of the outbreak in the Soho neighbourhood. His map of cholera convinced local authorities that this public water source had to be closed, and the number of cases began to drop dramatically.

Detail of John Snow's original map, digitally enhanced, showing the centre of the outbreak in Soho (and no cases at the brewery). Source: UCLA Department of Epidemiology

For the success of that large-scale test, John Snow is remembered today as the founder of modern epidemiology. But at the time it was not enough. Despite the evidence, public health experts believed in the miasma theory, and the handle of the water pump was reinstalled, just as the neighbours demanded —a measure Snow fought until he died of a stroke in 1858, at age 45.

The facts proved him right in the decades following his death: during the next cholera epidemic (in 1866), health authorities proved that Snow’s ideas were valid and that the water from that pump was mixed with faecal water; in 1884 Robert Koch finally identified the faecal bacterium Vibrio cholerae as the agent causing cholera. A few years earlier, Louis Pasteur’s experiments had already shown that microbes were the cause of infections and also explained why brewery workers had remained immune to the 1854 outbreak around that Broad Street water pump. Fearful of that water, they drank only beer (produced from boiled water). These days, on a corner of that London street, one finds that same water pump, the John Snow Pub and a commemorative plaque placed in memory of Snow’s great scientific achievement.

Commemorative plaque and the John Snow pub on Broadwick (formerly Broad) Street, London. Credit: Matt Brown

Francisco Doménech @fucolin


Cholera, John Snow and the Grand Experiment

A British physician first determined that cholera spread through contaminated water in the 1850s, but the disease remains a major health risk today

Sarah Zielinski

I started reading about cholera over the weekend after hearing that health officials had confirmed several cases of the disease among victims of the recent Pakistani floods. Cholera is a bacterial disease that produces diarrhea and vomiting; people with the disease can die within hours if they don't get treatment. About 3 million to 5 million people suffer from cholera each year, mostly in developing countries, and 100,000 die from it, according to the World Health Organization.

This led me to the history of cholera and John Snow. Snow is credited with the discovery that cholera is transmitted through sewage-tainted water. His map of London's Soho region is often reproduced in biology textbooks with the story of how he made his discovery. Snow mapped out the cases of cholera during an 1854 outbreak and determined where each of the infected families obtained their water.

He would later write:

I found that nearly all the deaths had taken place within a short distance of the pump. There were only ten deaths in houses situated decidedly nearer to another street-pump. In five of these cases the families of the deceased persons informed me that they always sent to the pump in Broad Street, as they preferred the water to that of the pumps which were nearer. In three other cases, the deceased were children who went to school near the pump in Broad Street...
With regard to the deaths occurring in the locality belonging to the pump, there were 61 instances in which I was informed that the deceased persons used to drink the pump water from Broad Street, either constantly or occasionally...
The result of the inquiry, then, is, that there has been no particular outbreak or prevalence of cholera in this part of London except among the persons who were in the habit of drinking the water of the above-mentioned pump well.

The Broad Street well, Snow concluded, was contaminated with cholera (it was later found to have been built near an old cesspit). The well's pump handle was removed and the cholera outbreak ended. This is where most textbooks end. But there's a second part to the story—Snow's Grand Experiment.

There were parts of London that received their water from two distinct sources, the Southwark-Vauxhall Company and the Lambeth Waterworks Company. This was an ideal set-up for Snow for an experiment. Both companies drew water from the Thames, but Lambeth's intake was farther upriver—and thus less likely to be contaminated with the city's sewage—than Southwark-Vauxhall's.

Snow compiled data on the two sets of London households and found that during an 1854 epidemic there were 315 deaths from cholera per 10,000 homes among those supplied by Southwark-Vauxhall but only 37 deaths per 10,000 Lambeth homes.
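To make the size of that gap concrete, the two figures quoted above can be turned into a rate ratio and a rate difference. This is only a small arithmetic sketch in Python using the numbers given in the text; the variable names are illustrative and nothing here reproduces Snow's own calculations.

```python
# Death rates quoted in the text, per 10,000 homes, during the 1854 epidemic.
southwark_vauxhall_rate = 315
lambeth_rate = 37

# Rate ratio: how many times higher the death rate was for Southwark-Vauxhall homes.
rate_ratio = southwark_vauxhall_rate / lambeth_rate

# Rate difference: excess deaths per 10,000 homes.
rate_difference = southwark_vauxhall_rate - lambeth_rate

print(f"Rate ratio: {rate_ratio:.1f}")
print(f"Rate difference: {rate_difference} per 10,000 homes")
```

On these figures, homes supplied by Southwark-Vauxhall had roughly eight and a half times the cholera death rate of Lambeth homes.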

That would seem to be a slam dunk in the research world, but Snow had gotten his numbers not from an extensive house-to-house search, which would have been too much work for even a team of men, but from a less-precise parliamentary report. Neither Snow nor many of his detractors believed his results were strong enough to make the case that cholera was related to water supply.

A few years ago, Thomas Koch and Kenneth Denike, of the University of British Columbia, re-evaluated the Grand Experiment and found even more problems with Snow's methods and statistics. “The grand experiment ... was a failure,” Koch recently told The Scientist.

The irony, of course, is that Snow was right. As cities cleaned up their water supplies over the subsequent decades, cholera ceased being such a problem. But with more than a billion people worldwide lacking access to clean drinking water, the disease will remain with us for years to come.

Sarah Zielinski is an award-winning science writer and editor. She is a contributing writer in science for Smithsonian.com and blogs at Wild Things, which appears on Science News.

The Biology Corner

Biology Teaching Resources

Case Study: John Snow and the Origin of Epidemiology

This case study explores a time before the germ theory, when doctors were uncertain how disease was spread. The prevailing models of the day, such as the humoral or miasma model, could not fully explain how cholera infected some households but not others. The case study is set in London, England, in 1854, after an outbreak of cholera that had a mortality rate of 12.8%.

Students examine a map which shows houses where cholera had been reported and the locations of common water sources, and follow the steps of John Snow, who is considered the “Father of Epidemiology.” This methodology is still used today to establish patterns of infection for emerging diseases.

In addition to epidemiology, the case study explores the history of sanitation, and how cesspits likely contributed to the spread of cholera.

Grade Level:  10-12 Time Required:  1-2 hours

Shannan Muskopf

AI Case Studies

Kaiser Permanente is one of the USA’s largest health plans, serving 12.3 million members via 39 hospitals and over 217,000 employees. This NLP case study shows how it leveraged John Snow Labs’ AI Platform (for model training, deployment, and monitoring) and Spark NLP (for extracting key features from EMR notes) to optimize hospital patient flow models. The solution enabled real-time decision-making and strategic planning, by predicting:

  • Safe staffing levels
  • Hospital gridlock

Kaiser Permanente uses Spark NLP to integrate domain-specific NLP as part of a scalable, performant, measurable, and reproducible ML pipeline and improve the accuracy of forecasting the demand for hospital beds.

Usermind built its data science platform from scratch and deployed into production with live customers in 3 months – by using John Snow Labs’ out-of-the-box AI Platform. The deployed capabilities included data integration, visualization, training machine learning models, and deploying models to production – all within a hardened, enterprise-grade environment.

John Snow Labs delivered a whole new revenue stream for Usermind within three months.

The Khuluma project was carried out to enhance positive mental health amongst HIV-positive adolescents in South Africa, with the aim of reducing and ultimately preventing the spread of HIV & AIDS. Khuluma is an integrated, cost-effective and scalable health platform, leveraging the power of mentoring to facilitate interactive closed groups of 10-15 participants. John Snow Labs delivered the data analysis and data science aspects of the project, whose results were presented at the AIDS 2016 Conference.

It was a pleasure to work with John Snow Labs and we were deeply impressed by their passion for the work we do and by their drive for extracting the best data from Khuluma.

Artificial Intelligence In Service

Spark NLP for Healthcare was used to provide accurate, scalable, and healthcare-specific pipelines for OCR, sentence segmentation, spell checking, biomedical named entity recognition, assertion status (negation) detection, and entity resolution (to ICD and NDC codes). John Snow Labs’ AI Platform was used to develop, deploy, and operate the custom models within the required privacy, security, compliance, and scalability environment.

SelectData provides clinical coding, audit, and revenue cycle management services to the home health and hospice industry. Automating parts of the coding workflow, from diagnosis and medication extraction to coder assignment, required a deep understanding of a variety of noisy, long, scanned, free-text patient records and reports. It also required domain expertise, since the context, vocabulary, and meaning of the text are healthcare- and specialty-specific.

Spark NLP augments the SelectData Data Science Platform to extract fuzzy, implied, and complex facts from home health patient records.

John Snow’s legacy: epidemiology without borders

This Review provides abstracts from a meeting held at the London School of Hygiene and Tropical Medicine, on April 11–12, 2013, to celebrate the legacy of John Snow. They describe conventional and unconventional applications of epidemiological methods to problems ranging from diarrhoeal disease, mental health, cancer, and accident care, to education, poverty, financial networks, crime, and violence. Common themes appear throughout, including recognition of the importance of Snow’s example, the philosophical and practical implications of assessment of causality, and an emphasis on the evaluation of preventive, ameliorative, and curative interventions, in a wide variety of medical and societal examples. Almost all self-described epidemiologists nowadays work within the health arena, and this is the focus of most of the societies, journals, and courses that carry the name epidemiology. The range of applications evident in these contributions might encourage some of these institutions to consider broadening their remits. In so doing, they may contribute more directly to, and learn from, non-health-related areas that use the language and methods of epidemiology to address many important problems now facing the world.

The causes of epidemiology

Cesar G Victora

Whether or not John Snow should be regarded as the father of epidemiology is open to debate, because some of his predecessors such as William Farr might also be credited with such a distinction. However, Snow was unquestionably an essential link in the causal chain that led epidemiology to what it is now. His ingenuity in placing cholera cases on a geographical grid, and in comparing cholera incidence according to sources of household water supply, constituted groundbreaking innovations in the development of the epidemiological approach.

An ongoing series of articles in the International Journal of Epidemiology is taking stock of the status of epidemiology in the five continents. 1 – 5 As is the situation in many epidemiological studies, a key issue is the definition of a case. No accepted definition exists of an epidemiologist, since they include a range of professionals from highly trained academics to public health practitioners in local health authorities, with variable levels of training in epidemiology but who undertake essential surveillance and monitoring functions. Regardless of the definition, however, the series in the International Journal of Epidemiology is clear in showing that epidemiologists seem to be just as unequally distributed throughout the world as is the case for income, technology, and most other worldly goods.

In the past few decades, the term epidemiology has been largely associated with aetiological investigations and the use of increasingly sophisticated statistical methods. One might even say that epidemiologists are obsessed with finding causes. However, in global terms, this pursuit is only one of several applications of our discipline. In countries where most deaths occur, we know very little about the precise number of such deaths or the diseases that caused them, and even less about the frequency of major non-fatal illnesses. Epidemiological capacity is lowest in Africa and in south Asia, which are the world’s regions with the greatest disease burden. 3 , 4 Not only are fewer epidemiologists trained there than in other regions, but poor working conditions and low salaries contribute to the epidemiological brain drain from these areas, similar to the situation for doctors and nurses. An epidemiological divide clearly exists.

Which causes, therefore, should epidemiologists be pursuing 200 years after Snow’s birth? At a time when the post-2015 global development agenda is being established, epidemiologists can certainly make an important contribution to the cause of sustainable health and development. We definitely need more and better aetiological studies, but I would argue that we are relatively well served with these, in comparison with the unmet need for high-quality health data in the world’s poorest and sickest regions. The Brazilian scientist Mauricio Rocha e Silva was once asked about snakebite statistics in Brazil. He replied that there were no reliable data: “where there are snakes, there are no statistics; and where there are statistics, there are no snakes”. In my view, the most pressing need for epidemiology in today’s unequal world is to develop measurement capacity in the regions where our skills are needed most, to support evidence-based health planning and policy making. Capacity to measure disease burden, monitor trends, establish determinants, and assess the effect of public health interventions and programmes is scarce in such settings. We urgently need more John Snows—epidemiologists who count cases, investigate why these occurred, and, rather than waiting for others to act, become directly involved in evidence-based public health actions.

Inference of causes

Kenneth J Rothman

In his writings on cholera, Snow revealed his thinking about how causal inference works. His articulate arguments were laid out meticulously in great confidence, such as when he concluded “Whilst the presumed contamination of the water of the Broad Street pump with the evacuations of cholera patients affords an exact explanation of the fearful outbreak of cholera in St James’s parish, there is no other circumstance which offers any explanation at all, whatever hypothesis of the nature and cause of the malady be adopted.” 6

This confidence might seem reasonable, especially in hindsight, but Snow’s logic was not uniformly airtight. In discounting the role of “offensive effluvia”, a then-popular theory to explain how cholera spread, he noted that “many places where offensive effluvia are very abundant have been visited very lightly by cholera, whilst the comparatively open and cleanly districts of Kennington and Clapham have suffered severely. If inquiry were made, a far closer connection would be found to exist between offensive effluvia and the itch, than between these effluvia and cholera; yet as the cause of itch is well known, we are quite aware that this connection is not one of cause and effect.” 6

I am reluctant to pick at Snow’s brilliant work, but I cannot help but notice that this argument, although disarmingly strong, is premised on the invalid concept that any known cause of disease precludes other factors from being causes. That is to say, it presumes that only one cause exists for a disease, and if that is known, then to seek other causes is futile. However, by way of counterexample, epidemiologists can show that many organisms can cause pneumonia, and that the role of smoking in causing lung cancer does not preclude ionising radiation or asbestos from also causing lung cancer. Furthermore, the identification of proximal causes does not rule out causes further upstream, as illustrated by Davey-Smith’s discussion of socioeconomic factors in causing cholera in the 19th century. 7

Every disease has several causes, in two senses: first, many causal pathways can exist that end in the disease, starting with distal antecedents and progressing towards proximal causes; and second, each causal pathway has multiple components that act in concert to produce the effect through that mechanism. To define causes is easier than to lay out the rules for causal inference, if any such rules actually exist. Hume, Russell, Popper, and others have explained that induction—the prediction of future events on the basis of past events—can be shown to be naive and illogical. As Russell put it, “Domestic animals expect food when they see the person who feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.” 8

But do better methods of causal inference exist? Popper’s approach of conjecture and refutation was an attempt to solve Hume’s problem, although not to everyone’s satisfaction. Some people, notably Feyerabend, have argued against the existence of inferential rules, and others believe that quantification of uncertainty with bias analysis, 9 including Bayesian methods, is the best of all the imperfect solutions. Today, the foundation of causal inference is not much stronger than it was in Snow’s time, but the weaknesses are more evident.

Cancer, viruses, and causality in the 21st century

Patrick S Moore, Yuan Chang

Up to one in five cancer cases worldwide are now known to be caused by infection, and mainly by only seven human viruses. However, new genomic technologies are revealing hundreds of previously unknown agents. How well does epidemiology do in terms of telling us whether any of these new agents actually cause cancer?

Two very distinct stories emerge from the two most recently discovered human cancer viruses: Kaposi’s sarcoma herpesvirus and Merkel cell polyomavirus. Kaposi’s sarcoma became infamous early in the AIDS pandemic when it struck previously healthy men who have sex with men. This cancer was a medical enigma, with many people suggesting that HIV itself was the trigger. However, by the late 1980s, careful epidemiology revealed that Kaposi’s sarcoma is caused by a second, then undiscovered, infectious agent. 10 This latent agent would manifest as a cancer when an infected person became immunosuppressed. The prevalence of the so-called Kaposi’s sarcoma agent was predicted to be high in Africa and parts of the Middle East and the Mediterranean, but relatively low in northern Europe and America. The agent was postulated to be sexually transmitted among men who have sex with men in developed countries but poorly transmitted through blood contact, such as through transfusion. We discovered this aetiological agent by isolating two small genomic fragments of Kaposi’s sarcoma herpesvirus in 1994, which allowed the development of tests to rapidly confirm the epidemiological predictions. 11 Despite initial controversy, Hill’s criteria for Kaposi’s sarcoma herpesvirus causing Kaposi’s sarcoma were fulfilled quickly and, in epidemiological terms, Kaposi’s sarcoma herpesvirus is a well behaved virus. Nowadays, the virus is the most common cause of cancer in some parts of Africa.

More than a decade later, we found another potential cancer agent, 12 this time for a highly aggressive but rare skin cancer called Merkel cell carcinoma. As with Kaposi’s sarcoma, Merkel cell carcinoma is more common in immunosuppressed people. Merkel cell polyomavirus is the first polyomavirus to be convincingly linked to human cancer. However, the virus is typically a harmless, near-universal member of our skin flora. In less than 5 years since its discovery, new diagnostics are being used in patients with Merkel cell carcinoma, and promising molecular therapies are in development to treat this previously intractable cancer.

Beyond provision of the first clues that Merkel cell carcinoma might be caused by infection, quantitative epidemiology provided little help to establish causality between Merkel cell polyomavirus and Merkel cell carcinoma. Molecular studies carried this burden by convincingly showing that patients with Merkel cell polyomavirus and Merkel cell carcinoma undergo stepwise molecular changes that include loss of immune surveillance, clonal viral integration into the host genome, and mutation to the virus itself. 13 The possibility that cancers can arise from mutations to a commensal virus rather than the host cell could fundamentally change our searches for the causes of human cancers. However, not all Merkel cell carcinomas contain Merkel cell polyomavirus, and the virus is found in people without the cancer, which are violations of traditional notions for infectious causality. New probabilistic or Bayesian approaches to causality—taking into account molecular biology—are urgently needed to weigh up this information.

New technologies have uncovered dozens of previously unknown human polyomaviruses, papillomaviruses, and other agents, many of which are ubiquitous but could still hold the answers to long-sought causes for some chronic human diseases. 14 To avoid becoming irrelevant, modern epidemiology must develop approaches to adequately assess molecular biological information when establishing causality.

Beyond belief: on the gap between knowing and doing

John Snow recommended handwashing and personal hygiene for the prevention of cholera almost 160 years ago. 15 At about the same time, Ignaz Semmelweis showed that childbed fever could be prevented by hand disinfection. 16 Generations of parents have since attempted to instil handwashing habits into their children. 10 years ago, we reviewed the evidence and concluded that handwashing with soap could reduce the risk of diarrhoea by 47% and potentially save 1 million lives in developing countries. 17 The fact that handwashing can prevent disease is common knowledge; even in rural areas of developing countries with low literacy rates, more than 90% of those surveyed are aware of the importance of handwashing with soap. 18 However, actual practice remains low. In countries such as India, China, Ghana, Tanzania, Peru, and Kyrgyzstan, we have recorded rates of post-toilet handwashing with soap of below 20%. In UK motorway service station toilets, sensors showed that only 64% of women and 32% of men were washing their hands with soap. 19 Knowledge about handwashing has clearly not been translated into actual practice.

Most of the major health problems faced worldwide need changes in behaviour—whether it be to eat less, exercise more, sleep under a bednet, or practise safe sex. 20 Although more sophisticated approaches are gaining ground, most health campaigns are still based on the assumption that changes in knowledge will lead to the desired changes in behaviour. Realisation of the weaknesses of educational, belief-based approaches has led us to seek new solutions. By reviewing both recent behavioural science and the practice of marketing in multinational companies, we developed the Evo-Eco approach to behaviour change.

Behaviour is an evolved phenomenon that has been around for far longer than human beings and predates rational thought. Most human behaviour is a consequence of ancient reflexes and motives. 21 , 22 Much of what we do is not under conscious control but is motivated unconsciously, or done reactively in response to cues. 23 , 24 Brains direct bodies to produce behaviour that would have been adaptive in the physical, biological, and social environments of our ancestors, 25 either as a matter of routine; in response to cues, context-based rules and roles; or in response to opportunities that present themselves to meet evolutionarily important needs.

With this approach, we designed and tested a campaign to introduce handwashing with soap in villages in Andhra Pradesh, India. The campaign used emotionally affecting appeals to nurture, disgust, and status, and cues designed to change habit. Furthermore, it attempted to redefine mothers’ roles in their social settings as so-called SuperMums (SuperAmmas). The campaign avoided explicit health messaging and banned mention of disease, doctors, or diarrhoea. A cluster-randomised controlled trial showed handwashing with soap to be 19% higher in the SuperAmma intervention villages than in the control groups (from a baseline of almost complete absence), and evidence of a change in perceived norms around handwashing behaviour was noted.

If the Evo-Eco approach is now proving to be useful beyond hygiene for both public health and for market development in the private sector, this is because it addresses the many other emotional, habitual, and situational factors that affect behaviour beyond knowledge and belief.

Cholera: genetic sequencing, shame, and blame

David L Heymann

When a cholera outbreak led to a civil disturbance in Haiti 9 months after the 2010 Haitian earthquake, cholera once again took a place at the interface of health, water, and sanitation; international travel; and global politics. The first known cases of the outbreak were reported downriver from a UN Stabilization Mission base, and suspicion fell on the UN Stabilization Forces as the source of the infection. 26 When third-generation real-time DNA sequencing linked the strains of Vibrio cholerae circulating in Haiti to endemic strains in Nepal, and a member of the Swedish Diplomatic Service publicly announced this link in a Swedish newspaper, the skirmish in Haiti intensified and resulted in gunfire and death. 27

In November, 2011, the Institute for Justice and Democracy in Haiti filed a claim on behalf of 5000 Haitians who had recovered from cholera, demanding that the UN provide a national water and sanitation system, pay compensation for losses due to cholera, and make a public apology. 28 The UN, however, invoked its legal immunity and announced its unwillingness to compensate, basing its decision on the 1947 convention that grants the UN immunity for its actions. The UN also noted an independent report that had concluded that a series of events in Haiti, not only an importation, had led to the cholera outbreak, and that the genetic sequences of the organism are not unique to Nepal, but are also found in other parts of south Asia. 29 – 31

Genetic sequencing has become a powerful method in investigation of outbreaks, and it has confirmed understanding in the 19th century that linked global spread of cholera to trade routes, returning military forces, and migration. Genetic sequencing also confirms what John Snow recognised in 1848, when he linked the introduction of cholera in London to a seaman who had travelled to the city from Hamburg—that infectious diseases respect no borders. Snow’s investigation at that time led to his theory of contagion, and he concluded that wells and water pipes would have to be kept isolated from drains, cesspools, and sewers to stop transmission. 6 years later, after cholera returned to London, his careful fieldwork and two famous maps confirmed his theory, and led to measures to stop transmission of cholera. 15 , 32

Snow would certainly be surprised to learn that 150 years after he removed the pump handle in Soho, a cholera outbreak continues under the same unsanitary conditions he observed in 19th century London, in a world where safe water and sanitation should be within the reach of all people. Although he could not have imagined the power of 21st century genetic sequencing in identification of the probable source of the cholera outbreak in Haiti, he would certainly not have been surprised to learn that infections spread globally. He might have been disappointed that some turn this information to shame and blame. He rose above that shame and blame to create an environment that could interrupt the transmission of enteric pathogens. When will Haiti and its partners do the same?

Treatment of violence as an epidemic disease

Gary Slutkin

Throughout history, only infectious diseases and violence have killed up to tens of millions of people in epidemic form. However, in the past 200 years, we have made substantial progress in more successful management of infectious diseases, as a result of scientifically understanding the epidemiology, microbiology, and invisible forces of transmission. Yet, our understanding of violence remains stuck in thoughts of bad people and morality that we replaced a long time ago for infectious diseases by understanding their biological underpinnings.

A scientific view of violence reveals both population and individual characteristics that closely resemble infectious diseases. 33 Population characteristics include the tendency for event clusters, epidemic curves, and capacity for spread. The clustering seen in maps of killings in US cities resembles maps of cholera in Bangladesh. Historical graphs showing outbreaks of killing in Rwanda resembled graphs of cholera in Somalia. Spread of violence is seen in street retaliations, gang wars, UK riots, and the recent crisis in Syria.

At the individual level, exposure to violence through observing or witnessing of violence or direct victimisation leads to increased likelihoods of perpetuation of violence by the individual exposed. 34 This pattern has been shown for many types of violence, including child abuse, domestic violence, community violence, and suicide. Furthermore, transmission occurs across these forms of violence—eg, exposure to child abuse increases the likelihood of not only child abuse but also community violence, and vice versa. Exposure to war leads to a greater likelihood of subsequent performance of violence in one’s own community or family. Something is transmitted across a range of syndromes. Epidemiological characteristics of infectious diseases are also present for violence, including exposure, dose–response associations, variable susceptibilities, incubation periods, clinical syndromes, dormancy, and relapse.

The invisible processes that underlie transmission of violence are not completely known but seem to include mirror-like cortical circuits that mediate observational learning (imitation), and dopamine, pain-mediating, and other pathways that facilitate following and group behaviours. 33 The effects of trauma on the limbic system further accelerate the contagious process. The brain dysregulates in response to—and to cause—transmission of violence, similar to how the intestine dysregulates salt and water absorption, facilitating the transmission and further spread of cholera.

The epidemic control model for reduction of violence begins with basic epidemiological mapping, detection and interruption of potential events, cessation of spread through behaviour change, and modification of social expectations and norms. 35 Entirely new categories of disease control workers include violence interrupters, behaviour change agents, and others who are selected, trained, and supported for each of these functions in a unified system. This method, now referred to as Cure Violence, has undergone three independent assessments and has shown up to 100% reductions in retaliations in the settings of a killing, a statistical association between interruptions and drops in killings, and 34–73% reductions in shootings and killings. 36 , 37 This method helps to validate the theory, and offers a new way to reverse the age-old problem of violence, based on an epidemiological framework and biological understanding. Cure Violence is now working in 15 cities and seven countries.

John Snow and today’s world

Robert M May

Explicitly mathematical approaches to epidemiology date from Daniel Bernoulli’s evaluation, in 1760, of the efficacy of variolation against smallpox. However, most people acknowledge John Snow’s spot map analysis (itself effectively mathematical) of the cholera epidemic in 1854 as the birth of modern epidemiology.

Mathematical modellers (myself included) have, however, been rather slow to recognise the mathematically inconvenient fact that one cannot usually treat a population as homogeneous, with all people transmitting infection at a roughly average rate. In particular, in studies of gonorrhoea by Hethcote and Yorke, 38 and later studies of HIV/AIDS in its early days, investigators found it impossible to explain what was going on without acknowledging substantial heterogeneities in patterns of sexual-partner acquisition, and the consequent disproportionate effect of so-called superspreaders. For any given value of the basic reproductive number, R0, such high heterogeneity in infectiousness implies that the superspreaders are most likely to become infected, and also most likely to transmit infection. Thus, the epidemiologically relevant factor is not the average number of partners per person, but rather the mean-square number divided by the mean. 39 This fact has obvious implications for intervention: namely, to focus attention on the superspreaders. However, one further complication prevails. A detailed analysis depends not only on knowledge of the distribution of partner numbers, but also on the contact patterns: are they associative (those with many partners interacting mainly with similar people), disassociative (the opposite: highly active people associating mainly with those who have few partners), or merely random? In the case of HIV/AIDS, if the contact patterns are associative, the epidemic will develop more quickly, but will burn out more quickly, and fewer people are likely to become infected. By contrast, if the contact patterns are disassociative, the epidemic will develop more slowly, but more people will be infected in the long run.
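
The difference between the average number of partners and the mean-square number divided by the mean can be seen in a small numerical sketch. The population below is invented purely for illustration and the function name is hypothetical; neither comes from the studies cited.

```python
from statistics import mean


def effective_partner_number(partner_counts):
    """Mean-square number of partners divided by the mean number of partners."""
    average = mean(partner_counts)
    mean_square = mean([k * k for k in partner_counts])
    return mean_square / average


# Hypothetical population: 95 people with 1 partner each, 5 people with 40 each.
population = [1] * 95 + [40] * 5

print("average partners per person:", mean(population))
print("mean-square divided by mean:", effective_partner_number(population))
```

Because the mean-square divided by the mean equals the mean plus the variance divided by the mean, the two quantities coincide only when everyone has the same number of partners; in this invented population a handful of highly active people push the second figure far above the simple average.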

Not distracted by mathematical convenience, Snow explicitly recognised heterogeneity in transmission. In doing so, he was guided by facts: brewery workers who drank beer rather than water had lower infection rates, whereas the opposite was noted in the washerwomen who were at high risk from handling soiled linen and in the lady in West Hampstead who preferred water from the Broad Street pump. Paul Fine argues convincingly that “it was this recognition of heterogeneity which allowed him [Snow] to crack the problem”.

Notably, some of this work is finding applications in studies that might be called “stability and complexity in financial ecosystems”. 40 In the build-up to the recent financial crises, an increasingly elaborate set of financial devices emerged (especially derivatives), intended to optimise returns to individual institutions with seemingly little risk. In essence, no attention was paid to the possible effects on the stability of the system as a whole. An increasing amount of work draws analogies with the dynamics of ecological food webs and with networks within which infectious diseases spread. For the latter analogy, one can view the dodgy financial devices as newly emerging infectious agents. Indeed, the recent rise in financial assets and the subsequent crash have rather precisely the same shape as the typical rise and fall of cases in an outbreak of measles or other infection. Such curves also characterise past financial bubbles, such as tulip mania or the South Sea Bubble of the early 18th century.

One basic question, of course, is how to prevent a problem that arises in one bank from cascading through the entire banking system. Here, insights from medical epidemiology have been helpful, and indeed the word superspreaders is now used often. 41 Unfortunately, studies of the interconnections between big and little banks suggest that these networks are disassociative.

Another aspect of the current financial crises is the way in which low confidence can spread throughout the system, leading to liquidity hoarding (shortening or calling-in loans) and thus amplifying problems. 42 Here again, the analogy with medical epidemiology is clear. I find it interesting that the science of epidemiology is so far in advance of the allegedly so-clever banking system and its so-called quants.

In short, today’s society owes more than is often realised to the iconic Snow and other epidemiological pioneers.

From burden to action: trials of mental health interventions in resource-poor settings

Vikram Patel

The profile of the global burden of disease has changed profoundly since John Snow’s time. We now know that non-communicable diseases are the leading causes of death and disability, and that their proportionate contribution to this burden is rising inexorably in tandem with the epidemiological and demographical transitions in most countries. These very diverse health conditions include mental disorders, which are arguably the most neglected of all global health challenges. Depression is the leading mental health-related contributor to the burden of disease. A substantial amount of epidemiological evidence testifies to the high frequency of this disease (about 5% prevalence in the general population) 43 and its strong, bidirectional association with social disadvantage. 44 Equally well documented is the effect that this disorder has on functioning (eg, depression is at the top of the list of disorders ranked according to years lived with disability) 45 and on other global health priorities (eg, about a quarter of the burden of child undernutrition in developing countries is attributable to maternal depression). 46 On the positive side, a strong evidence base now exists in support of the efficacy of structured psychological treatments and antidepressants for the management of depression. 47 Despite this compelling evidence, however, most people with depression worldwide do not optimally benefit from these treatments. This challenge is being addressed by trials of complex mental health interventions in routine care settings.

The MANAS project sought to improve the clinical and social outcomes of people with depression and anxiety disorders (the so-called common mental disorders) in primary care in India. The intervention and trial design had to address two formidable barriers: how to detect cases in the absence of a biomarker-based diagnostic test, and how to deliver the interventions in the absence of specialist skills in primary care personnel. The intervention addressed these barriers with a brief screening questionnaire, which was previously validated against a structured diagnostic interview, to detect cases; and a task-sharing model of care with a lay counsellor, recruited from the local community, who delivered the psychosocial components of the intervention (eg, psychoeducation, case management, and interpersonal therapy) in collaboration with the primary care doctor and under the supervision of a mental health specialist. Systematic efforts based on the Medical Research Council framework 48 were made to design the intervention so that it was both acceptable to key stakeholders in the health system and feasible for delivery in the context of an absence of formal training in mental health care. For example, the lay counsellors actively addressed the social difficulties experienced by many patients and used local, rather than biomedical, labels and concepts. The intervention was assessed in a cluster randomised controlled trial in public and private primary care facilities. The results showed that in the public primary care facilities, compared with enhanced usual care (in which the primary care doctors received the results of the screening and treatment guidelines), the prevalence of common mental disorders decreased by 30% (risk ratio 0·70, 95% CI 0·53–0·92) and the prevalence of suicide attempt or plans over 12 months decreased by 36% (0·64, 0·42–0·98). 49 Despite the additional resources needed, the approach was dominant from an economic perspective. 50 In the private sector, the enhanced usual care facilities showed equivalent outcomes to the intervention facility.
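
The percentage decreases quoted for the public facilities follow directly from the reported risk ratios, as the short sketch below illustrates. The function name is hypothetical; the two risk ratios are the point estimates given in the text.

```python
def relative_reduction_percent(risk_ratio):
    """Convert a risk ratio into a percentage reduction relative to the comparison group."""
    return (1.0 - risk_ratio) * 100.0


# Point estimates reported for the public primary care facilities.
reported = [
    ("common mental disorders", 0.70),
    ("suicide attempt or plans", 0.64),
]

for outcome, risk_ratio in reported:
    reduction = relative_reduction_percent(risk_ratio)
    print(f"{outcome}: about {reduction:.0f}% lower than enhanced usual care")
```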

Trials of innovations to improve access to mental health care in developing countries are now having an effect both on the research agenda of global mental health (eg, task-sharing is one of the central themes of the Grand Challenges in Global Mental Health 51 ) and on national health policies (eg, the new District Mental Health Program of the Indian Government includes a new cadre of non-specialist mental health workers to deliver care at primary health care centres).

John Snow and implementation of cost-effective interventions

Ian Roberts

John Snow got the handle taken off the pump. He did not estimate the burden of disease due to cholera, insist that cholera was made a public health priority, or lobby for more funding for cholera research. Rather, he “respectfully requested an interview” with the Board of Governors of St James Parish, who, on hearing his appraisal of the aetiological factors, ordered that the handle be removed from the Broad Street pump.

For maximisation of health with the resources available, the important problems are the ones that we can do something about—ie, the ones for which we have cost-effective interventions. Removal of the handle from the pump was a highly cost-effective public health intervention. In 2010, the CRASH-2 trial 52 showed that an inexpensive drug called tranexamic acid safely reduces mortality in bleeding trauma patients. When given within 3 h of injury, tranexamic acid reduces the risk of bleeding to death by 30%. Treatment is highly cost effective. 53 Indeed, tranexamic acid is one of the cheapest existing ways to save a life. With whom does one “respectfully request an interview” to use this information to improve health?

For new medical knowledge to improve health, health workers and patients need to hear about it, the relevant treatment must be available, and it needs to be used appropriately. The usual method of communication of knowledge is by publication in a medical journal. The CRASH-2 results were published in The Lancet in 2010 and 2011. 52 , 54 On the day of publication, a press conference was held and the results were covered by newspapers around the world.

However, new results are news for only a day. Even with extensive coverage, no more than a small proportion of the doctors who treat trauma worldwide will hear about it. Dissemination begins the next day, and the marketing power of the multinational drug industry is one of the most effective machines to change medical hearts and minds. Here, we encounter an obstacle. Although the knowledge that tranexamic acid can save tens of thousands of lives is new, the drug was invented in the 1960s and the profits from selling a short course of a generic drug for a problem that mostly affects poor people leave many product managers underwhelmed. 3 years after publication, the investigators and road traffic injury victim groups are still lobbying government, regulators, and drug firms to license tranexamic acid for use in trauma.

Military medics moved quickly to include tranexamic acid in combat-care protocols. Military deaths are highly politically sensitive and when army medical chiefs say a drug should be used, it really is used. Early use of tranexamic acid by the military influenced civilian trauma care. The UK National Health Service (NHS) also embraced tranexamic acid use, and the proportion of trauma patients who received tranexamic acid is now being audited and used as a criterion for the reimbursement of trauma units in the NHS. Tranexamic acid was also included on the WHO list of essential medicines, although WHO has little capacity to ensure that bleeding trauma patients actually receive tranexamic acid. Sadly, some health professionals could be an impediment to the implementation of cost-effective treatments. The misguided view that disease burden should control which health-care activities are prioritised, rather than a comparison of costs and effects of different interventions, could cause substantial avoidable human suffering. 55 What the world does not have is the policy equivalent of the Board of Governors of St James Parish—an organisation to ensure that cost-effective interventions are implemented wherever patients can benefit from them.

Epidemiology, crime, and criminality

Richard Wortley

A fundamental distinction can be drawn between explanations of crime and explanations of criminality. Most criminological research and theory, including that from an epidemiological perspective, have focused on the second of these issues. Researchers have sought to identify historical factors (perinatal trauma, parenting and disciplinary style, child abuse and neglect, economic deprivation, adverse schooling experiences, association with antisocial peers, and so on) that affect why some individuals or groups are at an increased risk of developing criminal dispositions. 56 The primary prevention of crime is conceptualised in terms of changing the developmental antecedents judged to have created the antisocial attitudes and personality characteristics that define the criminal offender.

However, criminality does not necessarily predict crime: people with criminal dispositions do not commit crime all the time, and crime is often committed by people who do not possess criminal dispositions. By contrast with most criminological approaches, crime science is concerned with why, when, where, and how crime occurs. 57 Crime is not a random event but clusters around criminogenic environments; researchers in this field seek to uncover the proximal, or situational, factors that account for the patterned distribution of crime in time and space. Primary prevention of crime might be achieved by changing the aspects of the immediate environment that facilitate or encourage crime to occur at that particular time and place—a practice known as situational crime prevention. 58 Pub violence, for example, peaks at particular times of the day and on specific days of the week, and is concentrated in a small number of establishments. 59 Substantial reductions in pub violence at targeted locations can be achieved with strategies such as reduction of overcrowding, enforcement of server intervention, improved training for bouncers, staggered closing times, and introduction of shatterproof glasses.

John Snow is regarded as a seminal figure in the development of the crime science approach to crime prevention. Fundamentally, Snow’s commitment to collection of data and testing of hypotheses is the basis for the problem-solving, evidence-based method that defines crime science. More specifically, Snow pioneered the concept of geographical hotspot analysis. Just as Snow mapped the distribution of cholera cases around the infamous Broad Street pump, so too crime scientists map the distribution of crime around so-called environmental crime generators. The disabling of the pump is analogous to the situational crime prevention strategies advocated by crime scientists.

Opportunities and challenges of trials in educational research

Carole Torgerson

The most robust research design to establish effectiveness is widely accepted to be the randomised controlled trial. 60 Although the randomised controlled trial is widely used in health-care research, its first use in the last century was in the area of education. In 1931, Walters 61 randomly allocated students in a university setting to a mentoring programme or a control situation and then measured academic outcomes. Later, in 1940, Lindquist 62 described how the natural unit of allocation in school-based research was the class or school, rather than the individual child. Furthermore, he described the appropriate statistical approach for analysis of clustered data, which was not used widely in health-care cluster trials until the early 1990s.

Immense opportunities exist for rigorous educational randomised controlled trials to be undertaken, and design and implementation of a trial in education is, in theory, quite straightforward. For example, potential schools can be readily identified to take part in the trial, and children are generally registered with a high degree of stability within the schools; data for every child are collected regularly and comprehensively, which enables interventions to be targeted carefully at those for whom the greatest effect is anticipated. Because pretests are strongly predictive of post-tests, many educational randomised controlled trials can use this predictive value to ensure that the trial has good statistical power to record important educational differences in outcome between randomised groups. Teachers and schools assess children routinely, and children themselves are accustomed to completing tests and assessments. This situation enables us to measure any treatment effects easily.

Nevertheless, substantial challenges exist for trials undertaken in education. The design and execution of a randomised controlled trial needs skilled researchers. Unfortunately, especially in the UK, there is a dearth of experienced trial methodologists in education and inadequate capacity to undertake such trials. Consequently, a substantial proportion of published educational randomised controlled trials have flaws in their conduct, design, or analysis, which leads to uncertainty about their conclusions. 63 Simple but common errors include: failure to have a sufficiently large sample size; failure to use independent randomisation; failure to do an intention-to-treat analysis; failure to undertake blinded testing and marking; failure to prespecify the main outcome; and failure to account for clustering in the analysis. Good trial practice, as recommended by CONSORT and other groups, should be followed in educational trials. 60 Educational trials should be registered and reported according to modified CONSORT criteria.
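Two of the errors listed above, inadequate sample size and failure to account for clustering, are linked through the design effect. The sketch below uses the standard approximation DEFF = 1 + (m − 1) × ICC for clusters of equal size m. It is an illustration rather than a method described in the source, and the class size, intracluster correlation, and function names are assumed for the example.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomising whole classes or schools.

    cluster_size : pupils per cluster (assumed equal across clusters)
    icc          : intracluster correlation coefficient (assumed value)
    """
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent pupils that n_total clustered pupils are worth."""
    return n_total / design_effect(cluster_size, icc)


# Illustrative values: 1000 pupils in classes of 25 with an ICC of 0.1.
deff = design_effect(25, 0.1)
n_eff = effective_sample_size(1000, 25, 0.1)
print(round(deff, 2), round(n_eff))
```

With these assumed values, each pupil carries less than a third of the information of an independently randomised pupil, so a trial analysed as if pupils were independent would overstate its precision.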

A major opportunity exists to undertake randomised controlled trials in education in the existing political climate. Real public investment in rigorous educational randomised controlled trials began in earnest in the USA in 2002, and in the UK the Education Endowment Foundation has recently begun a programme of assessment of interventions targeting disadvantaged children with randomised controlled trial designs. However, this wave of enthusiasm must not be blighted by poorly designed and executed randomised controlled trials.

Epidemiology, randomised controlled trials, and the search for what works in economic development

Angus Deaton.

Epidemiological methods are having a large and largely unhelpful effect on statistical practice in economics. In the early 1990s, several important natural experiment studies were done in economics. Snow’s work, especially as described by David Freedman, 64 was often explicitly acknowledged, and admired. In one study, investigators looked at the effects of increasing the minimum wage on employment by comparing fast food restaurants in New Jersey and Pennsylvania, USA. 65 Another study used the Vietnam War draft lottery to compare the subsequent earnings of those whose random draw put them at a higher or lower risk of being drafted. 66 In several studies, administrative discontinuities in education were used to estimate the effects of schooling on earnings. These reports were admired for their credible identification, by contrast with much previous work that had rested on challengeable assumptions. The movement paralleled the increasing use of non-parametric methods in applied statistics.

A requirement of these methods is that the natural experiment must mimic random allocation. Snow’s assignment of water suppliers should not be a disguise for income or locational differences, which he understood well and documented effectively. In economics, the credibility of the natural experiments has worn thin with time. New Jersey is different from Pennsylvania in many ways. Conditional on a bad draw in the Vietnam lottery, the selection of those who actually went to war was systematic, not random. Parents work around educational discontinuities, and wealthier parents do so more successfully. Investigators tired of endless challenge to their natural experiments, and moved towards real experiments—randomised controlled trials. This was especially the case in development economics, where randomised controlled trials were regarded as the way to discover what works in economic development—an endeavour that held the promise of abolishing poverty worldwide. 67

A well designed experiment is sometimes exactly what we need. However, experiments have their own problems. Many studies are underpowered, and when the underlying treatment effects are asymmetrically distributed—as is often the case when outcomes are financial—standard statistical methods are misleading, and we get contradictory and often implausible results, which are seemingly explained by what might be called just-so stories. 68 Experimental samples are rarely randomly drawn from the population that would be treated by the hypothetical policy, and an unbiased, but noisy, estimate from a randomised controlled trial of a selected small sample can be less useful than a biased but precise estimate from an observational study of a larger and more representative sample. Average treatment effects for one group might not apply to another group, or even to subgroups or individuals within the experiment. Scaling up to the population will often bring general equilibrium or feedback effects that are shut off in the randomised controlled trial, even if the scaling up can be done in a way that is faithful to the experiment. Most seriously, the result from a randomised controlled trial is entirely silent about the mechanisms at work. Economics is concerned with the discovery and testing of mechanisms; without them, we have no chance to assess out-of-sample validity, to predict what might happen under scaling up, or indeed to learn.
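The claim that an unbiased but noisy estimate can be less useful than a biased but precise one can be made concrete through mean squared error, which is the sum of squared bias and variance. The sketch below is illustrative only: it assumes a difference in means between two equal-sized groups with a common outcome standard deviation, and the sample sizes and the size of the bias are invented for the example rather than drawn from any study.

```python
import math


def rmse(bias, sd, n_total):
    """Root mean squared error of a difference in means.

    bias    : systematic error of the estimator (0 for an ideal trial)
    sd      : outcome standard deviation, assumed equal in both groups
    n_total : total sample size, split equally between the two groups
    """
    variance = 4 * sd ** 2 / n_total  # variance of the difference in means
    return math.sqrt(bias ** 2 + variance)


# Hypothetical comparison: a small unbiased trial versus a large
# observational study carrying a modest bias.
small_trial = rmse(bias=0.0, sd=1.0, n_total=100)
large_observational = rmse(bias=0.1, sd=1.0, n_total=10_000)
print(round(small_trial, 3), round(large_observational, 3))
```

In this invented scenario the observational estimate has the smaller total error. The comparison reverses if the bias is large, and in practice the bias is rarely known, which is part of why the choice between designs remains contested.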

Conflicts of interest

We declare that we have no conflicts of interest.
