Tag Archives: Statistics

Analysis of variance (ANOVA)

Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such as “variation” among and between groups) used to analyze the differences among group means, developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. ANOVA is conceptually similar to running multiple two-sample t-tests, but a single ANOVA keeps the overall type I error rate at the chosen significance level, whereas repeated pairwise t-tests inflate it. It is therefore suited to a wide range of practical problems.
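The partitioning of variance described here can be illustrated with a minimal sketch of a one-way ANOVA F statistic. The function name `one_way_anova_f` and the three small groups are made up for illustration; the sketch only computes the F ratio and its degrees of freedom, and does not compute a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one inner list per group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three groups of three observations each.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the variation between group means is large relative to the variation within groups. Turning F into a p-value requires the F distribution, which a statistics library would normally supply.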


Latin square

The “Gamma plus two” method for generating “odd order” magic squares, the “Gamma plus two plus swap” method for generating “singly even order” magic squares, and Durer’s method for generating “doubly even order” magic squares.
By Professor Edward Brumgnach, P.E.
City University of New York
Queensborough Community College

Boy girl paradox

Published on Mar 8, 2016

TED-Ed presented a riddle last week based on a classic probability problem. However, the riddle contains a small and seemingly insignificant detail that changes the calculation. In this video I present the pertinent details of the frog riddle, explain its connection to the boy or girl paradox, and then do a detailed calculation of what I believe is the correct probability.

TED-ED frog riddle: https://www.youtube.com/watch?v=cpwSG…

Blog post (another calculation if the probability a male frog croaks is p): http://wp.me/p6aMk-4wD

Ron Niles made a video that shows the probability visually and explains an interpretation of a male frog croaking with probability p: https://www.youtube.com/watch?v=K53P5…
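For readers unfamiliar with the classic boy or girl paradox that the riddle builds on, here is a minimal sketch that enumerates the four equally likely two-child families. It assumes each child is independently a boy or a girl with probability 1/2. It does not reproduce the frog calculation from the video, which additionally depends on the croaking probability p.

```python
from fractions import Fraction
from itertools import product

# All (older, younger) combinations, assumed equally likely.
families = list(product("BG", repeat=2))

def conditional_probability(event, condition):
    """P(event | condition) by counting equally likely outcomes."""
    matching = [f for f in families if condition(f)]
    favourable = [f for f in matching if event(f)]
    return Fraction(len(favourable), len(matching))

def both_boys(family):
    return family == ("B", "B")

# "At least one child is a boy" leaves BB, BG and GB.
p_given_at_least_one = conditional_probability(
    both_boys, lambda f: "B" in f
)
# "The older child is a boy" leaves only BB and BG.
p_given_older = conditional_probability(
    both_boys, lambda f: f[0] == "B"
)

print(p_given_at_least_one, p_given_older)
```

The two answers differ because the two conditions rule out different families, which is the same kind of seemingly insignificant detail that changes the calculation in the frog riddle.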

Hill’s criteria for causation

The Bradford Hill criteria, otherwise known as Hill’s criteria for causation, are a group of nine considerations used to judge whether there is adequate evidence of a causal relationship between a presumed cause and an observed effect. They were established by the English epidemiologist Sir Austin Bradford Hill (1897–1991) in 1965.

The list of the criteria is as follows:

  1. Strength (effect size): A small association does not mean that there is not a causal effect, though the larger the association, the more likely that it is causal.[1]
  2. Consistency (reproducibility): Consistent findings observed by different persons in different places with different samples strengthen the likelihood of an effect.[1]
  3. Specificity: Causation is likely if there is a very specific population at a specific site and disease with no other likely explanation. The more specific an association between a factor and an effect is, the bigger the probability of a causal relationship.[1]
  4. Temporality: The effect has to occur after the cause (and if there is an expected delay between the cause and expected effect, then the effect must occur after that delay).[1]
  5. Biological gradient: Greater exposure should generally lead to greater incidence of the effect. However, in some cases, the mere presence of the factor can trigger the effect. In other cases, an inverse proportion is observed: greater exposure leads to lower incidence.[1]
  6. Plausibility: A plausible mechanism between cause and effect is helpful (but Hill noted that knowledge of the mechanism is limited by current knowledge).[1]
  7. Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect. However, Hill noted that “… lack of such [laboratory] evidence cannot nullify the epidemiological effect on associations”.[1]
  8. Experiment: “Occasionally it is possible to appeal to experimental evidence”.[1]
  9. Analogy: The effect of similar factors may be considered.[1]

Debate in modern epidemiology

Bradford Hill’s criteria are still widely accepted in the modern era as a logical structure for investigating and defining causality in epidemiological study. However, their method of application is debated. Some proposed options include:

  1. using a counterfactual consideration as the basis for applying each criterion.[2]
  2. subdividing them into three categories: direct, mechanistic and parallel evidence, expected to complement each other. This operational reformulation of the criteria has been proposed in the context of evidence-based medicine.[3]
  3. considering confounding factors and bias.[4]
  4. using Hill’s criteria as a guide but not considering them to give definitive conclusions.[5]
  5. separating causal association and interventions, because interventions in public health are more complex than can be evaluated by use of Hill’s criteria.[6]

Arguments against the use of Bradford Hill criteria as exclusive considerations in proving causality also exist. Some argue that the basic mechanism of proving causality is not in applying specific criteria—whether those of Bradford Hill or counterfactual argument—but in scientific common sense deduction.[7] Others also argue that the specific study from which data has been produced is important, and while the Bradford-Hill criteria may be applied to test causality in these scenarios, the study type may rule out deducing or inducing causality, and the criteria are only of use in inferring the best explanation of this data.[8]

Debate over the scope of application of the criteria includes whether they can be applied to social sciences.[9] The argument proposed in this line of thought is that when considering the motives behind defining causality, the Bradford Hill criteria are important to apply to complex systems such as health sciences because they are useful in prediction models where a consequence is sought; explanation models as to why causation occurred are deduced less easily from Bradford Hill criteria as the instigation of causation, rather than the consequence, is needed for these models.

Researchers have applied Hill’s criteria for causality in examining the evidence in several areas of epidemiology, including connections between ultraviolet B radiation, vitamin D and cancer,[10][11] vitamin D and pregnancy and neonatal outcomes,[12] alcohol and cardiovascular disease outcomes,[13] infections and risk of stroke,[14] nutrition and biomarkers related to disease outcomes,[15] and sugar-sweetened beverage consumption and the prevalence of obesity and obesity-related diseases.[16] Referenced papers can be read to see how Hill’s criteria have been applied.

The Will Rogers phenomenon

The Will Rogers phenomenon is obtained when moving an element from one set to another set raises the average values of both sets. It is based on the following quote, attributed (perhaps incorrectly)[1] to comedian Will Rogers:

When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.

The effect will occur when both of these conditions are met:

  • The element being moved is below average for its current set. Removing it will, by definition, raise the average of the remaining elements.
  • The element being moved is above the current average of the set it is entering. Adding it to the new set will, by definition, raise the average.
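The two conditions can be checked with a small numeric sketch. The sets and the helper `move_element` below are illustrative and are not taken from the quoted text.

```python
from statistics import mean

def move_element(source, target, value):
    """Return copies of both sets after moving one value across."""
    new_source = list(source)
    new_source.remove(value)  # drops the first matching element
    new_target = list(target) + [value]
    return new_source, new_target

set_a = [5, 6, 7, 8, 9]  # mean 7
set_b = [1, 2, 3, 4]     # mean 2.5
moved = 5                # below the mean of A, above the mean of B

new_a, new_b = move_element(set_a, set_b, moved)

print(mean(set_a), "->", mean(new_a))
print(mean(set_b), "->", mean(new_b))
```

Here 5 is the weakest member of the first set but stronger than every member of the second, so both averages rise even though no individual value has changed.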

Forest plot

A forest plot (or blobbogram[1]) is a graphical display designed to illustrate the relative strength of treatment effects in multiple quantitative scientific studies addressing the same question. It was developed for use in medical research as a means of graphically representing a meta-analysis of the results of randomized controlled trials. In the last twenty years, similar meta-analytical techniques have been applied in observational studies (e.g. environmental epidemiology) and forest plots are often used in presenting the results of such studies also.

Although forest plots can take several forms, they are commonly presented with two columns. The left-hand column lists the names of the studies (frequently randomized controlled trials or epidemiological studies), commonly in chronological order from the top downwards. The right-hand column is a plot of the measure of effect (e.g. an odds ratio) for each of these studies (often represented by a square) incorporating confidence intervals represented by horizontal lines. The graph may be plotted on a natural logarithmic scale when using odds ratios or other ratio-based effect measures, so that the confidence intervals are symmetrical about the means from each study and to ensure undue emphasis is not given to odds ratios greater than 1 when compared to those less than 1. The area of each square is proportional to the study’s weight in the meta-analysis. The overall meta-analysed measure of effect is often represented on the plot as a dashed vertical line. This meta-analysed measure of effect is commonly plotted as a diamond, the lateral points of which indicate confidence intervals for this estimate.

A vertical line representing no effect is also plotted. If the confidence interval for an individual study overlaps this line, that study’s effect size cannot be said to differ from no effect at the given level of confidence. The same applies to the meta-analysed measure of effect: if the points of the diamond overlap the line of no effect, the overall meta-analysed result cannot be said to differ from no effect at the given level of confidence.
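The quantities behind each row of a forest plot can be sketched as follows. This is a minimal illustration under stated assumptions: the 2×2 counts are invented, the 95% intervals use the usual normal approximation on the log odds ratio, and the pooled estimate uses fixed-effect inverse-variance weighting, which is only one of several ways a meta-analysis may weight studies. All function names are hypothetical.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table.

    a, b = events / non-events in the treatment group
    c, d = events / non-events in the control group
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def confidence_interval(estimate, se, z=1.96):
    """Approximate 95% interval, symmetric on the log scale."""
    return estimate - z * se, estimate + z * se

def pooled_fixed_effect(estimates):
    """Inverse-variance pooled estimate from (estimate, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * e for w, (e, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total), weights

# Invented counts for three hypothetical studies.
studies = {
    "Study A": (10, 90, 20, 80),
    "Study B": (15, 85, 25, 75),
    "Study C": (30, 170, 35, 165),
}

estimates = [log_odds_ratio(*counts) for counts in studies.values()]
pooled, pooled_se, weights = pooled_fixed_effect(estimates)

rows = list(zip(studies, estimates, weights))
rows.append(("Pooled", (pooled, pooled_se), sum(weights)))

for name, (est, se), weight in rows:
    low, high = confidence_interval(est, se)
    # On the log scale, "no effect" (odds ratio 1) sits at 0.
    crosses = low <= 0 <= high
    print(
        f"{name:8s} OR={math.exp(est):.2f} "
        f"CI=({math.exp(low):.2f}, {math.exp(high):.2f}) "
        f"weight={weight / sum(weights):.0%} "
        f"overlaps no-effect line: {crosses}"
    )
```

In a drawn plot, each study's weight would set the area of its square, and the pooled row would be drawn as the diamond whose lateral points are the ends of its confidence interval.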

Forest plots date back to at least the 1970s. One plot is shown in a 1985 book about meta-analysis.[2]:252 The first use in print of the word “forest plot” may be in an abstract for a poster at the Pittsburgh (USA) meeting of the Society for Clinical Trials in May 1996.[3] An informative investigation on the origin of the notion “forest plot” was published in 2001.[4] The name refers to the forest of lines produced. In September 1990, Richard Peto joked that the plot was named after a breast cancer researcher called Pat Forrest and as a result the name has sometimes been spelled “forrest plot