ignoring modest differences on pretest scores is like considering polar bonds as covalent

Categories: Comparisons

An example of an analogy drawing upon a scientific concept (intended for an audience already familiar with the science):

"A common technique used in quasi-experiments (that is, where researchers work with existing groupings and do not try to randomise learners to treatments) is to look to demonstrate that classes are equivalent before the experimental intervention. The argument is that if we have good grounds to consider two (or more) groups are equivalent before they each experience one of two (or more) different treatments, then any differences in outcomes between the groups after treatment will be due to the differences between the treatments (assuming everything else pertinent can be controlled). If we are convinced that we can consider two groups equivalent, then failure to randomise to treatment seems less problematic. So, in such situations, it is common to use a form of pre-test. …

In practice, many studies rely on testing for a statistically significant difference, although this is a very dubious criterion for equivalence… This becomes clear if we consider how inferential tests are interpreted when comparing the final outcome measures in a study. At the end of a true experiment, statistical tests may be used to infer that a difference in final outcomes was unlikely enough that we can confidently assume it is not due to random variations but is due to a systematic difference (i.e., the difference between the experimental treatment and the control condition), and so can be assumed to (probably) apply more generally in the population and not just to this specific sample of learners. So, a statistically significant difference means a very unlikely one (in practice, normally one with a probability value, p < 0.05).

Often a similar approach is used in studies to evaluate the differences found at pre-test. The results are analysed to see if there is a very unlikely difference between the scores in the different conditions. If a statistically significant difference is found, then this clearly suggests the groups cannot be considered equivalent. That is reasonable.
Unfortunately, the reverse does not apply: if the differences do not reach significance, we cannot assume that this implies equivalence. Say p = 0.08 (which means the differences were unlikely enough to occur by chance only about once in twelve times, rather than once in twenty times as when p = 0.05): this still indicates a difference that was fairly unlikely to be down to random factors.
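
The point in the preceding paragraph can be illustrated with a short sketch. The pre-test scores below are invented for illustration, and a simple normal-approximation (z) test is used for brevity (for real samples this small a t-test would be more appropriate):

```python
from statistics import NormalDist, mean, stdev

def pretest_z_p(group_a, group_b):
    """Two-sided p-value for the difference in group means.

    Normal approximation for simplicity; treat as a sketch only --
    a t-test is more appropriate for small samples.
    """
    se = (stdev(group_a) ** 2 / len(group_a)
          + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical pre-test scores for two intact classes (invented data):
class_a = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14]
class_b = [13, 16, 15, 14, 17, 16, 15, 14, 16, 15]

p = pretest_z_p(class_a, class_b)
```

Here every learner in the second class happens to score one mark higher than the corresponding learner in the first, and p comes out at roughly 0.06: not "statistically significant" at the 0.05 level, yet hardly convincing evidence that the two classes are equivalent.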

There is a logical difference between what we are seeking to do in these two situations. In one case (comparing post-test results) we are trying to exclude all but those outcomes that are most unlikely to be chance events, and in the other (comparing pre-test scores) we are trying to show that any difference is small enough to be insignificant in affecting later outcomes. So, in the first case we are trying to show something is very improbable, but in the other case we are trying to show we have a very probable outcome. So, using the same kind of inferential test as a test of equivalence means (sensibly) excluding cases with very different pre-test outcomes across treatments from being labelled equivalent: but still (dubiously) admitting other substantially different pre-test outcomes across treatments as being equivalent.

If this seems a little abstract, consider this analogy. Consider Table 2.2 which presents two questions that might be posed to a learner, and her hypothetical responses:

Table 2.2, from 'Chemical Pedagogy: Instructional Approaches and Teaching Techniques in Chemistry'

The two questions are looking at two different ends of a spread (that is, a continuum from combinations of elements with very different electronegativities to combinations of elements with the same electronegativity), and a suitable criterion that works for one extreme cannot simply be reversed for use at the other extreme (which would be like saying anyone who is not over 2 m tall should be considered short). That is, if we agree that an electronegativity difference of >2.5 is a good criterion to identify highly ionic compounds, then it is inappropriate to use the same cut-off as the basis for a criterion (<2.5) to identify highly covalent (i.e., non-polar) compounds. Perhaps we should instead look for an electronegativity difference of <1.0, or <0.5? The precise choices are open to opinion (you might actually suggest >3.0 for the most ionic compounds): but the invalidity of using the same cut-off to identify both sets of extreme cases is not. Equally, if we decide the most covalent compounds are those where the electronegativity difference is <0.7, we should not then class anything where the difference is >0.7 as ionic.
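
The flaw in reusing a single cut-off for both extremes can be made concrete in code. The electronegativity values below are approximate Pauling values, and the cut-offs (>2.5, <0.7) are simply the illustrative choices discussed above, not authoritative criteria:

```python
# Approximate Pauling electronegativity values (for illustration only):
EN = {"H": 2.20, "C": 2.55, "Na": 0.93, "Cl": 3.16, "K": 0.82, "F": 3.98}

def delta_en(x, y):
    return abs(EN[x] - EN[y])

def naive_classify(d):
    # Flawed rule: the single 2.5 cut-off is simply reversed to cover
    # both extremes, so everything not 'highly ionic' counts as covalent.
    return "highly ionic" if d > 2.5 else "highly covalent"

def two_criteria_classify(d, ionic_cut=2.5, covalent_cut=0.7):
    # Separate criteria for the two ends of the continuum (the precise
    # cut-off values are illustrative choices, open to argument).
    if d > ionic_cut:
        return "highly ionic"
    if d < covalent_cut:
        return "highly covalent"
    return "intermediate (polar covalent to ionic)"
```

Sodium chloride has an electronegativity difference of about 2.2, so the naive rule labels rock salt 'highly covalent', while the two-criteria rule places it in the intermediate band; KF and C–H bonds land at the two extremes under either scheme.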
By the same logic, whilst it makes sense to exclude pre-test differences which reach statistical significance from being considered equivalent (like excluding KF from our list of highly covalent compounds), that is not sufficient to judge equivalence, and something more is needed. One rule of thumb that has been suggested is that, rather than using p≥0.05 as the critical value here, it should be p≥0.5 (i.e., only admitting as equivalent groups where the pre-test differences are more likely than not to have occurred by chance), but there are more sophisticated approaches…"
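
The decision logic in the quoted passage, with a significant difference ruling equivalence out but only a much larger p-value counting towards it, might be sketched as follows (the thresholds are the ones discussed above, and this is a rule-of-thumb sketch rather than a proper equivalence test, for which more sophisticated procedures such as two one-sided tests exist):

```python
def judge_pretest(p, alpha=0.05, equivalence_threshold=0.5):
    """Classify a pre-test comparison using the rule of thumb from the text.

    alpha and the p >= 0.5 threshold are the illustrative values discussed
    above; a sketch only, not a substitute for formal equivalence testing.
    """
    if p < alpha:
        return "significantly different: not equivalent"
    if p >= equivalence_threshold:
        return "difference plausibly just chance: arguably equivalent"
    return "inconclusive: equivalence not demonstrated"
```

On this scheme a pre-test comparison with p = 0.08 is neither a significant difference nor evidence of equivalence; it is simply inconclusive.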

Read about analogy in science

Read examples of scientific analogies

Author: Keith

Former school and college science teacher, teacher educator, research supervisor, and research methods lecturer. Emeritus Professor of Science Education at the University of Cambridge.