a term used in experimental studies
When undertaking experimental studies to compare the effects of different pedagogies, curricula or learning resources, it is useful to know that the learners in the different conditions are starting from an 'equivalent' level of knowledge and understanding (or skill or attitude). In published papers, a lack of statistically significant differences in pre-testing is often considered sufficient, but this is weak evidence of equivalence: a non-significant result shows only that a difference was not detected, not that the groups are closely similar.
"Although statistical tests can offer some guidance on what counts as equivalent, they need to be interpreted differently than when looking for a statistically significance difference in the outcomes of the experiment…. An initial difference which is substantial, but statistically non-significant, may be sufficient to explain outcome differences that do reach statistical significance… If statistical tests are applied to the starting conditions using the usual p < 0.05 criterion then they will only flag up differences between the two groups which are very unlikely to be due to chance. However, what should be looked for is evidence of close similarity, rather than the absence of evidence of improbable differences…. Two classes with differences between them that are at a level quite unlikely to occur by chance are certainly not equivalent (at least in the sense that the word is generally employed)." (Taber, 2019, pp.86-87)
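The point about a difference that is substantial yet statistically non-significant can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from Taber (2019): the pre-test scores for the two classes are invented, the function names `cohens_d` and `pooled_t_statistic` are hypothetical, and the comparison uses a conventional pooled-variance t statistic checked against the tabulated two-tailed critical value for 14 degrees of freedom.

```python
import statistics

def pooled_variance(a, b):
    """Pooled sample variance of two independent groups."""
    na, nb = len(a), len(b)
    return ((na - 1) * statistics.variance(a)
            + (nb - 1) * statistics.variance(b)) / (na + nb - 2)

def cohens_d(a, b):
    """Standardised mean difference (difference in pooled-SD units)."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_variance(a, b) ** 0.5

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal variances."""
    standard_error = (pooled_variance(a, b) * (1 / len(a) + 1 / len(b))) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Invented pre-test scores for two classes of eight learners each.
class_a = [52, 61, 47, 58, 66, 55, 49, 63]
class_b = [45, 57, 41, 60, 50, 48, 53, 44]

# Two-tailed critical value of t for p < 0.05 with 8 + 8 - 2 = 14 degrees
# of freedom, taken from standard tables (valid for this sample size only).
T_CRIT_DF14 = 2.145

d = cohens_d(class_a, class_b)
t = pooled_t_statistic(class_a, class_b)
significant = abs(t) > T_CRIT_DF14

print(f"Cohen's d = {d:.2f}, t = {t:.2f}, significant at 0.05: {significant}")
```

With these invented scores the classes differ by about one pooled standard deviation, yet the t statistic falls short of the critical value, so a pre-test screened only with p < 0.05 would not flag the difference. Reporting the size of the initial difference alongside the test is one way of looking for evidence of similarity rather than relying on the absence of a significant result.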