An example of an analogy drawing upon a chemical concept to explain another idea (here an idea from research methods):

"In practice, many studies rely on testing for a statistically significant difference, although this is a very dubious criterion for equivalence…. This becomes clear if we consider how inferential tests are interpreted when comparing the final outcome measures in a study. At the end of a true experiment, statistical tests may be used to infer that a difference in final outcomes was unlikely enough that we can confidently assume it is not due to random variations but is due to a systematic difference (i.e., the difference between the experimental treatment and the control condition) and so can be assumed to (probably) apply more generally in the population and not just to this specific permutation of learners. So, a statistically significant difference means a very unlikely one (in practice, normally one with a probability value, p<0.05).
Often a similar approach is used in studies to evaluate the differences found at pre-test. The results are analysed to see if there is a very unlikely difference between the scores in the different conditions. If a statistically significant difference is found, then this clearly suggests the groups cannot be considered equivalent. That is reasonable.
Unfortunately, the reverse does not apply: if the differences do not reach significance, we cannot assume that implies equivalence. Say p=0.08 (meaning the differences were unlikely enough to occur by chance only about once in twelve times, rather than once in twenty times as when p=0.05): this still indicates a difference that was fairly unlikely to be down to random factors. There is a logical difference between what we are seeking to do in these two situations. In one case (comparing post-test results), we are trying to exclude all but those outcomes that are most unlikely to be chance events; in the other (comparing pre-test scores), we are trying to show that any difference is small enough to be insignificant in affecting later outcomes. So, in the first case, we are trying to show something is very improbable, but in the other case, we are trying to show we have a very probable outcome. Using the same kind of inferential test as a test of equivalence therefore means (sensibly) excluding cases with very different pre-test outcomes across treatments from being labelled equivalent, but still (dubiously) admitting other substantially different pre-test outcomes across treatments as being equivalent.
If this seems a little abstract, consider this analogy. Consider Table 2.2, which presents two questions that might be posed to a learner and her hypothetical responses. The two questions are looking at two different ends of a spread (that is, a continuum from combinations of elements with very different electronegativities to combinations of elements with the same electronegativity), and a suitable criterion that works for one extreme cannot simply be reversed for use at the other extreme (which would be like saying anyone who is not over 2 m tall should be considered short). That is, if we agree that an electronegativity difference of >2.5 is a good criterion to identify highly ionic compounds, then it is inappropriate to use the same cut-off as the basis for a criterion (<2.5) to identify highly covalent (i.e., non-polar) compounds. Perhaps we should instead look for an electronegativity difference of <1.0 or <0.5? The precise choices are open to opinion (you might actually suggest >3.0 for the most ionic compounds), but the invalidity of using the same cut-off to identify both sets of extreme cases is not. If we decide the most covalent compounds are those where the electronegativity difference is <0.7, we should not then class any where the difference is >0.7 as ionic.
By the same logic, whilst it makes sense to exclude pre-test differences which reach statistical significance from being considered equivalent (like excluding KF from our list of highly covalent compounds), that is not sufficient to judge equivalence, and something more is needed. One rule of thumb that has been suggested is that rather than using p≥0.05 as the critical value here, it should be p≥0.5 (i.e., only admitting as equivalent groups where the pre-test differences are more likely to occur by chance than not), but there are more sophisticated approaches…"
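The asymmetric cut-offs in the quoted analogy can be sketched in code. The cut-offs (>2.5 for highly ionic, <0.7 for highly covalent) come from the passage; the Pauling electronegativity values are standard published figures, and the function and variable names are my own, purely illustrative:

```python
# Classify a bond by electronegativity difference, using a separate
# cut-off for each extreme, as the quoted passage argues one should.
# Values are Pauling electronegativities; cut-offs follow the passage.

ELECTRONEGATIVITY = {"K": 0.82, "F": 3.98, "C": 2.55, "H": 2.20, "Cl": 3.16}

IONIC_CUTOFF = 2.5     # difference > 2.5 -> treat as highly ionic
COVALENT_CUTOFF = 0.7  # difference < 0.7 -> treat as highly covalent

def classify(elem_a, elem_b):
    diff = abs(ELECTRONEGATIVITY[elem_a] - ELECTRONEGATIVITY[elem_b])
    if diff > IONIC_CUTOFF:
        return "highly ionic"
    if diff < COVALENT_CUTOFF:
        return "highly covalent"
    # The broad middle ground: neither extreme label applies.
    return "intermediate"

print(classify("K", "F"))   # diff = 3.16 -> highly ionic
print(classify("C", "H"))   # diff = 0.35 -> highly covalent
print(classify("H", "Cl"))  # diff = 0.96 -> intermediate
```

Note that a single cut-off would force the H–Cl case (difference 0.96) into one of the two extreme categories, which is exactly the error the passage warns against.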
Mea culpa: the aim here was to explain an idea to readers who were likely to know about chemistry (as they were reading a book about chemistry teaching) by using a chemical context to give an analogy for how a research technique is often misused.
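Incidentally, the "once in N times" gloss used in the quotation is just the reciprocal of the p-value (1/0.08 ≈ 12.5, i.e., about once in twelve times). A minimal arithmetic sketch, with an illustrative function name of my own:

```python
# Convert a p-value into the approximate "once in N trials" odds
# mentioned in the quotation (the reciprocal of p).

def chance_odds(p):
    """How many trials, on average, before a difference this extreme
    would arise by chance alone."""
    return 1 / p

print(round(chance_odds(0.05)))  # 20 -> once in twenty times
print(round(chance_odds(0.08)))  # 12 -> about once in twelve times
```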

Treating differences that do not reach statistical significance as equivalent does violence to what most people understand 'equivalent' to mean.
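The asymmetry described above can be made concrete in code. This hypothetical sketch assumes a pre-test p-value is already to hand; the thresholds (0.05, and the suggested rule of thumb of 0.5) come from the quoted passage, while the function names are mine:

```python
# Three different questions asked of the same kind of p-value:
# rejecting chance at post-test, the dubious "not significant,
# therefore equivalent" criterion, and the stricter rule of thumb.

def significantly_different(p, alpha=0.05):
    """Post-test question: is the difference too unlikely to be chance?"""
    return p < alpha

def naive_equivalence(p, alpha=0.05):
    """The dubious criterion: 'not significant, therefore equivalent'."""
    return p >= alpha

def rule_of_thumb_equivalence(p, threshold=0.5):
    """Only call groups equivalent when the observed difference is
    more likely than not to have arisen by chance."""
    return p >= threshold

# A pre-test p of 0.08 is not significant, yet under the stricter
# rule the groups should not be labelled equivalent either.
p_pre = 0.08
print(significantly_different(p_pre))    # False
print(naive_equivalence(p_pre))          # True (the dubious verdict)
print(rule_of_thumb_equivalence(p_pre))  # False
```

The point the passage makes is visible in the middle case: the naive criterion returns True for any p between 0.05 and 0.5, labelling as "equivalent" groups whose pre-test differences were still fairly unlikely to be chance.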
Necessary but not sufficient: Testing for equivalence is a widely used technique in experimental research in education and the social sciences. Using the conventional cut-off of p<0.05 in such studies is like equating…
