Misconceptions of change

It may be difficult to know what counts as an alternative conception in some topics – and sometimes research does not make it any clearer


Keith S. Taber


If a reader actually thought the researchers themselves held these alternative conceptions then one could have little confidence in their ability to distinguish between the scientific and alternative conceptions of others

I recently published an article here where I talked in some detail about some aspects of a study (Tarhan, Ayyıldız, Ogunc & Sesen, 2013) published in the journal Research in Science and Technological Education. Despite having a somewhat dodgy title 1, this is a well-respected journal published by a serious publisher (Routledge/Taylor & Francis). I read the paper because I was interested in the pedagogy being discussed (jigsaw learning), but what prompted me to then write about it was the experimental design: setting up a comparison between a well-tested active learning approach and lecture-based teaching. A teacher experienced in active learning techniques taught a control group of twelve year old pupils through a 'traditional' teaching approach (giving the children notes, setting them questions…) as a comparison condition for a teaching approach based on engaging group-work.

The topic being studied by the sixth grade, elementary school, students was physical and chemical changes.

I did not discuss the outcomes of the study in that post as my focus there was on the study as possibly being an example of rhetorical research (i.e., a demonstration set up to produce a particular outcome, rather than an open-ended experiment to genuinely test a hypothesis), and I was concerned that the control conditions involved deliberately providing sub-optimal, indeed sub-standard, teaching to the learners assigned to the comparison condition.

Read 'Didactic control conditions. Another ethically questionable science education experiment?'

Identifying alternative conceptions

The researchers actually tested the outcome of their experiment in two ways (as well as asking students in the experimental condition about their perceptions of the lessons), a post-test taken by all students, and "ten-minute semi-structured individual interviews" with a sample of students from each condition.

Analysis of the post-test allowed the researchers to identify the presence of students' alternative conceptions ('misconceptions'2) related to chemical and physical change, and the identified conceptions are reported in the study. Interviewees were purposively selected,

"Ten-minute semi-structured individual interviews were carried out with seven students from the experimental group and 10 students from the control group to identify students' understanding of physical and chemical changes by acquiring more information about students' unclear responses to [the post-test]. Students were selected from those who gave incorrect, partially correct and no answers to the items in the test. During the interviews, researchers asked the students to explain the reasons for their answers to the items."

Tarhan et al., 2013, p.188

I was interested to read about the alternative conceptions they had found for several reasons:

  1. I have done research into student thinking, and have written a lot about alternative conceptions, so the general topic interests me;
  2. More specifically, it is interesting to compare what researchers find in different educational contexts, as this gives some insight into the origins and developments of such conceptions;
  3. Also, I think the 'chemical and physical changes' distinction is actually a very problematic topic to teach. (Read about a free classroom resource to explore learners' ideas about physical and chemical changes.)

In this post I am going to question whether the authors' claims, in their research report, about some of the alternative conceptions they reported finding are convincing. First, however, I should explain the second point in that list.

Cultural variations in alternative conceptions

Some alternative conceptions seem fairly universal, being identified in populations all around the world. These may primarily be responses to common experiences of the natural world. An obvious example relates to Newton's first law (the law of inertia): we learn from very early experience, before we even have language to talk about our experiences, that objects that we push, throw, kick, toss, pull… soon come to a stop. They do not move off in a straight line and continue indefinitely at a constant speed.

Of course, that experience is not actually contrary to Newton's first law (as various forces are acting on the objects concerned), but it presents a consistent pattern (objects initially move off, but soon slow and stop) that becomes part of our intuitions about the world, making the scientific law seem counter-intuitive, and so more difficult to accept and apply when taught in school.

Read about the challenge of learning Newton's first law

By contrast, no one has ever tested Newton's first law directly by seeing what happens under the ideal conditions under which it would apply (see 'Poincaré, inertia, and a common misconception').

Other alternative conceptions may be less universal: some may be, partially at least, due to an aspect of local cultural context (e.g. folk knowledge, local traditions), the language of instruction, the curriculum or teaching scheme, or even a particular teacher's personal way of presenting material.

So, to the extent that there are some experiences that are universal for all humans, due to commonalities in the environment (e.g., to date at least, all members of the species have been born into an environment with a virtually constant gravitational field and a nitrogen-rich atmosphere of about 1 atmosphere pressure {i.e., c. 10⁵ Pa} and about 21% oxygen content), there is a tendency for people everywhere (on earth) to develop the same alternative conceptions.

And, conversely, to the extent that people in different institutional, social, and cultural contexts have contrasting experiences, we would expect some variations in the levels of incidence of some alternative conceptions across populations.

"Some common ideas elicited from children are spread, at least in part, through informal learning in everyday "life-world" contexts. Through such processes youngsters are inducted into the beliefs of their culture. Ideas that are common in a culture will not usually contradict everyday experience, but clearly beliefs may develop and be disseminated without matching formal scientific knowledge. …

Where life-world beliefs are relevant to school science – perhaps contradicting scientific principles, perhaps apparently offering an explanation of some science taught in school; perhaps appearing to provide familiar examples of taught principles – then it is quite possible, indeed likely, that such prior beliefs will interfere with the learning of school science. …

Different common beliefs will be found among different cultural groups, and therefore it is likely that the same scientific concepts will be interpreted differently among different cultural groups as they will be interpreted through different existing conceptual frameworks."

Taber, 2012a, pp.5-6

As a trivial example, the National Curriculum for primary age children in England erroneously describes some materials that are mixtures as being substances. These errors have persisted for some years, as the government department does not think they are important enough to merit the effort of correction. Assuming many primary school teachers (who are usually not science specialists, though some are of course) trust the flawed information in the official curriculum, we might expect more secondary school students in England, than in other comparable populations, to later demonstrate alternative conceptions in relation to the critical concept of a chemical substance.

"This suggests that studies from different contexts (e.g., different countries, different cultures, different languages of instruction, and different curriculum organisations) should be encouraged for what they can tell us about the relative importance of educational variables in encouraging, avoiding, overcoming, or redirecting various types of ideas students are known to develop."

Taber, 2012a, p.9
The centrality of language

Language of instruction may sometimes be important. Words that supposedly are translated from one language to another may actually have different nuances and associations. (In English, it is clearly an alternative conception to think the chemical elements still exist in a compound, but the meaning of the French 'élément chimique' seems to include the 'essence' of an element that does continue into the compound.)

Research in different educational contexts can in principle help unravel some of this: in principle, as it does need the various researchers to detail aspects of the teaching and cultural contexts from which they report, as well as the students' ideas (Taber, 2012a).

Chemical and physical change

Teaching about chemical and physical change is a traditional topic in school science and chemistry courses. It is one of those dichotomies that is understandably introduced in simple terms, and so offers a simplification that may need to be 'unlearnt' later:

[a change is] chemical change or physical change

[an element is] metal or non-metal

[a chemical bond is] ionic bonding or covalent bonding

There are some common distinctions often made to support this discrimination into two types of change:


Table 1.2 from Teaching Secondary Chemistry (2nd ed) (Taber, 2012b)

However, a little thought suggests that such criteria are not especially useful in supporting school students making observations, and indeed some of these criteria simply do not stand up to close examination.2

"the distinction between chemical and physical changes is a rather messy one, with no clear criteria to help students understand the difference"

Taber, 2012b, p.33


So, I was especially interested to know what Tarhan and colleagues had found.

Methodological 'small print'

In reading any study, a consideration of the findings has to be tempered by an understanding of how the data were collected and analysed. Writing-up research reports for journals can be especially challenging as referees and editors may well criticise missing details they feel should be reported, yet often journals impose word-limits on articles.

Currently (2023) this particular journal tells potential authors that "A typical paper for this journal should be between 7000 and 8000 words" which is a little more generous than some other journals. However, Tarhan and colleagues do not fully report all aspects of their study. This may in part be because they need quite a lot of space to describe the experimental teaching scheme (six different jigsaw learning activities).

Whatever the reason:

  • the authors do not provide a copy of the post-test which elicited the responses that were the basis of the identified alternative conceptions; and
  • nor do they explain how the analysis to identify conceptions was undertaken – to show how student responses were classified;
  • similarly, there are no quotations from the interview dialogue to illustrate how the researchers interpreted student comments.

Data analysis is the process of researchers interpreting data so they become evidence for their findings, and generally research journals expect the process to be detailed – but here the reader is simply told,

"Students' understanding of physical and chemical changes was identified according to the post-test and the individual interviews after the process."

Tarhan et al., 2013, p.189

'Misconceptions'

In their paper, Tarhan and colleagues use the term 'misconception' which is often considered a synonym for 'alternative conception'. Commonly, conceptions are referred to as alternative if they are judged to be inconsistent with canonical concepts.

Read about alternative conceptions

Although the term 'misconception' is used 32 times in the paper (not counting instances in the reference list), the term is not explained in the text, presumably because it is assumed that all those working in science education know (and agree) what it means. This is not at all unusual. I once wrote about another study:

"[The] qualities of misconceptions are largely assumed by the author and are implicit in what is written…It could be argued that research reports of this type suggest the reported studies may themselves be under-theorised, as rather well-defined technical procedures are used to investigate foci that are themselves only vaguely characterised, and so the technical procedures are themselves largely operationalised without explicit rationale."

Taber, 2013, p.22

Unfortunately, in Tarhan and colleagues' study there are less well-defined technical procedures in relation to how data were analysed to identify 'misconceptions', leaving the reader with limited grounds for confidence that what are reported are worthy of being described as student conceptions – and are not just errors or guesses made on the test. Our thinking is private, and never available directly to others, and so can only be interpreted from the presentations we make to represent our conceptions in a public (shared) space. Sometimes we mis-speak, or we mis-write (so that our words do not accurately represent our thoughts). Sometimes our intended meanings may be misinterpreted (Taber, 2013).

Perhaps the researchers felt that this process of identifying conceptions from students' texts and utterances was unproblematic – perhaps the assignments seemed so obvious to the researchers that they did not need to exemplify and justify their analytical method. This is unfortunate. There might also be another factor here.

Lost and found in translation?

The study was carried out in Turkey. The paper is in English, and this includes the reported alternative conceptions. The study was carried out "in a public elementary school" (not an international school, for example). Although English is often taught as a foreign language in Turkish schools, the language of instruction, not unreasonably, is Turkish.

So, it seems either

  • the data were collected in (what, for the children, would have been) 'L2' – a second language, or
  • a study carried out (questions asked; answers given) in Turkish has been reported in English, translating where necessary from one language to another.

This issue is not discussed at all in the paper – there is no mention of either the Turkish or English language, nor of anything being translated.

Yet the authors are not oblivious to the significance of language issues in learning. They report how one variant of Jigsaw teaching had "been designed specifically to increase interaction among students of differing language proficiencies in bilingual classrooms" (p.186) and how the research literature reports that sometimes children's ideas reflect "the incorrect use of terms in everyday language" (p.198). However, they did not feel it was necessary to report either that

  1. data had been collected from elementary school children in a second language, or
  2. data had been translated for the purposes of reporting in an English language journal

It seems reasonable to assume they would have appreciated the importance of mentioning option 1, and so it seems much more likely (although readers of the study should not have to guess) that the reporting in English involved translation. Yet translation is never a simple algorithmic process, but rather always a matter of interpretation (another stage in analysis), so it would be better if authors always acknowledged this – and offered some basis for readers to judge that the translations made were of high quality (Taber, 2018).

Read about guidelines for detailing translation in research reports

It is a general principle that the research community should adopt, surely, that whenever material reported in a research paper has been translated from another language (a) this is reported and (b) evidence of the accuracy and reliability of the translation is offered (Taber, 2018).

I make this point here, as some of the alternative conceptions reported by the authors are a little mystifying, and this may(?) be because their wording has been 'degraded' (and obscured) by imperfect translation.

An alternative conception of combustion?

For example, here are two of the learning objectives from one of the learning activities:

"The students were expected to be able to:

…comment on whether the wood has similar intensive properties before and after combustion

…indicate the combustion reactions in examples of several physical and chemical changes"

Tarhan et al., 2013, p.193

The wording of the first of these examples seems to imply that when wood is burnt, the product is still…wood. That is nonsense, but possibly this is simply a mistranslation of something that made perfect sense in Turkish. (The problem is that a reader can only speculate on whether this is the case, and research reports should be precise and explicit.)

The second learning objective quoted here implies that some combustion reactions are physical changes (or, at least, combustion reactions are components of some physical changes).

Combustion reactions are a class of chemical reactions. 'Chemical reaction' is synonymous with 'chemical change'. So, there are (if you will excuse the double negative) no examples of combustion reactions that are not chemical reactions and which would be said to occur in physical changes. So, this is mystifying, as it is not at all clear what the children were actually being taught unless one assumes the researchers themselves have very serious misconceptions about the chemistry they are teaching.

If a reader actually thought that the researchers themselves held these alternative conceptions

  • the product of combustion of wood is still wood
  • some combustion reactions are (or occur as part of) physical changes

then one could have little confidence in their ability to distinguish between the scientific and alternative conceptions of others. (A reader might also ask how come the journal referees and editor did not ask for corrections here before publication – I certainly wondered about this).

There are other statements the authors make in describing the teaching which are not entirely clear (e.g., "give the order of the changes in matter during combustion reactions", p.194), and this suggests a degree of scepticism is needed in not simply accepting the reported alternative conceptions at face value. This does not negate their interest, but does undermine the paper's authority somewhat.

One of the misconceptions reported in the study is that some students thought that "there is a flame in all combustion reaction". This led me to reflect on whether I could think of any combustion reactions that did not involve a flame – and I must confess none readily came to mind. Perhaps I also have this alternative conception – but it seems a harsh judgement on elementary school learners unless they had actually been taught about combustion reactions without flames (if, indeed, there are such things).


The study reported that some 12 year olds held the 'misconception' that "there is a flame in all combustion reaction[s]".

[Image by Susanne Jutzeler, Schweiz, from Pixabay]


Failing to control variables?

Another objective was for students to "comprehend that temperature has an effect on chemical reaction rate by considering the decay of fruit at room temperature, and the change in color [colour] from green to yellow of fallen leaves in autumn" (p.193). As presented, this is somewhat obscure.

Presumably it is not meant to be a comparison between:

  • the rate of decay of fruit at room temperature, versus
  • the rate of change in colour of fallen leaves in autumn

(Explaining that temperature has an effect on chemical reaction rate?)

Clearly, even if the change of colour of leaves takes place at a different temperature to room temperature, one cannot compare totally different processes at different temperatures and draw any conclusions about how "temperature has an effect on chemical reaction rate". (Presumably, 'control of variables' is taught in the Turkish science curriculum.)

So, one assumes these are two different examples…

But that does not help matters too much. The "decay of fruit at room temperature" (or, indeed, any other process studied at a single temperature) cannot offer any indication of how "temperature has an effect on chemical reaction rate". The change of colours in leaves of deciduous trees (which usually begins before they fall) is triggered by environmental conditions such as change in day length and temperature. This is part of a very complex system involving a range of pigments, whilst the water content of the leaf decreases (once the supply of water through the tree's vascular system is cut off), and it is not clear how much detail these twelve year olds were taught… but it is certainly not a simple matter of a reaction changing rate according to temperature.
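To put the control-of-variables point more formally (this goes well beyond anything these sixth-graders would be taught, but it makes the dependence explicit), the temperature dependence of a reaction's rate is conventionally summarised by the Arrhenius relation:

k(T) = A·exp(−Ea / RT)

where the rate constant k depends on temperature T, but the pre-exponential factor A and the activation energy Ea are specific to each particular reaction. Fruit decay and leaf colour change involve entirely different sets of reactions, so different values of A and Ea; observing them at different temperatures confounds the identity of the process with the temperature. Only following one and the same process at two or more temperatures could show that "temperature has an effect on chemical reaction rate".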

Evaluating conceptions

Tarhan and colleagues report their identified alternative conceptions ('misconceptions') under a series of headings. These are reported in their table 4 (p.195). A reader certainly finds some of the entries in this table easy to interpret: they clearly seem to reflect ideas contrary to the canonical science one would expect to be reflected in the curriculum and teaching. Other statements are less obviously evidence of alternative conceptions as they do not immediately seem necessarily at odds with scientific accounts (e.g., associating combustion reactions with flames).

Other reported misconceptions are harder to evaluate. School science is in effect a set of models and representations of scientific accounts that often simplify the actual current state of scientific knowledge. Unless we know exactly what has been taught it is not entirely clear if students' ideas are credit-worthy or erroneous in the specific context of their curriculum.

Moreover, as the paper does not report the data and its analysis, but simply the outcome of the analysis, readers do not know on what basis judgements have been made to assign learners as having one of the listed misconceptions.


Changes of state are chemical changes

A few students from the lecture-based teaching condition were identified as 'having' the misconception that 'changes of state are chemical changes'. This seems a pretty serious error at the end of a teaching sequence on chemical and physical changes.

However, this raises a common issue in terms of reports of alternative conceptions – what exactly does it mean to say that a student has a conception that 'changes of state are chemical changes'? A conception is a feature of someone's thinking – but that encompasses a vast range of potential possibilities from a fleeting notion that is soon forgotten ('I wonder if s orbitals are so-called because they are spherical?') to an on-going commitment to an extensive framework of ideas that a life is lived by (Buddhism, Roman Catholicism, Liberalism, Hedonism, Marxism…).


A person's conceptions can vary along a range of characteristics (Figure from Taber, 2014)


The statement that 'Changes of state are chemical changes' is unlikely to be the basis of anyone's personal creed. It could simply be a confusion of terms. Perhaps a student had a decent understanding of the essential distinction between chemical and physical changes but got the terms mixed up (or was thinking that 'changes of state' meant 'chemical reaction'). That is certainly a serious error that needs correcting, but, in terms of understanding of the science, it would seem to be less worrying than a deeper conceptual problem.

In their commentary, the authors note of these children:

"They thought that if ice was heated up water formed, and if water was heated steam formed, so new matter was formed and chemical changes occurred".

Tarhan et al., 2013, p.197

It is not clear if this was an explanation the learners gave for thinking "changes of state are chemical changes", or whether "changes of state are chemical changes" was the researchers' gloss on children commenting that "if ice was heated up water formed, and if water was heated steam formed, so new matter was formed and chemical changes occurred".

That a range of students are said to have precisely the same train of thought leads a reader (or, at least, certainly one with experience of undertaking research of this kind) to ask if these are open-ended responses produced by the children, or the selection by the children of one of a number of options offered by the researchers (as pointed out above, the data analysis is not discussed in detail in the paper). That makes a difference in how much weight we might give to the prevalence of the response (putting a tick by the most likely looking option requires less commitment to, and appreciation of, an idea than setting it out yourself in your own personally composed text), illustrating why it is important that research journals should require researchers to give full accounts of their instrumentation and analysis.

Because density of matter changes during changes of state, its identity also changes, and so it is a chemical change

Thirteen of the children (all in the lecture-based teaching condition) were considered to have the conception "Because density of matter changes during changes of state, its identity also changes, and so it is a chemical change". This is clearly a much more specific conception (than 'changes of state are chemical changes') which can be analysed into three components:

  • a change of state is a chemical change, AND
  • we know this because such changes involve a change in identity, AND
  • we know that because a change of state leads to a change in density

Tarhan and colleagues claim this conception was "first determined in this study" (p.195).

The specificity is intriguing here – if so many students explicitly and individually built this argument for themselves then this is an especially interesting finding. Unfortunately, the paper does not give enough detail of the methodology for a reader to know if this was the case. Again, if students were just agreeing with an argument offered as an option on the assessment instrument then it is of note, but less significant (as in such cases students might agree with the statement simply because one component resonated – or they may even be guessing rather than leaving an item unanswered). Again this does not completely negate the finding, but it leaves its status very unclear.

Taken together these first two claimed results seem inconsistent – as at least 13 students seem to think "Changes of state are chemical changes". That is, all those who thought that "Because density of matter changes during changes of state, its identity also changes, and so it is a chemical change" would seem to have thought that "Changes of state are chemical changes" (see the Venn diagram below). Yet, we are also told that only five students held the less specific and seemingly subsuming conception "changes of state are chemical changes".


If 13 students think that changes of state are chemical changes because a change of density implies a change of identity; what does it mean that only 5 students think that changes of state are chemical changes?
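Put in terms of simple set logic: if S is the set of students assigned the more specific conception and G the set assigned the general one, then, since the specific conception entails the general one, we would expect

S ⊆ G, and therefore |G| ≥ |S| = 13

yet the figure reported for the general conception is 5.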

This looks like an error, but perhaps is just a lack of sufficient detail to make the findings clear. Alternatively, perhaps this indicates some failure in translating material accurately into English.

The changes in the pure matters are physical changes

Six children in the lecture-based teaching condition and one in the jigsaw learning condition were reported as holding the conception that "The changes in the pure matters are physical changes". The authors do not explain what they mean here by "pure matters" (sic, presumably 'matter'?). The only place this term is used in the paper is in relation to this conception (p.195, p.197).

The only other reference to 'pure' was in one of the learning objectives for the teaching:

  • explain the changes of state of water depending on temperature and pressure; give various examples for other pure substances (p.191)

If "pure matter" means a pure sample of a substance, then changes in pure substances are all physical – by definition a chemical changes leads to a different substance/different substances. That would explain why this conception was "first determined [as a misconception] in this study", p.195, as it is not actually a misconception)". So, it does not seem clear precisely why the researchers feel these children have got something wrong here. Again, perhaps this is a failure of translation rather than a failure in the original study?

Changes in shape?

Tarhan and colleagues report two conceptions under the subheading of 'changes in shape'. They seem to be thinking here more of grain size than shape as such. (Another translation issue?) One reported misconception is that if cube sugar is granulated, sugar particles become small [smaller?].


Is it really a misconception to think that "If cube sugar is granulated, sugar particles become small"?

(Image by Bruno /Germany from Pixabay)


Tarhan and colleagues reported that two children in the experimental condition, and 13 in the control condition thought that "If cube sugar is granulated, sugar particles become small". Sugar cubes are made of granules of sugar weakly joined together – they can easily be crumbled into the separate grains. The grains are clearly smaller than the cubes. So, what is important here is what is meant/understood* by the children by the term 'particles'.

(* If this phrasing was produced by the children, then we want to know what they meant by it. If, however, the children were agreeing with a phrase presented to them by researchers, then we wish to know how they understood it.)

If this means quanticle level particles, molecules, then it is clearly an alternative conception – each grain contains vast numbers of molecules, and the molecules are unchanged by the breaking up of the cubes. If, however, particles here refers to the cubes and grains**, then it is a fair reflection of what happens: one quite large particle of sugar is broken up into many much smaller particles. The ambiguity of the (English) word 'particles' in such contexts is well recognised.

(** That is, if the children used the word 'particles' – did they mean the cubes/grains as particles of sugar? If however the phrasing was produced by the researchers and presented to the children, and if the researchers meant 'particles' to mean 'molecules'; did the children appreciate that intention, or did they understand 'particles' to refer to the cubes and grains?)

However, as no detail is given on the actual data collected (e.g., is this the children's own words; was this based on an open response?), and how it was analysed (and, as I suspect this all occurred in Turkish) the reader has no way to check on this interpretation of the data.

What kind of change is dissolving?

Tarhan and colleagues report a number of 'misconceptions' under the heading of 'molecular solubility'. Two of these are:

  • "The solvation processes are always chemical changes"
  • "The solvation processes are always physical changes"

This reflects a problem of teaching about physical and chemical changes. Dissolving is normally seen as a physical change: there is no new chemical substance formed and dissolving is usually fairly readily reversed. However, as bonds are broken and formed it also has some resemblance to chemical change.2

In dissolving common salt in water, strong ionic bonds are disrupted and the ions are strongly solvated. Yet the usual convention is still to consider this a physical change – the original substance, the salt, can be readily recovered by evaporation of the solvent. A solution is considered a kind of mixture. In any case, as Tarhan and colleagues refer to 'molecular' solubility (strictly solubility refers to substances, not molecules, but still) they were, presumably, only dealing with examples of the dissolving of substances with discrete molecules.

Taking these two conceptions together, it seems that Tarhan and colleagues think that dissolving is sometimes a physical change, and sometimes a chemical change. Presumably they have some criterion or criteria to distinguish those examples of dissolving they consider physical changes from those they consider chemical changes. A reader can only speculate how a learner observing some solute dissolve in a solvent is expected to distinguish these cases. The researchers do not explain what was taught to the students, so it is difficult to appreciate quite what the students supposedly got wrong here.

Sugar is invisible in the water, because new matter is formed

The idea that learners think that new matter is formed on dissolving would indeed be an alternative conception. The canonical view is that new matter is only formed in very high energy processes – such as in the big bang. In both chemical and physical processes studied in the school laboratory there may be transformations of matter, but no new matter.

This seems a rather extreme 'misconception' for the learners to hold. However, a reader might wonder if the students actually suggested that a new substance was formed, and this has been mistranslated. (The Turkish word 'madde' seems to mean either matter or substance.) If these students thought that a new type of substance was formed then this would be an alternative conception (and it would be interesting to know why this led to sugar being invisible – unless they were simply arguing that different appearance implied different substance).

While sugar is dissolving in the water, water damages the structure of sugar and sugar splits off

Whether this is a genuine alternative conception or just imprecise use of language is not clear. It seems reasonable to suggest that while sugar is dissolving in the water, the process breaks up the structure of solid sugar and sugar molecules split off – so some more detail would be useful here. Again, if there has been translation from Turkish this may have lost some of the nuance of the original phrasing through translation into English.

The phrasing reflects an alternative conception that in chemical reactions one reactant is an active agent (here the water doing the damaging) and the other the patient, that is passive and acted upon (here the sugar being damaged) – rather than seeing the reaction as an interaction between two species (Taber & García Franco, 2010) – but there is no suggestion in their paper that this is the issue Tarhan and colleagues are highlighting here.

When sugar dissolves in water, it reacts with water and disappears from sight

If the children thought that dissolving was a chemical reaction then this is an alternative conception – the sugar does indeed disappear from sight, but there has been no reaction.

Again, we might ask if this was actually a misunderstanding (misconception), or imprecise use of language. The sugar does 'react' with the water in the everyday sense of 'reaction'. But this is not a chemical reaction, so this terminology should be avoided in this context.

Even in science, 'reaction' means something different in chemistry and physics: in the sense of Newtonian physics, during dissolving, when a water molecule attracts a sugar molecule ('action') there will be an equal and oppositely directed reaction as the sugar molecule attracts the water molecule. This is Newton's third law, which applies to quanticles as much as to planets. If a water molecule and a sugar molecule collide, the force applied by the sugar molecule on the water molecule is equal to the force applied by the water molecule on the sugar molecule.
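In symbols, just to make the pairing explicit:

F(water on sugar) = −F(sugar on water)

the two forces are equal in magnitude, opposite in direction, and act on different bodies.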

Read about learning difficulties with Newton's third law

So, 'sugar reacts with water' could be

  • a misunderstanding of dissolving (a genuine alternative conception);
  • a misuse of the chemical term 'reaction'; or
  • a use of the everyday term 'reaction' in a context where this should be avoided as it can be misunderstood

These are somewhat different problems for a teacher to address.

Molecules split off in physical changes and atoms split off in chemical changes

Ten of the children are said to have demonstrated the 'misconception' that molecules split off in physical changes and atoms split off in chemical changes. The authors claim that this misconception has not been reported in previous studies. But is this really a misconception? It may be a simplistic, and imprecise, statement – but I think when I was teaching youngsters of this age I would have been happy to find they have this notion – which at least seems to reflect an ability to imagine and visualise processes at the molecular level.

In dissolving or melting/boiling of simple molecular substances, molecules do indeed 'split off' in a sense, and in at least some chemical changes we can posit mechanisms that, in simple terms at least, involve atoms 'splitting off' from molecules.

So, again, this is another example of how this study is tantalising, without being very informative. The reader is not clear in what sense this is viewed as wrong, or how the conception was detected. (Again, for ten different students to specifically think that 'molecules split off in physical changes and atoms split off in chemical changes' makes one wonder if they volunteered this, or have simply agreed with the statement when having it presented to them).

In conclusion

The main thrust of Tarhan and colleagues' study was to report on an innovation using jigsaw learning (which unfortunately was compared with a form of pedagogy widely considered unsuitable for young children, so offering a limited basis for judging the effectiveness of the innovation). As part of the study they collected data to evaluate learning in the two conditions, and used this to identify misconceptions students demonstrated after being taught about physical and chemical changes. The researchers provide a long list of identified misconceptions – but it is not always obvious why these are considered misconceptions, or what the desired responses, matching the teaching models, were.

The researchers do not describe their data collection and analysis instruments and protocols in sufficient detail for readers to appreciate what they mean by their results – in particular, what it means to 'have' a misconception: for example, to give a definitive statement in an interview, or just to select the response on a test that looked most promising at the time. Clearly we give much more weight to a notion that a learner presents in their own words as an explanation for some phenomenon than to the selection of one option from a menu of statements presented to them, which comes with no indication of their confidence in the selection made.

Of particular concern: either the children were asked questions in a second language that they may not have been sufficiently fluent in to fully understand questions or compose clear responses; or none of the misconceptions reported are presented in their original form and they have all been translated by someone (unspecified) of uncertain ability as a translator. (A suitably qualified translator would need to have high competence in both languages and a strong familiarity with the subject matter being translated.)

In the circumstances, Tarhan and colleagues' reported misconceptions are little more than intriguing. In science, the outcome of a study is only informative in the context of understanding exactly how the data were obtained, and how they have been processed. Without that, readers are asked to take a researcher's conclusions on faith, rather than be persuaded of them by a logical chain of argument.


p.s. For anyone who did not know, but wondered: s orbitals are not so-called because they are spherical: the designation derives from a label ('sharp') that was applied to some lines in atomic spectra.


Work cited

Notes


1 To my reading, the publication title 'Research in Science and Technological Education' seems to suggest the journal has two distinct and somewhat disconnected foci, that is:

Research in ( Science ) and ( Technological Education )

And it would be better (that is, most consistently) titled as

Research in Science and Technology Education

{Research in ( Science and Technology ) Education}

or

Research in Scientific and Technological Education

{Research in ( Scientific and Technological ) Education}

but, hey, I know I am pedantic.


2 The table (Table 1.2 in the source) was followed by the following text:

"The first criterion listed is the most fundamental and is generally clear cut as long as the substances present before and after the change are known. If a new substance has been produced, it will almost certainly have different melting and boiling temperatures than the original substance.

The other [criteria] are much more dubious. Some chemical changes involve a great deal of energy being released, such as the example of burning magnesium in air, or even require a considerable energy input, such as the example of the electrolysis of water. However, other reactions may not obviously involve large energy transfers, for example when the enthalpy and entropy changes more or less cancel each other out…. The rusting of iron is a chemical reaction, but usually occurs so slowly that it is not apparent whether the process involves much energy transfer ….

Generally speaking, physical changes are more readily reversible than chemical changes. However, again this is not a very definitive criterion. The idea that chemical reactions tend to either 'go' or not is a useful approximation, but there are many examples of reactions that can be readily reversed…. In principle, all reactions involve equilibria of forward and reverse reactions, and can be reversed by changing the conditions sufficiently. When hydrogen and oxygen are exploded, it takes a pedant to claim that there is also a process of water molecules being converted into oxygen and hydrogen molecules as the reaction proceeds, which means the reaction will continue for ever. Technically such a claim may be true, but for all practical purposes the explosion reflects a reaction that very quickly goes to completion.

One technique that can be used to separate iodine from sand is to warm the mixture gently in an evaporating basin, over which is placed an upturned beaker or funnel. The iodine will sublime – turn to vapour – before recondensing on the cold glass, separated from the sand. The same technique may be used if ammonium chloride is mixed with the sand. In both cases the separation is achieved because sand (which has a high melting temperature) is mixed with another substance in the solid state that is readily changed into a vapour by warming, and then readily recovered as a solid sample when the vapour is in contact with a colder surface. There are then reversible changes involved in both cases:

solid iodine ➝ iodine vapour

ammonium chloride ➝ ammonia + hydrogen chloride

In the first case, the process involves only changes of state: evaporation and condensation – collectively called sublimation. However the second case involves one substance (a salt) changing to two other substances. To a student seeing these changes demonstrated, there would be little basis to infer one is (usually considered as) a chemical change, but not the other. …

The final criterion in Table 1.2 concerns whether bonds are broken and made during a change, and this can only be meaningful for students once they have learnt about particle models of the submicroscopic structure of matter… In a chemical change, there will be the breaking of bonds that hold together the reactants and the formation of new bonds in the products. However, we have to be careful here what we mean by 'bond' …

When ice melts and water boils, 'intermolecular' forces between molecules are disrupted and this includes the breaking of hydrogen 'bonds'. However, when people talk about bond breaking in the context of chemical and physical changes, they tend to mean strong chemical bonds such as covalent, ionic and metallic bonds…

Yet even this is not clear cut. When metals evaporate or are boiled, metallic bonds are broken, although the vapour is not normally considered a different substance. When elements such as carbon and phosphorus undergo phase changes relating to allotropy, there is breaking, and forming, of bonds, which might suggest these changes are chemical and that the different forms of the same elements should be considered different substances. …

A particularly tricky case occurs when we dissolve materials to form solutions, especially materials with ionic bonding…. Dissolving tends to involve small energy changes, and to be readily reversible, and is generally considered a physical change. However, to dissolve an ionic compound such as sodium chloride (table salt), the strong ionic bonds between the sodium and chloride ions have to be overcome (and new bonds must form between the ions and solvent molecules). This would seem to suggest that dissolving can be a chemical change according to the criterion of bond breaking and formation (Table 1.2)."

(Taber, 2012b, pp.31-33)

Study reports that non-representative sample of students has average knowledge of earthquakes

When is a cross-sectional study not a cross-sectional study?


Keith S. Taber


A biomedical paper?

I only came to this paper because I was criticising the Biomedical Journal of Scientific & Technical Research's claimed Impact Factor, which seems to be a fabrication. I saw this particular paper being featured in a recent tweet from the journal and wondered how it fitted in a biomedical journal. The paper is on an important topic – what young people know about how to respond to an earthquake – but I was not sure why it belonged in this particular journal.

Respectable journals normally have a clear scope (i.e., the range of topics within which they consider submissions for publication) – whereas predatory journals are often primarily interested in publishing as many papers as possible (and so attracting publication fees from as many authors as possible) and so may have no qualms about publishing material that would seem to be out of scope.

This paper reports a questionnaire study of secondary age students' knowledge of earthquakes. It would seem to be an education study, possibly even a science education study, rather than a 'biomedical' study. (The journal invites papers from a wide range of fields 1, some of which – geology, chemical engineering – are not obviously 'biomedical' in nature; but not education.)

The paper reports research (so I assume is classed as 'research' in terms of the scale of charges) and comes from Bangladesh (which I assume the journal publishers consider a low income country), and so it would seem that the authors would have been charged $799 to be published in this journal. Part of what authors are supposed to get for that fee is for editors to arrange peer review to provide evaluation of, feedback on, and recommendations for improving, their work.

Peer review

Respectable journals employ rigorous peer review to ensure that only work of quality is published.

Read about peer review

According to the Biomedical Journal of Scientific & Technical Research website:

Peer review process is the system used to assess the quality of a manuscript before it is published online. Independent professionals/experts/researchers in the relevant research area are subjected to assess the submitted manuscripts for originality, validity and significance to help editors determine whether a manuscript should be published in their journal. 

This Peer review process helps in validating the research works, establish a method by which it can be evaluated and increase networking possibilities within research communities. Despite criticisms, peer review is still the only widely accepted method for research validation

Only the articles that meet good scientific standards, explanations, records and proofs of their work presented with Bibliographic reasoning (e.g., acknowledge and build upon other work in the field, rely on logical reasoning and well-designed studies, back up claims with evidence etc.) are accepted for publication in the Journal.

https://biomedres.us/peer-review-process.php

Which seems reassuring. It seems 'Preventive Practice on Earthquake Preparedness Among Higher Level Students of Dhaka City' should then only have been published after evaluation in rigorous peer review. Presumably any weaknesses in the submission would have been highlighted in the review process, helping the authors to improve their work before publication. Presumably, the (unnamed) editor did not approve publication until peer reviewers were satisfied the paper made a valid new contribution to knowledge and, accordingly, recommended publication. 2


The paper was, apparently, submitted; screened by editors; sent to selected expert peer reviewers; evaluated by reviewers, so reports could be returned to the editor who collated them, and passed them to the authors with her/his decision; revised as indicated; checked by editors and reviewers, leading to a decision to publish; copy edited, allowing proofs to be sent to authors for checking; and published, all in less than three weeks.

Although supposedly published in July 2021, the paper seems to be assigned to an issue published a year before it was submitted

One might wonder, though, whether a journal which seems to advertise with an inflated Impact Factor can be trusted to follow the procedures it claims. So, I had a quick look at the paper.

The abstract begins:

The present study was descriptive Cross-sectional study conducted in Higher Secondary Level Students of Dhaka, Bangladesh, during 2017. The knowledge of respondent seems to be average regarding earthquake. There is a found to have a gap between knowledge and practice of the respondents.

Gurung & Khanum, 2021, p.29274

Sampling a population (or not)

So, this seems to be a survey, and the population sampled was Higher Secondary Level Students of Dhaka, Bangladesh. Dhaka has a population of about 22.5 million people. I could not readily find out how many of these might be considered 'Higher Secondary Level', but clearly it will be many, many thousands – I would imagine about half a million as a 'ball-park' figure.


Dhaka has a large population of 'higher secondary level students'
(Image by Mohammad Rahmatullah from Pixabay)

For a survey of a population to be valid it needs to be based on a sample which is large enough to minimise errors in extrapolating to the full population, and (even more importantly) the sample needs to be representative of the population.

Read about sampling

Here:

"Due to time constrain the sample of 115."

Gurung & Khanum, 2021, p.29276

So, the sample size was limited to 115 because of time constraints. This would likely lead to large errors in inferring population statistics from the sample, but could at least give some indication of the population as long as the 115 were known to be reasonably representative of the wider population being surveyed.
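As a rough, illustrative calculation (assuming, very generously, a simple random sample, which this was not): the usual 95% margin of error for an estimated proportion p from a sample of size n is about

1.96 × √(p(1 − p)/n), which for p = 0.5 and n = 115 gives 1.96 × √(0.25/115) ≈ 0.09

so even under ideal sampling conditions, any percentage estimated from 115 respondents carries an uncertainty of roughly ±9 percentage points.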

The reader is told

"the study sample was from Mirpur Cantonment Public School and College , (11 and 12 class)."

Gurung & Khanum, 2021, p.29275

It seems very unlikely that a sample taken from any one school among hundreds could be considered representative of the age cohort across such a large City.

Is the school 'typical' of Dhaka?

The school website has the following evaluation by the school's 'sponsor':

"…one the finest academic institutions of Bangladesh in terms of aesthetic beauty, uncompromised quality of education and, most importantly, the sheer appeal among its learners to enrich themselves in humanity and realism."

Major General Md Zahirul Islam

The school Principal notes:

"Our visionary and inspiring teachers are committed to provide learners with all-rounded educational experiences by means of modern teaching techniques and incorporation of diverse state-of-the-art technological aids so that our students can prepare themselves to face the future challenges."

Lieutenant Colonel G M Asaduzzaman

While both of these officers would be expected to be advocates for the school, this does not give a strong impression that the researchers have sought a school that is typical of Dhaka schools.

It also seems unlikely that this sample of 115 reflects all of the students in these grades. According to the school website, there are 7 classes in each of these two grades, so the 115 students were drawn from 14 classes. Interestingly, in each year 5 of the 7 classes are following a science programme 3 – alongside one business studies class and one humanities class. The paper does not report which programme(s) were being followed by the students in the sample. Indeed no information is given regarding how the 115 were selected. (Did the researchers just administer the research instrument to the first students they came across in the school? Were all the students in these grades asked to contribute, and only 115 returned responses?)

Yet, if the paper was seen and evaluated by "independent professionals/experts/researchers in the relevant research area" they seem to have not questioned whether such a small and unrepresentative sample invalidated the study as being a survey of the population specified.

Cross-sectional studies

A cross-sectional study examines and compares different slices of a population – so here, different grades. Yet only two grades were sampled, and these were adjacent grades – 11 and 12 – which is not usually ideal to make comparisons across ages.

There could be a good reason to select two grades that are adjacent in this way. However, the authors do not present separate data for year 11 and year 12, but rather pool it. So they make no comparisons between these two year groups. This "Cross-sectional study" was then NOT actually a cross-sectional study.

If the paper did get sent to "independent professionals/experts/researchers in the relevant research area" for review, it seems these experts missed that error.

Theory and practice?

The abstract of the paper claims

"There is a found to have a gap between knowledge and practice of the respondents. The association of the knowledge and the practice of the students were done in which after the cross-tabulation P value was 0.810 i.e., there is not any [statistically significant?] association between knowledge and the practice in this study."

Gurung & Khanum, 2021, p.29274

This seems to suggest that student knowledge (what they knew about earthquakes) was compared in some way with practice (how they acted during an earthquake or earthquake warning). But the authors seem to have only collected data with (what they label) a questionnaire. They do not have any data on practice. The distinction they seem to really be making is between

  • knowledge about earthquakes, and
  • knowledge about what to do in the event of an earthquake.

That might be a useful thing to examine, but any "independent professionals/experts/researchers in the relevant research area" asked to look at the submission do not seem to have noted that the authors did not investigate practice, and so needed to change the descriptions they use and the claims they make.
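The paper also does not say how the 'association' itself was tested, but a p-value from a cross-tabulation of this kind would typically come from a chi-squared test of independence. The sketch below is purely illustrative – the counts are invented, not the study's data – and simply shows what such a computation involves:

from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: knowledge band (rows) by 'practice' band (columns).
# These counts are invented for illustration only; the study does not report its table.
observed = [
    [4, 8, 3],     # poor knowledge    (row total 15)
    [20, 42, 18],  # average knowledge (row total 80)
    [5, 10, 5],    # good knowledge    (row total 20)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.3f}")

A large p-value (such as the reported 0.810) simply means the table is consistent with the two classifications being independent – no detectable association – and, in any case, this is a finding about two kinds of reported knowledge, not about observed practice.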

Average levels of knowledge

Another point that any expert reviewer 'worth their salt' would have queried is the use of descriptors like 'average' in evaluating students' responses. The study concluded that

"The knowledge of earthquake and its preparedness among Higher Secondary Student were average."

Gurung & Khanum, 2021, p.29280

But how do the authors know what counts as 'average'?

This might mean that there is some agreed standard here described in extant literature – but, if so, this is not revealed. It might mean that the same instrument had previously been used to survey nationally or internationally to offer a baseline – but this is not reported. Some studies on similar themes carried out elsewhere are referred to, but it is not clear they used the same instrumentation or analytical scheme. Indeed, the reader is explicitly told very little about the instrument used:

"Semi-structured both open ended and close ended questionnaire was used for this study."

Gurung & Khanum, 2021, p.29276

The authors seem to have forgotten to discuss the development, validation and contents of the questionnaire – and any experts asked to evaluate the submission seem to have forgotten to look for this. I would actually suggest that the authors did not really use a questionnaire, but rather an assessment instrument.

Read about questionnaires

A questionnaire is used to survey opinions, views and so forth – and there are no right or wrong answers. (What type of music do you like? Oh jazz, sorry that's not the right answer.) As the authors evaluated and scored the student responses this was really an assessment.

The authors suggest:

"In this study the poor knowledge score was 15 (13%), average 80 (69.6%) and good knowledge score 20 (17.4%) among the 115 respondents. Out of the 115 respondents most of the respondent has average knowledge and very few 20 (17.4%) has good knowledge about earthquake and the preparedness of it."

Gurung & Khanum, 2021, p.29280

Perhaps this means that the authors had used some principled (but not revealed) technique to decide what counted as poor, average and good.

Score | Description
15 | poor knowledge
80 | average knowledge
20 | good knowledge

Descriptors applied to student scores on the 'questionnaire'

Alternatively, perhaps "poor knowledge score was 15 (13%), average 80 (69.6%) and good knowledge score 20 (17.4%)" is reporting what was found in terms of the distribution in this sample – that is, they empirically found these outcomes in this distribution.

Well, not actually these outcomes, of course, as that would suggest that a score of 20 is better than a score of 80, but presumably that is just a typographic error that was somehow missed by the authors when they made their submission, then missed by the editor who screened the paper for suitability (if there is actually an editor involved in the 'editorial' process for this journal), then missed by expert reviewers asked to scrutinise the manuscript (if there really were any), then missed by production staff when preparing proofs (i.e., one would expect this to have been raised as an 'author query' on proofs 4), and then missed again by authors when checking the proofs for publication.

If so, the authors found that most respondents got fairly typical scores, and fewer scored at the tails of the distribution – as one would expect. On any particular assessment, the average performance is (as the authors report here)…average.
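To illustrate the point (this is my own sketch, not anything the authors describe – the scores and the banding rule are invented), imagine labelling anything within one standard deviation of the sample mean as 'average'. With banding of that kind, most respondents will end up in the middle band by construction, whatever the test:

```python
# A minimal sketch (hypothetical scores and an assumed banding rule, not the
# authors' unreported method): banding scores relative to the sample itself
# virtually guarantees that most respondents are classified as 'average'.

import statistics
import random

random.seed(1)
scores = [random.gauss(12, 3) for _ in range(115)]  # invented test scores

mean = statistics.mean(scores)
sd = statistics.stdev(scores)

bands = {"poor": 0, "average": 0, "good": 0}
for s in scores:
    if s < mean - sd:
        bands["poor"] += 1
    elif s > mean + sd:
        bands["good"] += 1
    else:
        bands["average"] += 1

print(bands)  # roughly two-thirds land in 'average' for anything like a normal distribution
```

If something of this kind was done, then reporting that most students showed 'average' knowledge tells readers very little.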


Work cited:
  • Gurung, N. and Khanum, H. (2021) Preventive Practice on Earthquake Preparedness Among Higher Level Students of Dhaka City. Biomedical Journal of Scientific & Technical Research, July, 2020, Volume 37, 2, pp 29274-29281

Note:

1 The Biomedical Journal of Scientific & Technical Research defines its scope as including:

  • Agri and Aquaculture 
  • Biochemistry
  • Bioinformatics & Systems Biology 
  • Biomedical Sciences
  • Clinical Sciences
  • Chemical Engineering
  • Chemistry
  • Computer Science 
  • Economics & Accounting 
  • Engineering
  • Environmental Sciences
  • Food & Nutrition
  • General Science
  • Genetics & Molecular Biology
  • Geology & Earth Science
  • Immunology & Microbiology
  • Informatics
  • Materials Science
  • Orthopaedics
  • Mathematics
  • Medical Sciences
  • Nanotechnology
  • Neuroscience & Psychology
  • Nursing & Health Care
  • Pharmaceutical Sciences
  • Physics
  • Plant Sciences
  • Social & Political Sciences 
  • Veterinary Sciences 
  • Clinical & Medical 
  • Anesthesiology
  • Cardiology
  • Clinical Research 
  • Dentistry
  • Dermatology
  • Diabetes & Endocrinology
  • Gastroenterology
  • Genetics
  • Haematology
  • Healthcare
  • Immunology
  • Infectious Diseases
  • Medicine
  • Microbiology
  • Molecular Biology
  • Nephrology
  • Neurology
  • Nursing
  • Nutrition
  • Oncology
  • Ophthalmology
  • Pathology
  • Pediatrics
  • Physicaltherapy & Rehabilitation 
  • Psychiatry
  • Pulmonology
  • Radiology
  • Reproductive Medicine
  • Surgery
  • Toxicology

Such broad scope is a common characteristic of predatory journals.


2 The editor(s) of a research journal is normally a highly regarded academic in the field of the journal. I could not find the name of the editor of this journal although it has seven associate editors and dozens of people named as being on an 'editorial committee'. Whether any of these people actually carry out the functions of an academic editor or whether this work is delegated to non-academic office staff is a moot point.


3 The classes are given names. So, nursery classes include Lotus and Tulip and so forth. In the senior grades, the science classes are called:

  • Flora
  • Neon
  • Meson
  • Sigma
  • Platinam [sic]
  • Argon
  • Electron
  • Neutron
  • Proton
  • Redon [sic]

4 Production staff are not expected to be experts in the topic of the paper, but they do note any obvious omissions (such as missing references) or likely errors and list these as 'author queries' for authors to respond to when checking 'proofs', i.e., the article set in the journal format as it will be published.

Assessing Chemistry Laboratory Equipment Availability and Practice

Comparative education on a local scale?

Keith S. Taber

Image by Mostafa Elturkey from Pixabay 

I have just read a paper in a research journal which compares the level of chemistry laboratory equipment and 'practice' in two schools in the "west Gojjam Administrative zone" (which according to a quick web-search is in the Amhara Region in Ethiopia). According to Yesgat and Yibeltal (2021),

"From the analysis of Chemistry laboratory equipment availability and laboratory practice in both … secondary school and … secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice. From the data analysis average chemistry laboratory equipment availability and status of laboratory practice of … secondary school is better than that of Jiga secondary school."

Yesgat and Yibeltal, 2021: abstract [I was tempted to omit the school names in this posting as I was not convinced the schools had been treated reasonably, but the schools are named in the very title of the article]

Now that would seem to be something that could clearly be of interest to teachers, pupils, parents and education administrators in those two particular schools, but it raises the question that can be posed in relation to any research: 'so what?' The findings might be a useful outcome of enquiry in its own context, but what generalisable knowledge does this offer that justifies its place in the research literature? Why should anyone outside of West Gojjam care?

The authors tell us,

"There are two secondary schools (Damot and Jiga) with having different approach of teaching chemistry in practical approach"

Yesgat and Yibeltal, 2021: 96

So, this suggests a possible motivation.

  • If these two approaches reflect approaches that are common in schools more widely, and
  • if these two schools can be considered representative of schools that adopt these two approaches, and
  • if 'Chemistry Laboratory Equipment Availability and Practice' can be considered to be related to (a factor influencing? an effect of?) these different approaches, and
  • if the study validly and reliably measures 'Chemistry Laboratory Equipment Availability and Practice', and
  • if substantive differences are found between the schools

then the findings might well be of wider interest. As always in research, the importance we give to findings depends upon a whole logical chain of connections that collectively make an argument.

Spoiler alert!

At the end of the paper, I was none the wiser what these 'different approaches' actually were.

A predatory journal

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies. One of the articles that IS on an educational topic is called 'Assessment of Chemistry Laboratory Equipment Availability and Practice: A Comparative Study Between Damot and Jiga Secondary Schools' (Yesgat & Yibeltal, 2021).

Comparative education?

Yesgat and Yibeltal imply that their study falls in the field of comparative education. 1 They inform readers that 2,

"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. Most compartivest states [sic] that comparative education has four main purposes. These are:

To describe educational systems, processes or outcomes

To assist in development of educational institutions and practices

To highlight the relationship between education and society

To establish generalized statements about education that are valid in more than one country

Yesgat & Yibeltal, 2021: 95-96
Comparative education studies look to characterise (national) education systems in relation to their social/cultural contexts (Image by Gerd Altmann from Pixabay)

Of course, like any social construct, 'comparative education' is open to interpretation and debate: for example, the idea "that comparative education brings together data about two or more national systems of education, and comparing and contrasting those data" has been characterised as "a naive and obvious answer to the question of what constitutes comparative education" (Turner, 2019, p.100).

There is then some room for discussion over whether particular research outputs should count as 'comparative education' studies or not. Many comparative education studies do not actually compare two educational systems, but rather report in detail from a single system (making possible subsequent comparisons across several such studies). These educational systems are usually understood as national systems, although there may be a good case to explore regional differences within a nation if regions have autonomous education systems and these can be understood in terms of broader regional differences.

Yet, studying one aspect of education within one curriculum subject at two schools in one educational administrative area of one region of one country cannot be understood as comparative education without doing excessive violence to the notion. This work does not characterise an educational system at national, regional or even local level.

My best assumption is that as the study is comparing something (in this case an aspect of chemistry education in two different schools) the authors feel that makes it 'comparative education', by which account of course any educational experiment (comparing some innovation with some kind of comparison condition) would automatically be a comparative education study. We all make errors sometimes, assuming terms have broader or different meanings than their actual conventional usage – and may indeed continue to misuse a term till someone points this out to us.

This article was published in what claims to be a peer reviewed research journal, so the paper was supposedly evaluated by expert reviewers who would have provided the editor with a report on strengths and weaknesses of the manuscript, and highlighted areas that would need to be addressed before possible publication. Such a reviewer would surely have reported that 'this work is not comparative education, so the paragraph on comparative education should either be removed, or authors should contextualise it to explain why it is relevant to their study'.

The weak links in the chain

A research report makes certain claims that derive from a chain of argument. To be convinced about the conclusions you have to be convinced about all the links in the chain, such as:

  • sampling (were the right people asked?)
  • methodology (is the right type of research design used to answer the research question?)
  • instrumentation (is the data collection instrument valid and reliable?)
  • analysis (have appropriate analytical techniques been carried out?)

These considerations cannot be averaged: if, for example, a data collection instrument does not measure what it is said to measure, then it does not matter how good the sample, or how careful the analysis, the study is undermined and no convincing logical claims can be built. No matter how skilled I am in using a tape measure, I will not be able to obtain accurate weights with it.

Sampling

The authors report the make up of their sample – all the chemistry teachers in each school (13 in one, 11 in the other), plus ten students from each of grades 9, 10 and 11 in each school. They report that "… 30 natural science students from Damot secondary school have been selected randomly. With the same technique … 30 natural sciences students from Jiga secondary school were selected".

Random selection helps to ensure there is no systematic bias in a sample, but it is helpful if the technique for randomisation is briefly reported, to assure readers that 'random' is not being used as a synonym for 'arbitrary' and that the technique applied was adequate (Taber, 2013b).

A random selection across a pooled sample is unlikely to lead to equal representation in each subgroup (From Taber, 2013a)

Actually, if 30 students had been chosen at random from the population of students taking natural sciences in one of the schools, it would be extremely unlikely they would be evenly spread, 10 from each year group. Presumably, the authors made random selections within these grade levels (which would be eminently sensible, but is not quite what they report).
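For readers who like to see the distinction made concrete, here is a minimal sketch (with invented student numbers per grade) contrasting a simple random draw of 30 from a pooled list with a stratified random selection of 10 within each grade:

```python
# A minimal sketch (hypothetical numbers): drawing 30 students at random from a
# pooled list rarely yields exactly 10 from each grade, whereas sampling 10
# within each grade (stratified random sampling) does so by design.

import random
from collections import Counter

random.seed(0)

# Hypothetical sampling frame: student IDs tagged by grade
frame = [("grade 9", i) for i in range(60)] + \
        [("grade 10", i) for i in range(55)] + \
        [("grade 11", i) for i in range(50)]

# Simple random sample of 30 from the pooled frame
pooled_sample = random.sample(frame, 30)
print(Counter(grade for grade, _ in pooled_sample))   # typically an uneven spread across grades

# Stratified random sample: 10 drawn at random within each grade
stratified_sample = []
for grade in ("grade 9", "grade 10", "grade 11"):
    students = [s for s in frame if s[0] == grade]
    stratified_sample.extend(random.sample(students, 10))
print(Counter(grade for grade, _ in stratified_sample))  # exactly 10 per grade
```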

Read about the criterion for randomness in research

Data collection

To collect data the authors constructed a questionnaire with Likert-type items.

"…questionnaire was used as data collecting instruments. Closed ended questionnaires with 23 items from which 8 items for availability of laboratory equipment and 15 items for laboratory practice were set in the form of "Likert" rating scale with four options (4=strongly agree, 3=agree, 2=disagree and 1=strongly disagree)"

Yesgat & Yibeltal, 2021: 96

These categories were further broken down (Yesgat & Yibeltal, 2021: 96): "8 items of availability of equipment were again sub grouped in to

  • physical facility (4 items),
  • chemical availability (2 items), and
  • laboratory apparatus (2 items)

whereas 15 items of laboratory practice were further categorized as

  • before actual laboratory (4 items),
  • during actual laboratory practice (6 items) and
  • after actual laboratory (5 items)

Internal coherence

So, there were two basic constructs, each broken down into three sub-constructs. This instrument was piloted,

"And to assure the reliability of the questionnaire a pilot study on a [sic] non-sampled teachers and students were conducted and Cronbach's Alpha was applied to measure the coefficient of internal consistency. A reliability coefficient of 0.71 was obtained and considered high enough for the instruments to be used for this research"

Yesgat & Yibeltal, 2021: 96

Running a pilot study can be very useful as it can highlight issues with items. However, although simply asking people to complete a questionnaire might highlight items people could not make any sense of, it may not be as useful as interviewing them about how they understood the items, to check that respondents interpret them in the same way as the researchers.

The authors cite the value of Cronbach's alpha to demonstrate their instrument has internal consistency. However, they seem to be quoting the value obtained in the pilot study, where the statistic strictly applies to a particular administration of an instrument (so the value from the main study is more relevant to the results reported).

More problematic, the authors appear to cite a value of alpha from across all 23 items (n.b., the value of alpha tends to increase as the number of items increases, so what is considered an acceptable value needs to allow for the number of items included) when these are actually two distinct scales: 'availability of laboratory equipment' and 'laboratory practice'. Alpha should be quoted separately for each scale – values across distinct scales are not useful (Taber, 2018). 3
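As a minimal sketch of that point (with entirely invented response data), alpha would be computed separately for each of the two scales, rather than as one figure across all 23 items:

```python
# A minimal sketch: Cronbach's alpha computed for each scale separately, not
# across all 23 items pooled. The response data here are invented purely
# for illustration.

import numpy as np

def cronbach_alpha(items):
    """items: 2D array, rows = respondents, columns = items on ONE scale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(42)
equipment_items = rng.integers(1, 5, size=(50, 8))    # 8 'availability' items, ratings 1-4
practice_items = rng.integers(1, 5, size=(50, 15))    # 15 'practice' items, ratings 1-4

print(cronbach_alpha(equipment_items))                # alpha for the first scale
print(cronbach_alpha(practice_items))                 # alpha for the second scale
# Pooling all 23 items into one alpha conflates two distinct constructs,
# and alpha tends to rise simply because there are more items.
```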

Do the items have face validity?

The items in the questionnaire are reported in appendices (pp.102-103), so I have tabulated them here so that readers can consider

  • (a) whether they feel these items reflect the constructs of 'availability of equipment' and 'laboratory practice';
  • (b) whether the items are phrased in a clear way for both teachers and students (the authors report "conceptually the same questionnaires with different forms were prepared" (p.101), but if this means different wording for teachers than for students, this is not elaborated – teachers were also asked demographic questions about their educational level); and
  • (c) whether they are all reasonable things to expect both teachers and students to be able to rate.
'Availability of equipment' items:

  • Structured and well-equipped laboratory room
  • Availability of electric system in laboratory room
  • Availability of water system in laboratory room
  • Availability of laboratory chemicals are available [sic]
  • No interruption due to lack of lab equipment
  • Isolated bench to each student during laboratory activities
  • Chemicals are arranged in a logical order.
  • Laboratory apparatus are arranged in a logical order

'Laboratory practice' items:

  • You test the experiments before your work with students
  • You give laboratory manuals to student before practical work
  • You group and arrange students before they are coming to laboratory room
  • You set up apparatus and arrange chemicals for activities
  • You follow and supervise students when they perform activities
  • You work with the lab technician during performing activity
  • You are interested to perform activities?
  • You check appropriate accomplishment of your students' work
  • Check your students' interpretation, conclusion and recommendations
  • Give feedbacks to all your students work
  • Check whether the lab report is individual work or group
  • There is a time table to teachers to conduct laboratory activities.
  • Wear safety goggles, eye goggles, and other safety equipment in doing so
  • Work again if your experiment is failed
  • Active participant during laboratory activity

Items teachers and students were asked to rate on a four point scale (agree / strongly agree / disagree / strongly disagree)

Perceptions

One obvious limitation of this study is that it relies on reported perceptions.

One way to find out about the availability of laboratory equipment might be to visit teaching laboratories and survey them with an observation schedule – and perhaps even make a photographic record. The questionnaire assumes that teacher and student perceptions are accurate and that honest reports would be given (might teachers have had an interest in offering a particular impression of their work?)

Sometimes researchers are actually interested in impressions (e.g., for some purposes whether a student considers themselves a good chemistry student may be more relevant than an objective assessment), and sometimes researchers have no direct access to a focus of interest and must rely on other people's reports. Here it might be suggested that a survey by questionnaire is not really the best way to, for example, "evaluate laboratory equipment facilities for carrying out practical activities" (p.96).

Findings

The authors describe their main findings as,

"Chemistry laboratory equipment availability in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment. This finding supported by the analysis of one sample t-values and as it indicated the average availability of laboratory equipment are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual availability of equipment to the expected test value (2.5).

Chemistry laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average chemistry laboratory practice. This finding supported by the analysis of one sample t-values and as it indicated the average chemistry laboratory practice are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual chemistry laboratory practice to the expected test value."

Yesgat & Yibeltal, 2021: 101 (emphasis added)

This is the basis for the claim in the abstract that "From the analysis of Chemistry laboratory equipment availability and laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice."

'The average …': what is the standard?

But this raises a key question – how do the authors know what the "the average availability of chemistry laboratory equipment and status of laboratory practice" is, if they have only used their questionnaire in two schools (which are both found to be below average)?

Yesgat & Yibeltal have run a comparison between the average ratings they obtained from the two schools on their two scales and the 'average test value' rating of 2.5. As far as I can see, this is not an empirical value at all. It seems the authors have just assumed that if people are asked to use a four point scale – 1, 2, 3, 4 – then the average rating will be…2.5. Of course, that is a completely arbitrary assumption. (Consider the question – 'how much would you like to be beaten and robbed today?': would the average response be likely to be the nominal mid-point of the rating scale?) Perhaps if a much wider survey had been undertaken the actual average rating would have been 1.9 or 2.7 or …

That is even assuming that 'average' is a meaningful concept here. A four point Likert scale is an ordinal scale ('agree' always indicates less agreement than 'strongly agree' and more than 'disagree') but not an interval scale (that is, it cannot be assumed that the perceived 'agreement' gap (i) from 'strongly disagree' to 'disagree' is the same for each respondent and the same as that (ii) from 'disagree' to 'agree' and (iii) from 'agree' to 'strongly agree'). Strictly, Likert scale ratings cannot be averaged (they are better presented as bar charts showing frequencies of response) – so although the authors carry out a great deal of analysis, much of this is, strictly, invalid.
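As a minimal sketch of the alternative (with invented responses), frequencies per category can be reported directly, and a quick comparison of two arbitrary numerical codings shows why a 'mean' of such ratings, and an assumed 'average' of 2.5, carries little meaning:

```python
# A minimal sketch (invented responses): ordinal Likert data are better reported
# as frequencies per category than as a mean of the arbitrary numerical codes.

from collections import Counter

responses = (["strongly disagree"] * 12 + ["disagree"] * 25 +
             ["agree"] * 40 + ["strongly agree"] * 23)

print(Counter(responses))   # frequencies per category: the honest summary

# Averaging requires assigning numbers to the categories, and the result
# depends entirely on that (arbitrary) coding:
coding_a = {"strongly disagree": 1, "disagree": 2, "agree": 3, "strongly agree": 4}
coding_b = {"strongly disagree": 0, "disagree": 3, "agree": 7, "strongly agree": 10}

for coding in (coding_a, coding_b):
    values = [coding[r] for r in responses]
    print(sum(values) / len(values))  # different 'averages' from the same responses
# Neither number tells us whether the sample sits above or below some assumed
# 'average' of 2.5 in any meaningful sense.
```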

So what has been found out from this study?

I would very much like to know what peer reviewers made of this study. Expert reviewers would surely have identified some very serious weaknesses in the study and would have been expected to have recommended some quite major revisions even if they thought it might eventually be publishable in a research journal.

An editor is expected to take on board referee evaluations and ask authors to make such revisions as are needed to persuade the editor the submission is ready for publication. It is the job of the editor of a research journal, supported by the peer reviewers, to

a) ensure work of insufficient quality is not published

b) help authors strengthen their paper to correct errors and address weaknesses

Sometimes this process takes some time, with a number of cycles of revision and review. Here, however, the editor was able to move to a decision to publish in 5 days.

The study reflects a substantive amount of work by the authors. Yet, it is hard to see how this study, at least as reported in this journal, makes a substantive contribution to public knowledge. The study finds that one school has somewhat higher ratings than another on an instrument that has not been fully validated, based on a pooling of student and teacher perceptions, and guesses that both schools rate lower than a hypothetical 'average' school. The two schools were supposed to represent "different approach[es] of teaching chemistry in practical approach" – but even if that is the case, the authors have not shared with their readers what these different approaches are meant to be. So, there would be no possibility of generalising from the schools to 'approach[es] of teaching chemistry', even if that were logically justifiable. And comparative education it is not.

This study, at least as published, does not seem to offer useful new knowledge to the chemistry education community that could support teaching practice or further research. Even in the very specific context of the two specific schools it is not clear what can be done with the findings which simply reflect back to the informants what they have told the researchers, without exploring the reasons behind the ratings (how do different teachers and students understand what counts as 'Chemicals are arranged in a logical order') or the values the participants are bringing to the study (is 'Check whether the lab report is individual work or group' meant to imply that it is seen as important to ensure that students work cooperatively or to ensure they work independently or …?)

If there is a problem highlighted here by the "very low levels" (based on a completely arbitrary interpretation of the scales) there is no indication of whether this is due to resourcing of the schools, teacher preparation, levels of technician support, teacher attitudes or pedagogic commitments, timetabling problems, …

This seems to be a study which has highlighted two schools, invited teachers and students to complete a dubious questionnaire, and simply used this to arbitrarily characterise the practical chemistry education in the schools as very poor, without contextualising any challenges or offering any advice on how to address the issues.

Work cited:
Note:

1 'Imply', as Yesgat and Yibeltal do not actually state that they have carried out comparative education. However, if they do not think so, then the paragraph on comparative education in their introduction has no clear relationship with the rest of the study and is no more than a gratuitous reference, like suddenly mentioning Nottingham Forest's European Cup triumphs or noting a preferred flavour of tea.


2 This seemed an intriguing segment of the text, as it was largely written in a more sophisticated form of English than the rest of the paper, apart from the odd reference to "Most compartivest [comparative education specialists?] states…", which seemed to stand out from the rest of the segment. Yesgat and Yibeltal do not present this as a quote, but cite a source informing their text (their reference [4]: Joubish, 2009). However, their text is very similar to that in another publication:

Quote from Mbozi, 2017, p.21 | Quote from Yesgat and Yibeltal, 2021, pp.95-96
"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses." | One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses.
This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. | This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action.
Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. | Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes.
The exposure facilitates our adoption of best practices. | –
Some purposes of comparative education were not covered in your exercise above. | –
Purposes of comparative education suggested by two authors Noah (1985) and Kidd (1975) are presented below to broaden your understanding of the purposes of comparative education. | –
Noah, (1985) states that comparative education has four main purposes [4] and these are: | Most compartivest states that comparative education has four main purposes. These are:
1. To describe educational systems, processes or outcomes | • To describe educational systems, processes or outcomes
2. To assist in development of educational institutions and practices | • To assist in development of educational institutions and practices
3. To highlight the relationship between education and society | • To highlight the relationship between education and society
4. To establish generalized statements about education, that are valid in more than one country." | • To establish generalized statements about education that are valid in more than one country"

Comparing text (broken into sentences to aid comparison) from two sources

3 There are more sophisticated techniques which can be used to check whether items do 'cluster' as expected for a particular sample of respondents.


4 As suggested above, researchers can pilot instruments with interviews or 'think aloud' protocols to check if items are understood as intended. Asking assumed experts to read through and check 'face validity' is of itself quite a limited process, but can be a useful initial screen to identify items of dubious relevance.

What COVID really likes

Researching viral preferences

Keith S. Taber

When I was listening to the radio news I heard a clip of the Rt. Hon. Sajid Javid MP, the U.K. Secretary of State for Health and Social Care, talking about the ongoing response to the COVID pandemic:

Health Secretary Sajid Javid talking on 12th September

"Now that we are entering Autumn and Winter, something that COVID and other viruses, you know, usually like, the prime minister this week will be getting out our plans to manage COVID over the coming few months."

Sajid Javid

So, COVID and other viruses usually like Autumn and Winter (by implication, presumably, in comparison with Spring and Summer).

This got me wondering how we (or Sajid, at least) could know what the COVID virus (i.e., SARS-CoV-2 – severe acute respiratory syndrome coronavirus 2) prefers – what the virus 'likes'. I noticed that Mr Javid offered a modal qualification to his claim: usually. It seemed 'COVID and other viruses' did not always like Autumn and Winter, but usually did.

Yet there was a potential ambiguity here, depending on how one parsed the claim. Was he suggesting that

[COVID and other viruses] usually like Autumn and Winter

or

COVID [and other viruses usually] like Autumn and Winter

This might have been clearer in a written text as either

COVID and other viruses usually like Autumn and Winter

or

COVID, and other viruses usually, like Autumn and Winter

The second option may seem a little awkward in its phrasing, 1 but then not all viral diseases are more common in the Winter months, and some are considered to be due to 'Summer viruses':

"Adenovirus, human bocavirus (HBoV), parainfluenza virus (PIV), human metapneumovirus (hMPV), and rhinovirus can be detected throughout the year (all-year viruses). Seasonal patterns of PIV are type specific. Epidemics of PIV type 1 (PIV1) and PIV type 3 (PIV3) peak in the fall [Autumn] and spring-summer, respectively. The prevalence of some non-rhinovirus enteroviruses increases in summer (summer viruses)"


Moriyama, Hugentobler & Iwasaki, 2020: 86

Just a couple of days later Mr Javid was being interviewed on the radio, and he made a more limited claim:

Health Secretary Sajid Javid talking on BBC Radio 4's 'Today' programme, 15th September

"…because we know Autumn and Winter, your COVID is going to like that time of year"

Sajid Javid

So, this claim was just about the COVID virus, not viruses more generally, and that we know that COVID is going to like Autumn and Winter. No ambiguity there. But how do we know?

Coming to knowledge

Historically there have been various ways of obtaining knowledge.

  • Divine revelation: where God reveals the knowledge to someone, perhaps through appearing to the chosen one in a dream.
  • Consulting an oracle, or a prophet or some other kind of seer.
  • Intuiting the truth by reflecting on the nature of things using the rational power of the human intellect.
  • Empirical investigation of natural phenomena.

My focus in this blog is related to science, and given that we are talking about public health policy in modern Britain, I would like to think Mr Javid was basing his claim on the latter option. Of course, even empirical methods depend upon some metaphysical assumptions. For example, if one assumes the cosmos has inbuilt connections one might look for evidence in terms of sympathies or correspondences. Perhaps, if the COVID virus was observed closely and looked like a snowflake, that could (in this mindset) be taken as a sign that it liked Winter.

A snowflake – or is it a virus particle?
(Image by Gerd Altmann from Pixabay)

Sympathetic magic

This kind of correspondence, a connection indicated by appearance, was once widely accepted, so that a plant which was thought to resemble some part of the anatomy might be assumed to be an appropriate medicine for diseases or disorders associated with that part of the body.

This is a kind of magic, and might seem a 'primitive' belief to many people today, but such an idea was sensible enough in the context of a common set of underlying beliefs about the nature and purposes of the world, and the place and role of people in that world. One might expect that specific beliefs would soon die out if, for example, the plant shaped like an ear turned out to do nothing for ear ache. Yet, at a time when medical practitioners could offer little effective treatment, and being sent to a hospital was likely to reduce life expectancy, herbal remedies at least often (if not always) did no harm.

Moreover, many herbs do have medicinal properties, and something with a general systemic effect might work as topical medicine (i.e., when applied to a specific site of disease). Add to that, the human susceptibility to confirmation bias (taking more notice of, and giving more weight to, instances that meet our expectations than those which do not) and the placebo effect (where believing we are taking effective medication can sometimes in itself have beneficial effects) and the psychological support offered by spending time with an attentive practitioner with a good 'bedside' manner – and we can easily see how beliefs about treatments may survive limited definitive evidence of effectiveness.

The gold standard of experimental method

Of course, today, we have the means to test such medicines by taking a large representative sample of a population (of ear ache sufferers, or whatever), randomly dividing them into two groups, and, using a double-blind (or should that be double-deaf?) approach, treating them with the possible medicine or a placebo, without either the patient or the practitioner knowing who was getting which treatment. (The researchers have a way to know, of course – or it would be difficult to deduce anything from the results.) That is, the randomised control trial (RCT).
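A minimal sketch of that allocation step (with hypothetical participants, and no claim that any real trial would be run from a few lines of code) might look like this:

```python
# A minimal sketch (hypothetical participants): random assignment to medicine
# or placebo, with the allocation key held back from patients and practitioners
# ('blinding') until the analysis stage.

import random

random.seed(7)
participants = [f"patient_{i:03d}" for i in range(1, 21)]

shuffled = participants[:]
random.shuffle(shuffled)

allocation = {p: ("medicine" if i < len(shuffled) // 2 else "placebo")
              for i, p in enumerate(shuffled)}

# Neither patients nor practitioners see this mapping during the trial;
# it is only revealed when the results are analysed.
print(allocation)
```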

Now, I have been very critical of the notion that these kinds of randomised experimental designs should automatically be seen as the preferred way of testing educational innovations (Taber, 2019) – but in situations where control of variables and 'blinding' is possible, and where randomisation can be applied to samples of well-defined populations, this does deserve to be considered the gold standard. (It is when the assumptions behind a research methodology do not apply that we should have reservations about using it as a strategy for enquiry.)

So can the RCT approach be used to find out if COVID has a preference for certain times of year? I guess this depends on our conceptual framework for the research (e.g., how do we understand what a 'like' actually is) and the theoretical perspective we adopt.

So, for example, behaviourists would suggest that it is not useful to investigate what is going on in someone's mind (perhaps some behaviourists do not even think the mind concept corresponds to anything real), so we should observe behaviours that allow us to make inferences. This has to be done with care. Someone who buys and eats lots of chocolate presumably likes chocolate, and someone who buys and listens to a lot of reggae probably likes reggae, but a person who cries regularly, or someone who stumbles around and has frequent falls, does not necessarily like crying, or falling over, respectively.

A viral choice chamber

So, we might think that woodlice prefer damp conditions because we have put a large number of woodlice in choice chambers with different conditions (dry and light, dry and dark, damp and light, damp and dark) and found that there was a statistically significant excess of woodlice settling down in the damp sections of the chamber.
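For readers who want that reasoning made concrete, here is a minimal sketch (with invented counts) of how such a choice-chamber result might be tested against the 'no preference' expectation using a chi-squared goodness-of-fit test:

```python
# A minimal sketch (hypothetical counts): testing whether woodlice are spread
# across the four chamber conditions differently from what chance would predict,
# using a chi-squared goodness-of-fit test.

from scipy.stats import chisquare

# Observed numbers of woodlice settling in each quadrant
observed = [4, 6, 21, 17]   # dry/light, dry/dark, damp/light, damp/dark

# Under the null hypothesis (no preference) we expect an even split
expected = [sum(observed) / 4] * 4

statistic, p_value = chisquare(observed, f_exp=expected)
print(statistic, p_value)   # a small p-value suggests the uneven spread is unlikely by chance alone
```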

Of course, to infer preferences from behaviour – or even to use the term 'behaviour' – for some kinds of entity is questionable. (To think that woodlice make a choice based on what they 'like' might seem to assume a level of awareness that they perhaps lack?) In a cathode ray tube electrons subject to a magnetic field may be observed (indirectly!) to move to one side of the tube, just as woodlice might congregate in one chamber, but I am not sure I would describe this as electrons liking that part of the tube. I think it can be better explained with concepts such as electrical charge, fields, forces, and momentum.

It is difficult to see how we can do double blind trials to see which season a virus might like, as if the COVID virus really does like Winter, it must surely have a way of knowing when it is Winter (making blinding impossible). In any case, a choice chamber with different sections at different times of the year would require some kind of time portal installed between its sections.

Like electrons, but unlike woodlice, COVID viral particles do not have an active form of transport available to them. Rather, they tend to be sneezed and coughed around and then subject to the breeze, or deposited by contact with surfaces. So I am not sure that observing virus 'behaviour' helps here.

So perhaps a different methodology might be more sensible.

A viral opinion poll

A common approach to find out what people like would be a survey. Surveys can sometimes attract responses from large numbers of respondents, which may seem to give us confidence that they offer authentic accounts of widespread views. However, sample size is perhaps less important than sample representativeness. Imagine carrying out a survey of people's favourite football teams at a game at Stamford Bridge; or undertaking a survey of people's favourite bands as people queued to enter a King Crimson concert! The responses may [sic, almost certainly would] not fully reflect the wider population due to the likely bias in such samples. Would these surveys give reliable results which could be replicated if repeated at the Santiago Bernabeu or at a Marillion concert?

How do we know what 'COVID 'really likes?
(Original Images by OpenClipart-Vectors and Gordon Johnson from Pixabay)

A representative sample of variants?

This might cause problems with the COVID-19 virus (SARS-CoV-2). What counts as a member of the population – perhaps a viable virus particle? Can we even know how big the population actually is at the time of our survey? The virus is infecting new cells, leading to new virus particles being produced all the time, just as shed particles become non-viable all the time. So we have no reliable knowledge of population numbers.

Moreover, a survey needs a representative sample: do the numbers of people in a sample of a human population reflect the wider population in relevant terms (be that age, gender, level of educational qualifications, earnings, etc.)? There are viral variants leading to COVID-19 infection – and quite a few of them. That is, SARS-CoV-2 is a class with various subgroups. The variants replicate to different extents under particular conditions, and new variants appear from time to time.

So, the population profile is changing rapidly. In recent months in the UK nearly all infections where the variant has been determined are due to the variant VOC-21APR-02 (or B.1.617.2, or Delta), but many people will be infected asymptomatically or with mild symptoms and not be tested, so this does not necessarily mean that VOC-21APR-02 dominates the SARS-CoV-2 population as a whole to the extent it currently dominates in investigated cases. Assuming otherwise would be like gauging public opinion from the views of those particular people who make themselves salient by attending a protest, e.g.:

"Shock finding – 98% of the population would like to abolish the nuclear arsenal,

according to a [hypothetical] survey taken at the recent Campaign for Nuclear Disarmament march"

In any case, surveys are often fairly blunt instruments as they need to present objectively the same questions to all respondents, and elicit responses in a format that can be readily classified into a discrete number of categories. This is why many questionnaires use Likert type items:

Would you say you like Autumn and Winter:

1 = Always   2 = Nearly always   3 = Usually   4 = Sometimes   5 = Never

Such 'objective' measures are often considered to avoid the subjective nature of some other types of research. It may seem that responses do not need to be interpreted – but of course this assumes that the researchers and all the respondents understand language the same way (what exactly counts as Autumn and Winter? What does 'like' mean? How is 'usually' understood – 60-80% of the time, or 51-90% of the time or…). We can usually (sic) safely assume that those with strong language competence will have somewhat similar understandings of terms, but we cannot know precisely what survey participants meant by their responses or to what extent they share a meaning for 'usually'.

There are so-called 'qualitative surveys' which eschew this kind of objectivity to get more in-depth engagement with participants. They will usually use interviews where the researcher can establish rapport with respondents and ask them about their thoughts and feelings, observe non-verbal signals such as facial expressions and gestures, and use follow-up questions… However, the greater insight into individuals comes at a cost of smaller samples as these kinds of methods are more resource-intensive.

But perhaps Mr Javid does not actually mean that COVID likes Autumn and Winter?

So, how did the Department of Health & Social Care, or the Health Secretary's scientific advisors, find out that COVID (or the COVID virus) likes Autumn and Winter? The virus does not think, or feel, and it does not have preferences in the way we do. It does not perceive hot or cold, and it does not have a sense of time passing, or of the seasons.2 COVID does not like or dislike anything.

Mr Javid needs to make himself clear to a broad public audience, so he has to avoid too much technical jargon. It is not easy to pitch a presentation for such an audience and be pithy, accurate, and engaging, but it is easy for someone (such as me) to be critical when not having to face this challenge. Cabinet ministers, unlike science teachers, cannot be expected to have skills in communicating complex and abstract scientific ideas in simplified and accessible forms that remain authentic to the science.

It is easy and perhaps convenient to use anthropomorphic language to talk about the virus, and this will likely make the topic seem accessible to listeners, but it is less clear what is actually meant by a virus liking a certain time of year. In teaching the use of anthropomorphic language can be engaging, but it can also come to stand in place of scientific understanding when anthropomorphic statements are simply accepted uncritically at face value. For example, if the science teacher suggests "the atom wants a full shell of electrons" then we should not be surprised that students may think this is a scientific explanation, and that atoms do want to fill their shells. (They do not of course. 3)

Image by Gordon Johnson from Pixabay

Of course Mr Javid's statements cannot be taken as literal claims about what the virus likes – my point in this posting is to provoke the question of what they might be intended to mean. The claim is surely intended metaphorically (at least if Mr Javid had thought about it critically): perhaps that there is higher incidence of infection or serious illness caused by the COVID virus in the Winter. But by that logic, I guess turkeys really would vote for Christmas (or Thanksgiving) after all.

Typically, some viruses cause more infection in the Winter when people are more likely to mix indoors and when buildings and transport are not well ventilated (both factors being addressed in public health measures and advice in regard to COVID-19). Perhaps 'likes' here simply means that the conditions associated with a higher frequency/population of virus particles occur in Autumn and Winter?

A snowflake.
The conditions suitable for a higher frequency of snowflakes are more common in Winter.
So do snowflakes also 'like' Winter?
(Image by Gerd Altmann from Pixabay)

However, this is some way from assigning 'likes' to the virus. After all, in evolutionary terms, a virus might 'prefer', so to speak, to only be transmitted asymptomatically, as it cannot be in the virus's 'interests', so to speak, to encourage a public health response that will lead to vaccines or measures to limit the mixing of people.

If COVID could like anything (and of course it cannot), I would suggest it would like to go 'under the radar' (another metaphor) and be endemic in a population that was not concerned about it (perhaps doing so little harm it is not even noticed, such that people do not change their behaviours). It would then only 'prefer' a Season to the extent that that time of year brings conditions which allow it to go about its life cycle without attracting attention – from Mr Javid or anyone else.

Keith S. Taber, September 2021

Addendum: 1st December 2021

Déjà vu?

The health secretary was interviewed on 1st December

"…we have always known that when it gets darker, it gets colder, the virus likes that, the flu virus likes that and we should not forget that's still lurking around as well…"

Rt. Hon. Sajid Javid MP, the U.K. Secretary of State for Health and Social Care, interviewed on BBC Radio 4 Today programme, 1st December, 2021
Works cited:
Footnotes:

1. It would also seem to be a generalisation based on the only two Winters that the COVID-19 virus had 'experienced'

2. Strictly I cannot know what it is like to be a virus particle. But a lot of well-established and strongly evidenced scientific principles would be challenged if a virus particle is sentient.

3. Yet this is a VERY common alternative conception among school children studying chemistry: The full outer shells explanatory principle

Related reading:

So who's not a clever little virus then?

COVID is like a fire because…

Anthropomorphism in public science discourse

Psychological skills, academic achievement and…swimming

Keith S. Taber

'Psychological Skills in Relation to Academic Achievement through Swimming Context'

 


Original image by Clker-Free-Vector-Images from Pixabay

I was intrigued by the title of an article I saw in a notification: "Psychological Skills in Relation to Academic Achievement through Swimming Context". In part, it was the 'swimming context' – despite never having been very athletic or sporty (which is not to say I did not enjoy sports, just that I was never particularly good at any), I have always been a regular and enthusiastic swimmer.  Not a good swimmer, mind (too splashy, too easily veering off-line) – but an enthusiastic one. But I was also intrigued by the triad of psychological skills, academic achievement, and swimming.

Perhaps I had visions of students' psychological skills being tested in relation to their academic achievement as they pounded up and down the pool. So, I was tempted to follow this up.

Investigating psychological skills and academic achievement

The abstract of the paper by Bayyat and colleagues reported three aims for their study:

"This study aimed to investigate:

  • (1) the level of psychological skills among students enrolled in swimming courses at the Physical Education faculties in the Jordanian Universities.
  • (2) the relation between their psychological skills and academic achievement.
  • (3) the differences in these psychological skills according to gender."

Bayyat et al., 2021: 4535

The article was published in a journal called 'Psychology and Education', which, its publishers* suggest, is "a quality journal devoted to basic research, theory, and techniques and arts of practice in the general field of psychology and education".

A peer reviewed journal

The peer review policy reports this is a double-blind peer-reviewed journal. This means other academics have critiqued and evaluated a submission prior to its being accepted for publication. Peer review is a necessary (but not sufficient) condition for high quality research journals.

Journals with high standards use expert peer reviewers, and the editors use their reports to both reject low-quality submissions, and to seek to improve high-quality submissions by providing feedback to authors about points that are not clear, any missing information, incomplete chains of argumentation, and so forth. In the best journals editors only accept submissions after reviewers' criticisms have been addressed to the satisfaction of reviewers (or authors have made persuasive arguments for why some criticism does not need addressing).

(Read about peer review)

The authors here report that

"The statistical analysis results revealed an average level of psychological skills, significant differences in psychological skills level in favor of female students, A students and JU[**], and significant positive relation between psychological skills and academic achievement."

Bayyat et al., 2021: 4535

Rewriting slightly it seems that, according to this study:

  • the students in the study had average levels of psychological skills;
  • the female students have higher levels of psychological skills than their male peers; and
  • there was some kind of positive correlation between psychological skills and academic achievement.

Anyone reading a research paper critically asks themselves questions such as

  • 'what do they mean by that?';
  • 'how did they measure that?;
  • 'how did they reach that conclusion?'; and
  • 'who does this apply to?'

Females are better – but can we generalise?

In this study it was reported that

"By comparing psychological skills between male and female participants, results revealed significant differences in favor [sic] of female participants"

"All psychological skills' dimensions of female participants were significant in favor [sic] of females compared to their male peers. They were more focused, concentrated, confident, motivated to achieve their goals, and sought to manage their stress."

Bayyat et al., 2021: 4541, 4545

"It's our superior psychological skills that make us this good!" (Image by CristianoCavina from Pixabay)

A pedant (such as the present author) might wonder whether "psychological skills' dimensions of female participants" [cf. psychological skills' dimensions of male participants?] would not be inherently likely to be in favour of females, but it is clear from the paper that this is intended to refer to the finding that females (as a group) got significantly higher ratings than males (as a group) on the measures of 'psychological skills'.

If we for the moment (but please read on below…) accept these findings as valid, an obvious question is the extent to which these results might generalise beyond the study. That is, to what extent would these findings being true for the participants of this study imply the same thing would be found more widely (e.g., among all students in Jordanian Universities? among all university students? Among all adult Jordanians? among all humans?)

Statistical generalisation
Statistical generalisation (From Taber, 2019)

Two key concepts here are the population and the sample. The population is the group that we wish our study to be about (e.g., chemistry teachers in English schools, 11-year olds in New South Wales…), and the sample is the group who actually provide data. In order to generalise to the population from the sample it is important that the sample is large enough and representative of the population (which of course may be quite difficult to ascertain).

(Read about sampling in research)

(Read about generalisation)

In this study the reader is told that "The population of this study was undergraduate male and female students attending both intermediate and advanced swimming courses" (Bayyat et al., 2021: 4536). Taken at face value this might raise the question of why a sample was drawn exclusively from Jordan – unless of course this is the only national context where students attend intermediate or advanced swimming courses. *** However, it was immediately clarified that "They consisted of (n= 314) students enrolled at the schools of Sport Sciences at three state universities". That is, the population was actually undergraduate male and female students from schools of Sport Sciences at three Jordanian state universities attending both intermediate and advanced swimming courses.

"The Participants were an opportunity sample of 260 students" (Bayyat et al., 2021: 4536). So in terms of sample size, 260, the sample made up most of the population – almost 83%. This is in contrast to many educational studies where the samples may necessarily only reflect a small proportion of the population. In general, representatives of a sample is more important than size as skew in the sample undermines statistical generalisations (whereas size, for a representative sample, influences the magnitude of the likely error ****) – but a reader is likely to feel that when over four-fifths of the population were sampled it is less critical that a convenience sample was used.

This still does not assure us that the results can be generalised to the population (students from schools of Sport Sciences at three Jordanian state universities attending 'both' intermediate and advanced swimming courses), but psychologically it seems quite convincing.

Ontology: What are we dealing with?

The study is only useful if it is about something that readers think is important – and it is clear what it is about. The authors tells us their study is about

  • Psychological Skills
  • Academic Achievement

which would seem to be things educators should be interested in. We do need to know however how the authors understand these constructs: what do they mean by 'a Psychological Skill' and 'Academic achievement'? Most people would probably think they have a pretty good idea what these terms might mean, but that is no assurance at all that different people would agree on this.

So, in reading this paper it is important to know what the authors themselves mean by these terms – so a reader can check they understand these terms in a sufficiently similar way.

What is academic achievement?

The authors suggest that

"academic achievement reflects the learner's accomplishment of specific goals by the end of an educational experience in a determined amount of time"

Bayyat et al., 2021: 4535


This seems to be  the extent of the general characterisation of this construct in the paper *****.

What are psychological skills?

The authors tell readers that

"Psychological skills (PS) are a group of skills and abilities that enhances peoples' performance and achievement…[It has been] suggested that PS includes a whole set of trainable skills including emotional control and self-confidence"

Bayyat et al., 2021: 4535


For the purposes of this particular study, they

"identified the psychological skills related to the swimming context such as; leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention"

Bayyat et al., 2021: 4536


So the relevant skills are considered to be:

  • leadership
  • emotional stability
  • sport achievement motivation
  • self-confidence
  • stress management
  • attention

I suspect there would not be complete consensus among psychologists, or people working in education, over whether all of these constructs actually are 'skills'. Someone who did not consider these (or some of these) characteristics to be skills would need to read the authors' claims about 'psychological skills' arising from the study accordingly (i.e., perhaps as being about something other than skills) – but as the authors have been clear about their use of the term, this should not confuse or mislead readers.

Epistemology: How do we know?

Having established what is meant by 'psychological skills' and 'academic achievement' a reader would want to know how these were measured in the present study – do the authors use techniques that allow them to obtain valid and reliable measures of 'psychological skills' and 'academic achievement'?

How is academic achievement measured?

The authors inform readers that

"To calculate students' academic achievement, the instructors of the swimming courses conducted a valid and reliable assessment as a pre-midterm, midterm, and final exam throughout the semester…The assessment included performance tests and theoretical tests (paper and pencil tests) for each level"

Bayyat et al., 2021: 4538


Although the authors claim their assessments are valid and reliable, a careful reader will note that the methodology here does not match the definition of "accomplishment of specific goals by the end of an educational experience" (emphasis added) – as only the final examinations took place at the end of the programme. On that point, then, there is a lack of internal consistency in the study. This might not matter to a reader who did not think academic achievement needed to be measured at the end of a course of study.

Information on the "Academic achievement assessment tool", comprising six examinations (pre-midterm, midterm, and final examinations at each of the intermediate and advanced levels) is included as an appendix – good practice that allows a reader to interrogate the instrument.

Although this appendix is somewhat vague on precise details, it offers a surprise to someone (i.e., me) with a traditional notion of what is meant by 'academic achievement' – as both theory and practical aspects are included. Indeed, most of the marks seem to be given for practical swimming proficiency. So, the 'Intermediate swimming Pre-midterm exam' has a maximum of 20 marks available – with breast stroke leg technique and arm technique each scored out of ten marks.

The 'Advanced swimming midterm exam' is marked out of 30, with 10 marks each available for the 200m crawl (female), individual medley (female) and life guarding techniques. This seems to suggest that 20 of the 30 marks available can only be obtained by being female, but this point does not seem to be clarified. Presumably (?) male students had a different task that the authors considered equivalent.

How are psychological skills measured?

In order to measure psychological skills the authors proceeded "to develop and validate a questionnaire" (p.4536). Designing a new instrument is a complex and challenging affair. The authors report how they

"generated a 40 items-questionnaire reflecting the psychological skills previously mentioned [leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention] by applying both deductive and inductive methods. The items were clear, understandable, reflect the real-life experience of the study population, and not too long in structure."

Bayyat et al., 2021: 4538

So, items were written which it was thought would reflect the focal skills of interest. (Unfortunately there are no details of what the authors mean by "applying both deductive and inductive methods" to generate the items.) Validity was assured by asking a panel of people considered to have expertise to critique the items:

"the scale was reviewed and assessed by eight qualified expert judges from different related fields (sport psychology, swimming, teaching methodology, scientific research methodology, and kinesiology). They were asked to give their opinion of content representation of the suggested PS [psychological skills], their relatedness, clarity, and structure of items. According to the judges' reviews, we omitted both leadership and emotional stability domains, in addition to several items throughout the questionnaire. Other items were rephrased, and some items were added. Again, the scale was reviewed by four judges, who agreed on 80% of the items."

So, construct validity was a kind of face validity, in that people considered to be experts thought the final set of items would elicit the constructs intended, but there was no attempt to see if responses correlated in any way with any actual measurements of the 'skills'.
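The paper does not explain how the "agreed on 80% of the items" figure was arrived at. One plausible – but purely hypothetical – reading is that it is the proportion of items approved by every judge, which could be computed along these lines (the judges and items below are invented):

```python
# Purely hypothetical sketch: one way an '80% of the items' agreement figure
# could be computed - the proportion of items approved by every judge.
# The paper does not describe its actual calculation; these data are invented.
judges_keep = {
    "judge_1": {1, 2, 3, 4, 5, 6, 7, 8, 10},
    "judge_2": {1, 2, 3, 4, 5, 6, 7, 9, 10},
    "judge_3": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "judge_4": {1, 2, 3, 4, 5, 6, 7, 8, 10},
}
items = set(range(1, 11))  # ten illustrative items (the real scale had more)
unanimous = set.intersection(*judges_keep.values())
print(f"Items approved by all judges: {len(unanimous)}/{len(items)} "
      f"({100 * len(unanimous) / len(items):.0f}%)")
```

Whichever way it was calculated, judge agreement of this kind speaks to face validity rather than to whether the items actually elicit the intended constructs.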

Readers of the paper wondering if they should be convinced by the study would need to judge whether the expert panel had the right specialisms to evaluate scale items for 'psychological skills', and might find some of the areas of expertise (i.e.,

  • sport psychology
  • swimming
  • teaching methodology
  • scientific research methodology
  • kinesiology)

more relevant than others.

Self-reports

If respondents answered honestly, their responses would have reflected their own estimates of their 'skills' – at least to the extent that their interpretation of the items matched that of the experts. (That is, there was no attempt to investigate how members of the population of interest would understand what was meant by the items.)

Here are some examples of the items in the instrument:

Construct ('psychological skill') – Example item

  • self-confidence – "I manage my time effectively while in class"
  • sports motivation achievement – "I do my best to control everything related to swimming lessons."
  • attention – "I can pay attention and focus on different places in the pool while carrying out swimming tasks"
  • stress-management – "I am not afraid to perform any difficult swimming skill, no matter what"

Examples of statements students were asked to rate in order to measure their 'psychological skills' (source: Bayyat et al., 2021: 4539-4541)
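This passage does not spell out how responses to such items were scored. A common approach – and this is only a hypothetical sketch, not the authors' reported procedure – is to have each item rated on a five-point Likert-type scale and to take the mean of the item ratings within each construct:

```python
# A minimal sketch, assuming (this is not stated in the passage above) that
# each item is rated on a five-point Likert-type scale and that a construct
# score is the mean of its item ratings. The ratings are invented.
from statistics import mean

# One hypothetical respondent (1 = strongly disagree ... 5 = strongly agree)
responses = {
    "self-confidence":              [4, 3, 4],
    "sport achievement motivation": [5, 4, 4],
    "attention":                    [3, 3, 2],
    "stress management":            [2, 3, 3],
}

construct_scores = {skill: mean(ratings) for skill, ratings in responses.items()}
for skill, score in construct_scores.items():
    print(f"{skill:30} {score:.2f}")
```

However the scoring is done, the resulting numbers remain self-reports: averaging them does not turn perceptions into measured skills.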

Analysis of data

The authors report various analyses of their data that lead to the conclusions they reach. If a critical reader was convinced about matters so far, they would still need to believe that the analyses undertaken were

  • appropriate, and
  • completed competently, and
  • correctly interpreted.

Drawing conclusions

However, as a reader I personally would have too many quibbles with the conceptualisation and design of instrumentation to consider the analysis in much detail.

To my mind, at least, the measure of 'academic achievement' seems to be largely an evaluation of swimming skills. They are obviously important in a swimming course, but I do not consider this a valid measure of academic achievement. That is not a question of suggesting academic achievement is better or more important than practical or athletic achievements, but it is surely something different (akin to me claiming to have excellent sporting achievement on the basis of holding a PhD in education).

The measure of psychological skills does not convince me either. I am not sure some of the focal constructs can really be called 'skills' (self-confidence? motivation?), but even if they were, there is no attempt to directly measure skill. At best, the questionnaire offers self-reports of how students perceive (or perhaps wish to be seen as perceiving) their characteristics.

It is quite common in research to see the term 'questionnaire' used for an instrument that is intended to test knowledge or skill – but questionnaires are not the right kind of instrument for that job.

(Read about questionnaires)

Significant positive relation between psychological skills and academic achievement?

So, I do not think this methodology would allow anyone to find a "significant positive relation between psychological skills and academic achievement" – only a relationship between students' self-ratings on some psychological characteristics and swimming achievement. (That may reflect an interesting research question, and could perhaps be a suitable basis for a study, but it is not what this study claims to be about.)
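The claim presumably rests on some correlational analysis. The sketch below (with invented data) is not an attempt to reproduce the authors' analysis; it is simply meant to make explicit what such a calculation would actually be relating:

```python
# A sketch with invented data of the kind of calculation a 'significant
# positive relation' claim rests on. Note what the two variables really are:
# questionnaire self-ratings, and marks dominated by swimming proficiency.
from scipy import stats

self_ratings   = [3.2, 3.8, 2.9, 4.1, 3.5, 2.7, 4.4, 3.0]  # hypothetical questionnaire means
swimming_marks = [14, 17, 12, 18, 15, 11, 19, 13]          # hypothetical exam marks (out of 20)

r, p_value = stats.pearsonr(self_ratings, swimming_marks)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```

Whatever the coefficient comes out as, it relates self-perceptions to swimming marks, not psychological skills to academic achievement.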

Significant differences in psychological skills level in favor of female students?

In a similar way, although it is interesting that females tended to score higher on the questionnaire scales, this shows they had higher self-ratings on average, and tells us nothing about their actual skills.

It may be that the students have great insight into these constructs and their own characteristics, and so make very accurate ratings on these scales – but with absolutely no evidential basis for thinking this, there are no grounds for making such a claim.

An alternative interpretation of the results is that, on average, the male students under-rate their 'skills' compared to their female peers. That is, the 'skills' could be much the same across gender, but there might be a gender-based difference in perception. (I am not suggesting that is the case, but the evidence presented in the paper can be explained just as well by that possibility.)
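Again, whatever between-groups test was actually used, it can only tell us whether the self-ratings differ: an independent-samples comparison of the kind sketched below (with invented ratings) cannot distinguish a real difference in the underlying characteristics from a gender-based difference in self-perception.

```python
# A sketch with invented ratings of a between-groups comparison of
# questionnaire scores. Whatever the outcome, it compares self-ratings,
# not independently measured skills.
from scipy import stats

female_ratings = [3.9, 4.1, 3.7, 4.3, 3.8, 4.0]  # hypothetical mean ratings
male_ratings   = [3.4, 3.6, 3.2, 3.8, 3.5, 3.3]

t, p_value = stats.ttest_ind(female_ratings, male_ratings)
print(f"t = {t:.2f}, p = {p_value:.3f}")
```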

An average level of psychological skills?

Finally, we might ask what is meant by

"The statistical analysis results revealed an average level of psychological skills…"

"Results of this study revealed that the participants acquired all four psychological skills at a moderate level."

Bayyat et al., 2021: 4535, 4545

Even leaving aside that what is being measured is something other than psychological skills, it is hard to see how these statements can be justified. This was the first administration of a new instrument being applied to a sample of a very specific population.


Image by Igor Drondin from Pixabay

The paper reports standard deviations for the ratings on the items in the questionnaire, so – as would be expected – there were distributions of results: spreads, with different students giving different ratings. Within the sample tested, some students will have given higher-than-median ratings on an item, and some lower-than-median ratings – although, on average, the ratings for that particular item would by definition have been average for this sample.

So, assuming the claim (of average/moderate levels of psychological skills) was not meant as a tautology, the authors seem to be suggesting that the ratings given on this administration of the instrument align with what would typically be obtained across other administrations. Of course, they have absolutely no way of knowing that is the case without collecting data from samples of other populations.

What the authors actually seem to be basing these claims (of average/moderate levels of psychological skills) on is that the average responses on these scales were neither very high nor very low in terms of the raw scale. Yet, with absolutely no reference data for how other groups of people might respond to the same instrument, that offers little useful information. At best, it suggests something welcome about the instrument itself (ideally one would wish items to elicit a spread of responses, rather than having most responses rated very high or very low) but nothing about the students sampled.

On this point the authors seem to be treating the scale as calibrated in terms of some nominal standard (e.g. 'a rating of 3-4 would be the norm'), when there is no inherent interpretation of particular ratings of items in such a scale that can just be assumed – rather this would be a matter that would need to be explored empirically.
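To make the point concrete, the 'moderate level' claim seems to amount to applying a rule of thumb of the kind sketched below. The cut-offs here are my own arbitrary illustration, not the authors': the label comes entirely from where the cut-offs are placed on the raw scale, and nothing in the data collected tells us where they should be placed.

```python
# A sketch of the kind of nominal rule the 'moderate level' claim seems to
# rest on. The cut-offs below are an arbitrary illustration - nothing in the
# data collected tells us where 'low', 'moderate' or 'high' should sit.
def nominal_level(mean_rating, scale_max=5):
    if mean_rating < 0.4 * scale_max:   # below 2.0 on a 1-5 scale
        return "low"
    if mean_rating < 0.7 * scale_max:   # below 3.5 on a 1-5 scale
        return "moderate"
    return "high"

print(nominal_level(3.2))  # 'moderate' - but only relative to an assumed calibration
```

An empirical alternative would be to interpret the sample's ratings against distributions obtained from other groups responding to the same instrument – reference data the authors do not have.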

The research paper as an argument

The research paper is a very specific genre of writing. It is an argument for new knowledge claims. The conclusions of a paper rest on a chain of argument that starts with the conceptualisation of the study and moves through research design, data collection, analysis, and interpretation. For a reader, any link in that chain which is not convincing potentially invalidates the knowledge claim(s) being made. Thus the standards expected for research papers are very high.


Research writing

In sum, then, this was an intriguing study, but it did not convince me (even if it apparently convinced the peer reviewers and editor of Psychology and Education). I am not sure it was really about psychological skills or academic achievement

…but at least it was clearly set in the context of swimming.


Work cited:

Bayyat, M. M., Orabi, S. M., Al-Tarawneh, A. D., Alleimon, S. M., & Abaza, S. N. (2021). Psychological Skills in Relation to Academic Achievement through Swimming Context. Psychology and Education, 58(5), 4535-4551.

Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download manuscript version]


* Despite searching the different sections of the journal site, I was unable to find who publishes the journal. However, searching outside the site I found a record of the publisher of this journal being 'Auricle Technologies, Pvt., Ltd'.

** It transpired later in the paper that 'JU' referred to students at the University of Jordan: one of three universities involved in the study.

*** I think literally this means those who participated in the study were students attending both an intermediate swimming course and an advanced swimming course – but I read this to mean those who participated in the study were students attending either an intermediate or advanced swimming course. This latter interpretation is consistent with information given elsewhere in the paper: "All schools of sports sciences at the universities of Jordan offer mandatory, reliable, and valid swimming programs. Students enroll in one of three swimming courses consequently: the basic, intermediate, and advanced levels". (Bayyat et al., 2021: 4535, emphasis added)

**** That is, if the sample is unrepresentative of the population, there is no way to know how biased the sample might be. However, if there is a representative sample, then although there will still likely be some error (the results for the sample will not be precisely what the results across the whole population would be) it is possible to calculate the likely size of this error (e.g., say ±3%) which will be smaller when a higher proportion of the population are sampled.

***** It is possible some text that was intended to be at this point has gone missing during production – as, oddly, the following sentence is

facilisi, veritus invidunt ei mea (Times New Roman, 10)

Bayyat et al., 2021: 4535

which seems to be an accidental retention of text from the journal's paper template.