POEsing assessment questions…

…but not fattening the cow


Keith S. Taber


A well-known Palestinian proverb reminds us that we do not fatten the cow simply by repeatedly weighing it. But, sadly, teachers and others working in education commonly get so fixated on assessment that it seems to become an end in itself.


Images by Clker-Free-Vector-Images, OpenClipart-Vectors and Deedster from Pixabay

A research study using P-O-E

I was reading a report of a study that adopted the predict-observe-explain, P-O-E, technique as a means to elicit "high school students' conceptions about acids and bases" (Kala, Yaman & Ayas, 2013, p.555). As the name suggests, P-O-E asks learners to make a prediction before observing some phenomenon, and then to explain their observations (something that can be specially valuable when the predictions are based on strongly held intuitions which are contrary to what actually happens).

Read about Predict-Observe-Explain


The article on the publisher website

Kala and colleagues begin the introduction to their paper by stating that

"In any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known"

Kala, Yaman & Ayas, 2013, p.555
Constructivism?

Constructivism is a perspective on learning that is informed by research into how people learn and a great many studies into student thinking and learning in science. A key point is how a learner's current knowledge and understanding influences how they make sense of teaching and what they go on to learn. Research shows it is very common for students to have 'alternative conceptions' of science topics, and often these conceptions either survive teaching or distort how it is understood.

The key point is that teachers who teach the science without regard to student thinking will often find that students retain their alternative ways of thinking, so constructivist teaching is teaching that takes into account and responds to the ideas about science topics that students bring to class.

Read about constructivism

Read about constructivist pedagogy

Assessment: summative, formative and diagnostic

If teachers are to take into account, engage with, and try to reshape, learners' ideas about science topics, then they need to know what those ideas are. Now there is a vast literature reporting alternative conceptions in a wide range of science topics, spread across thousands of research reports – but no teacher could possibly find time to study them all. There are books which discuss many examples and highlight some of the most common alternative conceptions (including one of my own, Taber, 2014).



However, in any class studying some particular topic there will nearly always be a spread of different alternative conceptions across the students – including some so idiosyncratic that they have never been reported in any literature. So, although reading about common misconceptions is certainly useful to prime teachers for what to look out for, teachers need to undertake diagnostic assessment to find out about the thinking of their own particular students.

There are many resources available to support teachers in diagnostic assessment, and some activities (such as using concept cartoons) that are especially useful at revealing student thinking.

Read about diagnostic assessment

Diagnostic assessment, assessment to inform teaching, is carried out at the start of a topic, before the teaching, to allow teachers to judge the learners' starting points and any alternative conceptions ('misconceptions') they may have. It can therefore be considered aligned to formative assessment ('assessment for learning'), which is carried out as part of the learning process, rather than summative assessment (assessment of learning), which is used after studying to check, score, grade and certify learning.

P-O-E as a learning activity…

P-O-E can best support learning in topics where it is known learners tend to have strongly held, but unhelpful, intuitions. The predict stage elicits students' expectations – which, when contrary to the scientific account, can be confounded by the observe step. The 'cognitive conflict' generated by seeing something unexpected (made more salient by having been asked to make a formal prediction) is thought to help students concentrate on the actual phenomenon, and to provide 'epistemic relevance' (Taber, 2015).

Epistemic relevance refers to the idea that students are learning about things they are actually curious about, whereas, for many students, following a conventional science course must be experienced as being presented with the answers to a seemingly never-ending series of questions that had never occurred to them in the first place.

Read about the Predict-Observe-Explain technique

Students are asked to provide an explanation for what they have observed, which requires deeper engagement than just recording an observation. Developing explanations is a core scientific practice (and one which is needed before another core scientific practice – testing explanations – is possible).

Read about teaching about scientific explanations

To be most effective, P-O-E is carried out in small groups, as this encourages the sharing, challenging and justifying of ideas: the kind of dialogic activity thought to be powerful in supporting learners in developing their thinking, as well as practising their skills in scientific argumentation. As part of dialogic teaching, such an open forum for learners' ideas is not an end in itself, but a preparatory stage for the teacher to marshal the different contributions and develop a convincing argument for how the best account of the phenomenon is the scientific account reflected in the curriculum.

Constructivist teaching is informed by learners' ideas, and therefore relies on their elicitation, but that elicitation is never the end in itself but is a precursor to a customised presentation of the canonical account.

Read about dialogic teaching and learning

…and as a diagnostic activity

Group work also has another function – if the activity is intended to support diagnostic assessment, then the teacher can move around the room listening in to the various discussions and so collecting valuable information on what students think and understand. When assessment is intended to inform teaching it does not need to be about students completing tests and teachers marking them – a key principle of formative assessment is that it occurs as a natural part of the teaching process. It can be based on productive learning activities, and does not need marks or grades – indeed as the point is to help students move on in their thinking, any kind of formal grading whilst learning is in progress would be inappropriate as well as a misuse of teacher time.

Probing students' understandings about acid-base chemistry

The constructivist model of learning applies to us all: students, teachers, professors, researchers. Given what I have written above about P-O-E, about diagnostic assessment, and dialogic approaches to learning, I approached Kala and colleagues' paper with expectations about how they would have carried out their project.

These authors do report that they were able to diagnose aspects of student thinking about acids and bases, and found some learning difficulties and alternative conceptions,

"it was observed that eight of the 27 students had the idea that the "pH of strong acids is the lowest every time," while two of the 27 students had the idea that "strong acids have a high pH." Furthermore, four of the 27 students wrote the idea that the "substance is strong to the extent to which it is burning," while one of the 27 students mentioned the idea that "different acids which have equal concentration have equal pH."

Kala, Yaman & Ayas, 2013, pp.562-3

The key feature seems to be that, as reported in previous research, students conflate acid concentration and acid strength (when it is possible to have a high concentration solution of a weak acid or a very dilute solution of a strong acid).

Yet some aspects of this study seemed out of alignment with the use of P-O-E.

The best research style?

One feature was the adoption of a positivistic approach to the analysis,

Although there has been no reported analyzing procedure for the POE, in this study, a different [sic] analyzing approach was offered taking into account students' level of understanding… Data gathered from the written responses to the POE tasks were analyzed and divided into six groups. In this context, while students' prediction were divided into two categories as being correct or wrong, reasons for predictions were divided into three categories as being correct, partially correct, or wrong.

Kala, Yaman & Ayas, 2013, p.560


Group   Prediction   Reasons
1       correct      correct
2       correct      partially correct
3       correct      wrong
4       wrong        correct
5       wrong        partially correct
6       wrong        wrong

"the written responses to the POE tasks were analyzed and divided into six groups"

There is nothing inherently wrong with doing this, but it aligns the research with an approach that seems at odds with the thinking behind constructivist studies that are intended to interpret a learner's thinking in its own terms, rather than simply compare it with some standard. (I have explored this issue in some detail in a comparison of two research studies into students' conceptions of forces – see Taber, 2013, pp.58-66.)

In terms of research methodology we might say it seems to have been conceptualised within the 'wrong' paradigm for this kind of work. It seems positivist (assuming data can be unambiguously fitted into clear categories), nomothetic (tied to 'norms' and canonical answers) and confirmatory (testing thinking as matching model responses or not), rather than interpretivist (seeking to understand student thinking in its own terms rather than just classifying it as right or wrong), idiographic (acknowledging that every learner's thinking is to some extent unique to them) and discovery-oriented (exploring nuances and sophistication, rather than simply deciding if something is acceptable or not).

Read about paradigms in educational research

The approach used seemed more suitable for investigating something in the science laboratory than for the complex, interactive, contextualised, and ongoing life of classroom teaching. Kala and colleagues describe their methodology as case study,

"The present study used a case study because it enables the giving of permission to make a searching investigation of an event, a fact, a situation, and an individual or a group…"

Kala, Yaman & Ayas, 2013, p.558
A case study?

Case study is a naturalistic methodology (rather than one involving an intervention, such as an experiment), and is idiographic, reflecting the value of studying the individual case. The case is one from among many instances of its kind (one lesson, one school, one examination paper, etc.), and is considered as a somewhat self-contained entity, yet one that is embedded in a context with which it is to some extent entangled (for example, what happens in a particular lesson is inevitably somewhat influenced by

  • the earlier sequence of lessons that teacher taught that class {the history of that teacher with that class},
  • the lessons the teacher and students came from immediately before this focal lesson,
  • the school in which it takes place,
  • the curriculum set out to be followed…)

Although a lesson can be understood as a bounded case (taking place in a particular room over a particular period of time involving a specified group of people) it cannot be isolated from the embedding context.

Read about case study methodology


Case study – study of one instance from among many


As case study is idiographic, and does not attempt to offer direct generalisation to other situations beyond that case, a case study should be reported with 'thick description' so a reader has a good mental image of the case (and can think about what makes it special – and so what makes it similar to, or different from, other instances the reader may be interested in). But that is lacking in Kala and colleagues' study, as they only tell readers,

"The sample in the present study consisted of 27 high school students who were enrolled in the science and mathematics track in an Anatolian high school in Trabzon, Turkey. The selected sample first studied the acid and base subject in the middle school (grades 6 – 8) in the eighth year. Later, the acid and base topic was studied in high school. The present study was implemented, based on the sample that completed the normal instruction on the acid and base topic."

Kala, Yaman & Ayas, 2013, pp.558-559

The reference to a sample can be understood as something of a 'reveal' of their natural sympathies – 'sample' is the language of positivist studies that assume a suitably chosen sample reflects a wider population of interest. In case study, a single case is selected and described, rather than a population sampled. A reader is left rather to guess what population is being sampled here, and indeed precisely what the 'case' is.

Clearly, Kala and colleagues elicited some useful information that could inform teaching, but I sensed that their approach would not have made optimal use of a learning activity (P-O-E) that can give insight into the richness, and, sometimes, subtlety of different students' ideas.

Individual work

Even more surprising was the researchers' choice to ask students to work individually without group discussion.

"The treatment was carried out individually with the sample by using worksheets."

Kala, Yaman & Ayas, 2013, p.559

This is a choice which would surely have compromised the potential of the teaching approach to allow learners to explore, and reveal, their thinking.

I wondered why the researchers had made this choice. As they were undertaking research, perhaps they thought it was a better way to collect data that they could readily analyse – but that seems to be choosing limited data that can be easily characterised over the richer data that engagement in dialogue would surely have revealed.

Assessment habits

All became clear near the end of the study when, in the final paragraph, the reader is told,

"In the present study, the data collection instruments were used as an assessment method because the study was done at the end of the instruction/ [sic] on the acid and base topics."

Kala, Yaman & Ayas, 2013, p.571

So, it appears that the P-O-E activity, which is an effective way of generating the kind of rich but complex data that helps a teacher hone their teaching for a particular group, was being adopted, instead, as a means of summative assessment. This is presumably why the analysis focused on the degree of match to the canonical science, rather than engaging in interpreting the different ways of thinking in the class. Again presumably, this is why the highly valuable group aspect of the approach was dropped in favour of individual working – summative assessment needs not only to grade against norms, but to do this on the basis of each individual's unaided work.

An activity which offers great potential for formative assessment (as it is a learning activity as well as a way of exploring student thinking); which offers an authentic reflection of scientific practice (where ideas are presented, challenged, justified, and developed in response to criticism); and which is generally enjoyed by students, because it is interactive and the predictions are 'low stakes', making for a fun learning session, was here re-purposed as a means of assessing individual students once their study of a topic was completed.

Kala and colleagues certainly did identify some learning difficulties and alternative conceptions this way, and this allowed them to evaluate student learning. But I cannot help thinking an opportunity was lost here to explore how P-O-E can be used in a formative assessment mode to inform teaching:

  • diagnostic assessment as formative assessment can inform more effective teaching
  • diagnostic assessment as summative assessment only shows where teaching has failed

Yes, I agree that "in any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known", but the point of that is to inform the teaching and so support student learning. What were Kala and colleagues going to do with their inferences about students' ideas when they used the technique as "an assessment method … at the end of the instruction"?

As the Palestinian adage goes, you do not fatten up the cow by weighing it, just as you do not facilitate learning simply by testing students. To mix my farmyard allusions, this seems to be a study of closing the barn door after the horse has already bolted.



Acute abstracts correcting Copernicus

Setting the history of science right


Keith S. Taber


I recently read a book of essays by Edward Rosen (1995) who was (as described by his publisher) "the editor and translator of Copernicus' complete works" and "the leading authority on this most celebrated of Renaissance scientists". Copernicus is indeed, rightly, highly celebrated (for reasons I summarise below *).

The book was edited by Rosen's collaborator, Erna Hilfstein 1, and although the book was an anthology of reprinted journal articles, none of the chapters (articles) had abstracts. This reflects different disciplinary norms. In the natural and social sciences most journals require abstracts – and some even offer a menu of what should be included – but abstracts are not always expected in humanities disciplines.

Read about the abstract in academic articles

A collection of published papers from various journals – all lacking abstracts

It is not unusual for an academic book to be a compilation of published articles – especially when anthologising a single scholar's work. I was a little surprised to find the different chapters in the same book having different formats and typefaces – it had been decided to reproduce the articles as they had originally appeared in a range of journals (perhaps for authenticity – or perhaps to avoid the costs of new typesetting?).

But it was the absence of article abstracts that felt most odd. The potential reader is given a title, but otherwise little idea of the scope of an article before reading. Perhaps it was my awareness of this 'omission' that led me to think that for a number of the chapters it would be possible to offer a very minimal abstract (an acute abstract?) that would do the job! Certainly, for some of these chapters, I thought a sentence each might do.

That is not to dismiss the scholarship that has gone into developing the arguments, but Rosen often wrote on a very specific historical point, set out pertinent ideas from previous scholarship, and then argued for a clear position contrary to some earlier scholars.

So, here are my suggestions for 'acute' abstracts

Six summary encapsulations

Chapter 6: on the priest question

Abstract:

Copernicus has often been described as a priest, but Copernicus was never ordained a priest.

Copernicus was a canon in the Roman Catholic Church; this made him an administrator (and he also acted as a physician), but he never became a monk or a priest.


Chapter 7: on the notary question

Abstract:

Copernicus has been described as a 'happy notary' but Copernicus was not a notary.

Although Copernicus had various roles as an administrator, even as something of a diplomat, he never took on the role of a legal notary.


Chapter 8: on the disdain question

Abstract:

Copernicus is sometimes said to have had a dismissive attitude to the common people, but there is no evidence that this was so.

A comment of Copernicus on not being concerned with the views of certain philosophers seems to have been misinterpreted.


Chapter 11: on the axioms question

Abstract:

It has been claimed that Copernicus misused the term axioms in his work, but his use was perfectly in line with authorities.

Today axioms are usually expected to be the self-evident starting points for developing a deductive argument, but Aristotle's definition of axioms did not require them to seem self-evident.


Chapter 16: on the papal question

Abstract:

It has been claimed that Copernicus' 'Revolutions' was approved by the pope before publication, but the manuscript was never shown to the pope.

This seems to be a confusion regarding an anecdote concerning a completely different scholar.


Chapter 17: on the Calvin question

Abstract:

It has been suggested that Calvin was highly critical of Copernicus, but it seems unlikely Calvin had ever heard of him.

While Calvin's writings strongly suggest he was committed to a stationary earth and a sun that moved around the earth, there is no evidence he had specifically come across Copernicus.


A manifold chapter

Having noticed how so many of Rosen's articles took one claim or historically contentious idea and developed it in the light of various sources to come to a position, I was a little surprised when I reached Chapter 20, 'Galileo's misstatements about Copernicus', to find that Rosen was dealing with 5 distinct (if related) points at once – several of which he had elsewhere made the unitary focus of an article.

Rather than write my own abstract, I could here suggest a couplet of sentences from the text might have done the job,

"According to Galileo, (1) Copernicus was a priest; (2) he was called to Rome; (3) he wrote the Revolutions by order of the pope; (4) his book was never adversely criticised; (5) it was the basis of the Gregorian calendar. Actually, Copernicus was not a priest; he was not called to Rome; he did not write the Revolutions by order of the pope; the book received much adverse criticism, particularly on the ground that it contradicted the Bible; it was not the basis of the Gregorian calendar."

Rosen, 1958/1995, pp.203-204

I noticed that this was the earliest of Rosen's writings that had been included in the compilation – perhaps he had decided to dispense his ideas more sparingly after this paper?

Actually, there's a lot to be said for abstracts that pithily précis the key point of an article, a kind of tag-line perhaps, acting for a reader as an aide-mémoire (useful at least for readers like me who commonly stare at rows of books thinking 'I read something interesting about this, somewhere here…'). I have also read a lot of abstracts in research journals that would benefit from their own (further) abstracts, so perhaps such acute abstraction might catch on?


* Appendix: A scientific giant

Copernicus is indeed 'celebrated', being seen as one of the scientific greats who helped establish modern ways of thinking about the world – part of what is often perceived as a chain that goes Copernicus – Kepler – Galileo – Newton.

Copernicus is most famous for his book known in English as 'On the Revolutions of the Heavenly Spheres', or just 'Revolutions'. The key point of note is that at a time when it was almost universally agreed that the earth was stationary at the centre of 'the world', i.e., the cosmos, and that everything else revolved around the earth, Copernicus proposed a system that put the sun at the centre and had the earth moving around the sun.


The geocentric model of the cosmos was widely accepted for many centuries
(Image by OpenClipart-Vectors from Pixabay)

From our modern worldview, it is difficult to imagine just how, well yes, revolutionary, that move was (even if Copernicus only moved the centre of the universe from earth to the sun, so our solar system still had a very special status in his system). This is clear from how long it took the new view to become the accepted position, and the opposition it attracted. Newton later realised that strictly the centre of revolution was the centre of mass of the solar system not the sun per se. 2

One problem was that there was no absolute observational test to distinguish between the two models and there were well-established reasons to accept the conventional geocentric model (e.g., we do not feel the earth move, or a great wind as it spins beneath its atmosphere; as the most dense element, earth would naturally fall to the centre of the world, beneath water, air, fire, and the ether that filled the heavens {although the Earth was not considered a pure form of the element earth, it was earthy, considered mostly earth in composition 3}; and scriptures, if given a literal interpretation, seemed to suggest the earth was fixed and the sun moved.)

Copernicus' model certainly had some advantages. If the earth is still, the distant sphere with all the fixed stars must be moving about it at an incredible rate of rotation. But if the earth spun on its axis, this stellar motion was just an illusion. 4 Moreover, if everything revolves around the earth, some of the planets behave very oddly, first moving one way, then slowing down to reverse direction ('retrograde' motion), before again heading off in their original sense. But, if the planets are orbiting the sun along with the earth (now itself seen as a planet) but at different rates then this motion can be explained as an optical illusion – "these phenomena…happen on account of the single motion of the earth" – the planets only seem to loop because of the motion of the earth.

Despite this clear improvement, Copernicus' model did not entirely simplify the system, as Copernicus retained the consensus view that the planets moved in circles: the planets' "motions are circular or compounded of several circles,…since only the circle can bring back the past". With such an assumption the observational data can only be made to fit (either to the heliocentric model or its geocentric alternative) by having a complex series of circles rather than one circle per planet. Today when we call the night sky 'the heavens', we are using the term without implying any supernatural association – but the space beyond the moon was once literally considered as heaven. In heaven everything is perfect, and the perfect shape is a circle.

It was only when Kepler later struggled to match the best observational data available (from his employer Tycho Brahe's observatory) to the Copernican model that, after a number of false starts, he decided to see if ellipses would fit – and he discovered how the system could be described in terms of planets each following a single elliptical path that almost repeated indefinitely.

A well-known story is how by the time Copernicus had finished his work and decided to get it printed he was near the end of his life, and he was supposedly only shown a printed copy brought from the printer as he lay on his deathbed (in 1543). In the printed copy of the book an anonymous foreword/preface 5 had been inserted to the effect that readers should consider the model proposed as a useful calculating system for following the paths of heavenly bodies, and not as a proposal for how the world actually was.

Despite this, the book was later added to the Roman Catholic Church's index of banned works awaiting correction. This only occurred much later – in 1616, after Galileo taught that Copernicus' system did describe the actual 'world system'. But, in the text itself Copernicus is clear that he is suggesting a model for how the world is – "to the best of my ability I have discussed the earth's revolution around the sun" – not just a scheme for calculating purposes. Indeed, he goes as far as to suggest that where he uses language implying the sun moves this is only to be taken as adopting the everyday way of talking reflecting appearances (we say 'the sun rises'). For Copernicus, it was the earth, not the sun, that moved.


Sources cited:
  • Copernicus, N. (1543/1978). On the Revolutions of the Heavenly Spheres (E. Rosen, Trans.). Prometheus Books.
  • Rosen, E. (1995). Copernicus and his successors (E. Hilfstein, Ed.). The Hambledon Press.

Notes

1 I discovered from some 'internet research' (i.e., Googling) that Erna was a Holocaust survivor, "[husband] Max and Erna, along with their families, were sent first to Płaszów, a slave-labor camp, and then on a death march to Auschwitz".

An article in the Jewish Standard reports how Erna's daughter undertook a charity bike ride "from Auschwitz-Birkenau, the Nazi-run death camp in the verdant Polish countryside, to the" Jewish Community Centre of Krakow (the town where her parents lived before being deported by the Nazis).


2 Newton also wrote as if the solar system was the centre of the cosmos, but of course the solar system is itself moving around the galaxy, which is moving away from most other galaxies…


3 These are not the chemical elements recognised today, of course, but were considered the elements for many centuries. Even today, people sometimes refer to the air and water as 'the elements.'


4 Traditionally, the 'heavenly spheres' were not the bodies such as planets, moons and stars but a set of eight conjectured concentric crystalline spheres that supposedly rotated around the earth carrying the distant stars, Saturn, Jupiter, Mars, the Sun, Venus, Mercury and the Moon.


5 A preface is written by the author of a book. A foreword is written by someone else for the author (perhaps saying how wonderful the author and the work are). Technically then this was a foreword, BUT because it was not signed, it would appear to be a preface – something written by Copernicus himself. Perhaps the foreword did actually protect the book from being banned as, until Galileo made it a matter of very public debate, it is likely only other astronomers had actually scrutinised the long and very technical text in any detail!

Counting both the bright and the very dim

What is 1% of a very large, unknown, number?


Keith S. Taber


1, skip 99; 2, skip 99; 3, skip 99; 4,… skip 99, 1 000 000 000!
(Image by FelixMittermeier from Pixabay)

How can we count the number of stars in the galaxy?

On the BBC radio programme 'More or Less' it was mooted that there might be one hundred billion (100 000 000 000) stars in our own Milky Way Galaxy (and that this might be a considerable underestimate).

The estimate was suggested by Prof. Catherine Heymans, who is the Astronomer Royal for Scotland and Professor of Astrophysics at the University of Edinburgh.

Programme presenter Tim Harford was tackling a question sent in by a young listener (who is very nearly four years of age) about whether there are more bees in the world than stars in the galaxy. (Spoiler alert: Prof. Catherine Heymans confessed to knowing less about bees than stars.)


An episode of 'More or Less' asks: Are there more bees in the world or stars in the galaxy?

Harford asked how the 100 billion stars figure was arrived at:

"have we counted them, or got a computer to count them, or is it more a case of, well, you take a photograph of a section of sky and you sort of say well the rest is probably a bit like that?"

The last suggestion here is of course the basis for many surveys. As long as there is good reason to think a sample is representative of the wider population it is drawn from we can collect data from the sample and make inferences about the population at large.

Read about sampling a population

So, if we counted all the detectable stars in a typical 1% of the sky and then multiplied the count by 100 we would get an approximation to the total number of detectable stars in the whole sky. That would be a reasonable method to find approximately how many stars there are in the galaxy, as long as we thought all the detected stars were in our galaxy and that all the stars in our galaxy were detectable.
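Purely to make the arithmetic of that hypothetical concrete, here is a minimal sketch (the patch size and star count are illustrative round numbers only, not survey data; the whole celestial sphere covers about 41,253 square degrees):

```python
# A toy version of the 'count a patch of sky, then scale up' estimate.
WHOLE_SKY_SQ_DEG = 41_253  # total area of the celestial sphere in square degrees

def extrapolate_total(stars_counted: int, patch_sq_deg: float) -> int:
    """Scale a patch count up to the whole sky, assuming the patch is typical."""
    return round(stars_counted * WHOLE_SKY_SQ_DEG / patch_sq_deg)

# Counting a billion stars in roughly 1% of the sky (about 413 square degrees)
# would suggest around a hundred billion detectable stars in total.
print(f"{extrapolate_total(1_000_000_000, WHOLE_SKY_SQ_DEG / 100):,}")
```

The catch, as discussed below, is in the assumptions that the patch is 'typical' and that every star in it is actually detectable.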

Prof. Heymans replied

"So, we have the European Space Agency Gaia mission up at the moment, it was launched in 2013, and that's currently mapping out 1% of all the stars in our Milky Way galaxy, creating a three dimensional map. So, that's looking at 1 billion of the stars, and then to get an idea of how many others are there we look at how bright all the stars are, and we use our sort of models of how different types of stars live [sic] in our Milky Way galaxy to give us that estimate of how many stars are there."

Prof. Catherine Heymans interviewed on 'More or Less'

A tautology?

This seemed to beg a question: how can we know we are mapping 1% of stars, before we know how many stars there are?

This has the appearance of a tautology – a circular argument.

Read about tautology

To count the number of stars in the galaxy,
  • (i) count 1% of them, and then
  • (ii) multiply by 100.

So,

  • If we assume there are one hundred billion, then we need to
  • count one billion, and then
  • multiply by 100 to give…
  • one hundred billion.

Clearly that did not seem right. I am fairly sure that was not what Prof. Heymans meant. As this was a radio programme, the interview was presumably edited to fit within the limited time allocated for this item, so a listener can never be sure that a question and the (apparently immediate and direct) response that make the edit fully reflect the original conversation.

Counting the bright ones

According to the website of the Gaia mission, "Gaia will achieve its goals by repeatedly measuring the positions of all objects down to magnitude 20 (about 400 000 times fainter than can be seen with the naked eye)." Harford's suggestion that "you take a photograph of a section of sky and you sort of say well the rest is probably a bit like that?" seems very reasonable, until you realise that even with a powerful telescope sent outside of the earth's atmosphere, many of the stars in the galaxy may simply not be detectable. So, what we see cannot be considered to be fully representative of what is out there.

It is not then that the scientists have deliberately sampled 1%, but rather that they are investigating EVERY star with an apparent brightness above a certain critical cut-off. Whether a star makes the cut depends on such factors as how bright it is (in absolute terms – which we might imagine we would measure from a standard distance 1) and how close it is, as well as whether the line of sight involves the starlight passing through interstellar dust that absorbs some (or all) of the radiation.
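To illustrate why distance matters so much here, a minimal sketch using the standard distance-modulus relation, m = M + 5 log10(d / 10 pc), which links apparent magnitude m, absolute magnitude M and distance d. The absolute magnitude of +12 below is just an illustrative value for an intrinsically faint star, not a figure from the programme:

```python
import math

GAIA_LIMIT = 20.0  # limiting apparent magnitude quoted on the Gaia mission website

def apparent_magnitude(absolute_mag: float, distance_pc: float) -> float:
    """Distance modulus: m = M + 5*log10(d / 10 pc)."""
    return absolute_mag + 5 * math.log10(distance_pc / 10)

# An intrinsically faint star (illustrative absolute magnitude +12) drops below
# the survey limit within a few hundred parsecs, although the galactic disc
# is tens of thousands of parsecs across.
for d in (100, 300, 1_000, 3_000):
    m = apparent_magnitude(12.0, d)
    status = "detectable" if m <= GAIA_LIMIT else "too faint for the survey"
    print(f"d = {d:>5} pc -> m = {m:4.1f} ({status})")
```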

Of course, these are all, strictly, largely unknowns. Astrophysics relies a good deal on boot-strapping, where our best, but still developing, understanding of one feature is used to build models of other features. In such circumstances, observational tests of predictions from theory are often as much tests of the underlying foundations upon which a model is built as of the specific focal model itself. Knowledge moves on incrementally as adjustments are made to different aspects of interacting models.

Observations are theory-dependent

So, this is, in a sense, a circular process, but it is a virtuous circle rather than just a tautology as there are opportunities for correcting and improving the theoretical framework.

In a sense, what I have described here is true of science more generally, and so when an experiment fails to produce a result predicted by a new theory, it is generally possible to seek to 'save' the theory by suggesting the problem was (if not a human error) not in the actual theory being tested, but in some other part of the more extended theoretical network – such as the theory underpinning the apparatus used to collect data or the theory behind the analysis used to treat data.

In most mature fields, however, these more foundational features are generally considered to be sound and unlikely to need modifying – so, a scientist who explains that their experiment did not produce the expected answer because electron microscopes or mass spectrometers or Fourier transform analyses do not work the way everyone has for decades thought they did would need to offer a very persuasive case.

However, compared to many other fields, astrophysics has much less direct access to the phenomena it studies (which are often vast in terms of absolute size, distance and duration), and largely relies on observing without being able to manipulate the phenomena, so understandably faces special challenges.

Why we need a theoretical model to finish the count

Researchers can use our best current theories to build a picture of how what we see relates to what is 'out there' given our best interpretations of existing observations. This is why the modelling that Prof. Heymans refers to is so important. Our current best theories tell us that the absolute brightness of stars (which is a key factor in deciding whether they will be detected in a sky survey) depends on their mass, and the stage of their 'evolution'.2

So, completing the count needs a model which allows data for detectable stars to be extrapolated, bearing in mind our best current understanding about the variations in frequencies of different kinds (age, size) of star, how stellar 'densities' vary in different regions of a spiral galaxy like ours, the distribution of dust clouds, and so forth.


…keep in mind we are off-centre, and then allow for the thinning out near the edges, remember there might be a supermassive black hole blocking our view through the centre, take into account dust, acknowledge dwarf stars tend to be missed, take into account that the most massive stars will have long ceased shining, then take away the number you first thought of, and add a bit for luck… (Image by WikiImages from Pixabay)

I have taken the liberty of offering an edited version of the exchange:

Harford: "have we counted [the hundred billion stars], or got a computer to count them, or is it more a case of, well, you take a photograph of a section of sky and you sort of say well the rest is probably a bit like that?"

Heymans: "So, we have the European Space Agency Gaia mission up at the moment, it was launched in 2013, and that's currently mapping out…all the stars in our Milky Way galaxy [down to magnitude 20], creating a three dimensional map. So, that's looking at 1 billion of the [brightest] stars [as seen from our solar system], and then to get an idea of how many others are there we look at how bright all the stars are, and we use our models of how different types of stars [change over time 2] in our Milky Way galaxy to give us that estimate of how many stars are there."

No more tautology. But some very clever and challenging science.

(And are there more bees in the world or stars in the galaxy? The programme is available at https://www.bbc.co.uk/sounds/play/m00187wq.)


Note:

1 This issue of what we mean by the brightness of a star also arose in a recent post: Baking fresh electrons for the science doughnut


2 Stars are not alive, but it is common to talk about their 'life-cycles' and 'births' and 'deaths' as stars can change considerably (in brightness, colour, size) as the nuclear reactions at their core change over time once the hydrogen has all been reacted in fusion reactions.

Study reports that non-representative sample of students has average knowledge of earthquakes

When is a cross-sectional study not a cross-sectional study?


Keith S. Taber


A biomedical paper?

I only came to this paper because I was criticising the Biomedical Journal of Scientific & Technical Research's claimed Impact Factor, which seems to be a fabrication. I saw this particular paper being featured in a recent tweet from the journal and wondered how it fitted in a biomedical journal. The paper is on an important topic – what young people know about how to respond to an earthquake – but it was not obvious why it belonged in this particular journal.

Respectable journals normally have a clear scope (i.e., the range of topics within which they consider submissions for publication) – whereas predatory journals are often primarily interested in publishing as many papers as possible (and so attracting publication fees from as many authors as possible) and so may have no qualms about publishing material that would seem to be out of scope.

This paper reports a questionnaire about secondary age students' knowledge of earthquakes. It would seem to be an education study, possibly even a science education study, rather than a 'biomedical' study. (The journal invites papers from a wide range of fields 1, some of which – geology, chemical engineering – are not obviously 'biomedical' in nature; but not education.)

The paper reports research (so I assume is classed as 'research' in terms of the scale of charges) and comes from Bangladesh (which I assume the journal publishers consider a low-income country), and so it would seem that the authors would have been charged $799 to be published in this journal. Part of what authors are supposed to get for that fee is for editors to arrange peer review to provide evaluation of, feedback on, and recommendations for improving, their work.

Peer review

Respectable journals employ rigorous peer review to ensure that only work of quality is published.

Read about peer review

According to the Biomedical Journal of Scientific & Technical Research website:

Peer review process is the system used to assess the quality of a manuscript before it is published online. Independent professionals/experts/researchers in the relevant research area are subjected to assess the submitted manuscripts for originality, validity and significance to help editors determine whether a manuscript should be published in their journal. 

This Peer review process helps in validating the research works, establish a method by which it can be evaluated and increase networking possibilities within research communities. Despite criticisms, peer review is still the only widely accepted method for research validation

Only the articles that meet good scientific standards, explanations, records and proofs of their work presented with Bibliographic reasoning (e.g., acknowledge and build upon other work in the field, rely on logical reasoning and well-designed studies, back up claims with evidence etc.) are accepted for publication in the Journal.

https://biomedres.us/peer-review-process.php

Which seems reassuring. It seems 'Preventive Practice on Earthquake Preparedness Among Higher Level Students of Dhaka City' should then only have been published after evaluation in rigorous peer review. Presumably any weaknesses in the submission would have been highlighted in the review process, helping the authors to improve their work before publication. Presumably, the (unnamed) editor did not approve publication until peer reviewers were satisfied the paper made a valid new contribution to knowledge and, accordingly, recommended publication. 2


The paper was, apparently, submitted; screened by editors; sent to selected expert peer reviewers; evaluated by reviewers, so reports could be returned to the editor who collated them, and passed them to the authors with her/his decision; revised as indicated; checked by editors and reviewers, leading to a decision to publish; copy edited, allowing proofs to be sent to authors for checking; and published, all in less than three weeks.

Although supposedly published in July 2021, the paper seems to be assigned to an issue published a year before it was submitted

One might wonder, though, whether a journal which seems to advertise with an inflated Impact Factor can be trusted to follow the procedures it claims. So, I had a quick look at the paper.

The abstract begins:

The present study was descriptive Cross-sectional study conducted in Higher Secondary Level Students of Dhaka, Bangladesh, during 2017. The knowledge of respondent seems to be average regarding earthquake. There is a found to have a gap between knowledge and practice of the respondents.

Gurung & Khanum, 2021, p.29274

Sampling a population (or not)

So, this seems to be a survey, and the population sampled was Higher Secondary Level Students of Dhaka, Bangladesh. Dhaka has a population of about 22.5 million people. I could not readily find out how many of these might be considered 'Higher Secondary Level', but clearly it will be many, many thousands – I would imagine about half a million as a 'ball-park' figure.


Dhaka has a large population of 'higher secondary level students'
(Image by Mohammad Rahmatullah from Pixabay)

For a survey of a population to be valid it needs to be based on a sample which is large enough to minimise errors in extrapolating to the full population, and (even more importantly) the sample needs to be representative of the population.

Read about sampling

Here:

"Due to time constrain the sample of 115."

Gurung & Khanum, 2021, p.29276

So, the sample size was limited to 115 because of time constraints. This would likely lead to large errors in inferring population statistics from the sample, but could at least give some indication of the population as long as the 115 were known to be reasonably representative of the wider population being surveyed.
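For a sense of scale, here is a minimal sketch of the usual margin-of-error approximation for a proportion estimated from a sample, assuming (generously) simple random sampling from a large population:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Worst case (a proportion near 0.5) for a sample of 115 students:
print(f"+/- {100 * margin_of_error(0.5, 115):.1f} percentage points")  # about +/- 9.1
```

So even with genuinely random sampling, any percentage reported from 115 respondents could easily be nine or so percentage points adrift of the population figure – and, as discussed next, random sampling cannot be assumed here anyway.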

The reader is told

"the study sample was from Mirpur Cantonment Public School and College , (11 and 12 class)."

Gurung & Khanum, 2021, p.29275

It seems very unlikely that a sample taken from any one school among hundreds could be considered representative of the age cohort across such a large City.

Is the school 'typical' of Dhaka?

The school website has the following evaluation by the school's 'sponsor':

"…one the finest academic institutions of Bangladesh in terms of aesthetic beauty, uncompromised quality of education and, most importantly, the sheer appeal among its learners to enrich themselves in humanity and realism."

Major General Md Zahirul Islam

The school Principal notes:

"Our visionary and inspiring teachers are committed to provide learners with all-rounded educational experiences by means of modern teaching techniques and incorporation of diverse state-of-the-art technological aids so that our students can prepare themselves to face the future challenges."

Lieutenant Colonel G M Asaduzzaman

While both of these officers would be expected to be advocates for the school, this does not give a strong impression that the researchers have sought a school that is typical of Dhaka schools.

It also seems unlikely that this sample of 115 reflects all of the students in these grades. According to the school website, there are 7 classes in each of these two grades, so the 115 students were drawn from 14 classes. Interestingly, in each year 5 of the 7 classes are following a science programme 3 – alongside one business studies and one humanities class. The paper does not report which programme(s) were being followed by the students in the sample. Indeed, no information is given regarding how the 115 were selected. (Did the researchers just administer the research instrument to the first students they came across in the school? Were all the students in these grades asked to contribute, and only 115 returned responses?)

Yet, if the paper was seen and evaluated by "independent professionals/experts/researchers in the relevant research area", they seem not to have questioned whether such a small and unrepresentative sample invalidated the study as a survey of the population specified.

Cross-sectional studies

A cross-sectional study examines and compares different slices of a population – so here, different grades. Yet only two grades were sampled, and these were adjacent grades – 11 and 12 – which is not usually ideal to make comparisons across ages.

There could be a good reason to select two grades that are adjacent in this way. However, the authors do not present separate data for year 11 and year 12, but rather pool them. So they make no comparisons between these two year groups. This "Cross-sectional study" was then NOT actually a cross-sectional study.

If the paper did get sent to "independent professionals/experts/researchers in the relevant research area" for review, it seems these experts missed that error.

Theory and practice?

The abstract of the paper claims

"There is a found to have a gap between knowledge and practice of the respondents. The association of the knowledge and the practice of the students were done in which after the cross-tabulation P value was 0.810 i.e., there is not any [statistically significant?] association between knowledge and the practice in this study."

Gurung & Khanum, 2021, p.29274

This seems to suggest that student knowledge (what they knew about earthquakes) was compared in some way with practice (how they acted during an earthquake or earthquake warning). But the authors seem to have only collected data with (what they label) a questionnaire. They do not have any data on practice. The distinction they really seem to be making is between

  • knowledge about earthquakes, and
  • knowledge about what to do in the event of an earthquake.

That might be a useful thing to examine, but any "independent professionals/experts/researchers in the relevant research area" asked to look at the submission do not seem to have noted that the authors did not investigate practice, and so needed to change the descriptions they use and the claims they make.
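For readers unfamiliar with the statistic quoted in the abstract: a 'cross-tabulation P value' of this kind usually comes from something like a chi-squared test of association between two categorical variables. The sketch below is only a guess at the sort of analysis involved – the paper does not describe its procedure, and the counts here are invented purely for illustration:

```python
# A sketch of a chi-squared test of association on a cross-tabulation.
# The counts are invented for illustration only (they sum to 115).
from scipy.stats import chi2_contingency

crosstab = [
    # columns: poor / average / good knowledge of what to do in an earthquake
    [ 4,  9,  2],   # rows: poor knowledge about earthquakes
    [10, 55, 15],   #       average knowledge
    [ 3, 12,  5],   #       good knowledge
]

chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
# A large p value (such as the paper's 0.810) would simply mean no detectable
# association between the two sets of responses in this sample.
```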

Average levels of knowledge

Another point that any expert reviewer 'worth their salt' would have queried is the use of descriptors like 'average' in evaluating students' responses. The study concluded that

"The knowledge of earthquake and its preparedness among Higher Secondary Student were average."

Gurung & Khanum, 2021, p.29280

But how do the authors know what counts as 'average'?

This might mean that there is some agreed standard here described in extant literature – but, if so, this is not revealed. It might mean that the same instrument had previously been used to survey nationally or internationally to offer a baseline – but this is not reported. Some studies on similar themes carried out elsewhere are referred to, but it is not clear they used the same instrumentation or analytical scheme. Indeed, the reader is explicitly told very little about the instrument used:

"Semi-structured both open ended and close ended questionnaire was used for this study."

Gurung & Khanum, 2021, p.29276

The authors seem to have forgotten to discuss the development, validation and contents of the questionnaire – and any experts asked to evaluate the submission seem to have forgotten to look for this. I would actually suggest that the authors did not really use a questionnaire, but rather an assessment instrument.

Read about questionnaires

A questionnaire is used to survey opinions, views and so forth – and there are no right or wrong answers. (What type of music do you like? Oh jazz, sorry that's not the right answer.) As the authors evaluated and scored the student responses this was really an assessment.

The authors suggest:

"In this study the poor knowledge score was 15 (13%), average 80 (69.6%) and good knowledge score 20 (17.4%) among the 115 respondents. Out of the 115 respondents most of the respondent has average knowledge and very few 20 (17.4%) has good knowledge about earthquake and the preparedness of it."

Gurung & Khanum, 2021, p.29280

Perhaps this means that the authors had used some principled (but not revealed) technique to decide what counted as poor, average and good.

Score   Description
15      poor knowledge
80      average knowledge
20      good knowledge
Descriptors applied to student scores on the 'questionnaire'

Alternatively, perhaps "poor knowledge score was 15 (13%), average 80 (69.6%) and good knowledge score 20 (17.4%)" is reporting what was found in terms of the distribution in this sample – that is, they empirically found these outcomes in this distribution.

Well, not actually these outcomes, of course, as that would suggest that a score of 20 is better than a score of 80, but presumably that is just a typographic error that was somehow missed by the authors when they made their submission, then missed by the editor who screened the paper for suitability (if there is actually an editor involved in the 'editorial' process for this journal), then missed by expert reviewers asked to scrutinise the manuscript (if there really were any), then missed by production staff when preparing proofs (i.e., one would expect this to have been raised as an 'author query' on proofs 4), and then missed again by authors when checking the proofs for publication.

If so, the authors found that most respondents got fairly typical scores, and fewer scored at the tails of the distribution – as one would expect. (The arithmetic supports this reading: 15 + 80 + 20 = 115, and 80/115 is indeed about 69.6%, so the numbers look like counts of respondents rather than scores.) On any particular assessment, the average performance is (as the authors report here)…average.


Work cited:
  • Gurung, N. and Khanum, H. (2021) Preventive Practice on Earthquake Preparedness Among Higher Level Students of Dhaka City. Biomedical Journal of Scientific & Technical Research, July, 2020, Volume 37, 2, pp 29274-29281

Note:

1 The Biomedical Journal of Scientific & Technical Research defines its scope as including:

  • Agri and Aquaculture 
  • Biochemistry
  • Bioinformatics & Systems Biology 
  • Biomedical Sciences
  • Clinical Sciences
  • Chemical Engineering
  • Chemistry
  • Computer Science 
  • Economics & Accounting 
  • Engineering
  • Environmental Sciences
  • Food & Nutrition
  • General Science
  • Genetics & Molecular Biology
  • Geology & Earth Science
  • Immunology & Microbiology
  • Informatics
  • Materials Science
  • Orthopaedics
  • Mathematics
  • Medical Sciences
  • Nanotechnology
  • Neuroscience & Psychology
  • Nursing & Health Care
  • Pharmaceutical Sciences
  • Physics
  • Plant Sciences
  • Social & Political Sciences 
  • Veterinary Sciences 
  • Clinical & Medical 
  • Anesthesiology
  • Cardiology
  • Clinical Research 
  • Dentistry
  • Dermatology
  • Diabetes & Endocrinology
  • Gastroenterology
  • Genetics
  • Haematology
  • Healthcare
  • Immunology
  • Infectious Diseases
  • Medicine
  • Microbiology
  • Molecular Biology
  • Nephrology
  • Neurology
  • Nursing
  • Nutrition
  • Oncology
  • Ophthalmology
  • Pathology
  • Pediatrics
  • Physicaltherapy & Rehabilitation 
  • Psychiatry
  • Pulmonology
  • Radiology
  • Reproductive Medicine
  • Surgery
  • Toxicology

Such broad scope is a common characteristic of predatory journals.


2 The editor(s) of a research journal is normally a highly regarded academic in the field of the journal. I could not find the name of the editor of this journal although it has seven associate editors and dozens of people named as being on an 'editorial committee'. Whether any of these people actually carry out the functions of an academic editor or whether this work is delegated to non-academic office staff is a moot point.


3 The classes are given names. So, nursery classes include Lotus and Tulip and so forth. In the senior grades, the science classes are called:

  • Flora
  • Neon
  • Meson
  • Sigma
  • Platinam [sic]
  • Argon
  • Electron
  • Neutron
  • Proton
  • Redon [sic]

4 Production staff are not expected to be experts in the topic of the paper, but they do note any obvious omissions (such as missing references) or likely errors and list these as 'author queries' for authors to respond to when checking 'proofs', i.e., the article set in the journal format as it will be published.

What shape should a research thesis be?

Being flummoxed by a student question was the inspiration for a teaching metaphor

Keith S. Taber

An artist's impression of the author being lost for words (Image actually by Christian Dorn from Pixabay)

In my teaching on the 'Educational Research' course I used to present a diagram of a shape something like the lemniscate – the infinity symbol, ∞ – and tell students that was the shape their research project and thesis should take. I would suggest this was a kind of visual metaphor.

This may seem a rather odd idea, but I was actually responding to a question I had previously been asked by a student. Albeit, this was a rather deferred response.

'Lost for words'

As a teacher one gets asked all kinds of questions. I've often suggested that preparing for teaching is more difficult than preparing for an examination. When taking an examination it is usually reasonable to assume that the examination questions have been set by experts in the subject.

A candidate therefore has a reasonable chance of foreseeing at least the general form of the questions that might be asked. There is usually a syllabus or specification which gives a good indication of the subject matter and the kinds of skills expected to be demonstrated – and usually there are past papers (or, if not, specimen papers) giving examples of what might be asked. The documentation reflects some authority's decisions about the bounds of the subject being examined (e.g., what counts as included in 'chemistry' or whatever), the selection of topics to be included in the course, and the level of treatment expected at this level of study (Taber, 2019). Examiners may try to find novel applications and examples and contexts – but good preparation should avoid the candidate ever being completely stumped and having no basis to try to develop a response.

However, teachers are being 'examined' so to speak, by people who by definition are not experts and so may be approaching a subject or topic from a wide range of different perspectives. In science teaching, one of the key issues is how students do not simply come to class ignorant about topics to be studied, but often bring a wide range of existing ideas and intuitions ('alternative conceptions') that may match, oppose, or simply be totally unconnected with, the canonical accounts.

Read about alternative conceptions

This can happen in any subject area. But a well-prepared teacher, even if never able to have ready answers to all the questions or suggestions learners might offer, will seldom be lost for words with no idea how to answer. But I do recall an occasion when I was indeed flummoxed.

I was in what is known as the 'Street' in the main Faculty of Education Building (the Donald McIntyre Building) at Cambridge at a time when students were milling about as classes were just ending and starting. Suddenly out of the crowd a student I recognised from teaching the Educational Research course loomed at me and indicated he wanted to talk. I saw he was clutching a hardbound A4 notebook.

We moved out of the melee to an area where we could talk. He told me he had a pressing question about the dissertation he had to write for his M.Phil. programme.

"What should the thesis look like?"

His question sounded simple enough – "What should the thesis look like?"

Now at one level I had an answer – it should be an A4 document that would be eventually bound in blue cloth with gold lettering on the spine. However, I was pretty sure that was not what he meant.

What does a thesis look like?

I said I was not sure what he meant. He opened his notebook at a fresh double page and started sketching: "Should the thesis look like this?", he asked, as he drew a grid on one page of his book. Whilst I was still trying to make good sense of this option, he started sketching on the facing page. "Or, should it look like this?"

I have often thought back to this exchange as I was really unsure how to respond. He seemed no more able to explain these suggestions than I was able to appreciate how these representations related to my understanding of the thesis. As I looked at the first option I was starting to think in terms of the cells as perhaps being the successive chapters – but the alternative option seemed to undermine this. For, surely, if the question was about whether to have 6 or 8 chapters – a question that has no sensible answer in abstract without considering the specific project – it would have been simpler just to pose the question verbally. Were the two columns (if that is what they were) meant to be significant? Were the figures somehow challenging the usual linear nature of a thesis?

I could certainly offer advice on structuring a thesis, but as a teacher – at least as the kind of constructivist teacher I aspired to be – I failed here. I was able to approach the topic from my own perspective, but not to appreciate the student's own existing conceptual framework and work from there. This is of course what research suggests teachers usually need to do to help learners with alternative conceptions shift their thinking.

Afterwards I would remember this incident (in a way I cannot recall the responses I gave to student questions on hundreds of other occasions) and reflect on it – without ever appreciating what the student was thinking. I know the student had a background in a range of artistic fields, including as a composer – and I wondered if this was informing his thinking. Perhaps if I had studied music at a higher level I might have appreciated the question as being along the lines of, say, whether the thesis should be, metaphorically speaking, in sonata form or better seen as a suite.

I think it was because the question played on my mind that later, indeed several years later, I had the insight that 'the thesis' (a 'typical' thesis) did not look like either of those rectangular shapes, but rather more like the lemniscate:

A visual metaphor for a thesis project (after Taber, 2013)

The focus of a thesis

My choice of the lemniscate was because its figure-of-eight nature gives it two loops which are connected at a point – a point which can be seen as some kind of focal point of the image:

A thesis project has a kind of focal point

This 'focus' represents the research question or questions (RQ). The RQ are not the starting point of most projects, as good RQ have to be carefully chosen and refined, and that usually takes a lot of reading around the topic.

However, they act as a kind of fulcrum around which the thesis is organised because the sections of the thesis leading up to the RQ are building up to them – offering a case for why those particular questions are interesting, important, and so-phrased. And everything beyond that point reflects the RQ, as the thesis then describes how evidence was collected and analysed in order to try to answer the questions.

Two cycles of activity

A thesis project cycles through expansive and focusing phases

Moreover, the research project described in a thesis reflects two cycles of activity.

The first cycle has an expansive phase where the researcher is reading around the topic, and exposing themselves to a wide range of literature and perspectives that might be relevant. Then, once a conceptual framework is developed from this reading (in the literature review), the researcher focuses in, perhaps selecting one of several relevant theoretical perspectives, and informed by prior research and scholarship, crystallises the purpose of the project in the RQ.

Then the research is planned in order to seek to answer the RQ, which involves selecting or developing instruments, going out and collecting data – often quite a substantial amount of data. After this expansive phase, there is another focusing stage. The collected data is then processed into evidence – interpreted, sifted, selected, summarised, coded and tallied, categorised, and so forth – in analysis. The data analysis is summarised in the results, allowing conclusions to be formed: conclusions which reflect back to the RQ.

The lemniscate, then, offers a simple visual metaphor that I think acts as a useful device for symbolising some important features of a research project – and so, in one sense at least, what a thesis 'looks' like. If any of my students (or readers) have found this metaphor useful, then they have benefited from a rare occasion when a student question left me lost for words.

Work cited:

Assessing Chemistry Laboratory Equipment Availability and Practice

Comparative education on a local scale?

Keith S. Taber

Image by Mostafa Elturkey from Pixabay 

I have just read a paper in a research journal which compares the level of chemistry laboratory equipment and 'practice' in two schools in the "west Gojjam Administrative zone" (which according to a quick web-search is in the Amhara Region in Ethiopia). According to Yesgat and Yibeltal (2021),

"From the analysis of Chemistry laboratory equipment availability and laboratory practice in both … secondary school and … secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice. From the data analysis average chemistry laboratory equipment availability and status of laboratory practice of … secondary school is better than that of Jiga secondary school."

Yesgat and Yibeltal, 2021: abstract [I was tempted to omit the school names in this posting as I was not convinced the schools had been treated reasonably, but the schools are named in the very title of the article]

Now that would seem to be something that could clearly be of interest to teachers, pupils, parents and education administrators in those two particular schools, but it raises the question that can be posed in relation to any research: 'so what?' The findings might be a useful outcome of enquiry in its own context, but what generalisable knowledge does this offer that justifies its place in the research literature? Why should anyone outside of West Gojjam care?

The authors tell us,

"There are two secondary schools (Damot and Jiga) with having different approach of teaching chemistry in practical approach"

Yesgat and Yibeltal, 2021: 96

So, this suggests a possible motivation.

  • If these two approaches reflect approaches that are common in schools more widely, and
  • if these two schools can be considered representative of schools that adopt these two approaches, and
  • if 'Chemistry Laboratory Equipment Availability and Practice' can be considered to be related to (a factor influencing? an effect of?) these different approaches, and
  • if the study validly and reliably measures 'Chemistry Laboratory Equipment Availability and Practice', and
  • if substantive differences are found between the schools

then the findings might well be of wider interest. As always in research, the importance we give to findings depends upon a whole logical chain of connections that collectively make an argument.

Spoiler alert!

At the end of the paper, I was none the wiser what these 'different approaches' actually were.

A predatory journal

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies. One of the articles that IS on an educational topic is called 'Assessment of Chemistry Laboratory Equipment Availability and Practice: A Comparative Study Between Damot and Jiga Secondary Schools' (Yesgat & Yibeltal, 2021).

Comparative education?

Yesgat and Yibeltal imply that their study falls in the field of comparative education. 1 They inform readers that 2,

"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. Most compartivest states [sic] that comparative education has four main purposes. These are:

To describe educational systems, processes or outcomes

To assist in development of educational institutions and practices

To highlight the relationship between education and society

To establish generalized statements about education that are valid in more than one country"

Yesgat & Yibeltal, 2021: 95-96
Comparative education studies look to characterise (national) education systems in relation to their social/cultural contexts (Image by Gerd Altmann from Pixabay)

Of course, like any social construct, 'comparative education' is open to interpretation and debate: for example, the view "that comparative education brings together data about two or more national systems of education, and comparing and contrasting those data" has been characterised as "a naive and obvious answer to the question of what constitutes comparative education" (Turner, 2019, p.100).

There is then some room for discussion over whether particular research outputs should count as 'comparative education' studies or not. Many comparative education studies do not actually compare two educational systems, but rather report in detail from a single system (making possible subsequent comparisons based across several such studies). These educational systems are usually understood as national systems, although there may be a good case to explore regional differences within a nation if regions have autonomous education systems and these can be understood in terms of broader regional differences.

Yet, studying one aspect of education within one curriculum subject at two schools in one educational administrative area of one region of one country cannot be understood as comparative education without doing excessive violence to the notion. This work does not characterise an educational system at national, regional or even local level.

My best assumption is that as the study is comparing something (in this case an aspect of chemistry education in two different schools) the authors feel that makes it 'comparative education', by which account of course any educational experiment (comparing some innovation with some kind of comparison condition) would automatically be a comparative education study. We all make errors sometimes, assuming terms have broader or different meanings than their actual conventional usage – and may indeed continue to misuse a term till someone points this out to us.

This article was published in what claims to be a peer reviewed research journal, so the paper was supposedly evaluated by expert reviewers who would have provided the editor with a report on strengths and weaknesses of the manuscript, and highlighted areas that would need to be addressed before possible publication. Such a reviewer would surely have reported that 'this work is not comparative education, so the paragraph on comparative education should either be removed, or authors should contextualise it to explain why it is relevant to their study'.

The weak links in the chain

A research report makes certain claims that derive from a chain of argument. To be convinced about the conclusions you have to be convinced about all the links in the chain, such as:

  • sampling (were the right people asked?)
  • methodology (is the right type of research design used to answer the research question?)
  • instrumentation (is the data collection instrument valid and reliable?)
  • analysis (have appropriate analytical techniques been carried out?)

These considerations cannot be averaged: if, for example, a data collection instrument does not measure what it is said to measure, then it does not matter how good the sample, or how careful the analysis, the study is undermined and no convincing logical claims can be built. No matter how skilled I am in using a tape measure, I will not be able to obtain accurate weights with it.

Sampling

The authors report the make up of their sample – all the chemistry teachers in each school (13 in one, 11 in the other), plus ten students from each of grades 9, 10 and 11 in each school. They report that "… 30 natural science students from Damot secondary school have been selected randomly. With the same technique … 30 natural sciences students from Jiga secondary school were selected".

Random selection is useful for avoiding bias in a sample, but it is helpful if the technique for randomisation is briefly reported, to assure readers that 'random' is not being used as a synonym for 'arbitrary' and that the technique applied was adequate (Taber, 2013b).

A random selection across a pooled sample is unlikely to lead to equal representation in each subgroup (From Taber, 2013a)

Actually, if 30 students had been chosen at random from the population of students taking natural sciences in one of the schools, it would be extremely unlikely they would be evenly spread, 10 from each year group. Presumably, the authors made random selections within these grade levels (which would be eminently sensible, but is not quite what they report).
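To illustrate the difference, here is a minimal Python sketch – with invented class rosters standing in for the schools' actual student lists – contrasting a simple random draw of 30 students from a pooled list with a random draw of 10 students within each grade:

```python
import random
from collections import Counter

random.seed(1)  # fixed seed just so the illustration is reproducible

# Hypothetical rosters standing in for the natural science students in one school
rosters = {
    "G9": [f"G9-{i}" for i in range(120)],
    "G10": [f"G10-{i}" for i in range(100)],
    "G11": [f"G11-{i}" for i in range(80)],
}

# Simple random sample of 30 from the pooled population:
# the spread across grades is left to chance, so an even 10/10/10 split is unlikely.
pooled = [student for roster in rosters.values() for student in roster]
pooled_sample = random.sample(pooled, 30)
print(Counter(student.split("-")[0] for student in pooled_sample))

# Stratified random sample: 10 drawn at random *within* each grade,
# which is presumably what the authors actually did.
stratified_sample = [student
                     for roster in rosters.values()
                     for student in random.sample(roster, 10)]
print(Counter(student.split("-")[0] for student in stratified_sample))  # always 10 per grade
```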

Read about the criterion for randomness in research

Data collection

To collect data the authors constructed a questionnaire with Likert-type items.

"…questionnaire was used as data collecting instruments. Closed ended questionnaires with 23 items from which 8 items for availability of laboratory equipment and 15 items for laboratory practice were set in the form of "Likert" rating scale with four options (4=strongly agree, 3=agree, 2=disagree and 1=strongly disagree)"

Yesgat & Yibeltal, 2021: 96

These categories were further broken down (Yesgat & Yibeltal, 2021: 96): "8 items of availability of equipment were again sub grouped in to

  • physical facility (4 items),
  • chemical availability (2 items), and
  • laboratory apparatus (2 items)

whereas 15 items of laboratory practice were further categorized as

  • before actual laboratory (4 items),
  • during actual laboratory practice (6 items) and
  • after actual laboratory (5 items)

Internal coherence

So, there were two basic constructs, each broken down into three sub-constructs. This instrument was piloted,

"And to assure the reliability of the questionnaire a pilot study on a [sic] non-sampled teachers and students were conducted and Cronbach's Alpha was applied to measure the coefficient of internal consistency. A reliability coefficient of 0.71 was obtained and considered high enough for the instruments to be used for this research"

Yesgat & Yibeltal, 2021: 96

Running a pilot study can be very useful as it can highlight problems with items. However, simply asking people to complete a questionnaire may only flag up items they could make no sense of at all; interviewing them about how they understood the items is a better check that respondents interpret them in the same way as the researchers.

The authors cite the value of Cronbach's alpha to demonstrate that their instrument has internal consistency. However, they seem to be quoting the value obtained in the pilot study, whereas the statistic strictly applies to a particular administration of an instrument (so the value from the main study would be more relevant to the results reported).

More problematically, the authors appear to cite a value of alpha calculated across all 23 items (n.b., the value of alpha tends to increase as the number of items increases, so what counts as an acceptable value needs to allow for the number of items included) when these actually form two distinct scales: 'availability of laboratory equipment' and 'laboratory practice'. Alpha should be quoted separately for each scale – a value computed across distinct scales is not useful (Taber, 2018). 3
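For readers who wish to check this on their own instruments, here is a minimal sketch of computing Cronbach's alpha separately for each scale (in Python; the array names are invented placeholders, not the authors' data):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items array of ratings."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                              # number of items in this scale
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical arrays standing in for the questionnaire data:
# 'equipment' would be respondents x 8, 'practice' respondents x 15.
# Alpha should be reported for each scale, not pooled across all 23 items:
# print(cronbach_alpha(equipment))
# print(cronbach_alpha(practice))
```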

Do the items have face validity?

The items in the questionnaire are reported in appendices (pp.102-103); I have listed them here so readers can consider

  • (a) whether they feel these items reflect the constructs of 'availability of equipment' and 'laboratory practice';
  • (b) whether the items are phrased in a clear way for both teachers and students (the authors report "conceptually the same questionnaires with different forms were prepared" (p.101), but if this means different wording for teachers than for students, this is not elaborated – teachers were also asked demographic questions about their educational level); and
  • (c) whether they are all reasonable things to expect both teachers and students to be able to rate.

'Availability of equipment' items:

  • Structured and well-equipped laboratory room
  • Availability of electric system in laboratory room
  • Availability of water system in laboratory room
  • Availability of laboratory chemicals are available [sic]
  • No interruption due to lack of lab equipment
  • Isolated bench to each student during laboratory activities
  • Chemicals are arranged in a logical order
  • Laboratory apparatus are arranged in a logical order

'Laboratory practice' items:

  • You test the experiments before your work with students
  • You give laboratory manuals to student before practical work
  • You group and arrange students before they are coming to laboratory room
  • You set up apparatus and arrange chemicals for activities
  • You follow and supervise students when they perform activities
  • You work with the lab technician during performing activity
  • You are interested to perform activities?
  • You check appropriate accomplishment of your students' work
  • Check your students' interpretation, conclusion and recommendations
  • Give feedbacks to all your students work
  • Check whether the lab report is individual work or group
  • There is a time table to teachers to conduct laboratory activities.
  • Wear safety goggles, eye goggles, and other safety equipment in doing so
  • Work again if your experiment is failed
  • Active participant during laboratory activity

Items teachers and students were asked to rate on a four-point scale (agree / strongly agree / disagree / strongly disagree)

Perceptions

One obvious limitation of this study is that it relies on reported perceptions.

One way to find out about the availability of laboratory equipment might be to visit teaching laboratories and survey them with an observation schedule – and perhaps even make a photographic record. The questionnaire assumes that teacher and student perceptions are accurate and that honest reports would be given (might teachers have had an interest in offering a particular impression of their work?)

Sometimes researchers are actually interested in impressions (e.g., for some purposes whether a student considers themselves a good chemistry student may be more relevant than an objective assessment), and sometimes researchers have no direct access to a focus of interest and must rely on other people's reports. Here it might be suggested that a survey by questionnaire is not really the best way to, for example, "evaluate laboratory equipment facilities for carrying out practical activities" (p.96).

Findings

The authors describe their main findings as,

"Chemistry laboratory equipment availability in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment. This finding supported by the analysis of one sample t-values and as it indicated the average availability of laboratory equipment are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual availability of equipment to the expected test value (2.5).

Chemistry laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average chemistry laboratory practice. This finding supported by the analysis of one sample t-values and as it indicated the average chemistry laboratory practice are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual chemistry laboratory practice to the expected test value."

Yesgat & Yibeltal, 2021: 101 (emphasis added)

This is the basis for the claim in the abstract that "From the analysis of Chemistry laboratory equipment availability and laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice."

'The average …': what is the standard?

But this raises a key question – how do the authors know what the "the average availability of chemistry laboratory equipment and status of laboratory practice" is, if they have only used their questionnaire in two schools (which are both found to be below average)?

Yesgat & Yibeltal have run a comparison between the average ratings they obtained from the two schools on their two scales and the 'average test value' rating of 2.5. As far as I can see, this is not an empirical value at all. It seems the authors have just assumed that if people are asked to use a four-point scale – 1, 2, 3, 4 – then the average rating will be…2.5. Of course, that is a completely arbitrary assumption. (Consider the question 'how much would you like to be beaten and robbed today?': would the average response be likely to be the nominal mid-point of the rating scale?) Perhaps if a much wider survey had been undertaken the actual average rating would have been 1.9 or 2.7 or …
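For what it is worth, the kind of test the authors report is easy to reproduce. The sketch below – using invented ratings and the freely available SciPy library – runs a one-sample t-test against the assumed mid-point of 2.5; the point is that a 'significant' result here only shows the sample mean differs from a value that was never actually measured in any school:

```python
from scipy import stats

# Invented ratings for one scale at one school (1 = strongly disagree ... 4 = strongly agree)
ratings = [2, 1, 2, 3, 2, 2, 1, 3, 2, 2, 4, 1, 2, 2, 3, 2, 1, 2, 3, 2]

# One-sample t-test against the assumed 'average test value' of 2.5.
# Rejecting the null only tells us the mean rating differs from 2.5;
# it says nothing about how schools are typically rated in practice.
t_statistic, p_value = stats.ttest_1samp(ratings, popmean=2.5)
print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")
```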

That is even assuming that 'average' is a meaningful concept here. A four-point Likert scale is an ordinal scale ('agree' always indicates less agreement than 'strongly agree' and more than 'disagree') but not an interval scale (that is, it cannot be assumed that the perceived 'agreement' gap (i) from 'strongly disagree' to 'disagree' is the same for each respondent and the same as the gaps (ii) from 'disagree' to 'agree' and (iii) from 'agree' to 'strongly agree'). Strictly, Likert scale ratings cannot be averaged (they are better presented as bar charts showing frequencies of response) – so although the authors carry out a great deal of analysis, much of it is, strictly, invalid.
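A minimal sketch of the alternative suggested here – tallying how often each response category was chosen (the basis of a bar chart), rather than averaging the numerical codes – might look like this, again with invented ratings:

```python
from collections import Counter

labels = {1: "strongly disagree", 2: "disagree", 3: "agree", 4: "strongly agree"}

# Invented ratings for a single questionnaire item
ratings = [1, 2, 2, 1, 3, 2, 4, 1, 2, 3, 2, 1, 2, 2, 3]

# Report frequencies per category (suitable for plotting as a bar chart) ...
counts = Counter(ratings)
for code in sorted(labels):
    print(f"{labels[code]:>17}: {counts.get(code, 0)}")

# ... rather than a mean of the codes, which quietly assumes the gaps between
# 'strongly disagree', 'disagree', 'agree' and 'strongly agree' are equal.
print(f"mean of the codes: {sum(ratings) / len(ratings):.2f}")
```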

So what has been found out from this study?

I would very much like to know what peer reviewers made of this study. Expert reviewers would surely have identified some very serious weaknesses in the study and would have been expected to have recommended some quite major revisions even if they thought it might eventually be publishable in a research journal.

An editor is expected to take on board referee evaluations and ask authors to make such revisions as are needed to persuade the editor the submission is ready for publication. It is the job of the editor of a research journal, supported by the peer reviewers, to

a) ensure work of insufficient quality is not published

b) help authors strengthen their paper to correct errors and address weaknesses

Sometimes this process takes some time, with a number of cycles of revision and review. Here, however, the editor was able to move to a decision to publish in 5 days.

The study reflects a substantial amount of work by the authors. Yet it is hard to see how this study, at least as reported in this journal, makes a substantive contribution to public knowledge. The study finds that one school has somewhat higher survey ratings than another on an instrument that has not been fully validated, based on a pooling of student and teacher perceptions, and guesses that both schools rate lower than a hypothetical 'average' school. The two schools were supposed to represent "different approach[es] of teaching chemistry in practical approach" – but even if that is the case, the authors have not shared with their readers what these different approaches are meant to be. So, there would be no possibility of generalising from the schools to 'approach[es] of teaching chemistry', even if that were logically justifiable. And comparative education it is not.

This study, at least as published, does not seem to offer useful new knowledge to the chemistry education community that could support teaching practice or further research. Even in the very specific context of these two schools, it is not clear what can be done with the findings, which simply reflect back to the informants what they have told the researchers, without exploring the reasons behind the ratings (how do different teachers and students understand what counts as 'Chemicals are arranged in a logical order'?) or the values the participants are bringing to the study (is 'Check whether the lab report is individual work or group' meant to imply that it is seen as important to ensure that students work cooperatively, or to ensure they work independently, or …?)

If there is a problem highlighted here by the "very low levels" (based on a completely arbitrary interpretation of the scales) there is no indication of whether this is due to resourcing of the schools, teacher preparation, levels of technician support, teacher attitudes or pedagogic commitments, timetabling problems, …

This seems to be a study which has highlighted two schools, invited teachers and students to complete a dubious questionnaire, and simply used this to arbitrarily characterise the practical chemistry education in the schools as very poor, without contextualising any challenges or offering any advice on how to address the issues.

Work cited:
Note:

1 'Imply', as Yesgat and Yibeltal do not actually state that they have carried out comparative education. However, if they do not think so, then the paragraph on comparative education in their introduction has no clear relationship with the rest of the study and is no more than a gratuitous reference, like suddenly mentioning Nottingham Forest's European Cup triumphs or noting a preferred flavour of tea.


2 This seemed an intriguing segment of the text as it was largely written in a more sophisticated form of English than the rest of the paper, apart from the odd reference to "Most compartivest [comparative education specialists?] states…" which seemed to stand out from the rest of the segment. Yesgat and Yibeltal do not present this as a quote, but cite a source informing their text (their reference [4]: Joubish, 2009). However, their text is very similar to that in another publication:

From Mbozi, 2017, p.21:

"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. The exposure facilitates our adoption of best practices. Some purposes of comparative education were not covered in your exercise above. Purposes of comparative education suggested by two authors Noah (1985) and Kidd (1975) are presented below to broaden your understanding of the purposes of comparative education. Noah, (1985) states that comparative education has four main purposes [4] and these are:

1. To describe educational systems, processes or outcomes
2. To assist in development of educational institutions and practices
3. To highlight the relationship between education and society
4. To establish generalized statements about education, that are valid in more than one country."

From Yesgat and Yibeltal, 2021, pp.95-96:

"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. Most compartivest states that comparative education has four main purposes. These are:

  • To describe educational systems, processes or outcomes
  • To assist in development of educational institutions and practices
  • To highlight the relationship between education and society
  • To establish generalized statements about education that are valid in more than one country"

Text from the two sources, reproduced here for comparison.

3 There are more sophisticated techniques which can be used to check whether items do 'cluster' as expected for a particular sample of respondents.


4 As suggested above, researchers can pilot instruments with interviews or 'think aloud' protocols to check if items are understood as intended. Asking assumed experts to read through and check 'face validity' is of itself quite a limited process, but can be a useful initial screen to identify items of dubious relevance.

The mystery of the disappearing authors

Original image by batian lu from Pixabay 

Can an article be simultaneously out of scope, and limited in scope?

Keith S. Taber

Not only had two paragraphs from the abstract gone missing, along with the figures, but the journal article had also lost two-thirds of its authors.

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies.

One of the articles that IS on an educational topic is called 'An overview of the first year Undergraduate Medical Students [sic] Feedback on the Point of Care Ultrasound Curriculum' (Mohialdin, 2018a), by Vian Mohialdin, an
Associate Professor of Pathology and Molecular Medicine at McMaster University in Ontario.

A single-authored paper by Prof. Mohialdin

Review articles

Research journals tend to distinguish between different types of articles, and most commonly:

  • papers that report empirical studies,
  • articles which set out theoretical perspectives/positions, and
  • articles that offer reviews of the existing literature on a topic.

'An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum' is classified as a review article.

A review article?

Typically, review articles cite a good deal of previous literature. Prof. Mohialdin cites a modest number of previous publications – just 10. Now one might suspect that perhaps the topic of point-of-care ultrasound in undergraduate medical education is a fairly specialist topic, and perhaps even a novel topic, in which case there may not be much literature to review. But a review of ultrasound in undergraduate medical education published a year earlier (Feilchenfeld, Dornan, Whitehead & Kuper, 2017) cited over a hundred works.

Actually a quick inspection of Mohialdin's paper reveals it is not a review article at all, as it reports a single empirical study. Either the journal has misclassified the article, or the author submitted it as a review article and the journal did not query this. To be fair, the journal website does note that classification into article types "is subjective to some degree". 1

So, is it a good study?

Not a full paper

Well, that is not easy to evaluate, as the article is less than two pages in length, whereas most research studies in education are much more substantial. Even the abstract of the article seems lacking (see the comparison with the conference poster abstract, below). An abstract of a research paper is usually expected to very briefly report something about the research sample/population (who participated in the study?), the research design/methodology (is it an experiment, a survey…?), and the results (what did the researchers find out?). The abstract of Prof. Mohialdin's paper misses all these points and so tells readers nothing about the research.

The main text also lacks some key information. The study is a type of research report that is sometimes called a 'practice paper' – the article reports some teaching innovation carried out by practitioners in their own teaching context. The text does give some details of what the practice was – but simply writing about practice is not usually considered sufficient for a research paper. At the least, there needs to be some evaluation of the innovation.

The research design for the evaluation is limited to two sentences under the section heading 'Conclusion/Result Result'. (Mohialdin, 2018a, p.1)

Here there has been some evaluation, but the report is very sketchy, and so might seem inadequate for a research report. Under a rather odd section heading, the reader is informed,

"A questionnaire was handed to the first year undergraduate medical students at the end of session four, to evaluate their hands on ultrasound session experience."

Mohialdin, 2018a, p.1

That one sentence comprises the account of data collection.

The questionnaire is not reproduced for readers. Nor is it described (how many questions, what kinds of questions?) Nor is its development reported. There is not any indication of how many of the 150 students in the population completed the questionnaire, whether ethical procedures were followed 2, where the students completed the questionnaire (for example, was this undertaken in a class setting where participants were being observed by the teaching staff, or did they take it away with them "at the end of session four" to complete in private?) or whether they were able to respond anonymously (rather than have their teachers be able to identify who made which responses).

Perhaps there are perfectly appropriate responses to these questions – but as the journal peer reviewers and editor do not seem to have asked, the reader is left in the dark.

Invisible analytical techniques

Similarly, details of the analysis undertaken are, again, sketchy. A reader is told:

"Answers were collected and data was [sic] analyzed into multiple graphs (as illustrated on this poster)."

Mohialdin, 2018a, p.1

Now that sounds promising, except either the author forgot to submit the graphs with the text, or the journal somehow managed to lose them in production. 3 (And as I've found out, even the most prestigious and well established publishers can lose work they have accepted for publication!)

So, readers are left with no idea what questions were asked, nor what responses were offered, that led to the graphs – that are not provided.

There were also comments – presumably [sic – it would be good to be told] in response to open-ended items on the questionnaire.

"The comments that we [sic, not I] got from this survey were mainly positive; here are a few of the constructive comments that we [sic] received:…

We [sic] also received some comments about recommendations and ways to improve the sessions (listed below):…"

Mohialdin, 2018a, 1-2.

A reader might ask who decided which comments should be counted as positive (e.g., was it a rater independent of the team who implemented the innovation?), and what does 'mainly' mean here (e.g., 90 of 100 responses? 6 of 11?).

So, in summary, there is no indication of what was asked, who exactly responded, or how the analysis was carried out. As the Journal of Chemistry: Education Research and Practice claims to be a peer reviewed journal one might expect reviewers to have recommended at least that such information (along with the missing graphs) should be included before publication might be considered.

There is also another matter that one would expect peer reviewers, and especially the editor, to have noticed.

Not in scope

Research journals usually have a scope – a range of topics they publish articles on. This is normally made clear in the information on journal websites. Despite its name, the Journal of Chemistry: Education Research and Practice does not restrict itself to chemistry education, but invites work on all aspects of the chemical sciences, and indeed most of its articles are not educational.

Outside the scope of the journal? (Original Image by Magnascan from Pixabay )

But 'An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum' is not about chemistry education or chemistry in a wider sense. Ultrasound diagnostic technology falls under medical physics, not a branch of chemistry. And, more pointedly, teaching medical students to use ultrasound to diagnose medical conditions falls under medical education – as the reference to 'Medical Students' in the article title rather gives away. So, it is odd that this article was published where it was, as it should have been rejected from this particular journal as being out of scope.

Despite the claims of Journal of Chemistry: Education Research and Practice to be a peer reviewed journal (that means that all submissions are supposedly sent out to, and scrutinised and critiqued by, qualified experts on the topic who make recommendations about whether something is sufficient quality for publication, and, if so, whether changes should be made first – like perhaps including graphs that are referred to, but missing), the editor managed to decide the submission should be published just seven days after it was submitted for consideration.

The chemistry journal accepted the incomplete report of the medical education study, to be described as a review article, one week after submission.

The journal article as a truncated conference poster?

The reference to "multiple graphs (as illustrated on this poster)" (my emphasis) suggested that the article was actually the text (if not the figures) of a poster presented at a conference, and a quick search revealed that Mohialdin, Wainman and Shali had presented on 'An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum' at an experimental biology (sic, not chemistry) conference.

A poster at a conference is not considered a formal publication, so there is nothing inherently wrong with publishing the same material in a journal – although often posters report either quite provisional or relatively inconsequential work so it is unusual for the text of a poster to be considered sufficiently rigorous and novel to justify appearing in a research journal in its original form. It is notable that despite being described by Prof. Mohialdin as a 'preliminary' study, the journal decided it was of publishable quality.

Although norms vary between fields, it is generally the case that a conference poster is seen as something quite different from a journal article. There is a limited amount of text and other material that can be included on a poster if it is to be readable. Conferences often have poster sessions where authors are invited to stand by their poster and engage with readers – so anyone interested can ask follow-up questions to supplement the often limited information given on the poster itself.

By contrast, a journal article has to stand on its own terms (as the authors cannot be expected to pop round for a conversation when you decide to read it). It is meant to present an argument for some new knowledge claim(s): an argument that depends on the details of the research conceptualisation, design, and data analysis. So what may seem as perfectly adequate in a poster may well not be sufficient to satisfy journal peer review.

The abstract of the conference poster was published in a journal (Mohialdin, Wainman & Shali, 2018), and I have reproduced it below so it can be compared with the journal paper's abstract.


Mohialdin, 2018a (journal paper) and Mohialdin, Wainman & Shali, 2018 (conference poster): both abstracts open with the same three paragraphs,

"With the technological progress of different types of portable Ultrasound machines, there is a growing demand by all health care providers to perform bedside Ultrasonography, also known as Point of Care Ultrasound (POCUS). This technique is becoming extremely useful as part of the Clinical Skills/Anatomy teaching in the undergraduate Medical School Curriculum.

Teaching/training health care providers how to use these portable Ultrasound machines can complement their physical examination findings and help in a more accurate diagnosis, which leads to a faster and better improvement in patient outcomes. In addition, using portable Ultrasound machines can add more safety measurements to every therapeutic/diagnostic procedure when it is done under an Ultrasound guide. It is also considered as an extra tool in teaching Clinical Anatomy to Medical students. Using an Ultrasound is one of the different imaging modalities that health care providers depend on to reach their diagnosis, while also being the least invasive method.

We thought investing in training the undergraduate Medical students on the basic Ultrasound scanning skills as part of their first year curriculum will help build up the foundation for their future career."

The conference poster abstract, but not the journal paper's abstract, then continues:

"The research we report in this manuscript is a preliminary qualitative study. And provides the template for future model for teaching a hand on Ultrasound for all health care providers in different learning institutions.

A questionnaire was handed to the first year medical students to evaluate their hands on ultrasound session experience. Answers were collected and data was [sic] analyzed into multiple graphs."

Abstract from Mohialdin's journal paper, and the abstract from the co-authored work presented at the Experimental Biology 2018 Meeting according to the journal of the Federation of American Societies for Experimental Biology. (See note 4 for another version of the abstract.)

The abstract includes some very brief information about what the researchers did (which is strangely missing from the journal article's abstract). Journals usually put limits on the word count for abstracts. Surely the poster's abstract was not considered too long for the journal, so someone (the author? the editor?) simply dropped the final two paragraphs – that is, arguably the two most relevant paragraphs for readers?

The lost authors?

Not only had two paragraphs from the abstract gone missing, along with the figures, but the journal article had also lost two-thirds of its authors.

A poster with multiple authors

Now, in the academic world, authorship of research reports is not an arbitrary matter (Taber, 2018). An author is someone who has made a substantial intellectual contribution to the work (regardless of how much of the writing-up they undertake, or whether they are present when work is presented at a conference). That is a simple principle, which unfortunately may lead to disputes as it needs to be interpreted when applied; but, in most academic fields, there are conventions regarding what kind of contribution is judged significant and substantive enough for authorship.

It may well be that Prof. Mohialdin was the principal investigator on this study and that the contributions of Prof. Wainman and Prof. Shali were more marginal, and so it was not obvious whether or not they should be considered authors when reporting the study. But it is less easy to see how they qualified for authorship on the poster but not on the journal article with the same title which seems (?) to be the text of the poster (i.e., describes itself as being the poster). [It is even more difficult to see how they could be authors of the poster when it was presented at one conference, but not when it was presented somewhere else. 4]

Of course, one trivial suggestion might be that Wainman and Shali contributed the final two paragraphs of the abstract, and the graphs, and that without these the – thus reduced – version in the journal only deserved one author according to the normal academic authorship conventions. That is clearly not an acceptable rationale, as academic studies have to be understood more holistically than that!

Perhaps Wainman and Shali asked to have their names left off the paper as they did not want to be published in a journal of chemistry that would publish a provisional and incomplete account of a medical education practice study classified as a review article. Maybe they suspected that this would hardly enhance their scholarly reputations?

Work cited:
  • Feilchenfeld, Z., Dornan, T., Whitehead, C., & Kuper, A. (2017). Ultrasound in undergraduate medical education: a systematic and critical review. Medical Education, 51, 366-378. doi: 10.1111/medu.13211
  • Mohialdin, V. (2018a). An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum. Journal of Chemistry: Education Research and Practice, 2(2), 1-2.
  • Mohialdin, V. (2018b). An overview of the first year undergraduate medical students feedback on the point of care ultrasound curriculum. Journal of Health Education Research & Development, 6, 30.
  • Mohialdin, V., Wainman, B. & Shali, A. (2018). An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum. The FASEB Journal, 32(S1: Experimental Biology 2018 Meeting Abstracts), 636.4
  • Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.
  • Taber, K. S. (2018). Assigning Credit and Ensuring Accountability. In P. A. Mabrouk & J. N. Currano (Eds.), Credit Where Credit Is Due: Respecting Authorship and Intellectual Property (Vol. 1291, pp. 3-33). Washington, D.C.: American Chemical Society. [The publisher appears to have made this open access]

Footnotes:

1 The following section appears as part of the instructions for authors:

"Article Types

Journal of Chemistry: Education Research and Practice accepts Original Articles, Review, Mini Review, Case Reports, Editorial, and Letter to the Editor, Commentary, Rapid Communications and Perspectives, Case in Images, Clinical Images, and Conference Proceedings.

In general the Manuscripts are classified in to following [sic] groups based on the criteria noted below [I could not find these]. The author(s) are encouraged to request a particular classification upon submitting (please include this in the cover letter); however the Editor and the Associate Editor retain the right to classify the manuscript as they see fit, and it should be understood by the authors that this process is subjective to some degree. The chosen classification will appear in the printed manuscript above the manuscript title."

https://opastonline.com/journal/journal-of-chemistry-education-research-and-practice/author-guidelines

2 The ethical concerns in this kind of research are minimal, and in an area like medical education one might feel there is a moral imperative for future professionals to engage in activities to innovate and to evaluate such innovations. However, there is a general principle that all participants in research should give voluntary, informed consent.

(Read about Research Ethics here).

According to the policy statement on the author's (/authors'?) University's website (Research involving human participants, Sept. 2002) at the time of this posting (November, 2021) McMaster University "endorses the ethical principles cited in the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (1998)".

According to Article 2.1 of that document, Research Ethics Board Review is required for any research involving "living human participants". There are some exemptions, including (Article 2.5): "Quality assurance and quality improvement studies, program evaluation activities, and performance reviews, or testing within normal educational requirements when used exclusively for assessment, management or improvement purposes" (my emphasis).

My reading then is that this work would not have been subject to requiring approval following formal ethical review if it had been exclusively used for internal purposes, but that publication of the work as research means it should have been subject to Research Ethics Board Review before being carried out. This is certainly in line with advice to teachers who invite their own students to participate in research into their teaching that may be reported later (in a thesis, at a conference, etc.) (Taber, 2013, pp.244-248).


3 Some days ago, I wrote to the Journal of Chemistry: Education Research and Practice (in reply to an invitation to publish in the journal), with a copy of the email direct to the editor, asking where I could find the graphs referred to in this paper, but have not yet had a response. If I do get a reply I will report this in the comments below.


4 Since drafting this post, I have found another publication with the same title published in an issue of another journal reporting conference proceedings (Mohialdin, 2018b):

A third version of the publication (Mohialdin, 2018b).

The piece begins with the same material as in the table above. It ends with the following account of empirical work:

A questionnaire was handed to the first year undergraduate medical students at the end of session four, to evaluate their hands on ultrasound session experience. Answers were collected and data was [sic] analyzed into multiple graphs. The comments that we [sic] got from this survey were mainly positive; here are a few of the constructive comments that we [sic] received: This was a great learning experience; it was a great learning opportunity; very useful, leaned [sic] a lot; and loved the hand on experience.

Mohialdin, 2018b, p.30

There is nothing wrong with the same poster being presented at multiple conferences, and this is quite a common academic strategy. Mohialdin (2018b) reports from a conference in Japan, whereas Mohialdin, Wainman & Shali (2018) refers to a US meeting – but it is not clear why the author list is different, as the two presentations would seem to report the same research. Indeed, it seems reasonable to assume from the commonality of Mohialdin (2018b) with Mohialdin, Wainman & Shali (2018) that they are the same report (poster).

Profs. Wainman and Shali should be authors of any report of this study if, and only if, they made substantial intellectual contributions to the work reported – and, surely, either they did, or they did not.

Not motivating a research hypothesis

A 100% survey return that represents 73% (or 70%, or perhaps 48%) of the population

Keith S. Taber

…the study seems to have looked for a lack of significant difference regarding a variable which was not thought to have any relevance…

This is like hypothesising…that the amount of alkali needed to neutralise a certain amount of acid will not depend on the eye colour of the researcher; experimentally confirming this is the case; and then seeking to publish the results as a new contribution to knowledge.

…as if a newspaper headline was 'Earthquake latest' and then the related news story was simply that, as usual, no earthquakes had been reported.

Structuring a research report

A research report tends to have a particular kind of structure. The first section sets out background to the study to be described. Authors offer an account of the current state of the relevant field – what can be called a conceptual framework.

In the natural sciences it may be that in some specialised fields there is a common, accepted way of understanding that field (e.g., the nature of important entities, the relevant variables to focus on). This has been described as working within an established scientific 'paradigm'. 1 However, social phenomena (such as classroom teaching) may be of such complexity that a full account requires exploration at multiple levels, with a range of analytical foci (Taber, 2008). 2 Therefore the report may indicate which particular theoretical perspective (e.g., personal constructivism, activity theory, Gestalt psychology, etc.) has informed the study.

This usually leads to one or more research questions, or even specific hypotheses, that are seen to be motivated by the state of the field as reflected in the authors' conceptual framework.

Next, the research design is explained: the choice of methodology (overall research strategy), the population being studied and how it was sampled, the methods of data collection and development of instruments, and choice of analytical techniques.

All of this is usually expected before any discussion (leaving aside a short statement as part of the abstract) of the data collected, results of analysis, conclusions and implications of the study for further research or practice.

There is a logic to designing research. (Image after Taber, 2014).

A predatory journal

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies. One of the articles that IS on an educational topic is called 'Students' Perception of Chemistry Teachers' Characteristics of Interest, Attitude and Subject Mastery in the Teaching of Chemistry in Senior Secondary Schools' (Igwe, 2017).

A research article

The work of a genuine academic journal

A key problem with predatory journals is that because their focus is on generating income they do not provide the service to the community expected of genuine research journals (which inevitably involves rejecting submissions, and delaying publication till work is up to standard). In particular, the research journal acts as a gatekeeper to ensure nonsense or seriously flawed work is not published as science. It does this in two ways.

Discriminating between high quality and poor quality studies

Work that is clearly not up to standard (as judged by experts in the field) is rejected. One might think that in an ideal world no one is going to send work that has no merit to a research journal. In reality we cannot expect authors to always be able to take a balanced and critical view of their own work, even if we would like to think that research training should help them develop this capacity.

This assumes researchers are trained, of course. Many people carrying out educational research in science teaching contexts are only trained as natural scientists – and those trained as researchers in natural science often approach the social sciences with significant biases and blind-spots when carrying out research with people. (Watch or read 'Why do natural scientists tend to make poor social scientists?')

Also, anyone can submit work to a research journal – be they genius, expert, amateur, or 'crank'. Work is meant to be judged on its merits, not by the reputation or qualifications of the author.

De-bugging research reports – helping authors improve their work

The other important function of journal review is to identify weaknesses, errors and gaps in reports of work that may have merit, but where these limitations make the report unsuitable for publication as submitted. Expert reviewers will highlight these issues, and editors will ensure authors respond to the issues raised before possible publication. This process relies on fallible humans, and in the case of reviewers usually unpaid volunteers, but is seen as important for quality control – even if it is not a perfect system. 3

This improvement process is a 'win' all round:

  • the quality of what is published is assured so that (at least most) published studies make a meaningful contribution to knowledge;
  • the journal is seen in a good light because of the quality of the research it publishes; and
  • the authors can be genuinely proud of their publications which can bring them prestige and potentially have impact.

If a predatory journal which claims (i) to have academic editors making decisions and (ii) to use peer review does not rigorously follow proper processes, and so publishes (a) nonsense as scholarship, and (b) work with major problems, then it lets down the community and the authors – if not those making money from the deceit.

The editor took just over a fortnight to arrange any peer review, and come to a decision that the research report was ready for publication

Students' perceptions of chemistry teachers' characteristics

There is much of merit in this particular research study. Dr Iheanyi O. Igwe explains why there might be a concern about the quality of chemistry teaching in the research context, and draws upon a range of prior literature. Information about the population (the public secondary schools II chemistry students in Abakaliki Education Zone of Ebonyi State) and the sample is provided – including how the sample, of 300 students at 10 schools, was selected.

There is however an unfortunate error in characterising the population:

"the chemistry students' population in the zone was four hundred and ten (431)"

Igwe, 2017, p.8

This seems to be a simple typographic error, but the reader cannot be sure if this should read

  • "…four hundred and ten (410)" or
  • "…four hundred and thirty one (431)".

Or perhaps neither, as the abstract tells readers

"From a total population of six hundred and thirty (630) senior secondary II students, a sample of three hundred (300) students was used for the study selected by stratified random sampling technique."

Igwe, 2017, abstract

Whether the sample is 300/410 or 300/431 or even 300/630 does not fundamentally change the study, but one does wonder how these inconsistencies were not spotted by the editor, or a peer reviewer, or someone in the production department. (At least, one might wonder about this if one had not seen much more serious failures to spot errors in this journal.) A reader could wonder whether the presence of such obvious errors may indicate a lack of care that might suggest the possibility of other errors that a reader is not in a position to spot. (For example, if questionnaire responses had not been tallied correctly in compiling results, then this would not be apparent to anyone who did not have access to the raw data to repeat the analysis.) The author seems to have been let down here.

A multi-scale instrument

The final questionnaire contained five items on each of three scales:

  • students' perception of teachers' interest in the teaching of chemistry;
  • students' perception of teachers' attitude towards the teaching of chemistry;
  • students' perception of teachers' mastery of the subject in the teaching of chemistry

Igwe informs readers that,

"the final instrument was tested for reliability for internal consistency through the Cronbach Alpha statistic. The reliability index for the questionnaire was obtained as 0.88 which showed that the instrument was of high internal consistency and therefore reliable and could be used for the study"

Igwe, 2017, p.4

This statistic is actually not very useful information, as one would want to know about the internal consistency within each scale – an overall value across scales is not informative (conceptually, it is not clear how it should be interpreted – perhaps as indicating that the three scales are largely eliciting much the same underlying factor?) (Taber, 2018). 4

There are times when aggregate information is not very informative (Image by Syaibatul Hamdi from Pixabay)

Again, one might have hoped that expert reviewers would have asked the author to quote the separate alpha values for the three scales as it is these which are actually informative.
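To make concrete what reporting scale-level reliability involves, here is a minimal sketch in Python (using numpy, with entirely invented responses – the 300 respondents and three five-item scales mirror the design described above, but none of the numbers are Igwe's) showing Cronbach's alpha computed separately for each scale rather than once across the whole instrument:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) array of scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Entirely hypothetical data: 300 students x 15 items (three 5-item scales), rated 1-5
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(300, 15))

# Report alpha for each scale separately, not one value across all 15 items
scales = {"interest": slice(0, 5), "attitude": slice(5, 10), "mastery": slice(10, 15)}
for name, columns in scales.items():
    print(f"{name}: alpha = {cronbach_alpha(responses[:, columns]):.2f}")
```

With real data, three coefficients of this kind would tell a reader whether each individual scale is internally consistent – which, as noted above, a single figure calculated across all fifteen items cannot.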

The paper also offers a detailed account of the analysis of the data, and an in-depth discussion of the findings and potential implications. This is a serious study that clearly reflects a lot of work by the researcher. (We might hope that could be taken for granted when discussing work published in a 'research journal', but sadly that is not so in some predatory journals.) There are limitations of course. All research has to stop somewhere, and resources and, in particular, access opportunities are often very limited. One of these limitations is the wider relevance of the population sampled.

But do the results apply in Belo Horizonte?

This is the generalisation issue. The study concerns the situation in one administrative zone within a relatively small state in South East Nigeria. How do we know it has anything useful to tell us about elsewhere in Nigeria, let alone about the situation in Mexico or Vietnam or Estonia? Even within Ebonyi State, the Abakaliki Education Zone (that is, the area of the state capital) may well be atypical – perhaps the best qualified and most enthusiastic teachers tend to work in the capital? Perhaps there would have been different findings in a more rural area?

Yet this is a limitation that applies to a good deal of educational research. This goes back to the complexity of educational phenomena. What you find out about an electron or an oxidising agent studied in Abakaliki should apply in Cambridge, Cambridgeshire just as in Cambridge, Massachusetts. The same cannot be claimed about what you may find out about a teacher in Abakaliki – or a student, a class, a school, a university…

Misleading study titles?

Educational research studies often have, strictly speaking, misleading titles – or at least titles that promise a lot more than the studies deliver. This may in part be authors making unwarranted assumptions, or it may be journal editors wanting to avoid unwieldy titles.

"This situation has inadvertently led to production of half backed graduate Chemistry educators."

Igwe, 2017, p.2

The title of this study does suggest that the study concerns perceptions of Chemistry Teachers' Characteristics …in Senior Secondary Schools, when we cannot assume that chemistry teachers in the Abakaliki Education Zone of Ebonyi State can stand for chemistry teachers more widely. Indeed some of the issues raised as motivating the need for the study are clearly not issues that would apply in all other educational contexts – that is the 'situation', which is said to be responsible for the "production of half backed [half-baked?] graduate Chemistry educators" in Nigeria, will not apply everywhere. Whilst the title could be read as promising more general findings than were possible in the study, Igwe's abstract is quite explicit about the specific population sampled.

A limited focus?

Another obvious limitation is that whilst pupils' perceptions of their teachers are very important, they do not offer a full picture. Pupils may feel the need to give positive reviews, or may have idealistic conceptions. Indeed, assuming that voluntary, informed consent was given (which would mean that students knew they could decline to take part in the research without fear of sanctions), it is of note that every one of the 30 students targeted in each of the ten schools agreed to complete the survey:

"The 300 copies of the instrument were distributed to the respondents who completed them for retrieval on the spot to avoid loss and may be some element of bias from the respondents. The administration and collection were done by the researcher and five trained research assistants. Maximum return was made of the instrument."

Igwe, 2017, p.4

To get a 100% return on a survey is pretty rare, and if normal ethical procedures were followed (with the voluntary nature of the activity made clear) then this suggests these students were highly motivated to appease adults working in the education system.

But we might ask how student perceptions of teacher characteristics actually relate to those characteristics.

For example, observations of the chemistry classes taught by these teachers could possibly give a very different impression of those teachers than that offered by the student ratings in the survey. (Another chemistry teacher may well be able to distinguish teacher confidence or bravado from subject mastery when a learner is not well placed to do so.) Teacher self-reports could also offer a different account of their 'Interest, Attitude and Subject Mastery', as could evaluations by their school managers. Arguably, a study that collected data from multiple sources would offer the possibility of 'triangulating' between sources.

However, Igwe is explicit about the limited focus of the study, and other complementary strands of research could be carried out to follow up on the study. So, although the specific choice of focus is a limitation, this does not negate the potential value of the study.

Research questions

Although I recognise a serious and well-motivated study, there is one aspect of Igwe's study which seemed rather bizarre. The study has three research questions (which are well reflected in the title of the study) and a hypothesis which, I suspect, will surprise some readers.

That is not a good thing. At least, I always taught research students that, unlike in a thriller or 'whodunnit', where a surprise may engage and amuse a reader, a research report or thesis is best written to avoid such surprises. The research report is an argument that needs to flow through the account – if a reader is surprised at something the researcher reports doing, then the author has probably forgotten to properly introduce or explain something earlier in the report.

Here are the research questions and hypotheses:

"Research Questions

The following research questions guided the study, thus:

How do students perceive teachers' interest in the teaching of chemistry?

How do students perceive teachers' attitude towards the teaching of chemistry?

How do students perceive teachers' mastery of the subjects in the teaching of chemistry?

Hypotheses
The following null hypothesis was tested at 0.05 alpha levels, thus:
HO1 There is no significant difference in the mean ratings of male and female students on their perception of chemistry teachers' characteristics in the teaching of chemistry."

Igwe, 2017, p.3

A surprising hypothesis?

A hypothesis – now where did that come from?

Now, I am certainly not criticising a researcher for looking for gender differences in research. (That would be hypocritical as I looked for such differences in my own M.Sc. thesis, and published on gender differences in teacher-student interactions in physics classes, gender differences in students' interests in different science topics on starting secondary school, and links between pupil perceptions of (i) science-relatedness and (ii) gender-appropriateness of careers.)

There might often be good reasons in studies to look for gender differences. But these reasons should be stated up-front. As part of the conceptual framework motivating the study, researchers should explain that – based on their informal observations, or on anecdotal evidence, or (better) drawing upon explicit theoretical considerations, or informed by the findings of other related studies, or for whatever other reason – there are good grounds to check for gender differences.

The flow of research (Underlying image from Taber, 2013) The arrows can be read as 'inform(s)'.

Perhaps Igwe had such reasons, but there seems to be no mention of 'gender' as a relevant variable prior to the presentation of the hypothesis: not even a concerning dream, or signs in the patterns of tea leaves. 5 To some extent, this is reinforced by the choice of the null hypothesis – that no such difference will be found. Even if it makes no substantive difference to a study whether a hypothesis is framed in terms of there being a difference or not, psychologically the study seems to have looked for a lack of significant difference regarding a variable which was not thought to have any relevance.

Misuse of statistics

It is important for researchers not to test for effects that are not motivated in their studies. Statistical significance tells a researcher something is unlikely to happen just by chance – but it still might. Just as someone buying a lottery ticket is unlikely to win the lottery – but they might. Logically a small proportion of all the positive statistical results in the literature are 'false positives' because unlikely things do happen by chance – just not that often. 6 The researcher should not (metaphorically!) go round buying up lots of lottery tickets, and then seeing an occasional win as something more than chance.
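A small simulation can make the 'lottery ticket' point concrete. The sketch below (Python with scipy; all numbers are invented for illustration and relate to no actual study) repeatedly compares two groups drawn from exactly the same distribution, so any 'significant' result can only be a false positive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, n_per_group = 50, 150   # fifty unmotivated comparisons, 150 'students' per group

false_positives = 0
for _ in range(n_tests):
    # Both groups come from the SAME population: there is no real effect to find
    group_a = rng.normal(loc=3.0, scale=1.0, size=n_per_group)
    group_b = rng.normal(loc=3.0, scale=1.0, size=n_per_group)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} tests were 'significant' despite no real effect")
```

Run fifty such unmotivated tests and, on average, two or three will cross the p < 0.05 threshold by chance alone – which is why a 'finding' from a test that was never motivated should not impress anyone.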

No alarms and no surprises

And what was found?

"From the result of analysis … the null hypothesis is accepted which means that there is no significant difference in the mean ratings of male and female students in their perception of chemistry teachers' characteristics (interest, attitude and subject mastery) in the teaching of chemistry."

Igwe, 2017, p.6

This is like hypothesising, without any motivation, that the amount of alkali needed to neutralise a certain amount of acid will not depend on the eye colour of the researcher; experimentally confirming this is the case; and then seeking to publish the results as a new contribution to knowledge.

Why did Igwe look for a gender difference (or, more strictly, look for no gender difference)?

  • A genuine relevant motivation missing from the paper?
  • An imperative to test for something (anything)?
  • Advice that journals are more likely to publish studies using statistical testing?
  • Noticing that a lot of studies do test for gender differences (whether there seems a good reason to do so or not)?

This seems to be an obvious point for peer reviewers and the editor to raise: asking the author to either (a) explain why it makes sense to test for gender differences in this study – or (b) to drop the hypothesis from the paper. It seems they did not notice this, and readers are simply left to wonder – just as you would if a newspaper headline was 'Earthquake latest' and then the related news story was simply that, as usual, no earthquakes had been reported.

Work cited:


Footnotes:

1 The term 'paradigm' became widely used in this sense after Kuhn's (1970) work, although he later acknowledged criticisms of the ambiguous way he had used the term – sometimes for the standard examples (paradigms, or exemplars) through which newcomers learn about a field, and sometimes for the wider set of shared norms and values that develop in an established field, which he later termed the 'disciplinary matrix'. In psychology research, 'paradigm' may be used in the more specific sense of an established research design/protocol.


2 There are at least three ways of explaining why a lot of research in the social sciences seems more chaotic and less structured to outsiders than most research in the natural sciences.

  • a) Ontology. Perhaps the things studied in the natural sciences really exist, whereas some of those in the social sciences are epiphenomena and do not reflect fundamental, 'real', things. There may be some of that sometimes, but, if so, I think it is a matter of degree (after all, scientists have not been above studying the ether or phlogiston), not least because of the third option (c).
  • b) The social sciences are not as mature as many areas of the natural sciences and so are still 'pre-paradigmatic'. I am sure there is sometimes an element of this: any new field will take time to focus in on reliable and productive ways of making sense of its domain.
  • c) The complexity of the phenomena. Social phenomena are inherently more complex, often involving feedback loops between participants' behaviours and feelings and beliefs (including about the research, the researcher, etc.)

Whilst (a) and (b) may sometimes be pertinent, I think (c) is often especially relevant to this question.


3 An alternative approach that has gained some credence is to allow authors to publish, but then invite reader reviews which will also be published – allowing a public conversation to develop, so that readers can see the original work, criticisms, responses to those criticisms, and so forth, and make their own judgements. To date this has only become common practice in a few fields.

Another approach for empirical work is for authors to submit research designs to journals for peer review – once a design has been accepted by the journal, the journal agrees to publish the resulting study as long as the agreed protocol has been followed. (This is seen as helping to avoid the distorting bias in the literature towards 'positive' results as studies with 'negative' results may seem less interesting and so less likely to be accepted in prestige journals.) Again, this is not the norm (yet) in most fields.


4 The statistic has a maximum value of 1, which would indicate that the items were all equivalent, so 0.88 seems a high value, till we note that a high value of alpha is a common artefact of including a large number of items.

However, playing Devil's advocate, I might suggest that the high overall value of alpha could suggest that the three scales

  • students' perception of teachers' interest in the teaching of chemistry;
  • students' perception of teachers' attitude towards the teaching of chemistry;
  • students' perception of teachers' mastery of the subject in the teaching of chemistry

are all tapping into a single underlying factor that might be something like

  • my view of whether my chemistry teacher is a good teacher

or even

  • how much I like my chemistry teacher

5 Actually the discrimination made is between male and female students – it is not clear what question students were asked to determine 'gender', and whether other response options were available, or whether students could decline to respond to this item.


6 Our intuition might be that only a small proportion of reported positive results are false positives, because, of course, positive results reflect things unlikely to happen by chance. However if, as is widely believed in many fields, there is a bias to reporting positive results, this can distort the picture.

Imagine someone looking for factors that influence classroom learning. Consider that 50 variables are identified to test, such as teacher eye colour, classroom wall colour, type of classroom window frames, what the teacher has for breakfast, the day of the week that the teacher was born, the number of letters in the teacher's forename, the gender of the student who sits nearest the fire extinguisher, and various other variables which are not theoretically motivated to be considered likely to have an effect. With a confidence level of p[robability] ≤ 0.05 it is likely that there will be a very small number of positive findings JUST BY CHANCE. That is, if you look across enough unlikely events, it is likely some of them will happen. There is unlikely to be a thunderstorm on any particular day. Yet there will likely be a thunderstorm some day in the next year. If a report is written and published which ONLY discusses a positive finding then the true statistical context is missing, and a likely situation is presented as unlikely to be due to chance.
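To put a rough figure on this hypothetical scenario: if the 50 tests were independent and each run at p ≤ 0.05, the expected number of 'significant' results arising purely by chance would be about 50 × 0.05 = 2.5 – so a report presenting just two or three such 'findings', stripped of the other 47 or so null results, would look far more convincing than the evidence warrants.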


Reviewing initial teacher education

Some responses to the "Initial teacher training market review"

A 'market' review

Image by Pexels from Pixabay

The UK Government's Department for Education (responsible for the school system in England) is currently undertaking what it calls a 'market review' of initial teacher education (ITE), or initial teacher 'training' as it prefers to describe it. (Arguably, 'education' suggests broad professional preparation for someone who will need to make informed decisions in complex situations, whereas 'training' implies learning the skills needed for a craft.)

The aims of the review are certainly unobjectionable:

The review has aimed to make well informed, evidence-based recommendations on how to make sure:

• all trainees receive high-quality training
• the ITT market maintains the capacity to deliver enough trainees and is accessible to candidates
• the ITT system benefits all schools 1

https://www.gov.uk/government/publications/initial-teacher-training-itt-market-review/initial-teacher-training-itt-market-review-overview

Despite such intentions clearly being laudable, the actual proposals (which, inter alia, can be seen as looking to further increase central government control over the professional preparation of teachers) raised concerns among many of those actually involved in teacher education.

The consultation

There was a public consultation to which all interested parties were invited to respond. Since the consultation closed, the Secretary of State (i.e., senior minister) for Education has changed, so it remains to be seen whether this will derail the review.

The review is wide ranging, but there is a widespread view that once again government is seeking to reduce the influence of academic education experts (see, for example, 'Valuing the gold standard in teacher education'), and my colleagues still working in university-school based ITE partnerships certainly felt that if all the proposals were brought to fruition such partnerships would be at risk. Not that universities would not be able to contribute, but they would not be able to do so in a way that allowed full quality control, proper planning and sustainable commitment.

My own University, Cambridge, has suggested

We cannot, in all conscience, envisage our continuing involvement with ITT should the proposals be implemented in their current format.

Government ITT market review consultation, Faculty of Education website

Some discussion on a teachers' email list I subscribe to prompted me to look back at my own consultation responses.

A selective response – and a generic default hole-filler

I have not worked in I.T.E. for some years, and so did not feel qualified to comment on all aspects of the review. However, there were some aspects of the plans (or at least of my interpretation of what was intended) that I felt would put at risk some of the strongest and most important aspects of high quality teacher preparation.

As submitting a response to the consultation required providing an answer to every section (a cynic might suggest that expecting full completion of such a long consultation document is a disincentive for most people to contribute), I used a generic statement to cover those questions where I did not feel I had anything informed and useful to say:

I am aware of concerns raised in responses by the Russell group of Universities, the University of Cambridge (of which I am an emeritus officer), and Homerton College, Cambridge (of which I am a senior member). I concur with these concerns, and rather than seek to reproduce or mirror all of their comments (already available to you), I refer you to those responses. Further, I am offering some specific comments on particular issues where I have strong concerns based on my past experiences as a PGCE student teacher; as a teacher in comprehensive secondary schools; as a school/college-based mentor supporting graduates preparing for teaching in schools and also in a further education context; as a researcher exploring aspects of student learning and the teaching that supports it; as a lecturer and supervisor on initial teacher education courses as part of University-School training partnerships; as a supervisor for teachers in post undertaking school-based research; as an advisor to schools undertaking context-directed research; and as a lecturer teaching graduates how to undertake research into educational contexts.

Here are my more specific responses highlighting my particular concerns:

Individual differences

Having worked in initial teacher education as well as having been a school teacher, I am well aware that one of the most important things anyone working in the education sector has to appreciate is individual differences – between pupils, between teachers, between classes, between schools, and between new entrants. Too much focus on uniformity is therefore unwelcome and likely to reduce the quality of the best provision, which takes this diversity into account. Similarly, genuinely 'rigorous' sequencing of the educational experience will be responsive to individual needs, and that would be welcome. However, uniform and inflexible sequencing, which would be far from rigorous, would be damaging.

Being equipped to engage with research

I am aware that the diversity in routes for new entrants now available has reduced the quality of training experience available to some new teachers. In particular, the fully professional teacher has to be a critical reader of research, and to have the tools and confidence to undertake their own small-scale context based enquiry to develop their own practice.

Table 1 from Taber, 2010

This is essential because the research shows clearly that whilst it is sometimes possible to identify some features of best practice that generalise across most teaching contexts, this is by no means always the case. Teaching and learning are highly complex phenomena and are strongly influenced by contextual factors. So, what has been found to 'normally' work best will not be the best approach in all teaching contexts. Teachers need to be able to read research claims critically

(there are always provisos

  • most studies are small-scale where strict generalisation is simply not possible,
  • few studies are sufficiently supported with the resources to test ideas across a wide range of contexts; and
  • experimental studies which are the gold standard in the natural sciences are usually problematic in education
    • as randomisation {a critical aspect of true experimental research} is seldom possible, and
    • there is seldom the information or means to characterise populations sufficiently to build representative samples;
    • moreover the complexity of educational contexts does not allow the identification (let alone control) of all relevant variables, and
    • there are some key known factors which influence results when double-blind methods are not viable
      • – a situation that is very common when testing innovations in educational practices as teachers and learners are usually well aware of deviations from normal practice)

and identify the most promising recommendations when taking into account their own teaching context (i.e., what is referred to as reader or naturalistic generalisation) and test out ideas in their own classrooms, and iteratively develop their own practice.

Sadly, whilst the M-level PGCE type programmes usually introduce new teachers to these skills, this does not seem to necessarily be the case on some other routes.

On 'intensive' practice placements

I consider this a misguided notion based on a flawed conceptualisation of teaching and teacher skills. It is certainly the case that, generally speaking, teachers develop their skills over time with greater teaching experience, and that, all other things being equal, the more direct teaching experience a new entrant has during the period of initial teacher education the better – as long as this is productive experience.

However, teaching is a highly complex activity that requires making myriad in-the-moment decisions in response to interactions with unique classes of unique people. The quality of those decisions tends to increase over time with experience, but only if the teacher is well prepared for the teaching in terms of subject knowledge, general and specialist pedagogic knowledge, and knowledge of the particular learners.

This requires that the teacher has extensive preparation time, especially when new to teaching a topic, age group or pedagogic approach, and opportunities for productive debriefing and reflection. Given the intensity of teaching as an experience, it is much better for new entrants to initially focus on parts of lessons with plenty of opportunity for preparation and reflection than to progress too quickly to whole lessons where much of the experience will not be fully processed before moving on. Similarly, it is better that new teachers have sufficient time between classes to focus intensely on those classes rather than be moving directly from class to class.

In the same way, programmes that allow regular movement between the teaching context and an HEI or similar setting offer an ideal basis for effective learning. The intense focus on the school is broken up by time in faculty (still focused, but as a student, without the intense scrutiny experienced in school), where there are extensive opportunities for peer support (especially important given the extreme highs and lows often experienced by new teachers).

Partnerships of Universities with Schools offer new entrants complementary expertise, and opportunities for 'iteration' – moving between the 'graduate student' and 'teaching department member' contexts 2 (Figure 1 from Taber, 2017)

This is also critical for developing teaching that is informed by research-informed and evidence-based theories and constructs. Being taught 'theory' in an academic context, and expecting such content to be automatically applied in a teaching context, is unrealistic – rather, the new teacher has to learn to conceptualise actual classroom experience in terms of the theory, and to see how to apply the theory in terms of actual teaching experience. 2

This learning is best supported by an iterative process – where there are plenty of opportunities to reflect on and analyse experience, and compare and discuss experiences with peers, as well as with mentors, other experienced teachers, and with academic staff. Over time, as new teachers build experiences, especially ones they recognise as productive and successful, they will come to automatically apply ideas and skills and techniques, and will be able to 'chunk' component teaching moves into longer sequences – being able to work effectively for sequences of whole classes, with less reflection time, and less explicit support. 3

The aim is for new teachers to be able to prepare, teach and assess, on something approaching a full teaching timetable, whilst working in school full-time. However, efforts to move to such a state too quickly will [be counter-productive] for many potentially excellent teachers, and will likely increase drop-out rates.

Ultimately, the quality of the teaching experience, and the ability to manage increasing workload according to individual needs, is what is important. Any attempt to increase the intensity of the teaching placements, or to accelerate the rate at which new teachers take on responsibility, without recourse to individual circumstances, is likely to be counterproductive in terms of retention and of the quality of the 'training' experience in supporting the development of excellent teachers.

I am very pleased that I would not be 'training', nor still working in teacher education, under such expectations, as I think the incidence of crises, mental health issues, and drop-out would be likely to increase.

On common timetables for progress

As suggested above, any attempt to tightly quantify these things would be misplaced as it removes the ability of providers to manage the process to seek the best outcomes for individual trainees, and it ignores the responsibilities of teachers and schools to ensure that trainees are only given responsibilities as and when they are ready.

Please remember that every class taught by a trainee contains children or young people who are required to be in school and are entitled to be taught by someone who

  • is prepared for class,
  • is confident they are ready to teach that class, and
  • is not under such intense stress that they cannot perform to their potential.

You have a responsibility to consider the pupils as well as your 'market'.

On applying research evidence

A postgraduate award is meant to include a strong research component. As suggested in earlier comments, it is essential for the fully professional teacher, who will need to make informed decisions about her own classroom practice, to be provided with the skills to access research (including understanding strengths and weaknesses of methodology), critique it, evaluate its potential relevance to the immediate teaching and learning contexts, and to evaluate it in that context. Many PGCE-MEd and PGCE-MA programmes already support this.

I totally agree that this should be provided to all new trainees, and would have thought there are enough HEIs with expertise in educational research for this to be possible (as it is on the PGCE-M route already). However, it is not enough to simply provide teachers with the skills; they also need to have

  • access to research publications,
  • time to
    • read them and
    • undertake small-scale context-directed enquiry, and
  • the confidence that this aspect of professional practice is recognised and appreciated.

For example, a teacher has to know that if they are doing something differently to some government advice because they have looked at the research, considered it in relation to their specific context, and evaluated approaches in their own teaching context and concluded that for a particular class/course/students some approach other than that generally recommended is indicated, THEN this would be recognised (e.g., in Inspections) as praiseworthy.

On 'incentives that could encourage schools and trusts to participate in ITT'

I would think it is dangerous and perhaps foolish to add to schools' expected responsibilities where they do not welcome this.

On proposed reforms on the recruitment and selection process

To me, this seems to complicate matters for a PGCE applicant, who at the moment only has to select a university-schools partnership.

Potential equality impacts

As discussed above, in my experience current arrangements, at least for the PGCE route, offer flexibility to meet the individual needs of a range of new entrants. My sense is the proposals would be unhelpful in this regard.

Comments on 'any aspect'

I was lucky enough to undertake my PGCE at a university that at the time was recognised as one with excellent provision in my teaching subjects (chemistry and physics, at Nottingham Trent). At that time the structure of the teaching placement (two isolated blocks, one of 4 weeks, one of 8 weeks) did not allow the kind of incredibly valuable iterative experience of moving between the university and school contexts I discuss above, and the teachers in the schools did not act as mentors, but merely handed over their classes for a period of time.

Otherwise I was very happy with my 'training' experience.

I was also privileged to work for about 10 years in initial teacher education in a PGCE university-schools partnership that has consistently been awarded the very top inspection grades across categories. I have therefore seen much excellent initial teacher education practice in a stable partnership with many committed (if diverse) schools. We were also able to be pretty selective in recruitment, so were working with incredibly keen and committed new teachers.

If (some) university-schools partnerships (such as that based at the University of Cambridge) are recognised as excellent, why change the system in ways that threaten those providers?

Despite this, I know some of our excellent new recruits went through serious periods of doubt and crises in their teaching due to the intense and highly skilled nature of the work. In the context where I was lucky enough to work, the structure of the training year and the responsive and interactive nature of managing the graduates in their work meant that nearly always these setbacks were temporary, and so could be overcome.

I am concerned that some of this good practice may not continue if some of the proposals in the review are carried through – and that consequently a significant number of potentially excellent new teachers will not get the support they need to develop at the pace that best matches their needs. This will lead to drop-out, and early burn-out – or potentially candidates doing enough to cope, without meeting the high standards they wish to set for themselves to the benefit of their pupils.

Keith S. Taber

1 It strikes me that the third bullet point might seem a little superfluous – after all, surely a system of initial teacher education that both maintains the supply of new teachers at the level needed (which in some subjects would be a definite improvement on the existing system) and ensures they all receive high quality preparation should inherently benefit all schools by making sure there was always a pool of suitably qualified and well-prepared teachers to fill teaching vacancies across the school curriculum.

Perhaps, however, this means something else – such as (in view of the reference to 'incentives that could encourage schools and trusts to participate in ITT' in the consultation) making sure all schools receive funding for contributing to the preparation of new teachers (by making sure all schools make a substantial contribution to the preparation of new teachers).

2 It strikes me that the way in which teachers in preparation are able to move back and forth between a study context and a practitioner context, giving opportunities to apply learning in practice, and to 'stand back' and reflect on and conceptualise that practice, reflects the way science proceeds – where theory motivates new practical investigations, and experience of undertaking the empirical enquiry informs new theoretical refinements and insights (which then…).

3 That is, the pedagogic principles which teachers are expected to apply when working with their students are, in general terms, just as relevant in their own professional education.

Work cited:

What COVID really likes

Researching viral preferences

Keith S. Taber

When I was listening to the radio news I heard a clip of the Rt. Hon. Sajid Javid MP, the U.K. Secretary of State for Health and Social Care, talking about the ongoing response to the COVID pandemic:

Health Secretary Sajid Javid talking on 12th September

"Now that we are entering Autumn and Winter, something that COVID and other viruses, you know, usually like, the prime minister this week will be getting out our plans to manage COVID over the coming few months."

Sajid Javid

So, COVID and other viruses usually like Autumn and Winter (by implication, presumably, in comparison with Spring and Summer).

This got me wondering how we (or Sajid, at least) could know what the COVID virus (i.e., SARS-CoV-2 – severe acute respiratory syndrome coronavirus 2) prefers – what the virus 'likes'. I noticed that Mr Javid offered a modal qualification to his claim: usually. It seemed 'COVID and other viruses' did not always like Autumn and Winter, but usually did.

Yet there was a potential ambiguity here, depending on how one parsed the claim. Was he suggesting that

[COVID and other viruses] usually like Autumn and Winter

or

COVID [and other viruses usually] like Autumn and Winter

This might have been clearer in a written text as either

COVID and other viruses usually like Autumn and Winter

or

COVID, and other viruses usually, like Autumn and Winter

The second option may seem a little awkward in its phrasing, 1 but then not all viral diseases are more common in the Winter months, and some are considered to be due to 'Summer viruses':

"Adenovirus, human bocavirus (HBoV), parainfluenza virus (PIV), human metapneumovirus (hMPV), and rhinovirus can be detected throughout the year (all-year viruses). Seasonal patterns of PIV are type specific. Epidemics of PIV type 1 (PIV1) and PIV type 3 (PIV3) peak in the fall [Autumn] and spring-summer, respectively. The prevalence of some non-rhinovirus enteroviruses increases in summer (summer viruses)"


Moriyama, Hugentobler & Iwasaki, 2020: 86

Just a couple of days later Mr Javid was being interviewed on the radio, and he made a more limited claim:

Health Secretary Sajid Javid talking on BBC Radio 4's 'Today' programme, 15th September

"…because we know Autumn and Winter, your COVID is going to like that time of year"

Sajid Javid

So, this claim was just about the COVID virus, not viruses more generally, and that we know that COVID is going to like Autumn and Winter. No ambiguity there. But how do we know?

Coming to knowledge

Historically there have been various ways of obtaining knowledge.

  • Divine revelation: where God reveals the knowledge to someone, perhaps through appearing to the chosen one in a dream.
  • Consulting an oracle, or a prophet or some other kind of seer.
  • Intuiting the truth by reflecting on the nature of things using the rational power of the human intellect.
  • Empirical investigation of natural phenomena.

My focus in this blog is related to science, and given that we are talking about public health policy in modern Britain, I would like to think Mr Javid was basing his claim on the last of these options. Of course, even empirical methods depend upon some metaphysical assumptions. For example, if one assumes the cosmos has inbuilt connections, one might look for evidence in terms of sympathies or correspondences. Perhaps, if the COVID virus was observed closely and looked like a snowflake, that could (in this mindset) be taken as a sign that it liked Winter.

A snowflake – or is it a virus particle?
(Image by Gerd Altmann from Pixabay)

Sympathetic magic

This kind of correspondence, a connection indicated by appearance, was once widely accepted, so that a plant which was thought to resemble some part of the anatomy might be assumed to be an appropriate medicine for diseases or disorders associated with that part of the body.

This is a kind of magic, and might seem a 'primitive' belief to many people today, but such an idea was sensible enough in the context of a common set of underlying beliefs about the nature and purposes of the world, and the place and role of people in that world. One might expect that specific beliefs would soon die out if, for example, the plant shaped like an ear turned out to do nothing for ear ache. Yet, at a time when medical practitioners could offer little effective treatment, and being sent to a hospital was likely to reduce life expectancy, herbal remedies at least often (if not always) did no harm.

Moreover, many herbs do have medicinal properties, and something with a general systemic effect might work as topical medicine (i.e., when applied to a specific site of disease). Add to that, the human susceptibility to confirmation bias (taking more notice of, and giving more weight to, instances that meet our expectations than those which do not) and the placebo effect (where believing we are taking effective medication can sometimes in itself have beneficial effects) and the psychological support offered by spending time with an attentive practitioner with a good 'bedside' manner – and we can easily see how beliefs about treatments may survive limited definitive evidence of effectiveness.

The gold standard of experimental method

Of course, today, we have the means to test such medicines by taking a large representative sample of a population (of earache sufferers, or whatever), randomly dividing them into two groups, and, using a double-blind (or should that be double-deaf?) approach, treating them with the possible medicine or a placebo, without either the patient or the practitioner knowing who is getting which treatment. (The researchers have a way of knowing, of course – or it would be difficult to deduce anything from the results.) That is, the randomised control trial (RCT).
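As a rough illustration of that logic (a sketch only, not a description of any actual trial), the following Python snippet randomly assigns an invented cohort of earache sufferers to 'remedy' or 'placebo' arms and then compares outcomes; during the trial only the coded assignment list need be consulted, which is what allows patients and practitioners to remain 'blind':

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200  # hypothetical cohort size

# Random allocation: half to the remedy arm, half to the placebo arm
assignment = rng.permutation(["remedy"] * (n // 2) + ["placebo"] * (n // 2))

# Invented symptom scores after treatment (lower = better); assume a modest real effect
scores = rng.normal(loc=5.0, scale=2.0, size=n)
scores = scores - np.where(assignment == "remedy", 0.8, 0.0)

# Only at the analysis stage are the assignment codes used to compare the two arms
t_stat, p_value = stats.ttest_ind(scores[assignment == "remedy"],
                                  scores[assignment == "placebo"])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```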

Now, I have been very critical of the notion that these kinds of randomised experimental designs should automatically be seen as the preferred way of testing educational innovations (Taber, 2019) – but in situations where control of variables and 'blinding' is possible, and where randomisation can be applied to samples of well-defined populations, this does deserve to be considered the gold standard. (It is when the assumptions behind a research methodology do not apply that we should have reservations about using it as a strategy for enquiry.)

So can the RCT approach be used to find out if COVID has a preference for certain times of year? I guess this depends on our conceptual framework for the research (e.g., how do we understand what a 'like' actually is) and the theoretical perspective we adopt.

So, for example, behaviourists would suggest that it is not useful to investigate what is going on in someone's mind (perhaps some behaviourists do not even think the concept of mind corresponds to anything real), so we should observe behaviours that allow us to make inferences. This has to be done with care. Someone who buys and eats lots of chocolate presumably likes chocolate, and someone who buys and listens to a lot of reggae probably likes reggae, but a person who cries regularly, or someone who stumbles around and has frequent falls, does not necessarily like crying, or falling over, respectively.

A viral choice chamber

So, we might think that woodlice prefer damp conditions because we have put a large number of woodlice in choice chambers with different conditions (dry and light, dry and dark, damp and light, damp and dark) and found that there was a statistically significant excess of woodlice settling down in the damp sections of the chamber.
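The reasoning behind such a claim can be sketched in a few lines of Python (the counts below are invented purely for illustration): a chi-square goodness-of-fit test asks whether the observed distribution of woodlice across the four quadrants departs from the even spread expected if they had no preference:

```python
import numpy as np
from scipy import stats

# Hypothetical counts of where 80 woodlice settled in a four-quadrant choice chamber
quadrants = ["dry/light", "dry/dark", "damp/light", "damp/dark"]
observed = np.array([8, 12, 25, 35])

# Null hypothesis: no preference, so equal expected counts in every quadrant
expected = np.full(len(observed), observed.sum() / len(observed))
chi2, p_value = stats.chisquare(observed, f_exp=expected)

print(f"chi-square = {chi2:.1f}, p = {p_value:.4f}")
# A small p-value suggests the clustering in the damp quadrants is unlikely to be chance
```

Even then, a statistically significant clustering only licenses claims about where the woodlice congregate – whether that amounts to what they 'like' is the question taken up next.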

Of course, to infer preferences from behaviour – or even to use the term 'behaviour' – for some kinds of entity is questionable. (To think that woodlice make a choice based on what they 'like' might seem to assume a level of awareness that they perhaps lack?) In a cathode ray tube electrons subject to a magnetic field may be observed (indirectly!) to move to one side of the tube, just as woodlice might congregate in one chamber, but I am not sure I would describe this as electrons liking that part of the tube. I think it can be better explained with concepts such as electrical charge, fields, forces, and momentum.

It is difficult to see how we can do double blind trials to see which season a virus might like, as if the COVID virus really does like Winter, it must surely have a way of knowing when it is Winter (making blinding impossible). In any case, a choice chamber with different sections at different times of the year would require some kind of time portal installed between its sections.

Like electrons, but unlike woodlice, COVID viral particles do not have an active form of transport available to them. Rather, they tend to be sneezed and coughed around and then subject to the breeze, or deposited by contact with surfaces. So I am not sure that observing virus 'behaviour' helps here.

So perhaps a different methodology might be more sensible.

A viral opinion poll

A common approach to find out what people like would be a survey. Surveys can sometimes attract responses from large numbers of respondents, which may seem to give us confidence that they offer authentic accounts of widespread views. However, sample size is perhaps less important than sample representativeness. Imagine carrying out a survey of people's favourite football teams at a game at Stamford Bridge; or undertaking a survey of people's favourite bands as people queued to enter a King Crimson concert! The responses may [sic, almost certainly would] not fully reflect the wider population due to the likely bias in such samples. Would these surveys give reliable results which could be replicated if repeated at the Santiago Bernabeu or at a Marillion concert?

How do we know what 'COVID 'really likes?
(Original Images by OpenClipart-Vectors and Gordon Johnson from Pixabay)

A representative sample of variants?

This might cause problems with the COVID-19 virus (SARS-CoV-2). What counts as a member of the population – perhaps a viable virus particle? Can we even know how big the population actually is at the time of our survey? The virus is infecting new cells, leading to new virus particles being produced all the time, just as shed particles become non-viable all the time. So we have no reliable knowledge of population numbers.

Moreover, a survey needs a representative sample: do the numbers of people in a sample of a human population reflect the wider population in relevant terms (be that age, gender, level of educational qualifications, earnings, etc.)? There are viral variants leading to COVID-19 infection – and quite a few of them. That is, SARS-CoV-2 is a class with various subgroups. The variants replicate to different extents under particular conditions, and new variants appear from time to time.

So, the population profile is changing rapidly. In recent months in the UK nearly all infections where the variant has been determined are due to the variant VOC-21APR-02 (or B.1.617.2, or Delta), but many people will be infected asymptomatically or with mild symptoms and not be tested, so this does not necessarily mean that VOC-21APR-02 dominates the SARS-CoV-2 population as a whole to the extent it currently dominates among investigated cases. Assuming otherwise would be like gauging public opinion from the views of those particular people who make themselves salient by attending a protest, e.g.:

"Shock finding – 98% of the population would like to abolish the nuclear arsenal,

according to a [hypothetical] survey taken at the recent Campaign for Nuclear Disarmament march"

In any case, surveys are often fairly blunt instruments as they need to present objectively the same questions to all respondents, and elicit responses in a format that can be readily classified into a discrete number of categories. This is why many questionnaires use Likert type items:

Would you say you like Autumn and Winter:

  • 1 = Always
  • 2 = Nearly always
  • 3 = Usually
  • 4 = Sometimes
  • 5 = Never

Such 'objective' measures are often considered to avoid the subjective nature of some other types of research. It may seem that responses do not need to be interpreted – but of course this assumes that the researchers and all the respondents understand language the same way (what exactly counts as Autumn and Winter? What does 'like' mean? How is 'usually' understood – 60-80% of the time, or 51-90% of the time or…). We can usually (sic) safely assume that those with strong language competence will have somewhat similar understandings of terms, but we cannot know precisely what survey participants meant by their responses or to what extent they share a meaning for 'usually'.

There are so-called 'qualitative surveys' which eschew this kind of objectivity to get more in-depth engagement with participants. They will usually use interviews where the researcher can establish rapport with respondents and ask them about their thoughts and feelings, observe non-verbal signals such as facial expressions and gestures, and use follow-up questions… However, the greater insight into individuals comes at a cost of smaller samples as these kinds of methods are more resource-intensive.

But perhaps Mr Javid does not actually mean that COVID likes Autumn and Winter?

So, how did the Department of Health & Social Care, or the Health Secretary's scientific advisors, find out that COVID (or the COVID virus) likes Autumn and Winter? The virus does not think, or feel, and it does not have preferences in the way we do. It does not perceive hot or cold, and it does not have a sense of time passing, or of the seasons. 2 COVID does not like or dislike anything.

Mr Javid needs to make himself clear to a broad public audience, so he has to avoid too much technical jargon. It is not easy to pitch a presentation for such an audience and be pithy, accurate, and engaging, but it is easy for someone (such as me) to be critical when not having to face this challenge. Cabinet ministers, unlike science teachers, cannot be expected to have skills in communicating complex and abstract scientific ideas in simplified and accessible forms that remain authentic to the science.

It is easy, and perhaps convenient, to use anthropomorphic language to talk about the virus, and this will likely make the topic seem accessible to listeners, but it is less clear what is actually meant by a virus liking a certain time of year. In teaching, the use of anthropomorphic language can be engaging, but it can also come to stand in place of scientific understanding when anthropomorphic statements are simply accepted uncritically at face value. For example, if the science teacher suggests "the atom wants a full shell of electrons", then we should not be surprised that students may think this is a scientific explanation, and that atoms do want to fill their shells. (They do not, of course. 3)

Image by Gordon Johnson from Pixabay

Of course Mr Javid's statements cannot be taken as a literal claim about what the virus likes – my point in this posting is to provoke the question of what they might be intended to mean. The claim is surely intended metaphorically (at least if Mr Javid had thought about it critically): perhaps that there is a higher incidence of infection or serious illness caused by the COVID virus in the Winter. But by that logic, I guess turkeys really would vote for Christmas (or Thanksgiving) after all.

Typically, some viruses cause more infection in the Winter when people are more likely to mix indoors and when buildings and transport are not well ventilated (both factors being addressed in public health measures and advice in regard to COVID-19). Perhaps 'likes' here simply means that the conditions associated with a higher frequency/population of virus particles occur in Autumn and Winter?

A snowflake.
The conditions suitable for a higher frequency of snowflakes are more common in Winter.
So do snowflakes also 'like' Winter?
(Image by Gerd Altmann from Pixabay)

However, this is some way from assigning 'likes' to the virus. After all, in evolutionary terms, a virus might 'prefer', so to speak, to only be transmitted asymptomatically, as it cannot be in the virus's 'interests', so to speak, to encourage a public health response that will lead to vaccines or measures to limit the mixing of people.

If COVID could like anything (and of course it cannot), I would suggest it would like to go 'under the radar' (another metaphor) and be endemic in a population that was not concerned about it (perhaps doing so little harm it is not even noticed, such that people do not change their behaviours). It would then only 'prefer' a Season to the extent that that time of year brings conditions which allow it to go about its life cycle without attracting attention – from Mr Javid or anyone else.

Keith S. Taber, September 2021

Addendum: 1st December 2021

Déjà vu?

The health secretary was interviewed on 1st December

"…we have always known that when it gets darker, it gets colder, the virus likes that, the flu virus likes that and we should not forget that's still lurking around as well…"

Rt. Hon. Sajid Javid MP, the U.K. Secretary of State for Health and Social Care, interviewed on BBC Radio 4 Today programme, 1st December, 2021
Footnotes:

1. It would also seem to be a generalisation based on the only two Winters that the COVID-19 virus had 'experienced'.

2. Strictly I cannot know what it is like to be a virus particle. But a lot of well-established and strongly evidenced scientific principles would be challenged if a virus particle is sentient.

3. Yet this is a VERY common alternative conception among school children studying chemistry: The full outer shells explanatory principle

Related reading:

So who's not a clever little virus then?

COVID is like a fire because…

Anthropomorphism in public science discourse

Not a great experiment…

What was wrong with The Loneliness Experiment?

Keith S. Taber

The loneliness experiment, a.k.a. The BBC Loneliness Experiment, was a study publicised through the BBC (the British public service broadcaster), and in particular through its radio programme All in the Mind ("which covers psychology, neuroscience & mental health" according to presenter Claudia Hammond's website).1 It was launched back in February 2018 – pre-COVID.2

"All in the Mind: The Loneliness Experiment launches the world's largest ever survey of its kind on loneliness." https://www.bbc.co.uk/programmes/b09r6fvn

Claudia Hammond describes herself as an "award-winning broadcaster, author and psychology lecturer". In particular, "She is Visiting Professor of the Public Understanding of Psychology at the University of Sussex" where, according to the University of Sussex, "the post has been specially created for Claudia, who studied applied psychology at the University in the 1990s", so she is very well qualified for her presenting role. (I think she is very good at this role: she has a good voice for the radio and manages to balance the dual role of being expert enough to exude authority, whilst knowing how to ask necessarily naive questions of guests on behalf of non-specialist listeners.)

A serious research project

The study was a funded project based on a collaboration between academics from a number of universities, led by Prof Pamela Qualter, Professor of Education at the Manchester Institute of Education at the University of Manchester. Moreover, "55,000 people from around the world chose to take part in the BBC Loneliness Experiment, making it the world's largest ever study on loneliness" (https://claudiahammond.com/bbc-loneliness-experiment/)

Loneliness is a serious matter that affects many people, and is not to be made light of. So this was a serious study, on an important topic – yet every time I heard this mentioned on the radio (and it was publicised a good deal at the time) I felt myself mentally (and sometimes physically) cringe. Even without hearing precise details of the research design, I could tell this was simply not a good experiment.

This was not due to any great insight on my behalf, but was obvious from the way the work was being described. Readers may wish to see if they can spot for themselves what so irked me.

What is the problem with this research design?

This is how the BBC described the study at its launch:

The Loneliness Experiment, devised by Professor Pamela Qualter and colleagues, aims to look at causes and possible solutions to loneliness. And we want as many people as possible to fill in our survey, even if they've never felt lonely, because we want to know what stops people feeling lonely, so that more of us can feel connected.

https://www.bbc.co.uk/programmes/b09r6fvn

This is how Prof. Hammond described the research in retrospect:

55,000 people from around the world chose to take part in the BBC Loneliness Experiment, making it the world's largest ever study on loneliness. Researchers from the universities of Manchester, Brunel and Exeter, led by Professor Pamela Qualter and funded by the Wellcome Trust, developed a questionnaire asking people what they thought loneliness was, when they felt lonely and for how long.

https://claudiahammond.com/bbc-loneliness-experiment/

And this is how the work is described on the University of Manchester's pages:

The Loneliness Experiment was a study conducted by BBC Radio 4's All in the Mind….

The study asked respondents to give their opinions and record their experiences of loneliness and related topics, including friendship, relationships, and the use of technology – as well as recording lifestyle and background information. Respondents also engaged in a number of experiments.

The survey was developed by Professor Pamela Qualter, from The University of Manchester's Manchester Institute of Education (MIE), with colleagues from Brunel University London, and the University of Exeter. The work was funded by a grant from The Wellcome Trust.

https://www.seed.manchester.ac.uk/education/research/impact/bbc-loneliness-experiment/

When is an experiment not an experiment?

These descriptions make it obvious that The Loneliness Experiment was not an experiment. Experiment is a specific kind of research – a methodology where the researchers randomly assign participants to conditions, intervene in the experimental condition, and take measurements to see what effect the intervention has by comparing them with measurements in a control condition. True experiments are extremely difficult to do in the social sciences (Taber, 2019), and often quasi-experiments or natural experiments are used which do not meet all the expectations for true experiments. BUT, to be an experiment there has to be something that can be measured as changing over time in relation to specified different conditions.

Experiment involves intervention (Image by Gerd Altmann from Pixabay)
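For readers less familiar with the methodological terminology, the following sketch (hypothetical and simulated throughout, and nothing to do with the loneliness study itself) sets out the bare logic of a true experiment in executable form:

```python
# A minimal sketch (purely hypothetical and simulated) of the defining features
# of a true experiment: random assignment to conditions, an intervention in one
# condition only, and a comparison of outcome measurements across conditions.
import random

participants = [f"participant_{i}" for i in range(100)]   # hypothetical sample
random.shuffle(participants)                               # random assignment

control = participants[:50]         # no intervention
experimental = participants[50:]    # receives the intervention

def measure_outcome(received_intervention: bool) -> float:
    """Stand-in for whatever outcome measure a real study would use."""
    baseline = random.gauss(50, 10)
    effect = 5 if received_intervention else 0   # hypothetical effect size
    return baseline + effect

control_scores = [measure_outcome(False) for _ in control]
experimental_scores = [measure_outcome(True) for _ in experimental]

# The experimental comparison: did the intervention make a difference?
difference = (sum(experimental_scores) / len(experimental_scores)
              - sum(control_scores) / len(control_scores))
print(f"Estimated effect of the intervention: {difference:.1f}")
```

The essential ingredients are the random assignment to conditions, the intervention applied in one condition only, and the comparison of outcomes against a control – none of which applies to a one-off questionnaire.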

Experiment is not the only methodology used in research – there are also case studies, action research and grounded theory, for example – and non-experimental research may be entirely appropriate in certain situations, and can be of very high quality. One alternative methodology is the survey, which collects data from a sample of a population at some particular time. Although surveys can be carried out in various ways (for example, through a series of observations), especially common in social science is the survey (a methodology) carried out using participant self-responses to a questionnaire (a research instrument).

It is clear from the descriptions given by the BBC, Professor Hammond and the University of Manchester that The Loneliness Experiment was not actually an experiment at all, but basically a survey (even if, tantalisingly, the Manchester website suggests that "Respondents also [sic] engaged in a number of experiments").

The answer to the question 'when is an experiment not an experiment?' might simply be: when it is something other than an experiment.

Completing a questionnaire (Image by Andreas Breitling from Pixabay)

What's in a name: does it really matter?

Okay, so I am being pedantic again.

But I do think this matters.

I think it is safe to assume that Prof. Hammond, Prof. Qualter and colleagues know the difference between an experiment and a survey. Presumably someone decided that labelling the research as the loneliness study or the loneliness survey would not be as accessible (or perhaps not as impressive) to a general audience, and so decided to use the label experiment, incorrectly, as if experiment were synonymous with study/research.

As a former research methods lecturer, I find this clearly irksome, as part of my job was to teach new researchers about key research concepts. But I would hope that people actually doing research, or learning to do research, are not going to be confused by this mislabelling.

But, as a former school science teacher, I know that there is widespread public misunderstanding of key nature of science terms such as theory and experiment. School-age students do need to learn what is meant by the word experiment, and what counts as an experiment, and the BBC is being unhelpful in presenting research that is not experimental as an experiment – as this will simply reinforce common misconceptions of what the term experiment is actually used to denote in research.

So, in summary, I'll score The BBC Loneliness Experiment:

  • motivation – excellent;
  • reach – impressive;
  • presentation – unfortunate and misleading
Further reading:

Read about methodology

Read about experiments

Read about surveys

Work cited:

Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download manuscript version]

Note:

1: Websites cited accessed on 28th August, 2021.

2: It would have been interesting to repeat when so many people around the world were in 'lock-down'. (A comparison between pre-COVID and pandemic conditions might have offered something of a natural experiment.)

Shock! A typical honey bee colony comprises only six chemicals!

Is it half a dozen of one, or six of the other?

Keith S. Taber

Bee-ware chemicals!
(Images by PollyDot and Clker-Free-Vector-Images from Pixabay)

A recent episode of the BBC Inside Science radio programme and podcast was entitled 'Bees and multiple pesticide exposure'. This discussed a very important issue that I have no wish to make light of. Researchers were looking at the stressors which might be harming honey bees, very important pollinators for many plants, and concluded that these likely act synergistically. That is, a colony suffering from, say, a drought and at the same time a mite infection will show more damage than one would expect from simply adding the typical harm of each as if they were independent effects. Rather, there are interactions.
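As a rough numerical illustration of what 'synergistic' means here (the figures are entirely hypothetical, not the researchers' data):

```python
# A minimal numerical sketch (hypothetical figures) of the difference between
# independent/additive effects and a synergistic interaction of two stressors.
p_drought = 0.10     # hypothetical colony losses from drought alone
p_mites = 0.15       # hypothetical colony losses from mite infection alone

# Prediction if the two stressors acted independently of one another:
p_independent = 1 - (1 - p_drought) * (1 - p_mites)   # = 0.235

# A synergistic interaction means the observed combined harm exceeds that prediction:
p_observed = 0.35    # hypothetical observed losses under both stressors together

print(f"Expected if effects were independent: {p_independent:.1%}")
print(f"Observed with synergy (hypothetical): {p_observed:.1%}")
```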

This is hardly surprising, but is nonetheless a worrying finding.

Bees and multiple pesticide exposure episode of BBC Inside Science

However, my 'science teacher' radar homed in on an aspect of the language used to explain the research. The researcher interviewed was Dr Harry Siviter of the University of Texas at Austin. As part of his presentation he suggested that…

"Exposure to multiple pesticides is the norm, not the exception. So, for example a study in North America showed that the average number of chemicals found in a honey bee colony is six, with a high of 42. So, we know that bees are exposed to multiple chemicals…"

Dr Harry Siviter

The phrase that stood out for me was "the average number of chemicals found in a honey bee colony is six" as clearly that did not make any sense scientifically. At least, not if the term 'chemical' was meant to refer to 'chemical substance'. I cannot claim to know just how many different substances would be found if one analysed honey bee colonies, but I am pretty confident the average would be orders of magnitude greater than six. An organism such as a bee (leaving aside for a moment the hive in which it lives) will be, chemically, 'made up' of a great many different proteins, amino acids, lipids, sugars, nucleic acids, and so forth.

"the average number of chemicals found in a honey bee colony is six"

From the context, I understood that Dr Siviter was not really talking about chemicals in general, but pesticides. So, I am (not for the first time) being a pedant in pointing out that technically he was wrong to suggest "the average number of chemicals found in a honey bee colony is six" as any suitably informed listener would have immediately, and unproblematically, understood what he meant by 'chemicals' in this context.

Yet, as a teacher, my instinct is to consider that programmes such as this, designed to inform the public about science, are not only heard by those who are already well-versed in the sciences. By its nature, BBC Inside Science is intended to engage with a broad audience, and has a role in educating the public about science. I also knew that this particular pedantic point linked to a genuine issue in science teaching.

A common alternative conception

The term chemical is not usually used in science discourse as such, but rather the term substance. Chemical substances are ubiquitous, although in most everyday contexts we do not come across many pure samples of single substances. Tap water is nearly all water, and table salt is usually about 99% sodium chloride, and sometimes metals such as copper or aluminium are used in more or less pure form. But these tend to be exceptions – most material entities we engage with are not pure substances ('chemicals'), rather being mixtures or even more complex (e.g., wood or carrot or hair).

In everyday life, the term chemical tends to be used more loosely – so, for example, household bleach may be considered 'a chemical'. More problematically 'chemicals' tends to be seen as hazardous, and often even poisonous. So, people object to there being 'chemicals' in their food – when of course their food comprises chemicals and we eat food to access those chemicals because we are also made up of a great many chemicals. Food with the chemicals removed is not food, or indeed, anything at all!

In everyday discourse 'chemical' is often associated with 'dangerous' (Image by Arek Socha from Pixabay)

So, science teachers not only have the problem that in everyday discourse the term 'chemical' does not map unproblematically onto 'substance' (as it is often used also for mixtures), but even more seriously that chemicals are assumed to be bad, harmful, undesirable – something to be avoided and excluded. By contrast, the scientific perspective is that whilst some chemicals are potentially very harmful, others are essential for life. Therefore, it is unhelpful when science communicators (whether journalists, or scientists themselves) use the term 'chemical' to refer only to potentially undesirable chemicals (which even then tend to be undesirable only in certain contexts), such as pesticides which are found in, and may harm, pollinators.

I decided to dig into the background of the item.

The news item

I found a news item in The Conversation that discusses the work.

Dr Siviter's Article in the Conversation

It began

"A doctor will always ask if you are on any other medication before they write you a prescription. This is because pharmaceuticals can interact with each other and potentially disrupt the treatment, or even harm the patient. But when agrochemicals, such as pesticides, are licensed for use on farms, little attention is paid to how they interact with one another, and so their environmental impact is underestimated."

Siviter, 2021

This seemed a very good point, made with an analogy that seemed very telling.

(Read about science analogies)

This was important because:

"We analysed data gathered in scientific studies from the last two decades and found that when bees are exposed to a combination of pesticides, parasites and poor nutrition, the negative impact of each is exacerbated. We say that the cumulative effect of all these things is synergistic, meaning that the number of bees that are killed is more than we would predict if the negative effects were merely added together."

Siviter, 2021

This seems important work, and raises an issue we should be concerned about. The language used here was subtly different from that used in the radio programme:

"Many agrochemicals, such as neonicotinoids, are systemic, meaning they accumulate in the environment over several months, and in some cases years. It is perhaps not surprising then that honeybee colonies across the US have on average six different agrochemicals present in their wax, with one hive contaminated with 39 [sic, not 42]. It's not just honeybees which are at risk, though: wild bees such as bumblebees are also routinely exposed."

Siviter, 2021

So, here it was not 'chemicals' that were being counted but 'agrochemicals' (and the average figure of 6 now referred not to the colony as a whole, but only to the beeswax.)

The meta-analysis

'Agrochemicals' was also the term used in the research paper in the prestigious journal Nature where the research had been first reported,

"we conducted a meta-analysis of 356 interaction effect sizes from 90 studies in which bees were exposed to combinations of agrochemicals, nutritional stressors and/or parasites."

Siviter, et al., 2021

A meta-analysis is a type of secondary research study which collects results from a range of related published studies and seeks to identify overall patterns.
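The underlying logic can be sketched very simply (with hypothetical numbers, and a far cruder weighting scheme than a genuine meta-analysis would employ – this is not the analysis reported by Siviter and colleagues):

```python
# A minimal sketch of the core idea behind a meta-analysis: pooling effect sizes
# from several published studies, weighting each by its precision. The figures
# and the simple fixed-effect weighting are hypothetical and purely illustrative.
studies = [
    # (effect size, variance) for hypothetical individual studies
    (0.30, 0.02),
    (0.45, 0.05),
    (0.20, 0.01),
]

weights = [1.0 / variance for _, variance in studies]   # inverse-variance weights
pooled = sum(w * effect for (effect, _), w in zip(studies, weights)) / sum(weights)
print(f"Pooled effect size estimate: {pooled:.2f}")
```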

The original research

Moreover, the primary study being referred to as the source of the dubious statistic (i.e., that "the average number of chemicals found in a honey bee colony is six") referred not to 'chemicals' but to "pesticides and metabolites" (the latter being substances produced as the bees' metabolism breaks the pesticides down):

"We have found 121 different pesticides and metabolites within 887 wax, pollen, bee and associated hive samples….

Almost all comb and foundation wax samples (98%) were contaminated with up to 204 and 94 ppm [parts per million], respectively, of fluvalinate and coumaphos, and lower amounts of amitraz degradates and chlorothalonil, with an average of 6 pesticide detections per sample and a high of 39."

Mullin, et al., 2010

Translation and representation

Scientific research is reported in research journals primarily for the benefit of other researchers in the field, and so is formatted and framed accordingly – and this is reflected in the language used in primary sources.

A model of the flow of scientific to public knowledge (after McInerney et al., 2004)

Fig. 10.2 from Taber, 2013

It is important that science which impacts on us all, and is often funded from public funds, is accessible to the public. Science journalism is an important conduit for the communication of science, and for this to be effective it has to be composed with non-experts among the public in mind.

(Read about science in public discourse and the media)

It is perfectly sensible and desirable for a scientist engaging with a public audience to moderate technical language to make the account of research more accessible for a non-specialist audience. This kind of simplification is also a core process in developing science curriculum and teaching.

(Read about representing science in the curriculum)

However, in the case of 'chemical' I would suggest scientists take care with using the term (and avoid it if possible), as science teachers commonly have to persuade students that chemicals are all around us, are not always bad for us, are part of us, and are essential. That pesticides and their breakdown products have been so widely detected in bee colonies is a matter of concern, as pesticides are substances that are used because of their detrimental effects on many insects and other organisms that might damage crops.

Whilst that is science deserving public attention, there are a good many more than 6 chemicals in any bee colony, and – indeed – we would want most of them to be there.

References: