Shock result: more study time leads to higher test scores

(But 'all other things' are seldom equal)


Keith S. Taber


I came across an interesting journal article that reported a quasi-experimental study where different groups of students studied the same topic for different periods of time. One group was given 3 half-hour lessons, another group 5 half-hour lessons, and the third group 8 half-hour lessons. Then they were tested on the topic they had been studying. The researchers found that the average group performance was substantially different across the different conditions. This was tested statistically, but the results were clear enough to be quite impressive when presented visually (as I have below).


Results from a quasi-experiment: it seems more study time can lead to higher achievement

These results seem pretty clear cut. If this research could be replicated in diverse contexts then the findings could have great significance.

  • Is your manager trying to cut course hours to save budget?
  • Does your school want you to teach 'triple science' in a curriculum slot intended for 'double science'?
  • Does your child say they have done enough homework?

Research evidence suggests that, ceteris paribus, learners achieve more by spending more time studying.

Ceteris paribus?

That is ceteris paribus (no, it is not a newly discovered species of whale): all other things being equal. But of course, in the real world they seldom – if ever – are.

If you wondered about the motivation for a study designed to see whether more teaching led to more learning (hardly what Karl Popper would have classed as a suitable 'bold conjecture' on which to base productive research), then I should confess I am being disingenuous. The information I give above is based on the published research, but offers a rather different take on the study from that offered by the authors themselves.

An 'alternative interpretation' one might say.

How useful are DARTs as learning activities?

I came across this study when looking to see if there was any research on the effectiveness of DARTs in chemistry teaching. DARTs are directed activities related to text – that is text-based exercises designed to require learners to engage with content rather than just copy or read it. They have long been recommended, but I was not sure I had seen any published research on their use in science classrooms.

Read about using DARTs in teaching

Shamsulbahri and Zulkiply (2021) undertook a study that "examined the effect of Directed Activity Related to Texts (DARTs) and gender on student achievement in qualitative analysis in chemistry" (p.157). They considered their study to be a quasi-experiment.

An experiment…

Experiment is the favoured methodology in many areas of natural science, and, indeed, the double blind experiment is sometimes seen as the gold standard methodology in medicine – and when possible in the social sciences. This includes education, and certainly in science education the literature reports many, many educational experiments. However, doing experiments well in education is very tricky and many published studies have major methodological problems (Taber, 2019).

Read about experiments in education

…requires control of variables

As we teach in school science, fair testing requires careful control of variables.

So, if I suggest there are some issues that prevent a reader from being entirely confident in the conclusions that Shamsulbahri and Zulkiply reach in their paper, it should be borne in mind that I think it is almost impossible to do a rigorously 'fair' small-scale experiment in education. By small-scale, I mean the kind of study that involves a few classes of learners, as opposed to studies that can enrol a large number of classes and randomly assign them to conditions. Even large-scale randomised studies are usually compromised by factors that simply cannot be controlled in educational contexts (Taber, 2019), and small-scale studies are subject to additional, often (I would argue) insurmountable, 'challenges'.

The study is available on the web, open access, and the paper goes into a good deal of detail about the background to, and aspects of, the study. Here, I am focusing on a few points that relate to my wider concerns about the merits of experimental research into teaching, and there is much of potential interest in the paper that I am ignoring as not directly relevant to my specific argument here. In particular, the authors describe the different forms of DART they used in the study. As, inevitably (considering my stance on the intrinsic problems of small-scale experiments in education), the tone of this piece is critical, I would recommend readers access the full paper and make up their own minds.

Not a predatory journal

I was not familiar with the journal in which this paper was published – the Malaysian Journal of Learning and Instruction. It describes itself as "a peer reviewed interdisciplinary journal with an international advisory board". It is an open access journal that charges authors for publication. However, the publication fees are modest (US$25 if authors are from countries that are members of The Association of Southeast Asian Nations, and US$50 otherwise). This is an order of magnitude less than is typical for some of the open-access journals that I have criticised here as being predatory – those which do not engage in meaningful peer review, and will publish some very low quality material as long as a fee is paid. 25 dollars seems a reasonable charge for the costs involved in publishing work, unlike the hefty fees charged by many of the less scrupulous journals.

Shamsulbahri and Zulkiply seem, then, to have published in a well-motivated journal and their paper has passed peer review. But this peer thinks that, like most small scale experiments into teaching, it is very hard to draw any solid conclusions from this work.

What do the authors conclude?

Shamsulbahri and Zulkiply argue that their study shows the value of DARTs activities in learning. I approach this work with a bias, as I also think DARTs can be very useful. I used different kinds of DARTs extensively in my teaching with 14-16 year olds when I worked in schools.

The authors claim their study,

"provides experimental evidence in support of the claim that the DARTs method has been beneficial as a pedagogical approach as it helps to enhance qualitative analysis learning in chemistry…

The present study however, has shown that the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the experimental method. Using the DARTs method only results in better learning of qualitative analysis component in chemistry, as compared with using the Experimental method only."

Shamsulbahri & Zulkiply, 2021

Yet, despite my bias, which leads me to suspect they are right, I do not think we can infer this much from their quasi-experiment.

I am going to separate out three claims in the quote above:

  1. the DARTs method has been beneficial as a pedagogical approach as it helps to enhance qualitative analysis learning in chemistry
  2. the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the [laboratory1] method
  3. the DARTs method [by itself] results in better learning of qualitative analysis component in chemistry, as compared with using the [laboratory] method only.

I am going to suggest that there are two weak claims here and one strong claim. The weak claims are reasonably well supported (but only as long as they are read strictly as presented and not assumed to extend beyond the study) but the strong claim is not.

Limitations of the experiment

I suggest there are several major limitations of this research design.

What population is represented in the study?

In a true experiment researchers would nominate the population of interest (say, for example, 14-16 year old school learners in Malaysia), and then randomly select participants from this population who would be randomly assigned to the different conditions being compared. Random selection and assignment cannot ensure that the groupings of participants are equivalent, nor that the samples genuinely represent the population; as by chance it could happen that, say, the most studious students are assigned to one condition and all the lazy students to another – but that is very unlikely. Random selection and assignment mean that there is a strong statistical case to think the outcomes of the experiment probably represent (more or less) what would have happened on a larger scale had it been possible to include the whole population in the experiment.
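The distinction between random selection and random assignment can be sketched in a few lines of code. This is purely a hypothetical illustration: the notional population, the sample size of 120, and the three condition labels are borrowed from the discussion, not from any real sampling frame.

```python
import random

random.seed(42)  # for a reproducible illustration

# A notional population of interest (invented for this sketch).
population = [f"student_{i}" for i in range(10_000)]

# Step 1: randomly *select* a sample of participants from the population...
sample = random.sample(population, 120)

# Step 2: ...then randomly *assign* the sampled participants to conditions.
random.shuffle(sample)
conditions = {
    "laboratory": sample[0:40],
    "DARTs": sample[40:80],
    "laboratory+DARTs": sample[80:120],
}
```

It is this two-step randomisation, not any guarantee that the groups are equivalent, that licenses statistical inference from the samples back to the population.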

Read about sampling in research

Obviously, researchers in small-scale experiments are very unlikely to be able to access full populations to sample. Shamsulbahri and Zulkiply did not – and it would be unreasonable to criticise them for this. But this does raise the question of whether what happens in their samples will reflect what would happen with other groups of students. Shamsulbahri and Zulkiply acknowledge their sample cannot be considered typical,

"One limitation of the present study would be the sample used; the participants were all from two local fully residential schools, which were schools for students with high academic performance."

Shamsulbahri & Zulkiply, 2021

So, we have to be careful about generalising from what happened in this specific experiment to what we might expect with different groups of learners. In that regard, two of the claims from the paper that I have highlighted (i.e., the weaker claims) do not directly imply these results can be generalised:

  1. the DARTs method has been beneficial as a pedagogical approach…
  2. the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the [laboratory] method

These are claims about what was found in the study – not inferences about what would happen in other circumstances.

Read about randomisation in studies

Equivalence at pretest?

When it is not possible to randomly assign participants to the different conditions then there is always the possibility that whatever process has been used to assign conditions to groups produces a bias. (An extreme case would be in a school that used setting, that is assigning students to teaching groups according to achievement, if one set was assigned to one condition, and another set to a different condition.)

In quasi-experiments on teaching it is usual to pre-test students and to present analysis to show that at the start of the experiment the groups 'are equivalent'. Of course, it is very unlikely two different classes would prove to be entirely equivalent on a pre-test, so often there is a judgement made of the test results being sufficiently similar across the conditions. In practice, in many published studies, authors settle for the very weak (and inadequate) test of not finding differences so great that they would be very unlikely to occur by chance (Taber, 2019)!
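A small simulation shows why 'no statistically significant difference at pre-test' is such weak evidence of equivalence (all the numbers here are invented for illustration): even when one class of 40 genuinely starts 0.4 standard deviations ahead of another, a naive 'difference greater than about two standard errors' criterion usually fails to flag the gap.

```python
import random
import statistics

random.seed(1)  # for a reproducible illustration

def flagged_as_different(n=40, true_gap=0.4):
    # Two classes of n students; group b really scores higher on average.
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(true_gap, 1.0) for _ in range(n)]
    # Naive two-standard-error criterion for 'significant difference'.
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    return abs(statistics.mean(b) - statistics.mean(a)) > 2 * se

runs = 2000
detections = sum(flagged_as_different() for _ in range(runs))
print(detections / runs)  # typically around 0.4: the real gap is usually missed
```

With samples this small, a real and educationally meaningful head start is missed more often than it is detected, so 'not significantly different' falls a long way short of demonstrating equivalence.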

Read about testing for equivalence

Shamsulbahri and Zulkiply did pretest all participants as a screening process to exclude any students who already had good subject knowledge in the topic (qualitative chemical analysis),

"Before the experimental manipulation began, all participants were given a pre-screening test (i.e., the Cation assessment test) with the intention of selecting only the most qualified participants, that is, those who had a low-level of knowledge on the topic….The participants who scored ten or below (out of a total mark of 30) were selected for the actual experimental manipulation. As it turned out, all 120 participants scored 10 and below (i.e., with an average of 3.66 out of 30 marks), which was the requirement that had been set, and thus they were selected for the actual experimental manipulation."

Shamsulbahri & Zulkiply, 2021

But the researchers do not report the mean results for the groups in the three conditions (laboratory1; DARTs; {laboratory+DARTs}) or give any indication of how similar (or not) these were. Nor do these scores seem to have been included as a variable in the analysis of results. The authors seem to be assuming that as no students scored more than one-third marks in the pre-test, then any differences between groups at pre-test can be ignored. (This seems to suggest that scoring 30% or 0% can be considered the same level of prior knowledge in terms of the potential influence on further learning and subsequent post-test scores.) That does not seem a sound assumption.

"It is important to note that there was no issue of pre-test treatment interaction in the context of the present study. This has improved the external validity of the study, since all of the participants were given a pre-screening test before they got involved in the actual experimental manipulation, i.e., in one of the three instructional methods. Therefore, any differences observed in the participants' performance in the post-test later were due to the effect of the instructional method used in the experimental manipulation."

Shamsulbahri & Zulkiply, 2021 (emphasis added)

There seems to be a flaw in the logic here, as the authors seem to be equating demonstrating an absence of high scorers at pre-test with there being no differences between groups which might have influenced learning. 2

Units of analysis

In any research study, researchers need to be clear regarding what their 'unit of analysis' should be. In this case the extreme options seem to be:

  • 120 units of analysis: 40 students in each of three conditions
  • 3 units of analysis: one teaching group in each condition

The key question is whether individual learners can be considered as being subject to the treatment conditions independently of others assigned to the same condition.

"During the study phase, student participants from the three groups were instructed by their respective chemistry teachers to learn in pairs…"

Shamsulbahri & Zulkiply, 2021

There is a strong argument that when a group of students attend class together, and are taught together, and interact with each other during class, they strictly should not be considered as learning independently of each other. Anyone who has taught parallel classes that are supposedly equivalent will know that classes take on their own personalities as groups, and the behaviour and learning of individual students is influenced by the particular class ethos.

Read about units of analysis

So, rigorous research into class teaching pedagogy should not treat the individual learners as units of analysis – yet it often does. The reason is obvious – it is only possible to do statistical testing when the sample size is large enough, and in small scale educational experiments the sample size is never going to be large enough unless one…hm…pretends/imagines/considers/judges/assumes/hopes?, that each learner is independently subject to the assigned treatment without being substantially influenced by others in that condition.

So, Shamsulbahri and Zulkiply treated their participants as independent units of analysis, and on this basis found a statistically significant effect of treatment:

'laboratory' vs. 'DARTs' vs. 'laboratory+DARTs'.

That is questionable – but what if, for argument's sake, we accept this assumption that within a class of 40 students the learners can be considered not to influence each other (even their learning partner?) or the classroom more generally sufficiently to make a difference to others in the class?
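The statistical cost of that assumption can be illustrated with a simulation (again, all the numbers are invented): give each class its own modest class-level 'ethos' effect, build in no treatment effect at all, and then test the students as if they were 80 independent units.

```python
import random
import statistics

random.seed(7)  # for a reproducible illustration

def false_positive(n=40, class_sd=0.3):
    # Each class gets its own 'ethos' shift; there is NO treatment effect.
    ethos_a, ethos_b = random.gauss(0, class_sd), random.gauss(0, class_sd)
    a = [ethos_a + random.gauss(0, 1) for _ in range(n)]
    b = [ethos_b + random.gauss(0, 1) for _ in range(n)]
    # Naive test treating all 2n students as independent units of analysis.
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) > 2 * se

runs = 2000
rate = sum(false_positive() for _ in range(runs)) / runs
print(rate)  # well above the nominal 5% false-positive rate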

A confounding variable?

Perhaps a more serious problem with the research design is that there is insufficient control of potentially relevant variables. In order to make a comparison of 'laboratory' vs. 'DARTs' vs. 'laboratory+DARTs' then the only relevant difference between the three treatment conditions should be whether the students learn by laboratory activity, DARTs, or both. There should not be any other differences between the groups in the different treatments that might reasonably be expected to influence the outcomes.

Read about confounding variables

But the description of how groups were set up suggests this was not the case:

"….the researchers conducted a briefing session on the aims and experimental details of the study for the school's [schools'?] chemistry teachers…the researchers demonstrated and then guided the school's chemistry teachers in terms of the appropriate procedures to implement the DARTs instructional method (i.e., using the DARTs handout sheets)…The researcher also explained to the school's chemistry teachers the way to implement the combined method …

Participants were then classified into three groups: control group (experimental method), first treatment group (DARTs method) and second treatment group (Combination of experiment and DARTs method). There was an equal number of participants for each group (i.e., 40 participants) as well as gender distribution (i.e., 20 females and 20 males in each group). The control group consisted of the participants from School A, while both treatment groups consisted of participants from School B"


Shamsulbahri & Zulkiply, 2021

Several different teachers seem to have been involved in teaching the classes, and even if it is not entirely clear how the teaching was divided up, it is clear that the group that only undertook the laboratory activities was from a different school than those in the other two conditions.

If we think one teacher can be replaced by another without changing learning outcomes, and that schools are interchangeable such that we would expect exactly the same outcomes if we swapped a class of students from one school for a class from another school, then these variables are unimportant. If, however, we think the teacher doing the teaching and the school from which learners are sampled could reasonably make a difference to the learning achieved, then these are confounding variables which have not been properly controlled.

In my own experience, I do not think different teachers become equivalent even when they are briefed to teach in the same way, and I do not think we can assume schools are equivalent when providing students to participate in learning. These differences, then, undermine our ability to assign any differences in outcomes as due to the differences in pedagogy (that "any differences observed…were due to the effect of the instructional method used").

Another confounding variable

And then I come back to my starting point. Learners did not just experience different forms of pedagogy but also different amounts of teaching. The difference between 3 lessons and 5 lessons might in itself be a factor (that is, even if the pedagogy employed in those lessons had been the same), as might the difference between 5 lessons and 8 lessons. So, time spent studying must be seen as a likely confounding variable. Indeed, it is not just the amount of time, but also the number of lessons, as the brain processes learning between classes and what is learnt in one lesson can be reinforced when reviewed in the next. (So we could not just assume, for example, that students automatically learn the same amount from, say, two 60 min. classes and four 30 min. classes covering the same material.)

What can we conclude?

As with many experiments in science teaching, we can accept the results of Shamsulbahri and Zulkiply's study, in terms of what they found in the specific study context, but still not be able to draw strong conclusions of wider significance.

Is the DARTs method beneficial as a pedagogical approach?

I expect the answer to this question is yes, but we need to be careful in drawing this conclusion from the experiment. Certainly the two groups which undertook the DARTs activities outperformed the group which did not. Yet that group was drawn from a different school and taught by a different teacher or teachers. That could have explained why there was less learning. (I am not claiming this is so – the point is we have no way of knowing as different variables are conflated.) In any case, the two groups that did undertake the DARTs activity were both given more lessons and spent substantially longer studying the topic they were tested on, than the class that did not. We simply cannot make a fair comparison here with any confidence.

Did the DARTs method facilitate better learning when it was combined with laboratory work?

There is a stronger comparison here. We still do not know if the two groups were taught by the same teacher/teachers (which could make a difference) or indeed whether the two groups started from a very similar level of prior knowledge. But, at least the two groups were from the same school, and both experienced the same DARTs based instruction. Greater learning was achieved when students undertook laboratory work as well as undertaking DARTs activities compared with students who only undertook the DARTs activity.

The 'combined' group still had more teaching than the DARTs group, but that does not matter here in drawing a logical conclusion because the question being explored is of the form 'does additional teaching input provide additional value?' (Taber, 2019). The question here is not whether one type of pedagogy is better than the other, but simply whether also undertaking practical work adds something over just doing the paper based learning activities.

Read about levels of control in experimental design

As the sample of learners was not representative of any specific wider population, we cannot assume this result would generalise beyond the participants in the study, although we might reasonably expect this result would be found elsewhere. But that is because we might already assume that learning about a practical activity (qualitative chemical analysis) will be enhanced by adding some laboratory based study!

Does DARTs pedagogy produce more learning about qualitative analysis than laboratory activities?

Shamsulbahri and Zulkiply's third claim was bolder because it was framed as a generalisation: instruction through DARTs produces more learning about qualitative analysis than laboratory-based instruction. That seems quite a stretch from what the study clearly shows us.

What the research does show us with confidence is that a group of 40 students in one school taught by a particular teacher/teaching team with 5 lessons of a specific set of DARTs activities, performed better on a specific assessment instrument than a different group of 40 students in another school taught by a different teacher/teaching team through three lessons of laboratory work following a specific scheme of practical activities.


  • a group of 40 students vs. a different group of 40 students
  • in one school vs. in another school
  • taught by a particular teacher/teaching team vs. taught by a different teacher/teaching team
  • with 5 lessons vs. with 3 lessons
  • of a specific set of DARTs activities vs. of laboratory work following a specific scheme of practical activities
  • performed better on a specific assessment instrument

Confounded variables

Test instrument bias?

Even if we thought the post-test used by Shamsulbahri and Zulkiply was perfectly valid as an assessment of topic knowledge, we might be concerned by knowing that learning is situated in a context – we recall better in a context similar to that in which we learned.


How can we best assess students' learning about qualitative analysis?


So:

  • should we be concerned that the form of assessment, a paper-based instrument, is closer in nature to the DARTs learning experience than the laboratory learning experience?

and, if so,

  • might this suggest a bias in the measurement instrument towards one treatment (i.e., DARTs)

and, if so,

  • might a laboratory-based assessment have favoured the group that did the laboratory based learning over the DARTs group, and led to different outcomes?

and, if so,

  • which approach to assessment has more ecological validity in this case: which type of assessment activity is a more authentic way of testing learning about a laboratory-based activity like qualitative chemical analysis?

A representation of my understanding of the experimental design

Can we generalise?

As always with small scale experiments into teaching, we have to judge the extent to which the specifics of the study might prevent us from generalising the findings – to be able to assume they would generally apply elsewhere.3 Here, we are left to ask to what extent we can

  • ignore any undisclosed difference between the groups in levels of prior learning;
  • ignore any difference between the schools and their populations;
  • ignore any differences in teacher(s) (competence, confidence, teaching style, rapport with classes, etc.);
  • ignore any idiosyncrasies in the DARTs scheme of instruction;
  • ignore any idiosyncrasies in the scheme of laboratory instruction;
  • ignore any idiosyncrasies (and potential biases) in the assessment instrument and its marking scheme and their application;

And, if we decide we can put aside any concerns about any of those matters, we can safely assume that (in learning this topic at this level)

  • 5 sessions of learning by DARTs is more effective than 3 sessions of laboratory learning.

Then we only have to decide if that is because

  • (i) DARTs activities teach more about this topic at this level than laboratory activities, or
  • (ii) whether some or all of the difference in learning outcomes is simply because 150 minutes of study (broken into five blocks) has more effect than 90 minutes of study (broken into three blocks).

What do you think?


Work cited:

Notes:

1 The authors refer to the conditions as

  • Experimental control group
  • DARTs
  • combination of Experiment + DARTs

I am referring to the first group as 'laboratory', both because it is not clear the students were doing any experiments (that is, testing hypotheses) as the practical activity was learning to undertake standard analytical tests, and, secondly, to avoid confusion (between the educational experiment and the laboratory practicals).


2 I think the reference to "no issue of pre-test treatment interaction" is probably meant to suggest that, as all students took the same pre-test, it will have had the same effect on all participants. But this not only ignores the potential effect of any differences in prior knowledge reflected in the pre-test scores that might influence subsequent learning; it also overlooks that the effect of taking the pre-test cannot be assumed to be neutral if for some learners it merely told them they knew nothing about the topic, whilst for others it activated and so reinforced some prior knowledge of the subject. In principle, the interaction between prior knowledge and taking the pre-test could have influenced learning at both cognitive and affective levels: that is, both in terms of consolidation of prior learning and cuing for the new learning; and in terms of a learner's confidence in, and attitude towards, learning the topic.


3 Even when we do have a representative sample of a population to test, we can only infer that the outcomes of an experiment reflect what will be most likely for members (schools, learners, classes, teachers…) of the wider population. Individual differences are such that we can never say that what most probably is the case will always be the case.


When an experiment tests a sample drawn at random from a wider population, then the findings of the experiment can be assumed to apply (on average) to the population. (Source: after Taber, 2019).

Are the particles in all solids the same?

Particle intuitions may not match scientific models


Keith S. Taber


Sophia was a participant in the Understanding Science Project. I first talked to her when she was in Y7, soon after she began her secondary school course.

One of the first topics she studied in her science was 'solids, liquids and gases', where she had learnt,

that solids are really hard and they stay together more, and then liquids are close together but they move around, and gases are really free and they just go anywhere

She had studied a little about the topic in her last year of primary school (Y6), but now she was being told

about the particles…the things that make – the actual thing, make them a solid, and make them a gas and make them a liquid

Particle theory, or basic kinetic theory, is one of the most fundamental theories of modern science. In particular, much of what is taught in school chemistry is explained in terms of theories involving how the observed macroscopic properties emerge from the characteristics and interactions of conjectured sub-microscopic particles that themselves often have quite unfamiliar properties. This makes the subject very abstract, challenging, and tricky to teach (Taber, 2013a).

Read about conceptions of atoms

Particle theory is often introduced in terms of the states of matter. Strictly there are more than three states of matter (plasma and Bose-Einstein condensates are important in some areas of science) but the familiar ones, and the most important in everyday phenomena, are solid, liquid and gas.

The scientific account is, in simple terms, that

  • different substances are made up of different types of particle
  • the different states of matter of a single substance have the same particles arranged differently

These are very powerful ideas, even if there are many complications. For example,

  • the terms solid, liquid and gas only strictly apply to pure samples of a single substance, not mixtures (so not, for example, to bronze, or honey, or milk, or ketchup, or even {if one is being very pedantic} air or sea water. And cats (please note, BBC) are completely inadmissible. )
  • common salt is an example of a pure substance, that none-the-less is considered to be made up of more than one type of particle

This reflects a common type of challenge in teaching science – the full scientific account is complex and nuanced, and not suitable for presenting in an introductory account; so we need to teach a simplified version that introduces the key ideas, and then only once this is mastered by learners are they ready to develop a more sophisticated understanding.

Yet, there is a danger that students will learn the simplified models as truths supported by the authority of science – and then later have difficulty shifting their thinking on. This is not only counter-productive, but can be frustrating and de-motivating for learners who find hard-earned knowledge is not as sound as they assumed.

One response to this is to teach science from very early in a way that is explicit about how science builds models of the natural world: models that are often simplifications which are useful but need to be refined and developed to become powerful enough to expand the range of contexts and examples where they can be applied. That is, students should learn they are being taught models that are often partial or imperfect, but that is just a reflection of how science works, developing more sophisticated understanding over time (Taber, 2017).

Sophia confirmed that the iron clamp stand near where she was sitting would have particles in it, as would a lump of ice.

Are they the same particles in the ice as the iron?

Yeah, because they are a solid, but they can change.

Ah, how can they change?

Cause if, erm, they melted they would be a liquid so they would have different particles in.

Right, so the iron is a solid, 

Uh hm.

So that's got one type of particle?

Yeah.

And ice is also a solid?

Yeah.

So that has the same sort of particles?

Yeah, but they can change.

The ones in the ice?

Mm,


To a learner just meeting particle theory for the first time, it may seem just as feasible that the same type of particle is found in one state as in one substance.


In the scientific model, we explain that different substances contain different types of particles, whereas different states of the same substance contain different arrangements of the same particles: but this may not be intuitively obvious to learners.1 It seemed Sophia was thinking that the same particles would be in different liquids, but a change of state led to different particles. This may seem a more forced model to a teacher, but then the teacher is already very familiar with the scientific account, and also has an understanding of the nature of those particles (molecules, ions, atoms – with internal structure and charges that interact with each other within and between the particles) – which are just vague, recently imagined, entities to the novice.

Sophia seemed to have misunderstood or misremembered the model she had been taught, but to a novice learner these 'particles' have no more immediate referent than an elf or an ogre and would be considerably more tenuous than a will-o'-the-wisp.

Sophia seemed to have an alternative conception, that all solids have one type of particle, and all liquids another. If I had stopped probing at that point I might have considered this to be her thinking on the matter. However, when one spends time talking to students it soon becomes clear that often they have ideas that are not fully formed, or that may be hybrids of different models under consideration, and that often as they talk they can talk themselves into a position.

So, if I melted the ice – that changes the particles in the solid?

Well they are still the same particles but they are just changing the way they act…

Oh.

How do they change?

A particle in a liquid [sic, solid] is all crammed together and don't move around, but in a liquid they can move around a little but they are still close and, can, you can pour a liquid, where you can't a solid, because they can move in. 

Okay, so if I have got my ice, that's a solid, and there are particles in the ice, and they behave in a certain way, and if the ice melts, the particles behave differently?

Yeah.

Do you know why they behave differently in the liquid?

No. {giggles} So, they can, erm

• • • • • • • • • • • •  [A pause of approximately 12 s]

They've more room cause it's all spread out more1, whereas it would be in a clump

The literature on learners' conceptions often suggests that students have this or that conception, or (when survey questions are used) that this percentage thinks this, and that percentage thinks that (Taber, 2013b). That this is likely to be a simplification seems obvious if we consider what thinking is – whatever thought may be, it is a dynamic process, something that moves along. Our thinking is, in part, resourced by accessing what we have represented in memory, but it is not something fixed – rather something that shifts, and that often becomes more sophisticated and nuanced as we explore a focus in greater depth.

I think Sophia did seem to have an intuition that there were different types of particles in different states of matter, and that therefore a change of state meant the particles themselves changed in some way. As I probed her, she seemed to shift to a more canonical account where change of state involved a change in the arrangement or organisation of particles rather than their identity.

This may have simply been her gradually bringing to mind what she had been taught – remembering what the teacher had said. It is also possible that the logic of the phenomenon of a solid becoming a liquid impressed on her that they must be the same particles. I suspect there was a little of both.

When interviewing students for research we inevitably change their thinking and understanding to some extent (hopefully, mostly in a beneficial way!) (If only teachers had time to engage each of their students in this way about each new topic they might both better understand their students' thinking, and help reinforce what has been taught.)

Did Sophia 'have a misconception'? 1 What did she 'really think'? That, surely, is to oversimplify.

She presented with an alternative conception, that under gentle questioning she seemed to talk/think herself out of. The extent to which her shift in position reflected further recall (so, correcting her response) or 'thinking through' (so, developing her understanding) cannot be known. Likely there was a little of both. What memory research does suggest is that being asked to engage in and think about this material will have modified and reinforced her memories of the material for the future.

Read about the role of memory in teaching and learning


Work cited:

Note

1 Actually, the particles in a liquid are not substantially spread further apart than in a solid. (Indeed, when ice melts the water molecules move closer together on average.) Understanding melting requires an appreciation of the attractions between particles, and how heating provides more energy for the particles. This idea of increased separation on melting is therefore something of an alternative conception, if one that is sometimes encouraged by the diagrams in school textbooks.
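For readers who want a number to go with this point: a quick back-of-the-envelope check using handbook densities (a sketch for illustration, not part of the original study) shows that the average molecular spacing actually shrinks slightly when ice melts:

```python
# Rough check that water molecules are, on average, slightly closer
# together in liquid water than in ice, using handbook densities in
# g/cm^3 near 0 degrees Celsius.
rho_ice, rho_water = 0.917, 0.9998

# Average intermolecular spacing scales as (volume per molecule)**(1/3),
# i.e. as density**(-1/3), so the ratio of spacings (water : ice) is:
spacing_ratio = (rho_ice / rho_water) ** (1 / 3)
print(round(spacing_ratio, 3))  # just under 1: spacing shrinks on melting
```

So the spacing change on melting is only a few per cent – and in the 'wrong' direction for the textbook diagrams.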

Teaching an introductory particle theory based on the arrangement of particles in different states, without reference to the attractions between particles, is problematic as it offers no rational basis for why condensed states exist, and why energy is needed to disrupt them – something highlighted in the work of Philip Johnson (2012).



Balls to Nature

Making the unfamiliar familiar – with everyday spheres



Keith S. Taber


Even scientists reporting their work in top research journals are not above using comparisons with everyday analogues to explain their ideas.


An analogue for a molecular structure?

(Image by Eduardo Ponce de Leon from Pixabay)


One of the phrases I return to a good deal on these pages is 'making the unfamiliar familiar' because a large part of science teaching is indeed about introducing scientific concepts that are currently unfamiliar to learners (oxidising agents, the endoplasmic reticulum, moments of inertia…the list is extensive!), so they become familiar to learners.

So, teachers use analogies, metaphors, narratives, images, models, and so forth, to help link something new (and often abstract) to whatever 'interpretive resources' the teacher thinks the learners have available to make sense of what is still novel to them.

Read about key ideas for constructivist teaching

This process can certainly go wrong – learners can confuse what is meant as a kind of stepping stone towards a scientific concept (e.g., a teaching analogy, or a simplified model) for the concept itself. So, as just one example, dot and cross figures showing electron transfer between atoms that are sometimes employed to help introduce the idea of ionic bonding come to be confused with ionic bonding itself – so that learners come to wrongly assume electron transfer is a necessary part of ionic bond formation – or, worse, that ionic bonding is electron transfer (e.g., Taber, 1994).

The familiarisation devices used in teaching, then, could be seen as a kind of 'dumbing down' as they work with the familiar and concrete or easily visualised or represented, and fall short of the scientific account. Yet, this approach may be necessary to produce meaningful learning (rather than rote learning that is not understood, and is soon forgotten or becomes confused).

Scientists need to make the unfamiliar familiar

So, it is worth pointing out that scientists themselves, not just science teachers and journalists, often appreciate the need to introduce new ideas in terms their readers can imagine and make sense of. This happens a lot in 'popular' science communication, when a scientist is writing for a general audience or being interviewed by a journalist – and I have noted lots of examples from such contexts on this site. 1

Read about science in public discourse and the media

But it also happens when scientists are primarily addressing their peers in the scientific research community. One of my favourite examples is the liquid drop model of the nucleus.

The atomic nucleus is like a drop of liquid because…

Lise Meitner had been working with Otto Hahn and Fritz Strassmann in the Kaiser Wilhelm Gesellschaft in Berlin, Germany, where they were investigating properties of radioactive elements. It was known some heavy elements would decay through processes such as alpha decay, which leads to an element with an atomic number two less than the starting material. 2 Their laboratory results, however, suggested that bombarding uranium with neutrons would directly lead to elements much less massive than the uranium.


Lise Meitner in the laboratory (with Otto Hahn) [Hahn and Meitner in Emil Fischer's Chemistry Institute in Berlin, 1909 – source: https://commons.wikimedia.org/wiki/File:Hahn_and_Meitner_in_1912.jpg]

By the time these results were available, Meitner had left Germany for her own safety. She would have been subject to persecution by the Nazis – quite likely she would have been removed from her scientific work, and then later sent to one of the concentration camps before being murdered as part of the genocide carried out against people the Nazis identified as Jews. 3

Hahn and Strassmann sent Meitner their findings – which did not make sense in terms of the nuclear processes known at the time. With her nephew, Otto Robert Frisch, Meitner decided the results provided evidence of a new phenomenon based on a previously unexpected mechanism of nuclear decay – fission. Nuclear fission was the splitting of a heavy nucleus into two smaller nuclei of roughly similar mass (where alpha decay produced a daughter nearly as heavy along with the very light helium nucleus).

Meitner and Frisch explained this by suggesting a new model or analogy for the nucleus:

"On account of their close packing and strong energy exchange, the particles in a heavy nucleus would be expected to move in a collective way which has some resemblance to the movement of a liquid drop. If the movement is made sufficiently violent by adding energy, such a drop may divide itself into two smaller drops."

Meitner & Frisch, 1939

This was published in the top scientific journal, Nature – but this was no barrier to the scientists using an everyday, familiar, analogy to explain their ideas.


An energetic liquid drop may fission
(Image by Gerhard Bögner from Pixabay)

Chemistry and the beautiful game?

A much later example appeared in the same journal when Kroto and colleagues, in their paper about the newly reported allotrope of carbon (alongside graphite and diamond) with formula C60, included a photograph in their article. A photograph of…an ordinary football!

They used the football to explain the suggested molecular geometry of C60, which they referred to as buckminsterfullerene,

"Concerning the question of what kind of 60-carbon atom structure might give rise to a superstable species, we suggest a truncated icosahedron, a polygon with 60 vertices and 32 faces, 12 of which are pentagonal and 20 hexagonal. This object is commonly encountered as the football shown in Fig. 1."

Kroto, et al., 1985

A football (notice the panels are hexagons and pentagons 4). (Image by NoName_13 from Pixabay)

Kroto and colleagues submitted a photograph like this to be published as a figure in their scientific report of the discovery of the buckminsterfullerene allotrope of carbon


What could be more familiar to people than the kind of ball used in Association Football ('soccer')? (Even if this is not really a truncated icosahedron 4). Their figure 1 showed,
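Incidentally, the counts quoted by Kroto and colleagues (60 vertices; 32 faces, of which 12 are pentagons and 20 are hexagons) hang together, as a quick check against Euler's polyhedron formula (V − E + F = 2) shows – sketched here in a few lines of Python:

```python
# Check the truncated icosahedron counts quoted by Kroto et al. (1985)
# against Euler's polyhedron formula: V - E + F = 2.
pentagons, hexagons = 12, 20
faces = pentagons + hexagons                       # 32 faces in all

# Count face-vertex incidences once, then correct for sharing:
# each edge is shared by exactly two faces, and each vertex of a
# truncated icosahedron is shared by three faces.
incidences = 5 * pentagons + 6 * hexagons          # 180
edges = incidences // 2                            # 90
vertices = incidences // 3                         # 60

print(vertices, edges, faces)                      # 60 90 32
print(vertices - edges + faces)                    # 2, as Euler requires
```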

"A football (in the United States, a soccerball) on Texas grass. The C60 molecule featured in this letter is suggested to have the truncated icosahedral structure formed by replacing each vertex on the seams of such a ball by a carbon atom."

Kroto, et al., 1985

The scientists explained they had come across the suggested shape when searching for a viable molecular structure that fitted the formula (sixty carbon atoms and nothing else) and which would also satisfy the need for carbon to be tetravalent. They investigated the works of the designer/architect Richard Buckminster Fuller, famous for his geodesic domes.


A stamp commemorating the life and works of Richard Buckminster Fuller and representing geodesic domes.


Thus they provisionally called the new substance buckminsterfullerene, albeit they acknowledged this name might be something of a 'mouthful', so to speak,

"We are disturbed at the number of letters and syllables in the rather fanciful but highly appropriate name we have chosen in the title [of their paper] to refer to this C60 species. For such a unique and centrally important molecular structure, a more concise name would be useful. A number of alternatives come to mind (for example, ballene, spherene, soccerene, carbosoccer), but we prefer to let this issue of nomenclature be settled by consensus."

Kroto, et al., 1985

We now know that the term 'buckyballs' has become popular, but only as a shorthand for the mooted name: buckminsterfullerene. (Later, other allotropic forms of carbon based on closed shell structures were discovered – e.g., C70. The shorter term fullerenes refers to this group of allotropes: buckminsterfullerene is one of the fullerenes.)

I recall seeing a recording of an interview with Harry Kroto where he suggested that the identification of the structure with the shape of a football came during a transatlantic phone call. What I would love to know is whether Kroto and his co-authors were being somewhat mischievous when they decided to illustrate the idea by asking the world's most famous science journal to publish a figure that was not some abstract scientific representation, but just a photograph of a football. Whether or not they were expecting kick-back [sorry] from the journal's peer reviewers and editor, it did not act as an impediment to Curl, Kroto and Smalley being awarded the 1996 Nobel prize for chemistry "for their discovery of fullerenes" (https://www.nobelprize.org/prizes/chemistry/1996/summary/).


Work cited:
  • Kroto, H., Heath, J., O'Brien, S., Curl, R. F. & Smalley, R. E. (1985) C60: Buckminsterfullerene. Nature, 318, 162-163. https://doi.org/10.1038/318162a0
  • Meitner, L., Frisch, O.R. (1939) Disintegration of Uranium by Neutrons: a New Type of Nuclear Reaction. Nature, 143, 239-240. https://doi.org/10.1038/143239a0
  • Taber, K. S. (1994) Misunderstanding the ionic bond, Education in Chemistry, 31 (4), pp.100-103.


Notes:

1 There is a range of tactics that can be used to help communicate science. Generally, to the extent these make abstract ideas accessible, they are presentations that fall short of the scientific account – and so they are best seen as transitional devices to offer intermediate understandings that will be further developed.

I have included on the site a range of examples I have come across of some of the ways in which science is taught and communicated through analogies, metaphors and so forth. Anthropomorphism is when non-human objects are discussed as if having human feelings, intentions and so forth.

Read about science analogies

Read about science metaphors

Read about science similes

Read about anthropomorphism in science discourse

Scientific certainty in the media

Personification in science


2 The radioactive decay of unstable but naturally occurring uranium and thorium takes place by a series of nuclear processes, each producing another radioactive species, till a final step produces an isotope which can be considered stable – 206Pb (from decay of 238U), 207Pb (from decay of 235U) or 208Pb (from decay of 232Th). By a pure coincidence of language (a homograph), in English, these radioactive decay cascades lead to lead (Pb).
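The make-up of each cascade can be worked out by simple book-keeping (a sketch for illustration): only alpha decay changes the mass number (by 4), while each alpha decay lowers the atomic number by 2 and each beta-minus decay raises it by 1.

```python
# Count the alpha and beta-minus decays in each natural decay series.
# Alpha decay: A -= 4, Z -= 2; beta-minus decay: A unchanged, Z += 1.
def count_decays(parent, daughter):
    (a1, z1), (a2, z2) = parent, daughter
    alphas = (a1 - a2) // 4          # only alpha steps change the mass number
    betas = 2 * alphas - (z1 - z2)   # betas make up the atomic-number balance
    return alphas, betas

# (mass number, atomic number) for the start and end of each series
print(count_decays((238, 92), (206, 82)))   # 238U -> 206Pb: (8, 6)
print(count_decays((235, 92), (207, 82)))   # 235U -> 207Pb: (7, 4)
print(count_decays((232, 90), (208, 82)))   # 232Th -> 208Pb: (6, 4)
```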


3 That is not to say most of those murdered because they were Jewish would not have self-identified as such, but rather that the Third Reich had its own racist criteria (established by law in 1935) for deciding who should be considered a Jew based on unscientific notions of bloodlines – so, for example, being a committed and practising Christian was no protection if the Nazis decided you were from a Jewish family.

(Nazi thinking also drew on a very influential but dangerous medical analogy of the volk (people) as a body that allowed those not considered to belong to the body to be seen as akin to foreign microbes that could cause disease unless eliminated.)


4 Of course a football is not a truncated icosahedron – it is intended to be, as far as possible, spherical! The pentagons and hexagons are made of a flexible material, and within them is a 'bladder' (nowadays this is just a metaphor!) which is an elastic sphere that when inflated presses against the outer layers.

If a football was built using completely rigid panels, then it would be a truncated icosahedron. However, such a 'ball' would not roll very well, and would likely cause some nasty head injuries. Presumably the authors were well aware of this, and assumed their readers would see past the problem with this example and spontaneously think of some kind of idealised, if far from ideal, football.


A molecular Newton's cradle?

A chain reaction with no return


Keith S. Taber


Have chemists created an atomic scale Newton's cradle?

(Image by Michelle from Pixabay)

Mimicking a Newton's cradle

I was interested to read in an issue of Chemistry World that

"Scientists in Canada have succeeded in setting off a chain of reactions in which fluorine atoms are passed between molecules tethered to a copper surface. The sequence can be repeated in alternating directions, mimicking the to-and-fro motions of a Newton's cradle."

Blow, 2022

The Chemistry World report explained that

"The team of researchers…affixed fluorocarbons to a [copper] surface by chemisorption, constructing chains of CF3 molecules terminated by a CF2 molecule – up to four molecules in total….

The researchers applied an electron impulse to the foremost CF3 molecule, causing it to spit out a fluorine atom along the chain. The second CF3 absorbed this atom, but finding itself unstable, ejected its leading fluorine towards the third molecule. This in turn passed on a fluorine of its own, which was taken up by the CF2 molecule in fourth position."

Blow, 2022

There is some interesting language here – a molecule "spits out" (a metaphor?) an atom, and another "finds itself" (a hint of anthropomorphism?) unstable.


Molecular billiards?
Can a line of molecules 'tethered' onto a metal surface behave like a Newton's cradle?

Generating reverse swing

The figure below was drawn to represent the work as described, showing that "another electron impulse could be used to set… off…a reverse swing".


A representation of the scheme described in Chemistry World. The different colours used for the fluorine 'atoms' 1 are purely schematic to give a clear indication of the changes – the colours have no physical significance as all the fluorine atoms are equivalent. 2 The molecules are shown here as if atoms were simply stuck to each other in molecules (rather than having become one larger multi-nuclear structure) for the same reason. 1 In science we select from different possible models and representations for particular purposes.3


That reference to "another electron impulse" being needed is significant,

"What was more, each CF3 had been flipped in the process, so the Newton's cradle as a whole was a mirror image of how it had begun, giving the potential for a reverse swing. Unlike a desk Newton's cradle, it did not swing back on its own accord, but another electron impulse could be used to set it off."

Blow, 2022
"…the Newton's cradle as a whole was a mirror image of how it had begun"

Mirroring a Newton's cradle

Chemistry World is the monthly magazine of the Royal Society of Chemistry (a learned society and professional body for chemists, primarily active in the UK and Eire) sent to all its members. So, Chemistry World is part of the so-called secondary literature that reports, summarises, and comments on the research reports published in the journals that are considered to comprise the primary academic literature. The primary literature is written by the researchers involved in the individual studies reported. Secondary literature is often written by specialist journalists or textbook authors.

The original report of the work (Leung, Timm & Polanyi, 2021) was published in the research journal Chemical Communications. That paper describes how:

"Hot [sic] F-atoms travelling along the line in six successive 'to-and-fro' cycles paralleled the rocking of a macroscopic Newton's cradle."

Leung, Timm & Polanyi, 2021, p.12647

A simple representation of a Newton's cradle (that is, "a macroscopic Newton's cradle")


These authors explain that

"…energised F can move to- and-fro. This occurs in six successive linear excursions, under the influence of electron-induced molecular dissociation at alternate ends of the line…. The result is a rocking motion of atomic F which mirrors, at the molecular scale, the classic to-and-fro rocking of a macroscopic Newton's cradle. Whereas a classic Newton's cradle is excited only once, the molecular analogue [4] here is subjected to opposing impulses at successive 'rocks' of the cradle.

The observed multiple knock-on of F-atoms travelling to-and-fro along a 1D row of adsorbates [molecules bound to a substrate] is shown…to be comparable with the synchronous motion of a Newton's cradle."

Leung, Timm & Polanyi, 2021, p.12647-50
Making molecules rock?

'Rocking' refers to a particular kind of motion. In a macroscopic context, there are familiar examples of rocking, as when a baby is cradled in the arms and gently 'rocked' back and forth.


A rocking chair is designed to enable a rocking motion where the person in the chair moves back and forth through space.

The molecular system described by Leung and colleagues is described as "mirror[ing], at the molecular scale…to-and-fro rocking"

[Image by OpenClipart-Vectors from Pixabay]


The researchers are suggesting that, in some sense, the changes in their molecular scale system are equivalent to "the synchronous motion of a Newton's cradle".

Titles and texts in scientific writing

One feature of interest here is a difference between the way work is described in the article titles and the main texts.


  • Chemistry society professional journal (Chemistry World) – title: "…molecular Newton's cradle"; text: the effect was "mimicking … a Newton's cradle".
  • Academic research journal (Chemical Communications) – title: "…an atomic-scale Newton's cradle"; text: the effect "paralleled… mirrors… [is] comparable with" a Newton's cradle.
Bold titles: nuanced details

Titles need to capture the reader's attention (and in science today the amount of published material is vastly more than any one person could read) so there is a tendency to be bold. Both these articles have titles suggesting that they are reporting a nanoscopic Newton's cradle. The reader enticed to explore further then discovers that there are caveats. What is being claimed is not a Newton's cradle at minuscule scale, but something which, though not actually a Newton's cradle, does have some similarity to (mimics, parallels, mirrors) one.

This is important as "the molecular analogue" is only analogous in some respects.

The analogy

There is an analogy, but the analogy can only be drawn so far. In the analogy, the suspended balls of the Newton's cradle are seen as analogous to the 'chemisorbed' molecules lined up on the surface of a copper base.

Analogies are used in teaching and in science communication to help 'make the unfamiliar familiar', to show someone that something they do not (yet) know about is actually, in some sense at least, a bit like something they are already familiar with. In an analogy, there is a mapping between some aspect(s) of the structure of the target ideas and the structure of the familiar phenomenon or idea being offered as an analogue. Such teaching analogies can be useful to the extent that someone is indeed highly familiar with the 'analogue' (and more so than with the target knowledge being communicated); that there is a helpful mapping across between the analogue and the target; and that the comparison is clearly explained (making clear which features of the analogue are relevant, and how).

Analogies only map some features from analogue to target. If there was a perfect transfer from one system to the other, then this would not be an analogy at all, but an identity! So, in a sense there are no perfect analogies as that would be an oxymoron. Understanding an analogy as intended therefore means appreciating which features of the analogue do map across to the target, and which do not. Therefore in using analogies in teaching (or communicating science) it is important to be explicit about which features of the analogue map across (the 'positive' analogy) and which do not, including features which it would be misleading to seek to map across – the so-called 'negative analogy'. For example, when students think of an atom as a tiny solar system, they may assume that the atom, like the solar system, is held together by gravitational force (Taber, 2013).

It probably seems obvious to most science teachers that, if comparing the atom with a solar system, the role that gravity has in binding the solar system maps across to the electrical attraction between a positive nucleus and negative electrons; but when a sample of 14-18 year-olds were asked about atoms and solar systems, a greater number of them suggested the force binding the atom was gravitational than suggested it was electrical (Taber, 2013)!

Perhaps the most significant 'negative analogy' in the research discussed here was pointed out in both the research paper and the subsequent Chemistry World report, and relates to the lack of inherent oscillation in the molecular level system. The nanoscopic system is like a Newton's cradle that only has one swing, so the owner has to reset it each half cycle.

  • "Unlike a desk Newton's cradle, it did not swing back on its own accord, but another electron impulse could be used to set it off."
  • "Whereas a classic Newton's cradle is excited only once, the molecular analogue here is subjected to opposing impulses at successive 'rocks' of the cradle"

That is quite a major difference when using the Newton's cradle for an analogy.


Who wants a Newton's cradle as an executive toy if it needs to be manually reset after each swing?


The positive and negative analogies

We can consider that the Newton's cradle is a little like a simple pendulum that swings back and forth, with the complication that instead of a single bob swinging back and forth, the two terminal spheres share the motion between them due to the momentum acquired by one terminal sphere being transferred through the intermediate spheres to the other terminal sphere.

In understanding the analogy it is useful to separately consider these two features of a Newton's cradle:

  • a) the transfer of momentum through the sequence
  • b) moving a mass through a gravitational field

If we then think of the Newton's cradle as a 'pendulum with complications' it seems that the molecular system described by Leung and colleagues fails to share a critical feature of a pendulum.

A chain reaction – the positive analogy

The two systems map well in so far as both comprise a series of similar units (spheres, molecules) that are carefully aligned, and constrained from moving out of alignment, and both have a mechanism that allows a kind of chain reaction.

In the molecular scenario, the excitation of a terminal molecule causes a fluorine atom to become unbound from the molecule and to carry enough momentum to collide with and excite a second molecule, binding to it, whilst causing the release of one of the molecule's original fluorine atoms which is similarly ejected with sufficient momentum to collide with the next molecule…

This 'chain reaction' 5 is somewhat similar to how, in a Newton's cradle, the momentum of a swinging sphere is transferred to the next, and then to the next, and then the next, until finally all the momentum is transferred to the terminal sphere. (This is an idealised cradle; in any real cradle the transfer will not be 100% perfect.) This happens because the spheres are made from materials which collide 'elastically'.6
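The idealised case can be made concrete with the standard formulae for a one-dimensional elastic collision (a sketch, treating the spheres as free particles at the instant of impact):

```python
# One-dimensional elastic collision between two masses: both momentum
# and kinetic energy are conserved, giving the standard final velocities.
def elastic_collision(m1, v1, m2, v2):
    v1f = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    v2f = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return v1f, v2f

# For equal masses the velocities are simply exchanged: the moving sphere
# stops dead and the struck sphere moves off with the incoming velocity -
# which is why, in an ideal cradle, only the far terminal sphere swings out.
print(elastic_collision(1.0, 2.0, 1.0, 0.0))  # (0.0, 2.0)
```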


The positive analogy: The notion of an atomic level Newton's cradle makes use of a similarity between two systems (at very different scales) where features of one system map onto analogous features of the other.

The negative analogy

Given that positive mapping, a key difference here is the way the components of the system (suspended spheres or chemisorbed molecules) are 'tethered'.

Chemisorbed molecules

The molecules are attached to the copper surface by chemical bonding, which is essentially an electromagnetic interaction. A sufficient input of energy could certainly break these bonds, but the impulse being applied parallel to the metal surface is not sufficient to release the molecules from the substrate. It is enough to eject a fluorine atom from a molecule where carbon is already bound to the surface and three other fluorine atoms (carbon is tetravalent, but it is bonded to the copper as well as the fluorines) – but the final molecule is an adsorbed CF2 molecule, which 'captures' the fluorine and becomes an adsorbed CF3 molecule.

Now, energy is always conserved in all interactions, and momentum is also always conserved. If the kinetic energy of the 'captured' fluorine atom does not lead to bond breaking it must end up somewhere else. The momentum from the 'captured' atom must also be transferred somewhere.

Here, it may be useful to think of chemical bonds as having a similarity to springs – in the limited sense that they can be set vibrating. If we imagine a large structure made up of spheres connected by springs, we can see that if we apply a force to one of the spheres, and the force is not enough to break the spring, the sphere will start to oscillate, and move any spheres connected to it (which will move spheres attached to them…). We can imagine the energy from the initial impulse, and transferred through the chain of molecules, is dissipated through the copper lattice, and adds to its internal energy. 7
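To make the 'bond as spring' image concrete, here is a minimal sketch (the numbers are arbitrary and purely illustrative – not taken from the research) of a damped mass-on-a-spring: an impulse sets the mass oscillating, and the damping term stands in for energy dissipating into the lattice:

```python
# Illustrative sketch only: a bond modelled as a damped spring.
# An impulse gives the mass an initial velocity; damping stands in
# for the energy draining away into the copper lattice.
m, k, c = 1.0, 100.0, 0.5    # mass, spring constant, damping (arbitrary units)
x, v, dt = 0.0, 1.0, 1e-3    # initial impulse: velocity 1.0; time step 1 ms

energy0 = 0.5 * m * v**2     # energy delivered by the impulse

# Semi-implicit Euler integration over 20 simulated seconds
for _ in range(20000):
    a = (-k * x - c * v) / m
    v += a * dt
    x += v * dt

energy = 0.5 * m * v**2 + 0.5 * k * x**2
print(energy < 0.01 * energy0)   # almost all the initial energy has dissipated
```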


The fluorocarbon molecules are bound to the surface by chemical bonding. If the energy of impact is insufficient to cause bond breaking, it will be dissipated.

Working against gravity

In a simple pendulum, work is done on a raised sphere by the gravitational field, which accelerates the bob when it is released, so that it is moving at maximum speed when it reaches the lowest point. So, as it is moving, it has momentum, and its inertia means it continues to swing past the equilibrium position which is the 'attractor' for the system. In a Newton's cradle the swinging sphere cannot continue when it collides with the next sphere, but as its momentum is transferred through the train of spheres the other terminal sphere swings off, vicariously continuing the motion.

In an ideal pendulum with no energy losses the bob rises to its original altitude (but on the other side of the support) by which time it has no momentum left (as gravitational force has acted downwards on it to reduce its momentum) – but gravitational potential energy has again built up in the system to its original level. So, the bob falls under gravity again, but, being constrained by the wire, does not fall vertically, rather it swings back along the same arc.

It again passes the equilibrium position and returns to the point where it started, and the process is repeated. In an ideal pendulum this periodic oscillation would continue for ever. In a real pendulum there are energy losses, but even so, a suitable bob can swing back and forth for some time, as the amplitude slowly reduces and the bob will eventually stop at the attractor, hanging vertically below its support.
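The energy book-keeping for the ideal pendulum is easy to check numerically (a sketch with an arbitrary release height):

```python
import math

# Ideal pendulum energy exchange: a bob released from height h above its
# lowest point reaches that lowest point with speed v = sqrt(2*g*h), and
# that speed is exactly enough to carry it back up to height h on the
# other side (h_recovered = v**2 / (2*g)).
g = 9.81          # gravitational field strength, m/s^2
h = 0.05          # release height above the lowest point, m (arbitrary)

v = math.sqrt(2 * g * h)        # speed at the lowest point
h_recovered = v**2 / (2 * g)    # height regained on the far side

print(round(v, 3), round(h_recovered, 3))  # 0.99 0.05
```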

In a (real) Newton's cradle, one ball is raised, so increasing the gravitational potential energy of the system (which is the configuration of the cradle, with its spheres, plus the earth). When it is released, gravity acts to cause the ball to fall. It cannot fall vertically as it is tethered by a steel (or similar) wire which is barely extendible, so the net force acting causes the ball to swing through an arc, colliding with the next ball.


The Newton's cradle design allows the balls to change their 'height' in relation to a vertical gravitational field direction – in effect storing energy in a higher gravitational field configuration that can do work to continue the oscillation. The molecular analogue 4 does not include an equivalent mechanism that can lead to sustained oscillation.
(Image by 3D Animation Production Company from Pixabay)

Two types of force interactions

The steel spheres, however, are actually subject to two different kinds of force. They are, like the molecules, also tethered by the electromagnetic force (they are attached to steel wires which are effectively of fixed length due to the bonding in the metal 8), but, in addition, subject to the gravitational field of the earth. 9 The gravitational field is relevant because a sphere is supported by a wire that is fixed to a rigid support (the cradle) at one end, but free to swing at the end attached to the sphere.

The Newton's cradle operates in what is in effect a uniform gravitational field (neither the radial nature nor the variation with altitude of the earth's field is relevant on the scale of the cradle) – and the field direction is parallel to the plane in which the balls hang. So, the gravitational potential of the system changes as a sphere swings higher in the field.


In a Newton's cradle, a tethered sphere's kinetic energy allows it to rise in a gravitational field, before swinging back gaining speed (and regaining kinetic energy)

The design of the system is such that a horizontal impulse on a sphere leads to it swinging upwards – and gravity then acts to accelerate it towards a new collision. 10 This collision, indirectly, gives a horizontal impulse to the sphere at the other end of the 'train' where again the nature of the support means the sphere swings upward – being constrained by both the wire maintaining its distance from the point of suspension at the rigid support of the frame, and its weight acting downwards.

The negative analogy concerns the means of constraining the system components

The two systems then both have a horizontal impulse being transferred successively along a 'train' of units. Leung and colleagues' achievement of this at the molecular scale is impressive.

However, the means of 'tethering' in the two systems is different in two significant ways. The spheres in the Newton's cradle are suspended from a rigid frame by inextensible wires that are free to swing. Moreover, the cradle is positioned in a field with a field direction perpendicular to the direction of the impulse. This combination allows horizontal motion to be converted to vertical motion reversibly.

The molecular system comprises molecules bound to a metal substrate. The chemisorption is less like attaching the molecules with long wires that are free to swing, and more like attaching them with short, stiff springs. Moreover, at the scale of the system, the substrate is less like a rigid frame, and more like a highly sprung mattress. So, even though kinetic energy from the 'captured' fluorine atom can be transferred to the bond, this can then be dissipated through the lattice.


The negative analogy: the two systems fail to map across in a critical way such that in a Newton's cradle one initial impulse can lead to an extended oscillation, but in the molecular system the initiating energy is dissipated rather than stored to reverse the chemical chain reaction.

The molecular system does not enable the terminal molecule to do work in some form that can be recovered to reverse the initial process. By contrast, a key feature of a Newton's cradle is that the spheres are constrained ('tethered') in a way that allows them to move against the gravitational field – they cannot move further away from, nor nearer to, their point of support, yet they can swing up and down and change their distance from the earth. Mimicking that kind of set-up in a molecular level system would indeed be an impressive piece of nano-engineering!


Work cited:
  • Blow, M. (2022). Molecular Newton's cradle challenges theory of transition states. Chemistry World, 19(1), 38.
  • Leung, L., Timm, M. J., & Polanyi, J. C. (2021). Reversible 1D chain-reaction gives rise to an atomic-scale Newton's cradle. Chemical Communications, 57(94), 12647-12650. doi:10.1039/D1CC05378G
  • Taber, K. S. (2013). Upper Secondary Students' Understanding of the Basic Physical Interactions in Analogous Atomic and Solar Systems. Research in Science Education, 43(4), 1377-1406. doi:10.1007/s11165-012-9312-3 (The author's manuscript version may be downloaded here.)

Notes

1 Strictly, there are no distinct atoms once several atoms have been bound together into a molecule, but chemists tend to talk in a shorthand as if the atoms still existed in the molecules.


2 Whilst I expect this is obvious to people who might choose to read this posting, I think it is worth always being explicit about such matters as students may develop alternative conceptions at odds with scientific accounts.

In the present case, I would be wary of a learner thinking along the lines "of course the atom will go back to its own molecule".

Students will commonly transfer the concepts of 'ownership' and 'belonging' from human social affairs to the molecular level models used in science. Students often give inappropriate status to the history of molecular processes (as if species like electrons recall and care about their pasts). One example was a student who suggested to me that in homolytic bond breaking each atom would get its own electron back – meaning the electrons in the covalent bond would return to their 'own' atoms.

I have also been told that in double decomposition (precipitation) reactions the 'extra' electron in an anion would go back to its own cation in the reagents, before the precipitation process can occur (that is, precipitation was not due to the mutual attraction between ions known to be present in the reaction mixture: they first had to become neutral atoms that could then form an ionic bond by electron transfer!) In ionic bonding it is common for learners to think that an ionic bond can only be formed between ions that have been formed by a (usually fictitious) electron transfer event.

Read about common alternative conceptions of ionic bonding

Read about a classroom resource to diagnose common alternative conceptions (misconceptions) of ionic bonding

Read about a classroom resource to support learning about the reaction mechanism in precipitation reactions


3 I have here represented the same molecules both as atoms linked by bonds (where I am focusing on the transfer of fluorine atoms) and in other diagrams as unitary spheres (where I am focusing on the transfer of energy/momentum). All models and representations used for atoms and molecules are limited and only able to reflect some features of what is being described.


4 A note on terminology. An analogy is used to make the unfamiliar familiar by offering a comparison with something assumed to already be familiar to an audience, in this case the molecular system is the intended target, and the (that is, a generic) Newton's cradle is the analogue. However, analogy – as a mapping between systems – is symmetrical so each system can be considered the analogue of the other.


5 In some ways, Leung's system is more like a free radical reaction than a Newton's cradle. A free radical is an atom (or molecule) with an unpaired electron – such as an unbound fluorine atom!

In a free radical reaction a free radical binds to a molecule and in doing so causes another atom to be ejected from the molecule – as a free radical. That free radical can bind to another molecule, again causing it to generate a new free radical. In principle this process can continue indefinitely, although the free radical could also collide with another free radical instead of a molecule, which terminates the chain reaction.


6 The balls need to be (near enough) perfectly elastic for this to work so the total amount of kinetic energy remains constant. Momentum (mv) is always conserved in any collision between balls (or other objects).

If there were two balls, then the first (swinging) sphere would be brought to a stop by the second (stationary) sphere, to which its momentum would be transferred. So, the first ball would stop swinging, but the second would swing in its place. The only way mv and mv² (and so kinetic energy) can both be conserved in collisions between balls of the same mass is if the combination of velocities does not change. That is, mathematically, the only solutions are where neither of the two balls' velocities change, or where they are swapped to the other permutation (here, the velocity of the moving ball becomes zero, but the stationary ball moves off with the velocity that the ball that hit it had approached it with).

The first solution would require the swinging steel ball to pass straight through the stationary steel ball without disturbing it. Presumably, quantum mechanics would suggest that ('tunnelling') option has a non-zero (but tiny, tiny – I mean really tiny) probability. To date, in all known observations of Newton's cradles no one has reported seeing the swinging ball tunnel through the stationary ball. If you are hoping to observe that, then, as they say, please do not hold your breath!

With more balls momentum is transferred through the series: only the final ball is free to move off.
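The 'velocity swap' solution described in note 6 can be checked against the general formulae for a one-dimensional elastic collision, which follow from conserving momentum and kinetic energy (a sketch; the function name and the example values are illustrative, not from the source):

```python
def elastic_collision_1d(m1, v1, m2, v2):
    """Final velocities after a perfectly elastic head-on collision,
    derived from conservation of momentum (m*v) and of kinetic
    energy ((1/2)*m*v**2)."""
    v1_after = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    v2_after = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return v1_after, v2_after

# Equal masses, one moving at 0.5 m/s, one at rest: the velocities are
# swapped, as in a two-ball Newton's cradle.
elastic_collision_1d(1.0, 0.5, 1.0, 0.0)  # -> (0.0, 0.5)
```

With equal masses the (m1 − m2) terms vanish, so the only outcome consistent with both conservation laws is the swap: the moving ball stops, and the stationary ball moves off with the incoming velocity.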


7 We can imagine that in an ideal system of a lattice of perfectly rigid spheres attached to perfect springs (i.e., with no hysteresis) and isolated from any other material (n.b., in Leung et al.'s apparatus the copper would not have been isolated from other materials), the whole lattice might continue to oscillate indefinitely. In reality the orderliness will decay and the energy will have in effect warmed the metal.


8 Strictly, the wires will be longest when the spheres are directly beneath the points of support, as the weight of a sphere slightly extends the wire from its equilibrium length, and it will get slightly shorter the further the sphere swings away from the vertical position. In the vertical position, all the weight is balanced by a tension in the wire. As the ball swings away from the vertical position, the tension in the wire decreases (as only the component of weight acting along the wire needs to be balanced) and an increasing component of the weight acts to decelerate it. But the change in extension of the wire is not significant and is not noticeable to someone watching a Newton's cradle.

When the wire support is not vertical a component of the weight of the sphere acts to change the motion of the sphere
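Note 8 resolves the sphere's weight into a component balanced by tension in the wire and a component acting along the arc. That resolution can be sketched numerically (the function name and values are illustrative; this is the static picture described in the note, neglecting the extra tension needed for circular motion while the sphere is moving):

```python
import math

def wire_forces(mass_kg, angle_from_vertical_deg, g=9.81):
    """Resolve a hanging sphere's weight (m*g) into the component
    balanced by tension along the wire (m*g*cos(theta)) and the
    tangential component that decelerates the swing (m*g*sin(theta))."""
    theta = math.radians(angle_from_vertical_deg)
    weight = mass_kg * g
    along_wire = weight * math.cos(theta)   # balanced by tension
    along_arc = weight * math.sin(theta)    # restoring (decelerating) force
    return along_wire, along_arc

# Hanging vertically, the tension balances the full weight (no restoring
# force); at 30 degrees only ~87% of the weight acts along the wire, and
# half the weight acts along the arc to slow the swing.
wire_forces(0.1, 0.0)
wire_forces(0.1, 30.0)
```

This is why, as the note says, the tension in the wire decreases as the sphere swings away from the vertical, while an increasing component of the weight acts to decelerate it.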


9 Molecules are also subject to gravity, but in condensed matter the effect is negligible compared with the very much stronger electromagnetic forces acting.


10 We might say that gravity decelerates the sphere as it swings upwards and then accelerates it as it swings back down. This is true because that description includes a change of reference direction. A scientist might prefer to say that gravity applies a (virtually) constant downward acceleration during the swing. This point is worth making in teaching as a very common alternative conception is to see gravity only really taking effect at the top of the swing.


The passing of stars

Birth, death, and afterlife in the universe


Keith S. Taber


stars are born, start young, live, sometimes living alone but sometimes not, sometimes have complicated lives, have lifetimes, reach the end of their lives, and die, so, becoming dead, eventually long dead; and, indeed, there are generations of stars with life cycles


One of the themes I keep coming back to here is the challenge of communicating abstract scientific ideas. Presenting science in formal technical language will fail to engage most general audiences, and will not support developing understanding if the listener/reader cannot make good sense of the presentation. But, if we oversimplify, or rely on figures of speech (such as metaphors) in place of formal treatments of concepts, then – even if the audience does engage and make sense of the presentation – audience members will be left with a deficient account.

Does that matter? Well, often a level of understanding that provides some insight into the science is far better than the impression that science is so far detached from everyday experience that it is not for most people.

And the context matters.

Public engagement with science versus science education

In the case of a scientist asked to give a public talk, or being interviewed for news media, there seems a sensible compromise. If people come away from the presentation thinking they have heard about something interesting, that seems in some way relevant to them, and that they understood the scientist's key messages, then this is a win – even if it is only a shift to an over-simplified account, or an understanding in terms of a loose analogy. (Perhaps some people will want to learn more – but, even if not, surely this meets some useful success criterion?)

In this regard science teachers have a more difficult job to do. 1 The teacher is not usually considered successful just because the learners think they have understood teaching, but rather only when the learners can demonstrate that what they have learnt matches a specified account set out as target knowledge in the curriculum. This certainly does not mean a teacher cannot (or should not) use simplification and figures of speech and so forth – this is often essential – but rather that such moves can usually only be seen as starting points in moving learners onto temporary 'stepping stones' towards creditable knowledge that will eventually lead to test responses that will be marked correct.


An episode of 'In Our Time' on 'The Death of Stars'
"The image above is of the supernova remnant Cassiopeia A, approximately 10,000 light years away, from a once massive star that died in a supernova explosion that was first seen from Earth in 1690"

The Death of Stars

With this in mind, I was fascinated by an episode of the BBC's radio show, 'In Our Time' which took as its theme the death of stars. Clearly, this falls in the category of scientists presenting to a general public audience, not formal teaching, and that needs to be borne in mind as I discuss (and perhaps even gently 'deconstruct') some aspects of the presentation from the perspective of a science educator.

The show was broadcast some months ago, but I made a note to revisit it because I felt it was so rich in material for discussion, and I've just re-listened. I thought this was a fascinating programme, and I think it is well worth a listen, as the programme description suggests:

"Melvyn Bragg and guests discuss the abrupt transformation of stars after shining brightly for millions or billions of years, once they lack the fuel to counter the force of gravity. Those like our own star, the Sun, become red giants, expanding outwards and consuming nearby planets, only to collapse into dense white dwarves. The massive stars, up to fifty times the mass of the Sun, burst into supernovas, visible from Earth in daytime, and become incredibly dense neutron stars or black holes. In these moments of collapse, the intense heat and pressure can create all the known elements to form gases and dust which may eventually combine to form new stars, new planets and, as on Earth, new life."

https://www.bbc.co.uk/sounds/play/m0018128

I was especially impressed by the Astronomer Royal, Professor Martin Rees (and not just because he is a Cambridge colleague) who at several points emphasised that what was being presented was current understanding, based on our present theories, with the implication that this was open to being revisited in the light (sic) of new evidence. This made a refreshing contrast to the common tendency in some popular science programmes to present science as 'proven' and so 'certain' knowledge. That tendency is an easy simplification that distorts both the nature and excitement of science.

Read about scientific certainty in the media

Presenter Melvyn Bragg's other guests were Carolin Crawford (Emeritus Member of the Institute of Astronomy, and Emeritus Fellow of Emmanuel College, University of Cambridge) and Mark Sullivan (Professor of Astrophysics at the University of Southampton).

Public science communication as making the unfamiliar familiar

Science communicators, whether professional journalists or scientists popularising their work, face similar challenges to science teachers in getting across often complex and abstract ideas; and, like them, need to make the unfamiliar familiar. Science teachers are taught about how they need to connect new material with the learners' prior knowledge and experiences if it is to make sense to the students. But successful broadcasters and popularisers also know they need to do this, using such tactics as simplification, modelling, metaphor and simile, analogy, teleology, anthropomorphism and narrative.

There were quite a few examples of the speakers seeking to make abstract ideas accessible to listeners in such ways in this programme. However, perhaps the most common trope was one set up by the episode title, and one which could very easily slip under the radar (so to speak). In this piece I examine the seemingly ubiquitous metaphor (if, indeed, it is to be considered a metaphor!) of stars being alive; in a sequel I discuss some of the wide range of other figures of speech adopted in this one science programme.

Science: making the familiar, unfamiliar?

If, when working as a teacher, I saw a major part of my work as making the unfamiliar familiar to learners, in my research there was a sense in which I needed to make the familiar unfamiliar. Often, the researcher needs to focus afresh on the commonly 'taken-for-granted' and to start to enquire into it as if one does not already know about it. That is, one needs to problematise the common-place. (This reflects a process sometimes referred to as 'bracketing'.)

To give one obvious example. Why do some students do well in science tests and others less well? Obviously, because some learners are better science students than others! (Clearly in some sense this is true – but is it just a tautology? 2) But one clearly needs to dig into this truism in more detail to uncover any insights that would actually be useful in supporting students and improving teaching!

The same approach applies in science. We do not settle for tautologies such as fire burns because fire is the process of burning, or acids are corrosive because acids are the category of substances which corrode; nor what are in effect indirect disguised tautologies such as heavy objects fall because they are largely composed of the element earth, where earth is the element whose natural place is at the centre of the world. (If that seems a silly example, it was the widely accepted wisdom for many centuries. Of course, today, we do not recognise 'earth' as a chemical element.)

I mention this, because I would like to invite readers to share with me in making the familiar unfamiliar here – otherwise you could easily miss my point.

"so much in the Universe, and much of our understanding of it, depends on changes in stars as they die after millions or billions of stable years"

Tag line for 'the Death of Stars'

The lives of stars

The episode opens with

"Hello. Across the universe, stars have been dying for millions of years…

Melvyn Bragg introducing the episode

The programme was about the death of stars – which directly implies stars die, and, so, also suggests that – before dying – they live. And there were plenty of references in the programme to reinforce this notion. Carolin Crawford suggested,

"So, essentially, a star's life, it can exist as a star, for as long as it has enough fuel at the right temperature at the right density in the core of the star to stall the gravitational collapse. And it is when it runs out of its fuel at the core, that's when you reach the end of its lifetime and we start going through the death processes."

Prof. Carolin Crawford talking on 'In Our Time'

Not only do stars have lives, but some have much longer lives than others,

"…more massive stars can … build quite heavy elements at their cores through their lifetimes. And … they actually have shorter lifetimes – it is counter-intuitive, but they have to chomp through their fuel supply so furiously that they exhaust it more rapidly. So, the mass of the star dictates what happens in the core, what you create in the core, and it also determines the lifetime of the star."

"The mass of the star…determines the lifetime of the star….
our sun…we reckon it is about halfway through its lifetime, so stars like the sun have lifetimes of 10 billion years or so…"


Prof. Carolin Crawford talking on 'In Our Time'
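The counter-intuitive scaling Professor Crawford describes (more massive stars 'chomp through' their fuel faster than the extra fuel can compensate for) can be illustrated with a common textbook approximation, not something stated in the programme: main-sequence lifetime goes roughly as fuel over luminosity, and luminosity rises steeply with mass (roughly L ∝ M³·⁵), giving t ∝ M⁻²·⁵. The function below is an illustrative sketch under those assumptions:

```python
def main_sequence_lifetime_gyr(mass_in_solar_masses):
    """Rough textbook scaling: lifetime ~ fuel / burn rate ~ M / L.
    With L ~ M**3.5 this gives t ~ 10 Gyr * M**(-2.5), anchored to a
    ~10 billion year lifetime for a star of one solar mass.
    (An order-of-magnitude estimate only.)"""
    return 10.0 * mass_in_solar_masses ** -2.5

main_sequence_lifetime_gyr(1.0)   # the sun: ~10 Gyr
main_sequence_lifetime_gyr(10.0)  # a 10-solar-mass star: ~0.03 Gyr
```

On this rough scaling, a star ten times the sun's mass has ten times the fuel but burns it thousands of times faster, so its lifetime is measured in tens of millions rather than billions of years.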

This was not some idiosyncratic way that Professor Crawford had of discussing stars, as Melvyn's other guests also used this language. Here are some examples I noted:

  • "this is a dead, dense star" (Martin Rees)
  • "the lifetime of a stable star, we can infer the … life cycles of stars" (Martin Rees)
  • "stars which lived and died before our solar system formed…stars which have more complicated lives" (Martin Rees)
  • "those old stars" (Martin Rees)
  • "earlier generations of massive stars which had lived and died …those long dead stars" (Martin Rees)
  • "it is an old dead star" (Mark Sullivan)
  • "our sun…lives by itself in space. But most stars in the universe don't live by themselves…" (Mark Sullivan)
  • "two stars orbiting each other…are probably born with different masses" (Mark Sullivan)
  • "when [stars] die" (Mark Sullivan)
  • "when [galaxies] were very young" (Martin Rees)
  • "stars that reach the end point of their lives" (Carolin Crawford )
  • "a star that's younger" (Martin Rees)

So, in the language of astronomy, stars are born, start young, live; sometimes living alone but sometimes not, sometimes have complicated lives; have lifetimes, reach the end of their lives, and die, so, becoming dead, eventually long dead; and, indeed, there are generations of stars with life cycles.


The processes that support a star's luminosity come to an end: but does the star therefore die?

(Cover art for the Royal Philharmonic Orchestra's recording of David Bedford's composition Star's End. Photographer: Monique Froese)


Are stars really alive?

Presumably, the use of such terms in this context must have originally been metaphorical. Life (and so death) has a complex but well-established and much-discussed meaning in science. Living organisms have certain necessary characteristics – nutrition, (inherent) movement, irritability/sensitivity, growth, reproduction, respiration, and excretion, or some variation on such a list. Stars do not meet these criteria. 3 Living organisms maintain a level of complex organisation by making use of energy stores that allow them to decrease entropy internally at the cost of entropy increase elsewhere.

Animals and decomposers (such as fungi) take in material that can be processed to support their metabolism and then the 'lower quality' products are eliminated. Photosynthetic organisms such as green plants have similar metabolic processes, but preface these by using the energy 'in' sunlight to first facilitate endothermic reactions that allow them to build up the material used later for their mortal imperative of working against the tendencies of entropy. Put simply, plants synthesise sugar (from carbon dioxide and water) that they can distribute to all their cells to support the rest of the metabolism (a complication that is a common source of alternative conceptions (misconceptions) for learners 4).

By contrast, generally speaking, during their 'lifetimes', stars only gain and lose marginal amounts of material (compared with a 70 kg human being that might well consume a tonne of food each year) – and do not have any quality control mechanism that would lead to them taking in what is more useful and expelling what is not.

As far as life on earth is concerned, virtually all of that complex organisation of living things depends upon the sun as a source of energy, and relies on the process by which the sun increases the universe's entropy by radiating energy from a relatively compact source into the diffuse vastness of space. 4 In other words, if anything, a star like our sun better reflects a dead being such as a felled tree or a zebra hunted down by a lion, providing a source of concentrated energy for other organisms feeding on its mortal remains!

Are the lives and deaths of stars simply pedagogical devices?

So, are stars really alive? Or is this just one example of the kind of rhetorical device I referred to above being adopted to help the abstract and unfamiliar become familiar? Is it the use of a familiar trope employed simply to aid in the communication of difficult ideas? Is this just a metaphor? That is,

  • Do stars actually die, or…
  • are they only figuratively alive and, so, only suffer (sic) a metaphorical death?

I do not think the examples I quote above represent a concerted targeted strategy by Professors Crawford, Rees and Sullivan to work with a common teaching metaphor for the sake of Melvyn and his listeners: but rather the actual language commonly used in the field. That is, the life cycles and lifetimes of stars have entered into the technical lexicon of the science. If so, then stars do actually live and die, at least in terms of what those words now mean in the discipline of astronomy.

Gustav Strömberg referred to "the whole lifetime of a star" in a paper in The Astrophysical Journal as long ago as 1927. He did not feel the need to explain the term so presumably it was already in use – or considered obvious. Kip Thorne published a paper in 1965 about 'Gravitational Collapse and the Death of a Star'. In the first paragraph he pointed out that

"The time required for a star to consume its nuclear fuel is so long (many billions of years in most cases) that only a few stars die in our galaxy per century; and the evolution of a star from the end point of thermonuclear burning to its final dead state is so rapid that its death throes are observable for only a few years."

Thorne, 1965, p.1671

Again, the terminology die/death/dead is used without introduction or explanation.

He went on to refer to

  • deaths of stars
  • different types of death
  • final resting states

before shifting to what a layperson would recognise as a more specialist, technical, lexicon (zero point kinetic energy; Compton wavelength of an electron; neutron-rich nuclei; photodisintegration; gravitational potential energy; degenerate Fermi gas; lambda hyperons; the general relativity equation of hydrostatic equilibrium; etc.), before reiterating that he had been offering

"the story of the death of a star as predicted by a combination of nuclear theory, elementary particle theory, and general relativity"

Thorne, 1965, p.1678

So, this was a narrative, but one intended to be fit for a professional scientific audience. It seems the lives and deaths of stars have been part of the technical vocabulary of astronomers for a long time now.

When did scientists imbue stars with life?

Modern astronomy is quite distinct from astrology, but like other sciences astronomy developed from earlier traditions and at one time astronomy and astrology were not so discrete (an astronomical 'star' such as Johannes Kepler was happy to prepare horoscopes for paying customers) and mythological and religious aspects of thinking about the 'heavens' were not so well compartmentalised from what we would today consider as properly the realm of the scientific.

In Egyptian religion, Ra was both a creative force and identified with the sun. Mythology is full of origin stories explaining how the stars had been cast there after various misadventures on earth (the Greek myths but also in other traditions such as those of the indigenous North American and Australian peoples 5) and we still refer to examples such as the seven sisters and Orion with the sword hanging from his belt. The planets were associated with different gods – Venus (goddess of love), Mars (the god of war), Mercury (the messenger of the gods), and so on.6 It was traditional to refer to some heavenly bodies as gendered: Luna is she, Sol is he, Venus is she, and so on. This usage is sometimes found in scientific writing on astronomy.

Read about examples of personification in scientific writing

Yet this type of poetic license seems unlikely to explain the language of the life cycles of stars, even if there are parallels between scientific and poetic or spiritual accounts,

Stars are celestial objects having their own life cycles. Stars are born, grow up, mature and eventually die. …The author employs inductive and deductive analysis of the verses of the Quran and the Hadith texts related with the life and death of stars. The results show that the life and death of the stars from Islamic and Modern astronomy has some similarities and differences.

Wahab, 2015

After all, the heavenly host of mythology comprised immortals, if sometimes starting out as mortals subsequently given a kind of immortality by the gods when being made into stars. Indeed the classical tradition supported by interpretation of Christian orthodoxy was that unlike the mundane things of earth, the heavens were not subject to change and decay – anything from the moon outwards was perfect and unchanging. (This notion was held onto by some long after it was established that comets with their varying paths were not atmospheric phenomena – indeed well into the twentieth century some young earth creationists were still insisting on the perfect, unchanging nature of the heavens. 7)

So, presumably, we need to look elsewhere to find how science adopted life cycles for stars.

A natural metaphor?

Earlier in this piece I asked readers to bear with me, and to join with me in making the familiar unfamiliar, to 'bracket' the familiar notion that we say stars are born, live and later die, and to problematise it. In one scientific sense stars cannot die – as they were never alive. Yet, I accept this seems a pretty natural metaphor to use. Or, at least, it seems a natural metaphor to those who are used to hearing and reading it. A science teacher may be familiar with the trope of stars being born, living, and dying – but how might a young learner, new to astronomical ideas, make sense of what was meant?

Now, there is a candidate project for anyone looking for a topic for a student research assignment: how would people who have never previously been exposed to this metaphor respond to the kinds of references I've discussed above? I would genuinely like to know what 'naive' people would make of this 8 – would they just 'get' the references immediately (appreciate in what sense stars are born, live, and die); or, would it seem a bizarre way of talking about stars? Given how readily people accept and take up anthropomorphic references to molecules and viruses and electrons and so forth, I find the question intriguing.

Read about anthropomorphism in science

What makes a star alive or dead?

Even if for the disciplinary experts the language of living stars and their life cycles has become a 'dead metaphor' and is now taken (i.e., taken for granted) as technical terminology – the novice learner, or lay member of the public listening to a radio show, still has to make sense of what it means to say a star is born, or is alive, or is nearing the end of its life, or is dead.

The critical feature discussed by Professors Crawford, Rees and Sullivan concerns an equilibrium that allows a star to exist in a balance between the gravitational attraction of its component matter and the pressure generated through its nuclear reactions.

A star forms when material comes together under its mutual gravitational attraction – and as the material becomes denser it gets hotter. Eventually a sufficient density and temperature are reached such that there is 'ignition' – not in the sense of chemical combustion, but of self-sustaining nuclear processes that generate heat. This point of ignition is the 'birth' of the star.

Fusion processes continue as long as there is sufficient fusible material, the 'fuel' that 'feeds' the nuclear 'furnace' (initially hydrogen, though depending on the mass of the star there can be a series of reactions, with products from one stage undergoing further fusion to form even heavier elements). The lifetime of the star is the length of time that such processes continue.

Eventually there will not be sufficient 'fuel' to maintain the level of 'burning' that is needed to allow the ball of material to avoid ('resist') gravitational collapse. There are various specific scenarios, but this is the 'death' of the star. It may be a supernova offering very visible 'death throes'.

The core that is left after this collapse is a 'dead' star, even if it is hot enough to continue being detectable for some time (just as it takes time for the body of a homeothermic animal that dies to cool to the ambient temperature).

It seems then that there is a kind of analogy at work here.

Organisms are alive as long as they continue to metabolise sufficiently in order to maintain their organisation in the face of the entropic tendency towards disintegration and dispersal.

Stars are alive as long as they exhibit sufficient fusion processes to maintain them as balls of material that have much greater volumes, and lower densities, than the gravitational forces on their component particles would otherwise lead to.

It is clearly an imperfect analogy.

Organisms base metabolism on a through-put of material to process (and in a sense 'harvest' energy sources).

Stars do acquire new materials and eject some, but this is largely incidental: it is essentially the mass of fusible material that originally comes together to initiate fusion which is 'harvested' as the energy source.

Organisms may die if they cannot access external food sources, but some die of built-in senescence and others (those that reproduce by dividing) are effectively immortal.

We (humans) die because the amazing self-constructing and self-repairing abilities of our bodies are not perfect, and somatic cells cannot divide indefinitely to replace no longer viable cells.
Stars 'die' because they run out of their inherent 'fuel'.

Stars die when the hydrogen that came together to form them has substantially been processed.

Read about analogy in science

One person's dead star is another person's living metaphor

So, do stars die? Yes, because astronomers (the experts on stars) say they do, and it seems they are not simply talking down to the rest of us. The birth and death of stars seems to be based on an analogy: an analogy which is implicit in some of the detailed discussion of star life cycles. However, through the habitual use of this analogy, terms such as the birth, lifetimes, and death of stars have been adopted into mainstream astronomical discourse as unmarked (taken-for-granted) language such that to the uninitiated they are experienced as metaphors.

And these perspectival metaphors 9 become extended to describe stars that are considered young, old, dying, long dead, and so forth. These terms are used so readily, and so often without a perceived need for qualification or explanation, that we might consider them 'dead' metaphors within astronomical discourse – terms of metaphorical origin but now so habitually used that they have come to be literal (stars are born, they do have lifetimes, they do die). Yet for the uninitiated they are still 'living' metaphors, in the sense that the non-expert needs to work out what it means when a star is said to live or die.

There is a well-recognised distinction between live and dead metaphors. But here we have dead-to-the-specialists metaphors that would surely seem non-literal to the uninitiated. These terms are not explained by experts, as they are taken by them as literal; but they cannot be understood literally by the novice, for whom they are still metaphors requiring interpretation. That is, they are perspectival metaphors: zombie words that may seem alive or dead (as figures of speech) according to audience, and so may be treated as dead in professional discourse, but may need to be made undead when used in communicating to the public.


Other aspects of the In Our Time discussion of 'The death of stars' are explored as The complicated social lives of stars: stealing, escaping, and blowing-off in space


Sources cited:
  • Strömberg, G. (1927). The Motions of Giant M Stars. The Astrophysical Journal, 65, 238.
  • Thorne, K. S. (1965). Gravitational Collapse and the Death of a Star. Science, 150(3704), 1671-1679. http://www.jstor.org.ezp.lib.cam.ac.uk/stable/1717408
  • Wahab, R. A. (2015). Life and death of stars: an analysis from Islamic and modern astronomy perspectives. International Proceedings of Economics Development and Research, 83, 89.

Notes

1 In this regard, but not in all regards. As I have suggested here before, the teacher usually has two advantages:

a) generally, a class has a limited spread in terms of the audience background: even a mixed ability class is usually from a single school year (grade level) whereas the public presentation may be addressing a mixed audience of all ages and levels of education.

b) usually a teacher knows the class, and so knows something about their starting points, and their interests


2 Some students do well in science tests and others less well.

If we say this is because

  • some learners are better science students than others
  • and settle for defining better science students as those who achieve good results in formal science tests (that is, tests as currently administered, based on the present curriculum, taught in our usual way)

then we are simply 'explaining' the explicandum (i.e., some students do better on science tests than others) by rephrasing what is to be explained (some students are better science students: that is, they perform well in science tests!)

Read about tautology


3 Criterion (singular), as a living organism has to satisfy the entries in the list collectively. Each entry is, of itself, a necessary, but not sufficient, condition.


4 A simple misunderstanding is that animals respire but plants photosynthesise.

In a plant in a steady state, the rates of build-up and breakdown of sugars would be balanced. However, plants must photosynthesise more than they respire overall in order to grow, and ultimately to allow consumers to make use of them as food. (This needs to be seen at a system level – the plant is clearly not in any inherent sense photosynthesising to provide food for other organisms, but has evolved to be a suitable nutrition source as it transpires [no pun intended] that this increases the fitness of plants within the wider ecosystem.)

A more subtle alternative conception is that plants photosynthesise during the day when they are illuminated by sunlight (fair enough) and then use the sugar produced to respire at night when the sun is not available as a source of energy. See, for example, 'Plants mainly respire at night because they are photosynthesising during the day'.

Actually, cellular processes require continuous respiration (as even in the daytime sunlight cannot directly power cellular metabolism, only facilitate photosynthesis to produce the glucose that can be oxidised in respiration).

Schematic representation of the balance between photosynthesis, which generates resources to allow respiration – typically a plant produces tissues that feed other organisms.
The area above the line represents energy from sunlight doing work in synthesising more complex substances. The area below the line represents work done when the oxidation of those more complex substances provides the energy source for building and maintaining an organism's complex organisation of structure and processes (homeostasis).

5 Museum Victoria offers a pdf that can be downloaded and copied by teachers to teach about "How the southern night sky is seen by the Boorong clan from north-west Victoria":

'Stories in the Stars – the night sky of the Boorong people' shows the constellations as recognised by this group, the names they were given, and the stories of the people and creatures represented.

(This is largely based on the nineteenth century reports made by William Edward Stanbridge of information given by Boorong informants – see 'Was the stellar burp really a sneeze?')

The illustration shown here is of 'Kulkunbulla' – a constellation that is considered in the U.K. to be only part of the constellation known here as Orion. (Constellations are not actual star groupings, but only what observers have perceived as stars seeming to be grouped together in the sky – the Boorong's mooting of constellations is no more right or wrong than that suggested in any other culture.)


6 The tradition was continued into modern times with the discovery of the planets that came to be named Neptune and Uranus after the Gods of the sea and sky respectively.


7 Creationism, per se, is simply the perspective or belief that the world (i.e., Universe) was created by some creator (God) and so creationism as such is not necessarily in conflict with scientific accounts. The theory of the big bang posits that time, space and matter had a beginning with an uncertain cause which could be seen as God (although some theorists such as Professor Roger Penrose develop theories which posit a sequence of universes that each give rise to the next and that could have infinite extent).

Read about science and religion

Young earth creationists, however, not only believe in a creator God (i.e., they are creationists), but one who created the World no more than about 10 thousand years ago (the earth is young!), rather than over 13 billion years ago. This is clearly highly inconsistent with a wide range of scientific findings and thinking. If the Young Earth Creationists are right, then at least one of the following must hold:

  • a lot of very strongly evidenced science is very, very wrong
  • some natural laws (e.g. radioactive decay rates) that now seem fixed must have changed very substantially since the creation
  • the creator God went to a lot of trouble to set up the natural world to present a highly misleading account of its past history

8 I am not using the term naive here in a discourteous or demeaning way, but in a technical sense of someone who is meeting something for the first time.


9 That is, terms that will appear as metaphors from the perspective of the uninitiated, but now seem literal terms from the perspective of the specialist. We cannot simply say they are or are not metaphors, without asking 'for whom?'


Was the stellar burp really a sneeze?

Pulling back the veil on an astronomical metaphor


Keith S. Taber


It seems a bloated star dimmed because it sneezed, and spewed out a burp.


'Pardon me!' (Image by Angeles Balaguer from Pixabay)

I was intrigued to notice a reference in Chemistry World to a 'stellar burp'.

"…the dimming of the red giant Betelgeuse that was observed in 2019…was later attributed to a 'stellar burp' emitting gas and dust which condensed and then obscured light from the star"

Motion, 2022

The author, Alice Motion, quoted astrophysics doctoral candidate and science communicator Kirsten Banks commenting that

"In recorded history…It's the first time we've ever seen this happen, a star going through a bit of a burp"

Kirsten Banks quoted in Chemistry World

although she went on to suggest that the Boorong people (an indigenous culture from an area of the Australian state of Victoria) had long ago noticed a phenomenon that became recorded in their oral traditions 1, which

"was actually the star Eta Carinae which went through a stellar burp, just like Betelgeuse did"

Kirsten Banks quoted in Chemistry World

Composite image (optical appearing as white; ultraviolet as cyan; X-rays as purple) of Eta Carinae.

Source: NASA


Clearly a star cannot burp in the way a person can, so I took this to be a metaphor, and wondered if this was a metaphor used in the original scientific report.

A clump and a veil

The original report (Montargès, et al., 2021) appeared in Nature, one of the most prestigious science research journals. It did not seem to have any mention of belching. The article reported that,

"From November 2019 to March 2020, Betelgeuse – the second-closest red supergiant to Earth (roughly 220 parsecs, or 724 light years, away) – experienced a historic dimming of its visible brightness…an event referred to as Betelgeuse's Great Dimming….Observations and modelling support a scenario in which a dust clump formed recently in the vicinity of the star, owing to a local temperature decrease in a cool patch that appeared on the photosphere."

Montargès, et al., 2021, p.365

So, the focus seemed to be not on any burping, but on a 'clump' of material partially obscuring the star. That material may well have arisen from the star. The paper in Nature suggests that Betelgeuse may lose material through two mechanisms: both by a "smooth homogeneous radial outflow that consists mainly of gas", that is, a steady and continuous process; and also by "an episodic localised ejection of gas clumps where conditions are favourable for efficient dust formation while still close to the photosphere" – that is, the occasional, irregular, 'burp' of material that then condenses near the star. But the word used was not 'burp', but 'eject'.

A fleeting veil

Interestingly the title of the article referred to "A dusty veil shading Betelgeuse". The 'veil' (another metaphor) only seemed to occur in the title. There is an understandable temptation, even in scholarly work, to seek a title which catches attention – perhaps simplifying, alliterating (e.g., 'mediating mental models of metals') or seeking a strong image ('…a dusty veil shading…'). In this case, the paper authors clearly thought the metaphor did not need to be explained, and that readers would understand how it linked to the paper content without any explicit commentary.


Word – frequency in the Nature article:
  • clump(s): 25 (excluding reference list)
  • eject(ed, etc.): 4
  • veil: 1 (in title only)
  • burp: 0
  • blob: 0

There's no burping in Nature
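A tally like the one above can be produced by a simple stem count over the article text. The sketch below is illustrative only (it is not how the counts here were actually made, and the sample string and function name are my own invention):

```python
import re
from collections import Counter

def term_frequencies(text, stems):
    """Count words beginning with each stem (case-insensitive),
    so that 'clump' also matches 'clumps', 'eject' matches 'ejected'."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for stem in stems:
        counts[stem] = sum(1 for w in words if w.startswith(stem))
    return counts

# Hypothetical sample text standing in for the article body
sample = "The dust clump formed; such clumps are ejected. A dusty veil shaded the star."
print(term_frequencies(sample, ["clump", "eject", "veil", "burp", "blob"]))
```

A real count would, of course, need the full article text (minus the reference list) and some care over stems that are prefixes of unrelated words.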

The European Southern Observatory released a press release (sorry, a 'science release') about the work entitled 'Mystery of Betelgeuse's dip in brightness solved', which explained

"In their new study, published today in Nature, the team revealed that the mysterious dimming was caused by a dusty veil shading the star, which in turn was the result of a drop in temperature on Betelgeuse's stellar surface.

Betelgeuse's surface regularly changes as giant bubbles of gas move, shrink and swell within the star. The team concludes that some time before the Great Dimming, the star ejected a large gas bubble that moved away from it. When a patch of the surface cooled down shortly after, that temperature decrease was enough for the gas to condense into solid dust.

'We have directly witnessed the formation of so-called stardust,' says Montargès, whose study provides evidence that dust formation can occur very quickly and close to a star's surface. 'The dust expelled from cool evolved stars, such as the ejection we've just witnessed, could go on to become the building blocks of terrestrial planets and life', adds Emily Cannon, from KU Leuven, who was also involved in the study."

https://www.eso.org/public/news/eso2109/

So, again, references to ejection and a veil – but no burping.

Delayed burping

Despite this, the terminology of the star burping seems to have been widely taken up in secondary sources, such as the article in Chemistry World.

A New Scientist report suggested "Giant gas burp made Betelgeuse go dim" (Crane, 2021). On the website Ars Technica, Jennifer Ouellette wrote that "a cold spot and a stellar burp led to strange dimming of Betelgeuse".

On the news site Gizmodo, George Dvorsky wrote a piece entitled "A dusty burp could explain mysterious dimming of supergiant star Betelgeuse". Whilst the term burp was only used in the title, Dvorsky was not shy of making other corporeal references,

"a gigantic dust cloud, which formed after hot, dense gases spewed out from the dying star. Viewed from Earth, this blanket of dust shielded the star's surface, making it appear dimmer from our perspective, according to the research, led by Andrea Dupree from the Centre for Astrophysics at Harvard & Smithsonian.

A red supergiant star, Betelgeuse is nearing the end of its life. It's poised to go supernova soon, by cosmological standards, though we can't be certain as to exactly when. So bloated is this ageing star that its diameter now measures 1.234 million kilometers, which means that if you placed Betelgeuse at the centre of our solar system, it would extend all the way to Jupiter's orbit."

The New York Times published an article (June 17, 2021) entitled "Betelgeuse Merely Burped, Astronomers Conclude", where author Dennis Overbye began his piece:

"Betelgeuse, to put it most politely, burped."

The New York Times

Overbye also reports the work from the Nature paper

"We have directly witnessed the formation of so-called stardust," Miguel Montargès, an astrophysicist at the Paris Observatory, said in a statement issued by the European Southern Observatory. He and Emily Cannon of Catholic University Leuven, in Belgium, were the leaders of an international team that studied Betelgeuse during the Great Dimming with the European Southern Observatory's Very Large Telescope on Cerro Paranal, in Chile.

Parts of the star, they found, were only one-tenth as bright as normal and markedly cooler than the rest of the surface, enabling the expelled blob to cool and condense into stardust. They reported their results on Wednesday in Nature."

The New York Times

So, instead of the clumps referred to in the Nature article as ejected, we now have an expelled blob (neither word appears in the Nature article itself). Overbye also explains how this study followed up on earlier observations of the star:

"Their new results would seem to bolster findings reported a year ago by Andrea Dupree of the Harvard-Smithsonian Center for Astrophysics and her colleagues, who detected an upwelling of material on Betelgeuse in the summer of 2019.

'We saw the material moving out through the chromosphere in the south in September to November 2019,' Dr. Dupree wrote in an email. She referred to the expulsion as 'a sneeze.'"

The New York Times

'…material moving out through the chromosphere in the south…': Hubble space telescope images of Betelgeuse (Source: NASA) 2

Bodily functions and stellar processes

I remain unsure why, if the event was originally considered a sneeze, it became transformed into a burp. However, the use of such descriptions is not so unusual. Metaphor is a common tool in science communication to help 'make the unfamiliar familiar' by describing something abstract or out-of-the-ordinary in more familiar terms.

Read about metaphors in science

Here, the body [sic] of the scientific report keeps to technical language, although a metaphor (the dust cloud as a veil) is considered suitable for the title. It is only when the science communication shifts from the primary literature (intended for the science community) into more popular media aimed at a wider audience that the physical processes occurring in a star become described in terms of our bodily functions. So, in this case, it seems a bloated star dimmed because it sneezed, and spewed out a burp.


Coda

The astute reader may have also noticed that the New York Times article referred to Betelgeuse as an "ageing star" that is "nearing the end of its life": terms that imply a star is a living, and mortal, being. This might seem to be journalistic licence, but the NASA website from which the sequence of Betelgeuse images above is taken also refers to the star as ageing (as well as being 'petulant' and 'injured').2 NASA employs scientifically qualified people, but its public websites are intended for a broad, general audience, perhaps explaining the anthropomorphic references.

Thus, we might understand references to stars as alive as a metaphorical device used in communicating astronomical ideas to the general public. Yet an examination of the scientific literature might instead suggest that astronomers DO consider stars to be alive. But, that is a topic for another piece.


Work cited:
  • Crane, L. (2021). Giant gas burp made Betelgeuse go dim. New Scientist, 250(3340), 22. doi:10.1016/S0262-4079(21)01094-0
  • Hamacher, D. W., & Frew, D. J. (2010). An aboriginal Australian record of the great eruption of Eta Carinae. Journal of Astronomical History and Heritage, 13(3), 220-234.
  • Montargès, M., Cannon, E., Lagadec, E., de Koter, A., Kervella, P., Sanchez-Bermudez, J., . . . Danchi, W. (2021). A dusty veil shading Betelgeuse during its Great Dimming. Nature, 594(7863), 365-368. doi:10.1038/s41586-021-03546-8
  • Motion, A. (2022). Space for more science: Astrophysics and Aboriginal astronomy on TikTok. Chemistry World, December 2022, p.15. (https://www.chemistryworld.com/opinion/space-for-more-science/4016585.article)

Notes

1 William Edward Stanbridge (1816-1894) was an Englishman who moved to Australia in 1841. He asked Boorong informants about their astronomy, and recorded their accounts. He presented a report to the Philosophical Institute of Victoria in 1857 and published two papers (Hamacher & Frew, 2010). The website Australian Indigenous Astronomy explains that

"The larger star of [of the binary system] Eta Car is unstable and undergoes occasional violent outbursts, where it sheds material from its outer shells, making it exceptionally bright.  During the 1840s, Eta Car went through such an outburst where it shed 20 solar masses of its outer shell and became the second brightest star in the night sky, after Sirius, before fading from view a few years later.  This event, commonly called a "supernova-impostor" event, has been deemed the "Great Eruption of Eta Carinae".  The remnant of this explosion is evident by the Homunculus Nebulae [see figure above – nebulae are anything that appears cloud-like to astronomical observation].  This identification shows that the Boorong had noted the sudden brightness of this star and incorporated it into their oral traditions."

Duane Hamacher

A paper in the Journal of Astronomical History and Heritage concludes that

"the Boorong people observed 𝜂 Carinae in the nineteenth century, which we identify using Stanbridge's description of its position in Robur Carolinum, its colour and brightness, its designation (966 Lac, implying it is associated with the Carina Nebula), and the relationship between stellar brightness and positions of characters in Boorong oral traditions. In other words, the nineteenth century outburst of 𝜂 Carinae was recognised by the Boorong and incorporated into their oral traditions"

Hamacher & Frew 2010, p.231

2 The images reproduced here are presented on a NASA website under the heading 'Hubble Sees Red Supergiant Star Betelgeuse Slowly Recovering After Blowing Its Top'. This is apparently not a metaphor, as the site informs readers that "Betelgeuse quite literally blew its top in 2019". Betelgeuse is described as a "monster star", and its activity as "surprisingly petulant behaviour" and a "titanic convulsion in an ageing star", such that "Betelgeuse is now struggling to recover from this injury."

This seems rather anthropomorphic – petulance and struggle are surely concepts that refer to sentient deliberate actors in the world, not massive hot balls of gas. However, anthropomorphic narratives are often used to make scientific ideas accessible.

Read about anthropomorphism

The recovery (from 'injury') is described in terms of two similes,

"The star's interior convection cells, which drive the regular pulsation may be sloshing around like an imbalanced washing machine tub, Dupree suggests. … spectra imply that the outer layers may be back to normal, but the surface is still bouncing like a plate of gelatin dessert [jelly] as the photosphere rebuilds itself."

NASA Website

Read about science similes


Passive learners in unethical control conditions

When 'direct instruction' just becomes poor instruction


Keith S. Taber


An experiment that has been set up to ensure the control condition fails, and so compares an innovation with a substandard teaching condition, can – at best – only show the innovation is not as bad as the substandard teaching

One of the things which angers me when I read research papers is examples of what I think of as 'rhetorical research' that use unethical control conditions (Taber, 2019). That is, educational research which sets up one group of students to be taught in a way that clearly disadvantages them, to ensure the success of an experimental teaching approach,

"I am suggesting that some of the experimental studies reported in the literature are rhetorical in the … sense that the researchers clearly expect to demonstrate a well-established effect, albeit in a specific context where it has not previously been demonstrated. The general form of the question 'will this much-tested teaching approach also work here' is clearly set up expecting the answer 'yes'. Indeed, control conditions may be chosen to give the experiment the best possible chance of producing a positive outcome for the experimental treatment."

Taber, 2019, p.108

This irks me for two reasons. The first, obviously, is that researchers have been prepared to (ab)use learners as 'data fodder' and subject them to poor learning contexts in order to have the best chance of getting positive results for the innovation supposedly being 'tested'. However, it also annoys me because this is inherently a poor research design (and so a poor use of resources), as it severely limits what can be found out. An experiment that compares an innovation with a substandard teaching condition can, at best, show the innovation is not as ineffective as the substandard teaching in the control condition; it cannot tell us whether the innovation is at least as effective as existing good practice.

This irritation is compounded when the work I am reading is not some amateur report thrown together for a predatory journal, but an otherwise serious study published in a good research outlet. That was certainly the case for a paper I read today in Research in Science Education (the journal of the Australasian Science Education Research Association) on problem-based learning (Tarhan, Ayar-Kayali, Urek & Acar, 2008).

Rhetorical studies?

Genuine research is undertaken to find something out. The researchers in this enquiry claim:

"This research study aims to examine the effectiveness of a [sic] problem-based learning [PbBL] on 9th grade students' understanding of intermolecular forces (dipole-dipole forces, London dispersion forces and hydrogen bonding)."

Tarhan, et al., 2008, p.285

But they choose to compare PbBL with a teaching approach that they expect to be ineffective. Here the researchers might have asked "how does teaching year 9 students about intermolecular forces through problem-based learning compare with current good practice?" After all, even if PbBL worked quite well, if it is not quite as effective as the way teachers are currently teaching the topic then, all other things being equal, there is no reason to shift to it; whereas if it outperforms even our best current approaches, then there is a reason to recommend it to teachers and roll out associated professional development opportunities.


Problem-based learning (third column) uses a problem (i.e., a task which cannot be solved simply by recalling prior learning or employing an algorithmic routine) as the focus and motivation for learning about a topic

Of course, that over-simplifies the situation, as in education, 'all other things' never are equal (every school, class, teacher…is unique). An approach that works best on average will not work best everywhere. But knowing what works best on average (that is, taken across the diverse range of teaching and learning contexts) is certainly a very useful starting point when teachers want to consider what might work best in their own classrooms.

Rhetorical research is poor research, as it is set up (deliberately or inadvertently) to demonstrate a particular outcome, and, so, has built-in bias. In the case of experimental studies, this often means choosing an ineffective instructional approach for the comparison class. Why else would researchers select a control condition they know is not suitable for bringing about the educational outcomes they are testing for?

Problem-Based Learning in a 9th Grade Chemistry Class

Tarhan and colleagues' study was undertaken in one school with 78 students divided into two groups. One group was taught through a sequence based on problem-based learning that involved students undertaking research in groups, gently supported and steered by the teacher. The approach allowed student dialogue, which is believed to be valuable in learning, and motivated students to be actively engaged in enquiry. When such an approach is well judged it has the potential to count as 'scaffolding' of learning. This seems a very worthwhile innovation – well worth developing and evaluating.

Of course, work in one school cannot be assumed to generalise elsewhere, and small-scale experimental work of this kind is open to major threats to validity, such as expectancy effects and researcher bias – but this is unfortunately always true of these kinds of studies (which are often all educational researchers are resourced to carry out). Finding out what works best in some educational context at least potentially contributes to building up an overall picture (Taber, 2019). 1

Why is this rhetorical research?

I consider this rhetorical research because of the claims the authors make at the start of the study:

"Research in science education therefore has focused on applying active learning techniques, which ensure the affective construction of knowledge, prevent the formation of alternate conceptions, and remedy existing alternate conceptions…Other studies suggest that active learning methods increase learning achievement by requiring students to play a more active role in the learning process…According to active learning principles, which emphasise constructivism, students must engage in researching, reasoning, critical thinking, decision making, analysis and synthesis during construction of their knowledge."

Tarhan, et al., 2008, pp.285-286

If they genuinely believed that, then to test the effectiveness of their PbBL activity, Tarhan and colleagues needed to compare it with some other teaching condition that they are confident can "ensure the affective construction of knowledge, prevent the formation of alternate conceptions, and remedy existing alternate conceptions… requir[e] students to play a more active role in the learning process…[and] engage in researching, reasoning, critical thinking, decision making, analysis and synthesis during construction of their knowledge." A failure to do that means that the 'experiment' has been biased – it has been set up to ensure the control condition fails.

Unethical research?

"In most educational research experiments of [this] type…potential harm is likely to be limited to subjecting students (and teachers) to conditions where teaching may be less effective, and perhaps demotivating. This may happen in experimental treatments with genuine innovations (given the nature of research). It can also potentially occur in control conditions if students are subjected to teaching inputs of low effectiveness when better alternatives were available. This may be judged only a modest level of harm, but – given that the whole purpose of experiments to test teaching innovations is to facilitate improvements in teaching effectiveness – this possibility should be taken seriously."

Taber, 2019, p.94

The same teacher taught both classes: "Both of the groups were taught by the same chemistry teacher, who was experienced in active learning and PbBL" (p.288). This would seem to reduce the 'teacher effect' – outcomes being affected because the teacher of one class is more effective than the teacher of another. (Reduce, rather than eliminate, as different teachers have different styles, skills, and areas of expertise: so, most teachers are more suited to, and competent in, some teaching approaches than others.)

So, this teacher was certainly capable of teaching in the ways that Tarhan and colleagues claim as necessary for effective learning ("active learning techniques"). However, the control condition sets up the opposite of active learning, so-called passive learning:

"In this study, the control group was taught the same topics as the experimental group using a teacher-centred traditional didactic lecture format. Teaching strategies were dependent on teacher expression and question-answer format. However, students were passive participants during the lessons and they only listened and took notes as the teacher lectured on the content.

The lesson was begun with teacher explanation about polar and nonpolar covalent bonding. She defined formation of dipole-dipole forces between polar molecules. She explained that because of the difference in electronegativities between the H and Cl atoms for HCl molecule is 0.9, they are polar molecules and there are dipole-dipole forces between HCl molecules. She also stated that the intermolecular dipole-dipole forces are weaker than intramolecular bonds such as covalent and ionic bonding. She gave the example of vaporisation and decomposition of HCl. She explained that while 16 kJ/mol of energy is needed to overcome the intermolecular attraction between HCl molecules in liquid HCl during vaporisation process of HCl, 431 kJ/mol of energy is required to break the covalent bond between the H and Cl atoms in the HCl molecule. In the other lesson, the teacher reminded the students of dipole-dipole forces and then considered London dispersion forces as weak intermolecular forces that arise from the attractive force between instantaneous dipole in nonpolar molecules. She gave the examples of F2, Cl2, Br2, I2 and said that because the differences in electronegativity for these examples are zero, these molecules are non-polar and had intermolecular London dispersion forces. The effects of molecular size and mass on the strengths of London dispersion forces were discussed on the same examples. She compared the strengths of dipole-dipole forces and London dispersion forces by explaining the differences in melting and boiling points for polar (MgO, HCl and NO) and non-polar molecules (F2, Cl2, Br2, and I2). The teacher classified London dispersion forces and dipole- dipole as van der Waals forces, and indicated that there are both London dispersion forces and dipole-dipole forces between polar molecules and only London dispersion forces between nonpolar molecules. 
In the last lesson, teacher called attention to the differences in boiling points of H2O and H2S and defined hydrogen bonds as the other intermolecular forces besides dipole-dipole and London dispersion forces. Strengths of hydrogen bonds depending on molecular properties were explained and compared in HF, NH3 and H2O. She gave some examples of intermolecular forces in daily life. The lesson was concluded with a comparison of intermolecular forces with each other and intramolecular forces."

Tarhan, et al., 2008, p.293

Lecturing is not ideal even for teaching university students. It is generally unsuitable for teaching school children (and it is not consistent with what is expected in Turkish schools).

This was a lost opportunity to seriously evaluate teaching through PbBL by comparing it with teaching that followed the national policy recommendations. Moreover, it was a dereliction of educators' duty never to deliberately disadvantage learners. It is reasonable to experiment with children's learning when you feel there is a good chance of positive outcomes: it is not acceptable to deliberately set up learners to fail (e.g., by organising 'passive' learning when you claim to believe effective learning activities are necessarily 'active').

Isn't this 'direct instruction'?

Now, perhaps the account of the teaching given by Tarhan and colleagues might seem to fit the label of 'direct instruction'. Whilst Tarhan et al. claim constructivist teaching is clearly necessary for effective learning, there are some educators who claim that constructivist approaches are inferior, and that a more direct approach, 'direct instruction', is more likely to lead to learning gains.

This has been a lively debate, but often the various commentators use terminology differently and argue past each other (Taber, 2010). The proponents of direct instruction often criticise teaching that expects learners to take nearly all the responsibility for learning, with minimal teacher support. I would also criticise that (except perhaps in the case of graduate research students once they have demonstrated their competence, including knowing when to seek supervisory guidance). That is quite unlike genuine constructivist teaching, which is optimally guided (Taber, 2011): where the teacher manages activities, constantly monitors learner progress, and intervenes with various forms of direction and support as needed. Tarhan and colleagues' description of their problem-based learning experimental condition appears to have involved this kind of guidance:

"The teacher visited each group briefly, and steered students appropriately by using some guiding questions and encouraging them to generate their hypothesis. The teacher also stimulated the students to gain more information on topics such as the polar structure of molecules, differences in electronegativity, electron number, atom size and the relationship between these parameters and melting-boiling points…The teacher encouraged students to discuss the differences in melting and boiling points for polar and non-polar molecules. The students came up with [their] research questions under the guidance of the teacher…"

Tarhan, et al., 2008, pp.290-291

By contrast, descriptions of effective direct instruction do involve tightly planned teaching with carefully scripted teacher moves of the kind quoted in the account, above, of the control condition. (But any wise teacher knows that lessons can only be scripted as a provisional plan: the teacher has to constantly check the learners are making sense of teaching as intended, and must be prepared to change pace, repeat sections, re-order or substitute activities, invent new analogies and examples, and so forth.)

However, this instruction is not simply a one-way transfer of information, but rather a teacher-led process that engages students in active learning to process the material being introduced by the teacher. If this is done by breaking the material into manageable learning quanta, each of which students engage with in dialogic learning activities before proceeding to the next, then this is constructivist teaching (even if it may also be considered by some as 'direct instruction'!)


Effective teaching moves between teacher input and student activities and is not just the teacher communicating information to the learners.

By contrast, the lecture format adopted by Tarhan's team was based on the teacher offering a multi-step argument (delivered over several lessons) and asking the learners to follow and retain an extensive presentation.

"The lesson was begun with teacher explanation …

She defined …

She explained…

She also stated…

She gave the example …

She explained that …

the teacher reminded the students …

She gave the examples of …

She compared…

The teacher classified …

and indicated that …

[the] teacher called attention to …

She gave some examples of …"

Tarhan, et al., 2008, p.293

This is a description of the transmission of information through a communication channel: not an account of teaching which engages with students' thinking and guides them to new understandings.

Ethical review

Despite the paper having been published in a major journal, Research in Science Education, there seems to be no mention of the study design having been through any kind of institutional ethical review before the research began. Moreover, there is no reference to the learners or their parents/guardians having been asked for, or having given, voluntary, informed, consent, as is usually required in research with human participants. Indeed, Tarhan and colleagues refer to the children as the 'subjects' of their research, not participants in their study.

Perhaps ethical review was not expected in the national context (at least, in 2008). Certainly, it is difficult to imagine how voluntary, informed, consent would be obtained if parents were to be informed that half of the learners would be deliberately subject to a teaching approach the researchers claim lacks any of the features "students must engage in…during construction of their knowledge".

PbBL is better than…deliberately teaching in a way designed to limit learning

Tarhan and colleagues, unsurprisingly, report that on a post-test the students who were taught through PbBL out-performed those students who were lectured at. It would have been very surprising (and so potentially more interesting, and, perhaps, even useful, research!) had they found anything else, given the way the research was biased.

So, to summarise:

  1. At the outset of the paper it is reported that it is already established that effective learning requires students to engage in active learning tasks.
  2. Students in the experimental conditions undertook learning through a PbBL sequence designed to engage them in active learning.
  3. Students in the control condition were subject to a sequence of lecturing inputs designed to ensure they were passive.
  4. Students in the active learning condition outperformed the students in the passive learning condition.

This, I suggest, can be considered both rhetorical research and unethical research.


The study can be considered both rhetorical and unfair to the learners assigned to be in the control group

Read about rhetorical experiments

Read about unethical control conditions


Work cited:

Note:

1 There is a major issue which is often ignored in studies of this type (where a pedagogical innovation is trialled in a single school area, school or classroom). Finding that problem-based learning (or whatever) is effective in one school when teaching one topic to one year group does not allow us to generalise to other classrooms, schools, countries, educational levels, topics and disciplines.

Indeed, as every school, every teacher, every class, etc., is unique in some ways, it might be argued that one only really finds out if an approach will work well 'here' by trying it out 'here' – and whether it is universally applicable by trying it everywhere. Clearly academic researchers cannot carry out such a programme, but individual teachers and departments can try out promising approaches for themselves (i.e., context-directed research, such as 'action research').

We might ask if there is any point in researchers carrying out studies of the type discussed in this article, where they start by saying an approach has been widely demonstrated, and then test it in what seems an arbitrarily chosen (or, more likely, convenient) curriculum and classroom context, given that we cannot generalise from individual studies, and it is not viable to test every possible context.

However, there are some sensible guidelines for how a series of such studies into the same type of pedagogic innovation in different contexts can be more useful in (a) helping determine the range of contexts where an approach is effective (through what we might call 'incremental generalisation'), and (b) documenting the research contexts in sufficient detail to support readers in making judgements about the degree of similarity with their own teaching context (Taber, 2019).

Read about replication studies

Read about incremental generalisation

Cells are buzzing cities that are balloons with harpoons

What can either wander door to door, or rush to respond; and when it arrives might touch, sniff, nip, rear up, stroke, seal, or kill?


Keith S. Taber


a science teacher would need to be more circumspect in throwing some of these metaphors out there, without then doing some work to transition from them to more technical, literal, and canonical accounts


BBC Radio 4's 'Start the week' programme is not a science programme, but tends to invite in guests (often authors of some kind) each week according to some common theme. This week there was a science theme and the episode was titled 'Building the Body, Opening the Heart', and was fascinating. It also offers something of a case study in how science gets communicated in the media.


Building the Body, Opening the Heart

The guests all had life-science backgrounds:

Their host was geneticist and broadcaster Adam Rutherford.

Communicating science through the media

As a science educator I listen to science programmes both to enhance and update my own science knowledge and understanding, and to hear how experts present scientific ideas when communicating to a general audience. Although neither science popularisation nor the work of scientists in communicating to the public is entirely the same as formal teaching (for example,

  • there is no curriculum with specified target knowledge; and
  • the audiences
    • are not well-defined,
    • are usually much more diverse than found in classrooms, and
    • are free to leave at any point they lose interest or get a better offer),

they are, like teachers, seeking to inform and explain science.

Science communicators, whether professional journalists or scientists popularising their work, face similar challenges to science teachers in getting across often complex and abstract ideas; and, like them, need to make the unfamiliar familiar. Science teachers are taught that they need to connect new material with learners' prior knowledge and experiences if it is to make sense to the students. But successful broadcasters and popularisers also know they need to do this, using such tactics as simplification, modelling, metaphor and simile, analogy, teleology, anthropomorphism and narrative.

Perhaps one of the biggest differences between science teaching and science communication in the media is the ultimate criterion of success. For science teachers this is (sadly) usually, primarily at least, whether students have understood the material, and will later recall it, sufficiently to demonstrate target knowledge in exams. The teacher may prefer to focus on whether students enjoy science, or develop good attitudes to science, or will consider working in science: but, even so, they are usually held to account for students' performance levels in high-stakes tests.

Science journalists and popularisers do not need to worry about that. Rather, they have to be sufficiently engaging for the audience to feel they are learning something of interest and understanding it. Of course, teachers certainly need to be engaging as well, but they cannot compromise what is taught, and how it is understood, in order to entertain.

With that in mind, I was fascinated at the range of ways the panel of guests communicated the science in this radio show. Much of the programme had a focus on cells – and these were described in a variety of ways.

Talking about cells

Dr Rutherford introduced cells as

  • "the basic building blocks of life on earth"; and observed that he had
  • "spent much of my life staring down microscopes at these funny, sort of mundane, unremarkable, gloopy balloons"; before suggesting that cells were
  • "actually really these incredible cities buzzing with activity".

Dr. Mukherjee noted that

"they're fantastical living machines" [where a cell is the] "smallest unit of life…and these units were built, as it were, part upon part like you would build a Lego kit"

Listeners were told how Robert Hooke named 'cells' after observing cork under the microscope because the material looked like a series of small rooms (like the cells where monks slept in monasteries). Hooke (1665) reported,

"I took a good clear piece of Cork, and with a Pen-knife sharpen'd as keen as a Razor, I cut a piece of it off, and…cut off from the former smooth surface an exceeding thin piece of it, and…I could exceeding plainly perceive it to be all perforated and porous, much like a Honey-comb, but that the pores of it were not regular; yet it was not unlike a Honey-comb in these particulars

…these pores, or cells, were not very deep, but consisted of a great many little Boxes, separated out of one continued long pore, by certain Diaphragms, as is visible by the Figure B, which represents a sight of those pores split the long-ways.

Robert Hooke

Hooke's drawing of the 'pores' or 'cells' in cork

Components of cells

Dr. Mukherjee described how

"In my book I sort of board the cell as though it's a spacecraft, you will see that it's in fact organised into rooms and there are byways and channels and of course all of these organelles which allow it to work."

We were told that "the cell has its own skeleton", and that the organelles included the mitochondria and nuclei,

"[mitochondria] are the energy producing organelles, they make energy in most cells, our cells for instance, in human cells. In human cells there's a nucleus, which stores DNA, which is where all the genetic information is stored."


A cell that secretes antibodies which are like harpoons or missiles that it sends out to kill a pathogen?

(Images by by envandrare and OpenClipart-Vectors from Pixabay)


Immune cells

Rutherford moved the conversation onto the immune system, prompting 'Sid' that "There's a lovely phrase you use to describe T cells, which is door to door wanderers that can detect even the whiff of an invader". Dr. Mukherjee distinguished between the cells of the innate immune system,

"Those are usually the first responder cells. In humans they would be macrophages, and neutrophils and monocytes among them. These cells usually rush to the site of an injury, or an infection, and they try to kill the pathogen, or seal up the pathogen…"

and the cells of the adaptive system, such as B cells and T cells,

"The B cell is a cell that eventually becomes a plasma cell which secretes antibodies. Antibodies, they are like harpoons or missiles which the cell sends out to kill a pathogen…

[A T cell] goes around sniffing other cells, basically touching them and trying to find out whether they have been altered in some way, particularly if they are carrying inside them a virus or any other kind of pathogen, and if it finds this pathogen or a virus in your body, it is going to go and kill that virus or pathogen"


A cell that goes around sniffing other cells, touching them? 1
(Images by allinonemovie and OpenClipart-Vectors from Pixabay)

Cells of the heart

Another topic was the work of Professor Harding on the heart. She informed listeners that heart cells do not get replaced very quickly, so that typically, when a person dies, half of their heart cells have been there since birth! (That was something I had not realised. It is believed that this is related to how heart cells need to pulse in synchrony so that the whole organ functions as an effective pumping device – making long-lasting cells that seldom need replacing more important than in many other tissues.)

At least, this relates to the cardiomyocytes – the cells that pulse when the heart beats (a pulse that can now be observed in single cells in vitro). Professor Harding described how in the heart tissue there are also other 'supporting' cells, such as "resident macrophages" (immune cells), as well as other cells moving around the cardiomyocytes. She described her observations of the cells in Petri dishes,

"When you look at them in the dish it's incredible to see them interact. I've got a… video [of] cardiomyocytes in a dish. The cardiomyocytes pretty much just stay there and beat and don't do anything very much, and I had this on time lapse, and you could see cells moving around them. And so, in one case, the cell (I think it was a fibroblast, it looked like a fibroblast), it came and it palpated at the cardiomyocyte, and it nipped off bits of it, it sampled bits of the cardiomyocyte, and it just stroked it all the way round, and then it was, it seemed to like it a lot.

[In] another dish I had the same sort of cardiomyocyte, a very similar cell came in, it went up to the cardiomyocyte, it touched it, and as soon as it touched it, I can only describe it as it reared up and it had, little blobs appeared all over its surface, and it rushed off, literally rushed off, although it was time lapse so it was two minutes over 24 hours, so, it literally rushed off, so what had it found, why did one like it and the other one didn't?"

Making the unfamiliar, familiar

The snippets from the broadcast that I have reported above demonstrate a wide range of ways that the unfamiliar is made familiar by describing it in terms that a listener can relate to through their existing prior knowledge and experience. In these various examples the listener is left to carry across from the familiar analogue (the city, the Lego bricks, human interactions, etc.) those features that parallel features of the target concept – the cell. So, for example, the listener is assumed to appreciate that cells, unlike Lego bricks, are not joined by rigid, raised studs that fit precisely into depressions on the next brick/cell. 2

Analogies with the familiar

Hooke's original label of the cell was based on a kind of analogy – an attempt to compare what he was seeing with something familiar: "pores, or cells…a great many little Boxes". He used the familiar simile of the honeycomb (something directly familiar to many more people in the seventeenth century, when food was not subject to large-scale industrialised processing and packaging).

Other analogies, metaphors and similes abound. Cells are visually like "gloopy balloons", but functionally are "building blocks" (strictly a metaphor, albeit one that is used so often it has become treated as though a literal description) which can be conceptualised as being put together "like you would build a Lego kit" (a simile), although they are neither fixed, discrete blocks of a single material, nor organised by some external builder. They can be considered conceptually as the "smallest unit of life" (though philosophers argue about such descriptions and what counts as an individual in living systems).

The machine description ("fantastical living machines") reflects one metaphor very common in early modern science, and cells as "incredible cities" is also a metaphor. Whether cells are literally machines is a matter of how we extend or limit our definition of machines: cells are certainly not actually cities, however, and calling them such is a way of drawing attention to the level of activity within each (often, apparently from observation, quite static) cell. B cells secrete antibodies, which the listener is told are like (a simile) harpoons or missiles – weapons.

Skeletons of the dead

Whether "the cell has its own skeleton" is a literal or metaphorical statement is arguable. It surely would have originally been a metaphoric description – there are structures in the cell which can be considered analogous to the skeleton of an organism. If such a metaphor is used widely enough, in time the term's scope expands to include its new use – and it becomes (what is called, metaphorically) a 'dead metaphor'.

Telling stories about cells

A narrative is used to help a listener imagine the cell at the scale of "a spacecraft". This is "organised into rooms and there are byways and channels" offering an analogy for the complex internal structure of a cell. Most people have never actually boarded a spacecraft, but they are ubiquitous in television and movie fiction, so a listener can certainly imagine what this might be like.


Endoplasmic reticulum? (Still from Star Trek: The Motion Picture, Paramount Pictures, 1979)

Oversimplification?

The discussion of organelles illustrates how simplifications have to be made when introducing complex material. This always brings with it dangers of oversimplification that may impede further learning, or even encourage the development of alternative conceptions. So, the nucleus does not, strictly, 'store' "all the genetic information" in a cell (mitochondria carry their own genes for example).

More seriously, perhaps, mitochondria do not "make energy". 'More seriously' as the principle of conservation of energy is one of the most basic tenets of modern science and is considered a very strong candidate for a universal law. Children are often taught in school that energy cannot be created or destroyed. Science communication which is contrary to this basic curriculum science could confuse learners – or indeed members of the public seeking to understand debates about energy policy and sustainability.

Anthropomorphising cells

Cells are not only compared to inanimate entities like balloons, building bricks, cities and spaceships. They are also described in ways that make them seem like sentient agents – agents that have experiences, and conscious intentions, just as people do. So, some immune cells are metaphorical 'first responders', and just like emergency service workers they "rush to the site" of an incident. To rush is not just to move quickly, but to deliberately do so. (By contrast, Paul McAuley refers to "innocent" amoeboid cells that collectively form into the plasmodium of a slime mould spending most of their lives "bumbling around by themselves" before they "get together".) The immune cells act deliberately – they "try" to kill. Other immune cells "send out" metaphorical 'missiles' "to kill a pathogen". Again this language suggests deliberate action (i.e., to send out) and purpose.

That is, what is described is not just some evolved process, but something teleological: there is a purpose to sending out antibodies – it is a deliberate act with an aim in mind. This type of language is very common in biology – even referring to the 'function' of the heart or kidney or a reflex arc could be considered as misinterpreting the outcome of evolutionary developments. (The heart pumps blood through the vascular system, but referring to a function could suggest some sense of deliberate design.)

Not all cells are equal

I wonder how many readers noticed the reference above to 'supporting' cells in the heart. Professor Harding had said

"When you look inside the [heart] tissue there are many other cells [than cardiomyocytes] that are in there, supporting it, there are resident macrophages, I think we still don't know really what they are doing in there"

Why should some heart cells be seen as more important and others less so? Presumably because 'the function' of a heart is to beat, to pump, so clearly the cells that pulse are the stars, and the other cells that may be necessary but are not obviously pulsing just a supporting cast. (So, cardiomyocytes are considered heart cells, but macrophages in the same tissue are only cells that are found in the heart, "residents" – to use an analogy of my own, like migrants that have not been offered citizenship!)3

That is, there is a danger here that this way of thinking could bias research foci, leading researchers to ignore something that may ultimately prove important. This is not fanciful, as it has happened before, in the case of the brain:

"Glial cells, consisting of microglia, astrocytes, and oligodendrocyte lineage cells as their major components, constitute a large fraction of the mammalian brain. Originally considered as purely non-functional glue for neurons, decades of research have highlighted the importance as well as further functions of glial cells."

Jäkel and Dimou, 2017

The lives of cells

Narrative is used again in relation to the immune cells: an infection is presented as a kind of emergency event which is addressed by special (human like) workers who protect the body by repelling or neutralising invaders. "Sniffing" is surely an anthropomorphic metaphor, as cells do not actually sniff (they may detect diffusing substances, but do not actively inhale them). Even "touching" is surely an anthropomorphism. When we say two objects are 'touching' we mean they are in contact, as we touch things by contact. But touching is sensing, not simply adjacency.

If that seems to be stretching my argument too far, to refer to immune cells "trying to find out…" is to use language suggesting an epistemic agent that can not only behave deliberately, but which is able to acquire knowledge. A cell can only "find" an infectious agent if it is (i.e., deliberately) looking for something. These metaphors are very effective in building up a narrative for the listener. Such a narrative adopts familiar 'schemata', recognisable patterns – the listener is aware of emergency workers speeding to the scene of an incident and trying to put out a fire or seeking to diagnose a medical issue. By fitting new information into a pattern that is familiar to the audience, technical and abstract ideas are not only made easier to understand, but more likely to be recalled later.

Again, an anthropomorphic narrative is used to describe interactions between heart cells. So, a fibroblast that "palpates at" a cardiomyocyte seems to be displaying deliberate behaviour: if "nipping" might be heard as some kind of automatic action – "sampling" and "stroking" surely seem to be deliberate behaviour. A cell that "came in, it went up [to another]" seems to be acting deliberately. "Rearing up" certainly brings to mind a sentient being, like a dog or a horse. Did the cell actually 'rear up'? It clearly gave that impression to Professor Harding – that was the best way, indeed the "only" way, she had to communicate what she saw.

Again we have cells "rushing" around. Or do we? The cell that had reared up "rushed off". Actually, it appeared to 'rush' when the highly magnified footage was played at 720 times the speed of the actual events. Despite acknowledging this extreme acceleration of the activity, the impression was so strong that Professor Harding felt justified in claiming the cell "literally rushed off, although it was time lapse so it was two minutes over 24 hours, so, it literally rushed off…". Whatever it did that looked like rushing with the distortion of time-lapse viewing, it certainly did not literally rush anywhere.

But the narrative helps motivate a very interesting question, which is why the two superficially similar cells 'behaved' ('reacted', 'responded' – it is actually difficult to find completely neutral language) so differently when in contact with a cardiomyocyte. In more anthropomorphic terms: what had these cells "found, why did one like it and the other one didn't?"

Literally speaking?

Metaphorical language is ubiquitous as we have to build all our abstract ideas (and science has plenty of those) in terms of what we can experience and make sense of. This is an iterative process. We start with what is immediately available in experience, extend metaphorically to form new concepts, and in time, once those have "settled in" and "taken root" and "firmed up" (so to speak!) they can then be themselves borrowed as the foundation for new concepts. This is true both in how the individual learns (according to constructivism) and how humanity has developed culture and extended language.

So, should science communicators (whether scientists themselves, journalists or teachers) try to limit themselves to literal language?

Even if this were possible, it would put aside some of our strongest tools for 'making the unfamiliar familiar' (to broadcast audiences, to the public, to learners in formal education). However, these devices also bring the risk that the initial presentations (with their simplifications and metaphors and analogies and anthropomorphic narratives…) not only engage listeners but also come to be understood as the scientific account. That this is not an imagined risk is shown by the vast numbers of learners who think atoms want to fill their shells with octets of electrons, and so act accordingly – and who think this because they believe it is what they have been taught.

Does it matter if listeners think the simplification, the analogy, the metaphor, the humanising story,… is the scientific account? Perhaps usually not in the case of the audience listening to a radio show or watching a documentary out of interest.

In education it does matter, as learners are often expected to progress beyond these introductory accounts in their thinking, and teachers' models and metaphors and stories are only meant as a starting point in building up a formal understanding. The teacher has to first establish some kind of anchor point in the students' existing understandings and experiences, but then mould this towards the target knowledge set out in the curriculum (which is often a simplified account of canonical knowledge) before the metaphor or image or story becomes firmed-up in the learners' minds as 'the' scientific account.

'Building the Body, Opening the Heart' was a good listen, and a very informative and entertaining episode that covered a lot of ideas. It certainly included some good comparisons that science teachers might borrow. But I think in a formal educational context a science teacher would need to be more circumspect about throwing some of these metaphors out there without then doing some work to transition from them to more technical, literal, and canonical accounts.


Read about science analogies

Read about science metaphors

Read about science similes

Read about anthropomorphism

Read about teleology


Work cited:


Notes:

1 The right hand image portrays a mine, a weapon used at sea to damage and destroy vessels (surface ships or submarines). The mine is also triggered by contact ('touch').


2 That is, in an analogy there are positive and negative aspects: there are ways in which the analogue IS like the target, and ways in which the analogue is NOT like the target. Using an analogy in communication relies on the right features being mapped from the familiar analogue to the unfamiliar target being introduced. In teaching it is important to be explicit about this, or inappropriate transfers may be made: e.g., the atom is a tiny solar system so it is held together by gravity (Taber, 2013).


3 It may be a pure coincidence in relation to the choice of term 'resident' here, but in medicine 'residents' have not yet fully qualified as specialist physicians or surgeons, and so are on placement and/or under supervision, rather than having permanent status in a hospital faculty.


Falsifying research conclusions

You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.


Keith S. Taber


Li and colleagues claim that their innovation is successful in improving teaching quality and student learning: but their own data analysis does not support this.

I recently read a research study to evaluate a teaching innovation where the authors

  • presented their results,
  • reported the statistical test they had used to analyse their results,
  • acknowledged that the outcome of their experiment was negative (not statistically significant), then
  • stated their findings as having obtained a positive outcome, and
  • concluded their paper by arguing they had demonstrated their teaching innovation was effective.

Li, Ouyang, Xu and Zhang's (2022) paper in the Journal of Chemical Education contravenes the scientific norm that your conclusions should be consistent with the outcome of your data analysis.
(Magnified portions of this scheme are presented below)

And this was not in a paper in one of those predatory journals that I have criticised so often here – this was a study in a well regarded journal published by a learned scientific society!

The legal analogy

I have suggested (Taber, 2013) that writing up research can be understood in terms of a number of metaphoric roles: researchers need to

  • tell the story of their research;
  • teach readers about the unfamiliar aspects of their work;
  • make a case for the knowledge claims they make.

Three metaphors for writing-up research

All three aspects are important in making a paper accessible and useful to readers, but arguably the most important aspect is the 'legal' analogy: a research paper is an argument to make a claim for new public knowledge. A paper that does not make its case does not add anything of substance to the literature.

Imagine a criminal case where the prosecution seeks to make its argument at a pre-trial hearing:

"The police found fingerprints and D.N.A. evidence at the scene, which they believe were from the accused."

"Were these traces sent for forensic analysis?"

"Of course. The laboratory undertook the standard tests to identify who left these traces."

"And what did these analyses reveal?"

"Well according to the current standards that are widely accepted in the field, the laboratory was unable to find a definite match between the material collected at the scene, and fingerprints and a D.N.A. sample provided by the defendant."

"And what did the police conclude from these findings?"

"The police concluded that the fingerprints and D.N.A. evidence show that the accused was at the scene of the crime."

It seems unlikely that such a scenario has ever played out, at least in any democratic country where there is an independent judiciary, as the prosecution would be open to ridicule and it is quite likely the judge would have some comments about wasting court time. What would seem even more remarkable, however, would be if the judge decided on the basis of this presentation that there was a prima facie case to answer that should proceed to a full jury trial.

Yet in educational research, it seems parallel logic can be persuasive enough to get a paper published in a good peer-reviewed journal.

Testing an educational innovation

The paper was entitled 'Implementation of the Student-Centered Team-Based Learning Teaching Method in a Medicinal Chemistry Curriculum' (Li, Ouyang, Xu & Zhang, 2022), and it was published in the Journal of Chemical Education. 'J.Chem.Ed.' is a well-established, highly respected periodical that takes peer review seriously. It is published by a learned scientific society – the American Chemical Society.

That a study published in such a prestige outlet should have such a serious and obvious flaw is worrying. Of course, no matter how good editorial and peer review standards are, it is inevitable that sometimes work with serious flaws will get published, and it is easy to pick out the odd problematic paper and ignore the vast majority of quality work being published. But, I did think this was a blatant problem that should have been spotted.

Indeed, because I have a lot of respect for the Journal of Chemical Education I decided not to blog about it ("but that is what you are doing…?"; yes, but stick with me) and to take time to write a detailed letter to the journal setting out the problem in the hope this would be acknowledged and the published paper would not stand unchallenged in the literature. The journal declined to publish my letter although the referees seemed to generally accept the critique. This suggests to me that this was not just an isolated case of something slipping through – but a failure to appreciate the need for robust scientific standards in publishing educational research.

Read the letter submitted to the Journal of Chemical Education

A flawed paper does not imply worthless research

I am certainly not suggesting that there is no merit in Li, Ouyang, Xu and Zhang's work. Nor am I arguing that their work was not worth publishing in the journal. My argument is that Li and colleague's paper draws an invalid conclusion, and makes misleading statements inconsistent with the research data presented, and that it should not have been published in this form. These problems are pretty obvious, and should (I felt) have been spotted in peer review. The authors should have been asked to address these issues, and follow normal scientific standards and norms such that their conclusions follow from, rather than contradict, their results.

That is my take. Please read my reasoning below (and the original study if you have access to J.Chem.Ed.) and make up your own mind.

Li, Ouyang, Xu and Zhang report an innovation in a university course. They consider this to have been a successful innovation, and it may well have great merits. The core problem is that Li and colleagues claim that their innovation is successful in improving teaching quality and student learning: when their own data analysis does not support this.

The evidence for a successful innovation

There is much material in the paper on the nature of the innovation, and there is evidence about student responses to it. Here, I am only concerned with the failure of the paper to offer a logical chain of argument to support their knowledge claim that the teaching innovation improved student achievement.

There are (to my reading – please judge for yourself if you can access the paper) some slight ambiguities in some parts of the description of the collection and analysis of achievement data (see note 5 below), but the key indicator relied on by Li, Ouyang, Xu and Zhang is the average score achieved by students in four teaching groups, three of which experienced the teaching innovation (these are denoted collectively as 'the experimental group') and one which did not (denoted as 'the control group', although there is no control of variables in the study 1). Each class comprised 40 students.

The study is not published open access, so I cannot reproduce the copyright figures from the paper here, but below I have drawn a graph of these key data:


Key results from Li et al, 2022: this data was the basis for claiming an effective teaching innovation.


It is on the basis of this set of results that Li and colleagues claim that "the average score showed a constant upward trend, and a steady increase was found". Surely, anyone interrogating these data might pause to wonder whether that is the most authentic description of the pattern of scores year on year.

Does anyone teaching in a university really think that assessment methods are good enough to produce average class scores that are meaningful to 3 or 4 significant figures? To a more reasonable level of precision, the nearest percentage point (which is presumably what these numbers are – that is not made explicit), the results were:


Cohort | Average class score
2017   | 80
2018   | 80
2019   | 80
2020   | 80
Average class scores (2 s.f.) year on year

When presented to a realistic level of precision, the obvious pattern is…no substantive change year on year!
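To make the point about precision concrete: taking the 2017 average of 79.8 reported in the paper, and the reported increases of 0.11, 0.32 and 0.54 points in subsequent years, a few lines of Python rounding each cohort average to two significant figures show the cohorts to be indistinguishable. (A minimal sketch; the rounding helper is my own, not from the paper.)

```python
# Cohort averages implied by Li et al. (2022): 79.8 in 2017, plus the
# reported year-on-year increases of 0.11, 0.32 and 0.54 points
scores = {2017: 79.80, 2018: 79.91, 2019: 80.12, 2020: 80.34}

def to_sig_figs(x, n=2):
    """Round x to n significant figures."""
    from math import floor, log10
    digits = n - 1 - floor(log10(abs(x)))
    return round(x, digits)

rounded = {year: to_sig_figs(score) for year, score in scores.items()}
print(rounded)  # every cohort rounds to 80.0
```

At two significant figures – a generous precision for aggregated course scores – every cohort average is simply 80.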

A truncated graph

In their paper, Li and colleagues do present a graph comparing the average results in 2017 with (not 2018, but) 2019 and 2020 – somewhat similar to the one I have drawn here – which should have made it very clear how little the scores varied between cohorts. However, Li and colleagues did not include on their axis the full range of possible scores, but only a small portion of that range – from 79.4 to 80.4.

This is a perfectly valid procedure often used in science, and it is quite explicitly done (the axis is clearly marked), but it gives a visual impression of a large spread of scores which could be quite misleading. In effect, their Figure 4b includes just a sliver of my graph above, as shown below. If one takes the portion of the image below that is not greyed out, and stretches it to cover the full extent of a graph's axis, that is what is presented in the published account.


In the paper in J.Chem.Ed., Li and colleagues (2022) truncate the scale on their average score axis to expand 1% of the full range (approximated above in the area not shaded over) into a whole graph as their Figure 4b. This gives a visual impression of widely varying scores (to anyone who does not read the axis labels).
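The visual effect of truncating the axis is easy to quantify. Using the axis limits given above (79.4 to 80.4) against the full 0-100 score range, this short sketch shows that only 1% of the possible range is displayed, so the largest reported year-on-year rise (0.54 points) fills over half of the truncated axis while it would be barely visible on a full axis:

```python
# Axis limits of Li et al.'s Figure 4b (as given in the text) vs the full score range
axis_lo, axis_hi = 79.4, 80.4
full_lo, full_hi = 0.0, 100.0

shown = (axis_hi - axis_lo) / (full_hi - full_lo)
print(f"Fraction of the full score range displayed: {shown:.0%}")  # 1%

# The largest reported rise in average score (0.54 points, 2017 to 2020):
rise = 0.54
print(f"Height of that rise on the truncated axis: {rise / (axis_hi - axis_lo):.0%}")  # 54%
print(f"Height of that rise on a full 0-100 axis: {rise / (full_hi - full_lo):.2%}")  # 0.54%
```

The same 0.54-point change is magnified a hundredfold by the choice of axis.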


What might have caused those small variations?

If anyone does think that differences of a few tenths of a percent in average class scores are notable, and that this demonstrates increasing student achievement, then we might ask what causes this?

Li and colleagues seem to be convinced that the change in teaching approach caused the (very modest) increase in scores year on year. That would be possible. (Indeed, Li et al seem to be arguing that the very, very modest shift from 2017 to subsequent years was due to the change of teaching approach; but the not-quite-so-modest shifts from 2018 to 2019 to 2020 are due to developing teacher competence!) However, drawing that conclusion requires making a ceteris paribus assumption: that all other things are equal. That is, that any other relevant variables have been controlled.

Read about confounding variables

Another possibility however is simply that each year the teaching team are more familiar with the science, and have had more experience teaching it to groups at this level. That is quite reasonable and could explain why there might be a modest increase in student outcomes on a course year on year.

Non-equivalent groups of students?

However, a big assumption here is that each of the year groups can be considered to be intrinsically the same at the start of the course (and to have equivalent relevant experiences outside the focal course during the programme). Often in quasi-experimental studies (where randomisation to conditions is not possible 1) a pre-test is used to check for equivalence prior to the innovation: after all, if students are starting from different levels of background knowledge and understanding then they are likely to score differently at the end of a course – and no further explanation of any measured differences in course achievement need be sought.

Read about testing for initial equivalence

In experiments, you randomly assign the units of analysis (e.g., students) to the conditions, which gives some basis for at least comparing any differences in outcomes with the variations likely by chance. But this was not a true experiment as there was no randomisation – the comparisons are between successive year groups.

In Li and colleagues' study, the 40 students taking the class in 2017 are implicitly assumed equivalent to the 40 students taking the class in each of the years 2018-2020: but no evidence is presented to support this assumption. 3

Yet anyone who has taught the same course over a period of time knows that even when a course is unchanged and the entrance requirements stable, there are naturally variations from one year to the next. That is one of the challenges of educational research (Taber, 2019): you never can "take two identical students…two identical classes…two identical teachers…two identical institutions".

Novelty or expectation effects?

We would also have to ignore any difference introduced by the general effect of there being an innovation beyond the nature of the specific innovation (Taber, 2019). That is, students might be more attentive and motivated simply because this course does things differently to their other current courses and past courses. (Perhaps not, but it cannot be ruled out.)

The researchers are likely enthusiastic for, and had high expectations for, the innovation (so high that it seems to have biased their interpretation of the data and blinded them to the obvious problems with their argument) and much research shows that high expectation, in its own right, often influences outcomes.

Read about expectancy effects in studies

Equivalent examination questions and marking?

We also have to assume the assessment was entirely equivalent across the four years. 4 The scores were based on aggregating a number of components:

"The course score was calculated on a percentage basis: attendance (5%), preclass preview (10%), in-class group presentation (10%), postclass mind map (5%), unit tests (10%), midterm examination (20%), and final examination (40%)."

Li, et al, 2022, p.1858

This raises questions about the marking and the examinations:

  • Are the same test and examination questions used each year (that is not usually the case as students can acquire copies of past papers)?
  • If not, how were these instruments standardised to ensure they were not more difficult in some years than others?
  • How reliable is the marking? (Reliable meaning the same scores/mark would be assigned to the same work on a different occasion.)

These various issues do not appear to have been considered.

Change of assessment methodology?

The description above of how the students' course scores were calculated raises another problem. The 2017 cohort were taught by "direct instruction". This is not explained, as the authors presumably think we all know exactly what that is: I imagine lectures. By comparison, in the innovation (2018-2020 cohorts):

"The preclass stage of the SCTBL strategy is the distribution of the group preview task; each student in the group is responsible for a task point. The completion of the preview task stimulates students' learning motivation. The in-class stage is a team presentation (typically PowerPoint (PPT)), which promotes students' understanding of knowledge points. The postclass stage is the assignment of team homework and consolidation of knowledge points using a mind map. Mind maps allow an orderly sorting and summarization of the knowledge gathered in the class; they are conducive to connecting knowledge systems and play an important role in consolidating class knowledge."

Li, et al, 2022, p.1856, emphasis added.

Now the assessment of the preview tasks, the in-class group presentations, and the mind maps all contributed to the overall student scores (10%, 10%, 5% respectively). But these are parts of the innovative teaching strategy – they are (presumably) not part of 'direct instruction'. So, the description of how the student class scores were derived only applies to 2018-2020, and the methodology used in 2017 must have been different. (This is not discussed in the paper.) 5

A quarter of the score for the 'experimental' groups came from assessment components that could not have been part of the assessment regime applied to the 2017 cohort. At the very least, the tests and examinations must have been more heavily weighted in the 'control' group students' overall scores. This makes it very unlikely that scores can be meaningfully compared directly between 2017 and subsequent years: if the authors think otherwise, they should have presented persuasive evidence of equivalence.


Li and colleagues want to convince us that variations in average course scores can be assumed to be due to a change in teaching approach – even though there are other confounding variables.

So, groups that we cannot assume are equivalent are assessed in ways that we cannot assume to be equivalent and obtain nearly identical average levels of achievement. Despite that, Li and colleagues want to persuade us that the very modest differences in average scores between the 'control' and 'experimental' groups (which is actually larger between different 'experimental group' cohorts than between the 'control' group and the successive 'experimental' cohort) are large enough to be significant and demonstrate their teaching innovation improves student achievement.

Statistical inference

So, even if we thought shifts of less than a 1% average in class achievement were telling, there are no good reasons to assume they are down to the innovation rather than some other factor. But Li and colleagues use statistical tests to tell them whether differences between the 'control' and 'experimental' conditions are significant. They find – just what anyone looking at the graph above would expect – "there is no significant difference in average score" (p.1860).

The scientific convention in using such tests is that the choice of test, and significance level (e.g., a probability of p<0.05 to be taken as significant), is determined in advance, and the researchers accept the outcomes of the analysis. There is a kind of contract involved – a decision to use a statistical test (chosen in advance as being a valid way of deciding the outcome of an experiment) is seen as a commitment to accept its outcomes. 2 This is a form of honesty in scientific work. Just as it is not acceptable to fabricate data, nor is it acceptable to ignore experimental outcomes when drawing conclusions from research.

Special pleading is allowed in mitigation (e.g., "although our results were non-significant, we think this was due to the small sample sizes, and suggest that further research should be undertaken with larger groups {and we are happy to do this if someone gives us a grant}"), but the scientist is not allowed to simply set aside the results of the analysis.
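Applying the conventional decision rule to the t statistics the authors themselves report (quoted in note 5 below) shows just how far each comparison falls short of significance – a minimal sketch:

```python
# t statistics and critical value as reported by Li et al. (2022):
# t1 = 0.0663, t2 = 0.1930, t3 = 0.3279, against the paper's t_alpha = 2.024
t_values = {"2017 vs 2018": 0.0663, "2017 vs 2019": 0.1930, "2017 vs 2020": 0.3279}
t_critical = 2.024  # the paper's critical value for p < 0.05 (two-tailed)

decisions = {}
for comparison, t in t_values.items():
    # Conventional decision rule: reject the null hypothesis only if |t| > t_critical
    decisions[comparison] = "significant" if abs(t) > t_critical else "not significant"
    print(f"{comparison}: t = {t:.4f} -> {decisions[comparison]}")
```

Every reported statistic is an order of magnitude below the critical value: by the authors' own chosen test, there is no difference here to explain.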


Li and colleagues found no significant difference between the two conditions, yet that did not stop them claiming, and the Journal of Chemical Education publishing, a conclusion that the new teaching approach improved student achievement!

Yet setting aside the results of their analysis is what Li and colleagues do. They carry out an analysis, then simply ignore the findings, and conclude the opposite:

"To conclude, our results suggest that the SCTBL method is an effective way to improve teaching quality and student achievement."

Li, et al, 2022, p.1861

It was this complete disregard of scientific values, rather than the more common failure to appreciate that they were not comparing like with like, that I found really shocking – and led to me writing a formal letter to the journal. Not so much surprise that researchers might do this (I know how intoxicating research can be, and how easy it is to become convinced in one's ideas) but that the peer reviewers for the Journal of Chemical Education did not make the firmest recommendation to the editor that this manuscript could NOT be published until it was corrected so that the conclusion was consistent with the findings.

This seems a very stark failure of peer review, and allows a paper to appear in the literature that presents a conclusion totally unsupported by the evidence available and the analysis undertaken. This also means that Li, Ouyang, Xu and Zhang now have a publication on their academic records that any careful reader can see is critically flawed – something that could have been avoided had peer reviewers:

  • used their common sense to appreciate that variations in class average scores from year to year between 79.8 and 80.3 could not possibly be seen as sufficient to indicate a difference in the effectiveness of teaching approaches;
  • recommended that the authors follow the usual scientific norms and adopt the reasonable scholarly value position that the conclusion of your research should follow from, and not contradict, the results of your data analysis.


Work cited:

Notes

1 Strictly the 2017 cohort has the role of a comparison group, but NOT a control group as there was no randomisation or control of variables, so this was not a true experiment (but a 'quasi-experiment'). However, for clarity, I am here using the original authors' term 'control group'.

Read about experimental research design


2 Some journals are now asking researchers to submit their research designs and protocols to peer review BEFORE starting the research. This prevents wasted effort on work that is flawed in design. Journals will publish a report of the research carried out according to an accepted design – as long as the researchers have kept to their research plans (or only made changes deemed necessary and acceptable by the journal). This prevents researchers seeking to change features of the research because it is not giving the expected findings and means that negative results as well as positive results do get published.


3 'Implicitly' assumed as nowhere do the authors state that they think the classes all start as equivalent – but if they do not assume this then their argument has no logic.

Without this assumption, their argument is like claiming that growing conditions for tree development are better at the front of a house than at the back because on average the trees at the front are taller – even though fast-growing mature trees were planted at the front and slow-growing saplings at the back.


4 From my days working with new teachers, a common rookie mistake was assuming that one could tell a teaching innovation was successful because students achieved an average score of 63% on the (say, acids) module taught by the new method when the same class only averaged 46% on the previous (say, electromagnetism) module. Graduate scientists would look at me with genuine surprise when I asked how they knew the two tests were of comparable difficulty!

Read about why natural scientists tend to make poor social scientists


5 In my (rejected) letter to the Journal of Chemical Education I acknowledged some ambiguity in the paper's discussion of the results. Li and colleagues write:

"The average scores of undergraduates majoring in pharmaceutical engineering in the control group and the experimental group were calculated, and the results are shown in Figure 4b. Statistical significance testing was conducted on the exam scores year to year. The average score for the pharmaceutical engineering class was 79.8 points in 2017 (control group). When SCTBL was implemented for the first time in 2018, there was a slight improvement in the average score (i.e., an increase of 0.11 points, not shown in Figure 4b). However, by 2019 and 2020, the average score increased by 0.32 points and 0.54 points, respectively, with an obvious improvement trend. We used a t test to test whether the SCTBL method can create any significant difference in grades among control groups and the experimental group. The calculation results are shown as follows: t1 = 0.0663, t2 = 0.1930, t3 =0.3279 (t1 <t2 <t3 <t𝛼, t𝛼 =2.024, p>0.05), indicating that there is no significant difference in average score. After three years of continuous implementation of SCTBL, the average score showed a constant upward trend, and a steady increase was found. The SCTBL method brought about improvement in the class average, which provides evidence for its effectiveness in medicinal chemistry."

Li, et al, 2022, p.1858-1860, emphasis added

This appears to refer to three distinct measures:

  • average scores (produced by weighed summations of various assessment components as discussed above)
  • exam scores (perhaps just the "midterm examination…and final examination", or perhaps just the final examination?)
  • grades

Formal grades are not discussed in the paper (the word is only used in this one place), although the authors do refer to categorising students into descriptive classes ('levels') according to scores on 'assessments', and may see these as grades:

"Assessments have been divided into five levels: disqualified (below 60), qualified (60-69), medium (70-79), good (80-89), and excellent (90 and above)."

Li, et al, 2022, p.1856, emphasis added

In the longer extract above, the reference to testing difference in "grades" is followed by reporting the outcome of the test for "average score":

"We used a t test to test …grades …The calculation results … there is no significant difference in average score"

As Student's t-test was used, it seems unlikely that the assignment of students to grades could have been tested. That would surely have needed something like the Chi-squared statistic to test categorical data – looking for an association between (i) the distributions of the number of students in the different cells 'disqualified', 'qualified', 'medium', 'good' and 'excellent'; and (ii) treatment group.

Presumably, then, the statistical testing was applied to the average course scores shown in the graph above. This also makes sense because the classification into descriptive classes loses some of the detail in the data and there is no obvious reason why the researchers would deliberately choose to test 'reduced' data rather than the full data set with the greatest resolution.
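For readers unfamiliar with the distinction, a chi-squared test on grade bands would look quite different from a t test on average scores. A minimal sketch, using entirely hypothetical grade-band counts (NOT data from the paper), shows the shape of such an analysis for a 2 × 5 contingency table:

```python
# Hypothetical counts of students per grade band for two cohorts of 40
# (invented for illustration only – the paper does not report these data)
bands = ["disqualified", "qualified", "medium", "good", "excellent"]
control      = [2, 6, 12, 15, 5]
experimental = [1, 5, 11, 17, 6]

def chi_squared(row1, row2):
    """Pearson chi-squared statistic for a 2 x k contingency table."""
    n = sum(row1) + sum(row2)
    stat = 0.0
    for o1, o2 in zip(row1, row2):
        col = o1 + o2
        # Expected counts under the null hypothesis of no association
        e1 = col * sum(row1) / n
        e2 = col * sum(row2) / n
        stat += (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return stat

stat = chi_squared(control, experimental)
# Compare with the critical value 9.488 (df = (2-1)*(5-1) = 4, alpha = 0.05)
print(f"chi-squared = {stat:.3f}")
```

The point is simply that Student's t test compares means of continuous scores, whereas testing the assignment of students to categorical grade bands calls for a test of association such as this.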


Reflecting the population

Sampling an "exceedingly large number of students"


Keith S. Taber


the key to sampling a population is identifying a representative sample

Obtaining a representative sample of a population can be challenging
(Image by Gerd Altmann from Pixabay)


Many studies in education are 'about' an identified population (students taking A level Physics examinations; chemistry teachers in German secondary schools; children transferring from primary to secondary school in Scotland; undergraduates majoring in STEM subjects in Australia…).

Read about populations of interest in research

But, in practice, most studies only collect data from a sample of the population of interest.

Sampling the population

One of the key challenges in social research is sampling. Obtaining a sample is usually not that difficult. However, often the logic of research is something along the lines:

  • 1. Aim – to find out about a population.
  • 2. As it is impractical to collect data from the whole population, collect data from a sample.
  • 3. Analyse data collected from the sample.
  • 4. Draw inferences about the population from the analysis of data collected from the sample.

For example, if one wished to do research into the views of school teachers in England and there are, say, 600 000 of them, it is unlikely anyone could undertake research that collected and analysed data from all of them and produced results in a short enough period for the findings to still be valid (unless they were prepared to employ a research team of thousands!) But perhaps one could collect data from a sample that would be informative about the population.

This can be a reasonable approach (and, indeed, is a very common approach in research in areas like education) but relies on the assumption that what is true of the sample, can be generalised to the population.

That clearly depends on the sample being representative of the larger population (at least in those ways which are pertinent to the research).


When a study (as here in the figure an experiment) collects data from a sample drawn at random from a wider population, then the findings of the experiment can be assumed to apply (on average) to the population. (Figure from Taber, 2019.) In practice, unless a population of interest is quite modest in size (e.g., teachers in one school; post-graduate students in one university department; registered members of a society) it is usually simply not feasible to obtain a random sample.

For example, if we were interested in secondary school students in England, and we had a sample of secondary students from England that (a) reflected the age profile of the population; (b) reflected the gender profile of the population; but (c) were all drawn from one secondary school, this is unlikely to be a representative sample.

  • If we do have a representative sample, then the likely error in generalising from sample to population can be calculated (and can be reduced by having a larger sample);
  • If we do not have a representative sample, then there is no way of knowing how well the findings from the sample reflect the wider population and increasing sample size does not really help; and, for that matter,
  • If we do not know whether we have a representative sample, then, again, there is no way of knowing how well the findings from the sample reflect the wider population and increasing sample size does not really help.
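The first point above – that the likely error for a genuinely representative (random) sample can be calculated, and shrinks as the sample grows – can be illustrated with the standard textbook approximation for the margin of error of an estimated proportion. This is a general statistical sketch, not a calculation from any study discussed here:

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n (textbook approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The error bound shrinks with the square root of the sample size:
# quadrupling the sample only halves the likely error.
for n in (100, 400, 1600):
    print(n, margin_of_error(0.5, n))
```

Note that nothing in this formula rescues a biased sample: it quantifies random sampling error only, which is why it is meaningless for an unrepresentative sample however large.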

So, the key to sampling a population is identifying a representative sample.

Read about sampling a population

If we know that only a small number of factors are relevant to the research then we may (if we are able to characterise members of the population on these criteria) be able to design a sample which is representative based on those features which are important.

If the relevant factors for a study were teaching subject; years of teaching experience; teacher gender, then we would want to build a sample that fitted the population profile accordingly, so, maybe, 3% female maths teachers with 10+ years of teaching experience, et cetera. We would need suitable demographic information about the population to inform the building of the sample.

We can then randomly select from those members of the population with the right characteristics within the different 'cells'.
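As a toy illustration of designing such a sample, here is a sketch that computes the quota for each 'cell' from a population profile. The cells and percentages are entirely invented for illustration:

```python
# Hypothetical population profile: the share of teachers in each 'cell',
# characterised by (subject, gender, experience). Illustrative values only;
# a real design would need demographic data covering the whole population.
population_profile = {
    ("maths", "female", "10+ years"): 0.03,
    ("maths", "male", "10+ years"): 0.05,
    ("science", "female", "0-9 years"): 0.07,
}

sample_size = 200

# Each cell's quota in the designed sample mirrors its share of the population
quotas = {cell: round(share * sample_size) for cell, share in population_profile.items()}
print(quotas)  # e.g. a cell holding 3% of the population gets 6 of the 200 places
```

Members would then be randomly selected within each cell to fill its quota.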

However, if we do not know exactly what specific features might be relevant to characterise a population in a particular research project, the best we might be able to do is to employ a randomly chosen sample, which at least allows the measurement error to be estimated.

Labs for exceedingly large numbers of students

Leopold and Smith (2020) were interested in the use of collaborative group work in a "general chemistry, problem-based lab course" at a United States university, where students worked in fixed groups of three or four throughout the course. As well as using group work for more principled reasons, "group work is also utilized as a way to manage exceedingly large numbers of students and efficiently allocate limited time, space, and equipment" (p.1). They tell readers that

"the case we examine here is a general chemistry, problem-based lab course that enrols approximately 3500 students each academic year"

Leopold & Smith, 2020, p.5

Although they recognised a wide range of potential benefits of collaborative work, these depend upon students being able to work effectively in groups, which requires skills that cannot be taken for granted. Leopold and Smith report how structured support was put in place to help students diagnose impediments to the effective working of their groups – and they investigated this in their study.

The data collected was of two types. There was a course evaluation at the end of the year taken by all the students in the cohort, "795 students enrolled [in] the general chemistry I lab course during the spring 2019 semester" (p.7). However, they also collected data from a sample of student groups during the course, in terms of responses to group tasks designed to help them think about and develop their group work.

Population and sample

As the focus of their research was a specific course, the population of interest was the cohort of undergraduates taking the course. Given the large number of students involved, they collected qualitative data from a sample of the groups.

Units of analysis

The course evaluation questions sought individual learners' views so for that data the unit of analysis was the individual student. However, the groups were tasked with working as a group to improve their effectiveness in collaborative learning. So, in Leopold and Smith's sample of groups, the unit of analysis was the group. Some data was received from individual groups members, and other data were submitted as group responses: but the analysis was on the basis of responses from within the specific groups in the sample.

A stratified sample

Leopold and Smith explained that

"We applied a stratified random sampling scheme in order to account for variations across lab sections such as implementation fidelity and instructor approach so as to gain as representative a sample as possible. We stratified by individual instructors teaching the course which included undergraduate teaching assistants (TAs), graduate TAs, and teaching specialists. One student group from each instructor's lab sections was randomly selected. During spring 2019, we had 19 unique instructors teaching the course therefore we selected 19 groups, for a total of 76 students."

Leopold & Smith, 2020, p.7

The paper does not report how the random selection was made – that is, how it was decided which group would be selected for each instructor. As any competent scientist ought to be able to make a random selection quite easily in this situation, this is perhaps not a serious omission. I mention it because, sadly, not all authors who report having used randomisation can support this when asked how (Taber, 2013).
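One simple, auditable way such a stratified selection could be made (a sketch only – the group names and roster below are hypothetical, and the paper does not say how the authors actually did it) is to seed a random generator and pick one group per instructor:

```python
import random

# Hypothetical roster: each instructor (stratum) supervises several
# student groups; the names are illustrative only.
groups_by_instructor = {
    "instructor_01": ["A1", "A2", "A3", "A4"],
    "instructor_02": ["B1", "B2", "B3"],
    "instructor_03": ["C1", "C2", "C3", "C4", "C5"],
}

# A fixed seed means the selection can be reported and replicated if queried
rng = random.Random(2019)

# Stratified random selection: exactly one group drawn from each stratum
sample = {inst: rng.choice(groups) for inst, groups in groups_by_instructor.items()}
print(sample)
```

Recording the seed (or the drawn lots) is precisely the kind of detail that lets authors substantiate a claim of randomisation when asked.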

Was the sample representative?

Leopold and Smith found that, based on their sample, student groups could diagnose impediments to effective group working, and could often put in place effective strategies to increase their effectiveness.

We might wonder if the sample was representative of the wider population. If the groups were randomly selected in the way claimed, then one would expect this would probably be the case – only 'probably', as that is the best randomisation and statistics can do: we can never know for certain that a random sample is representative, only that it is unlikely to be especially unrepresentative!

The only way to know for sure that a sample is genuinely representative of the population of interest in relation to the specific focus of a study, would be to collect data from the whole population and check the sample data matches the population data.* But, of course, if it was feasible to collect data from everyone in the population, there would be no need to sample in the first place.

However, because the end of course evaluation was taken by all students in the cohort (the study population) Leopold and Smith were able to see if those students in the sample responded in ways that were generally in line with the population as a whole. The two figures reproduced here seem to suggest they did!


Figure 1 from Leopold & Smith, 2020, p.10, which is published with a Creative Commons Attribution (CC BY) license allowing reproduction.

Figure 2 from Leopold & Smith, 2020, p.10, which is published with a Creative Commons Attribution (CC BY) license allowing reproduction.

There is clearly a pretty good match here. However, it is important to not over-interpret this data. The questions in the evaluation related to the overall experience of group working, whereas the qualitative data analysed from the sample related to the more specific issues of diagnosing and addressing issues in the working of groups. These are related matters but not identical, and we cannot assume that the very strong similarity between sample and population outcomes in the survey demonstrates (or proves!) that the analysis of data from the sample is also so closely representative of what would have been obtained if all the groups had been included in the data collection.


 | Experiences of learning through group-work | Learning to work more effectively in groups
Sample | patterns in data closely reflected population responses | data only collected from a sample of groups
Population | all invited to provide feedback | [it seems reasonable to assume results from sample are likely to apply to the cohort as a whole]

The similarity of the feedback given by students in the sample of groups to the overall cohort responses suggests that the sample was broadly representative of the overall population in terms of developing group-work skills and practices

It might well have been, but we cannot know for sure. (* The only way to know for sure that a sample is genuinely representative of the population of interest in relation to the specific focus of a study, would be …)

However, the way the sample so strongly reflected the population in relation to the evaluation data, shows that in that (related if not identical) respect at least the sample is strongly representative, and that is very likely to give readers confidence in the sampling procedure used. If this had been my study I would have been pretty pleased with this, at least strongly suggestive, circumstantial evidence of the representativeness of the sampling of the student groups.


Work cited:

Didactic control conditions

Another ethically questionable science education experiment?


Keith S. Taber


This seems to be a rhetorical experiment where an educational treatment that is already known to be effective is 'tested' to demonstrate that it is more effective than suboptimal teaching – by asking a teacher to constrain her teaching to students assigned to an unethical comparison condition

one group of students were deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal so as to provide a low base line that would be outperformed by the intervention, simply to replicate a much demonstrated finding

In a scientific experiment, an intervention is made into the natural state of affairs to see if it produces a hypothesised change. A key idea in experimental research is control of variables: in the ideal experiment only one thing is changed. In the control condition all relevant variables are fixed so that there is a fair test between the experimental treatment and the control.

Although there are many published experimental studies in education, such research can rarely claim to have fully controlled all potentially relevant variables: there are (nearly always, always?) confounding factors that simply cannot be controlled.

Read about confounding variables

Experimental research in education, then, (nearly always, always?) requires some compromising of the pure experimental method.

Where those compromises are substantial, we might ask if experiment was the wrong choice of methodology: even if a good experiment is often the best way to test an idea, a bad experiment may be less informative than, for example, a good case study.

That is primarily a methodological matter, but testing educational innovations and using control conditions in educational studies also raises ethical issues. After all, an experiment means experimenting with real learners' educational experiences. This can certainly sometimes be justified – but there is (or should be) an ethical imperative:

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn.

If researchers want to test a genuinely innovative teaching approach or learning resource, then they have to be confident it has a reasonable chance of being effective before asking learners to participate in a study where they will be subjected to an untested teaching input.

It is equally the case that students assigned to a control condition should never be deliberately subjected to inferior teaching simply in order to help make a strong contrast with an experimental approach being tested. Yet, reading some studies leads to a strong impression that some researchers do seek to constrain teaching to a control group to help bias studies towards the innovation being tested (Taber, 2019). That is, such studies are not genuinely objective, open-minded investigations to test a hypothesis, but 'rhetorical' studies set up to confirm and demonstrate the researchers' prior assumptions. We might say these studies do not reflect true scientific values.


A general scheme for a 'rhetorical experiment'

Read about rhetorical experiments


I have raised this issue in the research literature (Taber, 2019), so when I read experimental studies in education I am minded to check to see that any control condition has been set up with a concern to ensure that the interests of all study participants (in both experimental and control conditions) have been properly considered.

Jigsaw cooperative learning in elementary science: physical and chemical changes

I was reading a study called "A jigsaw cooperative learning application in elementary science and technology lessons: physical and chemical changes" (Tarhan, Ayyıldız, Ogunc & Sesen, 2013) published in a respectable research journal (Research in Science & Technological Education).

Tarhan and colleagues adopted a common type of research design, and the journal referees and editor presumably were happy with the design of their study. However, I think the science education community should collectively be more critical about the setting up of control conditions which require students to be deliberately taught in ways that are considered to be less effective (Taber, 2019).


Jigsaw learning involves students working in co-operative groups, and in undertaking peer-teaching

Jigsaw learning is a pedagogic technique which can be seen as a constructivist, student-centred, dialogic, form of 'active learning'. It is based on collaborative groupwork and includes an element of peer-tutoring. In this paper the technique is described as "jigsaw cooperative learning", and the article authors explain that "cooperative learning is an active learning approach in which students work together in small groups to complete an assigned task" (p.185).

Read about jigsaw learning

Random assignment

The study used an experimental design, to compare learning outcomes when two classes were taught the same topic in two different ways. Many studies that compare two classes are problematic because whole extant classes are assigned to conditions, which means that the unit of analysis should be the class (experimental condition, n=1; control condition, n=1). Yet, despite this, such studies commonly analyse results as if each learner was an independent unit of analysis (e.g., experimental condition, n=c.30; control condition, n=c.30), which is necessary to obtain statistical results, but unfortunately means that inferences drawn from those statistics are invalid (Taber, 2019). Such studies offer examples of where there seems little point in doing an experiment badly, as the very design makes it intrinsically impossible to obtain a valid statistically significant outcome.


Experimental designs may be categorised as true experiments, quasi-experiments and natural experiments (Taber, 2019).

Tarhan and colleagues, however, randomly assigned the learners to the two conditions, so can genuinely claim that theirs is a true experiment: for their study, experimental condition, n=30; control condition, n=31.

Initial equivalence between groups

Assigning students in this way also helped ensure the two groups started from a similar base. Often such experimental studies use a pre-test to compare the groups before teaching. However, often the researchers merely check that the difference between the groups does not reach statistical significance (Taber, 2019). That is, if a statistical test shows p≥0.05 (in effect, the initial difference between the groups is not very unlikely to have occurred by chance) this is taken as evidence of equivalence. That is like saying we will consider two teachers to be of 'equivalent' height as long as there is no more than 30 cm difference in their height!

In effect

'not very different'

is being seen as a synonym for

'near enough the same'


Some analogies for how equivalence is determined in some studies: read about testing for initial equivalence
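To see why 'not significantly different' is weaker than 'equivalent', here is a small illustrative simulation (invented numbers, not anything from the study discussed): two classes are drawn from populations whose true mean scores genuinely differ by five marks, yet a permutation test on samples this small will frequently return p well above 0.05.

```python
import random

rng = random.Random(42)

# Two small classes drawn from populations whose true means really differ
# (60 vs 65 marks, same spread) – a real difference a small sample can miss.
class_a = [rng.gauss(60, 15) for _ in range(15)]
class_b = [rng.gauss(65, 15) for _ in range(15)]

observed = abs(sum(class_a) / len(class_a) - sum(class_b) / len(class_b))

# Permutation test: how often does randomly re-shuffling the class labels
# produce a difference in means at least as large as the one observed?
pooled = class_a + class_b
trials = 2000
count = 0
for _ in range(trials):
    rng.shuffle(pooled)
    diff = abs(sum(pooled[:15]) / 15 - sum(pooled[15:]) / 15)
    if diff >= observed:
        count += 1
p = count / trials
# A 'non-significant' p here would NOT show the classes are equivalent:
# we built them to differ; the test simply lacks power to detect it.
print(observed, p)
```

This is why a pre-test difference "likely to occur by chance 87% of the time" (as below) is far more reassuring than one that merely scrapes past p = 0.05.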

However, the pretest in Tarhan and colleagues' study found that the difference between two groups in performances on the pretest was at a level likely to occur by chance (not simply something more than 5%, but) 87% of the time. This is a much more convincing basis for seeing the two groups as initially similar.

So, there are two ways in which the Tarhan et al. study seemed better thought-through than many small scale experiments in teaching I have read.

Comparing two conditions

The research was carried out with "sixth grade students in a public elementary school in Izmir, Turkey" (p.184). The focus was learning about physical and chemical changes.

The experimental condition

At the outset of the study, the authors suggest it is already known that

  • "Jigsaw enhances cooperative learning" (p.185)
  • "Jigsaw promotes positive attitudes and interests, develops communication skills between students, and increases learning achievement in chemistry" (p.186)
  • "the jigsaw technique has the potential to improve students' attitude towards science"
  • development of "students' understanding of chemical equilibrium in a first year general chemistry course [was more successful] in the jigsaw class…than …in the individual learning class"

It seems the approach being tested was already demonstrated to be effective in a range of contexts. Based on the existing research, then, we could already expect well-implemented jigsaw learning to be effective in facilitating student learning.

Similarly, the authors tell the readers that the broader category of cooperative learning has been well established as successful,

"The benefits of cooperative learning have been well documented as being

higher academic achievement,

higher level of reasoning and critical thinking skills,

deeper understanding of learned material,

better attention and less disruptive behavior in class,

more motivation to learn and achieve,

positive attitudes to subject matter,

higher self-esteem and

higher social skills."

Tarhan et al., 2013, p.185

What is there not to like here? So, what was this highly effective teaching approach compared with?

What is being compared?

Tarhan and colleagues tell readers that:

"The experimental group was taught via jigsaw cooperative learning activities developed by the researchers and the control group was taught using the traditional science and technology curriculum."

Tarhan et al., 2013, p.189
A different curriculum?

This seems an unhelpful statement as it does not seem to compare like with like:


condition | curriculum | pedagogy
experimental | ? | jigsaw cooperative learning activities developed by the researchers
control | traditional science and technology curriculum | ?
A genuine experiment would look to control variables, so would not simultaneously vary both curriculum and pedagogy

The study uses a common test to compare learning in the two conditions, so the study only makes sense as an experimental test of jigsaw learning if the same curriculum is being followed in both conditions. Otherwise, there is no prima facie reason to think that the post-test is equally fair in testing what has been taught in the two conditions. 1

The control condition

The paper includes an account of the control condition which seems to make it clear that both groups were taught "the same content", which is helpful as to have done otherwise would have seriously undermined the study.

The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group.

Tarhan et al., 2013, p.194

So, it seems:


condition | curriculum | pedagogy
experimental | [by inference: "traditional science and technology curriculum"] | jigsaw cooperative learning activities developed by the researchers
control | traditional science and technology curriculum [the same content as for the experimental group to achieve the same learning objectives] | teacher-centred didactic lecture format: instructor explained the subject and asked questions
 | controlled variable | independent variable
An experiment relies on control of variables and would not simultaneously vary both curriculum and pedagogy

The statement is helpful, but might be considered ambiguous, as "this instruction included lectures, discussions and problem solving" seems to relate to what had been "taught via detailed instruction in the experimental group".

But this seems incongruent with the wider textual context. The experimental group were taught by a jigsaw learning technique – not lectures, discussions and problem solving. Yet, for that matter, the experimental group were not taught via 'detailed instruction' if this means the teacher presenting the curriculum content. So, this phrasing seems unhelpfully confusing (to me, at least – presumably, the journal referees and editor thought this was clear enough.)

So, this probably means the "lectures, discussions and problem solving" were part of the control condition where "the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes".

'Lectures' certainly fit with that description.

However, genuine 'discussion' work is a dialogic teaching method and would not seem to fit within a "teacher-centered didactic lecture format". But perhaps 'discussion' simply refers to how the "teacher used the blackboard and asked some questions" that members of the class were invited to answer?

Read about dialogic teaching

Writing-up research is a bit like teaching in that in presenting to a particular audience, one works with a mental model of what that audience already knows and understands, and how they use specific terms, and this model is never likely to be perfectly accurate:

  • when teaching, the learners tend to let you know this, whereas,
  • when writing, this kind of immediate feedback is lacking.

Similarly, problem-solving would not seem to fit within a "teacher-centered didactic lecture format". 'Problem-solving' engages high level cognitive and metacognitive skills because a 'problem' is a task that students are not able to respond to simply by recalling what they have been told and applying learnt algorithms. Problem-solving requires planning and applying strategies to test out ideas and synthesise knowledge. Yet teachers and textbooks commonly refer to questions that simply test recall and comprehension, or direct application of learnt techniques, as 'problems', when they are better understood as 'exercises' since they do not pose authentic problems.

The imprecise use of terms that may be understood differently across diverse contexts is characteristic of educational discourse, so Tarhan and colleagues may have simply used the labels that are normally applied in the context where they are working. It should also be noted that as the researchers are based in Turkey they are presumably finding the best English translations they can for the terms used locally.

Read about the challenges of translation in research writing

So, it seems we have:


Experimental condition | in one of the conditions? | Control condition
Jigsaw learning (set out in some detail in the paper) – an example of cooperative learning, an active learning approach in which students work together in small groups | detailed instruction? discussions (= teacher questioning?) problem solving? (= practice exercises?) | teacher-centred didactic lecture format … the teacher used the blackboard and asked some questions … a regular textbook … the instructor explained the subject, the students listened and took notes

The independent variable – teaching methodology

The teacher variable

One of the major problems with some educational experiments comparing different teaching approaches is the confound of the teacher. If

  • class A is taught through approach 'a' by teacher 1, and
  • class B is taught through approach 'b' by teacher 2

then even if there is a good case that class A and class B start off as 'equivalent' in terms of readiness to learn about the focal topic, any differences in study outcomes could be as much down to different teachers (and we all know that different teachers are not equivalent!) as to different teaching methodology.

At first sight this is easily solved by having the same teacher teach both classes (as in the study discussed here). That certainly seems to help. But, a little thought suggests it is not a foolproof approach (Taber, 2019).

Teachers inevitably have better rapport with some classes than others (even when those classes are shown to be technically 'equivalent') simply because that is the nature of how diverse personalities interact. 3 Even the most professional teachers find they prefer to teach some classes than others, enjoy the teaching more, and seem to get better results (even when the classes are supposed to be equivalent).

In an experiment, there is no reason why the teacher would work better with a class assigned the experimental condition; it might just as well be the control condition. However, this is still a confound and there is no obvious solution to it, except having multiple classes and teachers in each condition, such that the statistics can offer guidance on whether outcomes are sufficiently unlikely to be able to reasonably discount these types of effect.

Different teachers also have different styles, approaches and skill sets – so the same teacher will not be equally suited to every teaching approach and pedagogy. Again, this does not necessarily advantage the experimental condition, but, again, it is something that can only be addressed by having a diverse range of teachers in each condition (Taber, 2019).

So, although we might expect having the same teacher teach both classes is the preferred approach, the same teacher is not exactly the same teacher in different classes or teaching in different ways.

And what do participants expect will happen?

Moreover, expectancy effects can be very influential in education. Expecting something to work, or not work, has been shown to have real effects on outcomes. It may not be true, as some motivational gurus like to pretend, that we can all of us achieve anything if only we believe: but we are more likely to be successful when we believe we can succeed. When confident, we tend to be more motivated, less easily deterred, and (given the human capacity for perceiving with confirmation bias) more likely to judge we are making good progress. So, any research design which communicates to teachers and students (directly, or through the teacher's or researcher's enthusiasm) an expectation of success in some innovation is more likely to lead to success. This is a potential confound that is not even readily addressed by having large numbers of classes and teachers (Taber, 2019)!

Read about expectancy effects

The authors report that

Before implementation of the study, all students and their families were informed about the aims of the study and the privacy of their personal information. Permission for their children attend the study was obtained from all families.

Tarhan et al., 2013, p.194

This is as it should be. School children are not data-fodder for researchers, and they should always be asked for, and give, voluntary informed consent when recruited to join a research project. However, researchers need to be open and honest about their work, whilst also being careful about how they present their research aims. We can imagine a possible form of invitation,

We would like to invite you to be part of a study where some of you will be subject to traditional learning through a teacher-centred didactic lecture format where the teacher will give you notes and ask you questions, and some of you will learn by a different approach that has been shown to enhance learning, promote positive attitudes and interests, develop communication skills, increase achievement, support higher level of reasoning and critical thinking skills, lead to deeper understanding of learned material…

An honest, but unhelpful, briefing for students and parents

If this was how the researchers understood the background to their study, then this would be a fair and honest briefing. Yet, this would clearly set up strong expectations in the student groups!

A suitable teacher

Tarhan and colleagues report that

"A teacher experienced in active learning was trained in how to implement the instruction based on jigsaw cooperative learning. The teacher and researchers discussed the instructional plans before implementing the activities."

Tarhan et al., 2013, p.189

So, the teacher who taught both classes, using jigsaw cooperative learning in one class and a teacher-centred didactic lecture approach in the other, was "experienced in active learning". So, it seems that

  • the researchers were already convinced that active learning approaches were far superior to teaching via a lecture approach
  • the teacher had experience in teaching though more engaging, effective student-centred active learning approaches

despite this, a control condition was set up that required the teacher to, in effect, de-skill and teach in a way the researchers were well aware research suggested was inferior, for the sake of carrying out an experiment to demonstrate in a specific context what had already been well demonstrated elsewhere.

In other words, it seems that one group of students were deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal, so as to provide a low base line that would be outperformed by the intervention, simply to replicate a much demonstrated finding. When seen in that way, this is surely unethical research.

The researchers may not have been consciously conceptualising their design in those terms, but it is hard to see this as a fair test of the jigsaw learning approach – it can show it is better than suboptimal teaching, but does not offer a comparison with an example of the kind of teaching that is recommended in the national context where the research took place.

Unethical, but not unusual

I am not seeking to pick out Tarhan and colleagues in particular for designing an unethical study, because they are not unique in adopting this approach (Taber, 2019): indeed, they are following a common formula (an experimental 'paradigm' in the sense the term is used in psychology).

Tarhan and colleagues have produced a study that is interesting and informative, and which seems well planned, and strongly-motivated when considered as part of tradition of such studies. Clearly, the referees and journal editor were not minded to question the procedure. The problem is that as a science education community we have allowed this tradition to continue such that a form of study that was originally genuinely open-ended (in that it examined under-researched teaching approaches of untested efficacy) has not been modified as published study after published study has slowly turned those untested teaching approaches into well-researched and repeatedly demonstrated approaches.

So much so, that such studies are now in danger of simply being rhetorical research – where (as in this case) the authors tell readers at the outset that it is already known that what they are going to test is widely shown to be effective good practice. Rhetorical research is set up to produce an expected result, and so is not authentic research. A real experiment tests a genuine hypothesis rather than demonstrates a commonplace. A question researchers might ask themselves could be

'how surprised would I be if this leads to a negative outcome'?

If the answer is

'that would be very surprising'

then they should consider modifying their research so it is likely to be more than minimally informative.

Finding out that jigsaw learning achieved learning objectives better/as well as/not so well as, say, P-O-E (predict-observe-explain) activities might be worth knowing: that it is better than deliberately constrained teaching does not tell us very much that is not obvious.

I do think this type of research design is highly questionable and takes unfair advantage of students. It fails to meet my suggested guideline that

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn

The problem of generalisation

Of course, one fair response is that despite all the claims of the superiority of constructivist, active, cooperative (etc.) learning approaches, the diversity of educational contexts means we cannot simply generalise from an experiment in one context and assume the results apply elsewhere.

Read about generalising from research

That is, the research literature shows us that jigsaw learning is an effective teaching approach, but we cannot be certain it will be effective in the particular context of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey.

Strictly that is true! But we should ask:

do we not know this because

  1. research shows great variation in whether jigsaw learning is effective, as this differs according to contexts and conditions, or
  2. although jigsaw learning has consistently been shown to be effective in many different contexts, no one has yet tested it in the specific case of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey?

It seems clear from the paper that the researchers are presenting the second case (in which case the study would actually be of more interest and importance if it had been found that in this context jigsaw learning was not effective).

Given there are very good reasons to expect a positive outcome, there seems no need to 'stack the odds' by using deliberately detrimental control conditions.

Even had situation 1 applied, it seems of limited value to know that jigsaw learning is more effective (in teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey) than an approach we already recognise is suboptimal.

An ethical alternative

This does not mean that there is no value in research that explores well-established teaching approaches in new contexts. However, unless the context is very different from where the approach has already been widely demonstrated, there is little value in comparing it with approaches that are known to be sub-optimal (which in Turkey, a country where constructivist 'reform' teaching approaches are supposed to be the expected standard, seem to often be labelled as 'traditional').

Detailed case studies of the implementation of a reform pedagogy in new contexts that collect rich 'process' data to explore challenges to implementation and to identify especially effective specific practices would surely be more informative? 4

If researchers do feel the need to do experiments, then rather than comparing known-to-be-effective approaches with suboptimal approaches hoping to demonstrate what everyone already knows, why not use comparison conditions that really test the innovation? Of course jigsaw learning outperformed lecturing in an elementary school – but how might it have compared with another constructivist approach?

I have described the constructivist science teacher as a kind of learning doctor. Like medical doctors, our first tenet should be to do no harm. So, if researchers want to set up experimental comparisons, they have a duty to try to set up two different approaches that they believe are likely to benefit the learners (whichever condition they are assigned to):

  • not one condition that advantages one group of students
  • and another which deliberately disadvantages another group of students for the benefit of a 'positive' research outcome.

If you already know the outcome then it is not genuine research – and you need a better research question.


Work cited:

Note:

1 Imagine teaching one class about acids by jigsaw learning, and teaching another about the nervous system by some other pedagogy – and then comparing the pedagogies by administering a test – about acids! The class in the jigsaw condition might well do better, without it being reasonable to assume this reflects more effective pedagogy.

So, I am tempted to read this as simply a drafting/typographical error that has been missed, and suspect the authors intended to refer to something like the traditional approach to teaching the science and technology curriculum. Otherwise the experiment is fatally flawed.

Yet, one purpose of the study was to find out

"Does jigsaw cooperative learning instruction contribute to a better conceptual understanding of 'physical and chemical changes' in sixth grade students compared to the traditional science and technology curriculum?"

Tarhan et al., 2013, p.187

This reads as if the researchers felt the curriculum was not sufficiently matched to what they felt were the most important learning objectives in the topic of physical and chemical changes, so they have undertaken some curriculum development, as well as designing a teaching unit accordingly, to be taught by jigsaw learning pedagogy. If so, the experiment is testing

traditional curriculum x traditional pedagogy

vs.

reformed curriculum x innovative pedagogy

making it impossible to disentangle the two components.

This suggests the researchers are testing the combination of curriculum and pedagogy, and doing so with a test biased towards the experimental condition. This seems illogical, but I have actually worked in a project where we faced a similar dilemma. In the epiSTEMe project we designed innovative teaching units for lower secondary science and maths. In both physics units we incorporated innovative aspects to the curriculum.

  • In the forces unit material on proportionality was introduced, with examples (car stopping distance) normally not taught at that grade level (Y7);
  • In the electricity unit the normal physics content was embedded in an approach designed to teach aspects of the nature of science.

In the forces unit, the end-of-topic test covered material that was included in the project-designed units, but unlikely to be taught in the control classes. There was evidence that on average students in the project classes did better on the test.

In the electricity unit, the nature of science objectives were not tested as these would not necessarily have been included in teaching control classes. On average, there was very little difference in learning about electrical circuits in the two conditions. There was however a very wide range of class performances – oddly just as wide in the experimental condition (where all classes had a common scheme of work, common activities, and common learning materials) as in the control condition where teachers taught the topic in their customary ways.


2 It could be read either as


1

Control: "The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group."

Experimental: "…detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving."

What was 'this instruction' which included lectures, discussions and problem solving?

or


2

Control: "The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. This [sic] instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group."

Experimental: "…detailed instruction in the experimental group."

What was 'this instruction' which included lectures, discussions and problem solving?

3 A class, of course, is not a person, but a collection of people, so perhaps does not have a 'personality' as such. However, for teachers, classes do take on something akin to a personality.

This is not just an impression. It was pointed out above that if a researcher wants to treat each learner as a unit of analysis (necessary to use inferential statistics when only working with a small number of classes) then learners, not intact classes, should be assigned to conditions. However, even a newly formed class will soon develop something akin to a personality. This will certainly be influenced by the individual learners present, but it develops through the history of their evolving mutual interactions and is not just a function of the sum of their individual characteristics.

So, even when a class is formed by random assignment of learners at the start of a study, it is still strictly questionable whether these students should be seen as independent units for analysis (Taber, 2019).
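The statistical point can be illustrated with a toy simulation (the numbers here are invented for illustration, not taken from any study): when pupils share even a modest class-level effect, the means of intact classes vary far more than the independence assumption behind pupil-level inferential statistics predicts, which is why treating each pupil as an independent unit overstates the evidence.

```python
import random
import statistics

random.seed(1)

N_CLASSES = 30   # number of intact classes (illustrative)
PUPILS = 25      # pupils per class (illustrative)

CLASS_EFFECT_SD = 1.0  # shared 'class personality' effect
PUPIL_SD = 1.0         # individual pupil-level variation

all_scores = []
class_means = []
for _ in range(N_CLASSES):
    # Every pupil in a class shares that class's effect,
    # plus their own individual variation.
    class_effect = random.gauss(0, CLASS_EFFECT_SD)
    pupils = [class_effect + random.gauss(0, PUPIL_SD) for _ in range(PUPILS)]
    all_scores.extend(pupils)
    class_means.append(statistics.mean(pupils))

# If pupils were truly independent units, the variance of class means
# would be roughly (overall pupil variance) / PUPILS. A shared class
# effect makes class means vary far more than that.
expected_if_independent = statistics.variance(all_scores) / PUPILS
observed = statistics.variance(class_means)
print(f"variance of class means: {observed:.2f}")
print(f"expected under pupil-level independence: {expected_if_independent:.2f}")
```

With these illustrative settings the observed variance of class means is many times what pupil-level independence would predict: a significance test that treated the pupils as independent units would be working from a badly understated standard error.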


4 I suspect that science educators have a justified high regard for experimental method in the natural sciences, which sometimes blinkers us to its limitations in social contexts where there are myriad interacting variables and limited controls.

Read: Why do natural scientists tend to make poor social scientists?


Fingerprinting an exoplanet

Life, death, and multiple Gaias


Keith S. Taber


NASA might be said to be engaged in looking for other Gaias beyond our Gaia, as Dr Milam explained to another Gaia.

This post is somewhat poignant as something I heard on a radio podcast reminded me how science has recently lost one of its great characters, as well as an example of that most rare thing in today's science – the independent scientist.


Inside Science episode "Deep Space and the Deep Sea – 40 years of the International Whaling Moratorium", presented, perhaps especially aptly, by Gaia Vince

I was listening to the BBC's Inside Science podcast episode 'Deep Space and the Deep Sea – 40 years of the International Whaling Moratorium' where the presenter – somewhat ironically, in view of the connection I was making, Gaia Vince – was talking to Dr Stefanie Milam of NASA's Goddard Space Flight Centre about how the recently launched James Webb Space Telescope could help scientists look for signs of life on other planets.


From: https://jwst.nasa.gov/content/meetTheTeam/people/milam.html

Dr Milam explained that

"spectra…give us all the information that we really need to understand a given environment. And that's one of the amazing parts about the James Webb space telescope. So, what we have access to with the wavelengths that the James Webb space telescope actually operates at, is that we have the fingerprint pattern of given molecules, things like water, carbon monoxide, carbon dioxide, all these things that we find in our own atmosphere, and so by using the infrared wavelengths we can look for these key ingredients in atmospheres around other planets or even, actually, objects in our own solar system, and that tells us a little bit about what is going on as far as the dynamics of that planet, whether or not it's got geological activity, or maybe even something as crazy as biology."

Dr Stefanie Milam, interviewed for 'Inside Science'
"Webb has captured the first clear evidence of carbon dioxide (CO2) in the atmosphere of a planet outside of our solar system!" (Hot Gas Giant Exoplanet WASP-39 b Transit Light Curve, NIRSpec Bright Object Time-Series Spectroscopy.)
Image: NASA, ESA, CSA, and L. Hustak (STScI). Released under 2.0 Generic (CC BY 2.0) License – Some rights reserved by James Webb Space Telescope
Do molecules have fingerprints?

Fingerprints have long been used in forensic work to identify criminals (and sometimes their victims) because our fingerprints are pretty much unique. Even 'identical' twins do not have identical fingerprints (though I suspect that fact rather undermines some crime fiction plots). But, to have fingerprints one surely has to have fingers. A palm print requires a palm, and a footprint, a foot. So, can molecules, not known for their manual dexterity, have fingerprints?

Well, it is not exactly by coincidence (as the James Webb space telescope has had a lot of media attention) that I very recently posted here, in the context of new observations of the early Universe, that

"Spectroscopic analysis allows us to compare the pattern of redshifted spectral lines due to the presence of elements absorbing or emitting radiation, with the position of those lines as they are found without any shift. Each element has its own pattern of lines – providing a metaphorical fingerprint."

from: A hundred percent conclusive science. Estimation and certainty in Maisie's galaxy

In chemistry, elements and compounds have unique patterns of energy transitions which can be identified through spectroscopy. So, we have 'metaphorical fingerprints'. To describe a spectrum as a chemical substance's (or entity's, such as an ion's) fingerprint is to use a metaphor. It is not actually a fingerprint – there are no fingers to leave prints – but this figure of speech gets across an idea through an implicit comparison with something already familiar. *1 That is, it is a way of making the unfamiliar familiar (which might be seen as a description of teaching!)

Dead metaphors

But perhaps this has become a 'dead metaphor', so that now chemicals do have fingerprints? One of the main ways that language develops is by words changing their meanings over time as metaphors become so commonly used they cease to be metaphorical.

For example, I understand the term electrical charge is a dead metaphor. When electrical charge was first being explored and was still unfamiliar, the term 'charge' was adopted by comparison with the charging of a cannon or the charge of shot used in a shotgun. The shot charge refers to the weight of shot included in a cartridge. Today, most people would not know that, whilst being very familiar with the idea of electrical charge. But when the term electrical charge was first used most people knew about charging guns.

So, initially, electrical 'charge' was a metaphor to refer to the amount of 'electricity' – which made use of a familiar comparison. Now the metaphor is dead, and 'electrical charge' is considered a technical term in its own right.

Another example might be electron spin: electrons do not spin in the familiar sense, but really do (now) have spin as the term has been extended to apply to quanticles with inherent angular momentum by analogy with more familiar macroscopic objects that have angular momentum when they are physically rotating. So, we might say that when the term was first used, it was a metaphor, but no longer. (That is, physicists have expanded the range of convenience of the term spin.)

Perhaps, similarly, fingerprint is now so commonly used to mean a unique identifier in a wide range of contexts, that it should no longer be considered a metaphor. I am not sure if that is so, yet, but perhaps it will be in, say, a century's time – and the term will be broadly used without people even noticing that many things have acquired fingerprints without having fingers. (A spectrum will then actually be a chemical substance's or entity's fingerprint.) After all, many words we now commonly use contain fossils of their origins without us noticing. That is, metaphorical fossils, of course. *2

James Lovelock, R.I.P.

The reason I found this news item somewhat poignant was that I was listening to it just a matter of weeks after the death (at age 103) of the scientist Jim Lovelock. *3 Lovelock invented the device which was able to demonstrate the ubiquity of chlorofluorocarbons (CFCs) in the atmosphere. These substances were very commonly used as refrigerants and aerosol propellants as they were very stable, and being un-reactive (so non-toxic) were considered safe.

But this very stability allowed them to remain in and spread through the atmosphere for a very long time until they were broken down in the stratosphere by ultraviolet radiation to give radicals that reacted with the ozone that is so protective of living organisms. Free radical reactions can occur as chain reactions as when a radical interacts with a molecule it leads to a new molecule, plus a new radical which can often take part in a further interaction with another molecule: so, each CFC molecule could lead to the destruction of many ozone molecules. CFCs have now been banned for most purposes to protect the ozone 'layer', and so us.
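The chlorine-catalysed destruction of ozone described above is often sketched with the textbook cycle below (a standard illustration of the chemistry, not taken from the article discussed): ultraviolet photolysis of a CFC such as CFCl3 releases a chlorine radical, which is regenerated at the end of each cycle and so can go on to destroy many further ozone molecules.

```latex
\begin{align*}
\text{CFCl}_3 &\xrightarrow{\;h\nu\;} \text{CFCl}_2\cdot \,+\, \text{Cl}\cdot
  && \text{(initiation in the stratosphere)}\\
\text{Cl}\cdot \,+\, \text{O}_3 &\rightarrow \text{ClO}\cdot \,+\, \text{O}_2
  && \text{(propagation)}\\
\text{ClO}\cdot \,+\, \text{O} &\rightarrow \text{Cl}\cdot \,+\, \text{O}_2
  && \text{(Cl}\cdot\text{ regenerated)}\\
\text{net:}\quad \text{O}_3 \,+\, \text{O} &\rightarrow 2\,\text{O}_2
\end{align*}
```

Because the chlorine radical emerges unchanged from the net reaction, it acts catalytically: a single atom can cycle many thousands of times before being removed by a termination step.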

Life is chemistry out of balance

But another of Lovelock's achievements came when working for NASA to develop means to search for life elsewhere in the universe. As part of the Mariner missions, NASA wanted Lovelock to design apparatus that could be sent to other worlds and search for life (and I think he did help do that), but Lovelock pointed out that one could tell if a planet had life by a spectroscopic analysis.

Any alien species analysing light passing through earth's atmosphere would see its composition was far from chemical equilibrium due to the ongoing activity of its biota. (If life were to cease on earth today, the oxygen content of the atmosphere would very quickly fall from 21% to virtually none at all as oxygen reacts with rocks and other materials.) If the composition of an atmosphere seemed to be in chemical equilibrium, then it was unlikely there was life. However, if there were high concentrations of gases that should react together or with the surface, then something, likely life, must be actively maintaining that combination of gases in the atmosphere.

"Living systems maintain themselves in a state of relatively low entropy at the expense of their nonliving environments. We may assume that this general property is common to all life in the solar system. On this assumption, evidence of a large chemical free energy gradient between surface matter and the atmosphere in contact with it is evidence of life. Furthermore, any planetary biota which interacts with its atmosphere will drive that atmosphere to a state of disequilibrium which, if recognized, would also constitute direct evidence of life, provided the extent of the disequilibrium is significantly greater than abiological processes would permit. It is shown that the existence of life on Earth can be inferred from knowledge of the major and trace components of the atmosphere, even in the absence of any knowledge of the nature or extent of the dominant life forms. Knowledge of the composition of the Martian atmosphere may similarly reveal the presence of life there."

Dian R. Hitchcock and James E. Lovelock – from Lovelock's website (originally published in Icarus: International Journal of the Solar System in 1967)

The story was that NASA did not really want to be told they did not need to send missions with spacecraft to other worlds such as Mars to look for life, rather that they only had to point a telescope and analyse the spectrum of radiation. Ironically, perhaps, then, that is exactly what they are now doing with planets around other star systems where it is not feasible (not now, perhaps not ever) to send missions.

Gaia and Gaia

But Lovelock became best known for his development and championing of the Gaia theory. According to Gaia (the theory, not the journalist), the development of life on earth has shaped the environment (and not just exploited pre-existing niches) and developed as a huge integrated and interacting system (the biota, but also the seas, the atmosphere, freshwater, the soil,…) such that large scale changes in one part of the system have knock-on effects elsewhere. *4

So, Gaia can be understood not as the whole earth as a planet, or just the biota as the collective life in terms of organisms, but rather as the dynamic system of life on earth and the environment it interacts with. In a sense (and it is important to see this is meant as an analogy, a thinking tool) Gaia is like some supra-organism. Just as a snail has a shell that it has produced for itself, Gaia has shaped the biosphere where the biota lives. *4

The system has built-in feedback cycles to protect it from perturbations (not by chance, or due to some mysterious power, but due to natural selection), but if it is subject to a large enough input it would shift to a new (and perhaps very different) equilibrium state. *5 This certainly happened when oxygen-releasing organisms evolved: the earth today is inhospitable to the organisms that lived here before that event (some survived to leave descendants, but only in places away from the high oxygen concentrations, such as in lower layers of mud beneath the sea), and most organisms alive today would die very quickly in the previous conditions.

It would be nice to think that Gaia, the science journalist that is, was named after the Gaia theory – but Lovelock only started publishing about his Gaia hypothesis about the time that Gaia was born.*6 So, probably not. Gaia is a traditional girl's name, and was the name of the Greek goddess who personified the earth (which is why the name was adopted by Lovelock).

Still, it was poignant to hear a NASA scientist referring to the current value of a method first pointed out by Lovelock when advising NASA in the 1970s and informed by his early thinking about the Gaia hypothesis. NASA might be said to now be engaged in looking for other Gaias on worlds outside our own solar system, as Dr Milam explained to – another – Gaia here on earth.


Notes:

*1 It is an implicit comparison, because the listener/reader is left to appreciate that it is meant as a figure of speech: unlike in a simile ('a spectrum is like a fingerprint') where the comparison is made explicit.


*2 For some years I had a pager (common before mobile phones) – a small electronic device which could receive a text message, so that, if I was out visiting schools, my wife could contact me in an emergency by phoning in a message to be conveyed by radio signal. If I had been asked why it was called a pager, I would have assumed that each message of text was considered to comprise a 'page'.

However, a few weeks ago I watched an old 'screwball comedy' being shown on television: 'My favourite wife' (or 'My favorite [sic] wife' in US release).

(On the very day that Cary Grant remarries after having his first wife, long missing after being lost at sea, declared legally dead, wife number one reappears having been rescued from a desert island. That this is a very unlikely scenario was played upon when the film was remade in colour, as 'Move Over Darling', with Doris Day and James Garner. The returned first wife, pretending to be a nurse, asks the new wife if she is not afraid the original wife would reappear, as happened in that movie; eliciting the response: 'Movies. When do movies ever reflect real life?')

Some of the action takes place in the honeymoon hotel where the groom has disappeared from the suite (these are wealthy people!) having been tracked down by his first wife. The new wife asks the hotel to page him – and this is how that worked with pre-electronic technology:

Paging Mr Arden: Still from 'My Favorite Wife'

*3 So, although I knew Lovelock had died (July 26th), he was still alive at the time of the original broadcast (July 14th). In part, my tardiness comes from the publicly funded BBC's decision to no longer make downloads of some of its programmes available for iPods and similar devices immediately after broadcast. (This downgrading of the BBC's service to the public seems designed to persuade people to use its own streaming service.)


*4 The Gaia theory developed by Lovelock and Lynn Margulis includes ideas that were discussed by Vladimir Vernadsky almost a century ago. Although Vernadsky's work was well known in scientific circles in the Soviet Union, it did not become known to scientists in Western Europe till much later. Vernadsky used the term 'biosphere' to refer to those 'layers' of the earth (lower atmosphere to outer crust) where life existed.


*5 A perturbation such as extensive deforestation perhaps, or certainly increasing the atmospheric concentrations of 'greenhouse' gases beyond a certain point.


*6 Described as a hypothesis originally, it has been extensively developed and would seem to now qualify as a theory (a "consistent, comprehensive, coherent and extensively evidenced explanation of aspects of the natural world") today.