Falsifying research conclusions

You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.


Keith S. Taber


Li and colleagues claim that their innovation is successful in improving teaching quality and student learning: but their own data analysis does not support this.

I recently read a research study to evaluate a teaching innovation where the authors

  • presented their results,
  • reported the statistical test they had used to analyse their results,
  • acknowledged that the outcome of their experiment was negative (not statistically significant), then
  • stated their findings as having obtained a positive outcome, and
  • concluded their paper by arguing they had demonstrated their teaching innovation was effective.

Li, Ouyang, Xu and Zhang's (2022) paper in the Journal of Chemical Education contravenes the scientific norm that your conclusions should be consistent with the outcome of your data analysis.
(Magnified portions of this scheme are presented below)

And this was not in a paper in one of those predatory journals that I have criticised so often here – this was a study in a well regarded journal published by a learned scientific society!

The legal analogy

I have suggested (Taber, 2013) that writing up research can be understood in terms of a number of metaphoric roles: researchers need to

  • tell the story of their research;
  • teach readers about the unfamiliar aspects of their work;
  • make a case for the knowledge claims they make.

Three metaphors for writing-up research

All three aspects are important in making a paper accessible and useful to readers, but arguably the most important aspect is the 'legal' analogy: a research paper is an argument to make a claim for new public knowledge. A paper that does not make its case does not add anything of substance to the literature.

Imagine a criminal case where the prosecution seeks to make its argument at a pre-trial hearing:

"The police found fingerprints and D.N.A. evidence at the scene, which they believe were from the accused."

"Were these traces sent for forensic analysis?"

"Of course. The laboratory undertook the standard tests to identify who left these traces."

"And what did these analyses reveal?"

"Well according to the current standards that are widely accepted in the field, the laboratory was unable to find a definite match between the material collected at the scene, and fingerprints and a D.N.A. sample provided by the defendant."

"And what did the police conclude from these findings?"

"The police concluded that the fingerprints and D.N.A. evidence show that the accused was at the scene of the crime."

It seems unlikely that such a scenario has ever played out, at least in any democratic country where there is an independent judiciary, as the prosecution would be open to ridicule and it is quite likely the judge would have some comments about wasting court time. What would seem even more remarkable, however, would be if the judge decided on the basis of this presentation that there was a prima facie case to answer that should proceed to a full jury trial.

Yet in educational research, it seems parallel logic can be persuasive enough to get a paper published in a good peer-reviewed journal.

Testing an educational innovation

The paper was entitled 'Implementation of the Student-Centered Team-Based Learning Teaching Method in a Medicinal Chemistry Curriculum' (Li, Ouyang, Xu & Zhang, 2022), and it was published in the Journal of Chemical Education. 'J.Chem.Ed.' is a well-established, highly respected periodical that takes peer review seriously. It is published by a learned scientific society – the American Chemical Society.

That a study published in such a prestigious outlet should have such a serious and obvious flaw is worrying. Of course, no matter how good editorial and peer review standards are, it is inevitable that sometimes work with serious flaws will get published, and it is easy to pick out the odd problematic paper and ignore the vast majority of quality work being published. But I did think this was a blatant problem that should have been spotted.

Indeed, because I have a lot of respect for the Journal of Chemical Education I decided not to blog about it ("but that is what you are doing…?"; yes, but stick with me) and to take time to write a detailed letter to the journal setting out the problem in the hope this would be acknowledged and the published paper would not stand unchallenged in the literature. The journal declined to publish my letter although the referees seemed to generally accept the critique. This suggests to me that this was not just an isolated case of something slipping through – but a failure to appreciate the need for robust scientific standards in publishing educational research.

Read the letter submitted to the Journal of Chemical Education

A flawed paper does not imply worthless research

I am certainly not suggesting that there is no merit in Li, Ouyang, Xu and Zhang's work. Nor am I arguing that their work was not worth publishing in the journal. My argument is that Li and colleagues' paper draws an invalid conclusion, and makes misleading statements inconsistent with the research data presented, and that it should not have been published in this form. These problems are pretty obvious, and should (I felt) have been spotted in peer review. The authors should have been asked to address these issues, and follow normal scientific standards and norms such that their conclusions follow from, rather than contradict, their results.

That is my take. Please read my reasoning below (and the original study if you have access to J.Chem.Ed.) and make up your own mind.

Li, Ouyang, Xu and Zhang report an innovation in a university course. They consider this to have been a successful innovation, and it may well have great merits. The core problem is that Li and colleagues claim that their innovation is successful in improving teaching quality and student learning: when their own data analysis does not support this.

The evidence for a successful innovation

There is much material in the paper on the nature of the innovation, and there is evidence about student responses to it. Here, I am only concerned with the failure of the paper to offer a logical chain of argument to support their knowledge claim that the teaching innovation improved student achievement.

There are (to my reading – please judge for yourself if you can access the paper) some slight ambiguities in some parts of the description of the collection and analysis of achievement data (see note 5 below), but the key indicator relied on by Li, Ouyang, Xu and Zhang is the average score achieved by students in four teaching groups, three of which experienced the teaching innovation (these are denoted collectively as 'the experimental group') and one group which did not (denoted as 'the control group', although there is no control of variables in the study 1). Each class comprised 40 students.

The study is not published open access, so I cannot reproduce the copyright figures from the paper here, but below I have drawn a graph of these key data:


Key results from Li et al, 2022: this data was the basis for claiming an effective teaching innovation.


It is on the basis of this set of results that Li and colleagues claim that "the average score showed a constant upward trend, and a steady increase was found". Surely, anyone interrogating these data might pause to wonder if that is the most authentic description of the pattern of scores year on year.

Does anyone teaching in a university really think that assessment methods are good enough to produce average class scores that are meaningful to 3 or 4 significant figures? To a more reasonable level of precision, the nearest percentage point (which is presumably what these numbers are – that is not made explicit), the results were:


Cohort    Average class score
2017      80
2018      80
2019      80
2020      80

Average class scores (2 s.f.) year on year

When presented to a realistic level of precision, the obvious pattern is…no substantive change year on year!
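The rounding can be checked with a minimal Python sketch. The unrounded averages are a reconstruction, not values taken directly from the paper's figure: they assume the figures quoted in note 5 (79.8 in 2017, then increases of 0.11, 0.32 and 0.54 points) and assume that each increase is measured relative to 2017. The variable names are purely illustrative.

```python
# Unrounded class averages, reconstructed (an assumption) from the values
# quoted in note 5: 79.8 in 2017, then +0.11, +0.32 and +0.54 points,
# each taken here to be relative to the 2017 average.
baseline_2017 = 79.8
increase_over_2017 = {2017: 0.0, 2018: 0.11, 2019: 0.32, 2020: 0.54}

averages = {year: baseline_2017 + delta
            for year, delta in increase_over_2017.items()}

# Round each average to the nearest whole percentage point.
rounded = {year: round(score) for year, score in averages.items()}

for year in sorted(rounded):
    print(year, rounded[year])
```

Under these assumptions every cohort rounds to the same whole-number score.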

A truncated graph

In their paper, Li and colleagues do present a graph to compare the average results in 2017 with (not 2018, but) 2019 and 2020, somewhat similar to the one I have reproduced here which should have made it very clear how little the scores varied between cohorts. However, Li and colleagues did not include on their axis the full range of possible scores, but rather only included a small portion of the full range – from 79.4 to 80.4.

This is a perfectly valid procedure often used in science, and it is quite explicitly done (the x-axis is clearly marked), but it does give a visual impression of a large spread of scores which could be quite misleading. In effect, their Figure 4b includes just a sliver of my graph above, as shown below. If one takes the portion of the image below that is not greyed out, and stretches it to cover the full extent of the x axis of a graph, that is what is presented in the published account.


In the paper in J.Chem.Ed., Li and colleagues (2022) truncate the scale on their average score axis to expand 1% of the full range (approximated above in the area not shaded over) into a whole graph as their Figure 4b. This gives a visual impression of widely varying scores (to anyone who does not read the axis labels).


What might have caused those small variations?

If anyone does think that differences of a few tenths of a percent in average class scores are notable, and that this demonstrates increasing student achievement, then we might ask what causes this?

Li and colleagues seem to be convinced that the change in teaching approach caused the (very modest) increase in scores year on year. That would be possible. (Indeed, Li et al seem to be arguing that the very, very modest shift from 2017 to subsequent years was due to the change of teaching approach; but the not-quite-so-modest shifts from 2018 to 2019 to 2020 are due to developing teacher competence!) However, drawing that conclusion requires making a ceteris paribus assumption: that all other things are equal. That is, that any other relevant variables have been controlled.

Read about confounding variables

Another possibility however is simply that each year the teaching team are more familiar with the science, and have had more experience teaching it to groups at this level. That is quite reasonable and could explain why there might be a modest increase in student outcomes on a course year on year.

Non-equivalent groups of students?

However, a big assumption here is that each of the year groups can be considered to be intrinsically the same at the start of the course (and to have equivalent relevant experiences outside the focal course during the programme). Often in quasi-experimental studies (where randomisation to conditions is not possible 1) a pre-test is used to check for equivalence prior to the innovation: after all, if students are starting from different levels of background knowledge and understanding then they are likely to score differently at the end of a course – and no further explanation of any measured differences in course achievement need be sought.

Read about testing for initial equivalence

In experiments, you randomly assign the units of analysis (e.g., students) to the conditions, which gives some basis for at least comparing any differences in outcomes with the variations likely by chance. But this was not a true experiment as there was no randomisation – the comparisons are between successive year groups.
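For contrast with the year-group comparison, random assignment in a true experiment might be sketched in Python as below. This is only an illustration of the general technique: the 80 student labels, the function name `assign_at_random` and the seed are all hypothetical, and nothing of this kind was done (or could easily have been done) in the study under discussion.

```python
import random

def assign_at_random(units, conditions=("control", "experimental"), seed=None):
    """Shuffle the units, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, unit in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(unit)
    return groups

# Hypothetical units of analysis: 80 students to be split between two conditions.
students = [f"student_{i:02d}" for i in range(1, 81)]
groups = assign_at_random(students, seed=2017)

print({condition: len(members) for condition, members in groups.items()})
```

Because chance alone decides who ends up in which condition, any pre-existing differences between the groups are themselves random, which is what allows outcome differences to be compared with the variation expected by chance.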

In Li and colleagues' study, the 40 students taking the class in 2017 are implicitly assumed equivalent to the 40 students taking the class in each of the years 2018-2020: but no evidence is presented to support this assumption. 3

Yet anyone who has taught the same course over a period of time knows that even when a course is unchanged and the entrance requirements stable, there are naturally variations from one year to the next. That is one of the challenges of educational research (Taber, 2019): you never can "take two identical students…two identical classes…two identical teachers…two identical institutions".

Novelty or expectation effects?

We would also have to ignore any difference introduced by the general effect of there being an innovation beyond the nature of the specific innovation (Taber, 2019). That is, students might be more attentive and motivated simply because this course does things differently to their other current courses and past courses. (Perhaps not, but it cannot be ruled out.)

The researchers are likely enthusiastic for, and had high expectations for, the innovation (so high that it seems to have biased their interpretation of the data and blinded them to the obvious problems with their argument) and much research shows that high expectation, in its own right, often influences outcomes.

Read about expectancy effects in studies

Equivalent examination questions and marking?

We also have to assume the assessment was entirely equivalent across the four years. 4 The scores were based on aggregating a number of components:

"The course score was calculated on a percentage basis: attendance (5%), preclass preview (10%), in-class group presentation (10%), postclass mind map (5%), unit tests (10%), midterm examination (20%), and final examination (40%)."

Li, et al, 2022, p.1858

This raises questions about the marking and the examinations:

  • Are the same test and examination questions used each year (that is not usually the case as students can acquire copies of past papers)?
  • If not, how were these instruments standardised to ensure they were not more difficult in some years than others?
  • How reliable is the marking? (Reliable meaning the same scores/mark would be assigned to the same work on a different occasion.)

These various issues do not appear to have been considered.

Change of assessment methodology?

The description above of how the students' course scores were calculated raises another problem. The 2017 cohort were taught by "direct instruction". This is not explained, as the authors presumably think we all know exactly what that is: I imagine lectures. By comparison, in the innovation (2018-2020 cohorts):

"The preclass stage of the SCTBL strategy is the distribution of the group preview task; each student in the group is responsible for a task point. The completion of the preview task stimulates students' learning motivation. The in-class stage is a team presentation (typically PowerPoint (PPT)), which promotes students' understanding of knowledge points. The postclass stage is the assignment of team homework and consolidation of knowledge points using a mind map. Mind maps allow an orderly sorting and summarization of the knowledge gathered in the class; they are conducive to connecting knowledge systems and play an important role in consolidating class knowledge."

Li, et al, 2022, p.1856, emphasis added.

Now the assessment of the preview tasks, the in-class group presentations, and the mind maps all contributed to the overall student scores (10%, 10%, 5% respectively). But these are parts of the innovative teaching strategy – they are (presumably) not part of 'direct instruction'. So, the description of how the student class scores were derived only applies to 2018-2020, and the methodology used in 2017 must have been different. (This is not discussed in the paper.) 5

A quarter of the score for the 'experimental' groups came from assessment components that could not have been part of the assessment regime applied to the 2017 cohort. At the very least, the tests and examinations must have been more heavily weighted in the 'control' group students' overall scores. This makes it very unlikely the scores can be meaningfully directly compared from 2017 to subsequent years: if the authors think otherwise they should have presented persuasive evidence of equivalence.
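The arithmetic behind 'a quarter' can be laid out in a short Python sketch using the weights quoted from the paper. The final step is purely hypothetical: the paper does not say how the 2017 scores were calculated, so the rescaling simply illustrates what would happen if the remaining components had kept their relative weights.

```python
# Component weights (in percent) as quoted from Li et al. (2022, p.1858).
weights = {
    "attendance": 5,
    "preclass preview": 10,
    "in-class group presentation": 10,
    "postclass mind map": 5,
    "unit tests": 10,
    "midterm examination": 20,
    "final examination": 40,
}

# Components that are stages of the SCTBL strategy itself.
innovation_only = ["preclass preview", "in-class group presentation",
                   "postclass mind map"]

total = sum(weights.values())
innovation_share = sum(weights[name] for name in innovation_only)

# HYPOTHETICAL: weights for 2017 if the other components kept their relative
# proportions. The paper does not report the scheme actually used in 2017.
remaining = {name: w for name, w in weights.items() if name not in innovation_only}
rescaled_2017 = {name: 100 * w / sum(remaining.values())
                 for name, w in remaining.items()}

print(total, innovation_share)
print({name: round(w, 1) for name, w in rescaled_2017.items()})
```

Under that hypothetical rescaling the final examination alone would count for over half of a 2017 student's score, rather than 40%.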


Li and colleagues want to convince us that variations in average course scores can be assumed to be due to a change in teaching approach – even though there are other confounding variables.

So, groups that we cannot assume are equivalent are assessed in ways that we cannot assume to be equivalent and obtain nearly identical average levels of achievement. Despite that, Li and colleagues want to persuade us that the very modest differences in average scores between the 'control' and 'experimental' groups (which is actually larger between different 'experimental group' cohorts than between the 'control' group and the successive 'experimental' cohort) are large enough to be significant and demonstrate their teaching innovation improves student achievement.
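The comparison of year-on-year shifts can be checked with a few lines of Python. As before, this assumes that the increases quoted in note 5 (0.11, 0.32 and 0.54 points) are all measured relative to the 2017 average; the names are illustrative.

```python
# Increases in average score relative to 2017, as quoted in note 5
# (the reading that all three are relative to 2017 is an assumption).
increase_over_2017 = {2017: 0.00, 2018: 0.11, 2019: 0.32, 2020: 0.54}

years = sorted(increase_over_2017)
year_on_year = {
    (earlier, later): round(increase_over_2017[later] - increase_over_2017[earlier], 2)
    for earlier, later in zip(years, years[1:])
}

for (earlier, later), shift in year_on_year.items():
    print(f"{earlier} -> {later}: {shift:+.2f}")
```

On this reading, the step from the 'control' year to the first 'experimental' year is the smallest of the three.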

Statistical inference

So, even if we thought shifts of less than a 1% average in class achievement were telling, there are no good reasons to assume they are down to the innovation rather than some other factor. But Li and colleagues use statistical tests to tell them whether differences between the 'control' and 'experimental' conditions are significant. They find – just what anyone looking at the graph above would expect – "there is no significant difference in average score" (p.1860).
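The decision rule being applied can be made concrete with a minimal Python sketch. It uses only the t statistics and the critical value reported in the paper (quoted in full in note 5); the variable names are illustrative. It re-checks the reported comparison rather than re-running the test, since the individual student scores are not available here.

```python
# t statistics and critical value as reported by Li et al. (2022); see note 5.
t_statistics = {"t1": 0.0663, "t2": 0.1930, "t3": 0.3279}
t_critical = 2.024  # reported threshold corresponding to p = 0.05

# A difference counts as statistically significant only if |t| exceeds
# the critical value.
significant = {name: abs(t) > t_critical for name, t in t_statistics.items()}

# How close did the largest statistic come to the threshold?
largest_as_fraction = max(t_statistics.values()) / t_critical

print(significant)
print(round(largest_as_fraction, 2))
```

None of the three comparisons comes close: even the largest reported statistic is less than a fifth of the value needed.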

The scientific convention in using such tests is that the choice of test, and significance level (e.g., a probability of p<0.05 to be taken as significant) is determined in advance, and the researchers accept the outcomes of the analysis. There is a kind of contract involved – a decision to use a statistical test (chosen in advance as being a valid way of deciding the outcome of an experiment) is seen as a commitment to accept its outcomes. 2 This is a form of honesty in scientific work. Just as it is not acceptable to fabricate data, nor is it acceptable to ignore experimental outcomes when drawing conclusions from research.

Special pleading is allowed in mitigation (e.g., "although our results were non-significant, we think this was due to the small sample sizes, and suggest that further research should be undertaken with larger groups {and we are happy to do this if someone gives us a grant}"), but the scientist is not allowed to simply set aside the results of the analysis.


Li and colleagues found no significant difference between the two conditions, yet that did not stop them claiming, and the Journal of Chemical Education publishing, a conclusion that the new teaching approach improved student achievement!

Yet setting aside the results of their analysis is what Li and colleagues do. They carry out an analysis, then simply ignore the findings, and conclude the opposite:

"To conclude, our results suggest that the SCTBL method is an effective way to improve teaching quality and student achievement."

Li, et al, 2022, p.1861

It was this complete disregard of scientific values, rather than the more common failure to appreciate that they were not comparing like with like, that I found really shocking – and led to me writing a formal letter to the journal. Not so much surprise that researchers might do this (I know how intoxicating research can be, and how easy it is to become convinced of one's ideas) but that the peer reviewers for the Journal of Chemical Education did not make the firmest recommendation to the editor that this manuscript could NOT be published until it was corrected so that the conclusion was consistent with the findings.

This seems a very stark failure of peer review, and allows a paper to appear in the literature that presents a conclusion totally unsupported by the evidence available and the analysis undertaken. This also means that Li, Ouyang, Xu and Zhang now have a publication on their academic records that any careful reader can see is critically flawed – something that could have been avoided had peer reviewers:

  • used their common sense to appreciate that variations in class average scores from year to year between 79.8 and 80.3 could not possibly be seen as sufficient to indicate a difference in the effectiveness of teaching approaches;
  • recommended that the authors follow the usual scientific norms and adopt the reasonable scholarly value position that the conclusion of your research should follow from, and not contradict, the results of your data analysis.


Work cited:

Notes

1 Strictly the 2017 cohort has the role of a comparison group, but NOT a control group as there was no randomisation or control of variables, so this was not a true experiment (but a 'quasi-experiment'). However, for clarity, I am here using the original authors' term 'control group'.

Read about experimental research design


2 Some journals are now asking researchers to submit their research designs and protocols to peer review BEFORE starting the research. This prevents wasted effort on work that is flawed in design. Journals will publish a report of the research carried out according to an accepted design – as long as the researchers have kept to their research plans (or only made changes deemed necessary and acceptable by the journal). This prevents researchers seeking to change features of the research because it is not giving the expected findings and means that negative results as well as positive results do get published.


3 'Implicitly' assumed as nowhere do the authors state that they think the classes all start as equivalent – but if they do not assume this then their argument has no logic.

Without this assumption, their argument is like claiming that growing conditions for tree development are better at the front of a house than at the back because on average the trees at the front are taller – even though fast-growing mature trees were planted at the front and slow-growing saplings at the back.


4 From my days working with new teachers, a common rookie mistake was assuming that one could tell a teaching innovation was successful because students achieved an average score of 63% on the (say, acids) module taught by the new method when the same class only averaged 46% on the previous (say, electromagnetism) module. Graduate scientists would look at me with genuine surprise when I asked how they knew the two tests were of comparable difficulty!

Read about why natural scientists tend to make poor social scientists


5 In my (rejected) letter to the Journal of Chemical Education I acknowledged some ambiguity in the paper's discussion of the results. Li and colleagues write:

"The average scores of undergraduates majoring in pharmaceutical engineering in the control group and the experimental group were calculated, and the results are shown in Figure 4b. Statistical significance testing was conducted on the exam scores year to year. The average score for the pharmaceutical engineering class was 79.8 points in 2017 (control group). When SCTBL was implemented for the first time in 2018, there was a slight improvement in the average score (i.e., an increase of 0.11 points, not shown in Figure 4b). However, by 2019 and 2020, the average score increased by 0.32 points and 0.54 points, respectively, with an obvious improvement trend. We used a t test to test whether the SCTBL method can create any significant difference in grades among control groups and the experimental group. The calculation results are shown as follows: t1 = 0.0663, t2 = 0.1930, t3 =0.3279 (t1 <t2 <t3 <t𝛼, t𝛼 =2.024, p>0.05), indicating that there is no significant difference in average score. After three years of continuous implementation of SCTBL, the average score showed a constant upward trend, and a steady increase was found. The SCTBL method brought about improvement in the class average, which provides evidence for its effectiveness in medicinal chemistry."

Li, et al, 2022, p.1858-1860, emphasis added

This appears to refer to three distinct measures:

  • average scores (produced by weighted summations of various assessment components as discussed above)
  • exam scores (perhaps just the "midterm examination…and final examination", or perhaps just the final examination?)
  • grades

Formal grades are not discussed in the paper (the word is only used in this one place), although the authors do refer to categorising students into descriptive classes ('levels') according to scores on 'assessments', and may see these as grades:

"Assessments have been divided into five levels: disqualified (below 60), qualified (60-69), medium (70-79), good (80-89), and excellent (90 and above)."

Li, et al, 2022, p.1856, emphasis added

In the longer extract above, the reference to testing difference in "grades" is followed by reporting the outcome of the test for "average score":

"We used a t test to test …grades …The calculation results … there is no significant difference in average score"

As Student's t-test was used, it seems unlikely that the assignment of students to grades could have been tested. That would surely have needed something like the Chi-squared statistic to test categorical data – looking for an association between (i) the distributions of the number of students in the different cells 'disqualified', 'qualified', 'medium', 'good' and 'excellent'; and (ii) treatment group.
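To illustrate what such a test of categorical data involves, here is a minimal pure-Python sketch of the Pearson chi-squared statistic for a contingency table. The counts are invented purely for illustration (the paper does not report the numbers of students at each level in a form that could be used here), and the function name `chi_squared` is just a label for this sketch.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if level and group were unrelated.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

levels = ["disqualified", "qualified", "medium", "good", "excellent"]

# HYPOTHETICAL counts of students at each level (40 students per group).
counts = [
    [5, 7, 10, 12, 6],  # 'control' group
    [5, 6, 10, 13, 6],  # one 'experimental' cohort
]

statistic, dof = chi_squared(counts)
print(round(statistic, 3), dof)
```

With two groups and five levels there are 4 degrees of freedom, for which the conventional critical value at p = 0.05 is about 9.49; the statistic would need to exceed that for the association to be counted as significant.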

Presumably, then, the statistical testing was applied to the average course scores shown in the graph above. This also makes sense because the classification into descriptive classes loses some of the detail in the data and there is no obvious reason why the researchers would deliberately choose to test 'reduced' data rather than the full data set with the greatest resolution.


Reflecting the population

Sampling an "exceedingly large number of students"


Keith S. Taber


the key to sampling a population is identifying a representative sample

Obtaining a representative sample of a population can be challenging
(Image by Gerd Altmann from Pixabay)


Many studies in education are 'about' an identified population (students taking A level Physics examinations; chemistry teachers in German secondary schools; children transferring from primary to secondary school in Scotland; undergraduates majoring in STEM subjects in Australia…).

Read about populations of interest in research

But, in practice, most studies only collect data from a sample of the population of interest.

Sampling the population

One of the key challenges in social research is sampling. Obtaining a sample is usually not that difficult. However, often the logic of research is something along the lines:

  • 1. Aim – to find out about a population.
  • 2. As it is impractical to collect data from the whole population, collect data from a sample.
  • 3. Analyse data collected from the sample.
  • 4. Draw inferences about the population from the analysis of data collected from the sample.

For example, if one wished to do research into the views of school teachers in England and there are, say, 600 000 of them, it is unlikely anyone could undertake research that collected and analysed data from all of them and produce results in a short enough period for the findings to still be valid (unless they were prepared to employ a research team of thousands!). But perhaps one could collect data from a sample that would be informative about the population.

This can be a reasonable approach (and, indeed, is a very common approach in research in areas like education) but relies on the assumption that what is true of the sample, can be generalised to the population.

That clearly depends on the sample being representative of the larger population (at least in those ways which are pertinent to the research).


When a study (as here in the figure an experiment) collects data from a sample drawn at random from a wider population, then the findings of the experiment can be assumed to apply (on average) to the population. (Figure from Taber, 2019.)

In practice, unless a population of interest is quite modest in size (e.g., teachers in one school; post-graduate students in one university department; registered members of a society) it is usually simply not feasible to obtain a random sample.

For example, if we were interested in secondary school students in England, and we had a sample of secondary students from England that (a) reflected the age profile of the population; (b) reflected the gender profile of the population; but (c) were all drawn from one secondary school, this is unlikely to be a representative sample.

  • If we do have a representative sample, then the likely error in generalising from sample to population can be calculated (and can be reduced by having a larger sample);
  • If we do not have a representative sample, then there is no way of knowing how well the findings from the sample reflect the wider population and increasing sample size does not really help; and, for that matter,
  • If we do not know whether we have a representative sample, then, again, there is no way of knowing how well the findings from the sample reflect the wider population and increasing sample size does not really help.

So, the key to sampling a population is identifying a representative sample.
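The first point in the list above, that the likely error can be calculated and shrinks as the sample grows, can be illustrated with a short Python sketch. It assumes simple random sampling and a survey question answered as a proportion (for example, the share of teachers agreeing with a statement); the function name and the sample sizes are illustrative only.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Approximate 95% margin of error (1.96 standard errors) for a proportion
# near 0.5, at several illustrative sample sizes.
for n in (100, 400, 1600):
    margin = 1.96 * standard_error_of_proportion(0.5, n)
    print(f"n = {n}: margin of error about {100 * margin:.1f} percentage points")
```

Quadrupling the sample size halves the margin of error. None of this arithmetic applies if the sample is not representative, which is why sample size alone cannot rescue a biased sample.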

Read about sampling a population

If we know that only a small number of factors are relevant to the research then we may (if we are able to characterise members of the population on these criteria) be able to design a sample which is representative based on those features which are important.

If the relevant factors for a study were teaching subject; years of teaching experience; teacher gender, then we would want to build a sample that fitted the population profile accordingly, so, maybe, 3% female maths teachers with 10+ years of teaching experience, et cetera. We would need suitable demographic information about the population to inform the building of the sample.

We can then randomly select from those members of the population with the right characteristics within the different 'cells'.

However, if we do not know exactly what specific features might be relevant to characterise a population in a particular research project, the best we might be able to do is to employ a randomly chosen sample which at least allows the measurement error to be estimated.

Labs for exceedingly large numbers of students

Leopold and Smith (2020) were interested in the use of collaborative group work in a "general chemistry, problem-based lab course" at a United States university, where students worked in fixed groups of three or four throughout the course. As well as using group work for more principled reasons, "group work is also utilized as a way to manage exceedingly large numbers of students and efficiently allocate limited time, space, and equipment" (p.1). They tell readers that

"the case we examine here is a general chemistry, problem-based lab course that enrols approximately 3500 students each academic year"

Leopold & Smith, 2020, p.5

Although they recognised a wide range of potential benefits of collaborative work, these depend upon students being able to work effectively in groups, which requires skills that cannot be taken for granted. Leopold and Smith report how structured support was put in place to help students diagnose impediments to the effective work of their groups – and they investigated this in their study.

The data collected was of two types. There was a course evaluation at the end of the year taken by all the students in the cohort, "795 students enrolled [in] the general chemistry I lab course during the spring 2019 semester" (p.7). However, they also collected data from a sample of student groups during the course, in terms of responses to group tasks designed to help them think about and develop their group work.

Population and sample

As the focus of their research was a specific course, the population of interest was the cohort of undergraduates taking the course. Given the large number of students involved, they collected qualitative data from a sample of the groups.

Units of analysis

The course evaluation questions sought individual learners' views, so for that data the unit of analysis was the individual student. However, the groups were tasked with working as a group to improve their effectiveness in collaborative learning. So, in Leopold and Smith's sample of groups, the unit of analysis was the group. Some data were received from individual group members, and other data were submitted as group responses: but the analysis was on the basis of responses from within the specific groups in the sample.

A stratified sample

Leopold and Smith explained that

"We applied a stratified random sampling scheme in order to account for variations across lab sections such as implementation fidelity and instructor approach so as to gain as representative a sample as possible. We stratified by individual instructors teaching the course which included undergraduate teaching assistants (TAs), graduate TAs, and teaching specialists. One student group from each instructor's lab sections was randomly selected. During spring 2019, we had 19 unique instructors teaching the course therefore we selected 19 groups, for a total of 76 students."

Leopold & Smith, 2020, p.7

The paper does not report how the random selection was made – how it was decided which group would be selected for each instructor. As any competent scientist ought to be able to make a random selection quite easily in this situation, this is perhaps not a serious omission. I mention this because sadly not all authors who report having used randomisation can support this when asked how (Taber, 2013).
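
As an illustration only, a selection of the kind Leopold and Smith describe (one group drawn at random for each instructor) could be made in a few lines of Python. This is a sketch under stated assumptions, not the authors' reported procedure: the instructor labels, group names, and seed are hypothetical, and it assumes every instructor has at least one group to choose from.

```python
import random


def select_one_group_per_instructor(groups_by_instructor, seed=None):
    """Draw one student group at random from each instructor's lab sections.

    Each instructor is a stratum, so every instructor contributes exactly
    one group to the sample.
    """
    rng = random.Random(seed)  # a recorded seed makes the draw reproducible
    return {
        instructor: rng.choice(groups)
        for instructor, groups in sorted(groups_by_instructor.items())
    }


# Hypothetical instructors and groups, for illustration only.
lab_groups = {
    "instructor_01": ["group_A", "group_B", "group_C"],
    "instructor_02": ["group_D", "group_E"],
    "instructor_03": ["group_F", "group_G", "group_H", "group_I"],
}

selected = select_one_group_per_instructor(lab_groups, seed=2019)
for instructor, group in selected.items():
    print(instructor, "->", group)
```

Reporting the mechanism (and, where software is used, the seed) costs a sentence in a methods section and lets readers see that the selection was genuinely random.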

Was the sample representative?

Leopold and Smith found that, based on their sample, student groups could diagnose impediments to effective group working, and could often put in place effective strategies to increase their effectiveness.

We might wonder if the sample was representative of the wider population. If the groups were randomly selected in the way claimed then one would expect this would probably be the case – only 'probably', as that is the best randomisation and statistics can do – we can never know for certain that a random sample is representative, only that it is unlikely to be especially unrepresentative!

The only way to know for sure that a sample is genuinely representative of the population of interest in relation to the specific focus of a study, would be to collect data from the whole population and check the sample data matches the population data.* But, of course, if it was feasible to collect data from everyone in the population, there would be no need to sample in the first place.

However, because the end-of-course evaluation was taken by all students in the cohort (the study population), Leopold and Smith were able to see if those students in the sample responded in ways that were generally in line with the population as a whole. The two figures reproduced here seem to suggest they did!


Figure 1 from Leopold & Smith, 2020, p.10, which is published with a Creative Commons Attribution (CC BY) license allowing reproduction.

Figure 2 from Leopold & Smith, 2020, p.10, which is published with a Creative Commons Attribution (CC BY) license allowing reproduction.

There is clearly a pretty good match here. However, it is important to not over-interpret this data. The questions in the evaluation related to the overall experience of group working, whereas the qualitative data analysed from the sample related to the more specific issues of diagnosing and addressing issues in the working of groups. These are related matters but not identical, and we cannot assume that the very strong similarity between sample and population outcomes in the survey demonstrates (or proves!) that the analysis of data from the sample is also so closely representative of what would have been obtained if all the groups had been included in the data collection.


| | Experiences of learning through group-work | Learning to work more effectively in groups |
| Sample | patterns in data closely reflected population responses | data only collected from a sample of groups |
| Population | all invited to provide feedback | [it seems reasonable to assume results from sample are likely to apply to the cohort as a whole] |

The similarity of the feedback given by students in the sample of groups to the overall cohort responses suggests that the sample was broadly representative of the overall population in terms of developing group-work skills and practices

It might well have been, but we cannot know for sure. (* The only way to know for sure that a sample is genuinely representative of the population of interest in relation to the specific focus of a study, would be …)

However, the way the sample so strongly reflected the population in relation to the evaluation data, shows that in that (related if not identical) respect at least the sample is strongly representative, and that is very likely to give readers confidence in the sampling procedure used. If this had been my study I would have been pretty pleased with this, at least strongly suggestive, circumstantial evidence of the representativeness of the sampling of the student groups.


Didactic control conditions

Another ethically questionable science education experiment?


Keith S. Taber


This seems to be a rhetorical experiment where an educational treatment that is already known to be effective is 'tested' to demonstrate that it is more effective than suboptimal teaching – by asking a teacher to constrain her teaching to students assigned to an unethical comparison condition.

In a scientific experiment, an intervention is made into the natural state of affairs to see if it produces a hypothesised change. A key idea in experimental research is control of variables: in the ideal experiment only one thing is changed. In the control condition all relevant variables are fixed so that there is a fair test between the experimental treatment and the control.

Although there are many published experimental studies in education, such research can rarely claim to have fully controlled all potentially relevant variables: there are (nearly always, always?) confounding factors that simply cannot be controlled.

Read about confounding variables

Experimental research in education, then, (nearly always, always?) requires some compromising of the pure experimental method.

Where those compromises are substantial, we might ask if experiment was the wrong choice of methodology: even if a good experiment is often the best way to test an idea, a bad experiment may be less informative than, for example, a good case study.

That is primarily a methodological matter, but testing educational innovations and using control conditions in educational studies also raises ethical issues. After all, an experiment means experimenting with real learners' educational experiences. This can certainly be sometimes justified – but there is (or should be) an ethical imperative:

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn.

If researchers want to test a genuinely innovative teaching approach or learning resource, then they have to be confident it has a reasonable chance of being effective before asking learners to participate in a study where they will be subjected to an untested teaching input.

It is equally the case that students assigned to a control condition should never be deliberately subjected to inferior teaching simply in order to help make a strong contrast with an experimental approach being tested. Yet, reading some studies leads to a strong impression that some researchers do seek to constrain teaching to a control group to help bias studies towards the innovation being tested (Taber, 2019). That is, such studies are not genuinely objective, open-minded investigations to test a hypothesis, but 'rhetorical' studies set up to confirm and demonstrate the researchers' prior assumptions. We might say these studies do not reflect true scientific values.


A general scheme for a 'rhetorical experiment'

Read about rhetorical experiments


I have raised this issue in the research literature (Taber, 2019), so when I read experimental studies in education I am minded to check that any control condition has been set up with a concern to ensure that the interests of all study participants (in both experimental and control conditions) have been properly considered.

Jigsaw cooperative learning in elementary science: physical and chemical changes

I was reading a study called "A jigsaw cooperative learning application in elementary science and technology lessons: physical and chemical changes" (Tarhan, Ayyıldız, Ogunc & Sesen, 2013) published in a respectable research journal (Research in Science & Technological Education).

Tarhan and colleagues adopted a common type of research design, and the journal referees and editor presumably were happy with the design of their study. However, I think the science education community should collectively be more critical about the setting up of control conditions which require students to be deliberately taught in ways that are considered to be less effective (Taber, 2019).


Jigsaw learning involves students working in co-operative groups, and in undertaking peer-teaching

Jigsaw learning is a pedagogic technique which can be seen as a constructivist, student-centred, dialogic, form of 'active learning'. It is based on collaborative groupwork and includes an element of peer-tutoring. In this paper the technique is described as "jigsaw cooperative learning", and the article authors explain that "cooperative learning is an active learning approach in which students work together in small groups to complete an assigned task" (p.185).

Read about jigsaw learning

Random assignment

The study used an experimental design to compare learning outcomes in two classes taught the same topic in two different ways. Many studies that compare two classes are problematic because whole extant classes are assigned to conditions, which means that the unit of analysis should be the class (experimental condition, n=1; control condition, n=1). Yet, despite this, such studies commonly analyse results as if each learner was an independent unit of analysis (e.g., experimental condition, n=c.30; control condition, n=c.30), which is necessary to obtain statistical results but unfortunately means that inferences drawn from those statistics are invalid (Taber, 2019). Such studies offer examples of where there seems little point in doing an experiment badly, as the very design makes it intrinsically impossible to obtain a valid statistically significant outcome.
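
To see why treating each learner in an intact class as an independent unit matters, consider a rough simulation. It assumes (hypothetically) that each class has its own class-level influence on scores, perhaps from timetabling or classroom dynamics, and that there is no real difference at all between the two teaching conditions. The size of the class effect, the class size, and the crude |t| > 2 cut-off are all illustrative assumptions, not values taken from any study discussed here.

```python
import math
import random
import statistics


def simulate_false_positive_rate(n_trials=2000, class_size=30,
                                 class_sd=1.0, pupil_sd=1.0, seed=0):
    """Share of simulated two-class 'experiments' with |t| > 2.

    Both classes come from the same model, so there is no treatment
    effect: every 'significant' result is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        classes = []
        for _cls in range(2):
            # One shared offset per intact class (the clustering).
            class_effect = rng.gauss(0.0, class_sd)
            classes.append([class_effect + rng.gauss(0.0, pupil_sd)
                            for _ in range(class_size)])
        a, b = classes
        # Standard error computed as if every pupil were independent.
        se = math.sqrt(statistics.variance(a) / len(a)
                       + statistics.variance(b) / len(b))
        t = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(t) > 2.0:
            hits += 1
    return hits / n_trials


clustered_rate = simulate_false_positive_rate(class_sd=1.0)
independent_rate = simulate_false_positive_rate(class_sd=0.0)
print("false positives with class effects:", clustered_rate)
print("false positives without class effects:", independent_rate)
```

Under these assumptions a large share of the simulated 'experiments' report a 'significant' difference even though neither condition is better, whereas with no class-level effect the rate falls back to roughly the nominal one in twenty.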


Experimental designs may be categorised as true experiments, quasi-experiments and natural experiments (Taber, 2019).

Tarhan and colleagues, however, randomly assign the learners to the two conditions so can genuinely claim that in their study they have a true experiment: for their study, experimental condition, n=30; control condition, n=31.
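
Random assignment of individual learners is simple to carry out and to report. A minimal sketch follows, assuming 61 learners identified by hypothetical labels and an arbitrary seed; it is not a description of how Tarhan and colleagues made their assignment.

```python
import random


def randomly_assign(learners, seed=None):
    """Shuffle the learners and split them into two conditions.

    With an odd number of learners the second group gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(learners)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 61 hypothetical learner labels, matching the total in the study (30 + 31).
learners = [f"learner_{i:02d}" for i in range(1, 62)]
experimental, control = randomly_assign(learners, seed=1)
print(len(experimental), len(control))
```

Because each learner's condition is decided by chance alone, the individual learner can legitimately be treated as the unit of assignment.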

Initial equivalence between groups

Assigning students in this way also helped ensure the two groups started from a similar base. Often such experimental studies use a pre-test to compare the groups before teaching. However, often the researchers look for a statistical difference between the groups which does not reach statistical significance (Taber, 2019). That is, if a statistical test shows p≥0.05 (in effect, the initial difference between the groups is not very unlikely to occur by chance) this is taken as evidence of equivalence. That is like saying we will consider two teachers to be of 'equivalent' height as long as there is no more than 30 cm difference in their height!

In effect

'not very different'

is being seen as a synonym for

'near enough the same'
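
The height analogy can be made concrete. The sketch below uses invented heights for two groups of three teachers whose average heights differ by 30 cm, and an exact permutation test (chosen here purely for illustration; it is not the test used in the studies discussed). It asks how often a random split of the six teachers would give a gap in means at least as large as the one observed.

```python
from itertools import combinations


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_gap(sum_a):
        # Absolute difference between the two group means for one split.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    splits = list(combinations(range(len(pooled)), n_a))
    as_extreme = sum(
        1 for idx in splits
        if mean_gap(sum(pooled[i] for i in idx)) >= observed - 1e-9
    )
    return as_extreme / len(splits)


# Invented heights in cm: the group means are 165 and 195.
heights_a = [150, 165, 180]
heights_b = [180, 195, 210]
p = exact_permutation_p_value(heights_a, heights_b)
print(p)
```

Here the p-value is 0.2: comfortably 'not significant', yet no one would call the two groups equivalent in height. A non-significant difference in a small sample shows only that a difference has not been demonstrated, not that the groups are the same.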


Some analogies for how equivalence is determined in some studies: read about testing for initial equivalence

However, the pretest in Tarhan and colleagues' study found that the difference between the two groups in pretest performance was at a level likely to occur by chance 87% of the time (not simply something more than 5%). This is a much more convincing basis for seeing the two groups as initially similar.

So, there are two ways in which the Tarhan et al. study seemed better thought-through than many small scale experiments in teaching I have read.

Comparing two conditions

The research was carried out with "sixth grade students in a public elementary school in Izmir, Turkey" (p.184). The focus was learning about physical and chemical changes.

The experimental condition

At the outset of the study, the authors suggest it is already known that

  • "Jigsaw enhances cooperative learning" (p.185)
  • "Jigsaw promotes positive attitudes and interests, develops communication skills between students, and increases learning achievement in chemistry" (p.186)
  • "the jigsaw technique has the potential to improve students' attitude towards science"
  • development of "students' understanding of chemical equilibrium in a first year general chemistry course [was more successful] in the jigsaw class…than …in the individual learning class"

It seems the approach being tested was already demonstrated to be effective in a range of contexts. Based on the existing research, then, we could already expect well-implemented jigsaw learning to be effective in facilitating student learning.

Similarly, the authors tell the readers that the broader category of cooperative learning has been well established as successful,

"The benefits of cooperative learning have been well documented as being

higher academic achievement,

higher level of reasoning and critical thinking skills,

deeper understanding of learned material,

better attention and less disruptive behavior in class,

more motivation to learn and achieve,

positive attitudes to subject matter,

higher self-esteem and

higher social skills."

Tarhan et al., 2013, p.185

What is there not to like here? So, what was this highly effective teaching approach compared with?

What is being compared?

Tarhan and colleagues tell readers that:

"The experimental group was taught via jigsaw cooperative learning activities developed by the researchers and the control group was taught using the traditional science and technology curriculum."

Tarhan et al., 2013, p.189

A different curriculum?

This seems an unhelpful statement as it does not seem to compare like with like:


| condition | curriculum | pedagogy |
| experimental | ? | jigsaw cooperative learning activities developed by the researchers |
| control | traditional science and technology curriculum | ? |

A genuine experiment would look to control variables, so would not simultaneously vary both curriculum and pedagogy

The study uses a common test to compare learning in the two conditions, so the study only makes sense as an experimental test of jigsaw learning if the same curriculum is being followed in both conditions. Otherwise, there is no prima facie reason to think that the post-test is equally fair in testing what has been taught in the two conditions. 1

The control condition

The paper includes an account of the control condition which seems to make it clear that both groups were taught "the same content", which is helpful as to have done otherwise would have seriously undermined the study.

"The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group."

Tarhan et al., 2013, p.194

So, it seems:


| condition | curriculum | pedagogy |
| experimental | [by inference: "traditional science and technology curriculum"] | jigsaw cooperative learning activities developed by the researchers |
| control | traditional science and technology curriculum [the same content as for the experimental group to achieve the same learning objectives] | teacher-centred didactic lecture format: instructor explained the subject and asked questions |
| | controlled variable | independent variable |

An experiment relies on control of variables and would not simultaneously vary both curriculum and pedagogy

The statement is helpful, but might be considered ambiguous, as "This instruction included lectures, discussions and problem solving" seems to relate to what had been "taught via detailed instruction in the experimental group".

But this seems incongruent with the wider textual context. The experimental group were taught by a jigsaw learning technique – not lectures, discussions and problem solving. Yet, for that matter, the experimental group were not taught via 'detailed instruction' if this means the teacher presenting the curriculum content. So, this phrasing seems unhelpfully confusing (to me, at least – presumably, the journal referees and editor thought this was clear enough.)

So, this probably means the "lectures, discussions and problem solving" were part of the control condition where "the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes".

'Lectures' certainly fit with that description.

However, genuine 'discussion' work is a dialogic teaching method and would not seem to fit within a "teacher-centered didactic lecture format". But perhaps 'discussion' simply refers to how the "teacher used the blackboard and asked some questions" that members of the class were invited to answer?

Read about dialogic teaching

Writing-up research is a bit like teaching in that in presenting to a particular audience, one works with a mental model of what that audience already knows and understands, and how they use specific terms, and this model is never likely to be perfectly accurate:

  • when teaching, the learners tend to let you know this, whereas,
  • when writing, this kind of immediate feedback is lacking.

Similarly, problem-solving would not seem to fit within a "teacher-centered didactic lecture format". 'Problem-solving' engages high level cognitive and metacognitive skills because a 'problem' is a task that students are not able to respond to simply by recalling what they have been told and applying learnt algorithms. Problem-solving requires planning and applying strategies to test out ideas and synthesise knowledge. Yet teachers and textbooks commonly refer to simple questions that simply test recall and comprehension, or direct application of learnt techniques, as 'problems' when they are better understood as 'exercises' as they do not pose authentic problems.

The imprecise use of terms that may be understood differently across diverse contexts is characteristic of educational discourse, so Tarhan and colleagues may have simply used the labels that are normally applied in the context where they are working. It should also be noted that as the researchers are based in Turkey they are presumably finding the best English translations they can for the terms used locally.

Read about the challenges of translation in research writing

So, it seems we have:


| Experimental condition | in one of the conditions? | Control condition |
| Jigsaw learning (set out in some detail in the paper) – an example of cooperative learning – an active learning approach in which students work together in small groups | detailed instruction?; discussions (=teacher questioning?); problem solving? (=practice exercises?) | teacher-centred didactic lecture format…the teacher used the blackboard and asked some questions…a regular textbook….the instructor explained the subject, the students listened and took notes |

The independent variable – teaching methodology

The teacher variable

One of the major problems with some educational experiments comparing different teaching approaches is the confound of the teacher. If

  • class A is taught through approach 'a' by teacher 1, and
  • class B is taught through approach 'b' by teacher 2

then, even if there is a good case that class A and class B start off as 'equivalent' in terms of readiness to learn about the focal topic, any differences in study outcomes could be as much down to different teachers (and we all know that different teachers are not equivalent!) as to different teaching methodology.

At first sight this is easily solved by having the same teacher teach both classes (as in the study discussed here). That certainly seems to help. But, a little thought suggests it is not a foolproof approach (Taber, 2019).

Teachers inevitably have better rapport with some classes than others (even when those classes are shown to be technically 'equivalent') simply because that is the nature of how diverse personalities interact. 3 Even the most professional teachers find they prefer teaching some classes to others, enjoy the teaching more, and seem to get better results (even when the classes are supposed to be equivalent).

In an experiment, there is no reason why the teacher would work better with a class assigned the experimental condition; it might just as well be the control condition. However, this is still a confound and there is no obvious solution to this, except having multiple classes and teachers in each condition such that the statistics can offer a guide on whether outcomes are sufficiently unlikely to be able to reasonably discount these types of effect.

Different teachers also have different styles and approaches and skill sets – so the same teacher will not be equally suited to every teaching approach and pedagogy. Again, this does not necessarily advantage the experimental condition, but, again, is something that can only be addressed by having a diverse range of teachers in each condition (Taber, 2019).

So, although we might expect having the same teacher teach both classes is the preferred approach, the same teacher is not exactly the same teacher in different classes or teaching in different ways.

And what do participants expect will happen?

Moreover, expectancy effects can be very influential in education. Expecting something to work, or not work, has been shown to have real effects on outcomes. It may not be true, as some motivational gurus like to pretend, that we can all of us achieve anything if only we believe: but we are more likely to be successful when we believe we can succeed. When confident, we tend to be more motivated, less easily deterred, and (given the human capacity for perceiving with confirmation bias) more likely to judge we are making good progress. So, any research design which communicates to teachers and students (directly, or through the teacher's or researcher's enthusiasm) an expectation of success in some innovation is more likely to lead to success. This is a potential confound that is not even readily addressed by having large numbers of classes and teachers (Taber, 2019)!

Read about expectancy effects

The authors report that

"Before implementation of the study, all students and their families were informed about the aims of the study and the privacy of their personal information. Permission for their children attend the study was obtained from all families."

Tarhan et al., 2013, p.194

This is as it should be. School children are not data-fodder for researchers, and they should always be asked for, and give, voluntary informed consent when recruited to join a research project. However, researchers need to be open and honest about their work, whilst also being careful about how they present their research aims. We can imagine a possible form of invitation:

We would like to invite you to be part of a study where some of you will be subject to traditional learning through a teacher-centred didactic lecture format where the teacher will give you notes and ask you questions, and some of you will learn by a different approach that has been shown to enhance learning, promote positive attitudes and interests, develop communication skills, increase achievement, support higher level of reasoning and critical thinking skills, lead to deeper understanding of learned material…

An honest, but unhelpful, briefing for students and parents

If this was how the researchers understood the background to their study, then this would be a fair and honest briefing. Yet, this would clearly set up strong expectations in the student groups!

A suitable teacher

Tarhan and colleagues report that

"A teacher experienced in active learning was trained in how to implement the instruction based on jigsaw cooperative learning. The teacher and researchers discussed the instructional plans before implementing the activities."

Tarhan et al., 2013, p.189

So, the teacher who taught both classes, using jigsaw cooperative learning in one class and a teacher-centred didactic lecture approach in the other, was "experienced in active learning". So, it seems that

  • the researchers were already convinced that active learning approaches were far superior to teaching via a lecture approach, and
  • the teacher had experience in teaching through more engaging, effective student-centred active learning approaches.

Yet, despite this, a control condition was set up that required the teacher to, in effect, de-skill, and teach in a way the researchers were well aware research suggested was inferior, for the sake of carrying out an experiment to demonstrate in a specific context what had already been well demonstrated elsewhere.

In other words, it seems that one group of students were deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal, so as to provide a low base line that would be outperformed by the intervention, simply to replicate a much demonstrated finding. When seen in that way, this is surely unethical research.

The researchers may not have been consciously conceptualising their design in those terms, but it is hard to see this as a fair test of the jigsaw learning approach – it can show it is better than suboptimal teaching, but does not offer a comparison with an example of the kind of teaching that is recommended in the national context where the research took place.

Unethical, but not unusual

I am not seeking to pick out Tarhan and colleagues in particular for designing an unethical study, because they are not unique in adopting this approach (Taber, 2019): indeed, they are following a common formula (an experimental 'paradigm' in the sense the term is used in psychology).

Tarhan and colleagues have produced a study that is interesting and informative, and which seems well planned and strongly motivated when considered as part of a tradition of such studies. Clearly, the referees and journal editor were not minded to question the procedure. The problem is that as a science education community we have allowed this tradition to continue such that a form of study that was originally genuinely open-ended (in that it examined under-researched teaching approaches of untested efficacy) has not been modified as published study after published study has slowly turned those untested teaching approaches into well-researched and repeatedly demonstrated approaches.

So much so, that such studies are now in danger of simply being rhetorical research – where (as in this case) the authors tell readers at the outset that it is already known that what they are going to test is widely shown to be effective good practice. Rhetorical research is set up to produce an expected result, and so is not authentic research. A real experiment tests a genuine hypothesis rather than demonstrates a commonplace. A question researchers might ask themselves could be

'how surprised would I be if this leads to a negative outcome'?

If the answer is

'that would be very surprising'

then they should consider modifying their research so it is likely to be more than minimally informative.

Finding out that jigsaw learning achieved learning objectives better/as well as/not so well as, say, P-O-E (predict-observe-explain) activities might be worth knowing: that it is better than deliberately constrained teaching does not tell us very much that is not obvious.

I do think this type of research design is highly questionable and takes unfair advantage of students. It fails to meet my suggested guideline that

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn

The problem of generalisation

Of course, one fair response is that despite all the claims of the superiority of constructivist, active, cooperative (etc.) learning approaches, the diversity of educational contexts means we cannot simply generalise from an experiment in one context and assume the results apply elsewhere.

Read about generalising from research

That is, the research literature shows us that jigsaw learning is an effective teaching approach, but we cannot be certain it will be effective in the particular context of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey.

Strictly that is true! But we should ask:

do we not know this because

  1. research shows a great variation in whether jigsaw learning is effective or not as it differs according to contexts and conditions; or because
  2. although jigsaw learning has consistently been shown to be effective in many different contexts, no one has yet tested it in the specific case of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey?

It seems clear from the paper that the researchers are presenting the second case (in which case the study would actually be of more interest and importance if it had been found that in this context jigsaw learning was not effective).

Given there are very good reasons to expect a positive outcome, there seems no need to 'stack the odds' by using deliberately detrimental control conditions.

Even had situation 1 applied, it seems of limited value to know that jigsaw learning is more effective (in teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey) than an approach we already recognise is suboptimal.

An ethical alternative

This does not mean that there is no value in research that explores well-established teaching approaches in new contexts. However, unless the context is very different from where the approach has already been widely demonstrated, there is little value in comparing it with approaches that are known to be sub-optimal (which in Turkey, a country where constructivist 'reform' teaching approaches are supposed to be the expected standard, seem to often be labelled as 'traditional').

Detailed case studies of the implementation of a reform pedagogy in new contexts that collect rich 'process' data to explore challenges to implementation and to identify especially effective specific practices would surely be more informative? 4

If researchers do feel the need to do experiments, then rather than comparing known-to-be-effective approaches with suboptimal approaches hoping to demonstrate what everyone already knows, why not use comparison conditions that really test the innovation? Of course jigsaw learning outperformed lecturing in an elementary school – but how might it have compared with another constructivist approach?

I have described the constructivist science teacher as a kind of learning doctor. Like medical doctors, our first tenet should be to do no harm. So, if researchers want to set up experimental comparisons, they have a duty to try to set up two different approaches that they believe are likely to benefit the learners (whichever condition they are assigned to):

  • not one condition that advantages one group of students
  • and another which deliberately disadvantages another group of students for the benefit of a 'positive' research outcome.

If you already know the outcome then it is not genuine research – and you need a better research question.


Notes:

1 Imagine teaching one class about acids by jigsaw learning, and teaching another about the nervous system by some other pedagogy – and then comparing the pedagogies by administering a test – about acids! The class in the jigsaw condition might well do better, without it being reasonable to assume this reflects more effective pedagogy.

So, I am tempted to read this as simply a drafting/typographical error that has been missed, and suspect the authors intended to refer to something like the traditional approach to teaching the science and technology curriculum. Otherwise the experiment is fatally flawed.

Yet, one purpose of the study was to find out

"Does jigsaw cooperative learning instruction contribute to a better conceptual understanding of 'physical and chemical changes' in sixth grade students compared to the traditional science and technology curriculum?"

Tarhan et al., 2013, p.187

This reads as if the researchers felt the curriculum was not sufficiently matched to what they felt were the most important learning objectives in the topic of physical and chemical changes, so they have undertaken some curriculum development, as well as designed a teaching unit accordingly, to be taught by jigsaw learning pedagogy. If so the experiment is testing

traditional curriculum x traditional pedagogy

vs.

reformed curriculum x innovative pedagogy

making it impossible to disentangle the two components.

This suggests the researchers are testing the combination of curriculum and pedagogy, and doing so with a test biased towards the experimental condition. This seems illogical, but I have actually worked in a project where we faced a similar dilemma. In the epiSTEMe project we designed innovative teaching units for lower secondary science and maths. In both physics units we incorporated innovative aspects to the curriculum.

  • In the forces unit material on proportionality was introduced, with examples (car stopping distance) normally not taught at that grade level (Y7);
  • In the electricity unit the normal physics content was embedded in an approach designed to teach aspects of the nature of science.

In the forces unit, the end-of-topic test included material that was included in the project-designed units, but unlikely to be taught in the control classes. There was evidence that on average students in the project classes did better on the test.

In the electricity unit, the nature of science objectives were not tested as these would not necessarily have been included in teaching control classes. On average, there was very little difference in learning about electrical circuits in the two conditions. There was however a very wide range of class performances – oddly just as wide in the experimental condition (where all classes had a common scheme of work, common activities, and common learning materials) as in the control condition where teachers taught the topic in their customary ways.


2 It could be read either as

Reading 1

Control: The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group.
Experimental: …detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving.
Control: During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group.

or

Reading 2

Control: The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group.
Experimental: …detailed instruction in the experimental group.
Control: This [sic] instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group.

What was 'this instruction' which included lectures, discussions and problem solving?

3 A class, of course, is not a person, but a collection of people, so perhaps does not have a 'personality' as such. However, for teachers, classes do take on something akin to a personality.

This is not just an impression. It was pointed out above that if a researcher wants to treat each learner as a unit of analysis (necessary to use inferential statistics when only working with a small number of classes) then learners, not intact classes, should be assigned to conditions. However, even a newly formed class will soon develop something akin to a personality. This will certainly be influenced by individual learners present but develop through the history of their evolving mutual interactions and is not just a function of the sum of their individual characteristics.

So, even when a class is formed by random assignment of learners at the start of a study, it is still strictly questionable whether these students should be seen as independent units for analysis (Taber, 2019).
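The concern about independence can be illustrated with a small simulation. This is only a minimal sketch under illustrative assumptions that are not drawn from any study discussed here: one class per condition, 25 students per class, no real difference between the two pedagogies, and a shared 'class effect' added to every student in the same class. The function names and all the numbers are hypothetical, and the cut-off of 2.0 is a rough stand-in for the usual 5% two-tailed criterion.

```python
import random
import statistics


def naive_t(a, b):
    """Welch-style t statistic that treats every student as an independent unit."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def false_positive_rate(class_sd, student_sd=10.0, n=25, trials=2000, seed=1):
    """Share of simulated null experiments that look 'significant' (|t| > 2.0).

    class_sd is the spread of the shared class effect (assumed, illustrative);
    there is no treatment effect in any simulated experiment.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        classes = []
        for _ in range(2):  # one intact class per condition
            shift = rng.gauss(0.0, class_sd)  # shared by everyone in the class
            classes.append(
                [50.0 + shift + rng.gauss(0.0, student_sd) for _ in range(n)]
            )
        if abs(naive_t(classes[0], classes[1])) > 2.0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("no class effect:     ", false_positive_rate(class_sd=0.0))
    print("modest class effect: ", false_positive_rate(class_sd=5.0))
```

With no class effect the rate of spurious 'significant' results should sit close to the nominal 5%. Once students in a class share even a modest common influence, the same test should flag a 'difference between pedagogies' far more often, although none exists in the simulation.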


4 I suspect that science educators have a justified high regard for experimental method in the natural sciences, which sometimes blinkers us to its limitations in social contexts where there are myriad interacting variables and limited controls.

Read: Why do natural scientists tend to make poor social scientists?


Fingerprinting an exoplanet

Life, death, and multiple Gaias


Keith S. Taber


NASA might be said to be engaged in looking for other Gaias beyond our Gaia, as Dr Milam explained to another Gaia.

This post is somewhat poignant as something I heard on a radio podcast reminded me how science has recently lost one of its great characters, as well as an example of that most rare thing in today's science – the independent scientist.


Inside Science episode "Deep Space and the Deep Sea – 40 years of the International Whaling Moratorium", presented, perhaps especially aptly, by Gaia Vince

I was listening to the BBC's Inside Science podcast episode 'Deep Space and the Deep Sea – 40 years of the International Whaling Moratorium' where the presenter – somewhat ironically, in view of the connection I was making, Gaia Vince – was talking to Dr Stefanie Milam of NASA's Goddard Space Flight Center about how the recently launched James Webb Space Telescope could help scientists look for signs of life on other planets.


From: https://jwst.nasa.gov/content/meetTheTeam/people/milam.html

Dr Milam explained that

"spectra…give us all the information that we really need to understand a given environment. And that's one of the amazing parts about the James Webb space telescope. So, what we have access to with the wavelengths that the James Webb space telescope actually operates at, is that we have the fingerprint pattern of given molecules, things like water, carbon monoxide, carbon dioxide, all these things that we find in our own atmosphere, and so by using the infrared wavelengths we can look for these key ingredients in atmospheres around other planets or even, actually, objects in our own solar system, and that tells us a little bit about what is going on as far as the dynamics of that planet, whether or not it has got geological activity, or maybe even something as crazy as biology."

Dr Stefanie Milam, interviewed for 'Inside Science'

"Webb has captured the first clear evidence of carbon dioxide (CO2) in the atmosphere of a planet outside of our solar system!" (Hot Gas Giant Exoplanet WASP-39 b Transit Light Curve, NIRSpec Bright Object Time-Series Spectroscopy.)
Image: NASA, ESA, CSA, and L. Hustak (STScI). Released under the Attribution 2.0 Generic (CC BY 2.0) License – Some rights reserved by James Webb Space Telescope

Do molecules have fingerprints?

Fingerprints have long been used in forensic work to identify criminals (and sometimes their victims) because our fingerprints are pretty unique. Even 'identical' twins do not have identical fingerprints (though I suspect that fact rather undermines some crime fiction plots). But, to have fingerprints one surely has to have fingers. A palm print requires a palm, and a footprint, a foot. So, can molecules, not known for their manual dexterity, have fingerprints?

Well, it is not exactly by coincidence (as the James Webb space telescope has had a lot of media attention) that I very recently posted here, in the context of new observations of the early Universe, that

"Spectroscopic analysis allows us to compare the pattern of redshifted spectral lines due to the presence of elements absorbing or emitting radiation, with the position of those lines as they are found without any shift. Each element has its own pattern of lines – providing a metaphorical fingerprint."

from: A hundred percent conclusive science. Estimation and certainty in Maisie's galaxy

In chemistry, elements and compounds have unique patterns of energy transitions which can be identified through spectroscopy. So, we have 'metaphorical fingerprints'. To describe a spectrum as a chemical substance's (or entity's, such as an ion's) fingerprint is to use a metaphor. It is not actually a fingerprint – there are no fingers to leave prints – but this figure of speech gets across an idea through an implicit comparison with something already familiar. *1 That is, it is a way of making the unfamiliar familiar (which might be seen as a description of teaching!)

Dead metaphors

But perhaps this has become a 'dead metaphor' so that now chemicals do have fingerprints? One of the main ways that language develops is by words changing their meanings over time as metaphors become so commonly used they cease to be metaphorical.

For example, I understand the term electrical charge is a dead metaphor. When electrical charge was first being explored and was still unfamiliar, the term 'charge' was adopted by comparison with the charging of a cannon or the charge of shot used in a shotgun. The shot charge refers to the weight of shot included in a cartridge. Today, most people would not know that, whilst being very familiar with the idea of electrical charge. But when the term electrical charge was first used most people knew about charging guns.

So, initially, electrical 'charge' was a metaphor to refer to the amount of 'electricity' – which made use of a familiar comparison. Now it is a dead metaphor, and 'electrical charge' is considered a technical term in its own right.

Another example might be electron spin: electrons do not spin in the familiar sense, but really do (now) have spin as the term has been extended to apply to quanticles with inherent angular momentum by analogy with more familiar macroscopic objects that have angular momentum when they are physically rotating. So, we might say that when the term was first used, it was a metaphor, but no longer. (That is, physicists have expanded the range of convenience of the term spin.)

Perhaps, similarly, fingerprint is now so commonly used to mean a unique identifier in a wide range of contexts, that it should no longer be considered a metaphor. I am not sure if that is so, yet, but perhaps it will be in, say, a century's time – and the term will be broadly used without people even noticing that many things have acquired fingerprints without having fingers. (A spectrum will then actually be a chemical substance's or entity's fingerprint.) After all, many words we now commonly use contain fossils of their origins without us noticing. That is, metaphorical fossils, of course. *2

James Lovelock, R.I.P.

The reason I found this news item somewhat poignant was that I was listening to it just a matter of weeks after the death (at age 103) of the scientist Jim Lovelock. *3 Lovelock invented the device which was able to demonstrate the ubiquity of chlorofluorocarbons (CFCs) in the atmosphere. These substances were very commonly used as refrigerants and aerosol propellants as they were very stable, and being un-reactive (so non-toxic) were considered safe.

But this very stability allowed them to remain in, and spread through, the atmosphere for a very long time, until they were broken down in the stratosphere by ultraviolet radiation to give radicals that reacted with the ozone that is so protective of living organisms. Free radical reactions can occur as chain reactions: when a radical interacts with a molecule, it gives a new molecule plus a new radical, which can often go on to react with another molecule. So, each CFC molecule could lead to the destruction of many ozone molecules. CFCs have now been banned for most purposes to protect the ozone 'layer', and so us.
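As an illustration of such a chain, here is the standard textbook chlorine cycle (an addition here, not something spelled out in the podcast). A chlorine atom released from a CFC molecule by ultraviolet radiation is regenerated at the end of each pass through the cycle:

```latex
\begin{align*}
\mathrm{Cl^{\bullet} + O_3} &\longrightarrow \mathrm{ClO^{\bullet} + O_2} \\
\mathrm{ClO^{\bullet} + O}  &\longrightarrow \mathrm{Cl^{\bullet} + O_2} \\
\text{net:}\quad \mathrm{O_3 + O} &\longrightarrow \mathrm{2\,O_2}
\end{align*}
```

Because the chlorine radical consumed in the first step reappears in the second, the same atom can go round the cycle many times before some other reaction removes it.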

Life is chemistry out of balance

But another of Lovelock's achievements came when working for NASA to develop means to search for life elsewhere in the universe. As part of the Mariner missions, NASA wanted Lovelock to design apparatus that could be sent to other worlds and search for life (and I think he did help do that), but Lovelock pointed out that one could tell if a planet had life by a spectroscopic analysis.

Any alien species analysing light passing through earth's atmosphere would see its composition was far from chemical equilibrium due to the ongoing activity of its biota. (If life were to cease on earth today, the oxygen content of the atmosphere would very quickly fall from 21% to virtually none at all as oxygen reacts with rocks and other materials.) If the composition of an atmosphere seemed to be in chemical equilibrium, then it was unlikely there was life. However, if there were high concentrations of gases that should react together or with the surface, then something, likely life, must be actively maintaining that combination of gases in the atmosphere.

"Living systems maintain themselves in a state of relatively low entropy at the expense of their nonliving environments. We may assume that this general property is common to all life in the solar system. On this assumption, evidence of a large chemical free energy gradient between surface matter and the atmosphere in contact with it is evidence of life. Furthermore, any planetary biota which interacts with its atmosphere will drive that atmosphere to a state of disequilibrium which, if recognized, would also constitute direct evidence of life, provided the extent of the disequilibrium is significantly greater than abiological processes would permit. It is shown that the existence of life on Earth can be inferred from knowledge of the major and trace components of the atmosphere, even in the absence of any knowledge of the nature or extent of the dominant life forms. Knowledge of the composition of the Martian atmosphere may similarly reveal the presence of life there."

Dian R. Hitchcock and James E. Lovelock – from Lovelock's website (originally published in Icarus: International Journal of the Solar System in 1967)

The story was that NASA did not really want to be told they did not need to send missions with spacecraft to other worlds such as Mars to look for life, rather that they only had to point a telescope and analyse the spectrum of radiation. Ironically, perhaps, then, that is exactly what they are now doing with planets around other star systems where it is not feasible (not now, perhaps not ever) to send missions.

Gaia and Gaia

But Lovelock became best known for his development and championing of the Gaia theory. According to Gaia (the theory, not the journalist), the development of life on earth has shaped the environment (and not just exploited pre-existing niches) and developed as a huge integrated and interacting system (the biota, but also the seas, the atmosphere, freshwater, the soil,…) such that large scale changes in one part of the system have knock-on effects elsewhere. *4

So, Gaia can be understood not as the whole earth as a planet, or just the biota as the collective life in terms of organisms, but rather as the dynamic system of life on earth and the environment it interacts with. In a sense (and it is important to see this is meant as an analogy, a thinking tool) Gaia is like some supra-organism. Just as a snail has a shell that it has produced for itself, Gaia has shaped the biosphere where the biota lives. *4

The system has built-in feedback cycles to protect it from perturbations (not by chance, or due to some mysterious power, but due to natural selection) but if it is subject to a large enough input it would shift to a new (and perhaps very different) equilibrium state. *5 This certainly happened when oxygen releasing organisms evolved: the earth today is inhospitable to the organisms that lived here before that event (some survived to leave descendants, but only in places away from the high oxygen concentrations, such as in lower layers of mud beneath the sea), and most organisms alive today would die very quickly in the previous conditions.

It would be nice to think that Gaia, the science journalist that is, was named after the Gaia theory – but Lovelock only started publishing about his Gaia hypothesis about the time that Gaia was born. *6 So, probably not. Gaia is a traditional girl's name, and was the name of the Greek goddess who personified the earth (which is why the name was adopted by Lovelock).

Still, it was poignant to hear a NASA scientist referring to the current value of a method first pointed out by Lovelock when advising NASA in the 1960s and informed by his early thinking about the Gaia hypothesis. NASA might be said to now be engaged in looking for other Gaias on worlds outside our own solar system, as Dr Milam explained to – another – Gaia here on earth.


Notes:

*1 It is an implicit comparison, because the listener/reader is left to appreciate that it is meant as a figure of speech: unlike in a simile ('a spectrum is like a fingerprint') where the comparison is made explicit.


*2 For some years I had a pager (common before mobile phones) – a small electronic device which could receive a text message, so that my wife could contact me in an emergency if I was out visiting schools by phoning a message to be conveyed by a radio signal. If I had been asked why it was called a pager, I would have assumed that each message of text was considered to comprise a 'page'.

However, a few weeks ago I watched an old 'screwball comedy' being shown on television: 'My favourite wife' (or 'My favorite [sic] wife' in US release).

(On the very day that Cary Grant remarries after having his first wife, long missing after being lost at sea, declared legally dead, wife number one reappears having been rescued from a desert island. That this is a very unlikely scenario was played upon when the film was remade in colour, as 'Move Over Darling', with Doris Day and James Garner. The returned first wife, pretending to be a nurse, asks the new wife if she is not afraid the original wife would reappear, as happened in that movie; eliciting the response: 'Movies. When do movies ever reflect real life?')

Some of the action takes place in the honeymoon hotel where the groom has disappeared from the suite (these are wealthy people!) having been tracked down by his first wife. The new wife asks the hotel to page him – and this is how that worked with pre-electronic technology:

Paging Mr Arden: Still from 'My Favorite Wife'

*3 So, although I knew Lovelock had died (July 26th), he was still alive at the time of the original broadcast (July 14th). In part, my tardiness comes from the publicly funded BBC's decision to no longer make available downloads of some of its programmes for iPods and similar devices immediately after broadcast. (This downgrading of the BBC's service to the public seems to be to persuade people to use its own streaming service.)


*4 The Gaia theory developed by Lovelock and Lynn Margulis includes ideas that were discussed by Vladimir Vernadsky almost a century ago. Although Vernadsky's work was well known in scientific circles in the Soviet Union, it did not become known to scientists in Western Europe till much later. Vernadsky used the term 'biosphere' to refer to those 'layers' of the earth (lower atmosphere to outer crust) where life existed.


*5 A perturbation such as extensive deforestation perhaps, or certainly increasing the atmospheric concentrations of 'greenhouse' gases beyond a certain point.


*6 Described as a hypothesis originally, it has been extensively developed and would seem to now qualify as a theory (a "consistent, comprehensive, coherent and extensively evidenced explanation of aspects of the natural world").

A case study of educational innovation?

Design and Assessment of an Online Prelab Model in General Chemistry


Keith S. Taber


Case study is meant to be naturalistic – whereas innovation sounds like an intervention. But interventions can be the focus of naturalistic enquiry.

One of the downsides of having spent years teaching research methods is that one cannot help but notice how so much published research departs from the ideal models one offers to students. (Which might be seen as a polite way of saying authors often seem to get key things wrong.) I used to teach that how one labelled one's research was less important than how well one explained it. That is, different people would have somewhat different takes on what is, or is not, grounded theory, case study or action research, but as long as an author explained what they had done, and could adequately justify why, the choice of label for the methodology was of secondary importance.

A science teacher can appreciate this: a student who tells the teacher they are doing a distillation when they are actually carrying out reflux – but clearly explains what they are doing and why, will still be understood (even if the error should be pointed out). On the other hand if a student has the right label but an alternative conception this is likely to be a more problematic 'bug' in the teaching-learning system. 1

That said, each type of research strategy has its own particular weaknesses and strengths so describing something as an experiment, or a case study, if it did not actually share the essential characteristics of that strategy, can mislead the reader – and sometimes even mislead the authors such that invalid conclusions are drawn.

A 'case study', that really is a case study

I made reference above to action research, grounded theory, and case study – three methodologies which are commonly name-checked in education research. There are a vast number of papers in the literature with one of these terms in the title, and a good many of them do not report work that clearly fits the claimed approach! 2


The case study was published in the Journal for the Research Center for Educational Technology

So, I was pleased to read an interesting example of a 'case study' that I felt really was a case study (Llorens-Molina, 2009). 'Design and assessment of an online prelab model in general chemistry: A case study' offered a good example of the approach, although I suspect some other authors might have been tempted to describe this research differently.

Is it a bird, is it a plane; no it's…

Llorens-Molina's study included an experimental aspect. A cohort of learners was divided into two groups to allow the researcher to compare two different educational treatments; then, measurements were made to compare outcomes quantitatively. That might sound like an experiment. Moreover, this study reported an attempt to innovate in a teaching situation, which gives the work a flavour of action research. Despite this, I agree with Llorens-Molina that the work is best characterised as a case study.

Read about experiments

Read about action research


A case study focuses on 'one instance' from among many


What is a case study?

A case study is an in-depth examination of one instance: one example – of something for which there are many examples. The focus of a case study might be one learner, one teacher, one group of students working together on a task, one class, one school, one course, one examination paper, one text book, one laboratory session, one lesson, one enrichment programme… So, there is great variety in what kind of entity a case study is a study of, but what case studies have in common is they each focus in detail on that one instance.

Read about case study methodology


Characteristics of case study


Case studies are naturalistic studies, which means they are studies of things as they are, not attempts to change things. The case has to be bounded (a reader of a case study learns what is in the case and what is not) but tends to be embedded in a wider context that impacts upon it. That is, the case is entangled in a context from which it could not easily be extracted and still be the same case. (Imagine moving a teacher with her class from their school to have their lesson in a university where it could be observed by researchers – it would not be 'the same lesson' as would have occurred in situ).

The case study is reported in detail, often in a narrative form (not just statistical summaries) – what is sometimes called 'thick description'. Usually several 'slices' of data are collected – often different kinds of data – and often there is a process of 'triangulation' to check the consistency of the account presented in relation to the different slices of data available. Although case studies can include analysis of quantitative data, they are usually seen as interpretive as the richness of data available usually reflects complexity and invites nuance.



Design and Assessment of an Online Prelab Model in General Chemistry

Llorens-Molina's study explored the use of prelabs that are "used to introduce and contextualize laboratory work in learning chemistry" (p.15), and in particular "an alternative prelab model, which consists of an audiovisual tutorial associated with an online test" (p.15).

An innovation

The research investigated an innovation in teaching practice,

"In our habitual practice, a previous lecture at the beginning of each laboratory session, focused almost exclusively on the operational issues, was used. From our teaching experience, we can state that this sort of introductory activity contributes to a "cookbook" way to carry out the laboratory tasks. Furthermore, the lecture takes up valuable time (about half an hour) of each ordinary two-hour session. Given this set-up, the main goal of this research was to design and assess an alternative prelab model, which was designed to enhance the abilities and skills related to an inquiry-type learning environment. Likewise, it would have to allow us to save a significant amount of time in laboratory sessions due to its online nature….

a prelab activity developed …consists of two parts…a digital video recording about a brief tutorial lecture, supported by a slide presentation…[followed by] an online multiple choice test"

Llorens-Molina, 2009, p.16-17

Not action research?

The reference to shifting "our habitual practice" indicates this study reports practitioner research. Practitioner studies, such as this, that test a new innovation are often labelled by authors as 'action research'. (Indeed, sometimes, the fact that research is carried out by practitioners looking to improve their own practice is seen as sufficient for action research: when actually this is a necessary, but not a sufficient condition.)

Genuine action research aims at improving practice, not simply seeing if a specific innovation is working. This means action research has an open-ended design, and is cyclical – with iterations of an innovation tested and the outcomes used as feedback to inform changes in the innovation. (Despite this, a surprising number of published studies labelled as action research lack any cyclic element, simply reporting one iteration of an innovation.) Llorens-Molina's study does not have a cyclic design, so would not be well-characterised as action research.

An experimental design?

Llorens-Molina reports that the study was motivated by three hypotheses (p.16):

  • "Substituting an initial lecture by an online prelab to save time during laboratory sessions will not have negative repercussions in final examination marks.
  • The suggested online prelab model will improve student autonomy and prerequisite knowledge levels during laboratory work. This can be checked by analyzing the types and quantity of SGQ [student generated questions].
  • Student self-perceptions about prelab activities will be more favourable than those of usual lecture methods."

To test these hypotheses the student cohort was divided into two groups, to be split between the customary and innovative approach. This seems very much like an experiment.

It may be useful here to make a discrimination between two levels of research design – methodology (akin to strategy) and techniques (akin to tactics). In research design, a methodology is chosen to meet the overall aims of the study, and then one or more research techniques are selected consistent with that methodology (Taber, 2013). Experimental techniques may be included in a range of methodologies, but experiment as an overall methodology has some specific features.

Read about Research design

In a true experiment there is random assignment to conditions, and often there is an intention to generalise results to a wider population considered to be sampled in the study. Llorens-Molina reports that although inferential statistics were used to test the hypotheses, there was no intention to offer statistical generalisation beyond the case. The cohort of students was not assumed to be a sample representing some wider population (such as, say, undergraduates on chemistry courses in Spain) – and, indeed, clearly such an assumption would not have been justified.

Case study is naturalistic – but an innovation is an intervention in practice…

Case study is said to be naturalistic research – it is a method used to understand and explore things as they are, not to bring about change. Yet, here the focus is an innovation. That seems a contradiction. It would be a contradiction if the study was being carried out by external researchers who had asked the teaching team to change practice for the benefits of their study. However, here it is useful to separate out the two roles of teacher and researcher.

This is a situation that I commonly faced when advising graduates preparing for school teaching who were required to carry out a classroom based study into an aspect of their school placement practice context as part of their university qualification (the Post-Graduate Certificate in Education, P.G.C.E.). Many of these graduates were unfamiliar with research into social phenomena. Science graduates often brought a model of what worked in the laboratory to their thinking about their projects – and had a tendency to think that transferring the experimental approach to classrooms (where there are usually a large number of potentially relevant variables, many of which cannot be controlled) would be straightforward.

Read 'Why do natural scientists tend to make poor social scientists?'

The Cambridge P.G.C.E. teaching team put into place a range of supports to introduce graduates preparing for teaching to the kinds of education research useful for teachers who want to evaluate and improve their own teaching. This included a book written to introduce classroom-based research that drew heavily on analysis of published studies (Taber, 2007; 2013). Part of our advice was that those new to this kind of enquiry might want to consider action research and case study as suitable options for their small-scale projects.


Useful strategies for the novice practitioner-researcher (Figure: diagram used in working with graduates preparing for teaching, from Taber, 2010)

Simplistically, action research might be considered best suited to a project to test an innovation or address a problem (e.g., evaluating a new teaching resource; responding to behavioural issues), and case study best suited to an exploratory study (e.g., what do Y9 students understand about photosynthesis?; what is the nature of peer dialogue during laboratory working in this class?). However, it was often difficult for the graduates to carry out authentic action research as the constraints of the school-based placements seldom allowed them to test successive iterations of the same intervention until they found something like an optimal specification.

Yet, they often were in a good position to undertake a detailed study of one iteration, collecting a range of different data, and so producing a detailed evaluation. That sounds like a case study.

Case study is supposed to be naturalistic – whereas innovation sounds like an intervention. But some interventions in practice can be considered the focus of naturalistic enquiry. My argument was that when a teacher changes the way they do something to try and solve a problem, or simply to find a better way to work, that is a 'natural' part of professional practice. The teacher-researcher, as researcher, is exploring something the fully professional teacher does as a matter of course – seek to develop practice. After all, our graduates were being asked to undertake research to give them the skills expected to meet professional teaching standards, which

"clearly requires the teacher to have both the procedural knowledge to undertake small-scale classroom enquiry, and 'conceptual frameworks' for thinking about teaching and learning that can provide the basis for evaluating their teaching. In other words, the professional teacher needs both the ability to do her own research and knowledge of what existing research suggests"

Taber, 2013, p.8

So, the research is on something that is naturally occurring in the classroom context, rather than an intervention imported into the context in order to answer an external researcher's questions. A case study of an intervention introduced by practitioners themselves can be naturalistic – even if the person implementing the change is the researcher as well as the teacher.


If a teacher-researcher (qua researcher) wishes to enquire into an innovation introduced by the teacher-researcher (qua teacher) then this can be considered as naturalistic enquiry


The case and the context

In Llorens-Molina's study, the case was a sequence of laboratory activities carried out by a cohort of undergraduates undertaking a course of General and Organic Chemistry as part of an Agricultural Engineering programme. So, the case was bounded (the laboratory part of one taught course) and embedded in a wider context – a degree programme in a specific institution in Spain: the Polytechnic University of Valencia.

The primary purpose of the study was to find out about the specific innovation in the particular course that provided the case. This was then what is known as an intrinsic case study. (When a case is studied primarily as an example of a class of cases, rather than primarily for its own interest, it is called an instrumental case study).

Llorens-Molina recognised that what was found in this specific case, in its particular context, could not be assumed to apply more widely. There can be no statistical generalisation to other courses elsewhere. In case study, the intention is to offer sufficient detail of the case for readers to make judgements of the likely relevance to other contexts of interest (so-called 'reader generalisation').

The published report gives a good deal of information about the course as well as much information about how data was collected, and equally important, analysed.

Different slices of data

Case study often uses a range of data sources to develop a rounded picture of the case. In this study the identification of three specific hypotheses (less usual in case studies, which often have more open-ended research questions) led to the collection of three different types of data.

  • Students were assessed on each of six laboratory activities. A comparison was made between the prelab condition and the existing approach.
  • Questions asked by students in the laboratories were recorded and analysed to see if the quality/nature of such questions was different in the two conditions. A sophisticated approach was developed to analyse the questions.
  • Students were asked to rate the prelabs through responding to items on a questionnaire.

This approach allowed the author to go beyond simply reporting whether hypotheses were supported by the analysis, to offer a more nuanced discussion around each feature. Such nuance is not only more informative to the reader of a case study, but reflects how the researcher, as practitioner, has an ongoing commitment to further develop practice and not see the study as an end in itself.

Avoiding the 'equivalence' and the 'misuse of control groups' problems

I particularly appreciate a feature of the research design that many educational studies that claim to be experiments could benefit from. To test his hypotheses Llorens-Molina employed two conditions or treatments, the innovation and a comparison condition, and divided the cohort: "A group with 21 students was split into two subgroups, with 10 and 11 in each one, respectively". Llorens-Molina does not suggest this was based on random assignment, which is necessary for a 'true' experiment.

In many such quasi-experiments (where randomisation to condition is not carried out, and is indeed often not possible) the researchers seek to offer evidence of equivalence before the treatments occur. After all, if the two subgroups are different in terms of past subject attainment or motivation or some other relevant factor (or, indeed, if there is no information to allow a judgement regarding whether this is the case or not), no inferences about an intervention can be drawn from any measured differences. (Although that does not always stop researchers from making such claims regardless: e.g., see Lack of control in educational research.)

Another problem is that if learners are participating in research but are assigned to a control or comparison condition then it could be asked if they are just being used as 'data fodder', and would that be fair to them? This is especially so in those cases (so, not this one) where researchers require that the comparison condition is educationally deficient – many published studies report a control condition where school students have effectively been lectured to, and no discussion work, group work, practical work, digital resources, et cetera, have been allowed, in order to ensure a stark contrast with whatever supposedly innovative pedagogy or resource is being evaluated (Taber, 2019).

These issues are addressed in research designs which have a compensatory structure – in effect the groups switch between being the experimental and comparison condition – as here:

"Both groups carried out the alternative prelab and the previous lecture (traditional practice), alternately. In this way, each subgroup carried out the same number of laboratory activities with either a prelab and previous lecture"

Llorens-Molina, 2009, p.19

This is good practice both from methodological and ethical considerations.
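The compensatory structure can be made concrete with a small sketch. It assumes six laboratory activities (as reported for the study) and two subgroups labelled A and B; the strict activity-by-activity alternation and the function name `crossover_schedule` are illustrative assumptions, not details taken from Llorens-Molina's paper.

```python
from collections import Counter

CONDITIONS = ("prelab", "lecture")

def crossover_schedule(n_activities, groups=("A", "B")):
    """Alternate the two conditions so the subgroups swap roles on each activity."""
    schedule = []
    for index in range(n_activities):
        # Index 0, 2, 4...: first group has the prelab; index 1, 3, 5...: roles are swapped.
        first, second = CONDITIONS if index % 2 == 0 else CONDITIONS[::-1]
        schedule.append({groups[0]: first, groups[1]: second})
    return schedule

schedule = crossover_schedule(6)

# Count how often each subgroup meets each condition across the sequence.
counts = {g: Counter(row[g] for row in schedule) for g in ("A", "B")}

for number, row in enumerate(schedule, start=1):
    print(number, row)
print(counts)
```

Each subgroup meets each condition the same number of times, so every student acts as part of both the experimental and the comparison condition, and neither subgroup is left with only the supposedly weaker provision.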


The study used a compensatory design which avoids the need to ensure both groups are equivalent at the start, and does not disadvantage one group. (Figure from Llorens-Molina, 2009, p.22 – published under a creative commons Attribution-NonCommercial-NoDerivs 3.0 United States license allowing redistribution with attribution)

A case of case study

Do I think this is a model case study that perfectly exemplifies all the claimed characteristics of the methodology? No, and very few studies do. Real research projects, often undertaken in complex contexts with limited resources and intractable constraints, seldom fit such ideal models.

However, unlike some studies labelled as case studies, this study has an explicit bounded case and has been carried out in the spirit of case study that highlights and values the intrinsic worth of individual cases. There is a good deal of detail about aspects of the case. It is in essence a case study, and (unlike what sometimes seems to be the case [sic]) not just called a case study for want of a methodological label. Most educational research studies examine one particular case of something – but (and I do not think this is always appreciated) that does not automatically make them case studies. Because it has been both conceptualised and operationalised as a case study, Llorens-Molina's study is a coherent piece of research.

Given how, in these pages, I have often been motivated to call out studies I have read that I consider have major problems – major enough to be sufficient to undermine the argument for the claimed conclusions of the research – I wanted to recognise a piece of research that I felt offered much to admire.


Work cited:

Notes:

1 I am using language here reflecting a perspective on teaching as being based on a model (whether explicit or not) in the teacher's mind of the learners' current knowledge and understanding and how this will respond to teaching. That expects a great deal of the teacher, so there are often bugs in the system (e.g., the teacher over-estimates prior knowledge) that need to be addressed. This is why being a teacher involves being something of a 'learning doctor'.

Read about the learning doctor perspective on teaching


2 I used to teach sessions introducing each of these methodologies when I taught on an Educational Research course. One of the class activities was to examine published papers claiming the focal methodology, asking students to see if studies matched the supposed characteristics of the strategy. This was a course with students undertaking a very diverse range of research projects, and I encouraged them to apply the analysis to papers selected because they were of particular interest and relevance to their own work. Many examples selected by students proved to offer poor match between claimed methodology and the actual research design of their study!

Lack of control in educational research

Getting that sinking feeling on reading published studies


Keith S. Taber


this is like finding that, after a period of watering plant A, it is taller than plant B – when you did not think to check how tall the two plants were before you started watering plant A

Research on prelabs

I was looking for studies which explored the effectiveness of 'prelabs', activities which students are given before entering the laboratory to make sure they are prepared for practical work, and can therefore use their time effectively in the lab. There is much research suggesting that students often learn little from science practical work, in part because of cognitive overload – that is, learners can be so occupied with dealing with the apparatus and materials they have little capacity left to think about the purpose and significance of the work. 1


Okay, so is THIS the pipette?
(Image by PublicDomainPictures from Pixabay)

Approaching a practical work session having already spent time engaging with its purpose and associated theories/models, and already having become familiar with the processes to be followed, should mean students enter the laboratory much better prepared to use their time efficiently, and much better informed to reflect on the wider theoretical context of the work.

I found a Swedish paper (Winberg & Berg, 2007) reporting a pair of studies that tested this idea by using a simulation as a prelab activity for undergraduates about to engage with an acid-base titration. The researchers tested this innovation by comparisons between students who completed the prelab before the titration, and those who did not.

The work used two basic measures:

  • types (sophistication) of questions asked by students during the lab. session
  • elicitation of knowledge in interviews after the laboratory activity

The authors found some differences (between those who had completed the prelab and those that had not) in the sophistication of the questions students asked, and in the quality of the knowledge elicited. They used inferential statistics to suggest at least some of the differences found were statistically significant. From my reading of the paper, these claims were not justified.

A peer reviewed journal (no, really, this time)

This is a paper in a well respected journal (not one of the predatory journals I have often discussed on this site). The Journal of Research in Science Teaching is published by Wiley (a major respected publisher of academic material) and is the official journal of NARST (which used to stand for the National Association for Research in Science Teaching – where 'national' referred to the USA 2). This is a journal that does take peer review very seriously.

The paper is well-written and well-structured. Winberg and Berg set out a conceptual framework for the research that includes a discussion of previous relevant studies. They adopt a theoretical framework based on Perry's model of intellectual development (Taber, 2020). There is considerable detail of how data was collected and analysed. This account is well-argued. (But, you, dear reader, can surely sense a 'but' coming.)

Experimental research into experimental work?

The authors do not seem to explicitly describe their research as an experiment as such (as opposed to adopting some other kind of research strategy such as survey or case study), but the word 'experiment' and variations of it appear in the paper.

For one thing, the authors refer to students' practical work as being experiments,

"Laboratory exercises, especially in higher education contexts, often involve training in several different manipulative skills as well as a high information flow, such as from manuals, instructors, output from the experimental equipment, and so forth. If students do not have prior experiences that help them to sort out significant information or reduce the cognitive effort required to understand what is happening in the experiment, they tend to rely on working strategies that help them simply to cope with the situation; for example, focusing only on issues that are of immediate importance to obtain data for later analysis and reflective thought…"

Winberg & Berg, 2007

Now, some student practical work is experimental, where a student is actively looking to see what happens when they manipulate some variable to test a hypothesis. This type of practical work is sometimes labelled enquiry (or inquiry in US spelling). A lot of school and university laboratory work, however, is undertaken to learn techniques, or (probably more often) to support the learning of taught theory – where it is usually important the learners know what is meant to happen before they begin the laboratory activity.

Winberg and Berg refer to the 'laboratory exercise' as 'the experiment' as though any laboratory work counts as an experiment. In Winberg and Berg's research, students were asked about their "own [titration] experiment", despite the prelab material involving a simulation of the titration process, in advance of which "the theoretical concepts, ideas, and procedures addressed in the simulation exercise had been treated mainly quantitatively during the preceding 1-week instructional sequence". So, the laboratory titration exercise does not seem to be an experiment in the scientific sense of the term.

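School children commonly describe all practical work in the lab as 'doing experiments'. It cannot help students learn what an experiment really is when the word 'experiment' has two quite distinct meanings in the science classroom: first, the scientific meaning of a test in which a variable is manipulated to see what happens; and second, the everyday classroom meaning of any practical work carried out in the laboratory.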

We might describe this second meaning as an alternative conception of 'experiment', a way of understanding that is inconsistent with the scientific meaning. (Just as there are common alternative conceptions of other 'nature of science' concepts such as 'theory').

I would imagine Winberg and Berg were well aware of what an experiment is, although their casual use of language might suggest a lack of rigour in thinking with the term. They refer to having "both control and experiment groups" in their studies, and refer to "the experimental chronology" of their research design. So, they certainly seem to think of their work as a kind of experiment.

Experimental design

In a true experiment, a sample is randomly drawn from a population of interest (say, first year undergraduate chemistry students; or, perhaps, first year undergraduate chemistry students attending Swedish Universities, or… 3) and assigned randomly to the conditions being compared. Providing a genuine form of random assignment is used, then inferential statistical tests can guide on whether any differences found between groups at the end of an experiment should be considered statistically significant. 4

"Statistics can only indicate how likely a measured result would occur by chance (as randomisation of units of analysis to different treatments can only make uneven group composition unlikely, not impossible)…Randomisation cannot ensure equivalence between groups (even if it makes any imbalance just as likely to advantage either condition)"

Taber, 2019, p.73

Inferential statistics can be used to test for statistical significance in experiments – as long as the 'units of analysis' (e.g., students) are randomly assigned to the experimental and control conditions.
(Figure from Taber, 2019)

That is, if there are differences that the statistical tests suggest are very unlikely to happen by chance, then they are very unlikely to be due to an initial difference between the groups in the two conditions as long as the groups were the result of random assignment. But that is a very important proviso.

There are two aspects to this need for randomisation:

  • to be able to suggest any differences found reflect the effects of the intervention, then there should be random assignment to the two (or more) conditions
  • to be able to suggest the results reflect what would probably be found in a wider population, the sample should be randomly selected from the population of interest 3
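As an illustration of the first requirement, here is a minimal sketch of random assignment followed by a simple permutation test. Everything in it is hypothetical: the 40 student labels, the simulated scores (generated with no treatment effect at all), and the helper names `randomly_assign` and `permutation_p_value`. The permutation test is just one inferential approach chosen for transparency; it is not the test used in any of the studies discussed here.

```python
import random
from statistics import mean

def randomly_assign(units, rng):
    """Shuffle the units of analysis and split them into two conditions."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def permutation_p_value(group_a, group_b, rng, n_shuffles=5000):
    """Share of random relabellings whose mean difference is at least as large as that observed."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
    return extreme / n_shuffles

rng = random.Random(2019)  # fixed seed so the sketch is repeatable
students = [f"s{i:02d}" for i in range(40)]  # hypothetical learners
treatment, control = randomly_assign(students, rng)

# Hypothetical outcome scores with no built-in difference between conditions.
scores = {s: rng.gauss(50, 10) for s in students}
p = permutation_p_value(
    [scores[s] for s in treatment],
    [scores[s] for s in control],
    rng,
)
print(len(treatment), len(control), round(p, 3))
```

The logic of the test depends on the assignment step: the shuffled relabellings are a fair model of 'what chance alone would do' only because the real groups were themselves formed by chance.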

Studies in education seldom meet the requirements for being true experiments
(Figure from Taber, 2019)

In education, it is not always possible to use random assignment, so true experiments are then not possible. However, so-called 'quasi-experiments' may be possible where differences between the outcomes in different conditions may be understood as informative, as long as there is good reason to believe that even without random assignment, the groups assigned to the different conditions are equivalent.

In this specific research, that would mean having good reason to believe that without the intervention (the prelab):

  • students in both groups would have asked overall equivalent (in terms of the analysis undertaken in this study) questions in the lab.;
  • students in both groups would have been judged as displaying overall equivalent subject knowledge.

Often in research where a true experiment is not possible some kind of pre-testing is used to make a case for equivalence between groups.

Two control groups that were out of control

In Winberg and Berg's research there were two studies where comparisons were made between 'experimental' and 'control' conditions:

Study | Experimental | Control
Study 1 | n=78: first-year students, following completion of their first chemistry course in 2001 | n=97: students who had been interviewed by the researchers during the same course in the previous year
Study 2 | n=21 (of 58 in cohort) | n=37 (of 58 in same cohort)

In the first study, a comparison was made between the cohort where the innovation was introduced and a cohort from the previous year. All other things being equal, it seems likely these two cohorts were fairly similar. But in education all things are seldom equal, so there is no assurance they were similar enough to be considered equivalent.

In the second study

"Students were divided into treatment (n = 21) and control (n = 37) groups. Distribution of students between the treatment and control groups was not controlled by the researchers".

Winberg & Berg, 2007

So, some factor(s) external to the researchers divided the cohort into two groups – and the reader is told nothing about the basis for this, nor even if the two groups were assigned to the treatments randomly. 5 The authors report that the cohort "comprised prospective molecular biologists (31%), biologists (51%), geologists (7%), and students who did not follow any specific program (11%)", and so it is possible the division into two uneven sized groups was based on timetabling constraints with students attending chemistry lab sessions according to their availability based on specialism. But that is just a guess. (It is usually better when the reader of a research report is not left to speculate about procedures and constraints.)

What is important for a reader to note is that in these studies:

  • the researchers were not able to assign learners to conditions randomly;
  • nor were the researchers able to offer any evidence of equivalence between groups (such as near identical pre-test scores);
  • so, the requirements for inferring significance from statistical tests were not met;
  • so, claims in the paper about finding statistically significant differences between conditions cannot be justified given the research design;
  • and therefore the conclusions presented in the paper are strictly not valid.

If students are not randomly assigned to conditions, then any statistically unlikely difference found at the end of an experiment cannot be assumed to be likely to be due to the intervention, rather than some systematic initial difference between the groups.
(Figure adapted from Taber, 2019)


This is a shame, because this is in many ways an interesting paper, and much thought and care seems to have been taken about the collection and analysis of meaningful data. Yet, drawing conclusions from statistical tests comparing groups that might never have been similar in the first case is like finding that careful use of a vernier scale shows that after a period of watering plant A, plant A is taller than plant B – having been very careful to make sure plant A was watered regularly with carefully controlled volumes, while plant B was not watered at all – when you did not think to check how tall the two plants were before you started watering plant A.

In such a scenario we might be tempted to assume plant A has actually become taller because it had been watered; but that is just applying what we had conjectured should be the case, and we would be mistaking our expectations for experimental evidence.
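The plant analogy can be mimicked with a short simulation. All the numbers here are invented: two groups of 100 learners whose baseline scores differ by about 10 marks, and an 'intervention' that has no effect whatsoever (everyone gains about 5 marks whichever group they are in). The sketch only illustrates the logic of the argument; it does not model Winberg and Berg's data.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

def simulate_group(n, baseline_mean):
    """Return (pre, post) scores; each learner gains about 5 marks regardless of condition."""
    pre = [rng.gauss(baseline_mean, 5) for _ in range(n)]
    post = [score + rng.gauss(5, 2) for score in pre]
    return pre, post

# Assumed scenario: the 'treatment' group simply started out stronger.
pre_t, post_t = simulate_group(100, baseline_mean=60)
pre_c, post_c = simulate_group(100, baseline_mean=50)

# Comparing only post-test scores (never measuring the plants beforehand).
post_gap = mean(post_t) - mean(post_c)

# Comparing gains, which is only possible if baseline data were collected.
gains_t = [after - before for before, after in zip(pre_t, post_t)]
gains_c = [after - before for before, after in zip(pre_c, post_c)]
gain_gap = mean(gains_t) - mean(gains_c)

print(f"post-test gap: {post_gap:.1f}")
print(f"gap in gains:  {gain_gap:.1f}")
```

A post-test-only comparison shows a large gap that an inferential test would readily flag, yet the gap in gains is close to zero: the difference was there before the 'watering' started.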

Work cited:

Notes:

1 The part of the brain where we can consciously mentipulate ideas is called the working memory (WM). Research suggests that WM has a very limited capacity in the sense that people can only hold in mind a very small number of different things at once. (These 'things' however are somewhat subjective – a complex idea that is treated as a single 'thing' in the WM of an expert can overload a novice.) This limit to WM is considered to be one of the most substantial constraints on effective classroom learning. This is also, then, one of the key research findings informing the design of effective teaching.

Read about working memory

Read about key ideas for teaching in accordance with learning theory

How fat is your memory? – read about a chemical analogy for working memory


2 The organisation has seemingly spotted that the USA is only one part of the world, and now describes itself as a global organisation for improving science education through research.


3 There is no reason why an experiment cannot be carried out on a very specific population, such as first year undergraduate chemistry students attending a specific Swedish University such as, say, Umeå University. However, if researchers intend their study to have results generalisable beyond their specific research contexts (say, to first year undergraduate chemistry students attending any Swedish University) then it is important to have a representative sample of that population.

Read about populations of interest in research

Read about generalisation from research studies


4 It might be assumed that scientists and researchers know what is meant by random, and how to undertake random assignment. Sadly, the literature suggests that in practice the term 'randomly' is sometimes used in research reports to mean something like 'arbitrarily' (Taber, 2013), which falls short of being random.

Read about randomisation in research


5 Arguably, even if the two groups were assigned randomly, there is only one 'unit of analysis' in each condition, as they were assigned as groups. That is, for statistical purposes, the two groups have size n=1 and n=1, which would not allow statistical significance to be found: e.g., see 'Quasi-experiment or crazy experiment?'

Quasi-experiment or crazy experiment?

Trustworthy research findings are conditional on getting a lot of things right


Keith S. Taber


A good many experimental educational research studies that compare treatments across two classes or two schools are subject to potentially conflating variables that invalidate study findings and make any consequent conclusions and recommendations untrustworthy.

I was looking for research into the effectiveness of P-O-E (predict-observe-explain) pedagogy, a teaching technique that is believed to help challenge learners' alternative conceptions and support conceptual change.

Read about the predict-observe-explain approach



One of the papers I came across reported identifying, and then using P-O-E to respond to, students' alternative conceptions. The authors reported that

The pre-test revealed a number of misconceptions held by learners in both groups: learners believed that salts 'disappear' when dissolved in water (37% of the responses in the 80% from the pre-test) and that salt 'melts' when dissolved in water (27% of the responses in the 80% from the pre-test).

Kibirige, Osodo & Tlala, 2014, p.302

The references to "in the 80%" did not seem to be explained anywhere. Perhaps only 80% of students responded to the open-ended questions included as part of the assessment instrument (discussed below), so the authors gave the incidence as a proportion of those responding? Ideally, research reports are explicit about such matters, avoiding the need for readers to speculate.

The authors concluded from their research that

"This study revealed that the use of POE strategy has a positive effect on learners' misconceptions about dissolved salts. As a result of this strategy, learners were able to overcome their initial misconceptions and improved on their performance….The implication of these results is that science educators, curriculum developers, and textbook writers should work together to include elements of POE in the curriculum as a model for conceptual change in teaching science in schools."

Kibirige, Osodo & Tlala, 2014, p.305

This seemed pretty positive. As P-O-E is an approach which is consistent with 'constructivist' thinking that recognises the importance of engaging with learners' existing thinking I am probably biased towards accepting such conclusions. I would expect techniques such as P-O-E, when applied carefully in suitable curriculum contexts, to be effective.

Read about constructivist pedagogy

Yet I also have a background in teaching research methods and in acting as a journal editor and reviewer – so I am not going to trust the conclusion of a research study without having a look at the research design.


All research findings are subject to caveats and provisos: good practice in research writing is for the authors to discuss them – but often they are left unmentioned for readers to spot. (Read about drawing conclusions from studies)


Kibirige and colleagues describe their study as a quasi-experiment.

Experimental research into teaching approaches

If one wants to see if a teaching approach is effective, then it seems obvious that one needs to do an experiment. If we can experimentally compare different teaching approaches we can find out which are more effective.

An experiment allows us to make a fair comparison by 'control of variables'.

Read about experimental research

Put very simply, the approach might be:

  • Identify a representative sample of an identified population
  • Randomly assign learners in the sample to either an experimental condition or a control condition
  • Set up two conditions that are alike in all relevant ways, apart from the independent variable of interest
  • After the treatments, apply a valid instrument to measure learning outcomes
  • Use inferential statistics to see if any difference in outcomes across the two conditions reaches statistical significance
  • If it does, conclude that
    • the effect is likely to be due to the difference in treatments
    • and will apply, on average, to the population that has been sampled

Now, I expect anyone reading this who has worked in schools, and certainly anyone with experience in social research (such as research into teaching and learning), will immediately recognise that in practice it is very difficult to actually set up an experiment into teaching which fits this description.

Nearly always (if indeed not always!) experiments to test teaching approaches fall short of this ideal model to some extent. This does not mean such studies can not be useful – especially where there are many of them with compensatory strengths and weaknesses offering similar findings (Taber, 2019a) – but one needs to ask how closely published studies fit the ideal of a good experiment. Work in high quality journals is often expected to offer readers guidance on this, but readers should check for themselves to see if they find a study convincing.

So, how convincing do I find this study by Kibirige and colleagues?

The sample and the population

If one wishes a study to be informative about a population (say, chemistry teachers in the UK; or 11-12 year-olds in state schools in Western Australia; or pharmacy undergraduates in the EU; or whatever) then it is important to either include the full population in the study (which is usually only feasible when the population is a very limited one, such as graduate students in a single university department) or to ensure the sample is representative.

Read about populations of interest in research

Read about sampling a population

Kibirige and colleagues refer to their participants as a sample

"The sample consisted of 93 Grade 10 Physical Sciences learners from two neighbouring schools (coded as A and B) in a rural setting in Moutse West circuit in Limpopo Province, South Africa. The ages of the learners ranged from 16 to 20 years…The learners were purposively sampled."

Kibirige, Osodo & Tlala, 2014, p.302

Purposive sampling means selecting participants according to some specific criteria, rather than sampling a population randomly. It is not entirely clear precisely what the authors mean by this here – which characteristics they selected for. Also, there is no statement of the population being sampled – so the reader is left to guess what population the sample is a sample of. Perhaps "Grade 10 Physical Sciences" students – but, if so, universally, or in South Africa, or just within Limpopo Province, or indeed just the Moutse West circuit? Strictly the notion of a sample is meaningless without reference to the population being sampled.

A quasi-experiment

A key notion in experimental research is the unit of analysis

"An experiment may, for example, be comparing outcomes between different learners, different classes, different year groups, or different schools…It is important at the outset of an experimental study to clarify what the unit of analysis is, and this should be explicit in research reports so that readers are aware what is being compared."

Taber, 2019a, p.72

In a true experiment the 'units of analysis' (which in different studies may be learners, teachers, classes, schools, exam. papers, lessons, textbook chapters, etc.) are randomly assigned to conditions. Random assignment allows inferential statistics to be used to directly compare measures made in the different conditions to determine whether outcomes are statistically significant. Random assignment is a way of making systematic differences between groups unlikely (and so allows the use of inferential statistics to draw meaningful conclusions).

Random assignment is sometimes possible in educational research, but often researchers are only able to work with existing groupings.

Kibirige, Osodo & Tlala describe their approach as using a quasi-experimental design as they could not assign learners to groups, but only compare between learners in two schools. This is important, as it means that the 'units of analysis' are not the individual learners, but the groups: in this study one group of students in one school (n=1) is being compared with another group of students in a different school (n=1).

The authors do not make it clear whether they assigned the schools to the two teaching conditions randomly – or whether some other criterion was used. For example, if they chose school A to be the experimental school because they knew the chemistry teacher in the school was highly skilled, always looking to improve her teaching, and open to new approaches; whereas the chemistry teacher in school B had a reputation for wishing to avoid doing more than was needed to be judged competent – that would immediately invalidate the study.

Compensating for not using random assignment

When it is not possible to randomly assign learners to treatments, researchers can (a) use statistics that take into account measurements on each group made before, as well as after, the treatments (that is, a pre-test – post-test design); (b) offer evidence to persuade readers that the groups are equivalent before the experiment. Kibirige, Osodo and Tlala seek to use both of these steps.

Do the groups start as equivalent?

Kibirige, Osodo and Tlala present evidence from the pre-test to suggest that the learners in the two groups are starting at about the same level. In practice, pre-tests seldom lead to identical outcomes for different groups. It is therefore common to use inferential statistics to test for whether there is a statistically significant difference between pre-test scores in the groups. That could be reasonable, if there was an agreed criterion for deciding just how close scores should be to be seen as equivalent. In practice, many researchers only check that the differences do not reach statistical significance at the level of probability <0.05: that is, they look to see if there are strong differences, and, if not, declare this is (or implicitly treat this as) equivalence!

This is clearly an inadequate measure of equivalence as it will only filter out cases where there is a difference so large it is found to be very unlikely to be a chance effect.


If we want to make sure groups start as 'equivalent', we cannot simply look to exclude the most blatant differences. (Original image by mcmurryjulie from Pixabay)

See 'Testing for initial equivalence'


We can see this in Kibirige and colleagues' study, where the researchers list mean scores and standard deviations for each question on the pre-test. They report that:

"The results (Table 1) reveal that there was no significant difference between the pre-test achievement scores of the CG [control group] and EG [experimental group] for questions (Appendix 2). The p value for these questions was greater than 0.05."

Kibirige, Osodo & Tlala, 2014, p.302

Now this paper is published "licensed under Creative Commons Attribution 3.0 License" which means I am free to copy from it here.



According to the results table, several of the items (1.2, 1.4, 2.6) did lead to statistically significantly different response patterns in the two groups.

Most of these questions (1.1-1.4; 2.1-2.8; discussed below) are objective questions, so although no marking scheme was included in the paper, it seems they were marked as correct or incorrect.

So, let's take as an example question 2.5 where readers are told that there was no statistically significant difference in the responses of the two groups. The mean score in the control group was 0.41, and in the experimental group was 0.27. Now, the paper reports that:

"Forty nine (49) learners (31 males and 18 females) were from school A and acted as the experimental group (EG) whereas the control group (CG) consisted of 44 learners (18 males and 26 females) from school B."

Kibirige, Osodo & Tlala, 2014, p.302

So, according to my maths,


Group | Correct responses | Incorrect responses
School A (49 students) | (0.27 ➾) 13 | 36
School B (44 students) | (0.41 ➾) 18 | 26

Pre-test results for an item with no statistically significant difference between groups

"The achievement of the EG and CG from pre-test results were not significantly different which suggest that the two groups had similar understanding of concepts" (p.305).
Pre-test results for an item with no statistically significant difference between groups (offered as evidence of 'similar' levels of initial understanding in the two groups)

While, technically, there may have been no statistically significant difference here, I think inspection is sufficient to suggest this does not mean the two groups were initially equivalent in terms of performance on this item.
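The arithmetic behind these counts, and the sense in which the difference is 'not significant', can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each response was marked 1 (correct) or 0 (incorrect), since the paper gives no mark scheme, and it uses a simple two-proportion z-test with a normal approximation rather than whatever procedure the authors actually ran. The function names are hypothetical helpers.

```python
import math

def counts_from_mean(mean_score, n):
    """Recover (correct, incorrect) counts, assuming each answer scored 1 or 0."""
    correct = round(mean_score * n)
    return correct, n - correct

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Figures reported for question 2.5 at pre-test
n_a, n_b = 49, 44
correct_a, incorrect_a = counts_from_mean(0.27, n_a)
correct_b, incorrect_b = counts_from_mean(0.41, n_b)

p_a = correct_a / n_a
p_b = correct_b / n_b
diff = p_b - p_a

# Two-proportion z-test (pooled standard error), two-sided p-value
pooled = (correct_a + correct_b) / (n_a + n_b)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = diff / se_pooled
p_value = 2 * (1 - normal_cdf(abs(z)))

# Approximate 95% confidence interval for the difference (unpooled standard error)
se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
ci_low = diff - 1.96 * se_diff
ci_high = diff + 1.96 * se_diff

print(f"School A: {correct_a} correct, {incorrect_a} incorrect")
print(f"School B: {correct_b} correct, {incorrect_b} incorrect")
print(f"difference = {diff:.3f}, p = {p_value:.3f}")
print(f"approx. 95% CI for difference: {ci_low:.3f} to {ci_high:.3f}")
```

On these assumptions the p-value comes out at around 0.14, so the difference is indeed 'not significant' at the 0.05 level. Yet the interval for the difference stretches from a few percentage points in one direction to over thirty in the other, which is hardly a demonstration that the groups were equivalent.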


Data that is normally distributed falls on a 'bell-shaped' curve

(Image by mcmurryjulie from Pixabay)


Inspection of this graphic also highlights something else. Student's t-test (used by the authors to produce the results in their table 1) is a parametric test. That means it can only be used when the data fit certain criteria. The data sample should be randomly selected (not true here) and normally distributed. A normal distribution means data is distributed in a bell-shaped Gaussian curve (as in the image in the blue circle above). If Kibirige, Osodo & Tlala were applying the t-test to data distributed as in my graphic above (a binary distribution where answers were either right or wrong) then the test was invalid.
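A quick sketch of what such item data look like may help. The scores below are hypothetical, reconstructed on the assumption that the item was marked 1 for correct and 0 for incorrect, using the 13 correct responses out of 49 implied by the reported mean of 0.27 for School A on question 2.5.

```python
import math
from collections import Counter

# Hypothetical item scores for a class of 49 with 13 correct answers,
# assuming each response was marked simply right (1) or wrong (0)
scores = [1] * 13 + [0] * 36

# How many learners obtained each possible score
distribution = Counter(scores)

mean = sum(scores) / len(scores)
# Population standard deviation of the 0/1 scores
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))

print(dict(distribution))
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

Only two score values occur, and the standard deviation is fixed entirely by the mean (for a proportion p it equals √(p(1−p))), so there is no bell-shaped curve here to speak of.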

So, to summarise, the authors suggest there "was no significant difference between the pre-test achievement scores of the CG and EG for questions", although sometimes there was (according to their table); and they used the wrong test to check for this; and in any case lack of statistical significance is not a sufficient test for equivalence.

I should note that the journal does claim to use peer review to evaluate submissions to see if they are ready for publication!

Comparing learning gains between the two groups

At one level equivalence might not be so important, as the authors used an ANCOVA (Analysis of Covariance) test which tests for difference at post-test taking into account the pre-test. Yet this test also has assumptions that need to be tested for and met, but here seem to have just been assumed.

However, to return to an even more substantive point I made earlier, as the learners were not randomly assigned to the two different conditions/treatments, what should be compared are the two school-based groups (i.e., the unit of analysis should be the school group) but that (i.e., a sample of 1 class, rather than 40+ learners, in each condition) would not facilitate using inferential statistics to make a comparison. So, although the authors conclude

"that the achievement of the EG [taking n=49] after treatment (mean 34. 07 ± 15. 12 SD) was higher than the CG [taking n =44] (mean 20. 87 ± 12. 31 SD). These means were significantly different"

Kibirige, Osodo & Tlala, 2014, p.303

the statistics are testing the outcomes as if 49 units independently experienced one teaching approach and 44 independently experienced another. Now, I do not claim to be a statistics expert, and I am aware that most researchers only have a limited appreciation of how and why stats. tests work. For most readers, then, a more convincing argument may be made by focussing on the control of variables.
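Before turning to variables, the unit-of-analysis point can be made concrete with a small sketch. It assumes, purely for illustration, that each school group is entered as a single unit summarised by its reported post-test mean; `between_unit_variance` is a hypothetical helper, not anything taken from the paper.

```python
import statistics

def between_unit_variance(unit_means):
    """Sample variance across units, or None when it cannot be estimated."""
    try:
        return statistics.variance(unit_means)
    except statistics.StatisticsError:
        # Raised when fewer than two data points are supplied
        return None

# Treating each school group as one unit, using the reported post-test means
experimental_units = [34.07]
comparison_units = [20.87]

# Degrees of freedom for a two-group comparison of independent units
degrees_of_freedom = len(experimental_units) + len(comparison_units) - 2
var_experimental = between_unit_variance(experimental_units)
var_comparison = between_unit_variance(comparison_units)

print(degrees_of_freedom, var_experimental, var_comparison)
```

With one unit per condition there are zero degrees of freedom and no between-unit variance can be estimated, so there is nothing for an inferential test to work with.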

Controlling variables in educational experiments

The ability to control variables is a key feature of laboratory science, and is critical to experimental tests. Control of variables, even identification of relevant variables, is much more challenging outside of a laboratory in social contexts – such as schools.

In the case of Kibirige, Osodo & Tlala's study, we can set out the overall experimental design as follows


Independent variable | Teaching approach: predict-observe-explain (experimental) – lectures (comparison condition)
Dependent variable | Learning gains
Controlled variable(s) | Anything other than teaching approach which might make a difference to student learning

Variables in Kibirige, Osodo & Tlala's study

The researchers set up the two teaching conditions, measure learning gains, and need to make sure any other factors which might have an effect on learning outcomes, so-called confounding variables, are controlled so that they are the same in both conditions.

Read about confounding variables in research

Of course, we cannot be sure what might act as a confounding variable, so in practice we may miss something which we do not recognise is having an effect. Here are some possibilities based on my own (now dimly recalled) experience of teaching in school.

The room may make a difference. Some rooms are

  • spacious,
  • airy,
  • well illuminated,
  • well equipped,
  • away from noisy distractions
  • arranged so everyone can see the front, and the teacher can easily move around the room

Some rooms have

  • comfortable seating,
  • a well positioned board,
  • good acoustics

Others, not so.

The timetable might make a difference. Anyone who has ever taught the same class of students at different times in the week might (will?) have noticed that a Tuesday morning lesson and a Friday afternoon lesson are not always equally productive.

Class size may make a difference (here 49 versus 44).

Could gender composition make a difference? Perhaps it was just me, but I seem to recall that classes of mainly female adolescents had a different nature than classes of mainly male adolescents. (And perhaps the way I experienced those classes would have been different if I had been a female teacher?) Kibirige, Osodo and Tlala report the sex of the students, but assuming that can be taken as a proxy for gender, the gender ratios were somewhat different in the two classes.


The gender make up of the classes was quite different: might that influence learning?
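How different? A minimal calculation from the figures the authors report (31 males and 18 females in school A; 18 males and 26 females in school B) gives the proportions directly. The labels used as dictionary keys are just for illustration.

```python
# Reported composition of the two classes (Kibirige, Osodo & Tlala, 2014, p.302)
classes = {
    "School A (experimental)": {"male": 31, "female": 18},
    "School B (comparison)": {"male": 18, "female": 26},
}

# Proportion of each class recorded as male
male_share = {
    name: counts["male"] / (counts["male"] + counts["female"])
    for name, counts in classes.items()
}

for name, share in male_share.items():
    print(f"{name}: {share:.0%} male")
```

So roughly 63% of the experimental class, but only about 41% of the comparison class, were recorded as male.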

School differences

A potentially major conflating variable is school. In this study the researchers report that the schools were "neighbouring" and that

"Having been drawn from the same geographical set up, the learners were of the same socio-cultural practices."

Kibirige, Osodo & Tlala, 2014, p.302

That clearly makes more sense than choosing two schools from different places with different demographics. But anyone who has worked in schools will know that two neighbouring schools serving much the same community can still be very different. Different ethos, different norms, and often different levels of outcome. Schools A and B may be very similar (but the reader has no way to know), but when comparing between groups in different schools it is clear that school could be a key factor in group outcome.

The teacher effect

Similar points can be made about teachers – they are all different! Does ANY teacher really believe that one can swap one teacher for another without making a difference? Kibirige, Osodo and Tlala do not tell readers anything about the teachers, but as students were taught in their own schools the default assumption must be that they were taught by their assigned class teachers.

Teachers vary in terms of

  • skill,
  • experience,
  • confidence,
  • enthusiasm,
  • subject knowledge,
  • empathy levels,
  • insight into their students,
  • rapport with classes,
  • beliefs about teaching and learning,
  • teaching style,
  • disciplinary approach
  • expectations of students

The same teacher may perform at different levels with different classes (preferring to work with different grade levels, or simply getting on/not getting on with particular classes). Teachers may have uneven performance across topics. Teachers differentially engage with and excel in different teaching approaches. (Even if the same teacher had taught both groups we could not assume they were equally skilful in both teaching conditions.)

The teacher variable is likely to be a major difference between groups.

Meta-effects

Another conflating factor is the very fact of the research itself. Students may welcome a different approach because it is novel and a change from the usual diet (or alternatively they may be nervous about things being done differently) – but such 'novelty' effects would disappear once the new way of doing things became established as normal. In which case, it would be an effect of the research itself and not of what is being researched.

Perhaps even more powerful are expectancy effects. If researchers expect an innovation to improve matters, then these expectations get communicated to those involved in the research and can themselves have an effect. Expectancy effects are so well demonstrated that in medical research double-blind protocols are used so that neither patients nor health professionals they directly engage with in the study know who is getting which treatment.

Read about expectancy effects in research

So, we might revise the table above:


Independent variable | Teaching approach: predict-observe-explain (experimental) – lectures (comparison condition)
Dependent variable | Learning gains
Potentially conflating variables | School effect; Teacher effect; Class size; Gender composition of teaching groups; Relative novelty of the two teaching approaches

Variables in Kibirige, Osodo & Tlala's study

Now, of course, these problems are not unique to this particular study. The only way to respond to teacher and school effects of this kind is to do large scale studies, and randomly assign a large enough number of schools and teachers to the different conditions so that it becomes very unlikely there will be systematic differences between treatment groups.
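As a minimal sketch of what random assignment at the level of schools involves, one might write something like the following. The pool of 40 schools, their names, the seed, and the function name are all hypothetical; a real study would also need to think about stratification, consent, and attrition.

```python
import random

def assign_schools(schools, seed=None):
    """Randomly split schools into two conditions of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(schools)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "comparison": shuffled[half:]}

# Hypothetical pool of 40 schools; the names are placeholders
schools = [f"School {i:02d}" for i in range(1, 41)]
allocation = assign_schools(schools, seed=2014)

print(len(allocation["experimental"]), len(allocation["comparison"]))
```

Because chance alone decides which school goes where, a researcher's knowledge of particular teachers or schools cannot steer the allocation, and with enough schools systematic differences between the conditions become unlikely.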

A good many experimental educational research studies that compare treatments across two classes or two schools are subject to potentially conflating variables that invalidate study findings and make any consequent conclusions and recommendations untrustworthy (Taber, 2019a). Strangely, often this does not seem to preclude publication in research journals. 1

Advice on controls in scientific investigations:

I can probably do no better than to share some advice given to both researchers, and readers of research papers, in an immunology textbook from 1910:

"I cannot impress upon you strongly enough never to operate without the necessary controls. You will thus protect yourself against grave errors and faulty diagnoses, to which even the most competent investigator may be liable if he [or she] fails to carry out adequate controls. This applies above all when you perform independent scientific investigations or seek to assess them. Work done without the controls necessary to eliminate all possible errors, even unlikely ones, permits no scientific conclusions.

I have made it a rule, and would advise you to do the same, to look at the controls listed before you read any new scientific papers… If the controls are inadequate, the value of the work will be very poor, irrespective of its substance, because none of the data, although they may be correct, are necessarily so."

Julius Citron

The comparison condition

It seems clear that in this study there is no strict 'control' of variables, and the 'control' group is better considered just a comparison group. The authors tell us that:

"the control group (CG) taught using traditional methods…

the CG used the traditional lecture method"

Kibirige, Osodo & Tlala, 2014, pp.300, 302

This is not further explained, but if this really was teaching by 'lecturing' then that is not a suitable approach for teaching school age learners.

This raises two issues.

There is a lot of evidence that a range of active learning approaches (discussion work, laboratory work, various kinds of group work) engages and motivates students more than whole lessons spent listening to a teacher. Therefore any approach which basically involves a mixture of students doing things, discussing things, engaging with manipulatives and resources as well as listening to a teacher, tends to be superior to just being lectured. Good science teaching normally involves lessons sequenced into a series of connected episodes involving different types of student activity (Taber, 2019b). Teacher presentations of the target scientific account are very important, but tend to be effective when embedded in a dialogic approach that allows students to explore their own thinking and takes into account their starting points.

So, comparing P-O-E with lectures (if they really were lectures) may not tell researchers much about P-O-E specifically, as a teaching approach. A better test would compare P-O-E with some other approach known to be engaging.

"Many published studies argue that the innovation being tested has the potential to be more effective than current standard teaching practice, and seek to demonstrate this by comparing an innovative treatment with existing practice that is not seen as especially effective. This seems logical where the likely effectiveness of the innovation being tested is genuinely uncertain, and the 'standard' provision is the only available comparison. However, often these studies are carried out in contexts where the advantages of a range of innovative approaches have already been well demonstrated, in which case it would be more informative to test the innovation that is the focus of the study against some other approach already shown to be effective."

Taber, 2019a, p.93

The second issue is more ethical than methodological. Sometimes in published studies (and I am not claiming I know this happened here, as the paper says so little about the comparison condition) researchers seem to deliberately set up a comparison condition they have good reason to expect is not effective: such as asking a teacher to lecture and not include practical work or discussion work or use of digital learning technologies and so forth. Potentially the researchers are asking the teacher of the 'control' group to teach less effectively than normally to bias the experiment towards their preferred outcome (Taber, 2019a).

This is not only a failure to do good science, but also an abuse of those learners being deliberately subjected to poor teaching. Perhaps in this study the class in School B was habitually taught by being lectured at, so the comparison condition was just what would have occurred in the absence of the research, but this is always a worry when studies report comparison conditions that seem to deliberately disadvantage students. (This paper does not seem to report anything about obtaining voluntary informed consent from participants, nor indeed about how access to the schools was negotiated.)

"In most educational research experiments of the type discussed in this article, potential harm is likely to be limited to subjecting students (and teachers) to conditions where teaching may be less effective, and perhaps demotivating…It can also potentially occur in control conditions if students are subjected to teaching inputs of low effectiveness when better alternatives were available. This may be judged only a modest level of harm, but – given that the whole purpose of experiments to test teaching innovations is to facilitate improvements in teaching effectiveness – this possibility should be taken seriously."

Taber, 2019a, p.94

Validity of measurements

Even leaving aside all the concerns expressed above, the results of a study of this kind depend upon valid measurements. Assessment items must test what they claim to test, and their analysis should be subject to quality control (and preferably blind to which condition a script being analysed derives from). Kibirige, Osodo and Tlala append the test they used in the study (Appendix 2, pp.309-310), which is very helpful in allowing readers to judge at least its face validity. Unfortunately, they do not include a mark/analysis scheme to show what they considered responses worthy of credit.

"The [Achievement Test] consisted of three questions. Question one consisted of five statements which learners had to classify as either true or false. Question two consisted of nine [sic, actually eight] multiple questions which were used as a diagnostic tool in the design of the teaching and learning materials in addressing misconceptions based on prior knowledge. Question three had two open-ended questions to reveal learners' views on how salts dissolve in water (Appendix 1 [sic, 2])."

Kibirige, Osodo & Tlala, 2014, p.302

"Question one consisted of five statements which learners had to classify as either true or false."

Question 1 is fairly straightforward.

1.2: Strictly all salts do dissolve in water to some extent. I expect that students were taught that some salts are insoluble. Often in teaching we start with simple dichotomous models (metal-non metal; ionic-covalent; soluble-insoluble; reversible – irreversible) and then develop these to more continuous accounts that recognise differences of degree. It is possible here then that a student who had learnt that all salts are soluble to some extent might have been disadvantaged by giving the 'wrong' ('True') response…

…although[sic] , actually, there is perhaps no excuse for answering 'True' ('All salts can dissolve in water') here as a later question begins "3.2. Some salts does [sic] not dissolve in water. In your own view what happens when a salt do [sic] not dissolve in water".

Despite the test actually telling students the answer to this item, it seems only 55% of the experimental group, and 23% of the control group obtained the correct answer on the post test – precisely the same proportions as on the pre-test!



1.4: Seems to be 'False' as the ions exist in the salt and are not formed when it goes into solution. However, I am not sure if that nuance of wording is intended in the question.

Question 2 gets more interesting.


"Question two consisted of nine multiple questions" (seven shown here)

I immediately got stuck on question 2.2 which asked which formula (singular, not 'formula/formulae', note) represented a salt. Surely, they are all salts?

I had the same problem on 2.4 which seemed to offer three salts that could be formed by reacting acid with base. Were students allowed to give multiple responses? Did they have to give all the correct options to score?

Again, 2.5 offered three salts which could all be made by direct reaction of 'some substances'. (As a student I might have answered A assuming the teacher meant to ask about direct combination of the elements?)

At least in 2.6 there only seemed to be two correct responses to choose between.

Any student unsure of the correct answer in 2.7 might have taken guidance from the charges as shown in the equation given in question 2.8 (although indicated as 2.9).

How I wished they had provided the mark scheme.



The final question in this section asked students to select one of three diagrams to show what happens when a 'mixture' of H2O and NaCl in a closed container 'react'. (In chemistry, we do not usually consider salt dissolving as a reaction.)

Diagram B seemed to show ion pairs in solution (but why the different form of representation?) Option C did not look convincing as the chloride ions had altogether vanished from the scene and sodium seemed to have formed multiple bonds with oxygen and hydrogens.

So, by a process of elimination, the answer is surely A.

  • But components seem to be labelled Na and Cl (not as ions).
  • And the image does not seem to represent a solution as there is much too much space between the species present.
  • And in salt solution there are many water molecules between solvated ions – missing here.
  • And the figure seems to show two water molecules have broken up, not to give hydrogen and hydroxide ions, but lone oxygen (atoms, ions?)
  • And why is the chlorine shown to be so much larger in solution than it was in the salt? (If this is meant to be an atom, it should be smaller than the ion, not larger. The real mystery is why the chloride ions are shown so much smaller than the (actually smaller) sodium ions before solvation occurs, when chloride ions have about double the radius of sodium ions.)

So diagram A is incredible, but still not quite as crazy an option as B and C.

This is all despite

"For face validity, three Physical Sciences experts (two Physical Sciences educators and one researcher) examined the instruments with specific reference to Mpofu's (2006) criteria: suitability of the language used to the targeted group; structure and clarity of the questions; and checked if the content was relevant to what would be measured. For reliability, the instruments were piloted over a period of two weeks. Grade 10 learners of a school which was not part of the sample was used. Any questions that were not clear were changed to reduce ambiguity."

Kibirige, Osodo & Tlala, 2014, p.302

One wonders what the less clear, more ambiguous, versions of the test items were.

Reducing 'misconceptions'

The final question was (or, perhaps better, questions were) open-ended.



I assume (again, it would be good for authors of research reports to make such things explicit) these were the questions that led to claims about the identified alternative conceptions at pre-test.

"The pre-test revealed a number of misconceptions held by learners in both groups: learners believed that salts 'disappear' when dissolved in water (37% of the responses in the 80% from the pre-test) and that salt 'melts' when dissolved in water (27% of the responses in the 80% from the pre-test)."

Kibirige, Osodo & Tlala, 2014, p.302

As the first two (sets of) questions only admit objective scoring, it seems that this data can only have come from responses to Q3. This means that the authors cannot be sure how students are using terms. 'Melt' is often used in an everyday, metaphorical, sense of 'melting away'. This use of language should be addressed, but it may not (for at least some of these learners) be a conceptual error as much as poor use of terminology.

To say that salts disappear when they dissolve does not seem to me a misconception: they do. To disappear means to no longer be visible, and that's a fair description of the phenomenon of salt dissolving. The authors may assume that if learners use the term 'disappear' they mean the salt is no longer present, but literally they are only claiming it is not directly visible.

Unfortunately, the authors tell us nothing about how they analysed the data collected from their test, so the reader has no basis for knowing how they interpreted student responses to arrive at their findings. The authors do tell us, however, that:

"the intervention had a positive effect on the understanding of concepts dealing with dissolving of salts. This improved achievement was due to the impact of POE strategy which reduced learners' misconceptions regarding dissolving of salts"

Kibirige, Osodo & Tlala, 2014, p.305

Yet, oddly, they offer no specific basis for this claim – no figures to show the level at which "learners believed that salts 'disappear' when dissolved in water …and that salt 'melts' when dissolved in water" in either group at the post-test.


Test and group | 'disappear' misconception | 'melt' misconception
pre-test: experimental group | not reported | not reported
pre-test: comparison group | not reported | not reported
pre-test: total | (0.37 x 0.8 x 93 =) 27.5 (!?) | (0.27 x 0.8 x 93 =) 20
post-test: experimental group | not reported | not reported
post-test: comparison group | not reported | not reported
post-test: total | not reported | not reported

Data presented about the numbers of learners considered to hold specific misconceptions said to have been 'reduced' in the experimental condition
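The pre-test totals can be checked with a couple of lines of arithmetic. This sketch assumes the reading of the authors' wording adopted here, namely that the quoted percentages apply to the 80% of the 93 learners who responded; the paper itself does not spell this out, and the variable names are only illustrative.

```python
n_learners = 93
response_fraction = 0.80  # "the 80% from the pre-test"

# Shares of responses reported as showing each 'misconception'
reported_shares = {"disappear": 0.37, "melt": 0.27}

# Implied number of learners for each category on this reading
implied_learners = {
    label: share * response_fraction * n_learners
    for label, share in reported_shares.items()
}

for label, value in implied_learners.items():
    print(f"{label}: {value:.1f} learners")
```

On this reading neither product is exactly a whole number of learners, which is one more reason the reported percentages are hard to interpret without the underlying counts.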

It seems journal referees and the editor did not feel some important information was missing here that should be added before publication.

In conclusion

Experiments require control of variables. Experiments require random assignment to conditions. Quasi-experiments, where random assignment is not possible, are inherently weaker studies than true experiments.

Control of variables in educational contexts is often almost impossible.

Studies that compare different teaching approaches using two different classes each taught by a different teacher (and perhaps not even in the same school) can never be considered fair comparisons able to offer generalisable conclusions about the relative merits of the approaches. Such 'experiments' have no value as research studies. 1

Such 'experiments' are like comparing the solubility of two salts by (a) dropping a solid lump of 10g of one salt into some cold water, and (b) stirring a finely powdered 35g sample of the other salt into hot propanol; and watching to see which seems to dissolve better.

Only large scale studies that encompass a wide range of different teachers/schools/classrooms in each condition are likely to produce results that are generalisable.

The use of inferential statistical tests is only worthwhile when the conditions for those statistical tests are met. Sometimes tests are said to be robust to modest deviations from such requirements as normality. But applying tests to data that do not come close to fitting the conditions of the test is pointless.

Any research is only as trustworthy as the validity of its measurements. If one does not trust the measuring instrument or the analysis of measurement data then one cannot trust the findings and conclusions.


The results of a research study depend on an extended chain of argumentation, where any broken link invalidates the whole chain. (From 'Critical reading of research')

So, although the website for the Mediterranean Journal of Social Science claims "All articles submitted …undergo to a rigorous double blinded peer review process", I think the peer reviewers for this article were either very generous, very ignorant, or simply very lazy. That may seem harsh, but peer review is meant to help authors improve submissions till they are worthy of appearing in the literature, and here peer review has failed, and the authors (and readers of the journal) have been let down by the reviewers and the editor who ultimately decided this study was publishable in this form.

If I asked a graduate student (or indeed an undergraduate student) to evaluate this paper, I would expect to see a response something along these sorts of lines:


Applying the 'Critical Reading of Empirical Studies Tool' to 'The effect of predict-observe-explain strategy on learners' misconceptions about dissolved salts'

I still think P-O-E is a very valuable part of the science teacher's repertoire – but this paper cannot contribute anything to support that view.

Work cited:

Note

1 A lot of these invalid experiments get submitted to research journals, scrutinised by editors and journal referees, and then get published without any acknowledgement of how they fall short of meeting the conditions for a valid experiment. (See, for example, examples discussed in Taber 2019a.) It is as if the mystique of experiment is so great that even studies with invalid conclusions are considered worth publishing as long as the authors did an experiment.

A corny teaching analogy

Pop goes the comparison


Keith S. Taber


The order of corn popping is no more random than the roll of a dice.


I was pleased to read about a 'new' teaching analogy in the latest 'Education in Chemistry' (the Royal Society of Chemistry's education magazine) – well, at least it was new to me. It was an analogy that could be demonstrated easily in the school science lab, and, according to Richard Gill (@RGILL_Teach on Twitter), went down really well with his class.

Teaching analogies

Analogies are used in teaching and in science communication to help 'make the unfamiliar familiar', to show someone that something they do not (yet) know about is actually, in some sense at least, a bit like something they are already familiar with. In an analogy, there is a mapping between some aspect(s) of the structure of the target ideas and the structure of the familiar phenomenon or idea being offered as an analogue. Such teaching analogies can be useful to the extent that someone is indeed highly familiar with the 'analogue' (and more so than with the target knowledge being communicated); that there is a helpful mapping across between the analogue and the target; and that comparison is clearly explained (making clear which features of the analogue are relevant, and how).

Read about analogies in science


The analogy is discussed in the July 2022 Edition of Education in Chemistry, and on line.

Richard Gill suggests that 'Nuclear decay is a tough concept' to teach and learn, but after making some popcorn he realised that popping corn offered an analogy for radioactive decay that he could demonstrate in the classroom.

Richard Gill describes how

"I tell the students I'm going to heat up the oil; I'm going to give the kernels some energy, making them unstable and they're going to want to pop. I show them under the visualiser, then I ask, 'which kernel will pop first?' We have a little competition. Why do I do this? It links to nuclear decay being random. We know an unstable atom will decay, but we don't know which atom will decay or when it will decay, just like we don't know which kernel will pop when."

Gill, 2022

In the analogy, the corn (maize) kernels represent atoms or nuclei of an unstable isotope, and the popped corn the decay product, daughter atoms or nuclei. 1



Richard Gill homes in on a key feature of radioactive decay which may seem counter-intuitive to learners, but which is actually a pattern found in many different phenomena – exponential decay. The rate of radioactive decay falls (decays, confusingly) over time. Theoretically the [radioactive] decay rate follows a very smooth [exponential] decay curve. Theoretically, because of another key feature of radioactive decay that Gill highlights – its random nature!

It may seem that something which occurs at random will not lead to a regular pattern, but although in radioactivity the behaviour of an individual nucleus (in terms of when it might decay) cannot be predicted, when one deals with vast numbers of them in a macroscopic sample, a clear pattern emerges. Each different type of unstable atom has an associated half-life which tells us when half of a sample will have decayed. These half-lives can vary from fractions of a second to vast numbers of years, but are fixed for a particular nuclide.
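The idea of a fixed half-life can be written compactly. As a sketch, using symbols introduced here purely for illustration ($N_0$ for the number of unstable nuclei at the start, $N(t)$ for the number expected to remain after time $t$, and $T_{1/2}$ for the half-life of the nuclide):

```latex
N(t) = N_0 \left(\tfrac{1}{2}\right)^{t / T_{1/2}}
```

So, on average, half of the original nuclei remain after one half-life, a quarter after two, an eighth after three, and so on. Because decay is random, a real sample only follows this curve closely when the number of nuclei is very large.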

Richard Gill notes that he can use the popping corn demonstration as background for teaching about half-life,

I usually follow this lesson with the idea of half-lives. The concept of half-lives now makes sense. Why are there fewer unpopped kernels over time? Because they're popping. Why do radioactive materials become less radioactive over time? Because they're decaying.

Gill, 2022

Perhaps he could even develop his demonstration to model the half-life of decay?

Modelling the popcorn decay curve

The Australian Earth Science Education blog suggests

"Popcorn can be used to model radioactive decay. It is a lot safer than using radioactive isotopes, as well as much tastier"

and offers instructions for a practical activity with a bag of corn and a microwave to collect data to plot a decay curve (see https://ausearthed.blogspot.com/2020/04/radioactive-popcorn.html). Although this seems a good idea, I suspect this specific activity (which involves popping the popping corn in and out of the oven) might be too convoluted for learners just being introduced to the topic, but could be suitable for more advanced learners.

However, The Association of American State Geologists suggests an alternative approach that could be used in a class context where different groups of students put bags of popcorn into the microwave for different lengths of time to allow the plotting of a decay curve by collating class results (https://www.earthsciweek.org/classroom-activities/dating-popcorn).

Another variant is offered by the University of South Florida's 'Spreadsheets Across the Curriculum' (SSAC) project. SSAC developed an activity ("Radioactive Decay and Popping Popcorn – Understanding the Rate Law") to simulate the popping of corn using (yes, you guessed) a spreadsheet to model the decay of corn popping, as a way of teaching about radioactive decay!

This is more likely to give a good decay curve, but one cannot help feeling it loses some of the attraction of Richard Gill's approach with the smell, sound and 'jumping' of actual corn being heated! One might also wonder if there is any inherent pedagogic advantage to simulating popping corn as a model for simulating radioactive decay – rather than just using the spreadsheet to directly model radioactive decay?
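For readers without a spreadsheet to hand, the same kind of rate-law simulation can be sketched in a few lines of Python. This is not the SSAC activity itself, only a minimal illustration under an assumed rule: in each time step every unpopped kernel has the same fixed chance of popping. The function name `simulate_popping` and the parameters `n_kernels`, `p_pop` and `n_steps` are hypothetical choices made for this sketch.

```python
import random

def simulate_popping(n_kernels, p_pop, n_steps, seed=0):
    """Return the number of unpopped kernels after each time step.

    Assumption for this sketch: every unpopped kernel pops independently
    with the same probability p_pop in each step (real corn is not this simple).
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    remaining = n_kernels
    counts = [remaining]
    for _ in range(n_steps):
        # Count how many of the remaining kernels pop during this step
        popped = sum(1 for _ in range(remaining) if rng.random() < p_pop)
        remaining -= popped
        counts.append(remaining)
    return counts

counts = simulate_popping(n_kernels=1000, p_pop=0.1, n_steps=20)
for step, unpopped in enumerate(counts):
    print(step, unpopped)
```

Plotting `counts` against the step number should give something close to a smooth decay curve, with some random scatter that becomes relatively smaller as `n_kernels` is increased.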

Feedback cycles

The reason the popping corn seems to show the same kind of decay as radioactivity, is because it can be represented with the same kind of feedback cycle.

This pattern is characteristic of simple systems where

  • a change is brought about by a driver
  • that change diminishes the driver

In radioactive decay, the level of activity is directly proportional to the number of unstable nuclei present (i.e., the number of nuclei that can potentially decay), but the very process of decay reduces this number (and so reduces the rate of decay).

So,

  • when there are many unstable nuclei
  • there will be much decay
  • quickly reducing the number of unstable nuclei
    • so reducing the rate of decay
    • so reducing the rate at which unstable nuclei decay
      • so reducing the rate at which decay is reducing

and so forth.
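The feedback cycle just described can be stated as a rate law. As a sketch, with symbols chosen here for illustration ($N$ for the number of unstable nuclei present, $N_0$ for the number at the start, $\lambda$ for the decay constant of the nuclide, and $T_{1/2}$ for its half-life):

```latex
\frac{dN}{dt} = -\lambda N
\quad\Longrightarrow\quad
N(t) = N_0 \, e^{-\lambda t},
\qquad
T_{1/2} = \frac{\ln 2}{\lambda}
```

The minus sign carries the negative feedback: the more nuclei there are, the faster their number falls, and the fewer there are, the slower it falls.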


Exponential decay is a characteristic of systems with a simple negative feedback cycle
(source: ASCEND project)

Recognising this general pattern was the focus of an 'enrichment' activity designed for upper secondary learners in the Gatsby SEP supported ASCEND project which presented learners with information about the feedback cycle in radioactive decay; and then had them set up and observe some quite different phenomena (Taber, 2011):

  • capacitor discharge
  • levelling of connected uneven water columns
  • hot water cooling

In each case the change driven by some 'driver' reduced the driver itself (so a temperature difference leads to heat transfer which reduces the temperature difference…).
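The common pattern can be illustrated numerically. The sketch below is not taken from the ASCEND materials; it simply assumes that the rate of change of a 'driver' is proportional to the driver itself (with a hypothetical constant `k`) and steps this forward in small time intervals. The starting value and constant are made-up numbers for illustration only.

```python
def simulate_feedback(driver0, k, dt, n_steps):
    """Step a 'driver' forward when its rate of change is -k * driver.

    Uses simple (Euler) time-stepping; k and dt are assumed values.
    """
    values = [driver0]
    for _ in range(n_steps):
        change = -k * values[-1] * dt  # the change is proportional to the current driver
        values.append(values[-1] + change)
    return values

# Hypothetical example: a 60-degree temperature difference, k = 0.1 per minute
values = simulate_feedback(driver0=60.0, k=0.1, dt=1.0, n_steps=10)
ratios = [later / earlier for earlier, later in zip(values, values[1:])]
print(values)
print(ratios)
```

Each value in `ratios` comes out (very nearly) the same: the driver falls by the same fraction in every equal time interval, which is the signature of exponential decay whether the driver is a temperature difference, a charge on a capacitor, or a number of unstable nuclei.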

Read about the classroom activity

In Richard Gill's activity the driver is the availability of intact corn kernels being heated such that water vapour is building up inside the kernel – something which is reduced by the consequent popping of those kernels.


A negative feedback cycle

Mapping the analogy

A key feature of an analogy is that it can be understood as a kind of mapping between two conceptual structures. The making popcorn demonstration seems a very simple analogue, but mapping out the analogy might be useful (at least for the teacher) to clarify it. Below I present a representation of a mapping between popping corn and radioactive decay, suggesting which aspects of the analogue (the popping corn) map onto the target scientific concept.


Mapping an analogy between making pop-corn and radioactive decay

In this mapping I have used colour to highlight differences between the two (conceptual) structures. Perhaps the most significant difference is represented by the blue (target concept) versus red (analogue) features.


Most analogies only map to a limited extent

There will be aspects of an analogue that do not map onto anything on the target, and sometimes there will be an important feature of the target which has no analogous feature in the analogue. There is always the possibility that irrelevant features of an analogue will be mapped across by learners.

As one example, the comparison of the atom with a tiny solar system was once an image often used as a teaching analogy, yet it seems learners often have limited understandings of both analogue and target, and may be transferring across inappropriately – such as assuming the electrons are bound to the atom by gravity (Taber, 2013a). Where students have an alternative conception of the analogue (the earth attracts the sun, but not vice versa) they will often assume the same pattern in the target (the nucleus is not attracted to the electrons).

Does this matter? Well, yes and no. A teaching analogy is used to introduce a technical scientific concept by making it seem familiar. This is a starting point to be built upon (so, Richard Gill tells us that he will build upon the diminishing activity of cooking corn in his popcorn demonstration to introduce the idea of half-life), so it does not matter if students do not fully understand everything immediately. (Indeed, it is naive to assume most learners could acquire a new complex set of ideas all at once: learning is incremental – see Key ideas for constructivist teaching).

Analogies can act as 'scaffolds' to help learners venture out from their existing continents of knowledge towards new territory. Once this 'anchor' in learners' experience is established one can, so to speak, disembark from the scaffolding raft onto the more solid ground of the shore.

Read about scaffolding learning

However, it is important to be careful to make sure

  • (a) learners appreciate the limitations of models (such as analogies) – that they are thinking and learning tools, and not absolute accounts of the natural world; and that
  • (b) the teacher helps dismantle the 'scaffolding' once it is not needed, so that it is not retained as part of the learners' 'scientific' account.

Weak anthropomorphism

An example of that might be Gill's use of anthropomorphism.

…unstable atoms/nuclei need to become stable…

…I'm going to give the kernels some energy, making them unstable and they're going to want to pop…

Anthropomorphism

This type of language is often used to offer narratives that are more readily appreciated by learners (making the unfamiliar familiar, again) but students can come to use such language habitually, and it may come to stand in place of a more scientific account (Taber & Watts, 1996). So, 'weak' anthropomorphism used to help introduce something abstract and counter-intuitive is useful, but 'strong' anthropomorphism that comes to be adopted as a scientific explanation (e.g., nuclei decay because they want to be stable) is best avoided by seeking to move beyond the figurative language as soon as students are ready.

Read about anthropomorphism

The 'negative' analogy

The mapping diagram above may highlight several potential teaching points that may be considered (perhaps not to be introduced immediately, but when the new concepts are later reinforced and developed).

Where does the energy come from?

One key difference between the two systems is that radioactive decay is (we think) completely spontaneous, whereas the corn only pops because we cook it (Gill used a Bunsen burner) and left to its own devices remains as unpopped kernels.

Related to this, the source of energy for popping corn is the applied heat, whereas unstable nuclei are already in a state of high energy and so have an 'internal' source for their activity. This is a key difference that will likely be obvious to some, but certainly not all learners in most classes.

When is random, random?

A more subtle point relates to the 'random' nature of the two events. I suggest subtle, because there are many published reports written by researchers in science education which suggest even supposed experts can have pretty shaky ideas of what counts as random (Taber, 2013b).

Read 'Nothing random about a proper scientific evaluation?'

Read about the randomisation criterion

As far as scientists understand, the decay of one unstable nucleus in a sample of radioactive material (rather than another) is a random process. It is not just that we are not yet able to predict when a particular nucleus will decay – according to current scientific accounts it is not possible to predict in principle. This is an idea that even Einstein found difficult to accept.

That is not true with the corn. Presumably there are subtle differences between kernels – some have slightly more water content, or slightly weaker casings. Perhaps more significantly, some are heated more than others due to their position in the pan and the position of the heat source, or due to differential exposure to the cooking oil… In principle it would be possible to measure relevant variables and model the set up to make good predictions. (In principle, even if in practice a very complex task.) The order of corn popping is no more random than…say…the roll of a dice. That is, physics tells us it follows natural laws, even if we are not in a position to fully model the phenomenon.

(We might suggest that a student who considered the corn popping as a random event because she saw apparently identical kernels all being heated in the same pan at the same time is simply missing certain 'hidden variables'. Einstein wondered if there were also 'hidden variables' that science had not yet uncovered which could explain random events such as why one nucleus rather than another decays at a particular moment.)

On the recoil

Perhaps a more significant difference is what is observed. The corn are observed 'jumping' (more anthropomorphic language?). Physics tells us that momentum must always be conserved, and the kernels act like tiny jet propelled rockets. That is, as steam is released when the kernel bursts, the rest of the kernel 'jumps' in the opposite direction. (That is, by Newton's third law, there is a reaction force to the force pushing the steam out of the kernel. Momentum is a vector, so it is possible for a stationary object to break up into several moving parts with conservation of momentum.)
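The momentum argument can be set out as a short worked equation. This is only an idealised sketch: it assumes the kernel starts at rest, treats the escaping steam as a single mass $m_{\text{steam}}$ leaving with velocity $\vec{v}_{\text{steam}}$, and ignores any push from the pan or the oil. The symbols are introduced here for illustration.

```latex
\vec{0} = m_{\text{steam}}\,\vec{v}_{\text{steam}} + m_{\text{kernel}}\,\vec{v}_{\text{kernel}}
\quad\Longrightarrow\quad
\vec{v}_{\text{kernel}} = -\frac{m_{\text{steam}}}{m_{\text{kernel}}}\,\vec{v}_{\text{steam}}
```

The minus sign shows the kernel moving in the opposite direction to the steam, and because the steam has much less mass than the rest of the kernel, the kernel moves off much more slowly than the steam does.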

Something similar happens in radioactive decay. The emitted radiation carries away momentum, and the remaining 'daughter' nucleus recoils – although if the material is in the solid state this effect is dissipated by being spread across the lattice. So, the radioactivity which is detected is not analogous to the jumping corn, but to the steam it has released.

Is this important? That likely depends upon the level being taught. If the topic is being introduced to 14-16 year-olds, perhaps not. If the analogy is being explored with post-compulsory students doing an elective course, then maybe. (If not in chemistry; then certainly in physics, where learners are expected to apply the principle of conservation of momentum across various scenarios.)

Will this be on the exam?

When I drafted this, I suspected most readers might find my caveats above about the limitations of the analogy, a bit pernickety (the kind of things an academic who's been out of the school classroom too long and forgotten the realities of working with pupils might dream up), but then I found what claims to be an Edexcel GCE Physics paper from 2012 (paper reference 6PH05/01) on line. In this paper, one question begins:

"In a demonstration to her class, a teacher pours popcorn kernels onto a hot surface and waits for them to pop…".

Much to my delight, I found the first part of this question asked learners:

"How realistic is this demonstration as an analogy to radioactive decay?

Consider aspects of the demonstration that are similar to radioactive decay and aspects that are different"

Examination paper asking physics students to identify positive and negative aspects of the analogy.

Classes of radioactivity

One further difference did occur to me that may be important. At some level this analogy works for radioactivity regardless of what is being emitted from an excited nucleus. However, the analogy seems clearer for the emission of an alpha particle, or a beta particle, or a neutron, than in the case of gamma radiation.

Although in gamma decay an excited nucleus relaxes to a lower energy state emitting a photon, it may not be as obvious to learners that the nucleus has changed (arguably, it has not 'substantially' changed as there is no change of substance) – as it has the same mass number and charge as before. This may be a point to be raised if moving on later to discuss different classes of radioactivity.

Or, perhaps, with gamma decay one can use a different version of the analogy?

Another corny analogy

Although I do not think I had ever come across this analogy before reading the Education in Chemistry piece (perhaps because I do not make myself popcorn), Richard Gill does not seem to be the only person to have noticed this comparison. (They say 'great minds think alike' – and not just physicist Henri Poincaré thinking like Kryten from 'Red Dwarf'). When I looked around the world-wide web I found there were two different approaches to using corn kernels to model radioactivity.

Some people use a similar demonstration to Mr Gill.2 However, there was also a different approach to using the corn. There were variations on this 3, but the gist was that

  • one starts with a large number of kernels
  • they are agitated (e.g., shaken in a box with different cells, poured onto the bench…)
  • then inspected to see which are pointing in some arbitrary direction designated as representing decay
  • the 'decayed' kernels are removed and counted
  • the rest of the sample is agitated again
  • etc.
Choose a direction to represent decay, and remove the aligned kernels as the 'activity' in that interval.
(Original image by Susie from Pixabay)
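The shake-and-remove procedure listed above is easy to mimic on a computer. The following is a minimal sketch rather than any of the published classroom activities: it assumes, purely for illustration, that after each shake a kernel is equally likely to point in any one of four directions, and that one of those directions has been designated as 'decay'. The names `DIRECTIONS` and `shake_activity` are hypothetical.

```python
import random

# Assumption for this sketch: four equally likely directions after each shake
DIRECTIONS = ("north", "east", "south", "west")

def shake_activity(n_kernels, n_shakes, decay_direction="north", seed=1):
    """Return (activity per shake, kernels left) for the shake-and-remove model."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    remaining = n_kernels
    activity = []
    for _ in range(n_shakes):
        # 'Agitate' the sample: every remaining kernel ends up pointing somewhere
        pointing = [rng.choice(DIRECTIONS) for _ in range(remaining)]
        # Kernels aligned with the chosen direction count as 'decayed'
        decayed = pointing.count(decay_direction)
        activity.append(decayed)
        remaining -= decayed  # decayed kernels are removed before the next shake
    return activity, remaining

activity, left = shake_activity(n_kernels=400, n_shakes=8)
print(activity, left)
```

The list `activity` plays the role of the count rate in each interval: it tends to fall from shake to shake because fewer kernels are left to 'decay'.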

This lacks the excitement of popping corn, but could be a better model for gamma decay where the daughter nucleus is at a different energy after decay, but is otherwise unchanged.

Perhaps this version of the analogy could be improved by using a tray with an array of small dips (like tiny spot tiles) just the right size to stand corn kernels in the depressions with their points upwards. Then, after a very gentle tap on the bench next to the tile, those which have 'relaxed' from the higher energy state (i.e., fallen onto their sides) would be considered decayed. This would more directly model the change in potential energy and also avoid the need to keep removing kernels from the context (just as daughter atoms usually remain in a sample of radioactive material), as further gentle taps are unlikely to excite them back to the higher energy state. 4

Or, dear reader, perhaps I've just been thinking about this analogy for just a little too long now.


Sources:

Notes

1 Referring to the nuclei before and after radioactive decay as 'parents' and 'daughters' seems metaphorical, but this use has become so well established (in effect, these are now technical terms) that these descriptors are now what are known (metaphorically!) as 'dead metaphors'.

Read about metaphors in science


2 Here are some examples I found:

Jennifer Wenner, University of Wisconsin-Oshkosh uses the demonstration in undergraduate geosciences:

"I usually perform it after I have introduced radioactive decay and talked about how it works. It only takes a few minutes and I usually talk while I am waiting for the "decay" to happen" ('Using Popcorn to Simulate Radioactive Decay')

https://serc.carleton.edu/quantskills/activities/popcorn.html

The Institute of Physics (IoP) include this activity as part of their 'Modelling decay in the laboratory Classroom Activity for 14-16' but suggest the pan lid is kept on as a safety measure. (Any teacher planning on carrying out any activity in the lab., should undertake a risk assessment first.)

I note the IoP also suggests care in using the term 'random':

Teacher: While we were listening to that there didn't seem to be any fixed pattern to the popping. Is there a word that we could use to describe that?

Lydia: Random?

Teacher: Excellent. But the word random has a very special meaning in physics. It isn't like how we think of things in everyday life. When do you use the word random in everyday life?

Lydia: Like if it's unpredictable? Or has no pattern?

https://spark.iop.org/modelling-decay-laboratory

Kieran Maher and 'Wikibooks contributors' suggest readers of their 'Basic Physics of Nuclear Medicine' could "think about putting some in in a pot, adding the corn, heating the pot…" and indeed their readers "might also like to try this out while considering the situation", but warn readers not to "push this popcorn analogy too far" (pp.20-21).


3 Here are some examples I found:

Florida High School teacher Marc Mayntz offers teachers' notes and student instructions for his 'Nuclear Popcorn' activity, where students are told to "Carefully 'spill' the kernels onto the table".

Chelsea Davis (a student?) reports her results in 'Half Life Popcorn Lab' from an approach where kernels are shaken in a Petri dish.

Redwood High School's worksheet for 'Radioactive Decay and Half Life Simulation' has students work with 100 kernels in a box with its sides labelled 1-4 (kernels that have the small end pointed toward side 1 after "a sharp, single shake (up and down, not side to side)" are considered decayed). Students are told at the start to "Count the popcorn kernels to be certain there are exactly 100 kernels in your box".

This activity is repeated but with (i) kernels pointing to either side 1 or 2; and in a further run (ii) any of sides 1, 2, or 3; being considered decayed. This allows a graph to be drawn comparing all three sets of results.
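The effect of counting one, two or three sides as 'decayed' can be estimated in advance. As a rough sketch, assume each kernel is equally likely to point towards any of the four sides after a shake (an idealisation, since real kernels may point at corners or lie awkwardly). Then the chance of 'decay' per shake is 1/4, 2/4 or 3/4, and the expected half-life, measured in shakes, follows from the usual exponential relationship. The function name `half_life_in_shakes` is a hypothetical one chosen for this illustration.

```python
import math

def half_life_in_shakes(p_decay):
    """Expected number of shakes for half the kernels to be removed,
    assuming each kernel 'decays' with probability p_decay on every shake."""
    if not 0 < p_decay < 1:
        raise ValueError("p_decay must be between 0 and 1 (exclusive)")
    # After n shakes a fraction (1 - p_decay)**n is expected to remain;
    # setting that fraction to 1/2 and solving for n gives the half-life.
    return math.log(2) / -math.log(1 - p_decay)

for sides in (1, 2, 3):
    p = sides / 4  # assumed: all four sides equally likely
    print(sides, round(half_life_in_shakes(p), 2))
```

Under these assumptions the three runs would be expected to give half-lives of about 2.4 shakes, 1 shake and half a shake respectively, which is why the three graphs should fall away at clearly different rates.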

The same approach is used in the Utah Education Network's 'Radioactive Decay' activity, which specifies the use of a shoe box.

A site called 'Chegg' specified "a square box is filled with 100 popcorn kernels" and asked "What alteration in the experimental design would dramatically change the results? Why?" But, sadly, I needed to subscribe to see the answer.

The 'Lesson Planet' site offers 'Nuclear Popcorn' where "Using popcorn kernels spread over a tabletop, participants pick up all of those that point toward the back of the room, that is, those that represent decayed atoms".

'Anonymous' was set a version of this activity, but could not "seem to figure it out". 'Jiskha Homework Help' (tag line: "Ask questions and get helpful responses") helpfully responded,

"You ought to have a better number than 'two units of shake time…'

Read off the graph, not the data table."

(For some reason this brought to mind my sixth form mathematics teacher imploring us in desperation to "look at the ruddy diagram!")


4 Consider the challenge of developing this model to simulate nuclear magnetic resonance or laser excitation!


POEsing assessment questions…

…but not fattening the cow


Keith S. Taber


A well-known Palestinian proverb reminds us that we do not fatten the cow simply by repeatedly weighing it. But, sadly, teachers and others working in education commonly get so fixated on assessment that it seems to become an end in itself.


Images by Clker-Free-Vector-Images, OpenClipart-Vectors and Deedster from Pixabay

A research study using P-O-E

I was reading a report of a study that adopted the predict-observe-explain, P-O-E, technique as a means to elicit "high school students' conceptions about acids and bases" (Kala, Yaman & Ayas, 2013, p.555). As the name suggests, P-O-E asks learners to make a prediction before observing some phenomenon, and then to explain their observations (something that can be specially valuable when the predictions are based on strongly held intuitions which are contrary to what actually happens).

Read about Predict-Observe-Explain


The article on the publisher website

Kala and colleagues begin the introduction to their paper by stating that

"In any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known"

Kala, Yaman & Ayas, 2013, p.555

Constructivism?

Constructivism is a perspective on learning that is informed by research into how people learn and a great many studies into student thinking and learning in science. A key point is how a learner's current knowledge and understanding influences how they make sense of teaching and what they go on to learn. Research shows it is very common for students to have 'alternative conceptions' of science topics, and often these conceptions either survive teaching or distort how it is understood.

The key point is that teachers who teach the science without regard to student thinking will often find that students retain their alternative ways of thinking, so constructivist teaching is teaching that takes into account and responds to the ideas about science topics that students bring to class.

Read about constructivism

Read about constructivist pedagogy

Assessment: summative, formative and diagnostic

If teachers are to take into account, engage with, and try to reshape, learners' ideas about science topics, then they need to know what those ideas are. Now there is a vast literature reporting alternative conceptions in a wide range of science topics, spread across thousands of research reports – but no teacher could possibly find time to study them all. There are books which discuss many examples and highlight some of the most common alternative conceptions (including one of my own, Taber, 2014).



However, in any class studying some particular topic there will nearly always be a spread of different alternative conceptions across the students – including some so idiosyncratic that they have never been reported in any literature. So, although reading about common misconceptions is certainly useful to prime teachers for what to look out for, teachers need to undertake diagnostic assessment to find out about the thinking of their own particular students.

There are many resources available to support teachers in diagnostic assessment, and some activities (such as using concept cartoons) that are especially useful at revealing student thinking.

Read about diagnostic assessment

Diagnostic assessment, assessment to inform teaching, is carried out at the start of a topic, before the teaching, to allow teachers to judge the learners' starting points and any alternative conceptions ('misconceptions') they may have. It can therefore be considered aligned to formative assessment ('assessment for learning') which is carried out as part of the learning process, rather than summative assessment (assessment of learning) which is used after studying to check, score, grade and certify learning.

P-O-E as a learning activity…

P-O-E can best support learning in topics where it is known learners tend to have strongly held, but unhelpful, intuitions. The predict stage elicits students' expectations – which, when contrary to the scientific account, can be confounded by the observe step. The 'cognitive conflict' generated by seeing something unexpected (made more salient by having been asked to make a formal prediction) is thought to help students concentrate on that actual phenomenon, and to provide 'epistemic relevance' (Taber, 2015).

Epistemic relevance refers to the idea that students are learning about things they are actually curious about, whereas for many students following a conventional science course must be experienced as being presented with the answers to a seemingly never-ending series of questions that had never occurred to them in the first place.

Read about the Predict-Observe-Explain technique

Students are asked to provide an explanation for what they have observed which requires deeper engagement than just recording an observation. Developing explanations is a core scientific practice (and one which is needed before another core scientific practice – testing explanations – is possible).

Read about teaching about scientific explanations

To be most effective, P-O-E is carried out in small groups, as this encourages the sharing, challenging and justifying of ideas: the kind of dialogic activity thought to be powerful in supporting learners in developing their thinking, as well as practicing their skills in scientific argumentation. As part of dialogic teaching such an open-forum for learners' ideas is not an end in itself, but a preparatory stage for the teacher to marshal the different contributions and develop a convincing argument for how the best account of the phenomenon is the scientific account reflected in the curriculum.

Constructivist teaching is informed by learners' ideas, and therefore relies on their elicitation, but that elicitation is never the end in itself but is a precursor to a customised presentation of the canonical account.

Read about dialogic teaching and learning

…and as a diagnostic activity

Group work also has another function – if the activity is intended to support diagnostic assessment, then the teacher can move around the room listening in to the various discussions and so collecting valuable information on what students think and understand. When assessment is intended to inform teaching it does not need to be about students completing tests and teachers marking them – a key principle of formative assessment is that it occurs as a natural part of the teaching process. It can be based on productive learning activities, and does not need marks or grades – indeed as the point is to help students move on in their thinking, any kind of formal grading whilst learning is in progress would be inappropriate as well as a misuse of teacher time.

Probing students' understandings about acid-base chemistry

The constructivist model of learning applies to us all: students, teachers, professors, researchers. Given what I have written above about P-O-E, about diagnostic assessment, and dialogic approaches to learning, I approached Kala and colleagues' paper with expectations about how they would have carried out their project.

These authors do report that they were able to diagnose aspects of student thinking about acids and bases, and found some learning difficulties and alternative conceptions,

"it was observed that eight of the 27 students had the idea that the "pH of strong acids is the lowest every time," while two of the 27 students had the idea that "strong acids have a high pH." Furthermore, four of the 27 students wrote the idea that the "substance is strong to the extent to which it is burning," while one of the 27 students mentioned the idea that "different acids which have equal concentration have equal pH."

Kala, Yaman & Ayas, 2013, pp.562-3

The key feature seems to be that, as reported in previous research, students conflate acid concentration and acid strength (when it is possible to have a high concentration solution of a weak acid or a very dilute solution of a strong acid).

Yet some aspects of this study seemed out of alignment with the use of P-O-E.

The best research style?

One feature was the adoption of a positivistic approach to the analysis,

Although there has been no reported analyzing procedure for the POE, in this study, a different [sic] analyzing approach was offered taking into account students' level of understanding… Data gathered from the written responses to the POE tasks were analyzed and divided into six groups. In this context, while students' prediction were divided into two categories as being correct or wrong, reasons for predictions were divided into three categories as being correct, partially correct, or wrong.

Kala, Yaman & Ayas, 2013, p.560


Groups, by prediction and reasons:

Prediction    Reasons
correct       correct
correct       partially correct
correct       wrong
wrong         correct
wrong         partially correct
wrong         wrong

"the written responses to the POE tasks were analyzed and divided into six groups"

There is nothing inherently wrong with doing this, but it aligns the research with an approach that seems at odds with the thinking behind constructivist studies that are intended to interpret a learner's thinking in its own terms, rather than simply compare it with some standard. (I have explored this issue in some detail in a comparison of two research studies into students' conceptions of forces – see Taber, 2013, pp.58-66.)

In terms of research methodology we might say it seems to be conceptualised within the 'wrong' paradigm for this kind of work. It seems positivist (assuming data can be unambiguously fitted into clear categories), nomothetic (tied to 'norms' and canonical answers) and confirmatory (testing thinking as matching model responses or not), rather than interpretivist (seeking to understand student thinking in its own terms rather than just classifying it as right or wrong), idiographic (acknowledging that every learner's thinking is to some extent unique to them) and discovery (exploring nuances and sophistication, rather than simply deciding if something is acceptable or not).

Read about paradigms in educational research

The approach used seemed more suitable for investigating something in the science laboratory, than the complex, interactive, contextualised, and ongoing life of classroom teaching. Kala and colleagues describe their methodology as case study,

"The present study used a case study because it enables the giving of permission to make a searching investigation of an event, a fact, a situation, and an individual or a group…"

Kala, Yaman & Ayas, 2013, p.558
A case study?

Case study is a naturalistic methodology (rather than involving an intervention, such as an experiment), and is idiographic, reflecting the value of studying the individual case. The case is one from among many instances of its kind (one lesson, one school, one examination paper, etc.), and is considered as a somewhat self-contained entity yet one that is embedded in a context in which it is to some extent entangled (for example, what happens in a particular lesson is inevitably somewhat influenced by

  • the earlier sequence of lessons that teacher taught that class {the history of that teacher with that class},
  • the lessons the teacher and student came from immediately before this focal lesson,
  • the school in which it takes place,
  • the curriculum set out to be followed…)

Although a lesson can be understood as a bounded case (taking place in a particular room over a particular period of time involving a specified group of people) it cannot be isolated from the embedding context.

Read about case study methodology


Case study – study of one instance from among many


As case study is idiographic, and does not attempt to offer direct generalisation to other situations beyond that case, a case study should be reported with 'thick description' so a reader has a good mental image of the case (and can think about what makes it special – and so what makes it similar to, or different from, other instances the reader may be interested in). But that is lacking in Kala and colleagues' study, as they only tell readers,

"The sample in the present study consisted of 27 high school students who were enrolled in the science and mathematics track in an Anatolian high school in Trabzon, Turkey. The selected sample first studied the acid and base subject in the middle school (grades 6 – 8) in the eighth year. Later, the acid and base topic was studied in high school. The present study was implemented, based on the sample that completed the normal instruction on the acid and base topic."

Kala, Yaman & Ayas, 2013, pp.558-559

The reference to a sample can be understood as something of a 'reveal' of their natural sympathies – 'sample' is the language of positivist studies that assume a suitably chosen sample reflects a wider population of interest. In case study, a single case is selected and described rather than a population sampled. A reader is rather left to guess what population is being sampled here, and indeed precisely what the 'case' is.

Clearly, Kala and colleagues elicited some useful information that could inform teaching, but I sensed that their approach would not have made optimal use of a learning activity (P-O-E) that can give insight into the richness, and, sometimes, subtlety of different students' ideas.

Individual work

Even more surprising was the researchers' choice to ask students to work individually without group discussion.

"The treatment was carried out individually with the sample by using worksheets."

Kala, Yaman & Ayas, 2013, p.559

This is a choice which would surely have compromised the potential of the teaching approach to allow learners to explore, and reveal, their thinking?

I wondered why the researchers had made this choice. As they were undertaking research, perhaps they thought it was a better way to collect data that they could readily analyse – but that seems to be choosing limited data that can be easily characterised over the richer data that engagement in dialogue would surely reveal?

Assessment habits

All became clear near the end of the study when, in the final paragraph, the reader is told,

"In the present study, the data collection instruments were used as an assessment method because the study was done at the end of the instruction/ [sic] on the acid and base topics."

Kala, Yaman & Ayas, 2013, p.571

So, it appears that the P-O-E activity, which is an effective way of generating the kind of rich but complex data that helps a teacher hone their teaching for a particular group, was being adopted, instead, as a means of summative assessment. This is presumably why the analysis focused on the degree of match to the canonical science, rather than engaging in interpreting the different ways of thinking in the class. Again presumably, this is why the highly valuable group aspect of the approach was dropped in favour of individual working – summative assessment needs to not only grade against norms, but do this on the basis of each individual's unaided work.

An activity which offers great potential for formative assessment (as it is a learning activity as well as a way of exploring student thinking); and that offers an authentic reflection of scientific practice (where ideas are presented, challenged, justified, and developed in response to criticism); and that is generally enjoyed by students because it is interactive and the predictions are 'low stakes' making for a fun learning session, was here re-purposed to be a means of assessing individual students once their study of a topic was completed.

Kala and colleagues certainly did identify some learning difficulties and alternative conceptions this way, and this allowed them to evaluate student learning. But I cannot help thinking an opportunity was lost here to explore how P-O-E can be used in a formative assessment mode to inform teaching:

Yes, I agree that "in any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known", but the point of that is to inform the teaching and so support student learning. What were Kala and colleagues going to do with their inferences about students' ideas when they used the technique as "an assessment method … at the end of the instruction"?

As the Palestinian adage goes, you do not fatten up the cow by weighing it, just as you do not facilitate learning simply by testing students. To mix my farmyard allusions, this seems to be a study of closing the barn door after the horse has already bolted.


Work cited

Neuroadaptation gremlins on the see-saw in your brain

The brain's reward pathway is like a teeter-totter because…


Keith S. Taber


in your brain there is a teeter-totter like in a kids' playground…these neuroadaptation gremlins hopping on the pain side of the balance…the gremlins hop off …But if we …accumulate so many gremlins on the pain side of our balance …we've crossed over into the disease of addiction…we are craving because …it's the gremlins jumping up and down

Dr Anna Lembke talking on 'All in the mind'

Is there a see-saw in your brain?
(Original images by mohamed Hassan and OpenClipart-Vectors from Pixabay)

Dr Anna Lembke, Professor of Psychiatry at Stanford University, explained how addiction relates to dopamine and the brain's reward pathway with an analogy of a see-saw. She was talking to Sana Qadar for an episode of the ABC programme 'All in the Mind' called 'How dopamine drives our addictions'.

Analogies are used in teaching and in science communication to help 'make the unfamiliar familiar', to show someone that something they do not (yet) know about is actually, in some sense at least, a bit like something they are already familiar with. In an analogy, there is a mapping between some aspect(s) of the structure of the target ideas and the structure of the familiar phenomenon or idea being offered as an analogue. Such teaching analogies can be useful to the extent that someone is indeed highly familiar with the 'analogue' (and more so than with the target knowledge being communicated); that there is a helpful mapping across between the analogue and the target; and that comparison is clearly explained (making clear which features of the analogue are relevant, and how).

Read about science analogies

A fried brain

During the programme the interviewer (Qadar) uses a metaphor for how addiction influences brain chemistry:

Sana Qadar: "The problem is when we are becoming addicted to something, our brain's ability to naturally produce dopamine gets fried."

Anna Lembke: "So essentially what happens in the brain as we tip toward the compulsive cycle of overconsumption or addiction is that we start to down-regulate our own dopamine production and dopamine transmission in order to compensate for the ways that we are bombarding our brain's reward pathway with too much dopamine through ingestion of these incredibly potent and rewarding substances and behaviours."

What does Sana Qadar mean by 'fried'? Presumably not destroyed, as the subsequent interview suggests that recovery is possible (see below) – although the brain's ability to naturally produce dopamine would surely not recover from actual frying. So, perhaps, fried means disturbed, or damaged? Given the following dialogue it might mean thrown out of balance.

Perhaps Qadar was thinking of the brain as circuitry as the term is commonly applied to damaged circuits (I think the term derives from damage caused by overheating, as can happen when there is a 'short' for example, which does stop the 'fried' components functioning permanently). So, perhaps for Qadar this is a dead metaphor – a term which started as a metaphor but which, with habitual use, has come to be treated as having literal meaning – at least in relation to electrical circuits and, by analogy, brain circuitry?


A fried brain?
(Images by OpenClipart-Vectors and Roger YI from Pixabay)

Balancing those gremlins

What I found especially interesting is the way Dr Lembke made extensive use of an analogy in her explanation, much in the way a teacher might keep referring back to the same metaphor or analogy or model when introducing an abstract topic.

Sana Qadar tells listeners that "to explain how this process unfolds in the brain's reward pathway, Dr Lembke uses the analogy of a teeter-totter or seesaw":

"Because pleasure and pain are processed in the same part of the brain and work like opposite sides of the balance, it means for every pleasure there is a cost and that cost is pain.

So, if you imagine that in your brain there is a teeter-totter like in a kids' playground, that teeter-totter will tip to one side when we experience pleasure, and the opposite side when we experience pain.

But no sooner has that balance tipped to the side of pleasure, for example when I eat a piece of chocolate, then my brain will work very hard to restore a level balance or what neuroscientists call homeostasis. And it does that not just by bringing the balance level again but first by tipping it an equal and opposite amount to the side of pain, that is the after-effect, the come-down. I imagine that as these neuroadaptation gremlins hopping on the pain side of the balance to bring it level again.

Now, if we wait long enough, the gremlins hop off and homeostasis is restored as we go back to our baseline tonic level of dopamine firing. But if we continue to ingest addictive substances or behaviours over very long periods of time, we essentially accumulate so many gremlins on the pain side of our balance that we are in a chronic dopamine deficit state, and that is essentially where we get when we've crossed over into the disease of addiction."

Dr Anna Lembke talking on 'All in the mind'

As part of the programme of treatment Dr Lembke has developed for those suffering from addictions she often recommends a period of complete abstinence – asking her clients to abstain for at least 30 days:

Because 30 days is about the minimum amount of time it takes for the brain to restore baseline dopamine firing. Another way of saying this is 30 days is about the minimum amount of time it takes for the gremlins to hop off the pain side of the balance so that homeostasis or balance can be restored.

… I think in many people it is possible with abstinence to reset the reward pathway, the brain has an enormous amount of plasticity.

Dr Anna Lembke talking on 'All in the mind'

Abstinence is obviously not easy when the person is constantly faced with relevant triggers, as

"…what happens when we are triggered is that we release a little bit of dopamine in the reward pathway. … But if we wait long enough, those gremlins will hop off the pain side of the balance, and balance is restored….we are craving because we are in a dopamine deficit state, it's the gremlins jumping up and down on the pain side of the balance. But if we can just wait a few more moments, they will get off, homeostasis will be restored and that feeling will pass."

"…It's a fine line between pleasure and pain
You've done it once you can do it again
Whatever you've done don't try to explain
It's a fine, fine line between pleasure and pain.."

From the lyrics of the song 'Pleasure and Pain' (covered by Manfred Mann's Earth Band), by Holly Knight & Michael Donald Chapman

It's a fine line between pleasure and pain

Sana Qadar suggests that certain kinds of pain can actually be good for us. From a biological perspective this is clearly so, as pain provides signals to motivate us to change our behaviour (move away from the fire, put down the very heavy object), but that is not what she is referring to. Rather, that "Dr Lembke says it has to do with the fact that pleasure and pain are processed in the same part of the brain":

Well, just like when we press on the pleasure side of the balance the gremlins hop on the pain side and ultimately shift our hedonic setpoint or our joy setpoint to the side of pain, it's also true that when we press on the pain side of the balance, so we intentionally invite psychologically or physically painful experiences into our lives, that those neuroadaptation gremlins will then hop on the pleasure side of the balance and we will start to up-regulate our own endogenous dopamine, not as the primary response to the stimulus but as the after-response

…I'm absolutely not talking about extreme forms of pain like cutting [which is] not a healthy way to get dopamine

…[For example] Michael was somebody who was addicted to cocaine and alcohol, and got into recovery and immediately experienced the dopamine deficit state, those gremlins on the pain side of the balance, he was anxious, he was irritable, and he also felt very numb, kind of an absence of emotions, which was really scary for him.

And he serendipitously discovered in early recovery that if he took a very cold shower, that created for him the same kinds of feelings, in a muted way, that he used to get from drugs, so he got into this practice of every morning taking a cold shower, and it worked great for him."

Dr Anna Lembke talking on 'All in the mind'

Gremlins, indeed?

Of course there is no see-saw in the brain, but a see-saw is a familiar everyday object that people understand can be balanced – or not. And that if more children (of similar size) load up one side than the other it will be out of balance – and it will only level up once the loads are balanced.

Strictly, there are some complications here with the analogue. If the children are at different distances from the fulcrum that will change their turning effect (so two children could balance one of similar mass according to where they are positioned). Similarly, when the moments are balanced the see-saw will not necessarily be level: as 'balance' means no overall turning effect. So, if the see-saw was already at an angle to the horizontal, loading it up in a balanced way should not shift it back to being level.
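The point about turning effects can be checked with a small calculation. This is a minimal sketch of the principle of moments, not anything from the programme: the masses and distances are invented for illustration, the function names are hypothetical, and the see-saw itself is treated as a uniform beam pivoted at its centre so that only the children contribute.

```python
def net_moment(loads):
    """Net turning effect about the fulcrum.

    loads is a list of (mass_kg, distance_m) pairs. Distances on one side
    of the fulcrum are positive and on the other side negative. Gravity
    acts equally on every load, so mass x distance is enough for comparison.
    """
    return sum(mass * distance for mass, distance in loads)


def is_balanced(loads, tolerance=1e-9):
    """True when the moments on the two sides cancel."""
    return abs(net_moment(loads)) < tolerance


# Two 30 kg children sitting 1.0 m from the fulcrum on one side,
# one 30 kg child sitting 2.0 m from the fulcrum on the other side.
two_near_one_far = [(30, 1.0), (30, 1.0), (30, -2.0)]

# The same three children, all sitting 2.0 m from the fulcrum.
all_at_same_distance = [(30, 2.0), (30, 2.0), (30, -2.0)]

print(is_balanced(two_near_one_far))      # moments cancel
print(is_balanced(all_at_same_distance))  # two children outweigh one
```

The check only says whether there is an overall turning effect. It says nothing about the angle at which the see-saw sits, which is the second complication noted above.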

Perhaps there is something comparable in the reward system to whereabouts children sit on the see-saw – (perhaps some synapses are more sensitive to the effects of dopamine than others?), but this would be over-complicating an analogy that is intended to offer a link to a simple everyday phenomenon.

Are gremlins like children – do they come in different sizes? Perhaps it seems a little childish even to talk of such things in the brain. But there was once a strong (if discouraged these days 1) tradition of considering a homunculus, a little observer, inside the brain as if in a control room. Moreover, if the lauded physicist James Clerk Maxwell could invoke his famous demon to explain aspects of thermodynamics, we should not censure Lembke's metaphorical gremlins.

If this comparison was being used as a teaching analogy in a formal course, then we might want a more careful setting out of the positive and negative aspects of the analogy (those things that do, and do not, map across from the see-saw to the reward system). But Dr Lembke is not trying to teach her clients to pass tests about brain science, but rather give them a way of thinking about their problems that can help them plan and change behaviour – that is, a useful and straightforward model they can apply in overcoming their addictions.


An episode of the radio programme/podcast 'All in the mind'

To find out more

Prof. Lembke was talking about a very important topic and here I have only abstracted particular comments to illustrate her use of the analogy. For a fuller account of the topic, and in particular Prof. Lembke's clinical work to help people struggling with addiction, please refer to the full interview.


Work cited:

Note

1 The term is still used, but in a somewhat different sense:

"in neuroanatomy, the cortical homunculus represents either the motor or the sensory distribution along the cerebral cortex of the brain. The motor homunculus is a topographic representation of the body parts and its correspondents along the precentral gyrus of the frontal lobe. While the sensory homunculus is a topographic representation of the body parts along the postcentral gyrus of the parietal lobe."

Nguyen and Duong, 2021

So, nowadays we each have two 'little men' in our brains.

COVID is like photosynthesis because…

An analogy based on a science concept


Keith S. Taber


Photosynthesis illuminating a plant?
(Image by OpenClipart-Vectors from Pixabay)

Analogies, metaphors and similes are used in communication to help make the unfamiliar familiar by suggesting that some novel idea or phenomenon being introduced is in some ways like something the reader/listener is already familiar with. Analogies, metaphors and similes are commonly used in science teaching, and also in science writing and journalism.

An analogy maps out similarities in structure between two phenomena or concepts. This example, from a radio programme, compared the COVID pandemic with photosynthesis.

Read about science analogies

Photosynthesis and the pandemic

Professor Will Davies of Goldsmiths, University of London suggested that:

"So, what we were particularly aiming to do, was to understand the collision between a range of different political economic factors of a pre-2020 world, and how they were sort of reassembled and deployed to cope with something which was without question unprecedented.

We used this metaphor of photosynthesis because if you think about photosynthesis in relation to plants, the sun both lights things up but at the same time it feeds them and helps them to grow, and I think one of the things the pandemic has done for social scientists is to serve both as a kind of illumination of things that previously maybe critical political economists and heterodox scholars were pointing to but now became very visible to the mainstream media and to mainstream politics. But at the same time it also accentuated and deepened some of those tendencies such as our reliance on various digital platforms, certain gender dynamics of work in the household, these sort of things that became acute and undeniable and potentially politicised over the course of 2020, 2021."

Prof. Will Davies, talking on 'Thinking Allowed' 1

Will Davies, Professor in Political Economy at Goldsmiths, University of London, was talking to sociologist Prof. Laurie Taylor, who presents the BBC programme 'Thinking Allowed', as part of an episode called 'Covid and change'.

A scientific idea used as analogue

Prof. Davies refers to using "this metaphor of photosynthesis". However he goes on to suggest how the two things he is comparing are structurally similar – the pandemic has shone a light on social issues at the same time as providing the conditions for them to become more extreme, akin to how light both illuminates plants and changes them. A metaphor is an implicit comparison where the reader/listener is left to interpret the comparison, but a metaphor or simile that is explicitly developed to explain the comparison can become an analogy.

Read about science metaphors

Often science concepts are introduced by analogy to more familiar everyday ideas, objects or events. Here, however, a scientific concept, photosynthesis, is used as the analogue – the source used to explain something novel. Prof. Davies assumes listeners will be familiar enough with this science concept for it to be helpful in introducing his research.

Mischaracterising photosynthesis?

A science teacher might not like the notion that the sun feeds plants – indeed if a student suggested this in a science class it would likely be judged as an alternative conception. In photosynthesis, carbon dioxide (from the atmosphere) and water (usually absorbed from the soil) provide the starting materials, and the glucose that is produced (along with oxygen) enables other processes – such as growth which relies on other substances also being absorbed from the soil. (So-called 'plant foods', which would be better characterised as plant nutritional supplements, contain sources of elements such as nitrogen, phosphorus and potassium). Light is necessary for photosynthesis, but the sunlight is not best considered 'food'.

One might also argue that Prof. Davies has misidentified the source for his analogy, and perhaps he should rather have suggested sunlight as the source metaphor for his comparison as sunlight both illuminates plants and enables them to grow. Photosynthesis takes place inside chloroplasts within a plant's tissues, and does not illuminate the plant. However, Prof. Davies' expertise is in political economy, not natural science, and it was good to see a social scientist looking to use a scientific idea to explain his research.


Baking fresh electrons for the science doughnut

Faster-than-light electrons race from a sitting start and are baked to give off light brighter than millions of suns that can be used to image tiny massage balls: A case of science communication


Keith S. Taber

(The pedantic science teacher)


Ockham's razor

Ockham's razor (also known as Occam's razor) is a principle that is sometimes applied as a heuristic in science, suggesting that explanations should not be unnecessarily complicated. Faced with a straightforward explanation, and an alternative convoluted explanation, then all other things being equal we should prefer the former – not simply to accept it, but to treat it as the preferred hypothesis to test out first.

Ockham's Razor is also an ABC radio show offering "a soap box for all things scientific, with short talks about research, industry and policy from people with something thoughtful to say about science". The show used to offer recorded essays (akin to the format of BBC's A Point of View), but now tends to record short live talks.

I've just listened to an episode called The 'science donut' – in fact I listened several times as I thought it was fascinating – as in a few minutes there was much to attend to.


The 'Science Donut': a recent episode of Ockham's Razor

I approached the episode as someone with an interest in science, of course, but also as an educator with an ear to the ways in which we communicate science in teaching. Teachers do not simply present sequences of information about science, but engage pedagogy (i.e., strategies and techniques to support learning). Other science communicators (whether journalists, or scientists themselves directly addressing the public) use many of the same techniques. Teaching conceptual material (such as science principles, theories, models…) can be seen as making the unfamiliar familiar, and the constructivist perspective on how learning occurs suggests this is supported by showing the learner how that which is currently still unfamiliar, is in some way like something familiar, something they already have some knowledge/experience of.

Science communicators may not be trained as teachers, so may sometimes be using these techniques in a less considered or even less deliberate manner. That is, people use analogy, metaphor, simile, and so forth, as a normal part of everyday talk to such an extent that these tropes may be generated automatically, in effect, implicitly. When we are regularly talking about an area of expertise we almost do not have to think through what we are going to say. 1

Science communicators also often have much less information about their audience than teachers: a radio programme/podcast, for example, can be accessed by people of a wide range of background knowledge and levels of formal qualifications.

One thing teachers often learn to do very early in their careers is to slow down the rate of introducing new information, and focus instead on a limited number of key points they most want to get across. Sometimes science in the media is very dense in the frequency of information presented or the background knowledge being drawn upon. (See, for example, 'Genes on steroids? The high density of science communication'.)

A beamline scientist

Dr Emily Finch, who gave this particular radio talk, is a beamline scientist at the Australian Synchrotron. Her talk began by recalling how her family visited the Synchrotron facility on an open day, and how she later went on to work there.

She then gave an outline of the functioning of the synchrotron and some examples of its applications. Along the way there were analogies, metaphors, anthropomorphism, and dubiously fast electrons.

The creation of the god particle

To introduce the work of the particle accelerator, Dr Finch reminded her audience of the research to detect the Higgs boson.

"Do you remember about 10 years ago scientists were trying to make the Higgs boson particle? I see some nods. They sometimes call it the God particle and they had a theory it existed, but they had not been able to prove it yet. So, they decided to smash together two beams of protons to try to make it using the CERN large hadron collider in Switzerland…You might remember that they did make a Higgs boson particle".

This is a very brief summary of a major research project that involved hundreds of scientists and engineers from a great many countries working over years. But this abbreviation is understandable as this was not Dr Finch's focus, but rather an attempt to link her actual focus, the Australian Synchrotron, to something most people will already know something about.

However, aspects of this summary account may have potential to encourage the development of, or reinforce an existing, common alternative conception shared by many learners. This is regarding the status of theories.

In science, theories are 'consistent, comprehensive, coherent and extensively evidenced explanations of aspects of the natural world', yet students often understand theories to be nothing more than just ideas, hunches, guesses – conjectures at best (Taber, Billingsley, Riga & Newdick, 2015). In a very naive take on the nature of science, a scientist comes up with an idea ('theory') which is tested, and is either 'proved' or rejected.

This simplistic take is wrong in two regards – something does not become an established scientific theory until it is supported by a good deal of evidence; and scientific ideas are not simply proved or disproved by testing, but rather become better supported or less credible in the light of the interpretation of data. Strictly, scientific ideas are never finally proved to become certain knowledge, but rather remain as theories. 2

In everyday discourse, people will say 'I have a theory' to mean no more than 'I have a suggestion'.
A pedantic scientist or science teacher might be tempted to respond:
"no you don't, not yet,"

This is sometimes not the impression given by media accounts – presumably because headlines such as 'research leads to scientist becoming slightly more confident in theory' do not have the same impact as 'cure found', 'discovery made', or 'theory proved'.

Read about scientific certainty in the media

The message that could be taken away here is that scientists had the idea that the Higgs boson existed, but they had not been able to prove it till they were able to make one. But the CERN scientists did not have a Higgs boson to show the press, only the data from highly engineered detectors, analysed through highly complex modelling. Yet that analysis suggested they had recorded signals that closely matched what they expected to see when a short-lived Higgs decayed, allowing them to conclude that it was very likely one had been formed in the experiment. The theory motivating their experiment was strongly supported – but not 'proved' in an absolute sense.

The doughnut

Dr Finch explained that

"we do have one of these particle accelerators here in Australia, and it's called the Australian Synchrotron, or as it is affectionately known the science donut

…our synchrotron is a little different from the large hadron collider in a couple of main ways. So, first, we just have the one beam instead of two. And second, our beam is made of electrons instead of protons. You remember electrons, right, they are those tiny little negatively charged particles and they sit in the shells around the atom, the centre of the atom."

Dr Emily Finch talking on Ockham's Razor

One expects that members of the audience would be able to respond to this description and (due to previous exposure to such representations) picture images of atoms with electrons in shells. 'Shells' is of course a kind of metaphor here, even if one which with continual use has become a so-called 'dead metaphor'. Metaphor is a common technique used by teachers and other communicators to help make the unfamiliar familiar. In some simplistic models of atomic structure, electrons are considered to be arranged in shells (the K shell, the L shell, etc.), and a simple notation for electronic configuration based on these shells is still often used (e.g., Na as 2.8.1).

Read about science metaphors

However, this common way of talking about shells has the potential to mislead learners. Students can, and sometimes do, develop the alternative conception that atoms have actual physical shells of some kind, in which the electrons are located. The shells scientists refer to are abstractions, but may be misinterpreted as material entities, as actual shells. The use of anthropomorphic language, that is, saying that the electrons "sit in the shells", whilst helping to make the abstract ideas familiar and so perhaps comfortable, can reinforce this. After all, it is difficult to sit in empty space without support.

The subatomic grand prix?

Dr Finch offers her audience an analogy for the synchrotron: the electrons "are zipping around. I like to think of it kind of like a racetrack." Analogy is another common technique used by teachers and other communicators to help make the unfamiliar familiar.

Read about science analogies

Dr Finch refers to the popularity of the Australian Formula 1 (F1) Grand Prix that takes place in Melbourne, and points out

"Now what these race enthusiasts don't know is that just a bit further out of the city we have a race track that is operating six days a week that is arguably far more impressive.

That's right, it is the science donut. The difference is that instead of having F1s doing about 300 km an hour, we have electrons zipping around at the speed of light. That's about 300 thousand km per second."

Dr Emily Finch talking on Ockham's Razor

There is an interesting slippage – perhaps a deliberate rhetorical flourish – from the synchrotron being "kind of like a racetrack" (a simile) to being "a race track" (a metaphor). Although racing electrons lacks a key attraction of an F1 race (different drivers of various nationalities driving different cars built by competing teams presented in different livery – whereas who cares which of myriad indistinguishable electrons would win a race?) that does not undermine the impact of the mental imagery encouraged by this analogy.

This can be understood as an analogy rather than just a simile or metaphor as Dr Finch maps out the comparison:


target concept | analogue
a synchrotron | a racetrack
operates six days a week | [Many in the audience would have known that the Melbourne Grand Prix takes place on a 'street circuit' that is only set up for racing one weekend each year.]
racing electrons | racing 'F1s' (i.e., Grand Prix cars)
at the speed of light | at about 300 km an hour
An analogy between the Australian Synchrotron and the Melbourne Grand Prix circuit

So, here is an attempt to show how science has something just like the popular race track, but perhaps even more impressive – generating speeds orders of magnitude greater than even Lewis Hamilton could drive.

They seem to like their F1 comparisons at the Australian Synchrotron. I found another ABC programme ('The Science Show') where Nobel Laureate "Brian Schmidt explains, the synchrotron is not being used to its best capability",

"the analogy here is that we invested in a $200 million Ferrari and decided that we wouldn't take it out of first gear and do anything other than drive it around the block. So it seems a little bit of a waste"

Brian Schmidt (Professor of Astronomy, and Vice Chancellor, at Australian National University)

A Ferrari being taken for a spin around the block in Melbourne (Image by Lee Chandler from Pixabay)

How fast?

But did Dr Finch suggest there that the electrons were travelling at the speed of light? Surely not? Was that a slip of the tongue?

"So, we bake our electrons fresh in-house using an electron gun. So, this works like an old cathode ray tube that we used to have in old TVs. So, we have this bit of tungsten metal and we heat it up and when it gets red hot it shoots out electrons into a vacuum. We then speed up the electrons, and once they leave the electron gun they are already travelling at about half the speed of light. We then speed them up even more, and after twelve metres, they are already going at the speed of light….

And it is at this speed that we shoot them off into a big ring called the booster ring, where we boost their energy. Once their energy is high enough we shoot them out again into another outer ring called the storage ring."

Dr Emily Finch talking on Ockham's Razor

So, no, the claim is that the electrons are accelerated to the speed of light within twelve metres, and then have their energy boosted even more.

But this is contrary to current physics. According to the currently accepted theories, and specifically the special theory of relativity, only entities which have zero rest mass, such as photons, can move at the speed of light.

Electrons have a tiny mass by everyday standards (about 0.000 000 000 000 000 000 000 000 001 g), but they are still 'massive' particles (i.e., particles with mass) and it would take infinite energy to accelerate a single tiny electron to the speed of light. So, given our current best understanding, this claim cannot be right.
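To make the 'infinite energy' point concrete, here is a minimal sketch in Python of the standard special-relativity relation between speed and kinetic energy. The function names are my own, the electron rest energy is taken as roughly 0.511 MeV, and the sample speeds are illustrative: the "about half the speed of light" mentioned in the talk, 99% of c, the 99.99998% figure later given in the programme's correction (see note 3), and c itself.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.511  # approximate rest energy of an electron (m c^2)


def lorentz_factor(beta: float) -> float:
    """Return gamma = 1 / sqrt(1 - beta^2), where beta = v / c."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    if beta == 1.0:
        # A massive particle at exactly c would need unlimited energy.
        return math.inf
    # (1 - beta)(1 + beta) is numerically safer than 1 - beta**2 near beta = 1.
    return 1.0 / math.sqrt((1.0 - beta) * (1.0 + beta))


def kinetic_energy_mev(beta: float) -> float:
    """Relativistic kinetic energy of an electron, (gamma - 1) m c^2, in MeV."""
    return (lorentz_factor(beta) - 1.0) * ELECTRON_REST_ENERGY_MEV


for beta in (0.5, 0.99, 0.9999998, 1.0):
    print(f"v = {beta} c -> kinetic energy = {kinetic_energy_mev(beta):.4g} MeV")
```

The kinetic energy grows without limit as the speed approaches c, and the final line of output reports an infinite value for an electron travelling at c itself.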

I looked to see what was reported on the website of the synchrotron itself.

The electron beam travels just under the speed of light – about 299,792 kilometres a second.

https://www.ansto.gov.au/research/facilities/australian-synchrotron/overview

Strictly, the electrons do not travel at the speed of light but at very nearly the speed of light.

The speed of light in a vacuum is believed to be 299 792 458 m s⁻¹ (to the nearest metre per second), but often in science we are working to limited precision, so this may be rounded to 2.998 × 10⁸ m s⁻¹ for many purposes. Indeed, sometimes 3 × 10⁸ m s⁻¹ is good enough for so-called 'back of the envelope' calculations. So, in a sense, Dr Finch was making a similar approximation.
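As a rough check on how good these approximations are, the following Python sketch compares the relative error of the two common roundings of c (2.998 × 10⁸ and 3 × 10⁸ m s⁻¹) with the relative shortfall of the electrons' speed. The electron figure of 99.99998% of c is taken from the programme's later correction (see note 3); the helper name is my own.

```python
C_EXACT = 299_792_458  # speed of light in a vacuum, in metres per second


def relative_error(approx: float, exact: float = C_EXACT) -> float:
    """Fractional difference between an approximation and the exact value."""
    return abs(approx - exact) / exact


# Common roundings of the speed of light
for label, value in (("2.998e8", 2.998e8), ("3e8", 3e8)):
    print(f"{label} m/s differs from c by {relative_error(value):.2e}")

# Fraction of c reached by the electrons, per the programme's correction
electron_fraction = 0.9999998
electron_shortfall = 1.0 - electron_fraction
print(f"electrons fall short of c by {electron_shortfall:.2e}")
```

On these figures the electrons' speed is numerically closer to c than either of the usual rounded values, which is the sense in which the claim is a good approximation in magnitude.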

But this is one approximation that a science teacher might want to avoid: the claim that electrons travel at the speed of light may be approximately correct numerically, but it describes something thought to be physically impossible. That is, although the difference in magnitude between

  • (i) the maximum electron speeds achieved in the synchrotron, and
  • (ii) the speed of light,

might be a tiny proportional difference – conceptually the distinction is massive in terms of modern physics. (I imagine Dr Finch is aware of all this, but perhaps her background in geology does not make this seem as important as it might appear to a physics teacher.)

Dr Finch does not explicitly say that the electrons ever go faster than the speed of light (unlike the defence lawyer in a murder trial who claimed nervous impulses travel faster than the speed of light) but I wonder how typical school age learners would interpret "they are already going at the speed of light….And it is at this speed that we shoot them off into a big ring called the booster ring, where we boost their energy". I assume that refers to maintaining their high speeds to compensate for energy transfers from the beam: but only because I think Dr Finch cannot mean accelerating them beyond the speed of light. 3

The big doughnut

After the reference to how "we bake our electrons fresh in-house", Dr Finch explains

"And so it is these two rings, these inner and outer rings, that give the synchrotron its nick name, the science donut. Just like two rings of delicious baked electron goodness…

So, just to give you an idea of scale here, this outer ring, the storage ring, is about forty one metres across, so it's a big donut."

Dr Emily Finch talking on Ockham's Razor
A big doughnut? The Australian Synchrotron (Source Australia's Nuclear Science and Technology Organisation)

So, there is something of an extended metaphor here. The doughnut is so-called because of its shape, but this doughnut (a bakery product) is used to 'bake' electrons.

If audience members were to actively reflect on and seek to analyse this metaphor then they might notice an incongruity, perhaps a mixed metaphor, as the synchrotron seems to shift from being that which is baked (a doughnut) to that doing the baking (baking the electrons). Perhaps the electrons are the dough, but, if so, they need to go into the oven.

But, of course, humans implicitly process language in real time, and poetic language tends to be understood intuitively without needing reflection. So, a trope such as this may 'work' to get across the flavour (sorry) of an idea, even if under close analysis (by our pedantic science teacher again) the metaphor appears only half-baked.

Perverting the electrons

Dr Finch continued

"Now the electrons like to travel in straight lines, so to get them to go round the rings we have to bend them using magnets. So, we deflect the electrons around the corners [sic] using electromagnetic fields from the magnets, and once we do this the electrons give off a light, called synchrotron light…"

Dr Emily Finch talking on Ockham's Razor

Now electrons are not sentient and do not have preferences in the way that someone might prefer to go on a family trip to the local synchrotron rather than a Formula 1 race. Electrons do not 'like' to go in straight lines. They fit with Newton's first law – the law of inertia. An electron that is moving ('travelling') will move ('travel') in a straight line unless there is a net force to pervert it. 4

If we describe this as electrons 'liking' to travel in straight lines it would be just as true to say electrons 'like' to travel at a constant speed. Language that assigns human feelings and motives and thoughts to inanimate objects is described as anthropomorphic. Anthropomorphism is a common way of making the unfamiliar familiar, and it is often used in relation to molecules, electrons, atoms and so forth. Sadly, when learners pick up this kind of language, they do not always appreciate that it is just meant metaphorically!

Read about anthropomorphism

The brilliant light

Dr Finch tells her audience that

"This synchrotron light is brighter than a million suns, and we capture it using special equipment that comes off that storage ring.

And this equipment will focus and tune and shape that beam of synchrotron light so we can shoot it at samples like a LASER."

Dr Emily Finch talking on Ockham's Razor

Whether the radiation is 'captured' is a moot point, as it no longer exists once it has been detected. But what caught my attention here was the claim that the synchrotron radiation was brighter than a million suns. Not because I necessarily thought this bold claim was 'wrong', but rather I did not understand what it meant.

The statement seems sensible at first hearing, and clearly it means qualitatively that the radiation is very intense. But what did the quantitative comparison actually mean? I turned again to the synchrotron webpage. I did not find an answer there, but on the site of a UK accelerator I found

"These fast-moving electrons produce very bright light, called synchrotron light. This very intense light, predominantly in the X-ray region, is millions of times brighter than light produced from conventional sources and 10 billion times brighter than the sun."

https://www.diamond.ac.uk/Home/About/FAQs/About-Synchrotrons.html#

Sunlight spreads out and its intensity drops according to an inverse square law. Move twice as far away from a sun, and the radiation intensity drops to a quarter of what it was when you were closer. Move ten times as far away from the sun as before, and the intensity is 1% of what it was up close.
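The arithmetic of the inverse square law can be sketched in a couple of lines of Python. This is only an illustration, assuming the source radiates equally in all directions and that nothing absorbs the light on the way; the function name is my own.

```python
def relative_intensity(distance_ratio: float) -> float:
    """Intensity at the new distance as a fraction of the original intensity.

    distance_ratio is (new distance) / (original distance).
    """
    if distance_ratio <= 0:
        raise ValueError("distance_ratio must be positive")
    return 1.0 / distance_ratio ** 2


print(relative_intensity(2))   # twice as far away: a quarter of the intensity
print(relative_intensity(10))  # ten times as far away: one hundredth (1%)
```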

The synchrotron 'light' is being shaped into a beam "like a LASER". A LASER produces a highly collimated beam – that is, the light does not (significantly) spread out. This is why football hooligans choose LASER pointers rather than conventional torches to intimidate players from a safe distance in the crowd.

Comparing light with like

This is why I do not understand how the comparison works, as the brightness of a sun depends on how close you are to it – a point previously discussed here in relation to NASA's Parker solar probe (NASA puts its hand in the oven). If I look out at the night sky on a clear moonlight night then surely I am exposed to light from more "than a million suns" but most of them are so far away I cannot even make them out. Indeed there are faint 'nebulae' I can hardly perceive that are actually galaxies shining with the brightness of billions of suns. 5 If that is the comparison, then I am not especially impressed by something being "brighter than a million suns".


How bright is the sun? It depends which planet you are observing from. (Images by AD_Images and Gerd Altmann from Pixabay)


We are told not to look directly at the sun as it can damage our eyes. But a hypothetical resident of Neptune or Uranus could presumably safely stare at the sun (just as we can safely stare at much brighter stars than our sun because they are so far away). So we need to ask: "brighter than a million suns", as observed from how far away?
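To put some rough numbers on this, here is a small Python sketch. It assumes approximate mean orbital distances of about 19.2 AU for Uranus and 30.1 AU for Neptune, treats the sun as a point source obeying the inverse square law, and uses function names of my own. The second function is a purely hypothetical illustration of how far away a million sun-like sources would have to be to deliver, in total, the same intensity as our one sun does at the earth.

```python
import math

# Approximate mean distances from the sun, in astronomical units (AU)
DISTANCE_AU = {"Earth": 1.0, "Uranus": 19.2, "Neptune": 30.1}


def sunlight_relative_to_earth(distance_au: float) -> float:
    """Intensity of sunlight at distance_au, as a fraction of that at 1 AU."""
    return 1.0 / distance_au ** 2


def equal_brightness_distance_au(n_suns: float) -> float:
    """Distance (AU) at which n_suns together match one sun seen from 1 AU."""
    return math.sqrt(n_suns)


for planet, distance in DISTANCE_AU.items():
    fraction = sunlight_relative_to_earth(distance)
    print(f"{planet}: {fraction:.4%} of the sunlight intensity at the earth")

print(f"A million suns match one sun at {equal_brightness_distance_au(1e6):.0f} AU")
```

On these assumptions sunlight at Neptune is only about a thousandth as intense as at the earth, and a factor of a million in brightness corresponds to a factor of only a thousand in distance.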


How bright is the sun? That depends on viewing conditions
(Image by UteHeineSch from Pixabay)

Even if referring to our Sun as seen from the earth, the brightness varies according to its apparent altitude in the sky. So, "brighter than a million suns" needs to be specified further – as perhaps "more than a million times brighter than the sun as seen at midday from the equator on a cloudless day"? Of course, again, only the pedantic science teacher is thinking about this: everyone knows well enough what being brighter than a million suns implies. It is pretty intense radiation.

Applying the technology

Dr Finch went on to discuss a couple of applications of the synchrotron. One related to identifying pigments in art masterpieces. The other was quite timely in that it related to investigating the infectious agent in COVID.

"Now by now you have probably seen an image of the COVID virus – it looks like a ball with some spikes on it. Actually it kind of looks like those massage balls that your physio makes you buy when you turn thirty and need to ease all your physical ailments that you suddenly have."

Dr Emily Finch talking on Ockham's Razor

Coronavirus particles and massage balls…or is it…
(Images by Ulrike Leone and Daniel Roberts from Pixabay)

Again there is an attempt to make the unfamiliar familiar. These microscopic virus particles are a bit like something familiar from everyday life. Such comparisons are useful where the everyday object is already familiar.

By now I've seen plenty of images of the coronavirus responsible for COVID, although I do not have a physiotherapist (perhaps this is a cultural difference – Australians being so sporty?). So, I found myself using this comparison in reverse – imagining that the "massage balls that your physio makes you buy" must be like larger versions of coronavirus particles. Having looked up what these massage balls (a.k.a. hedgehog balls it seems) look like, I can appreciate the similarity. Whether the manufacturers of massage balls will appreciate their products being compared to enormous coronavirus particles is, perhaps, another matter.


Work cited:
  • Taber, K. S., Billingsley, B., Riga, F., & Newdick, H. (2015). English secondary students' thinking about the status of scientific theories: consistent, comprehensive, coherent and extensively evidenced explanations of aspects of the natural world – or just 'an idea someone has'. The Curriculum Journal, 1-34. doi: 10.1080/09585176.2015.1043926

Notes:

1 At least, depending on how we understand 'thinking'. Clearly there are cognitive processes at work even when we continue a conversation 'on auto pilot' (to employ a metaphor) whilst consciously focusing on something else. Only a tiny amount of our cognitive processing (thinking?) occurs within consciousness where we reflect and deliberate (i.e., explicit thinking?) We might label the rest as 'implicit thinking', but this processing varies greatly in its closeness to deliberation – and some aspects (for example, word recognition when listening to speech; identifying the face of someone we see) might seem to not deserve the label 'thinking'?


2 Of course the evidence for some ideas becomes so overwhelming that in practice we treat some theories as certain knowledge, but in principle they remain provisional knowledge. And the history of science tells us that sometimes even the most well-established ideas (e.g., Newtonian physics as an absolutely precise description of dynamics; mass and energy as distinct and discrete) may need revision in time.


3 Since I began drafting this article, the webpage for the podcast has been updated with a correction: "in this talk Dr Finch says electrons in the synchrotron are accelerated to the speed of light. They actually go just under that speed – 99.99998% of it to be exact."


4 Perversion in the sense of the distortion of an original course


5 The term nebulae is today reserved for clouds of dust and gas seen in the night sky in different parts of our galaxy. Nebulae are less distinct than stars. Many of what were originally identified as nebulae are now considered to be other galaxies immense distances away from our own.