
Reflecting the population

Sampling an "exceedingly large number of students"


Keith S. Taber


the key to sampling a population is identifying a representative sample

Obtaining a representative sample of a population can be challenging
(Image by Gerd Altmann from Pixabay)


Many studies in education are 'about' an identified population (students taking A level Physics examinations; chemistry teachers in German secondary schools; children transferring from primary to secondary school in Scotland; undergraduates majoring in STEM subjects in Australia…).

Read about populations of interest in research

But, in practice, most studies only collect data from a sample of the population of interest.

Sampling the population

One of the key challenges in social research is sampling. Obtaining a sample is usually not that difficult. However, the logic of research often runs something along these lines:

  1. Aim – to find out about a population.
  2. As it is impractical to collect data from the whole population, collect data from a sample.
  3. Analyse the data collected from the sample.
  4. Draw inferences about the population from the analysis of data collected from the sample.

For example, if one wished to do research into the views of school teachers in England and there are, say, 600 000 of them, it is unlikely anyone could undertake research that collected and analysed data from all of them and produce results in a short enough period for the findings to still be valid (unless they were prepared to employ a research team of thousands!) But perhaps one could collect data from a sample that would be informative about the population.

This can be a reasonable approach (and, indeed, is a very common approach in research in areas like education) but relies on the assumption that what is true of the sample can be generalised to the population.

That clearly depends on the sample being representative of the larger population (at least in those ways which are pertinent to the research).


When a study (as in the figure here, an experiment) collects data from a sample drawn at random from a wider population, the findings can be assumed to apply (on average) to the population. (Figure from Taber, 2019.)

In practice, unless a population of interest is quite modest in size (e.g., teachers in one school; post-graduate students in one university department; registered members of a society) it is usually simply not feasible to obtain a random sample.

For example, if we were interested in secondary school students in England, and we had a sample of secondary students from England that (a) reflected the age profile of the population; (b) reflected the gender profile of the population; but (c) were all drawn from one secondary school, this is unlikely to be a representative sample.

  • If we do have a representative sample, then the likely error in generalising from sample to population can be calculated, and can be reduced by having a larger sample (the calculation is sketched after this list);
  • If we do not have a representative sample, then there is no way of knowing how well the findings from the sample reflect the wider population and increasing sample size does not really help; and, for that matter,
  • If we do not know whether we have a representative sample, then, again, there is no way of knowing how well the findings from the sample reflect the wider population and increasing sample size does not really help.
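To put some flesh on the first bullet, here is a minimal sketch (in Python, with invented figures rather than data from any study discussed here) of the standard margin-of-error calculation for a sampled proportion, and of how increasing the sample size reduces it:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n.

    The formula assumes the sample is representative (random);
    it says nothing useful about a biased sample, however large.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Illustrative only: suppose 60% of sampled teachers agree with some statement.
for n in (100, 400, 1600):
    print(f"n = {n}: 0.60 ± {margin_of_error(0.60, n):.3f}")
# Quadrupling the sample size halves the margin of error -
# but only for a sample that was representative in the first place.
```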

So, the key to sampling a population is identifying a representative sample.

Read about sampling a population

If we know that only a small number of factors are relevant to the research then we may (if we are able to characterise members of the population on these criteria) be able to design a sample which is representative based on those features which are important.

If the relevant factors for a study were teaching subject; years of teaching experience; teacher gender, then we would want to build a sample that fitted the population profile accordingly, so, maybe, 3% female maths teachers with 10+ years of teaching experience, et cetera. We would need suitable demographic information about the population to inform the building of the sample.

We can then randomly select from those members of the population with the right characteristics within the different 'cells'.
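As a sketch of that logic (hypothetical Python, with invented demographic 'cells'; not a real dataset or anyone's actual procedure): build the sample so each cell contributes in proportion to its share of the population, then select randomly within each cell.

```python
import random
from collections import defaultdict

random.seed(42)

# An invented population of teacher records, for illustration only.
population = [
    {"id": i,
     "subject": random.choice(["maths", "science", "English"]),
     "experience": random.choice(["0-4", "5-9", "10+"]),
     "gender": random.choice(["female", "male"])}
    for i in range(60_000)
]

def stratified_sample(population, keys, sample_size):
    """Draw randomly within each demographic cell, in proportion to cell size."""
    cells = defaultdict(list)
    for person in population:
        cells[tuple(person[k] for k in keys)].append(person)
    sample = []
    for members in cells.values():
        quota = round(sample_size * len(members) / len(population))
        sample.extend(random.sample(members, quota))
    return sample  # rounding means the total may differ slightly from sample_size

sample = stratified_sample(population, ["subject", "experience", "gender"], 600)
# e.g., female maths teachers with 10+ years' experience now appear in the
# sample in (approximately) the same proportion as in the population.
```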

However, if we do not know exactly which specific features might be relevant to characterise a population in a particular research project, the best we might be able to do is to employ a randomly chosen sample, which at least allows the measurement error to be estimated.

Labs for exceedingly large numbers of students

Leopold and Smith (2020) were interested in the use of collaborative group work in a "general chemistry, problem-based lab course" at a United States university, where students worked in fixed groups of three or four throughout the course. As well as using group work for more principled reasons, "group work is also utilized as a way to manage exceedingly large numbers of students and efficiently allocate limited time, space, and equipment" (p.1). They tell readers that

"the case we examine here is a general chemistry, problem-based lab course that enrols approximately 3500 students each academic year"

Leopold & Smith, 2020, p.5

Although they recognised a wide range of potential benefits of collaborative work, these depend upon students being able to work effectively in groups, which requires skills that cannot be taken for granted. Leopold and Smith report how structured support was put in place to help students diagnose impediments to the effective working of their groups – and they investigated this in their study.

The data collected was of two types. There was a course evaluation at the end of the year taken by all the students in the cohort, "795 students enrolled [in] the general chemistry I lab course during the spring 2019 semester" (p.7). However, they also collected data from a sample of student groups during the course, in terms of responses to group tasks designed to help them think about and develop their group work.

Population and sample

As the focus of their research was a specific course, the population of interest was the cohort of undergraduates taking the course. Given the large number of students involved, they collected qualitative data from a sample of the groups.

Units of analysis

The course evaluation questions sought individual learners' views, so for that data the unit of analysis was the individual student. However, the groups were tasked with working as a group to improve their effectiveness in collaborative learning. So, in Leopold and Smith's sample of groups, the unit of analysis was the group. Some data were received from individual group members, and other data were submitted as group responses: but the analysis was on the basis of responses from within the specific groups in the sample.

A stratified sample

Leopold and Smith explained that

"We applied a stratified random sampling scheme in order to account for variations across lab sections such as implementation fidelity and instructor approach so as to gain as representative a sample as possible. We stratified by individual instructors teaching the course which included undergraduate teaching assistants (TAs), graduate TAs, and teaching specialists. One student group from each instructor's lab sections was randomly selected. During spring 2019, we had 19 unique instructors teaching the course therefore we selected 19 groups, for a total of 76 students."

Leopold & Smith, 2020, p.7

The paper does not report how the random selection was made – how it was decided which group would be selected for each instructor. As any competent scientist ought to be able to make a random selection quite easily in this situation, this is perhaps not a serious omission. I mention it because, sadly, not all authors who report having used randomisation can explain, when asked, how they actually randomised (Taber, 2013).
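For what it is worth, such a selection is indeed easy to make and to document. A sketch (with hypothetical group labels; this is not the authors' reported procedure) of picking one group at random per instructor:

```python
import random

# Hypothetical roster: each instructor's lab sections contain several
# fixed student groups, identified here by arbitrary labels.
groups_by_instructor = {
    f"instructor_{i:02d}": [f"instructor_{i:02d}_group_{g}" for g in range(1, 5)]
    for i in range(1, 20)
}

random.seed(2019)  # recording a seed makes the selection auditable
selected = {instructor: random.choice(groups)
            for instructor, groups in groups_by_instructor.items()}
# 19 instructors (strata) -> 19 randomly selected groups; 'a total of 76
# students' follows if each selected group happens to have four members.
```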

Was the sample representative?

Leopold and Smith found that, based on their sample, student groups could diagnose impediments to effective group working, and could often put in place effective strategies to increase their effectiveness.

We might wonder if the sample was representative of the wider population. If the groups were randomly selected in the way claimed then one would expect this would probably be the case – only 'probably', as that is the best randomisation and statistics can do – we can never know for certain that a random sample is representative, only that it is unlikely to be especially unrepresentative!
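That 'only probably' can be illustrated with a small simulation (an invented population, not study data): draw many random samples of 76 from a population in which exactly half hold some view, and see how far individual samples stray.

```python
import random

random.seed(1)

# A population of 10,000 in which exactly 50% hold some view.
population = [1] * 5_000 + [0] * 5_000

deviations = []
for _ in range(10_000):
    sample = random.sample(population, 76)
    deviations.append(abs(sum(sample) / 76 - 0.5))

print(f"largest deviation from 50%: {max(deviations):.2f}")
print(f"samples off by more than 10 points: "
      f"{100 * sum(d > 0.10 for d in deviations) / len(deviations):.1f}%")
# Most random samples land close to the population value; a few do not.
# Randomisation makes a seriously unrepresentative sample unlikely, not impossible.
```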

The only way to know for sure that a sample is genuinely representative of the population of interest in relation to the specific focus of a study, would be to collect data from the whole population and check the sample data matches the population data.* But, of course, if it was feasible to collect data from everyone in the population, there would be no need to sample in the first place.

However, because the end-of-course evaluation was taken by all students in the cohort (the study population), Leopold and Smith were able to see whether students in the sample responded in ways that were generally in line with the population as a whole. The two figures reproduced here seem to suggest they did!


Figure 1 from Leopold & Smith, 2020, p.10, which is published with a Creative Commons Attribution (CC BY) license allowing reproduction.

Figure 2 from Leopold & Smith, 2020, p.10, which is published with a Creative Commons Attribution (CC BY) license allowing reproduction.

There is clearly a pretty good match here. However, it is important to not over-interpret this data. The questions in the evaluation related to the overall experience of group working, whereas the qualitative data analysed from the sample related to the more specific issues of diagnosing and addressing issues in the working of groups. These are related matters but not identical, and we cannot assume that the very strong similarity between sample and population outcomes in the survey demonstrates (or proves!) that the analysis of data from the sample is also so closely representative of what would have been obtained if all the groups had been included in the data collection.
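If one wanted to go beyond eyeballing the two figures, a goodness-of-fit test could ask whether the sampled students' distribution over the evaluation's response categories is consistent with the cohort's. A sketch with invented counts (not Leopold and Smith's data) – and remembering the caveat above that this would speak only to the evaluation questions, not the group-work analysis:

```python
from scipy import stats

# Invented distribution over five response categories (not the study's data).
cohort_proportions = [0.05, 0.10, 0.15, 0.40, 0.30]  # whole cohort
sample_counts = [4, 8, 11, 30, 23]                   # the 76 sampled students

n = sum(sample_counts)
expected = [p * n for p in cohort_proportions]
chi2, p = stats.chisquare(sample_counts, f_exp=expected)
print(f"chi-squared = {chi2:.2f}, p = {p:.2f}")
# A large p is consistent with (though it cannot prove) the sample
# responding like the population - and only on these questions.
```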


|            | Experiences of learning through group-work              | Learning to work more effectively in groups |
| Sample     | patterns in data closely reflected population responses | data only collected from a sample of groups |
| Population | all invited to provide feedback                         | [it seems reasonable to assume results from sample are likely to apply to the cohort as a whole] |

The similarity of the feedback given by students in the sample of groups to the overall cohort responses suggests that the sample was broadly representative of the overall population in terms of developing group-work skills and practices.

It might well have been, but we cannot know for sure. (* The only way to know for sure that a sample is genuinely representative of the population of interest in relation to the specific focus of a study, would be …)

However, the way the sample so strongly reflected the population in relation to the evaluation data shows that, in that (related, if not identical) respect at least, the sample was strongly representative, and that is very likely to give readers confidence in the sampling procedure used. If this had been my study, I would have been pretty pleased with this (at least strongly suggestive) circumstantial evidence of the representativeness of the sampling of the student groups.


Work cited:

Didactic control conditions

Another ethically questionable science education experiment?


Keith S. Taber


This seems to be a rhetorical experiment, where an educational treatment that is already known to be effective is 'tested' to demonstrate that it is more effective than suboptimal teaching – by asking a teacher to constrain her teaching of the students assigned to an unethical comparison condition

one group of students were deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal so as to provide a low base line that would be outperformed by the intervention, simply to replicate a much demonstrated finding

In a scientific experiment, an intervention is made into the natural state of affairs to see if it produces a hypothesised change. A key idea in experimental research is control of variables: in the ideal experiment only one thing is changed. In the control condition all relevant variables are fixed so that there is a fair test between the experimental treatment and the control.

Although there are many published experimental studies in education, such research can rarely claim to have fully controlled all potentially relevant variables: there are (nearly always, always?) confounding factors that simply cannot be controlled.

Read about confounding variables

Experimental research in education, then, (nearly always, always?) requires some compromising of the pure experimental method.

Where those compromises are substantial, we might ask if experiment was the wrong choice of methodology: even if a good experiment is often the best way to test an idea, a bad experiment may be less informative than, for example, a good case study.

That is primarily a methodological matter, but testing educational innovations and using control conditions in educational studies also raises ethical issues. After all, an experiment means experimenting with real learners' educational experiences. This can certainly sometimes be justified – but there is (or should be) an ethical imperative:

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn.

If researchers want to test a genuinely innovative teaching approach or learning resource, then they have to be confident it has a reasonable chance of being effective before asking learners to participate in a study where they will be subjected to an untested teaching input.

It is equally the case that students assigned to a control condition should never be deliberately subjected to inferior teaching simply in order to help make a strong contrast with an experimental approach being tested. Yet, reading some studies leads to a strong impression that some researchers do seek to constrain teaching to a control group to help bias studies towards the innovation being tested (Taber, 2019). That is, such studies are not genuinely objective, open-minded investigations to test a hypothesis, but 'rhetorical' studies set up to confirm and demonstrate the researchers' prior assumptions. We might say these studies do not reflect true scientific values.


A general scheme for a 'rhetorical experiment'

Read about rhetorical experiments


I have raised this issue in the research literature (Taber, 2019), so when I read experimental studies in education I am minded to check whether any control condition has been set up with a concern to ensure that the interests of all study participants (in both experimental and control conditions) have been properly considered.

Jigsaw cooperative learning in elementary science: physical and chemical changes

I was reading a study called "A jigsaw cooperative learning application in elementary science and technology lessons: physical and chemical changes" (Tarhan, Ayyıldız, Ogunc & Sesen, 2013) published in a respectable research journal (Research in Science & Technological Education).

Tarhan and colleagues adopted a common type of research design, and the journal referees and editor presumably were happy with the design of their study. However, I think the science education community should collectively be more critical about the setting up of control conditions which require students to be deliberately taught in ways that are considered to be less effective (Taber, 2019).


Jigsaw learning involves students working in co-operative groups, and in undertaking peer-teaching

Jigsaw learning is a pedagogic technique which can be seen as a constructivist, student-centred, dialogic, form of 'active learning'. It is based on collaborative groupwork and includes an element of peer-tutoring. In this paper the technique is described as "jigsaw cooperative learning", and the article authors explain that "cooperative learning is an active learning approach in which students work together in small groups to complete an assigned task" (p.185).

Read about jigsaw learning

Random assignment

The study used an experimental design, to compare between learning outcomes in two classes taught the same topic in two different ways. Many studies that compare between two classes are problematic because whole extant classes are assigned to conditions, which means that the unit of analysis should be the class (experimental condition, n=1; control condition, n=1). Yet, despite this, such studies commonly analyse results as if each learner was an independent unit of analysis (e.g., experimental condition, n=c.30; control condition, n=c.30), which is necessary to obtain statistical results, but unfortunately means that inferences drawn from those statistics are invalid (Taber, 2019). Such studies offer examples of why there seems little point in doing an experiment badly, as the very design makes it intrinsically impossible to obtain a valid statistically significant outcome.


Experimental designs may be categorised as true experiments, quasi-experiments and natural experiments (Taber, 2019).

Tarhan and colleagues, however, randomly assigned the learners to the two conditions, so can genuinely claim that theirs is a true experiment: for their study, experimental condition, n=30; control condition, n=31.

Initial equivalence between groups

Assigning students in this way also helped ensure the two groups started from a similar base. Often such experimental studies use a pre-test to compare the groups before teaching. However, often the researchers simply check that the statistical difference between the groups does not reach statistical significance (Taber, 2019). That is, if a statistical test shows p≥0.05 (in effect, the initial difference between the groups is not very unlikely to occur by chance) this is taken as evidence of equivalence. That is like saying we will consider two teachers to be of 'equivalent' height as long as there is no more than 30 cm difference in their height!

In effect

'not very different'

is being seen as a synonym for

'near enough the same'


Some analogies for how equivalence is determined in some studies: read about testing for initial equivalence

However, the pretest in Tarhan and colleagues' study found that the difference between the two groups' performances was at a level likely to occur by chance (not simply something more than 5% of the time, but) 87% of the time. This is a much more convincing basis for seeing the two groups as initially similar.
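For readers who have not met this kind of check: a minimal sketch (with invented pretest scores, not Tarhan and colleagues' data) of the comparison behind such a p-value:

```python
from scipy import stats

# Invented pretest scores for two classes (not the study's data).
experimental = [42, 55, 48, 60, 51, 47, 58, 44, 53, 49]
control = [44, 52, 50, 57, 49, 46, 59, 45, 51, 50]

t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.2f}")
# A large p (such as the 0.87 reported) means the observed difference is of
# a size that chance alone produces most of the time: much better grounds
# for treating the groups as similar than merely failing to reach p < 0.05.
```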

So, there are two ways in which the Tarhan et al. study seemed better thought-through than many small scale experiments in teaching I have read.

Comparing two conditions

The research was carried out with "sixth grade students in a public elementary school in Izmir, Turkey" (p.184). The focus was learning about physical and chemical changes.

The experimental condition

At the outset of the study, the authors suggest it is already known that

  • "Jigsaw enhances cooperative learning" (p.185)"
  • "Jigsaw promotes positive attitudes and interests, develops communication skills between students, and increases learning achievement in chemistry" (p.186)
  • "the jigsaw technique has the potential to improve students' attitude towards science"
  • development of "students' understanding of chemical equilibrium in a first year general chemistry course [was more successful] in the jigsaw class…than …in the individual learning class"

It seems the approach being tested was already demonstrated to be effective in a range of contexts. Based on the existing research, then, we could already expect well-implemented jigsaw learning to be effective in facilitating student learning.

Similarly, the authors tell the readers that the broader category of cooperative learning has been well established as successful,

"The benefits of cooperative learning have been well documented as being

higher academic achievement,

higher level of reasoning and critical thinking skills,

deeper understanding of learned material,

better attention and less disruptive behavior in class,

more motivation to learn and achieve,

positive attitudes to subject matter,

higher self-esteem and

higher social skills."

Tarhan et al., 2013, p.185

What is there not to like here? So, what was this highly effective teaching approach compared with?

What is being compared?

Tarhan and colleagues tell readers that:

"The experimental group was taught via jigsaw cooperative learning activities developed by the researchers and the control group was taught using the traditional science and technology curriculum."

Tarhan et al., 2013, p.189
A different curriculum?

This seems an unhelpful statement as it does not seem to compare like with like:


| condition    | curriculum                                     | pedagogy                                                             |
| experimental | ?                                              | jigsaw cooperative learning activities developed by the researchers |
| control      | traditional science and technology curriculum  | ?                                                                    |

A genuine experiment would look to control variables, so would not simultaneously vary both curriculum and pedagogy.

The study uses a common test to compare learning in the two conditions, so the study only makes sense as an experimental test of jigsaw learning if the same curriculum is being followed in both conditions. Otherwise, there is no prima facie reason to think that the post-test is equally fair in testing what has been taught in the two conditions. 1

The control condition

The paper includes an account of the control condition which seems to make it clear that both groups were taught "the same content", which is helpful as to have done otherwise would have seriously undermined the study.

The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group.

Tarhan et al., 2013, p.194

So, it seems:


| condition    | curriculum                                                                                                                              | pedagogy                                                                                       |
| experimental | [by inference: "traditional science and technology curriculum"]                                                                         | jigsaw cooperative learning activities developed by the researchers                           |
| control      | traditional science and technology curriculum [the same content as for the experimental group to achieve the same learning objectives] | teacher-centred didactic lecture format: instructor explained the subject and asked questions |
|              | controlled variable                                                                                                                     | independent variable                                                                           |

An experiment relies on control of variables and would not simultaneously vary both curriculum and pedagogy.

The statement is helpful, but might be considered ambiguous, as "this instruction which included lectures, discussions and problem solving" seems to relate to what had been "taught via detailed instruction in the experimental group". 2

But this seems incongruent with the wider textual context. The experimental group were taught by a jigsaw learning technique – not lectures, discussions and problem solving. Yet, for that matter, the experimental group were not taught via 'detailed instruction' if this means the teacher presenting the curriculum content. So, this phrasing seems unhelpfully confusing (to me, at least – presumably, the journal referees and editor thought this was clear enough.)

So, this probably means the "lectures, discussions and problem solving" were part of the control condition where "the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes".

'Lectures' certainly fit with that description.

However, genuine 'discussion' work is a dialogic teaching method and would not seem to fit within a "teacher-centered didactic lecture format". But perhaps 'discussion' simply refers to how the "teacher used the blackboard and asked some questions" that members of the class were invited to answer?

Read about dialogic teaching

Writing-up research is a bit like teaching, in that in presenting to a particular audience one works with a mental model of what that audience already knows and understands, and how they use specific terms; and this model is never likely to be perfectly accurate:

  • when teaching, the learners tend to let you know this, whereas,
  • when writing, this kind of immediate feedback is lacking.

Similarly, problem-solving would not seem to fit within a "teacher-centered didactic lecture format". 'Problem-solving' engages high level cognitive and metacognitive skills because a 'problem' is a task that students are not able to respond to simply by recalling what they have been told and applying learnt algorithms. Problem-solving requires planning and applying strategies to test out ideas and synthesise knowledge. Yet teachers and textbooks commonly refer to questions that simply test recall and comprehension, or direct application of learnt techniques, as 'problems', when they are better understood as 'exercises' since they do not pose authentic problems.

The imprecise use of terms that may be understood differently across diverse contexts is characteristic of educational discourse, so Tarhan and colleagues may have simply used the labels that are normally applied in the context where they are working. It should also be noted that as the researchers are based in Turkey they are presumably finding the best English translations they can for the terms used locally.

Read about the challenges of translation in research writing

So, it seems we have:


| Experimental condition | in one of the conditions? | Control condition |
| Jigsaw learning (set out in some detail in the paper) – an example of cooperative learning, an active learning approach in which students work together in small groups | detailed instruction? discussions (= teacher questioning?) problem solving? (= practice exercises?) | teacher-centred didactic lecture format … the teacher used the blackboard and asked some questions … a regular textbook … the instructor explained the subject, the students listened and took notes |

The independent variable – teaching methodology

The teacher variable

One of the major problems with some educational experiments comparing different teaching approaches is the confound of the teacher. If

  • class A is taught through approach 'a' by teacher 1, and
  • class B is taught through approach 'b' by teacher 2

then, even if there is a good case that class A and class B start off as 'equivalent' in terms of readiness to learn about the focal topic, any differences in study outcomes could be as much down to different teachers (and we all know that different teachers are not equivalent!) as different teaching methodology.

At first sight this is easily solved by having the same teacher teach both classes (as in the study discussed here). That certainly seems to help. But, a little thought suggests it is not a foolproof approach (Taber, 2019).

Teachers inevitably have better rapport with some classes than others (even when those classes are shown to be technically 'equivalent') simply because that is the nature of how diverse personalities interact. 3 Even the most professional teachers find they prefer teaching some classes to others, enjoy the teaching more, and seem to get better results (even when the classes are supposed to be equivalent).

In an experiment, there is no reason why the teacher would work better with a class assigned the experimental condition; it might just as well be the control condition. However, this is still a confound, and there is no obvious solution to it, except having multiple classes and teachers in each condition, such that the statistics can offer a guide on whether outcomes are sufficiently unlikely for these types of effect to be reasonably discounted.

Different teachers also have different styles and approaches and skill sets – so the same teacher will not be equally suited to every teaching approach and pedagogy. Again, this does not necessarily advantage the experimental condition, but, again, is something that can only be addressed by having a diverse range of teachers in each condition (Taber, 2019).

So, although we might expect having the same teacher teach both classes is the preferred approach, the same teacher is not exactly the same teacher in different classes or teaching in different ways.

And what do participants expect will happen?

Moreover, expectancy effects can be very influential in education. Expecting something to work, or not work, has been shown to have real effects on outcomes. It may not be true, as some motivational gurus like to pretend, that we can all of us achieve anything if only we believe: but we are more likely to be successful when we believe we can succeed. When confident, we tend to be more motivated, less easily deterred, and (given the human capacity for perceiving with confirmation bias) more likely to judge we are making good progress. So, any research design which communicates to teachers and students (directly, or through the teacher's or researcher's enthusiasm) an expectation of success in some innovation is more likely to lead to success. This is a potential confound that is not even readily addressed by having large numbers of classes and teachers (Taber, 2019)!

Read about expectancy effects

The authors report that

Before implementation of the study, all students and their families were informed about the aims of the study and the privacy of their personal information. Permission for their children attend the study was obtained from all families.

Tarhan et al., 2013, p.194

This is as it should be. School children are not data-fodder for researchers, and they should always be asked for, and give, voluntary informed consent when recruited to join a research project. However, researchers need to be open and honest about their work, whilst also being careful about how they present their research aims. We can imagine a possible form of invitation,

We would like to invite you to be part of a study where some of you will be subject to traditional learning through a teacher-centred didactic lecture format where the teacher will give you notes and ask you questions, and some of you will learn by a different approach that has been shown to enhance learning, promote positive attitudes and interests, develop communication skills, increase achievement, support higher level of reasoning and critical thinking skills, lead to deeper understanding of learned material…

An honest, but unhelpful, briefing for students and parents

If this was how the researchers understood the background to their study, then this would be a fair and honest briefing. Yet, this would clearly set up strong expectations in the student groups!

A suitable teacher

Tarhan and colleagues report that

"A teacher experienced in active learning was trained in how to implement the instruction based on jigsaw cooperative learning. The teacher and researchers discussed the instructional plans before implementing the activities."

Tarhan et al., 2013, p.189

So, the teacher who taught both classes, using jigsaw cooperative learning in one class and a teacher-centred didactic lecture approach in the other, was "experienced in active learning". It seems, then, that

  • the researchers were already convinced that active learning approaches were far superior to teaching via a lecture approach
  • the teacher had experience in teaching though more engaging, effective student-centred active learning approaches

despite this, a control condition was set up that required the teacher, in effect, to de-skill and teach in a way the researchers were well aware research suggested was inferior, for the sake of carrying out an experiment to demonstrate in a specific context what had already been well demonstrated elsewhere.

In other words, it seems that one group of students were deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal, so as to provide a low base line that would be outperformed by the intervention, simply to replicate a much demonstrated finding. When seen in that way, this is surely unethical research.

The researchers may not have been consciously conceptualising their design in those terms, but it is hard to see this as a fair test of the jigsaw learning approach – it can show it is better than suboptimal teaching, but does not offer a comparison with an example of the kind of teaching that is recommended in the national context where the research took place.

Unethical, but not unusual

I am not seeking to pick out Tarhan and colleagues in particular for designing an unethical study, because they are not unique in adopting this approach (Taber, 2019): indeed, they are following a common formula (an experimental 'paradigm' in the sense the term is used in psychology).

Tarhan and colleagues have produced a study that is interesting and informative, which seems well planned, and strongly motivated when considered as part of a tradition of such studies. Clearly, the referees and journal editor were not minded to question the procedure. The problem is that, as a science education community, we have allowed this tradition to continue, such that a form of study that was originally genuinely open-ended (in that it examined under-researched teaching approaches of untested efficacy) has not been modified as published study after published study has slowly turned those untested teaching approaches into well-researched and repeatedly demonstrated approaches.

So much so, that such studies are now in danger of simply being rhetorical research – where (as in this case) the authors tell readers at the outset that it is already known that what they are going to test is widely shown to be effective good practice. Rhetorical research is set up to produce an expected result, and so is not authentic research. A real experiment tests a genuine hypothesis rather than demonstrating a commonplace. A question researchers might ask themselves could be

'how surprised would I be if this leads to a negative outcome'?

If the answer is

'that would be very surprising'

then they should consider modifying their research so it is likely to be more than minimally informative.

Finding out that jigsaw learning achieved learning objectives better/as well as/not so well as, say, P-O-E (predict-observe-explain) activities might be worth knowing: that it is better than deliberately constrained teaching does not tell us very much that is not obvious.

I do think this type of research design is highly questionable and takes unfair advantage of students. It fails to meet my suggested guideline that

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn

The problem of generalisation

Of course, one fair response is that, despite all the claims of the superiority of constructivist, active, cooperative (etc.) learning approaches, the diversity of educational contexts means we cannot simply generalise from an experiment in one context and assume the results apply elsewhere.

Read about generalising from research

That is, the research literature shows us that jigsaw learning is an effective teaching approach, but we cannot be certain it will be effective in the particular context of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey.

Strictly that is true! But we should ask:

do we not know this because

  1. research shows a great variation in whether jigsaw learning is effective or not, as it differs according to contexts and conditions; or
  2. although jigsaw learning has consistently been shown to be effective in many different contexts, no one has yet tested it in the specific case of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey?

It seems clear from the paper that the researchers are presenting the second case (in which case the study would actually have been of more interest and importance if it had been found that, in this context, jigsaw learning was not effective).

Given there are very good reasons to expect a positive outcome, there seems no need to 'stack the odds' by using deliberately detrimental control conditions.

Even had situation 1 applied, it seems of limited value to know that jigsaw learning is more effective (in teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey) than an approach we already recognise is suboptimal.

An ethical alternative

This does not mean that there is no value in research that explores well-established teaching approaches in new contexts. However, unless the context is very different from where the approach has already been widely demonstrated, there is little value in comparing it with approaches that are known to be sub-optimal (which in Turkey, a country where constructivist 'reform' teaching approaches are supposed to be the expected standard, seem to often be labelled as 'traditional').

Detailed case studies of the implementation of a reform pedagogy in new contexts that collect rich 'process' data to explore challenges to implementation and to identify especially effective specific practices would surely be more informative? 4

If researchers do feel the need to do experiments, then rather than comparing known-to-be-effective approaches with suboptimal approaches, hoping to demonstrate what everyone already knows, why not use comparison conditions that really test the innovation? Of course jigsaw learning outperformed lecturing in an elementary school – but how might it have compared with another constructivist approach?

I have described the constructivist science teacher as a kind of learning doctor. Like medical doctors, our first tenet should be to do no harm. So, if researchers want to set up experimental comparisons, they have a duty to try to set up two different approaches that they believe are likely to benefit the learners (whichever condition they are assigned to):

  • not one condition that advantages one group of students
  • and another which deliberately disadvantages another group of students for the benefit of a 'positive' research outcome.

If you already know the outcome then it is not genuine research – and you need a better research question.


Work cited:

Note:

1 Imagine teaching one class about acids by jigsaw learning, and teaching another about the nervous system by some other pedagogy – and then comparing the pedagogies by administering a test – about acids! The class in the jigsaw condition might well do better, without it being reasonable to assume this reflects more effective pedagogy.

So, I am tempted to read this as simply a drafting/typographical error that has been missed, and suspect the authors intended to refer to something like the traditional approach to teaching the science and technology curriculum. Otherwise the experiment is fatally flawed.

Yet, one purpose of the study was to find out

"Does jigsaw cooperative learning instruction contribute to a better conceptual understanding of 'physical and chemical changes' in sixth grade students compared to the traditional science and technology curriculum?"

Tarhan et al., 2013, p.187

This reads as if the researchers felt the curriculum was not sufficiently matched to what they felt were the most important learning objectives in the topic of physical and chemical changes, so they have undertaken some curriculum development, as well as designing a teaching unit accordingly, to be taught by jigsaw learning pedagogy. If so, the experiment is testing

traditional curriculum x traditional pedagogy

vs.

reformed curriculum x innovative pedagogy

making it impossible to disentangle the two components.

This suggests the researchers are testing the combination of curriculum and pedagogy, and doing so with a test biased towards the experimental condition. This seems illogical, but I have actually worked in a project where we faced a similar dilemma. In the epiSTEMe project we designed innovative teaching units for lower secondary science and maths. In both physics units we incorporated innovative aspects to the curriculum.

  • In the forces unit material on proportionality was introduced, with examples (car stopping distance) normally not taught at that grade level (Y7);
  • In the electricity unit the normal physics content was embedded in an approach designed to teach aspects of the nature of science.

In the forces unit, the end-of-topic test included material that was included in the project-designed units, but unlikely to be taught in the control classes. There was evidence that on average students in the project classes did better on the test.

In the electricity unit, the nature of science objectives were not tested as these would not necessarily have been included in teaching control classes. On average, there was very little difference in learning about electrical circuits in the two conditions. There was however a very wide range of class performances – oddly just as wide in the experimental condition (where all classes had a common scheme of work, common activities, and common learning materials) as in the control condition where teachers taught the topic in their customary ways.


2 It could be read either as:

Reading 1:

| Control | Experimental |
| The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. | …detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving. |
| During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group. | |

or as:

Reading 2:

| Control | Experimental |
| The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. | …detailed instruction in the experimental group. |
| This [sic] instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group. | |

Either way: what was 'this instruction' which included lectures, discussions and problem solving?

3 A class, of course, is not a person, but a collection of people, so perhaps does not have a 'personality' as such. However, for teachers, classes do take on something akin to a personality.

This is not just an impression. It was pointed out above that if a researcher wants to treat each learner as a unit of analysis (necessary to use inferential statistics when only working with a small number of classes) then learners, not intact classes, should be assigned to conditions. However, even a newly formed class will soon develop something akin to a personality. This will certainly be influenced by individual learners present but develop through the history of their evolving mutual interactions and is not just a function of the sum of their individual characteristics.

So, even when a class is formed by random assignment of learners at the start of a study, it is still strictly questionable whether these students should be seen as independent units for analysis (Taber, 2019).


4 I suspect that science educators have a justified high regard for experimental method in the natural sciences, which sometimes blinkers us to its limitations in social contexts where there are myriad interacting variables and limited controls.

Read: Why do natural scientists tend to make poor social scientists?


Delusions of educational impact

A 'peer-reviewed' study claims to improve academic performance by purifying the souls of students suffering from hallucinations


Keith S. Taber


The research design is completely inadequate…the whole paper is confused…the methodology seems incongruous…there is an inconsistency…nowhere is the population of interest actually identified…No explanation of the discrepancy is provided…results of this analysis are not reported…the 'interview' technique used in the study is highly inadequate…There is a conceptual problem here…neither the validity nor reliability can be judged…the statistic could not apply…the result is not reported…approach is completely inappropriate…these tables are not consistent…the evidence is inconclusive…no evidence to demonstrate the assumed mechanism…totally unsupported claims…confusion of recommendations with findings…unwarranted generalisation…the analysis that is provided is useless…the research design is simply inadequate…no control condition…such a conclusion is irresponsible

Some issues missed in peer review for a paper in the European Journal of Education and Pedagogy

An invitation to publish without regard to quality?

I received an email from an open-access journal called the European Journal of Education and Pedagogy, with the subject heading 'Publish Fast and Pay Less' which immediately triggered the thought "another predatory journal?" Predatory journals publish submissions for a fee, but do not offer the editorial and production standards expected of serious research journals. In particular, they publish material which clearly falls short of rigorous research despite usually claiming to engage in peer review.

A peer reviewed journal?

Checking out the website, I found the usual assurances that the journal used rigorous peer review:

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general.

We use a double-blind system for peer-reviewing; both reviewers and authors' identities remain anonymous to each other. The paper will be peer-reviewed by two or three experts; one is an editorial staff and the other two are external reviewers."

https://www.ej-edu.org/index.php/ejedu/about

Peer review is critical to the scientific process. Work is only published in (serious) research journals when it has been scrutinised by experts in the relevant field, and any issues raised responded to in terms of revisions sufficient to satisfy the editor.

I could not find who the editor(-in-chief) was, but the 'editorial team' of European Journal of Education and Pedagogy were listed as

  • Bea Tomsic Amon, University of Ljubljana, Slovenia
  • Chunfang Zhou, University of Southern Denmark, Denmark
  • Gabriel Julien, University of Sheffield, UK
  • Intakhab Khan, King Abdulaziz University, Saudi Arabia
  • Mustafa Kayıhan Erbaş, Aksaray University, Turkey
  • Panagiotis J. Stamatis, University of the Aegean, Greece

I decided to look up the editor based in England where I am also based but could not find a web presence for him at the University of Sheffield. Using the ORCID (Open Researcher and Contributor ID) provided on the journal website I found his ORCID biography places him at the University of the West Indies and makes no mention of Sheffield.

If the European Journal of Education and Pedagogy is organised like a serious research journal, then each submission is handled by one of this editorial team. However, the reference to "editorial staff" might well imply that, like some other predatory journals I have been approached by (e.g., Are you still with us, Doctor Wu?), the editorial work is actually carried out by office staff, not qualified experts in the field.

That would certainly help explain the publication, in this 'peer-reviewed research journal', of the first paper that piqued my interest enough to motivate me to access and read the text.


The Effects of Using the Tazkiyatun Nafs Module on the Academic Achievement of Students with Hallucinations

The abstract of the paper published in what claims to be a peer-reviewed research journal

The paper initially attracted my attention because it seemed to be about the treatment of a medical condition, so I wondered what it was doing in an education journal. Yet the paper seemed also to be about an intervention to improve academic performance. As I read the paper, I found a number of flaws and issues (some very obvious, some quite serious) that should have been spotted by any qualified reviewer or editor, and which should have indicated that possible publication be deferred until these matters were satisfactorily addressed.

This is especially worrying as this paper makes claims relating to the effective treatment of a symptom of potentially serious, even critical, medical conditions through religious education ("a spiritual approach", p.50): claims that might encourage sufferers to defer seeking medical diagnosis and treatment. Moreover, these claims are not supported by any evidence presented in the paper that the editor of the European Journal of Education and Pedagogy decided was suitable for publication.


An overview of what is demonstrated, and what is claimed, in the study.

Limitations of peer review

Peer review is not a perfect process: it relies on busy human beings spending time on additional (unpaid) work, and it is only effective if suitable experts can be found that fit with, and are prepared to review, a submission. It is also generally more challenging in the social sciences than in the natural sciences. 1

That said, one sometimes finds papers published in predatory journals where one would expect any intelligent person with a basic education to notice problems without needing any specialist knowledge at all. The study I discuss here is a case in point.

Purpose of the study

Under the heading 'research objectives', the reader is told,

"In general, this journal [article?] attempts to review the construction and testing of Tazkiyatun Nafs [a Soul Purification intervention] to overcome the problem of hallucinatory disorders in student learning in secondary schools. The general objective of this study is to identify the symptoms of hallucinations caused by subtle beings such as jinn and devils among students who are the cause of disruption in learning as well as find solutions to these problems.

Meanwhile, the specific objective of this study is to determine the effect of the use of Tazkiyatun Nafs module on the academic achievement of students with hallucinations.

To achieve the aims and objectives of the study, the researcher will get answers to the following research questions [sic]:

Is it possible to determine the effect of the use of the Tazkiyatun Nafs module on the academic achievement of students with hallucinations?"

Awang, 2022, p.42

I think I can save readers a lot of time regarding the research question by suggesting that, in this study at least, the answer is no – if only because the research design is completely inadequate to answer the research question. (I should point out that the author comes to the opposite conclusion: e.g., "the approach taken in this study using the Tazkiyatun Nafs module is very suitable for overcoming the problem of this hallucinatory disorder", p.49.)

Indeed, the whole paper is confused in terms of what it is setting out to do, what it actually reports, and what might be concluded. As one example, the general objective of identifying "the symptoms of hallucinations caused by subtle beings such as jinn and devils" (but surely, the hallucinations are the symptoms here?) seems to have been forgotten, or, at least, does not seem to be addressed in the paper. 2


The study assumes that hallucinations are caused by subtle beings such as jinn and devils possessing the students.
(Image by Tünde from Pixabay)

Methodology

So, this seems to be an intervention study.

  • Some students suffer from hallucinations.
  • This is detrimental to their education.
  • It is hypothesised that the hallucinations are caused by supernatural spirits ("subtle beings that lead to hallucinations"), so, a soul purification module might counter this detriment;
  • if so, sufferers engaging with the soul purification module should improve their academic performance;
  • and so the effect of the module is being tested in the study.

Thus we have a kind of experimental study?

No, not according to the author. Indeed, the study only reports data from a small number of unrepresentative individuals with no controls,

"The study design is a case study design that is a qualitative study in nature. This study uses a case study design that is a study that will apply treatment to the study subject to determine the effectiveness of the use of the planned modules and study variables measured many times to obtain accurate and original study results. This study was conducted on hallucination disorders [students suffering from hallucination disorders?] to determine the effectiveness of the Tazkiyatun Nafs module in terms of aspects of student academic achievement."

Awang, 2022, p.42

Case study?

So, the author sees this as a case study. Research methodologies are better understood as clusters of similar approaches rather than unitary categories – but case study is generally seen as naturalistic, rather than involving an intervention by an external researcher. So, case study seems incongruous here. Case study involves the detailed exploration of an instance (of something of interest – a lesson, a school, a course of study, a textbook, …) reported with 'thick description'.

Read about the characteristics of case study research

The case is usually a complex phenomenon which is embedded within a context from which it cannot readily be untangled (for example, a lesson always takes place within a wider context of a teacher working over time with a class on a course of study, within a curricular, and institutional, and wider cultural, context, all of which influence the nature of the specific lesson). So, due to the complex and embedded nature of cases, they are all unique.

"a case study is a study that is full of thoroughness and complex to know and understand an issue or case studied…this case study is used to gain a deep understanding of an issue or situation in depth and to understand the situation of the people who experience it"

Awang, 2022, p.42

A case is usually selected either because that case is of special importance to the researcher (an intrinsic case study – e.g., I studied this school because it is the one I was working in) or because we hope this (unique) case can tell us something about similar (but certainly not identical) other (also unique) cases. In the latter case [sic], an instrumental case study, we are always limited by the extent we might expect to be able to generalise beyond the case.

This limited generalisation might suggest we should not work with a single case, but rather look for a suitably representative sample of all cases: but we sometimes choose case study because the complexity of the phenomena suggests we need to use extensive, detailed data collection and analyses to understand the complexity and subtlety of any case. That is (i.e., the compromise we choose is), we decide we will look at one case in depth because that will at least give us insight into the case, whereas a survey of many cases will inevitably be too superficial to offer any useful insights.

So how does Awang select the case for this case study?

"This study is a case study of hallucinatory disorders. Therefore, the technique of purposive sampling (purposive sampling [sic]) is chosen so that the selection of the sample can really give a true picture of the information to be explored ….

Among the important steps in a research study is the identification of populations and samples. The large group in which the sample is selected is termed the population. A sample is a small number of the population identified and made the respondents of the study. A case or sample of n = 1 was once used to define a patient with a disease, an object or concept, a jury decision, a community, or a country, a case study involves the collection of data from only one research participant…"

Awang, 2022, p.42

Of course, a case study of "a community, or a country" – or of a school, or a lesson, or a professional development programme, or a school leadership team, or a homework policy, or an enrichment activity, or … – would almost certainly be inadequate if it was limited to "the collection of data from only one research participant"!

I do not think this study actually is "a case study of hallucinatory disorders [sic]". Leaving aside the shift from singular ("a case study") to plural ("disorders"), the research does not investigate a/some hallucinatory disorders, but the effect of a soul purification module on academic performance. (Actually, spoiler alert  😉, it does not actually investigate the effect of a soul purification module on academic performance either, but the author seems to think it does.)

If this is a case study, there should be the selection of a case, not a sample. Sometimes we do sample within a case in case study research, but only from those identified as part of the case. (For example, if the case was a year group in a school, we might not have the resources to interact in depth with several hundred different students.) Perhaps this is pedantry, as the reader likely knows what Awang meant by 'sample' in the paper – but semantics is important in research writing: a sample is chosen to represent a population, whereas the choice of case study is an acknowledgement that generalisation back to a population is not being claimed.

However, if "among the important steps in a research study is the identification of populations" then it is odd that nowhere in the paper is the population of interest actually specified!

Things slip our minds. Perhaps Awang intended to define the population, forgot, and then missed this when checking the text – but, hey, that is just the kind of thing the reviewers and editor are meant to notice! Otherwise this looks very like including material from standard research texts to pay lip-service to the idea that research design needs to be principled, but without really appreciating what the phrases used actually mean. This impression is also given by the descriptions of how data (for example, from interviews) were analysed – but which are not reflected at all in the results section of the paper. (I am not accusing Awang of this, but because of the poor standard of peer review not raising the question, the author is left vulnerable to such an evaluation.)

The only one research participant?

So, what do we know about the "case or sample of n = 1", the "only one research participant" in this study?

"The actual respondents in this case study related to hallucinatory disorders were five high school students. The supportive respondents in the case study related to hallucination disorders were five counseling teachers and five parents or guardians of students who were the actual respondents."

Awang, 2022, p.42

It is certainly not impossible that a case could comprise a group of five people – as long as those five make up a naturally bounded group – that is, a group that a reasonable person would recognise as a coherent entity because its members clearly had something in common (they were in the same school class, for example; they were attending the same group therapy session, perhaps; they were a friendship group; they were members of the same extended family diagnosed with hallucinatory disorders…something!) There is no indication here of how these five make up a case.

The identification of the participants as a case might have made sense had the participants collectively undertaken the module as a group, but the reader is told: "This study is in the form of a case study. Each practice and activity in the module are done individually" (p.50). Another justification could have been if the module had been offered in one school, and these five participants were the students enrolled in the programme at that time; but as "analysis of the respondents' academic performance was conducted after the academic data of all respondents were obtained from the respective respondent's school" (p.45), it seems they did not attend a single school.

The results tables and reports in the text refer to "respondent 1" to "respondent 4". In case study, an approach which recognises the individuality and inherent value of the particular case, we would usually assign assumed names to research participants, not numbers. But if we are going to use numbers, should there not be a respondent 5?

The other one research participant?

It seems that there is something odd here.

Both the passage above and the abstract refer to five respondents. The results report on four. So what is going on? No explanation of the discrepancy is provided. Perhaps:

  • There only ever were four participants, and the author made a mistake in counting.
  • There only ever were four participants, and the author made a typographical mistake (well, strictly, six typographical mistakes) in drafting the paper, and then missed this in checking the manuscript.
  • There were five respondents and the author forgot to include data on respondent 5 purely by accident.
  • There were five respondents, but the author decided not to report on the fifth deliberately for a reason that is not revealed (perhaps the results did not fit with the desired outcome?)

The significant point is not that there is an inconsistency but that this error was missed by peer reviewers and the editor – if there ever was any genuine peer review. This is the kind of mistake that a school child could spot – so, how is it possible that 'expert reviewers' and 'editorial staff' either did not notice it, or did not think it important enough to query?

Research instruments

Another section of the paper reports the instrumentation used in the study.

"The research instruments for this study were Takziyatun Nafs modules, interview questions, and academic document analysis. All these instruments were prepared by the researcher and tested for validity and reliability before being administered to the selected study sample [sic, case?]."

Awang, 2022, p.42

Of course, it is important to test instruments for validity and reliability (or perhaps authenticity and trustworthiness when collecting qualitative data). But it is also important

  • to tell the reader how you did this
  • to report the outcomes

which seems to be missing (apart from in regard to part of the implemented module – see below). That is, the reader of a research study wants evidence, not simply promises. Simply telling readers you did this is a bit like meeting a stranger who tells you that you can trust them because they are (that is, they say that they are) honest.

Later the reader is told that

"Semi- structured interview questions will be [sic, not 'were'?] developed and validated for the purpose of identifying the causes and effects of hallucinations among these secondary school students…

…this interview process will be [sic, not 'was'] conducted continuously [sic!] with respondents to get a clear and specific picture of the problem of hallucinations and to find the best solution to overcome this disorder using Islamic medical approaches that have been planned in this study

Awang, 2022, pp.43-44

At the very least, this seems to confuse the plan for the research with a report of what was done. (But again, apparently, the reviewers and editorial staff did not think this needed addressing.) This is also confusing as it is not clear how this aspect of the study relates to the intervention. Were the interviews carried out before the intervention, to help inform the design of the modules? (Presumably not, as the modules had already been "tested for validity and reliability before being administered to the selected study sample".) Perhaps there are clear and simple answers to such questions – but the reader will not know, because the reviewers and editor did not seem to feel they needed to be posed.

If "Interviews are the main research instrument in this study" (p.43), then one would expect to see examples of the interview schedules – but these are not presented. The paper reports a complex process for analysing interview data, but this is not reflected in the findings reported. The readers is told that the six stage process leads to the identifications and refinement of main and sub-categories. Yet, these categories are not reported in the paper. (But, again, peer reviewers and the editor did not apparently raise this as something to be corrected.) More generally "data  analysis  used  thematic  analysis  methods" (p.44), so why is there no analysis presented in terms of themes? The results of this analysis are simply not reported.

The reader is told that

"This  interview  method…aims to determine the respondents' perspectives, as well as look  at  the  respondents'  thoughts  on  their  views  on  the issues studied in this study."

Awang, 2022, p.44

But there is no discussion of participants' perspectives and views in the findings of the study. 2 Did the peer reviewers and editor not think this needed addressing before publication?

Even more significantly, in a qualitative study where interviews are supposedly the main research instrument, one would expect to see extracts from the interviews presented as part of the findings to support and exemplify claims being made: yet, there are none. (Did this not strike the peer reviewers and editor as odd: presumably they are familiar with the norms of qualitative research?)

The only quotation from the qualitative data (in this 'qualitative' study) I can find appears in the implications section of the paper:

"Are you aware of the importance of education to you? Realize. Is that lesson really important? Important. The success of the student depends on the lessons in school right or not? That's right"

Respondent 3: Awang, 2022, p.49

This seems a little bizarre, if we accept this is, as reported, an utterance from one of the students, Respondent 3. It becomes more sensible if this is actually condensed dialogue:

"Are you aware of the importance of education to you?"

"Realize."

"Is that lesson really important?"

"Important."

"The success of the student depends on the lessons in school right or not?"

"That's right"

It seems the peer review process did not lead to suggesting that the material should be formatted according to the norms for presenting dialogue in scholarly texts by indicating turns. In any case, if that is typical of the 'interview' technique used in the study then it is highly inadequate, as clearly the interviewer is leading the respondent, and this is more an example of indoctrination than open-ended enquiry.

Random sampling of data

Completely incongruous with the description of the purposeful selection of the participants for a case study is the account of how the assessment data was selected for analysis:

"The  process  of  analysis  of  student  achievement documents is carried out randomly by taking the results of current  examinations  that  have  passed  such  as the  initial examination of the current year or the year before which is closest  to  the  time  of  the  study."

Awang, 2022, p.44

Did the peer reviewers or editor not question the use of the term random here? It is unclear what is meant by 'random' in this context, but clearly if the analysis really was based on randomly selected data that would undermine the results.

Validating the soul purification module

There is also a conceptual problem here. The Takziyatun Nafs modules are the intervention materials (part of what is being studied) – so they cannot also be research instruments (used to study them). Surely, if the Takziyatun Nafs modules had been shown to be valid and reliable before carrying out the reported study, as suggested here, then the study would not be needed to evaluate their effectiveness. But, presumably, expert peer reviewers (if there really were any) did not see an issue here.

The reliability of the intervention module

The Takziyatun Nafs modules had three components, and the author reports the second of the three was subjected to tests of validity and reliability. It seems that Awang thinks that this demonstrates the validity and reliability of the complete intervention,

"The second part of this module will go through [sic] the process of obtaining the validity and reliability of the module. Proses [sic] to obtain this validity, a questionnaire was constructed to test the validity of this module. The appointed specialists are psychologists, modern physicians (psychiatrists), religious specialists, and alternative medicine specialists. The validity of the module is identified from the aspects of content, sessions, and activities of the Tazkiyatun Nafs module. While to obtain the value of the reliability coefficient, Cronbach's alpha coefficient method was used. To obtain this Cronbach's alpha coefficient, a pilot test was conducted on 50 students who were randomly selected to test the reliability of this module to be conducted."

Awang, 2022, pp.43-44

Now to unpack this, it may be helpful to briefly outline what the intervention involved (as the paper is open access, anyone can read the full details in the report).


From the MGM film 'A Night at the Opera' (1935): "The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced"

The description does not start off very helpfully ("The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced" (p.43) put me in mind of the Marx brothers: "The party of the first part shall be known in this contract as the party of the first part"), but some key points are,

"the Tazkiyatun Nafs module was constructed to purify the heart of each respondent leading to the healing of hallucinatory disorders. This liver purification process is done in stages…

"the process of cleansing the patient's soul will be done …all the subtle beings in the patient will be expelled and cleaned and the remnants of the subtle beings in the patient will be removed and washed…

The second process is the process of strengthening and the process of purification of the soul or heart of the patient …All the mazmumah (evil qualities) that are in the heart must be discarded…

The third process is the process of enrichment and the process of distillation of the heart and the practices performed. In this process, there will be an evaluation of the practices performed by the patient as well as the process to ensure that the patient is always clean from all the disturbances and disturbances [sic] of subtle beings to ensure that students will always be healthy and clean from such disturbances…

Awang, 2022, p.45, p.43

Quite how this process of exorcising and distilling and cleansing will occur is not entirely clear (and if the soul is equated with the heart, how is the liver involved?), but it seems to involve reflection and prayer and contemplation of scripture – certainly a very personal and therapeutic process.

And yet its validity and reliability was tested by giving a questionnaire to 50 students randomly selected (from the unspecified population, presumably)? No information is given on how a random selection was made (Taber, 2013) – which allows a reader to be very sceptical that this actually was a random sample from the (un?)identified population, and not just an arbitrary sample of 50 students. (So, that is twice the word 'random' is used in the paper when it seems inappropriate.)
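For contrast, it is worth noting how little would be needed to carry out – and transparently report – a genuine simple random sample. Here is a minimal sketch (purely illustrative; nothing like this is described in the paper, and the sampling frame here is hypothetical):

```python
import random

# Hypothetical sampling frame: an enumerated list of every member of
# the (here, unspecified) population of interest.
population = [f"student_{i:04d}" for i in range(1, 2001)]

random.seed(2022)  # seed recorded so that the draw could be audited
pilot_sample = random.sample(population, k=50)  # simple random sample, n = 50

print(pilot_sample[:5])
```

The point is that a random sample presupposes a defined population (the sampling frame) and a specified selection procedure – precisely the details the paper does not provide.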

It hardly matters here, as clearly neither the validity nor the reliability of a spiritual therapy can be judged from a questionnaire (especially when administered to people who have never undertaken the therapy). In any case, the "reliability coefficient" obtained from an administration of a questionnaire ONLY applies to that sample on that occasion. So, the statistic could not apply to the four participants in the study. And, in any event, the result is not reported, so the reader has no idea what the value of Cronbach's alpha was (but then, this was described as a qualitative study!)

Moreover, Cronbach's alpha only indicates the internal coherence of the items on a scale (Taber, 2019): that is, it only indicates whether the set of questions included in the questionnaire seems to be accessing the same underlying construct across the set of items. It gives no information about the reliability of the instrument (i.e., whether it would give the same results on another occasion).
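For readers unfamiliar with the statistic, it may help to see just how little is involved in computing it. A minimal sketch (the data here are simulated purely for illustration; this is not Awang's questionnaire):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items on the scale
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' totals
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# e.g., 50 simulated respondents answering 10 five-point items
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(50, 10))
print(round(cronbach_alpha(responses), 2))
```

As the calculation uses nothing but the item scores from one administration to one sample, it can say nothing about whether a therapeutic module 'works', and nothing about any other group of respondents – such as the four participants in the study.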

This approach to testing validity and reliability is then completely inappropriate and unhelpful. So, even if the outcomes of the testing had been reported (and they are not) they would not offer any relevant evidence. Yet it seems that peer reviewers and editor did not think to question why this section was included in the paper.

Ethical issues

A study of this kind raises ethical issues. It may well be that the research was carried out in an entirely proper and ethical manner, but it is usual in studies with human participants ('human subjects') to make this clear in the published report (Taber, 2014b). A standard issue is whether the participants gave voluntary, informed, consent. This would mean that they were given sufficient information about the study at the outset to be able to decide if they wished to participate, and were under no undue pressure to do so. The 'respondents' were school students: if they were considered minors in the research context (and oddly for a 'case study' such basic details as age and gender are not reported) then parental permission would also be needed, again subject to sufficient briefing and no duress.

However, in this specific research there are also further issues due to the nature of the study. The participants were suffering from medical disorders, so how did the researcher obtain information about, and access to, the students without medical confidentiality being broken? Who were the 'gatekeepers' who provided access to the children and their personal data? The researcher also obtained assessment data "from the class teacher or from the Student Affairs section of the student's school" (p.44), so it is important to know that students (and parents/guardians) consented to this. Again, peer review does not seem to have identified this as an issue to address before publication.

There is also the major underlying question about the ethics of a study which recognises that these students were (or could be, as details are not provided) suffering from serious medical conditions, but employs religious education as a treatment ("This method of treatment is to help respondents who suffer from hallucinations caused by demons or subtle beings", p.44). Part of the theoretical framework underpinning the study is the assumption that what is being addressed is "the problem of hallucinations caused by the presence of ethereal beings…" (p.43), yet it is also acknowledged that,

"Hallucinatory disorders in learning that will be emphasized in this study are due to several problems that have been identified in several schools in Malaysia. Such disorders are psychological, environmental, cultural, and sociological disorders. Psychological disorders such as hallucinatory disorders can lead to a more critical effect of bringing a person prone to Schizophrenia. Psychological disorders such as emotional disorders and psychiatric disorders. …Among the causes of emotional disorders among students are the school environment, events in the family, family influence, peer influence, teacher actions, and others."

Awang, 2022, p.41

There seem to be three ways of understanding this apparent discrepancy, which I might gloss:

  1. there are many causes of conditions that involve hallucinations, including, but not only, possession by evil or mischievous spirits;
  2. the conditions that lead to young people having hallucinations may be understood at two complementary levels, at a spiritual level in terms of a need for inner cleansing and exorcising of subtle beings, and in terms of organic disease or conditions triggered by, for example, social and psychological factors;
  3. in the introduction the author has relied on various academic sources to discuss the nature of the phenomenon of students having hallucinations, but he actually has a working assumption that is completely different: hallucinations are due to the presence of jinn or other spirits.

I do not think it is clear which of these positions is being taken by the study's author.

  1. In the first case it would be necessary to identify which causes are present in potential respondents and only recruit those suffering possession for this study (which does not seem to have been done);
  2. In the second case, spiritual treatment would need to complement medical intervention (which would completely undermine the validity of the study as medical treatments for the underlying causes of hallucinations are likely to be the cause of hallucinations ceasing, not the tested intervention);
  3. The third position is clearly problematic in terms of academic scholarship as it is either completely incompetent or deliberately disregards academic norms that require the design of a study to reflect the conceptual framework set out to motivate it.

So, was this tested intervention implemented instead of or alongside formal medical intervention?

  • If it was alongside medical treatment, then that raises a major confound for the study.
  • Yet it would clearly be unacceptable to deny sufferers indicated medical treatment in order to test an educational intervention that is in effect a form of exorcism.

Again, it may be there are simple and adequate responses to these questions (although here I really cannot see what they might be), but unfortunately it seems the journal referees and editor did not think to ask for them.  

Findings


Results tables presented in Awang, 2022 (p.45) [Published with a creative commons licence allowing reproduction]: "Based on the findings stated in Table I show that serial respondents experienced a decline in academic achievement while they face the problem of hallucinations. In contrast to Table II which shows an improvement in students' academic achievement  after  hallucinatory  disorders  can  be  resolved." If we assume that columns in the second table have been mislabelled, then it seems the school performance of these four students suffered while they were suffering hallucinations, but improved once they recovered. From this, we can infer…?

The key findings presented concern academic performance at school. Core results are presented in tables I and II. Unfortunately these tables are not consistent as they report contradictory results for the academic performance of students before and during periods when they had hallucinations.

They can be made consistent if the reader assumes that two of the columns in table II are mislabelled. If the reader assumes that the column labelled 'before disruption' actually reports the performance 'during disruption' and that the column actually labelled 'during disruption' is something else, then they become consistent. For the results to tell a coherent story and agree with the author's interpretation this 'something else' presumably should be 'after disruption'.

This is a very unfortunate error – and moreover one that is obvious to any careful reader. (So, why was it not obvious to the referees and editor?)

As well as looking at these overall scores, other assessment data is presented separately for each of respondent 1 – respondent 4. These sections comprise presentations of information about grades and class positions, mixed with claims about the effects of the intervention. These claims are not based on any evidence, and in many cases are conclusions about 'respondents' in general, although they are placed in sections considering the academic assessment data of individual respondents. So, there are a number of problems with these claims:

  • they are of the nature of conclusions, but appear in the section presenting the findings;
  • they are about the specific effects of the intervention that the author assumes has influenced academic performance, not the data analysed in these sections;
  • they are completely unsubstantiated as no data or analysis is offered to support them;
  • often they make claims about 'respondents' in general, although as part of the consideration of data from individual learners.

Despite this, the paper passed peer-review and editorial scrutiny.

Rhetorical research?

This paper seems to be an example of a kind of 'rhetorical research' where a researcher is so convinced about their pre-existing theoretical commitments that they simply assume they have demonstrated them. Here the assumptions seem to be:

  1. Recovering from suffering hallucinations will increase student performance
  2. Hallucinations are caused by jinn and devils
  3. A spiritual intervention will expel jinn and devils
  4. So, a spiritual intervention will cure hallucinations
  5. So, a spiritual intervention will increase student performance

The researcher provided a spiritual intervention, and the student performance increased, so it is assumed that the scheme is demonstrated. The data presented is certainly consistent with this scheme, but does not in itself demonstrate it. Awang provides evidence that student performance improved in four individuals after they had received the intervention – but there is no evidence offered to demonstrate the assumed mechanism.

A gardener might think that complimenting seedlings will cause them to grow. Perhaps she praises her seedlings every day, and they do indeed grow. Are we persuaded about the efficacy of her method, or might we suspect another cause at work? Would the peer-reviewers and editor of the European Journal of Education and Pedagogy be persuaded this demonstrated that compliments cause plant growth? On the evidence of this paper, perhaps they would.

This is what Awang tells readers about the analysis undertaken:

"Each student respondent involved in this study [sic, presumably not, rather the researcher] will use the analysis of the respondent's performance to determine the effect of hallucination disorders on student achievement in secondary school is accurate.

The elements compared in this analysis are as follows: a) difference in mean percentage of achievement by subject, b) difference in grade achievement by subject and c) difference in the grade of overall student achievement. All academic results of the respondents will be analyzed as well as get the mean of the difference between the performance before, during, and after the respondents experience hallucinations.

These results will be used as research material to determine the accuracy of the use of the Tazkiyatun Nafs Module in solving the problem of hallucinations in school and can improve student achievement in academic school."

Awang, 2022, p.45

There is clearly a large jump between the analysis outlined in the second paragraph here, and testing the study hypotheses as set out in the final paragraph. But the author does not seem to notice this (and more worryingly, nor do the journal's reviewers and editor).

So interleaved into the account of findings discussing "mean percentage of achievement by subject…difference in grade achievement by subject…difference in the grade of overall student achievement" are totally unsupported claims. Here is an example for Respondent 1:

"Based on the findings of the respondent's achievement in the  grade  for  Respondent  1  while  facing  the  problem  of hallucinations  shows  that  there  is  not  much  decrease  or deterioration  of  the  respondent's  grade.  There  were  only  4 subjects who experienced a decline in grade between before and  during  hallucination  disorder.  The  subjects  that experienced  decline  were  English,  Geography,  CBC, and Civics.  Yet  there  is  one  subject  that  shows  a  very  critical grade change the Civics subject. The decline occurred from grade A to grade E. This shows that Civics education needs to be given serious attention in overcoming this problem of decline. Subjects experiencing this grade drop were subjects involving  emotion,  language,  as  well  as  psychomotor fitness.  In  the  context  of  psychology,  unstable  emotional development  leads  to  a  decline  in the psychomotor  and emotional development of respondents.

After  the  use  of  the  Tazkiyatun  Nafs  module  in overcoming  this  problem,  hallucinatory  disorders  can  be overcome.  This  situation  indicates  the  development  of  the respondents  during  and  after  experiencing  hallucinations after  practicing  the  Tazkiyatun  Nafs  module.  The  process that takes place in the Tzkiyatun Nafs module can help the respondent  to  stabilize  his  emotions  and  psyche  for  the better. From the above findings there were 5 subjects who experienced excellent improvement in grades. The increase occurred in English, Malay, Geography, and Civics subjects. The best improvement is in the subject of Civic education from grade E to grade B. The improvement in this language subject  shows  that  the  respondents'  emotions  have stabilized.  This  situation  is  very  positive  and  needs  to  be continued for other subjects so that respondents continue to excel in academic achievement in school.""

Awang, 2022, p.45 (emphasis added)

The material which I show here as underlined is interjected completely gratuitously. It does not logically fit in the sequence. It is not part of the analysis of school performance. It is not based on any evidence presented in this section. Indeed, nor is it based on any evidence presented anywhere else in the paper!

This pattern is repeated in discussing other aspects of respondents' school performance. Although there is mention of other factors which seem especially pertinent to the dip in school grades ("this was due to the absence of the  respondents  to  school  during  the  day  the  test  was conducted", p.46; "it was an increase from before with no marks due to non-attendance at school", p.46) the discussion of grades is interspersed with (repetitive) claims about the effects of the intervention for which no evidence is offered.


§: Differences in Respondents' Grade Achievement by Subject

  • Respondent 1: "After the use of the Tazkiyatun Nafs module in overcoming this problem, hallucinatory disorders can be overcome. This situation indicates the development of the respondents during and after experiencing hallucinations after practicing the Tazkiyatun Nafs module. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.45)
  • Respondent 2: "After the use of the Tazkiyatun Nafs module as a soul purification module, showing the development of the respondents during and after experiencing hallucination disorders is very good. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)
  • Respondent 3: "The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better" (p.46)
  • Respondent 4: "The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)

§: Differences in Respondent Grades according to Overall Academic Achievement

  • Respondent 1: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (pp.46-7)
  • Respondent 2: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module. … This excellence also shows that the respondents have recovered from hallucinations after practicing the methods found in the Tazkiayatun Nafs module that has been introduced. In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
  • Respondent 3: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
  • Respondent 4: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module has successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
Unsupported claims made within findings sections reporting analyses of individual student academic grades: note (a) that these statements, included in the analyses of individual school performance data from four separate participants (in a case study – a methodology that recognises and values diversity and individuality), are very similar across the participants; and (b) that claims about 'respondents' (plural) are included in the reports of findings from individual students.

Awang summarises what he claims the analysis of 'differences in respondents' grade achievement by subject' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective achievement grades. Therefore, this soul purification module should be practiced by every student to help them in stabilizing their soul and emotions and stay away from all the disturbances of the subtle beings that lead to hallucinations"

Awang, 2022, p.46

And, on the next page, Awang summarises what he claims the analysis of 'differences in respondent grades according to overall academic achievement' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective overall academic achievement. Therefore, this soul purification module should be practiced by every student to help them in stabilizing the soul and emotions as well as to stay away from all the disturbances of the subtle beings that lead to hallucination disorder."

Awang, 2022, p.47

So, the analysis of grades is said to demonstrate the value of the intervention, and indeed Awang considers this is reason to extend the intervention beyond the four participants, not just to others suffering hallucinations, but to "every student". The peer review process seems not to have raised queries about

  • the unsupported claims,
  • the confusion of recommendations with findings (it is normal to keep to results in a findings section), nor
  • the unwarranted generalisation from four hallucination sufferers to all students whether healthy or not.

Interpreting the results

There seem to be two stories that can be told about the results:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved.  

Narrative 1

Now narrative 1 relies on a very substantial implied assumption – which is that the numbers presented as school performance are comparable over time. So, a control would be useful: such as what happened to the performance scores of other students in the same classes over the same time period. It seems likely they would not have shown the same dip – unless the dip was related to something other than hallucinations – such as the well-recognised dip after long school holidays, or some cultural distraction (a major sports tournament; fasting during Ramadan; political unrest; a pandemic…). Without such a control the evidence is suggestive (after all, being ill, and missing school as a result, is likely to lead to a dip in school performance, so the findings are not surprising), but inconclusive.

Intriguingly, the author tells readers that "student  achievement  statistics  from  the  beginning  of  the year to the middle of the current [sic, published in 2022] year in secondary schools in Northern Peninsular Malaysia that have been surveyed by researchers show a decline (Sabri, 2015 [sic])" (p.42), but this is not considered in relation to the findings of the study.

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, as a result of undergoing the soul purification module, their school performance improved.  

Narrative 2

Clearly narrative 2 suffers from the same limitation as narrative 1. However, it also demands an extra step in making an inference. I could re-write this narrative:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved. 
AND
the recovery was due to engagement with the soul purification module.

Narrative 2'.

That is, even if we accept narrative 1 as likely, to accept narrative 2 we would also need to be convinced that:

  • a) sufferers from medical conditions leading to hallucinations do not suffer periodic attacks with periods of remission in between; or
  • b) episodes of hallucinations cannot be due to one-off events (emotional trauma, T.I.A. {transient ischaemic attack or mini-strokes},…) that resolve naturally in time; or
  • c) sufferers from medical conditions leading to hallucinations do not find they resolve due to maturation; or
  • d) the four participants in this study did not undertake any change in lifestyle (getting more sleep, ceasing eating strange fungi found in the woods) unrelated to the intervention that might have influenced the onset of hallucinations; or
  • e) the four participants in this study did not receive any medical treatment independent of the intervention (e.g., prescribed medication to treat migraine episodes) that might have influenced the onset of hallucinations.

Despite this study being supposedly a case study (where the expectation is there should be 'thick description' of the case and its context), there is no information to help us exclude such options. We do not know the medical diagnoses of the conditions causing the participants' hallucinations, or anything about their lives or any medical treatment that may have been administered. Without such information, the analysis that is provided is useless for answering the research question.

In effect, regardless of all the other issues raised, the key problem is that the research design is simply inadequate to test the research question. But it seems the referees and editor did not notice this shortcoming.

Alleged implications of the research

After presenting his results Awang draws various implications, and makes a number of claims about what had been found in the study:

  • "After the students went through the treatment session by using the Tazkiayatun Nafsmodule to treat hallucinations, it showed a positive effect on the student respondents. All this was certified by the expert, the student's parents as well as the  counselor's  teacher." (p.48)
  • "Based on these findings, shows that hallucinations are very disturbing to humans and the appropriate method for now to solve this problem is to use the Tazkiyatun Nafs Module." (p.48)
  • "…the use of the Tazkiyatun Nafs module while the  respondent  is  suffering  from  hallucination  disorder  is very  appropriate…is very helpful to the respondents in restoring their minds and psyche to be calmer and healthier. These changes allow  students  to  focus  on  their  studies  as  well  as  allow them to improve their academic performance better." (p.48)
  • "The use of the Tazkiyatun Nafs Module in this study has led to very positive changes there are attitudes and traits of students  who  face  hallucinations  before.  All  the  negative traits  like  irritability, loneliness,  depression,etc.  can  be overcome  completely." (p.49)
  • "The personality development of students is getting better and perfect with the implementation of the Tazkiaytun Nafs module in their lives." (p.49)
  • "Results  indicate that  students  who  suffer  from  this hallucination  disorder are in  a  state  of  high  depression, inactivity, fatigue, weakness and pain,and insufficient sleep." (p.49)
  • "According  to  the  findings  of  this study,  the  history  of  this  hallucination  disorder  started in primary  school  and  when  a  person  is  in  adolescence,  then this  disorder  becomes  stronger  and  can  cause  various diseases  and  have  various  effects  on  a  person who  is disturbed." (p.50)

Given the range of interview data that Awang claims to have collected and analysed, at least some of the claims here are possibly supported by the data. However, none of this data and analysis is available to the reader. 2 These claims are not supported by any evidence presented in the paper. Yet peer reviewers and the editor who read the manuscript seem to feel it is entirely acceptable to publish such claims in a research paper, and not present any evidence whatsoever.

Summing up

In summary: as far as these four students were concerned (but not perhaps the fifth participant?), there did seem to be a relationship between periods of experiencing hallucinations and lower school performance (perhaps explained by such factors as "absenteeism to school during the day the test was conducted", p.46),

"the performance shown by students who face chronic hallucinations is also declining and  declining.  This  is  all  due  to  the  actions  of  students leaving the teacher's learning and teaching sessions as well as  not  attending  school  when  this  hallucinatory  disorder strikes.  This  illness or  disorder  comes  to  the  student suddenly  and  periodically.  Each  time  this  hallucination  disease strikes the student causes the student to have to take school  holidays  for  a  few  days  due  to  pain  or  depression"

Awang, 2022, p.42

However,

  • these four students do not represent any wider population;
  • there is no information about the specific nature, frequency, intensity, etcetera, of the hallucinations or diagnoses in these individuals;
  • there was no statistical test of significance of changes; and
  • there was no control condition to see if performance dips were experienced by others not experiencing hallucinations at the same time.

Once they had recovered from the hallucinations (and it is not clear on what basis that judgement was made) their scores improved.

The author would like us to believe that the relief from the hallucinations was due to the intervention, but this seems to be (quite literally) an act of faith 3 as no actual research evidence is offered to show that the soul purification module actually had any effect. It is of course possible the module did have an effect (whether for the conjectured or other reasons – such as simply offering troubled children some extra study time in a calm and safe environment and special attention – or because of an expectancy effect if the students were told by trusted authority figures that the intervention would lead to the purification of their hearts and the healing of their hallucinatory disorder) but the study, as reported, offers no strong grounds to assume it did have such an effect.

An irresponsible journal

As hallucinations are often symptoms of organic disease affecting blood supply to the brain, there is a major question of whether treating the condition by religious instruction is ethically sound. For example, hallucinations may indicate a tumour growing in the brain. Yet, if the module was only a complement to proper medical attention, a reader may well suspect that any improvement in the condition (and consequent increased engagement in academic work) was entirely unrelated to the module being evaluated.

Indeed, a published research study that claims that soul purification is a suitable treatment for medical conditions presenting with hallucinations is potentially dangerous as it could lead to serious organic disease going untreated. If Awang's recommendations were widely taken up in Malaysia such that students with serious organic conditions were only treated for their hallucinations by soul purification rather than with medication or by surgery it would likely lead to preventable deaths. For a research journal to publish a paper with such a conclusion, where any qualified reviewer or editor could easily see the conclusion is not warranted, is irresponsible.

As the journal website points out,

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general."

https://www.ej-edu.org/index.php/ejedu/about

So, why did the European Journal of Education and Pedagogy not subject this submission to meaningful review to help the author of this study meet the standards of the discipline, and of science in general?


Work cited:

Notes:

1 In mature fields in the natural sciences there are recognised traditions ('paradigms', 'disciplinary matrices') in any active field at any time. In general (and of course, there will be exceptions):

  • at any historical time, there is a common theoretical perspective underpinning work in a research programme, aligned with specific ontological and epistemological commitments;
  • at any historical time, there is a strong alignment between the active theories in a research programme and the acceptable instrumentation, methodology and analytical conventions.

Put more succinctly, in a mature research field there is generally broad agreement on how a phenomenon is to be understood, how to go about investigating it, and how to interpret data as research evidence.

This is generally not the case in educational research – which is, in part at least, due to the complexity, and so multi-layered nature, of the phenomena studied (Taber, 2014a): phenomena such as classroom teaching. So, in reviewing educational papers, it is sometimes necessary to find different experts to look at the theoretical and the methodological aspects of the same submission.


2 The paper is very strange in that the introductory sections and the conclusions and implications sections have a very broad scope, but the actual research results are restricted to a very limited focus: analysis of school test scores and grades.

It is as if (and it could well be that) a dissertation with a number of evidential strands has been reduced to a paper drawing upon only one aspect of the research evidence, but with material from other sections of the dissertation left unchanged from the original broader study.


3 Readers are told that

"All  these  acts depend on the sincerity of the medical researcher or fortune-teller seeking the help of Allah S.W.T to ensure that these methods and means are successful. All success is obtained by the permission of Allah alone"

Awang, 2022, p.43


Fingerprinting an exoplanet

Life, death, and multiple Gaias


Keith S. Taber


NASA might be said to be engaged in looking for other Gaias beyond our Gaia, as Dr Milam explained to another Gaia.

This post is somewhat poignant as something I heard on a radio podcast reminded me how science has recently lost one of its great characters, as well as an example of that most rare thing in today's science – the independent scientist.


Inside Science episode "Deep Space and the Deep Sea – 40 years of the International Whaling Moratorium", presented, perhaps especially aptly, by Gaia Vince

I was listening to the BBC's Inside Science podcast episode 'Deep Space and the Deep Sea – 40 years of the International Whaling Moratorium' where the presenter – somewhat ironically, in view of the connection I was making, Gaia Vince – was talking to Dr Stefanie Milam of NASA's Goddard Space Flight Centre about how the recently launched James Webb Space Telescope could help scientists look for signs of life on other planets.


From: https://jwst.nasa.gov/content/meetTheTeam/people/milam.html

Dr Milam explained that

"spectra…give us all the information that we really need to understand a given environment. And that's one of the amazing parts about the James Webb space telescope. So, what we have access to with the wavelengths that the James Webb space telescope actually operates at, is that we have the fingerprint pattern of given molecules, things like water, carbon monoxide, carbon dioxide, all these things that we find in our own atmosphere, and so by using the infrared wavelengths we can look for these key ingredients in atmospheres around other planets or even, actually, objects in our own solar system, and that tells us a little bit about what is going on as far as the dynamics of that planet, whether or not its has got geological activity, or maybe even something as crazy as biology."

Dr Stefanie Milam, interviewed for 'Inside Science'
"Webb has captured the first clear evidence of carbon dioxide (CO2) in the atmosphere of a planet outside of our solar system!" (Hot Gas Giant Exoplanet WASP-39 b Transit Light Curve, NIRSpec Bright Object Time-Series Spectroscopy.)
Image: NASA, ESA, CSA, and L. Hustak (STScI). Released under 2.0 Generic (CC BY 2.0) License – Some rights reserved by James Webb Space Telescope
Do molecules have fingerprints?

Fingerprints have long been used in forensic work to identify criminals (and sometimes their victims) because our fingerprints are pretty unique. Even 'identical' twins do not have identical fingerprints (though I suspect that fact rather undermines some crime fiction plots). But, to have fingerprints one surely has to have fingers. A palm print requires a palm, and a footprint, a foot. So, can molecules, not known for their manual dexterity, have fingerprints?

Well, it is not exactly by coincidence (as the James Webb space telescope has had a lot of media attention) that I very recently posted here, in the context of new observations of the early Universe, that

"Spectroscopic analysis allows us to compare the pattern of redshifted spectral lines due to the presence of elements absorbing or emitting radiation, with the position of those lines as they are found without any shift. Each element has its own pattern of lines – providing a metaphorical fingerprint.

from: A hundred percent conclusive science. Estimation and certainty in Maisie's galaxy

In chemistry, elements and compounds have unique patterns of energy transitions which can be identified through spectroscopy. So, we have 'metaphorical fingerprints'. To describe a spectrum as a chemical substance's (or entity's, such as an ion's) fingerprint is to use a metaphor. It is not actually a fingerprint – there are no fingers to leave prints – but this figure of speech gets across an idea through an implicit comparison with something already familiar. *1 That is, it is a way of making the unfamiliar familiar (which might be seen as a description of teaching!)
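For readers who would like the relationship behind the metaphor spelt out, the standard redshift relation (a general formula, not anything specific to the Webb observations) is:

```latex
\lambda_{\mathrm{obs}} = (1+z)\,\lambda_{\mathrm{rest}},
\qquad
z = \frac{\lambda_{\mathrm{obs}} - \lambda_{\mathrm{rest}}}{\lambda_{\mathrm{rest}}}
```

Because every line in a given spectrum is stretched by the same factor (1+z), the ratios between the line wavelengths are preserved – and it is this unchanged pattern, rather than the absolute positions of the lines, that identifies the element, which is what makes the fingerprint metaphor apt.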

Dead metaphors

But perhaps this has become a 'dead metaphor' so that now chemicals do have fingerprints? One of the main ways that language develops is by words changing their meanings over time, as metaphors become so commonly used that they cease to be metaphorical.

For example, I understand the term electrical charge is a dead metaphor. When electrical charge was first being explored and was still unfamiliar, the term 'charge' was adopted by comparison with the charging of a cannon or the charge of shot used in a shotgun. The shot charge refers to the weight of shot included in a cartridge. Today, most people would not know that, whilst being very familiar with the idea of electrical charge. But when the term electrical charge was first used most people knew about charging guns.

So, initially, electrical 'charge' was a metaphor to refer to the amount of 'electricity' – which made use of a familiar comparison. Now it is a dead metaphor, and 'electrical charge' is considered a technical term in its own right.

Another example might be electron spin: electrons do not spin in the familiar sense, but really do (now) have spin as the term has been extended to apply to quanticles with inherent angular momentum by analogy with more familiar macroscopic objects that have angular momentum when they are physically rotating. So, we might say that when the term was first used, it was a metaphor, but no longer. (That is, physicists have expanded the range of convenience of the term spin.)

Perhaps, similarly, fingerprint is now so commonly used to mean a unique identifier in a wide range of contexts, that it should no longer be considered a metaphor. I am not sure if that is so, yet, but perhaps it will be in, say, a century's time – and the term will be broadly used without people even noticing that many things have acquired fingerprints without having fingers. (A spectrum will then actually be a chemical substance's or entity's fingerprint.) After all, many words we now commonly use contain fossils of their origins without us noticing. That is, metaphorical fossils, of course. *2

James Lovelock, R.I.P.

The reason I found this news item somewhat poignant was that I was listening to it just a matter of weeks after the death (at age 103) of the scientist Jim Lovelock. *3 Lovelock invented the device (the electron capture detector) which was able to demonstrate the ubiquity of chlorofluorocarbons (CFCs) in the atmosphere. These substances were very commonly used as refrigerants and aerosol propellants as they were very stable, and being unreactive (so non-toxic) were considered safe.

But this very stability allowed them to remain in, and spread through, the atmosphere for a very long time, until they were broken down in the stratosphere by ultraviolet radiation to give radicals that reacted with the ozone that is so protective of living organisms. Free radical reactions can occur as chain reactions: when a radical interacts with a molecule it produces a new molecule, plus a new radical, which can often take part in a further interaction with another molecule – so, each CFC molecule could lead to the destruction of many ozone molecules. CFCs have now been banned for most purposes to protect the ozone 'layer', and so us.
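For the curious, the much-simplified textbook scheme for chlorine-catalysed ozone destruction (standard atmospheric chemistry, not taken from Lovelock's own work) runs something like:

```latex
\begin{aligned}
\mathrm{CFCl_3} + h\nu &\rightarrow \mathrm{CFCl_2^{\bullet}} + \mathrm{Cl^{\bullet}}
  && \text{(photolysis by UV in the stratosphere)}\\
\mathrm{Cl^{\bullet}} + \mathrm{O_3} &\rightarrow \mathrm{ClO^{\bullet}} + \mathrm{O_2}
  && \text{(ozone destroyed)}\\
\mathrm{ClO^{\bullet}} + \mathrm{O} &\rightarrow \mathrm{Cl^{\bullet}} + \mathrm{O_2}
  && \text{(chlorine radical regenerated)}
\end{aligned}
```

Because the chlorine radical emerges from the cycle unchanged, it acts catalytically: the second and third steps can repeat many times before the chlorine is eventually sequestered in less reactive 'reservoir' compounds.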

Life is chemistry out of balance

But another of Lovelock's achievements came when working for NASA to develop means to search for life elsewhere in the universe. As part of the Mariner missions, NASA wanted Lovelock to design apparatus that could be sent to other worlds and search for life (and I think he did help do that), but Lovelock pointed out that one could tell if a planet had life by a spectroscopic analysis.

Any alien species analysing light passing through earth's atmosphere would see its composition was far from chemical equilibrium due to the ongoing activity of its biota. (If life were to cease on earth today, the oxygen content of the atmosphere would very quickly fall from 21% to virtually none at all as oxygen reacts with rocks and other materials.) If the composition of an atmosphere seemed to be in chemical equilibrium, then it was unlikely there was life. However, if there were high concentrations of gases that should react together or with the surface, then something, likely life, must be actively maintaining that combination of gases in the atmosphere.

"Living systems maintain themselves in a state of relatively low entropy at the expense of their nonliving environments. We may assume that this general property is common to all life in the solar system. On this assumption, evidence of a large chemical free energy gradient between surface matter and the atmosphere in contact with it is evidence of life. Furthermore, any planetary biota which interacts with its atmosphere will drive that atmosphere to a state of disequilibrium which, if recognized, would also constitute direct evidence of life, provided the extent of the disequilibrium is significantly greater than abiological processes would permit. It is shown that the existence of life on Earth can be inferred from knowledge of the major and trace components of the atmosphere, even in the absence of any knowledge of the nature or extent of the dominant life forms. Knowledge of the composition of the Martian atmosphere may similarly reveal the presence of life there."

Dian R. Hitchcock and James E. Lovelock – from Lovelock's website (originally published in Icarus: International Journal of the Solar System in 1967)

The story was that NASA did not really want to be told they did not need to send missions with spacecraft to other worlds such as Mars to look for life, but rather that they only had to point a telescope and analyse the spectrum of radiation. Ironically, perhaps, then, that is exactly what they are now doing with planets around other stars, where it is not feasible (not now, perhaps not ever) to send missions.

Gaia and Gaia

But Lovelock became best known for his development and championing of the Gaia theory. According to Gaia (the theory, not the journalist), the development of life on earth has shaped the environment (and not just exploited pre-existing niches) and developed as a huge integrated and interacting system (the biota, but also the seas, the atmosphere, freshwater, the soil,…) such that large scale changes in one part of the system have knock-on effects elsewhere. *4

So, Gaia can be understood not as the whole earth as a planet, or just the biota as the collective life in terms of organisms, but rather as the dynamic system of life on earth and the environment it interacts with. In a sense (and it is important to see this is meant as an analogy, a thinking tool) Gaia is like some supra-organism. Just as a snail has a shell that it has produced for itself, Gaia has shaped the biosphere where the biota lives. *4

The system has built-in feedback cycles to protect it from perturbations (not by chance, or due to some mysterious power, but due to natural selection), but if it is subjected to a large enough input it could shift to a new (and perhaps very different) equilibrium state. *5 This certainly happened when oxygen-releasing organisms evolved: the earth today is inhospitable to the organisms that lived here before that event (some survived to leave descendants, but only in places away from the high oxygen concentrations, such as in lower layers of mud beneath the sea), and most organisms alive today would die very quickly in the previous conditions.

It would be nice to think that Gaia, the science journalist that is, was named after the Gaia theory – but Lovelock only started publishing about his Gaia hypothesis around the time that Gaia was born. *6 So, probably not. Gaia is a traditional girl's name, and was the name of the Greek goddess who personified the earth (which is why the name was adopted by Lovelock).

Still, it was poignant to hear a NASA scientist referring to the current value of a method first pointed out by Lovelock when advising NASA in the 1970s and informed by his early thinking about the Gaia hypothesis. NASA might be said to now be engaged in looking for other Gaias on worlds outside our own solar system, as Dr Milam explained to – another – Gaia here on earth.


Notes:

*1 It is an implicit comparison, because the listener/reader is left to appreciate that it is meant as a figure of speech: unlike in a simile ('a spectrum is like a fingerprint') where the comparison is made explicit.


*2 For some years I had a pager (common before mobile phones) – a small electronic device which could receive a short text message conveyed by radio signal, so that my wife could contact me in an emergency by phoning in a message if I was out visiting schools. If I had been asked why it was called a pager, I would have assumed that each message of text was considered to comprise a 'page'.

However, a few weeks ago I watched an old 'screwball comedy' being shown on television: 'My Favourite Wife' (or 'My Favorite [sic] Wife' in US release).

(On the very day that Cary Grant remarries after having his first wife, long missing after being lost at sea, declared legally dead, wife number one reappears having been rescued from a desert island. That this is a very unlikely scenario was played upon when the film was remade in colour, as 'Move Over Darling', with Doris Day and James Garner. The returned first wife, pretending to be a nurse, asks the new wife if she is not afraid the original wife would reappear, as happened in that movie; eliciting the response: 'Movies. When do movies ever reflect real life?')

Some of the action takes place in the honeymoon hotel where the groom has disappeared from the suite (these are wealthy people!) having been tracked down by his first wife. The new wife asks the hotel to page him – and this is how that worked with pre-electronic technology:

Paging Mr Arden: Still from 'My Favorite Wife'

*3 So, although I knew Lovelock had died (July 26th), he was still alive at the time of the original broadcast (July 14th). In part, my tardiness comes from the publicly funded BBC's decision to no longer make downloads of some of its programmes available for iPods and similar devices immediately after broadcast. (This downgrading of the BBC's service to the public seems intended to persuade people to use its own streaming service.)


*4 The Gaia theory developed by Lovelock and Lynn Margulis includes ideas that were discussed by Vladimir Vernadsky almost a century ago. Although Vernadsky's work was well known in scientific circles in the Soviet Union, it did not become known to scientists in Western Europe till much later. Vernadsky used the term 'biosphere' to refer to those 'layers' of the earth (lower atmosphere to outer crust) where life existed.


*5 A perturbation such as extensive deforestation perhaps, or certainly increasing the atmospheric concentrations of 'greenhouse' gases beyond a certain point.


*6 Described as a hypothesis originally, it has been extensively developed, and would seem to qualify as a theory (a "consistent, comprehensive, coherent and extensively evidenced explanation of aspects of the natural world") today.

A hundred percent conclusive science

Estimation and certainty in Maisie's galaxy


Keith S. Taber


An image from the James Webb Space Telescope
(released images are available at https://webbtelescope.org/resource-gallery/images)

NASA's James Webb Space Telescope is now operational, and offering new images of 'deep space'. This has led to claims of finding images of objects from further away in the Universe, and so from further back in time, than previously seen. This should support a lot of new scientific work and will surely lead to some very interesting findings. Indeed, it seems to have had an almost immediate impact.

Maisie's galaxy

One of these new images is of an object known as:

CEERSJ141946.35+525632.8

or less officially (but more memorably) as

Maisie's galaxy.

A red smudge on one of the new images has been provisionally identified as evidence of a galaxy as it was less than 300 000 000 years after the conjectured 'big bang' event understood as the origin of the universe. The galaxy is so far away that its light has taken all the time since then to reach us.

Three hundred million years seems a very long time in everyday terms, but it is a small fraction of the current age of the universe, believed to be around fourteen billion years. 1

300 000 000 years

≪ 14 000 000 000 years
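To put the comparison in proportion: 300 000 000 / 14 000 000 000 ≈ 0.02, so we would be seeing the galaxy when the universe had reached only about 2% of its current age.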

The age estimate is based on the colour of the object, reflecting its 'redshift':

"Scientists with the CEERS Collaboration have identified an object–dubbed Maisie's galaxy in honor of project head Steven Finkelstein's daughter–that may be one of the earliest galaxies ever observed. If its estimated redshift of 14 is confirmed with future observations, that would mean we're seeing it as it was just 290 million years after the Big Bang."

University of Texas at Austin, UT News, August 04, 2022

(CEERS is the Cosmic Evolution Early Release Science Survey.)

This finding is important in understanding the evolution of the universe – for example, observing the earliest galaxies puts a limit on how long the universe existed before star formation started. (Although the episode was called 'The first galaxies at the universe's dawn', Maisie's galaxy is thought to contain heavier elements that were produced in even earlier stars.)

Uncertainty in science (and certainty in reporting science)

So, the claim is provisional. It is an estimate awaiting confirmation.

Strictly, science is concerned with provisional knowledge claims. This is not simply because scientists can make mistakes. All measurements are subject to limits in precision (measurement 'errors'). More fundamentally, all measurements depend on a theory of the instrumentation used, and theoretical knowledge is always open to being revisited on the basis of new ways of understanding.

We may not expect the theory behind the metre rule to change any time soon (although even there, our understanding shifted somewhat with Einstein's theories) but many scientific observations depend on highly complex apparatus, both for data collection and analysis. Despite this, science is often represented in the media, both by commentators and sometimes scientists themselves, as if it produced absolute certainty.

Read about science in public discourse and the media

Read about scientific certainty in the media

A rough estimate?

In the case of Maisie's galaxy, the theoretical apparatus seems to be somewhat more sophisticated than the analytical method used to provisionally age the object. This was explained by Associate Professor Steve Finkelstein when he was interviewed on the BBC's Science in Action episode 'The first galaxies at the universe's dawn'.


Maisie's galaxy – it's quite red.
The first galaxies at the universe's dawn. An episode of 'Science in Action'

Professor Finkelstein explained:

"We can look deep into out past by taking these deep images, and we can find the sort of faintest red smudges and that tells us that they are extremely far away, and from exactly how red they are we can estimate that distance."

Associate Professor Steve Finkelstein

So, the figure of 290 000 000 years after the big bang is an estimate. Fair enough, but what 'caught my ear', so to speak, was the contrast between the acknowledged uncertainty of the current estimate, and the claimed possibility of moving from this to absolute knowledge:

"If this distance we have measured for Masie's galaxy, of a red shift of 14, holds true, and I can't stress enough that we need spectroscopic confirmation to precisely measure that distance. [*] Where you take a telescope, could be James Webb, could be a different telescope, you observe it [the galaxy] and you split the light into its component colours, and you can actually precisely measure – measure the red shift, measure the distance – a hundred percent conclusively."

Associate Professor Steve Finkelstein
[* To my ear, there might well be an edit at this point – the quote is based on what was broadcast which might omit or re-sequence parts of the interview.]

Spectroscopic analysis allows us to compare the pattern of redshifted spectral lines, due to the presence of elements absorbing or emitting radiation, with the position of those lines as they are found without any shift. Each element has its own pattern of lines – providing a metaphorical fingerprint. A redshift (or blueshift) moves these lines to different parts of the spectrum, but does not change their collective profile, as all the wavelengths are shifted by the same factor.
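In terms of the usual definition of redshift (my gloss here, not something said in the programme), the observed and emitted wavelengths of any given line are related by:

$$\lambda_{\mathrm{observed}} = (1 + z)\,\lambda_{\mathrm{emitted}}$$

So an estimated redshift of z = 14 would mean every wavelength arrives stretched by a factor of 15: the hydrogen Lyman-α line emitted at about 122 nm, for example, would be observed at about 1.8 μm – in the infrared range where the Webb telescope operates.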


Spectral lines can be used like fingerprints to identify substances.
(Image by No-longer-here from Pixabay)

Some of these lines are fine, allowing precise measurements of wavenumber/frequency, and there are enough of them to be able to make very confident assignments of the 'fingerprints', and use this to estimate the shift. We might extend our analogy to a fingerprint on a rubber balloon which has been stretched since the print was made. In absolute terms, the print would no longer (or 'no wider', for that matter) fit the finger that made it, but the distortion is systematic, allowing a match to be made – and the degree of stretching to be calculated.


If a pattern is distorted in a systematic way, we may still be able to match it to an undistorted version
(Original images by Clker-Free-Vector-Images (balloon), OpenClipart-Vectors (print) and Alexander (notepad) from Pixabay)

Yet, even though this is a method that is considered well-understood, reliable, and potentially very accurate and precise 2, I am not sure you can "precisely measure, measure the redshift, measure the distance. A hundred percent conclusively". Science, at least as I understand it, always has to maintain some small level of humility.

Scientists may be able to confirm and hone the estimate of 290 000 000 years after the big bang for the age of Maisie's galaxy. Over time, further observations, new measurements, refinement in technique, or even theory, may lead to successive improvements in that age measurement and both greater accuracy and greater precision.2 But any claim of a conclusive measurement to a precision of 100% has a finality that sounds like something other than science to me.


Notes

1 Oddly, most webpages I've seen that cite values for the age of the universe do not make it explicit whether these are American (10⁹) or English (10¹²) billions! It seems to be assumed that, as with sulfur [i.e., sulphur], and perhaps soon aluminum and fosforus, we are all using American conventions.


2 Precision and accuracy are different. Consider an ammeter.


An ammeter (Image by Gerd Altmann from Pixabay)

Due to the method of reading a needle position against a scale there is a limit to precision (perhaps assumed to the nearest calibration, so to ±0.5 calibrations). This measurement error of ±0.5 units is, in effect, a limit in detail or resolution, but not an 'error' in the everyday sense of getting something wrong. However, if the meter had been miscalibrated, or over time has shifted from calibration, so the needle is misaligned (so perhaps the meter reads +0.15 A when it is not connected into a circuit) then that is inaccuracy. There is always some level of imprecision (some limit on how precise we can be), even when we have an accurate reading.
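So, for example (my illustrative numbers): a meter that reads +0.15 A with no current flowing, and shows 2.30 ± 0.05 A when connected in a circuit, suggests a best estimate for the true current of about 2.15 A – still subject to the same ±0.05 A imprecision.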


In science, a measurement normally offers a best estimate of a true value, with an error range acknowledging how far the real value might be from that best estimate. See the example below: Measurement B claims the most precision, but is actually inaccurate. Measurement A is the most accurate (but least precise).

If we imagine that a galaxy was being seen as it was

275 000 000 years after the big bang

and three measurements of its age were given as:

A: 280 000 000 ± 30 000 000 years after the big bang

(i.e., 250 000 000 – 310 000 000)

B: 290 000 000 ± 10 000 000 years after the big bang

(i.e., 280 000 000 – 300 000 000)

C: 260 000 000 ± 20 000 000 years after the big bang

(i.e., 240 000 000 – 280 000 000)

then measurement B is more precise (it narrows down the possible range the most) but is inaccurate (as the actual age falls outside the range of this measurement). Of course, unlike in such a hypothetical example, in a real case we would not know the actual age to allow us to decide which of the measurements is more accurate.
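For readers who like to see such reasoning made explicit, here is a minimal sketch in Python (using the hypothetical values above; the variable names are mine) that checks which quoted ranges contain the 'true' value, and which claim is the most precise:

```python
# A worked check of the hypothetical example above: which quoted ranges
# contain the 'true' age, and which claim is the most precise?
true_age = 275_000_000  # years after the big bang (knowable only in a thought experiment)

# measurement label -> (best estimate, half-width of quoted error range)
measurements = {
    "A": (280_000_000, 30_000_000),
    "B": (290_000_000, 10_000_000),
    "C": (260_000_000, 20_000_000),
}

for label, (estimate, half_width) in measurements.items():
    # 'Accurate' here meaning the true value lies within the quoted range
    accurate = abs(true_age - estimate) <= half_width
    print(f"{label}: {estimate:,} ± {half_width:,} -> "
          f"{'contains' if accurate else 'misses'} the true value")

# The most precise claim is simply the one quoting the narrowest range
most_precise = min(measurements, key=lambda label: measurements[label][1])
print(f"Most precise claim: {most_precise}")
```

Running this reports that B makes the narrowest (most precise) claim, yet is the only one whose range misses the true value.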


A case study of educational innovation?

Design and Assessment of an Online Prelab Model in General Chemistry


Keith S. Taber


Case study is meant to be naturalistic – whereas innovation sounds like an intervention. But interventions can be the focus of naturalistic enquiry.

One of the downsides of having spent years teaching research methods is that one cannot help but notice how so much published research departs from the ideal models one offers to students. (Which might be seen as a polite way of saying authors often seem to get key things wrong.) I used to teach that how one labelled one's research was less important than how well one explained it. That is, different people would have somewhat different takes on what is, or is not, grounded theory, case study or action research, but as long as an author explained what they had done, and could adequately justify why, the choice of label for the methodology was of secondary importance.

A science teacher can appreciate this: a student who tells the teacher they are doing a distillation when they are actually carrying out reflux, but clearly explains what they are doing and why, will still be understood (even if the error should be pointed out). On the other hand, if a student has the right label but an alternative conception, this is likely to be a more problematic 'bug' in the teaching-learning system. 1

That said, each type of research strategy has its own particular weaknesses and strengths so describing something as an experiment, or a case study, if it did not actually share the essential characteristics of that strategy, can mislead the reader – and sometimes even mislead the authors such that invalid conclusions are drawn.

A 'case study', that really is a case study

I made reference above to action research, grounded theory, and case study – three methodologies which are commonly name-checked in education research. There are a vast number of papers in the literature with one of these terms in the title, and a good many of them do not report work that clearly fits the claimed approach! 2


The case study was published in the Journal of the Research Center for Educational Technology

So, I was pleased to read an interesting example of a 'case study' that I felt really was a case study: 'Design and assessment of an online prelab model in general chemistry: A case study' (Llorens-Molina, 2009). Although I suspect some other authors might have been tempted to describe this research differently.

Is it a bird, is it a plane; no it's…

Llorens-Molina's study included an experimental aspect. A cohort of learners was divided into two groups to allow the researcher to compare two different educational treatments; then, measurements were made to compare outcomes quantitatively. That might sound like an experiment. Moreover, this study reported an attempt to innovate in a teaching situation, which gives the work a flavour of action research. Despite this, I agree with Llorens-Molina that the work is best characterised as a case study.

Read about experiments

Read about action research


A case study focuses on 'one instance' from among many


What is a case study?

A case study is an in-depth examination of one instance: one example – of something for which there are many examples. The focus of a case study might be one learner, one teacher, one group of students working together on a task, one class, one school, one course, one examination paper, one text book, one laboratory session, one lesson, one enrichment programme… So, there is great variety in what kind of entity a case study is a study of, but what case studies have in common is they each focus in detail on that one instance.

Read about case study methodology


Characteristics of case study


Case studies are naturalistic studies, which means they are studies of things as they are, not attempts to change things. The case has to be bounded (a reader of a case study learns what is in the case and what is not) but tends to be embedded in a wider context that impacts upon it. That is, the case is entangled in a context from which it could not easily be extracted and still be the same case. (Imagine moving a teacher with her class from their school to have their lesson in a university where it could be observed by researchers – it would not be 'the same lesson' as would have occurred in situ).

The case study is reported in detail, often in a narrative form (not just statistical summaries) – what is sometimes called 'thick description'. Usually several 'slices' of data are collected – often different kinds of data – and often there is a process of 'triangulation' to check the consistency of the account presented in relation to the different slices of data available. Although case studies can include analysis of quantitative data, they are usually seen as interpretive as the richness of data available usually reflects complexity and invites nuance.



Design and Assessment of an Online Prelab Model in General Chemistry

Llorens-Molina's study explored the use of prelabs that are "used to introduce and contextualize laboratory work in learning chemistry" (p.15), and in particular "an alternative prelab model, which consists of an audiovisual tutorial associated with an online test" (p.15).

An innovation

The research investigated an innovation in teaching practice,

"In our habitual practice, a previous lecture at the beginning of each laboratory session, focused almost exclusively on the operational issues, was used. From our teaching experience, we can state that this sort of introductory activity contributes to a "cookbook" way to carry out the laboratory tasks. Furthermore, the lecture takes up valuable time (about half an hour) of each ordinary two-hour session. Given this set-up, the main goal of this research was to design and assess an alternative prelab model, which was designed to enhance the abilities and skills related to an inquiry-type learning environment. Likewise, it would have to allow us to save a significant amount of time in laboratory sessions due to its online nature….

a prelab activity developed …consists of two parts…a digital video recording about a brief tutorial lecture, supported by a slide presentation…[followed by] an online multiple choice test"

Llorens-Molina, 2009, p.16-17
Not action research?

The reference to shifting "our habitual practice" indicates this study reports practitioner research. Practitioner studies, such as this, that test an innovation are often labelled by authors as 'action research'. (Indeed, sometimes the fact that research is carried out by practitioners looking to improve their own practice is seen as sufficient for action research: when actually this is a necessary, but not a sufficient, condition.)

Genuine action research aims at improving practice, not simply seeing if a specific innovation is working. This means action research has an open-ended design, and is cyclical – with iterations of an innovation tested and the outcomes used as feedback to inform changes in the innovation. (Despite this, a surprising number of published studies labelled as action research lack any cyclic element, simply reporting one iteration of an innovation.) Llorens-Molina's study does not have a cyclic design, so would not be well-characterised as action research.

An experimental design?

Llorens-Molina reports that the study was motivated by three hypotheses (p.16):

  • "Substituting an initial lecture by an online prelab to save time during laboratory sessions will not have negative repercussions in final examination marks.
  • The suggested online prelab model will improve student autonomy and prerequisite knowledge levels during laboratory work. This can be checked by analyzing the types and quantity of SGQ [student generated questions].
  • Student self-perceptions about prelab activities will be more favourable than those of usual lecture methods."

To test these hypotheses the student cohort was divided into two groups, to be split between the customary and innovative approach. This seems very much like an experiment.

It may be useful here to distinguish between two levels of research design – methodology (akin to strategy) and techniques (akin to tactics). In research design, a methodology is chosen to meet the overall aims of the study, and then one or more research techniques are selected consistent with that methodology (Taber, 2013). Experimental techniques may be included in a range of methodologies, but experiment as an overall methodology has some specific features.

Read about Research design

In a true experiment there is random assignment to conditions, and often there is an intention to generalise results to a wider population considered to be sampled in the study. Llorens-Molina reports that although inferential statistics were used to test the hypotheses, there was no intention to offer statistical generalisation beyond the case. The cohort of students was not assumed to be a sample representing some wider population (such as, say, undergraduates on chemistry courses in Spain) – and, indeed, clearly such an assumption would not have been justified.

Case study is naturalistic – but an innovation is an intervention in practice…

Case study is said to be naturalistic research – it is a method used to understand and explore things as they are, not to bring about change. Yet, here the focus is an innovation. That seems a contradiction. It would be a contradiction if the study was being carried out by external researchers who had asked the teaching team to change practice for the benefits of their study. However, here it is useful to separate out the two roles of teacher and researcher.

This is a situation that I commonly faced when advising graduates preparing for school teaching who were required to carry out a classroom-based study into an aspect of their school placement practice context as part of their university qualification (the Post-Graduate Certificate in Education, P.G.C.E.). Many of these graduates were unfamiliar with research into social phenomena. Science graduates often brought a model of what worked in the laboratory to their thinking about their projects – and had a tendency to think that transferring the experimental approach to classrooms (where there are usually a large number of potentially relevant variables, many of which cannot be controlled) would be straightforward.

Read 'Why do natural scientists tend to make poor social scientists?'

The Cambridge P.G.C.E. teaching team put into place a range of supports to introduce graduates preparing for teaching to the kinds of education research useful for teachers who want to evaluate and improve their own teaching. This included a book written to introduce classroom-based research that drew heavily on analysis of published studies (Taber, 2007; 2013). Part of our advice was that those new to this kind of enquiry might want to consider action research and case study as suitable options for their small-scale projects.


Useful strategies for the novice practitioner-researcher (Figure: diagram used in working with graduates preparing for teaching, from Taber, 2010)

Simplistically, action research might be considered best suited to a project to test an innovation or address a problem (e.g., evaluating a new teaching resource; responding to behavioural issues), and case study best suited to an exploratory study (e.g., what do Y9 students understand about photosynthesis?; what is the nature of peer dialogue during laboratory working in this class?) However, it was often difficult for the graduates to carry out authentic action research as the constraints of the school-based placements seldom allowed them to test successive iterations of the same intervention until they found something like an optimal specification.

Yet, they often were in a good position to undertake a detailed study of one iteration, collecting a range of different data, and so producing a detailed evaluation. That sounds like a case study.

Case study is supposed to be naturalistic – whereas innovation sounds like an intervention. But some interventions in practice can be considered the focus of naturalistic enquiry. My argument was that when a teacher changes the way they do something to try and solve a problem, or simply to find a better way to work, that is a 'natural' part of professional practice. The teacher-researcher, as researcher, is exploring something the fully professional teacher does as matter of course – seek to develop practice. After all, our graduates were being asked to undertake research to give them the skills expected to meet professional teaching standards, which

"clearly requires the teacher to have both the procedural knowledge to undertake small-scale classroom enquiry, and 'conceptual frameworks' for thinking about teaching and learning that can provide the basis for evaluating their teaching. In other words, the professional teacher needs both the ability to do her own research and knowledge of what existing research suggests"

Taber, 2013, p.8

So, the research is on something that is naturally occurring in the classroom context, rather than an intervention imported into the context in order to answer an external researcher's questions. A case study of an intervention introduced by practitioners themselves can be naturalistic – even if the person implementing the change is the researcher as well as the teacher.


If a teacher-researcher (qua researcher) wishes to enquire into an innovation introduced by the teacher-researcher (qua teacher) then this can be considered as naturalistic enquiry


The case and the context

In Llorens-Molina's study, the case was a sequence of laboratory activities carried out by a cohort of undergraduates undertaking a course of General and Organic Chemistry as part of an Agricultural Engineering programme. So, the case was bounded (the laboratory part of one taught course) and embedded in a wider context – a degree programme in a specific institution in Spain: the Polytechnic University of Valencia.

The primary purpose of the study was to find out about the specific innovation in the particular course that provided the case. This was then what is known as an intrinsic case study. (When a case is studied primarily as an example of a class of cases, rather than primarily for its own interest, it is called an instrumental case study).

Llorens-Molina recognised that what was found in this specific case, in its particular context, could not be assumed to apply more widely. There can be no statistical generalisation to other courses elsewhere. In case study, the intention is to offer sufficient detail of the case for readers to make judgements of the likely relevance to other contexts of interest (so-called 'reader generalisation').

The published report gives a good deal of information about the course, as well as much information about how data was collected and, equally important, analysed.

Different slices of data

Case study often uses a range of data sources to develop a rounded picture of the case. In this study the identification of three specific hypotheses (less usual in case studies, which often have more open-ended research questions) led to the collection of three different types of data.

  • Students were assessed on each of six laboratory activities. A comparison was made between the prelab condition and the existing approach.
  • Questions asked by students in the laboratories were recorded and analysed to see if the quality/nature of such questions was different in the two conditions. A sophisticated approach was developed to analyse the questions.
  • Students were asked to rate the prelabs through responding to items on a questionnaire.

This approach allowed the author to go beyond simply reporting whether hypotheses were supported by the analysis, to offer a more nuanced discussion around each feature. Such nuance is not only more informative to the reader of a case study, but reflects how the researcher, as practitioner, has an ongoing commitment to further develop practice and not see the study as an end in itself.

Avoiding the 'equivalence' and the 'misuse of control groups' problems

I particularly appreciate a feature of the research design that many educational studies that claim to be experiments could benefit from. To test his hypotheses Llorens-Molina employed two conditions or treatments, the innovation and a comparison condition, and divided the cohort: "A group with 21 students was split into two subgroups, with 10 and 11 in each one, respectively". Llorens-Molina does not suggest this was based on random assignment, which is necessary for a 'true' experiment.

In many such quasi-experiments (where randomisation to condition is not carried out, and is indeed often not possible) the researchers seek to offer evidence of equivalence before the treatments occur. After all, if the two subgroups are different in terms of past subject attainment or motivation or some other relevant factor (or, indeed, if there is no information to allow a judgement regarding whether this is the case or not), no inferences about an intervention can be drawn from any measured differences. (Although that does not always stop researchers from making such claims regardless: e.g., see Lack of control in educational research.)

Another problem is that if learners are participating in research but are assigned to a control or comparison condition, then it could be asked whether they are just being used as 'data fodder', and whether that is fair to them. This is especially so in those cases (so, not this one) where researchers require that the comparison condition is educationally deficient – many published studies report a control condition where school students have effectively been lectured to, and no discussion work, group work, practical work, digital resources, et cetera, have been allowed, in order to ensure a stark contrast with whatever supposedly innovative pedagogy or resource is being evaluated (Taber, 2019).

These issues are addressed in research designs which have a compensatory structure – in effect the groups switch between being the experimental and comparison condition – as here:

"Both groups carried out the alternative prelab and the previous lecture (traditional practice), alternately. In this way, each subgroup carried out the same number of laboratory activities with either a prelab and previous lecture"

Llorens-Molina, 2009, p.19
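Schematically (my sketch of the general idea, not the authors' actual timetable), an alternating design of this kind, across the six laboratory activities, would look something like:

Lab session:  1        2        3        4        5        6
Subgroup 1:   prelab   lecture  prelab   lecture  prelab   lecture
Subgroup 2:   lecture  prelab   lecture  prelab   lecture  prelab

Each subgroup experiences both conditions equally often, so neither is systematically disadvantaged, and each acts as a comparison for the other.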

This is good practice both from methodological and ethical considerations.


The study used a compensatory design which avoids the need to ensure both groups are equivalent at the start, and does not disadvantage one group. (Figure from Llorens-Molina, 2009, p.22 – published under a creative commons Attribution-NonCommercial-NoDerivs 3.0 United States license allowing redistribution with attribution)

A case of case study

Do I think this is a model case study that perfectly exemplifies all the claimed characteristics of the methodology? No, and very few studies do. Real research projects, often undertaken in complex contexts with limited resources and intractable constraints, seldom fit such ideal models.

However, unlike some studies labelled as case studies, this study has an explicit bounded case and has been carried out in the spirit of case study that highlights and values the intrinsic worth of individual cases. There is a good deal of detail about aspects of the case. It is in essence a case study, and (unlike what sometimes seems to be the case [sic]) not just called a case study for want of a methodological label. Most educational research studies examine one particular case of something – but (and I do not think this is always appreciated) that does not automatically make them case studies. Because it has been both conceptualised and operationalised as a case study, Llorens-Molina's study is a coherent piece of research.

Given how, in these pages, I have often been motivated to call out studies I have read that I consider have major problems – major enough to be sufficient to undermine the argument for the claimed conclusions of the research – I wanted to recognise a piece of research that I felt offered much to admire.


Work cited:

Notes:

1 I am using language here reflecting a perspective on teaching as being based on a model (whether explicit or not) in the teacher's mind of the learners' current knowledge and understanding and how this will respond to teaching. That expects a great deal of the teacher, so there are often bugs in the system (e.g., the teacher over-estimates prior knowledge) that need to be addressed. This is why being a teacher involves being something of a 'learning doctor'.

Read about the learning doctor perspective on teaching


2 I used to teach sessions introducing each of these methodologies when I taught on an Educational Research course. One of the class activities was to examine published papers claiming the focal methodology, asking students to see if studies matched the supposed characteristics of the strategy. This was a course with students undertaking a very diverse range of research projects, and I encouraged them to apply the analysis to papers selected because they were of particular interest and relevance to their own work. Many examples selected by students proved to offer a poor match between claimed methodology and the actual research design of their study!

Lack of control in educational research

Getting that sinking feeling on reading published studies


Keith S. Taber


this is like finding that, after a period of watering plant A, it is taller than plant B – when you did not think to check how tall the two plants were before you started watering plant A

Research on prelabs

I was looking for studies which explored the effectiveness of 'prelabs', activities which students are given before entering the laboratory to make sure they are prepared for practical work, and can therefore use their time effectively in the lab. There is much research suggesting that students often learn little from science practical work, in part because of cognitive overload – that is, learners can be so occupied with dealing with the apparatus and materials they have little capacity left to think about the purpose and significance of the work. 1


Okay, so is THIS the pipette?
(Image by PublicDomainPictures from Pixabay)

Approaching a practical work session having already spent time engaging with its purpose and associated theories/models, and already having become familiar with the processes to be followed, should mean students enter the laboratory much better prepared to use their time efficiently, and much better informed to reflect on the wider theoretical context of the work.

I found a Swedish paper (Winberg & Berg, 2007) reporting a pair of studies that tested this idea by using a simulation as a prelab activity for undergraduates about to engage with an acid-base titration. The researchers tested this innovation by comparisons between students who completed the prelab before the titration, and those who did not.

The work used two basic measures:

  • types (sophistication) of questions asked by students during the lab. session
  • elicitation of knowledge in interviews after the laboratory activity

The authors found some differences (between those who had completed the prelab and those that had not) in the sophistication of the questions students asked, and in the quality of the knowledge elicited. They used inferential statistics to suggest at least some of the differences found were statistically significant. From my reading of the paper, these claims were not justified.

A peer reviewed journal (no, really, this time)

This is a paper in a well respected journal (not one of the predatory journals I have often discussed on this site). The Journal of Research in Science Teaching is published by Wiley (a major respected publisher of academic material) and is the official journal of NARST (which used to stand for the National Association for Research in Science Teaching – where 'national' referred to the USA 2). This is a journal that does take peer review very seriously.

The paper is well-written and well-structured. Winberg and Berg set out a conceptual framework for the research that includes a discussion of previous relevant studies. They adopt a theoretical framework based on Perry's model of intellectual development (Taber, 2020). There is considerable detail of how data was collected and analysed. This account is well-argued. (But, you, dear reader, can surely sense a 'but' coming.)

Experimental research into experimental work?

The authors do not seem to explicitly describe their research as an experiment as such (as opposed to adopting some other kind of research strategy such as survey or case study), but the word 'experiment' and variations of it appear in the paper.

For one thing, the authors refer to students' practical work as being experiments,

"Laboratory exercises, especially in higher education contexts, often involve training in several different manipulative skills as well as a high information flow, such as from manuals, instructors, output from the experimental equipment, and so forth. If students do not have prior experiences that help them to sort out significant information or reduce the cognitive effort required to understand what is happening in the experiment, they tend to rely on working strategies that help them simply to cope with the situation; for example, focusing only on issues that are of immediate importance to obtain data for later analysis and reflective thought…"

Winberg & Berg, 2007

Now, some student practical work is experimental, where a student is actively looking to see what happens when they manipulate some variable to test a hypothesis. This type of practical work is sometimes labelled enquiry (or inquiry in US spelling). But a lot of school and university laboratory work is undertaken to learn techniques, or (probably more often) to support the learning of taught theory – where it is usually important the learners know what is meant to happen before they begin the laboratory activity.

Winberg and Berg refer to the 'laboratory exercise' as 'the experiment' as though any laboratory work counts as an experiment. In Winberg and Berg's research, students were asked about their "own [titration] experiment", despite the prelab material involving a simulation of the titration process, in advance of which "the theoretical concepts, ideas, and procedures addressed in the simulation exercise had been treated mainly quantitatively during the preceding 1-week instructional sequence". So, the laboratory titration exercise does not seem to be an experiment in the scientific sense of the term.

School children commonly describe all practical work in the lab as 'doing experiments'. It cannot help students learn what an experiment really is when the word 'experiment' has two quite distinct meanings in the science classroom:

  • experiment (technical) = an empirical test of a hypothesis, involving the careful control of variables, and observation of the effect of changing the variable specified as independent upon the variable hypothesised as dependent
  • experiment (casual) = absolutely any practical activity carried out with laboratory equipment

We might describe this second meaning as an alternative conception of 'experiment', a way of understanding that is inconsistent with the scientific meaning. (Just as there are common alternative conceptions of other 'nature of science' concepts such as 'theory').

I would imagine Winberg and Berg were well aware of what an experiment is, although their casual use of language might suggest a lack of rigour in thinking with the term. They refer to having "both control and experiment groups" in their studies, and refer to "the experimental chronology" of their research design. So, they certainly seem to think of their work as a kind of experiment.

Experimental design

In a true experiment, a sample is randomly drawn from a population of interest (say, first year undergraduate chemistry students; or, perhaps, first year undergraduate chemistry students attending Swedish Universities, or… 3) and assigned randomly to the conditions being compared. Providing a genuine form of random assignment is used, then inferential statistical tests can guide on whether any differences found between groups at the end of an experiment should be considered statistically significant. 4

"Statistics can only indicate how likely a measured result would occur by chance (as randomisation of units of analysis to different treatments can only make uneven group composition unlikely, not impossible)…Randomisation cannot ensure equivalence between groups (even if it makes any imbalance just as likely to advantage either condition)"

Taber, 2019, p.73

Inferential statistics can be used to test for statistical significance in experiments – as long as the 'units of analysis' (e.g., students) are randomly assigned to the experimental and control conditions.
(Figure from Taber, 2019)

That is, if there are differences that the statistical tests suggest are very unlikely to have happened by chance, then they are very unlikely to be due to an initial difference between the groups in the two conditions – as long as the groups were the result of random assignment. But that is a very important proviso.
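To make this logic concrete, here is a minimal sketch in Python (entirely hypothetical data, not from Winberg and Berg's study) of a simple randomisation (permutation) test. The point to note is that the test models 'chance' as random re-assignment to groups – which is only meaningful if the original assignment really was random:

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_iterations=10_000, seed=0):
    """Estimate how often a difference of means at least as large as the
    observed one would arise purely from random assignment to groups."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    at_least_as_extreme = 0
    for _ in range(n_iterations):
        rng.shuffle(pooled)  # simulate a fresh random assignment
        simulated = abs(mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):]))
        if simulated >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_iterations

# Entirely hypothetical post-test scores for two randomly assigned groups
treatment = [14, 16, 15, 18, 17, 15, 16, 19, 14, 17]
control = [13, 15, 14, 15, 16, 13, 14, 15, 12, 16]

print(f"Estimated p-value: {permutation_test(treatment, control):.3f}")
```

If the real groups were never randomly assigned, then the 'p value' reported by such a procedure (or by any standard inferential test) answers a question about a randomisation that never actually happened.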

There are two aspects to this need for randomisation:

  • to be able to suggest any differences found reflect the effects of the intervention, then there should be random assignment to the two (or more) conditions
  • to be able to suggest the results reflect what would probably be found in a wider population, the sample should be randomly selected from the population of interest 3

Studies in education seldom meet the requirements for being true experiments
(Figure from Taber, 2019)

In education, it is not always possible to use random assignment, so true experiments are then not possible. However, so-called 'quasi-experiments' may be possible where differences between the outcomes in different conditions may be understood as informative, as long as there is good reason to believe that even without random assignment, the groups assigned to the different conditions are equivalent.

In this specific research, that would mean having good reason to believe that without the intervention (the prelab):

  • students in both groups would have asked overall equivalent (in terms of the analysis undertaken in this study) questions in the lab.;
  • students in both groups would have been judged as displaying overall equivalent subject knowledge.

Often in research where a true experiment is not possible some kind of pre-testing is used to make a case for equivalence between groups.

Two control groups that were out of control

In Winberg and Berg's research there were two studies where comparisons were made between 'experimental' and 'control' conditions:

Study 1 – Experimental: n=78, first-year students, following completion of their first chemistry course in 2001; Control: n=97, students who had been interviewed by the researchers during the same course in the previous year.

Study 2 – Experimental: n=21 (of 58 in cohort); Control: n=37 (of 58 in the same cohort).

In the first study, a comparison was made between the cohort where the innovation was introduced and a cohort from the previous year. All other things being equal, it seems likely these two cohorts were fairly similar. But in education all things are seldom equal, so there is no assurance they were similar enough to be considered equivalent.

In the second study

"Students were divided into treatment (n = 21) and control (n = 37) groups. Distribution of students between the treatment and control groups was not controlled by the researchers".

Winberg & Berg, 2007

So, some factor(s) external to the researchers divided the cohort into two groups – and the reader is told nothing about the basis for this, nor even if the two groups were assigned to the treatments randomly. 5 The authors report that the cohort "comprised prospective molecular biologists (31%), biologists (51%), geologists (7%), and students who did not follow any specific program (11%)", and so it is possible the division into two uneven-sized groups was based on timetabling constraints, with students attending chemistry lab sessions according to their availability based on specialism. But that is just a guess. (It is usually better when the reader of a research report is not left to speculate about procedures and constraints.)

What is important for a reader to note is that in these studies:

  • the researchers were not able to assign learners to conditions randomly;
  • nor were the researchers able to offer any evidence of equivalence between groups (such as near identical pre-test scores);
  • so, the requirements for inferring significance from statistical tests were not met;
  • so, claims in the paper about finding statistically significant differences between conditions cannot be justified, given the research design;
  • and therefore the conclusions presented in the paper are strictly not valid.

If students are not randomly assigned to conditions, then any statistically unlikely difference found at the end of an experiment cannot be assumed to be due to the intervention, rather than to some systematic initial difference between the groups.
(Figure adapted from Taber, 2019)


This is a shame, because this is in many ways an interesting paper, and much thought and care seems to have been taken about the collection and analysis of meaningful data. Yet, drawing conclusions from statistical tests comparing groups that might never have been similar in the first place is like finding that careful use of a vernier scale shows that after a period of watering plant A, plant A is taller than plant B – having been very careful to make sure plant A was watered regularly with carefully controlled volumes, while plant B was not watered at all – when you did not think to check how tall the two plants were before you started watering plant A.

In such a scenario we might be tempted to assume plant A has actually become taller because it had been watered; but that is just applying what we had conjectured should be the case, and we would be mistaking our expectations for experimental evidence.

Work cited:

Notes:

1 The part of the brain where we can consciously mentipulate ideas is called the working memory (WM). Research suggests that WM has a very limited capacity, in the sense that people can only hold in mind a very small number of different things at once. (These 'things', however, are somewhat subjective – a complex idea that is treated as a single 'thing' in the WM of an expert can overload a novice.) This limit to WM is considered to be one of the most substantial constraints on effective classroom learning. This is also, then, one of the key research findings informing the design of effective teaching.

Read about working memory

Read about key ideas for teaching in accordance with learning theory

How fat is your memory? – read about a chemical analogy for working memory


2 The organisation has seemingly spotted that the USA is only one part of the world, and now describes itself as a global organisation for improving science education through research.


3 There is no reason why an experiment cannot be carried out on a very specific population, such as first year undergraduate chemistry students attending a specific Swedish university, such as, say, Umeå University. However, if researchers intend their study to have results generalisable beyond their specific research contexts (say, to first year undergraduate chemistry students attending any Swedish university) then it is important to have a representative sample of that population.

Read about populations of interest in research

Read about generalisation from research studies


4 It might be assumed that scientists and researchers know what is meant by random, and how to undertake random assignment. Sadly, the literature suggests that in practice the term 'randomly' is sometimes used in research reports to mean something like 'arbitrarily' (Taber, 2013), which falls short of being random.

Read about randomisation in research


5 Arguably, even if the two groups were assigned randomly, there is only one 'unit of analysis' in each condition, as they were assigned as groups. That is, for statistical purposes, the two groups have size n=1 and n=1, which would not allow statistical significance to be found: e.g., see 'Quasi-experiment or crazy experiment?'

Poincaré, inertia, and a common misconception

A historical, and ongoing, alternative conception


Keith S. Taber


"…and eleventhly Madame Curie…" Henri Poincaré enjoying small talk at a physics conference (image source: 'Marie Curie and Poincaré talk at the 1911 Solvay Conference', Wikipedia)


One of the most fundamental ideas in physics, surely taught in every secondary school science curriculum around the world, is also the focus of one of the most common alternative conceptions documented in science education. Inertia. Much research in the latter part of the twentieth century has detailed how most people have great trouble with this very simple idea.

But that would likely not have surprised the nineteenth century French physicist (and mathematician and philosopher) Henri Poincaré in the least. Over a century ago he had this to say about the subject of Newton's first law, inertia,

"The principle of inertia. A body acted on by no force can only move uniformly in a straight line.

Is this a truth imposed a priori upon the mind? If it were so, how could the Greeks have failed to recognise it? How could they have believed that motion stops when the cause which gave birth to it ceases? Or again that every body if nothing prevents, will move in a circle, the noblest of motions?

If it is said that the velocity of a body can not change if there is no reason for it to change, could it not be maintained just as well that the position of this body can not change, or that the curvature of its trajectory can not change, if no external cause intervenes to modify them?

Is the principle of inertia, which is not an a priori truth, therefore an experimental fact? But has any one ever experimented on bodies withdrawn from the action of every force? and, if so, how was it known that these bodies were subjected to no force?"

Poincaré, 1902/1913/2015

There is quite a lot going on in that quote, so it is worth breaking it down.

The principle of inertia

"The principle of inertia. A body acted on by no force can only move uniformly in a straight line."

Poincaré, 1902/1913/2015

We might today choose to phrase this differently – at least in teaching. Perhaps along the lines that

a body remains at rest, or moving with uniform motion, unless it is acted upon by a net (overall) force

That's a pretty simple idea.

  • If you want something that is stationary to start moving, you need to apply a force to it. Otherwise it will remain stationary. And:
  • If you want something that is moving with constant velocity to slow down (decelerate), speed up (accelerate), or change direction, you need to apply a force to it. Otherwise it will carry on moving in the same direction at the same speed.
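We might also express the principle symbolically (my gloss, not Poincaré's wording), for a body of constant mass observed from an inertial frame of reference:

$$\sum \vec{F} = \vec{0} \iff \frac{\mathrm{d}\vec{v}}{\mathrm{d}t} = \vec{0}$$

That is, zero net force and unchanging velocity (in both magnitude and direction – including the special case of zero velocity) go together.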

A simple idea, but one which most people struggle with!

It is worth noting that Poincaré's formulation seems simpler than the versions more commonly presented in school today. He does not make reference to a body at rest; and we might detect a potential ambiguity in what is meant by "can only move uniformly in a straight line".

Is the emphasis:

  • can only move uniformly in a straight line:
    • i.e., 〈 can only 〉 〈 move uniformly in a straight line 〉, or
  • can only move uniformly in a straight line:
    • i.e., 〈 can only move 〉 〈 uniformly in a straight line 〉

That is, must such a body "move uniformly in a straight line" or must such a body, if moving, "move uniformly in a straight line"? A body acted on by no force may be stationary.

Perhaps this is less ambiguous in the original French? But I suspect that, as a physicist, Poincaré did not particularly see the body at rest as being much of a special case.

To most people the distinction between something stationary and something moving is very salient (evolution has prepared us to notice movement). But to a physicist the more important distinction is between any body at constant velocity, and one accelerating* – and a body not moving has constant velocity (of 0 m s⁻¹!)

*and for a physicist accelerating usually includes decelerating, as that is just acceleration with a negative value, or indeed positive acceleration in a different direction. These 'simplifications' seem very neat – to the initiated (but perhaps not to novices!)

A historical scientific conception

Poincaré then asks:

"Is this a truth imposed a priori upon the mind? If it were so, how could the Greeks have failed to recognise it? How could they have believed that motion stops when the cause which gave birth to it ceases?"

Poincaré, 1902/1913/2015

Poincaré asks a rhetorical question: "Is this a truth imposed a priori upon the mind?" Rhetorical, as he immediately suggests the answer: no, it cannot be.

Science is very much an empirical endeavour. The world is investigated by observation, indeed often observation of the effects of interventions (i.e., experiments).

In this way, it diverges from a rationalist approach to understanding the world based on reflection and reasoning that occurs without seeking empirical evidence.

An aside on simulations and perpetual change

Yet, even empirical science depends on some (a priori) metaphysical commitments that cannot themselves be demonstrated by scientific observation (e.g., Taber, 2013). As one example, the famous 'brain in a vat' scenario (that informed films such as The Matrix) asks how we could know that we really experience an external world rather than a very elaborate virtual reality fed directly into our central nervous system (assuming we have such a thing!) 1

Science only makes sense if we believe that the world we experience is an objective reality originating outside our own minds
(Image by Gerd Altmann from Pixabay)

Despite this, scientists operate on the assumption that this is a physical world (that we all experience), and one that has a certain degree of stability and consistency. 2 The natural scientist has to assume this is not a capricious universe if science (a search for the underlying order of the world) is to make sense!

It may seem that this (that we live in an objective physical world that has a certain degree of stability and consistency) is obviously the case, as our observations of the world find this stability. But not really: rather, we impose an assumption of an underlying stability, and interpret accordingly. The sun 'rises' every day. (We see stability.) But the amount of daylight changes each day. (We observe change, but assume, and look for, and posit, some underlying stability to explain this.)

Continental drift, new comets, evolution of new species and extinction of others, supernovae, the appearance of HIV and COVID, increasing IQ (disguised by periodically renormalising scoring), climate change, the expanding universe, plant growth, senile dementia, rotting fruit, printers running out of ink, lovers falling out of love, et cetera,…are all assumed to be (and subsequently found to be) explainable in terms of underlying stable and consistent features of the world!

But it would be possible to consider nothing stays the same, and seek to explain away any apparent examples of stability!

Parmenides thought change was impossible

Heraclitus thought everything was in flux

An a priori truth?

So Poincaré was asking if the principle of inertia is something that appears to us as a given: does inertia seem a necessary and obvious feature of the world (as it probably does to most physicists – but that is after years of indoctrination into that perspective)?

But, Poincaré was pointing out, we know that for centuries people did not think that objects not subject to any force would continue to move with constant velocity.

There were (considered to be) certain natural motions, and these had a teleological aspect. So, heavy objects, which were considered to be mainly earth, naturally fell down to their natural place on the ground. 3 Once there, mission accomplished (so to speak), they would stop moving. No further explanation was considered necessary.

Violent motions were (considered to be) different as they needed an active cause – such as a javelin moving through the air because someone had thrown it. Yet, clearly (it was believed), the athlete could only impart a finite motion to the javelin, which it would soon exhaust, so the javelin would (naturally) stop soon enough.

Today, such ideas are seen as alternative conceptions (misconceptions), but for hundreds of years these ideas were largely taken as self-evident and secure principles describing aspects of the world. The idea that the javelin might carry on moving for ever if it was 'left to its own devices' seemed absurd. (And to most people today who are not physicists or science teachers, it probably still does!)

An interesting question is whether, and if so to what extent, the people who become physicists and physics teachers start out with intuitions more aligned with the principles of physics than most of their classmates.

"Assuming that there is significant variation in the extent to which our intuitive physics matches what we are taught in school, I would expect that most physics teachers are among those to whom the subject seemed logical and made good sense when they were students. I have no evidence for this, but it just seems natural that these students would have enjoyed and continued with the subject.

If I am right about this intuition, then this may be another reason why physics is so hard for some of our students. Not only do they have to struggle with subject matter that seems counterintuitive, but the very people who are charged with helping them may be those who instinctively think most differently from the way in which they do."

Taber, 2004, p.124

Another historical scientific conception

And Poincaré went on:

"Or again that every body if nothing prevents, will move in a circle, the noblest of motions?"

Poincaré, 1902/1913/2015

It was also long thought that in the heavens bodies naturally moved spontaneously in circles – a circle being a perfect shape, and the heavens being a perfect place.

Orbital motion – once viewed as natural (i.e., not requiring any further explanation) and circular in 'the heavens'.
(Image by WikiImages from Pixabay: Body sizes and separations not to the same scale!)

It is common for people to feel that what seems natural does not need further explanation (Watts & Taber, 1996) – even though most of what we consider natural is likely just familiarity with common phenomena. We start noticing how the floor arrests the motion of falling objects very early in life, so by the time we have language to help reflect on this, we simply explain this as motion stopping because the floor was in the way! Similarly, reaction forces are not obvious when an object rests on another – a desk, a shelf, etc. – as the object cannot fall 'because it is supported'.

Again, we (sic, we the initiated) now think that without an acting centripetal force, an orbiting body would move off at a tangent – but that would have seemed pretty bizarre for much of European history.

The idea that bodies moved in circles (as the fixed stars seemed to do) was maintained despite extensive observational evidence collected over centuries that the planets appeared to do something quite different. Today Kepler's laws are taught in physics, including that the solar system's orbiting bodies move (almost) in ellipses. ('Almost', as the bodies perturb each other a little.)

But when Kepler tried to fit observations to theory by adopting Copernicus's 'heliocentric' model of the Earth and planets orbiting the Sun (Earth and other planets, we would say), he still struggled to make progress for a considerable time because of an unquestioned assumption that the planetary motions had to be circular, or some combination of multiple circles.

Learners' alternative conceptions

These historical ideas are of more than historical interest. Many people – research suggests most people – today share similar intuitions.

  • Objects will naturally come to a stop when they have used up their imparted motion without the need for any forces to act.
  • Something that falls to the floor does not need a force to act on it to stop it moving, as the ground is in its way.
  • Moons and planets continue in orbits because there is no overall force acting on them.

The vast majority of learners come to school science holding versions of such alternative conceptions.

Read about common alternative conceptions related to Newton's first law

Read about common alternative conceptions related to Newton's second law

The majority of learners also leave school holding versions of such alternative conceptions – even if some of them have mastered the ability to usually respond to physics test questions as if they accepted a different worldview.

The idea that objects soon stop moving once the applied force ceases to act may be contrary to physics, but it is not, of course, contrary to common experience – at least not contrary to common experience as most people perceive it.

Metaphysical principles

Poincaré recognised this.

"If it is said that the velocity of a body can not change if there is no reason for it to change [i.e. the principle of inertia],

could it not be maintained just as well that

the position of this body can not change, or

that the curvature of its trajectory can not change,

if no external cause intervenes to modify them?"

Poincaré, 1902/1913/2015 (emphasis added)

After all, as Poincaré pointed out, there seems no reason, a priori (that is, intuitively), to assume the world must work according to the principle of inertia (though some physicists and science teachers who have been indoctrinated over many years may have come to think otherwise – of course, what comes after indoctrination is not a priori!), rather than assuming, say, that force must act for movement to occur and/or that force must act to change an orbit.

Science as an empirical enterprise

Science teachers might reply that our initial intuitions are not the point, because myriad empirical tests have demonstrated the principle of inertia. But Poincaré suggested this was strictly not so,

"Is the principle of inertia, which is not an a priori truth, therefore an experimental fact? But has any one ever experimented on bodies withdrawn from the action of every force? and, if so, how was it known that these bodies were subjected to no force?"

Poincaré, 1902/1913/2015

For example, if we accept the ideas of universal gravitation, then anywhere in the universe a body will be subject to gravitational attractions (that is, forces). A body could only be completely free of these by being in a universe of its own with no other gravitating bodies. Then we might think we could test, in principle at least, whether the body "acted on by no force can only move uniformly in a straight line".

Well, apart from a couple of small difficulties. There would be no observers in this universe to see, as we have excluded all other massive bodies. And if this was the only body there, it would be the only frame of reference available – a frame of reference in which it was always stationary. It would always be at the centre of, and indeed would be the extent of, its universe.

Poincaré and pedagogic awareness

Poincaré was certainly not denying the principle of inertia so fundamental to mechanics. But he was showing that he appreciated that a simple principle which seems (comes to seem?) so basic and obvious to the inducted physics expert:

  • was hard won in the history of science
  • is not 'given' in intuition
  • is not the only possible basic principle on which a mechanics (in some other universe) could be based
  • is contrary to immediate experience (that is, to those who have not been indoctrinated to 'see' resistive forces such as friction acting everywhere)
  • could never be entirely demonstrated in a pure form, but rather must be inferred from experimental tests of more complex situations where we will only deduce the principle of inertia if we assume a range of other principles (about the action of gravitational fields, air resistance, etc.)

Poincaré may have been seen as one of the great physicists of his time, but his own expertise certainly did not stop him appreciating the challenges facing the learner of physics, or indeed the teacher of physics.


Work cited:

Notes

1 With current human technology we cannot achieve this – even the best virtual worlds clearly do not yet come close to the real one! But that argument falls away if 'the real' world we experience is such a virtual reality created by very advanced technology, and what we think of as virtual worlds are low definition simulations being created within that! (After all, when people saw the first jumpy black-and-white movies, they then came out from the cinema into a colourful, smooth and high definition world.) If you have ever awoken from a dream, only to later realise you were still asleep, and had been dreaming of being asleep in the dream, then you may appreciate how such nesting of worlds could work.

Probably no one actually believes they are a brain in a vat, but how would we know? There is an argument that:

  • 1) the evolution of complex life is a very slow process that requires a complex ecosystem, but
  • 2) once humans (or indeed non-humans) have the technology to create convincing virtual worlds this can be done very much more quickly, and with much less resource [i.e., than the evolution of the physical world within which the programmers of the simulations themselves live]. So,
  • 3) if we are living in a phase of the universe where such technology has been achieved, then we would expect there to be a great many more such virtual worlds than planets inhabited by life forms with the level of self-consciousness to think about whether they are in a simulation. 4 So,
  • 4) [if we are living in a phase of the universe where such technology has been achieved] we would be much more likely to be living in one of these worlds (a character in a very complex simulation) than an actual organic being. 5

2 That is, not a simulation where an adolescent programmer is going to suddenly increase gravity or add a new fundamental force just to make things more interesting.


3 Everything on earth was considered to be made up of different proportions of the four elements, which in terms of increasing rarity were earth, water, air and fire. The rocks of the earth were predominately the element earth – and objects that were mainly earth fell to their natural place. (Rarity in this context means the inverse of density, not scarcity.)


4 When I was a child (perhaps in part because I think I started Sunday School before I could start 'proper' school), I used to muse about God being able to create everything, and being omniscient – although I am pretty sure I did not use that term! It seemed to me (and, sensibly, I do not think I shared this at Sunday School) that if God knew everything and was infallible, then he did not need to actually create the world as a physical universe, but rather just think what would happen. For God, that would work just as well, as a perfect mind could imagine things exactly as they would be in exquisite detail and with absolute precision. So, I thought I might just be an aspect of the mind of God – so part of a simulation in effect. This was a comforting rather than worrying thought – surely there is no safer place to be than in the mind of God?

Sadly, I grew to be much less sure of God (the creation seems just as incredible – in the literal sense – either way), but still think that, for God, thinking it would be as good as (if not the same as) making it. I suspect some theologians would not entirely dismiss this.

If I am just a character in someone's simulation, I'd rather it was that of a supreme being than some alien adolescent likely to abandon my world at the first sign of romantic interest from a passing conspecific.


5 Unless we assume a dystopian, Matrix-like simulation, the technology has to be able to create characters (sub-routines?) with self-awareness – which goes some way beyond just a convincing simulation, as it also requires components complex enough to be convinced about their own existence, as well as the reality of the wider simulation!

Quasi-experiment or crazy experiment?

Trustworthy research findings are conditional on getting a lot of things right


Keith S. Taber


A good many experimental educational research studies that compare treatments across two classes or two schools are subject to potentially confounding variables that invalidate study findings and make any consequent conclusions and recommendations untrustworthy.

I was looking for research into the effectiveness of P-O-E (predict-observe-explain) pedagogy, a teaching technique that is believed to help challenge learners' alternative conceptions and support conceptual change.

Read about the predict-observe-explain approach



One of the papers I came across reported identifying, and then using P-O-E to respond to, students' alternative conceptions. The authors reported that

The pre-test revealed a number of misconceptions held by learners in both groups: learners believed that salts 'disappear' when dissolved in water (37% of the responses in the 80% from the pre-test) and that salt 'melts' when dissolved in water (27% of the responses in the 80% from the pre-test).

Kibirige, Osodo & Tlala, 2014, p.302

The references to "in the 80%" did not seem to be explained anywhere. Perhaps only 80% of students responded to the open-ended questions included as part of the assessment instrument (discussed below), so the authors gave the incidence as a proportion of those responding? Ideally, research reports are explicit about such matters, avoiding the need for readers to speculate.

The authors concluded from their research that

"This study revealed that the use of POE strategy has a positive effect on learners' misconceptions about dissolved salts. As a result of this strategy, learners were able to overcome their initial misconceptions and improved on their performance….The implication of these results is that science educators, curriculum developers, and textbook writers should work together to include elements of POE in the curriculum as a model for conceptual change in teaching science in schools."

Kibirige, Osodo & Tlala, 2014, p.305

This seemed pretty positive. As P-O-E is an approach which is consistent with 'constructivist' thinking that recognises the importance of engaging with learners' existing thinking, I am probably biased towards accepting such conclusions. I would expect techniques such as P-O-E, when applied carefully in suitable curriculum contexts, to be effective.

Read about constructivist pedagogy

Yet I also have a background in teaching research methods and in acting as a journal editor and reviewer – so I am not going to trust the conclusion of a research study without having a look at the research design.


All research findings are subject to caveats and provisos: good practice in research writing is for the authors to discuss them – but often they are left unmentioned for readers to spot. (Read about drawing conclusions from studies)


Kibirige and colleagues describe their study as a quasi-experiment.

Experimental research into teaching approaches

If one wants to see if a teaching approach is effective, then it seems obvious that one needs to do an experiment. If we can experimentally compare different teaching approaches we can find out which are more effective.

An experiment allows us to make a fair comparison by 'control of variables'.

Read about experimental research

Put very simply, the approach might be (a code sketch follows the list):

  • Identify a representative sample of an identified population
  • Randomly assign learners in the sample to either an experimental condition or a control condition
  • Set up two conditions that are alike in all relevant ways, apart from the independent variable of interest
  • After the treatments, apply a valid instrument to measure learning outcomes
  • Use inferential statistics to see if any difference in outcomes across the two conditions reaches statistical significance
  • If it does, conclude that
    • the effect is likely to be due to the difference in treatments
    • and will apply, on average, to the population that has been sampled
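
To make that logic concrete, here is a minimal sketch of the ideal workflow in Python. The data are simulated and every number is purely illustrative – this shows the shape of the argument, not a real analysis:

```python
import random
from scipy import stats

random.seed(0)

# A (hypothetical) sample of 200 learners drawn from the population of interest
sample = list(range(200))

# Randomly assign each learner to a condition
random.shuffle(sample)
experimental, control = sample[:100], sample[100:]

# ...both groups are taught, differing only in the independent variable...
# Here we simply simulate post-test scores (illustrative numbers only)
exp_scores = [random.gauss(60, 10) for _ in experimental]
ctrl_scores = [random.gauss(55, 10) for _ in control]

# Inferential test: is the difference unlikely to be due to chance?
t, p = stats.ttest_ind(exp_scores, ctrl_scores)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Difference reaches statistical significance at p < 0.05")
```

With random assignment, a significant difference can reasonably be attributed to the treatments; without it, as discussed below, the inference breaks down.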

Now, I expect anyone reading this who has worked in schools, and certainly anyone with experience in social research (such as research into teaching and learning), will immediately recognise that in practice it is very difficult to actually set up an experiment into teaching which fits this description.

Nearly always (if indeed not always!) experiments to test teaching approaches fall short of this ideal model to some extent. This does not mean such studies cannot be useful – especially where there are many of them, with compensatory strengths and weaknesses, offering similar findings (Taber, 2019a) – but one needs to ask how closely published studies fit the ideal of a good experiment. Work in high-quality journals is often expected to offer readers guidance on this, but readers should check for themselves to see if they find a study convincing.

So, how convincing do I find this study by Kibirige and colleagues?

The sample and the population

If one wishes a study to be informative about a population (say, chemistry teachers in the UK; or 11-12 year-olds in state schools in Western Australia; or pharmacy undergraduates in the EU; or whatever) then it is important to either include the full population in the study (which is usually only feasible when the population is a very limited one, such as graduate students in a single university department) or to ensure the sample is representative.

Read about populations of interest in research

Read about sampling a population

Kibirige and colleagues refer to their participants as a sample

"The sample consisted of 93 Grade 10 Physical Sciences learners from two neighbouring schools (coded as A and B) in a rural setting in Moutse West circuit in Limpopo Province, South Africa. The ages of the learners ranged from 16 to 20 years…The learners were purposively sampled."

Kibirige, Osodo & Tlala, 2014, p.302

Purposive sampling means selecting participants according to some specific criteria, rather than sampling a population randomly. It is not entirely clear precisely what the authors mean by this here – which characteristics they selected for. Also, there is no statement of the population being sampled – so the reader is left to guess what population the sample is a sample of. Perhaps "Grade 10 Physical Sciences" students – but, if so, universally, or in South Africa, or just within Limpopo Province, or indeed just the Moutse West circuit? Strictly the notion of a sample is meaningless without reference to the population being sampled.

A quasi-experiment

A key notion in experimental research is the unit of analysis

"An experiment may, for example, be comparing outcomes between different learners, different classes, different year groups, or different schools…It is important at the outset of an experimental study to clarify what the unit of analysis is, and this should be explicit in research reports so that readers are aware what is being compared."

Taber, 2019a, p.72

In a true experiment the 'units of analysis' (which in different studies may be learners, teachers, classes, schools, exam papers, lessons, textbook chapters, etc.) are randomly assigned to conditions. Random assignment makes systematic differences between the groups unlikely, and so allows inferential statistics to be used to directly compare measures made in the different conditions and draw meaningful conclusions about whether outcomes are statistically significant.

Random assignment is sometimes possible in educational research, but often researchers are only able to work with existing groupings.

Kibirige, Osodo & Tlala describe their approach as using a quasi-experimental design as they could not assign learners to groups, but only compare between learners in two schools. This is important, as it means that the 'units of analysis' are not the individual learners, but the groups: in this study one group of students in one school (n=1) is being compared with another group of students in a different school (n=1).

The authors do not make it clear whether they assigned the schools to the two teaching conditions randomly – or whether some other criterion was used. For example, if they chose school A to be the experimental school because they knew the chemistry teacher in the school was highly skilled, always looking to improve her teaching, and open to new approaches; whereas the chemistry teacher in school B had a reputation for wishing to avoid doing more than was needed to be judged competent – that would immediately invalidate the study.

Compensating for not using random assignment

When it is not possible to randomly assign learners to treatments, researchers can (a) use statistics that take into account measurements on each group made before, as well as after, the treatments (that is, a pre-test – post-test design); (b) offer evidence to persuade readers that the groups are equivalent before the experiment. Kibirige, Osodo and Tlala seek to use both of these steps.

Do the groups start as equivalent?

Kibirige, Osodo and Tlala present evidence from the pre-test to suggest that the learners in the two groups are starting at about the same level. In practice, pre-tests seldom lead to identical outcomes for different groups. It is therefore common to use inferential statistics to test whether there is a statistically significant difference between pre-test scores in the groups. That could be reasonable, if there were an agreed criterion for deciding just how close scores should be to be seen as equivalent. In practice, many researchers only check that the differences do not reach statistical significance at the p < 0.05 level: that is, they look to see if there are strong differences, and, if not, declare this to be (or implicitly treat this as) equivalence!

This is clearly an inadequate measure of equivalence as it will only filter out cases where there is a difference so large it is found to be very unlikely to be a chance effect.
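
A quick simulation (mine, not the authors' data) illustrates why this is inadequate: two groups can be constructed to be genuinely non-equivalent, yet with samples of this size a t-test will frequently fail to find a 'significant' difference:

```python
import random
from scipy import stats

random.seed(1)

# Two groups whose true means differ by a third of a standard deviation -
# a real, non-trivial difference in starting attainment
group_a = [random.gauss(50, 15) for _ in range(45)]
group_b = [random.gauss(55, 15) for _ in range(45)]

t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.3f}")
# With groups this small the test is underpowered: p frequently exceeds
# 0.05 ('no significant difference') even though the groups were, by
# construction, NOT equivalent.
```

Failing to reject 'no difference' is not the same as demonstrating equivalence – a genuine equivalence test requires specifying, in advance, how large a difference would matter.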


If we want to make sure groups start as 'equivalent', we cannot simply look to exclude the most blatant differences. (Original image by mcmurryjulie from Pixabay)

See 'Testing for initial equivalence'


We can see this in Kibirige and colleagues' study, where the researchers list mean scores and standard deviations for each question on the pre-test. They report that:

"The results (Table 1) reveal that there was no significant difference between the pre-test achievement scores of the CG [control group] and EG [experimental group] for questions (Appendix 2). The p value for these questions was greater than 0.05."

Kibirige, Osodo & Tlala, 2014, p.302

Now this paper is published "licensed under Creative Commons Attribution 3.0 License" which means I am free to copy from it here.



According to the results table, several of the items (1.2, 1.4, 2.6) did lead to statistically significantly different response patterns in the two groups.

Most of these questions (1.1-1.4; 2.1-2.8; discussed below) are objective questions, so although no marking scheme was included in the paper, it seems they were marked as correct or incorrect.

So, let's take as an example question 2.5 where readers are told that there was no statistically significant difference in the responses of the two groups. The mean score in the control group was 0.41, and in the experimental group was 0.27. Now, the paper reports that:

"Forty nine (49) learners (31 males and 18 females) were from school A and acted as the experimental group (EG) whereas the control group (CG) consisted of 44 learners (18 males and 26 females) from school B."

Kibirige, Osodo & Tlala, 2014, p.302

So, according to my maths,


                          Correct responses    Incorrect responses
School A (49 students)    (0.27 ➾) 13          36
School B (44 students)    (0.41 ➾) 18          26

"The achievement of the EG and CG from pre-test results were not significantly different which suggest that the two groups had similar understanding of concepts" (p.305).

Pre-test results for an item with no statistically significant difference between groups (offered as evidence of 'similar' levels of initial understanding in the two groups)

While, technically, there may have been no statistically significant difference here, I think inspection is sufficient to suggest this does not mean the two groups were initially equivalent in terms of performance on this item.
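
As a check on this reading (the counts below are my reconstruction from the reported means and group sizes, not figures given in the paper), a chi-squared test – a test actually suited to binary right/wrong data – does indeed fail to reach significance here, even though the proportions differ noticeably:

```python
from scipy.stats import chi2_contingency

# Reconstructed correct/incorrect counts for item 2.5 (my inference)
table = [
    [13, 36],  # School A, experimental group: mean 0.27 of 49 students
    [18, 26],  # School B, control group: mean 0.41 of 44 students
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
# p > 0.05, so 'not statistically significant' - yet about 27% of one
# group and 41% of the other answered correctly: hardly equivalent.
```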


Data that is normally distributed falls on a 'bell-shaped' curve

(Image by mcmurryjulie from Pixabay)


Inspection of this graphic also highlights something else. Student's t-test (used by the authors to produce the results in their table 1) is a parametric test. That means it can only be used when the data fit certain criteria. The data sample should be randomly selected (not true here) and normally distributed. A normal distribution means data is distributed in a bell-shaped Gaussian curve (as in the image in the blue circle above). If Kibirige, Osodo & Tlala were applying the t-test to data distributed as in my graphic above (a binary distribution where answers were either right or wrong) then the test was invalid.

So, to summarise, the authors suggest there "was no significant difference between the pre-test achievement scores of the CG and EG for questions", although sometimes there was (according to their table); and they used the wrong test to check for this; and in any case lack of statistical significance is not a sufficient test for equivalence.

I should note that the journal does claim to use peer review to evaluate submissions to see if they are ready for publication!

Comparing learning gains between the two groups

At one level equivalence might not be so important, as the authors used an ANCOVA (Analysis of Covariance) test, which tests for differences at post-test taking into account the pre-test. Yet this test also has assumptions that need to be tested for and met; here they seem to have just been assumed.
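
For readers unfamiliar with ANCOVA: it amounts to regressing the post-test score on the pre-test score plus a group indicator. A minimal sketch (hypothetical data and variable names) using the statsmodels library:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pre-/post-test scores for learners in two conditions
df = pd.DataFrame({
    "pre":   [12, 15,  9, 14, 11, 13, 10, 16],
    "post":  [20, 24, 15, 22, 21, 30, 25, 33],
    "group": ["CG", "CG", "CG", "CG", "EG", "EG", "EG", "EG"],
})

# ANCOVA as regression: post-test outcome, pre-test covariate, group factor
model = smf.ols("post ~ pre + C(group)", data=df).fit()
print(model.summary())

# The assumptions (e.g., approximately normal residuals, and the same
# pre/post relationship in both groups) should be checked, not assumed.
```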

However, to return to an even more substantive point I made earlier: as the learners were not randomly assigned to the two different conditions/treatments, what should be compared are the two school-based groups (i.e., the unit of analysis should be the school group), but that (i.e., a sample of 1 class, rather than 40+ learners, in each condition) would not facilitate using inferential statistics to make a comparison. So, although the authors conclude

"that the achievement of the EG [taking n=49] after treatment (mean 34. 07 ± 15. 12 SD) was higher than the CG [taking n =44] (mean 20. 87 ± 12. 31 SD). These means were significantly different"

Kibirige, Osodo & Tlala, 2014, p.303

the statistics are testing the outcomes as if 49 units independently experienced one teaching approach and 44 independently experienced another. Now, I do not claim to be a statistics expert, and I am aware that most researchers only have a limited appreciation of how and why statistical tests work. For most readers, then, a more convincing argument may be made by focussing on the control of variables.

Controlling variables in educational experiments

The ability to control variables is a key feature of laboratory science, and is critical to experimental tests. Control of variables, even identification of relevant variables, is much more challenging outside of a laboratory in social contexts – such as schools.

In the case of Kibirige, Osodo & Tlala's study, we can set out the overall experimental design as follows


Independent variable: Teaching approach
  – predict-observe-explain (experimental)
  – lectures (comparison condition)
Dependent variable: Learning gains
Controlled variable(s): Anything other than teaching approach which might make a difference to student learning

Variables in Kibirige, Osodo & Tlala's study

The researchers set up the two teaching conditions, measure learning gains, and need to make sure any other factors which might have an effect on learning outcomes, so-called confounding variables, are controlled so that they are the same in both conditions.

Read about confounding variables in research

Of course, we cannot be sure what might act as a confounding variable, so in practice we may miss something which we do not recognise is having an effect. Here are some possibilities based on my own (now dimly recalled) experience of teaching in school.

The room may make a difference. Some rooms are

  • spacious,
  • airy,
  • well illuminated,
  • well equipped,
  • away from noisy distractions
  • arranged so everyone can see the front, and the teacher can easily move around the room

Some rooms have

  • comfortable seating,
  • a well positioned board,
  • good acoustics

Others, not so.

The timetable might make a difference. Anyone who has ever taught the same class of students at different times in the week might (will?) have noticed that a Tuesday morning lesson and a Friday afternoon lesson are not always equally productive.

Class size may make a difference (here 49 versus 44).

Could gender composition make a difference? Perhaps it was just me, but I seem to recall that classes of mainly female adolescents had a different nature than classes of mainly male adolescents. (And perhaps the way I experienced those classes would have been different if I had been a female teacher?) Kibirige, Osodo and Tlala report the sex of the students, but assuming that can be taken as a proxy for gender, the gender ratios were somewhat different in the two classes.


The gender make up of the classes was quite different: might that influence learning?

School differences

A potentially major confounding variable is school. In this study the researchers report that the schools were "neighbouring" and that

Having been drawn from the same geographical set up, the learners were of the same socio-cultural practices.

Kibirige, Osodo & Tlala, 2014, p.302

That clearly makes more sense than choosing two schools from different places with different demographics. But anyone who has worked in schools will know that two neighbouring schools serving much the same community can still be very different. Different ethos, different norms, and often different levels of outcome. Schools A and B may be very similar (but the reader has no way to know), but when comparing between groups in different schools it is clear that school could be a key factor in group outcome.

The teacher effect

Similar points can be made about teachers – they are all different! Does ANY teacher really believe that one can swap one teacher for another without making a difference? Kibirige, Osodo and Tlala do not tell readers anything about the teachers, but as students were taught in their own schools the default assumption must be that they were taught by their assigned class teachers.

Teachers vary in terms of

  • skill,
  • experience,
  • confidence,
  • enthusiasm,
  • subject knowledge,
  • empathy levels,
  • insight into their students,
  • rapport with classes,
  • beliefs about teaching and learning,
  • teaching style,
  • disciplinary approach
  • expectations of students

The same teacher may perform at different levels with different classes (preferring to work with different grade levels, or simply getting on/not getting on with particular classes). Teachers may have uneven performance across topics. Teachers differentially engage with and excel in different teaching approaches. (Even if the same teacher had taught both groups we could not assume they were equally skilful in both teaching conditions.)

The teacher variable is likely to be a major difference between the groups.

Meta-effects

Another conflating factor is the very fact of the research itself. Students may welcome a different approach because it is novel and a change from the usual diet (or alternatively they may be nervous about things being done differently) – but such 'novelty' effects would disappear once the new way of doing things became established as normal. In which case, it would be an effect of the research itself and not of what is being researched.

Perhaps even more powerful are expectancy effects. If researchers expect an innovation to improve matters, then these expectations get communicated to those involved in the research and can themselves have an effect. Expectancy effects are so well demonstrated that in medical research double-blind protocols are used so that neither patients nor the health professionals they directly engage with in the study know who is getting which treatment.

Read about expectancy effects in research

So, we might revise the table above:


Independent variable: Teaching approach
  – predict-observe-explain (experimental)
  – lectures (comparison condition)
Dependent variable: Learning gains
Potentially confounding variables:
  – School effect
  – Teacher effect
  – Class size
  – Gender composition of teaching groups
  – Relative novelty of the two teaching approaches

Variables in Kibirige, Osodo & Tlala's study

Now, of course, these problems are not unique to this particular study. The only way to respond to teacher and school effects of this kind is to do large scale studies, and randomly assign a large enough number of schools and teachers to the different conditions so that it becomes very unlikely there will be systematic differences between treatment groups.

A good many experimental educational research studies that compare treatments across two classes or two schools are subject to potentially confounding variables that invalidate study findings and make any consequent conclusions and recommendations untrustworthy (Taber, 2019a). Strangely, often this does not seem to preclude publication in research journals. 1

Advice on controls in scientific investigations:

I can probably do no better than to share some advice given to both researchers, and readers of research papers, in an immunology textbook from 1910:

"I cannot impress upon you strongly enough never to operate without the necessary controls. You will thus protect yourself against grave errors and faulty diagnoses, to which even the most competent investigator may be liable if he [or she] fails to carry out adequate controls. This applies above all when you perform independent scientific investigations or seek to assess them. Work done without the controls necessary to eliminate all possible errors, even unlikely ones, permits no scientific conclusions.

I have made it a rule, and would advise you to do the same, to look at the controls listed before you read any new scientific papers… If the controls are inadequate, the value of the work will be very poor, irrespective of its substance, because none of the data, although they may be correct, are necessarily so."

Julius Citron

The comparison condition

It seems clear that in this study there is no strict 'control' of variables, and the 'control' group is better considered just a comparison group. The authors tell us that:

"the control group (CG) taught using traditional methods…

the CG used the traditional lecture method"

Kibirige, Osodo & Tlala, 2014, pp.300, 302

This is not further explained, but if this really was teaching by 'lecturing' then that is not a suitable approach for teaching school-age learners.

This raises two issues.

There is a lot of evidence that a range of active learning approaches (discussion work, laboratory work, various kinds of group work) engages and motivates students more than whole lessons spent listening to a teacher. Therefore any approach which basically involves a mixture of students doing things, discussing things, engaging with manipulatives and resources as well as listening to a teacher, tends to be superior to just being lectured. Good science teaching normally involves lessons sequenced into a series of connected episodes involving different types of student activity (Taber, 2019b). Teacher presentations of the target scientific account are very important, but tend to be effective when embedded in a dialogic approach that allows students to explore their own thinking and takes into account their starting points.

So, comparing P-O-E with lectures (if they really were lectures) may not tell researchers much about P-O-E specifically, as a teaching approach. A better test would compare P-O-E with some other approach known to be engaging.

"Many published studies argue that the innovation being tested has the potential to be more effective than current standard teaching practice, and seek to demonstrate this by comparing an innovative treatment with existing practice that is not seen as especially effective. This seems logical where the likely effectiveness of the innovation being tested is genuinely uncertain, and the 'standard' provision is the only available comparison. However, often these studies are carried out in contexts where the advantages of a range of innovative approaches have already been well demonstrated, in which case it would be more informative to test the innovation that is the focus of the study against some other approach already shown to be effective."

Taber, 2019a, p.93

The second issue is more ethical than methodological. Sometimes in published studies (and I am not claiming I know this happened here, as the paper says so little about the comparison condition) researchers seem to deliberately set up a comparison condition they have good reason to expect is not effective: such as asking a teacher to lecture and not include practical work or discussion work or use of digital learning technologies and so forth. Potentially the researchers are asking the teacher of the 'control' group to teach less effectively than normally to bias the experiment towards their preferred outcome (Taber, 2019a).

This is not only a failure to do good science, but also an abuse of those learners being deliberately subjected to poor teaching. Perhaps in this study the class in School B was habitually taught by being lectured at, so the comparison condition was just what would have occurred in the absence of the research, but this is always a worry when studies report comparison conditions that seem to deliberately disadvantage students. (This paper does not seem to report anything about obtaining voluntary informed consent from participants, nor indeed about how access to the schools was negotiated.)

"In most educational research experiments of the type discussed in this article, potential harm is likely to be limited to subjecting students (and teachers) to conditions where teaching may be less effective, and perhaps demotivating…It can also potentially occur in control conditions if students are subjected to teaching inputs of low effectiveness when better alternatives were available. This may be judged only a modest level of harm, but – given that the whole purpose of experiments to test teaching innovations is to facilitate improvements in teaching effectiveness – this possibility should be taken seriously."

Taber, 2019a, p.94

Validity of measurements

Even leaving aside all the concerns expressed above, the results of a study of this kind depend upon valid measurements. Assessment items must test what they claim to test, and their analysis should be subject to quality control (and preferably blind to which condition a script being analysed derives from). Kibirige, Osodo and Tlala append the test they used in the study (Appendix 2, pp.309-310), which is very helpful in allowing readers to judge at least its face validity. Unfortunately, they do not include a mark/analysis scheme to show what they considered responses worthy of credit.

"The [Achievement Test] consisted of three questions. Question one consisted of five statements which learners had to classify as either true or false. Question two consisted of nine [sic, actually eight] multiple questions which were used as a diagnostic tool in the design of the teaching and learning materials in addressing misconceptions based on prior knowledge. Question three had two open-ended questions to reveal learners' views on how salts dissolve in water (Appendix 1 [sic, 2])."

Kibirige, Osodo & Tlala, 2014, p.302

"Question one consisted of five statements which learners had to classify as either true or false."

Question 1 is fairly straightforward.

1.2: Strictly all salts do dissolve in water to some extent. I expect that students were taught that some salts are insoluble. Often in teaching we start with simple dichotomous models (metal-non metal; ionic-covalent; soluble-insoluble; reversible – irreversible) and then develop these to more continuous accounts that recognise difference of degree. It is possible here then that a student who had learnt that all salts are soluble to some extent might have been disadvantaged by giving the 'wrong' ('True') response…

…although, actually, there is perhaps no excuse for answering 'True' ('All salts can dissolve in water') here, as a later question begins "3.2. Some salts does [sic] not dissolve in water. In your own view what happens when a salt do [sic] not dissolve in water".

Despite the test actually telling students the answer to this item, it seems only 55% of the experimental group, and 23% of the control group, obtained the correct answer on the post-test – precisely the same proportions as on the pre-test!



1.4: Seems to be 'False' as the ions exist in the salt and are not formed when it goes into solution. However, I am not sure if that nuance of wording is intended in the question.

Question 2 gets more interesting.


"Question two consisted of nine multiple questions" (seven shown here)

I immediately got stuck on question 2.2 which asked which formula (singular, not 'formula/formulae', note) represented a salt. Surely, they are all salts?

I had the same problem on 2.4 which seemed to offer three salts that could be formed by reacting acid with base. Were students allowed to give multiple responses? Did they have to give all the correct options to score?

Again, 2.5 offered three salts which could all be made by direct reaction of 'some substances'. (As a student I might have answered A assuming the teacher meant to ask about direct combination of the elements?)

At least in 2.6 there only seemed to be two correct responses to choose between.

Any student unsure of the correct answer in 2.7 might have taken guidance from the charges as shown in the equation given in question 2.8 (although indicated as 2.9).

How I wished they had provided the mark scheme.



The final question in this section asked students to select one of three diagrams to show what happens when a 'mixture' of H₂O and NaCl in a closed container 'react'. (In chemistry, we do not usually consider salt dissolving as a reaction.)

Diagram B seemed to show ion pairs in solution (but why the different form of representation?) Option C did not look convincing as the chloride ions had altogether vanished from the scene and sodium seemed to have formed multiple bonds with oxygen and hydrogens.

So, by a process of elimination, the answer is surely A.

  • But components seem to be labelled Na and Cl (not as ions).
  • And the image does not seem to represent a solution as there is much too much space between the species present.
  • And in salt solution there are many water molecules between solvated ions – missing here.
  • And the figure seems to show two water molecules have broken up, not to give hydrogen and hydroxide ions, but lone oxygen (atoms, ions?)
  • And why is the chlorine shown to be so much larger in solution than it was in the salt? (If this is meant to be an atom, it should be smaller than the ion, not larger. The real mystery is why the chloride ions are shown so much smaller than the smaller sodium ions before solvation occurs, when chloride ions have about double the radii of sodium ions.)

So diagram A is incredible, but still not quite as crazy an option as B and C.

This is all despite

"For face validity, three Physical Sciences experts (two Physical Sciences educators and one researcher) examined the instruments with specific reference to Mpofu's (2006) criteria: suitability of the language used to the targeted group; structure and clarity of the questions; and checked if the content was relevant to what would be measured. For reliability, the instruments were piloted over a period of two weeks. Grade 10 learners of a school which was not part of the sample was used. Any questions that were not clear were changed to reduce ambiguity."

Kibirige, Osodo & Tlala, 2014, p.302

One wonders what the less clear, more ambiguous, versions of the test items were.

Reducing 'misconceptions'

The final question was (or, perhaps better, questions were) open-ended.



I assume (again, it would be good for authors of research reports to make such things explicit) these were the questions that led to claims about the identified alternative conceptions at pre-test.

"The pre-test revealed a number of misconceptions held by learners in both groups: learners believed that salts 'disappear' when dissolved in water (37% of the responses in the 80% from the pre-test) and that salt 'melts' when dissolved in water (27% of the responses in the 80% from the pre-test)."

Kibirige, Osodo & Tlala, 2014, p.302

As the first two (sets of) questions only admit objective scoring, it seems that this data can only have come from responses to Q3. This means that the authors cannot be sure how students are using terms. 'Melt' is often used in an everyday, metaphorical, sense of 'melting away'. This use of language should be addressed, but it may not (for at least some of these learners) be a conceptual error as much as poor use of terminology.

To say that salts disappear when they dissolve does not seem to me a misconception: they do. To disappear means to no longer be visible, and that's a fair description of the phenomenon of salt dissolving. The authors may assume that if learners use the term 'disappear' they mean the salt is no longer present, but literally they are only claiming it is not directly visible.

Unfortunately, the authors tell us nothing about how they analysed the data collected from their test, so the reader has no basis for knowing how they interpreted student responses to arrive at their findings. The authors do tell us, however, that:

"the intervention had a positive effect on the understanding of concepts dealing with dissolving of salts. This improved achievement was due to the impact of POE strategy which reduced learners' misconceptions regarding dissolving of salts"

Kibirige, Osodo & Tlala, 2014, p.305

Yet, oddly, they offer no specific basis for this claim – no figures to show the level at which "learners believed that salts 'disappear' when dissolved in water …and that salt 'melts' when dissolved in water" in either group at the post-test.


                                'disappear' misconception        'melt' misconception
pre-test: experimental group    not reported                     not reported
pre-test: comparison group      not reported                     not reported
pre-test: total                 (0.37 × 0.8 × 93 =) 27.5 (!?)    (0.27 × 0.8 × 93 =) 20
post-test: experimental group   not reported                     not reported
post-test: comparison group     not reported                     not reported
post-test: total                not reported                     not reported

Data presented about the numbers of learners considered to hold specific misconceptions said to have been 'reduced' in the experimental condition

It seems the journal referees and the editor did not feel that important information was missing here that should have been added before publication.

In conclusion

Experiments require control of variables. Experiments require random assignment to conditions. Quasi-experiments, where random assignment is not possible, are inherently weaker studies than true experiments.

Control of variables in educational contexts is often almost impossible.

Studies that compare different teaching approaches using two different classes each taught by a different teacher (and perhaps not even in the same school) can never be considered fair comparisons able to offer generalisable conclusions about the relative merits of the approaches. Such 'experiments' have no value as research studies. 1

Such 'experiments' are like comparing the solubility of two salts by (a) dropping a solid lump of 10g of one salt into some cold water, and (b) stirring a finely powdered 35g sample of the other salt into hot propanol; and watching to see which seems to dissolve better.

Only large scale studies that encompass a wide range of different teachers/schools/classrooms in each condition are likely to produce results that are generalisable.

The use of inferential statistical tests is only worthwhile when the conditions for those statistical tests are met. Sometimes tests are said to be robust to modest deviations from such requirements as normality. But applying tests to data that do not come close to fitting the conditions of the test is pointless.

Any research is only as trustworthy as the validity of its measurements. If one does not trust the measuring instrument or the analysis of measurement data then one cannot trust the findings and conclusions.


The results of a research study depend on an extended chain of argumentation, where any broken link invalidates the whole chain. (From 'Critical reading of research')

So, although the website for the Mediterranean Journal of Social Science claims "All articles submitted …undergo to a rigorous double blinded peer review process", I think the peer reviewers for this article were either very generous, very ignorant, or simply very lazy. That may seem harsh, but peer review is meant to help authors improve submissions till they are worthy of appearing in the literature, and here peer review has failed, and the authors (and readers of the journal) have been let down by the reviewers and the editor who ultimately decided this study was publishable in this form.

If I asked a graduate student (or indeed an undergraduate student) to evaluate this paper, I would expect to see a response something along these lines:


Applying the 'Critical Reading of Empirical Studies Tool' to 'The effect of predict-observe-explain strategy on learners' misconceptions about dissolved salts'

I still think P-O-E is a very valuable part of the science teacher's repertoire – but this paper cannot contribute anything to support that view.

Work cited:

Note

1 A lot of these invalid experiments get submitted to research journals, scrutinised by editors and journal referees, and then get published without any acknowledgement of how they fall short of meeting the conditions for a valid experiment. (See, for example, the studies discussed in Taber, 2019a.) It is as if the mystique of experiment is so great that even studies with invalid conclusions are considered worth publishing as long as the authors did an experiment.

A corny teaching analogy

Pop goes the comparison


Keith S. Taber


The order of corn popping is no more random than the roll of a dice.


I was pleased to read about a 'new' teaching analogy in the latest 'Education in Chemistry' (the Royal Society of Chemistry's education magazine) – well, at least it was new to me. It was an analogy that could be demonstrated easily in the school science lab, and, according to Richard Gill (@RGILL_Teach on Twitter), went down really well with his class.

Teaching analogies

Analogies are used in teaching and in science communication to help 'make the unfamiliar familiar', to show someone that something they do not (yet) know about is actually, in some sense at least, a bit like something they are already familiar with. In an analogy, there is a mapping between some aspect(s) of the structure of the target ideas and the structure of the familiar phenomenon or idea being offered as an analogue. Such teaching analogies can be useful to the extent that someone is indeed highly familiar with the 'analogue' (and more so than with the target knowledge being communicated); that there is a helpful mapping across between the analogue and the target; and that the comparison is clearly explained (making clear which features of the analogue are relevant, and how).

Read about analogies in science


The analogy is discussed in the July 2022 edition of Education in Chemistry, and online.

Richard Gill suggests that 'Nuclear decay is a tough concept' to teach and learn, but after making some popcorn he realised that popping corn offered an analogy for radioactive decay that he could demonstrate in the classroom.

Richard Gill describes how

"I tell the students I'm going to heat up the oil; I'm going to give the kernels some energy, making them unstable and they're going to want to pop. I show them under the visualiser, then I ask, 'which kernel will pop first?' We have a little competition. Why do I do this? It links to nuclear decay being random. We know an unstable atom will decay, but we don't know which atom will decay or when it will decay, just like we don't know which kernel will pop when."

Gill, 2022

In the analogy, the corn (maize) kernels represent atoms or nuclei of an unstable isotope, and the popped corn the decay product, daughter atoms or nuclei. 1



Richard Gill homes in on a key feature of radioactive decay which may seem counter-intuitive to learners, but which is actually a pattern found in many different phenomena – exponential decay. The rate of radioactive decay falls (decays, confusingly) over time. Theoretically, the radioactive decay rate follows a very smooth exponential decay curve. Theoretically, because of another key feature of radioactive decay that Gill highlights – its random nature!

It may seem that something which occurs at random will not lead to a regular pattern, but although in radioactivity the behaviour of an individual nucleus (in terms of when it might decay) cannot be predicted, when one deals with vast numbers of them in a macroscopic sample, a clear pattern emerges. Each type of unstable atom has an associated half-life which tells us how long it takes for half of a sample to decay. These half-lives can vary from fractions of a second to vast numbers of years, but are fixed for a particular nuclide.
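For readers who want the textbook mathematics behind this (my addition here, not part of Gill's article): if N is the number of unstable nuclei remaining and λ the decay constant for the nuclide, then

\[ N(t) = N_0 e^{-\lambda t}, \qquad A(t) = \lambda N(t), \qquad t_{1/2} = \frac{\ln 2}{\lambda} \]

where N₀ is the initial number of nuclei and A is the activity (decays per unit time); the half-life t₁/₂ is fixed precisely because λ is fixed for the nuclide.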

Richard Gill notes that he can use the popping corn demonstration as background for teaching about half-life,

I usually follow this lesson with the idea of half-lives. The concept of half-lives now makes sense. Why are there fewer unpopped kernels over time? Because they're popping. Why do radioactive materials become less radioactive over time? Because they're decaying.

Gill, 2022

Perhaps he could even develop his demonstration to model the half-life of decay?

Modelling the popcorn decay curve

The Australian Earth Science Education blog suggests

"Popcorn can be used to model radioactive decay. It is a lot safer than using radioactive isotopes, as well as much tastier"

and offers instructions for a practical activity with a bag of corn and a microwave to collect data to plot a decay curve (see https://ausearthed.blogspot.com/2020/04/radioactive-popcorn.html). Although this seems a good idea, I suspect this specific activity (which involves taking the popping corn in and out of the oven repeatedly) might be too convoluted for learners just being introduced to the topic, but it could be suitable for more advanced learners.

However, The Association of American State Geologists suggests an alternative approach that could be used in a class context where different groups of students put bags of popcorn into the microwave for different lengths of time to allow the plotting of a decay curve by collating class results (https://www.earthsciweek.org/classroom-activities/dating-popcorn).

Another variant is offered by the University of South Florida's 'Spreadsheets Across the Curriculum' (SSAC) project. SSAC developed an activity ("Radioactive Decay and Popping Popcorn – Understanding the Rate Law") that uses (yes, you guessed) a spreadsheet to model the decay of popping corn, as a way of teaching about radioactive decay!

This is more likely to give a good decay curve, but one cannot help feeling it loses some of the attraction of Richard Gill's approach with the smell, sound and 'jumping' of actual corn being heated! One might also wonder if there is any inherent pedagogic advantage to simulating popping corn as a model for radioactive decay – rather than just using the spreadsheet to model radioactive decay directly?
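For anyone curious to try the simulation idea without tracking down the SSAC spreadsheet, here is a minimal sketch in Python (my own illustration, with an assumed pop probability, not the SSAC activity itself). Each unpopped kernel is given the same fixed chance of popping in each time step – exactly the assumption that produces an exponential decay curve:

```python
import random

# A toy version of the 'spreadsheet' simulation: every unpopped kernel has
# the same fixed chance of popping in each time step -- the same assumption
# that gives radioactive decay its exponential curve.
N0 = 500        # starting number of kernels (hypothetical choice)
P_POP = 0.1     # assumed probability a given kernel pops per time step

unpopped = N0
counts = [unpopped]
step = 0
while unpopped > 0 and step < 100:
    step += 1
    popped_now = sum(1 for _ in range(unpopped) if random.random() < P_POP)
    unpopped -= popped_now
    counts.append(unpopped)
    print(f"step {step:3d}: {popped_now:3d} popped, {unpopped:3d} still unpopped")

# Plotting 'counts' against step number gives an approximately exponential
# decay curve, complete with the random scatter a real demonstration shows.
```

Plotting the 'unpopped' counts against step number should give a convincingly exponential-looking curve, scatter and all.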

Feedback cycles

The popping corn seems to show the same kind of decay as radioactivity because it can be represented by the same kind of feedback cycle.

This pattern is characteristic of simple systems where

  • a change is brought about by a driver
  • that change diminishes the driver

In radioactive decay, the level of activity is directly proportional to the number of unstable nuclei present (i.e., the number of nuclei that can potentially decay), but the very process of decay reduces this number (and so reduces the rate of decay).

So,

  • when there are many unstable nuclei
  • there will be much decay
  • quickly reducing the number of unstable nuclei
    • so reducing the rate of decay
    • so reducing the rate at which unstable nuclei decay
      • so reducing the rate at which decay is reducing

and so forth.
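In symbols (the standard textbook form, added here for completeness): the rate of loss is proportional to the number of undecayed nuclei remaining, and this self-diminishing driver is what produces the exponential curve:

\[ \frac{dN}{dt} = -\lambda N \quad\Rightarrow\quad N(t) = N_0 e^{-\lambda t} \]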


Exponential decay is a characteristic of systems with a simple negative feedback cycle
(source: ASCEND project)

Recognising this general pattern was the focus of an 'enrichment' activity designed for upper secondary learners in the Gatsby SEP supported ASCEND project which presented learners with information about the feedback cycle in radioactive decay; and then had them set up and observe some quite different phenomena (Taber, 2011):

  • capacitor discharge
  • levelling of connected uneven water columns
  • hot water cooling

In each case the change driven by some 'driver' reduced the driver itself (so a temperature difference leads to heat transfer which reduces the temperature difference…).
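Each of these systems can be written as essentially the same equation with different symbols (standard textbook forms, given here just to make the common pattern explicit), for example:

\[ \frac{dQ}{dt} = -\frac{Q}{RC} \ \text{(capacitor discharge)}, \qquad \frac{dT}{dt} = -k\,(T - T_{\text{env}}) \ \text{(cooling)} \]

In each case the quantity driving the change appears, negated, in its own rate of change, so each system relaxes along an exponential curve – just as \( \dot{N} = -\lambda N \) does for radioactive decay.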

Read about the classroom activity

In Richard Gill's activity the driver is the availability of intact corn kernels being heated such that water vapour is building up inside the kernel – something which is reduced by the consequent popping of those kernels.


A negative feedback cycle

Mapping the analogy

A key feature of an analogy is that it can be understood as a kind of mapping between two conceptual structures. The popcorn-making demonstration seems a very simple analogue, but mapping out the analogy might be useful (at least for the teacher) to clarify it. Below I present a representation of a mapping between popping corn and radioactive decay, suggesting which aspects of the analogue (the popping corn) map onto the target scientific concept.


Mapping an analogy between making pop-corn and radioactive decay

In this mapping I have used colour to highlight differences between the two (conceptual) structures. Perhaps the most significant difference is represented by the blue (target concept) versus red (analogue) features.


Most analogies only map to a limited extent

There will be aspects of an analogue that do not map onto anything on the target, and sometimes there will be an important feature of the target which has no analogous feature in the analogue. There is always the possibility that irrelevant features of an analogue will be mapped across by learners.

As one example, the comparison of the atom with a tiny solar system was once an image often used as a teaching analogy, yet it seems learners often have limited understandings of both analogue and target, and may be transferring across inappropriately – such as assuming the electrons are bound to the atom by gravity (Taber, 2013a). Where students have an alternative conception of the analogue (the sun attracts the earth, but not vice versa) they will often assume the same pattern in the target (the nucleus is not attracted to the electrons).

Does this matter? Well, yes and no. A teaching analogy is used to introduce a technical scientific concept by making it seem familiar. This is a starting point to be built upon (so, Richard Gill tells us that he will build upon the diminishing activity of the cooking corn in his popcorn demonstration to introduce the idea of half-life), so it does not matter if students do not fully understand everything immediately. (Indeed, it is naive to assume most learners could acquire a new complex set of ideas all at once: learning is incremental – see Key ideas for constructivist teaching.)

Analogies can act as 'scaffolds' to help learners venture out from their existing continents of knowledge towards new territory. Once this 'anchor' in learners' experience is established one can, so to speak, disembark from the scaffolding raft onto the more solid ground of the shore.

Read about scaffolding learning

However, it is important to be careful to make sure that

  • (a) learners appreciate the limitations of models (such as analogies) – that they are thinking and learning tools, and not absolute accounts of the natural world; and that
  • (b) the teacher helps dismantle the 'scaffolding' once it is no longer needed, so that it is not retained as part of the learners' 'scientific' account.

Weak anthropomorphism

An example of that might be Gill's use of anthropomorphism.

…unstable atoms/nuclei need to become stable…

…I'm going to give the kernels some energy, making them unstable and they're going to want to pop…

Anthropomorphism

This type of language is often used to offer narratives that are more readily appreciated by learners (making the unfamiliar familiar, again) but students can come to use such language habitually, and it may come to stand in place of a more scientific account (Taber & Watts, 1996). So, 'weak' anthropomorphism used to help introduce something abstract and counter-intuitive is useful, but 'strong' anthropomorphism that comes to be adopted as a scientific explanation (e.g., nuclei decay because they want to be stable) is best avoided by seeking to move beyond the figurative language as soon as students are ready.

Read about anthropomorphism

The 'negative' analogy

The mapping diagram above highlights several potential teaching points that might be considered (perhaps not introduced immediately, but when the new concepts are later reinforced and developed).

Where does the energy come from?

One key difference between the two systems is that radioactive decay is (we think) completely spontaneous, whereas the corn only pops because we cook it (Gill used a Bunsen burner) and, left to its own devices, would remain as unpopped kernels.

Related to this, the source of energy for popping corn is the applied heat, whereas unstable nuclei are already in a state of high energy and so have an 'internal' source for their activity. This is a key difference that will likely be obvious to some, but certainly not all, learners in most classes.

When is random, random?

A more subtle point relates to the 'random' nature of the two events. I suggest subtle, because there are many published reports written by researchers in science education which suggest even supposed experts can have pretty shaky ideas of what counts as random (Taber, 2013b).

Read 'Nothing random about a proper scientific evaluation?'

Read about the randomisation criterion

As far as scientists understand, the decay of one unstable nucleus in a sample of radioactive material (rather than another) is a random process. It is not just that we are not yet able to predict when a particular nucleus will decay – according to current scientific accounts it is not possible to predict in principle. This is an idea that even Einstein found difficult to accept.

That is not true with the corn. Presumably there are subtle differences between kernels – some have slightly more water content, or slightly weaker casings. Perhaps more significantly, some are heated more than others due to their position in the pan and the position of the heat source, or due to differential exposure to the cooking oil… In principle it would be possible to measure relevant variables and model the set-up to make good predictions. (In principle, even if in practice a very complex task.) The order of corn popping is no more random than…say…the roll of a dice. That is, physics tells us it follows natural laws, even if we are not in a position to fully model the phenomenon.

(We might suggest that a student who considered the corn popping as a random event because she saw apparently identical kernels all being heated in the same pan at the same time is simply missing certain 'hidden variables'. Einstein wondered if there were also 'hidden variables' that science had not yet uncovered which could explain random events such as why one nucleus rather than another decays at a particular moment.)

On the recoil

Perhaps a more significant difference is what is observed. The corn is observed 'jumping' (more anthropomorphic language?). Physics tells us that momentum must always be conserved, and the kernels act like tiny jet-propelled rockets. That is, as steam is released when the kernel bursts, the rest of the kernel 'jumps' in the opposite direction. (That is, by Newton's third law, there is a reaction force to the force pushing the steam out of the kernel. Momentum is a vector, so it is possible for a stationary object to break up into several moving parts with conservation of momentum.)

Something similar happens in radioactive decay. The emitted radiation carries away momentum, and the remaining 'daughter' nucleus recoils – although if the material is in the solid state this effect is dissipated by being spread across the lattice. So, the radioactivity which is detected is not analogous to the jumping corn, but to the steam it has released.
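A back-of-envelope sketch of the recoil (my own addition, with generic symbols): if a stationary system of total mass M ejects a fragment of mass m at speed v, conservation of momentum gives

\[ 0 = mv + (M - m)V \quad\Rightarrow\quad V = -\frac{mv}{M - m} \]

so the remainder recoils in the opposite direction but, when m is much smaller than M, much more slowly – which is why a heavy daughter nucleus recoils far less conspicuously than the radiation it emits.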

Is this important? That likely depends upon the level being taught. If the topic is being introduced to 14-16 year-olds, perhaps not. If the analogy is being explored with post-compulsory students doing an elective course, then maybe. (If not in chemistry, then certainly in physics, where learners are expected to apply the principle of conservation of momentum across various scenarios.)

Will this be on the exam?

When I drafted this, I suspected most readers might find my caveats above about the limitations of the analogy a bit pernickety (the kind of things an academic who's been out of the school classroom too long and forgotten the realities of working with pupils might dream up), but then I found what claims to be an Edexcel GCE Physics paper from 2012 (paper reference 6PH05/01) online. In this paper, one question begins:

"In a demonstration to her class, a teacher pours popcorn kernels onto a hot surface and waits for them to pop…".

Much to my delight, I found the first part of this question asked learners:

"How realistic is this demonstration as an analogy to radioactive decay?

Consider aspects of the demonstration that are similar to radioactive decay and aspects that are different"

Examination paper asking physics students to identify positive and negative aspects of the analogy.

Classes of radioactivity

One further difference did occur to me that may be important. At some level this analogy works for radioactivity regardless of what is being emitted from an unstable nucleus. However, the analogy seems clearer for the emission of an alpha particle, or a beta particle, or a neutron, than in the case of gamma radiation.

Although in gamma decay an excited nucleus relaxes to a lower energy state emitting a photon, it may not be as obvious to learners that the nucleus has changed (arguably, it has not 'substantially' changed as there is no change of substance) – as it has the same mass number and charge as before. This may be a point to be raised if moving on later to discuss different classes of radioactivity.

Or, perhaps, with gamma decay one can use a different version of the analogy?

Another corny analogy

Although I do not think I had ever come across this analogy before reading the Education in Chemistry piece (perhaps because I do not make myself popcorn), Richard Gill does not seem to be the only person to have noticed this comparison. (They say 'great minds think alike' – and not just physicist Henri Poincaré thinking like Kryten from 'Red Dwarf'.) When I looked around the world-wide web I found there were two different approaches to using corn kernels to model radioactivity.

Some people use a demonstration similar to Mr Gill's. 2 However, there was also a different approach to using the corn. There were variations on this 3, but the gist was that

  • one starts with a large number of kernels
  • they are agitated (e.g., shaken in a box with different cells, poured onto the bench…)
  • then inspected to see which are pointing in some arbitrary direction designated as representing decay
  • the 'decayed' kernels are removed and counted
  • the rest of the sample is agitated again
  • etc.
Choose a direction to represent decay, and remove the aligned kernels as the 'activity' in that interval.
(Original image by Susie from Pixabay)

This lacks the excitement of popping corn, but could be a better model for gamma decay where the daughter nucleus is at a different energy after decay, but is otherwise unchanged.
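As a quick check on the numbers behind such an activity, here is a minimal sketch in Python (my own illustration, not from any of the sources above), assuming 100 kernels and a one-in-four chance that any given kernel 'decays' on each shake:

```python
import random

# A sketch of the 'shake and remove' activity: after each shake, any kernel
# pointing at the chosen side of the box (1 chance in 4) counts as 'decayed'
# and is removed before the next shake.
N0 = 100            # kernels at the start, as in the worksheet versions
P_DECAY = 1 / 4     # one side of four is designated as 'decay'

remaining = N0
shake = 0
while remaining > 0 and shake < 50:
    shake += 1
    decayed = sum(1 for _ in range(remaining) if random.random() < P_DECAY)
    remaining -= decayed
    print(f"shake {shake:2d}: removed {decayed:2d}, {remaining:2d} remain")

# With p = 1/4 per shake, about half the kernels survive every ~2.4 shakes,
# so a plot of 'remaining' against shake number mimics a half-life curve.
```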

Perhaps this version of the analogy could be improved by using a tray with an array of small dips (like tiny spot tiles) just the right size to stand corn kernels in the depressions with their points upwards. Then, after a very gentle tap on the bench next to the tile, those which have 'relaxed' from the higher energy state (i.e., fallen onto their sides) would be considered decayed. This would more directly model the change in potential energy and also avoid the need to keep removing kernels from the context (just as daughter atoms usually remain in a sample of radioactive material), as further gentle taps are unlikely to excite them back to the higher energy state. 4

Or, dear reader, perhaps I've just been thinking about this analogy a little too long now.


Sources:

Notes

1 Referring to the nuclei before and after radioactive decay as 'parents' and 'daughters' seems metaphorical, but this use has become so well established (in effect, these are now technical terms) that these descriptors are now what are known (metaphorically!) as 'dead metaphors'.

Read about metaphors in science


2 Here are some examples I found:

Jennifer Wenner, University of Wisconsin-Oshkosh uses the demonstration in undergraduate geosciences:

"I usually perform it after I have introduced radioactive decay and talked about how it works. It only takes a few minutes and I usually talk while I am waiting for the "decay" to happen 'Using Popcorn to Simulate Radioactive Decay'"

https://serc.carleton.edu/quantskills/activities/popcorn.html

The Institute of Physics (IoP) includes this activity as part of its 'Modelling decay in the laboratory' classroom activity for 14-16 year-olds, but suggests the pan lid is kept on as a safety measure. (Any teacher planning to carry out any activity in the lab should undertake a risk assessment first.)

I note the IoP also suggests care in using the term 'random':

Teacher: While we were listening to that there didn't seem to be any fixed pattern to the popping. Is there a word that we could use to describe that?

Lydia: Random?

Teacher: Excellent. But the word random has a very special meaning in physics. It isn't like how we think of things in everyday life. When do you use the word random in everyday life?

Lydia: Like if it's unpredictable? Or has no pattern?

https://spark.iop.org/modelling-decay-laboratory

Kieran Maher and 'Wikibooks contributors' suggest readers of their 'Basic Physics of Nuclear Medicine' could "think about putting some oil in a pot, adding the corn, heating the pot…" and indeed their readers "might also like to try this out while considering the situation", but warn readers not to "push this popcorn analogy too far" (pp. 20-21).


3 Here are some examples I found:

Florida High School teacher Marc Mayntz offers teachers' notes and student instructions for his 'Nuclear Popcorn' activity, where students are told to "Carefully 'spill' the kernels onto the table".

Chelsea Davis (a student?) reports her results in 'Half Life Popcorn Lab' from an approach where kernels are shaken in a Petri dish.

Redwood High School's worksheet for 'Radioactive Decay and Half Life Simulation' has students work with 100 kernels in a box with its sides labelled 1-4 (kernels that have the small end pointed toward side 1 after "a sharp, single shake (up and down, not side to side)" are considered decayed). Students are told at the start to "Count the popcorn kernels to be certain there are exactly 100 kernels in your box".

This activity is then repeated with (i) kernels pointing to either side 1 or 2, and, in a further run, (ii) kernels pointing to any of sides 1, 2, or 3, being considered decayed. This allows a graph to be drawn comparing all three sets of results.
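For anyone wanting to predict the shapes of those graphs (a back-of-envelope addition of mine, not part of the worksheet): if each kernel has probability p of 'decaying' in a given shake, the expected number remaining after n shakes is

\[ N_n = N_0 (1-p)^n, \qquad n_{1/2} = \frac{\ln 2}{-\ln(1-p)} \]

giving a 'half-life' of roughly 2.4 shakes for p = 1/4, exactly 1 shake for p = 1/2, and about 0.5 shakes for p = 3/4.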

The same approach is used in the Utah Education network's 'Radioactive Decay' activity, which specifies the use of a shoe box.

A site called 'Chegg' specified "a square box is filled with 100 popcorn kernels" and asked "What alteration in the experimental design would dramatically change the results? Why?" But, sadly, I needed to subscribe to see the answer.

The 'Lesson Planet' site offers 'Nuclear Popcorn' where "Using popcorn kernels spread over a tabletop, participants pick up all of those that point toward the back of the room, that is, those that represent decayed atoms".

'Anonymous' was set a version of this activity, but could not "seem to figure it out". 'Jiskha Homework Help' (tag line: "Ask questions and get helpful responses") helpfully responded,

"You ought to have a better number than 'two units of shake time…'

Read off the graph, not the data table."

(For some reason this brought to mind my sixth form mathematics teacher imploring us in desperation to "look at the ruddy diagram!")


4 Consider the challenge of developing this model to simulate nuclear magnetic resonance or laser excitation!


Just two things

[Science] fiction reflecting life


Keith S. Taber


I imagine the physicist Henri Poincaré was entirely serious when he suggested,

"the principle of relative motion, which forces itself upon us for two reasons:

first, the commonest experience confirms it, and

second, the contrary hypothesis is singularly repugnant to the mind."

Henri Poincaré (mathematician, physicist, philosopher)

Perhaps Poincaré was reflecting how two opposing schools of philosophical thought had disagreed on whether the primary source of human knowledge was experience (the empiricists) or pure reasoning (the rationalists), but elsewhere in the same text Poincaré (1902/1913/2015) dismisses the idea that the laws of physics can be obtained by simple reflection on human intuitions. Such intuitions can lead us astray.

If he is being consistent then, surely "the contrary hypothesis is [only] singularly repugnant to the mind" because "the commonest experience confirms…the principle of relative motion". That is, suggestions that are clearly contrary to our common experience – such as, perhaps, the earth is moving? – are readily rejected as being nonsensical and ridiculous.

If that is so, then Poincaré was not really offering two independent lines of argument as his second reason was dependent upon his first.

This put me in mind of some comments of Kryten, a character in the sci-fi series 'Red Dwarf',

{responding to a crew suggestion "Why don't we drop the defensive shields?"}

"A superlative suggestion, sir, with just two minor flaws.

One, we don't have any defensive shields, and

two, we don't have any defensive shields.

Now I realise that, technically speaking, that's only one flaw but I thought it was such a big one it was worth mentioning twice."

Kryten (mechanoid assigned to the mining spaceship Red Dwarf)

or alternatively,

{responding to the crew suggestion "I got it! We laser our way through [the 53 doors from here to the science deck]!"}

Ah, an excellent plan, sir, with only two minor drawbacks.

One, we don't have a power source for the lasers; and

two, we don't have any lasers.

Kryten


The principle of relative motion

What Poincaré meant by 'the principle of relative motion' was that

"The motion of any system must obey the same laws, whether it be referred to fixed axes, or to moveable axes carried along in a rectilinear and uniform motion."

the principle of relative motion

In other words, imagine a train passing a station at 10 m s⁻¹, in which a naughty physics student throws a pencil eraser of mass m with a force F at another passenger sitting in front of him; while a model physics student observes this from the stationary station [sic] platform.

The student on the train would consider the eraser to be at rest before being thrown, and can explore its motion by taking u = 0 m s⁻¹ and applying some laws summarised by

  • F = ma
  • v = u + at
  • v² = u² + 2as
  • s = ut + ½at²

From the frame of reference of someone in the station it is the train that moves,
(Image by StockSnap from Pixabay)
but…

…From the frame of reference of the train (or tram), it seems to be the rest of the world that is moving past
(Image by Pasi Mämmelä from Pixabay)

The student on the platform would observe the eraser to initially be moving at 10 m s⁻¹, but could calculate what would happen using the same set of equations, but taking u = 10 m s⁻¹

Any values of v calculated would be consistent across the two frames (differing by exactly the train's 10 m s⁻¹), and the values of a and t would be the same, although displacements measured in the platform frame would include the extra distance the train carries the eraser along.
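As a worked illustration with invented numbers: suppose the eraser is thrown so that it accelerates at a = 20 m s⁻² for t = 0.1 s. Then

\[ \text{train frame: } v = u + at = 0 + 20 \times 0.1 = 2\ \text{m s}^{-1} \]
\[ \text{platform frame: } v = u + at = 10 + 20 \times 0.1 = 12\ \text{m s}^{-1} \]

The two answers differ by exactly the train's 10 m s⁻¹, while a (and so F = ma) and t are the same for both observers – which is just what the principle of relative motion requires.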

This reflects the relativity principle of Galileo which suggests that there is no absolute way of determining whether a body is moving at constant velocity or stationary: rather what appears to be the case depends on one's frame of reference.

We might think that obviously it is the platform which is really stationary, as our intuition is that the earth under our feet is stationary ground. Surely we could tell if the ground moves?

We can directly feel acceleration, and we can sometimes feel the resistance to motion (the air on our face if we cycle, even at a constant velocity), but the idea that we can directly tell whether or not we are moving is an alternative conception.

For centuries the idea of a moving earth was largely considered ridiculous as experience clearly indicated otherwise. But if someone was kidnapped whilst asleep (please note, this would be illegal and is not being encouraged) and awoke in a carriage that had been set up to look like a hotel bedroom, on a train moving with constant velocity, they would not feel they were in motion. Indeed, anyone who has travelled on a train at night when nothing is visible outside the carriage might well have experienced the impression that the train is stationary whilst it moves at a steady rate.

Science has shown us that there are good reasons to think that the earth is spinning, and orbiting the sun, as part of the solar system which moves through the galaxy, so who is to say what is really stationary? We cannot tell (and the question may be meaningless).



Who is to say what is moving? We can only make relative judgements.
(Image by Drajt from Pixabay)

Source cited:
  • Poincaré, H. (1902/1913/2015). Science and Hypothesis (G. B. Halstead, Trans.). In The Foundations of Science. Cambridge University Press. {I give three dates because Poincaré published his book in French in 1902, and it was later published in an English translation in 1913, but I have a 2015 edition.}

My work in the field of catalysis

Another predatory conference?


Keith S. Taber


Dear Programme Manager

Thank you for your message on behalf of the scientific committee offering me the position of invited speaker at the 12th Edition of the Global Conference on Catalysis, Chemical Engineering & Technology.

I appreciate that your scientific committee comprises eminent leaders in the field of catalysis, but when you write that "By going through your work in the field of Catalysis, our scientific committee would like to offer you the position of Speaker" I am at a loss to work out what

  • Stanislaw Dzwigaj
  • Jose C Conesa
  • Anne M Gaffney
  • Nikolaos C Kokkinos
  • Dmitry Nikushchenko
  • M A Martin Luengo
  • Osman Adiguzel
  • Ahmet Haxhiaj
  • Eugenio Meloni
  • Ramesh C Gupta
  • Abdelkrim Abourriche

have found in my work that makes them feel it would be of any particular interest to your delegates.

Perhaps you would be kind enough to ask the scientific committee to specify which of my publications they consider to be in the field of catalysis, so I have some idea what I am being invited to speak about.

I assume that as an invited speaker all relevant fees would be waived?

I am afraid that otherwise I will just have to conclude that this is yet another dishonest approach from a predatory conference, where 'invited speaker' invitations are of no worth and are issued indiscriminately as a ploy to elicit money from potential speakers – without any regard at all for their suitability or relevance, as long as they can pay you the conference fees.

As you "would be glad to answer any questions [I] may have and provide necessary clarifications where needed" I look forward to your clarification so I can put my mind to rest and avoid concluding that this invitation is just another scam.

Best wishes

Keith

[Email response to the conference (copied to committee members). Clarifications awaited*]


The scientific committee of a catalysis conference has, allegedly, invited me to speak on the topic.

According to the conference programme manager, this committee of experts invited me to speak after 'going through' my (non-existent) 'work in the field of Catalysis'.
Are they incompetent? (I very much doubt that.)
Did the programme manager mishear 'Benjamin List' or 'David MacMillan' as 'Keith Taber'?
Or
Is this just another lie to publicise a predatory conference?



* Update: A clarification

To be fair to the 12th Edition of the Global Conference on Catalysis, Chemical Engineering & Technology, I have today (20th July) received a response. I have been informed that:

"We went through your books and articles regarding teachings in chemistry and concepts and thought to invite you to our event, as most of the delegates who attend our event are from academia"

That sounds reasonable enough, as long as there is a suitable place in the programme.

"As an invited speaker, there are no registration charges to be paid"

Again, that is reasonable.

It is one thing to pay to be present at a conference you are seeking to attend, but another to pay for the privilege of giving a talk when you have been invited to speak.

"But you can present on any of the topics related to scientific sessions"

Okay, so where would a talk to 'mostly academics' about 'teachings in chemistry and concepts' fit?

The conference sessions are on:

  • Catalysis and Porous Materials
  • Catalysis for Energy
  • Chemical Engineering
  • Heterogeneous Catalysis
  • Catalysis in Nanotechnology
  • Environmental Catalysis
  • Catalytic Materials
  • Fluid Mechanics
  • Chemical Synthesis and Catalysts Synthesis
  • Macrocyclic and Supramolecular chemistry
  • Petrochemical Engineering
  • Green and Sustainable Chemistry
  • Catalysis for Renewable Sources
  • Catalysis for Biorefineries
  • Chemical Kinetics and Catalytic Activity
  • Photochemistry, Photobiology and Electrochemistry

So no obvious home for a talk on teaching about chemical concepts.

The topics I was directed to in the email were

  • Catalysis and Porous Materials
  • Catalysis for Energy
  • Photochemistry, Photobiology and Electrochemistry
  • Catalysis for Renewable Sources
  • Chemical Kinetics and Catalytic Activity
  • Catalysis and Applications
  • Homogeneous Catalysis, Molecular Catalysis
  • Catalysis for Biorefineries
  • Chemical Engineering
  • Heterogeneous Catalysis
  • Advances in Catalysis and Chemical Engineering
  • Reaction Chemistry and Engineering
  • Catalysis in Nanotechnology
  • Industrial Catalysis and Process Engineering
  • Environmental Catalysis
  • Advanced synthesis, Catalytic systems and new catalyst designing
  • Biocatalysis and Biotransformation
  • Catalytic Materials
  • Organometallics, Organocatalysis and Bioinorganic Chemistry
  • Surface Chemistry: Colloid and Surface aspects
  • Computational Catalysis
  • Enantioselective catalysis
  • Chemical Synthesis and Catalysts Synthesis
  • Fluid Mechanics
  • Micro-emulsion Catalysis and Catalytic Cracking
  • Macrocyclic and Supramolecular chemistry
  • Integrated Catalysis
  • Plasma Catalysis
  • Enzymes, Coenzymes and Metabolic Pathways
  • Nuclear Chemistry/Radiochemistry
  • Separation Processes in Chemical Technology
  • Petrochemical Engineering
  • Green and Sustainable Chemistry
  • Analytical Methodologies
  • Microbial Technology
  • Mechanisms of Microbial Transcription

So, I have been invited because of my expertise relating to teaching chemical concepts (one of the very few areas where I really might be considered to have some kind of expertise), and can participate for free, as long as I submit a talk on some aspect of the science of chemical catalysis in a session about some sub-field of chemistry relating to catalysis.

This is like writing to Reece James to tell him that, on the basis of his exceptional skills as a footballer, he is invited to talk at a literary festival on any genre of fiction writing; or, on the basis of her song-writing and musical achievements, inviting Kate Bush to give a keynote at a history conference – and allowing her to choose between speaking about Roman Britain, the Agricultural Revolution, Europe between the 'World Wars', or Sino-Japanese tensions over Korea in the nineteenth century.

So, I recognise the attempt to make good on the invitation, but it is hardly a total 'save'.



Addendum: A glut of catalytic conferences?



By coincidence, or otherwise, today I also received an invitation to be a speaker at the

"?3rd Global Congress on Chemistry and Catalysis?, an event hosted by Phronesis LLC and held at Dubai, UAE during November 18-19, 2022 [where] The main theme of the conference is ?Contemporary Advances and Innovations in chemistry and catalysis?"

I would apparently be a 'perfect person' to speak at one of the sessions. These are on:

  • Materials Science and Engineering
  • Advanced Structural Materials
  • Ceramics, Polymers and Composite Materials
  • Advances in Biosensors, Biomaterials, Medical devices and Soft Materials
  • Corrosion, Alloys, Mining and Metallurgy
  • Hybrid Materials and Bioinspired Materials
  • Materials in Nuclear Energy Science and Engineering
  • Energy, Environment and Materials Technology
  • Computational Materials Science
  • 3D Printing Technology
  • Materials Synthesis And Processing
  • Functional materials, Metals, and Metal Casting Technology
  • Emerging Smart Materials, Meta Materials and Smart Coatings
  • Materials Chemistry, Sustainable Chemistry and Materials Physics
  • Polymer Science and Polymeric Materials
  • Nanoscience and Nanotechnology
  • Optics Photonics Electronic and Magnetic Materials
  • Glass Science and Technologies
  • Nanotechnology in Materials Science
  • Nanotechnology for Energy and the Environment
  • Nanomaterials and 2D Materials
  • Carbon Nanomaterials, Nanostructures and Nanocomposites
  • Graphene Technologies and carbon Nanotubes
  • Manufacturing Technology and Instrumentation Technology
  • Materials for Energy and the Environment
  • Nanotechnology in Healthcare and its Applications

Hm. Perhaps I am not quite the 'perfect person', after all?