Lack of control in educational research

Getting that sinking feeling on reading published studies


Keith S. Taber


this is like finding that, after a period of watering plant A, it is taller than plant B – when you did not think to check how tall the two plants were before you started watering plant A

Research on prelabs

I was looking for studies which explored the effectiveness of 'prelabs', activities which students are given before entering the laboratory to make sure they are prepared for practical work, and can therefore use their time effectively in the lab. There is much research suggesting that students often learn little from science practical work, in part because of cognitive overload – that is, learners can be so occupied with dealing with the apparatus and materials that they have little capacity left to think about the purpose and significance of the work. 1


Okay, so is THIS the pipette?
(Image by PublicDomainPictures from Pixabay)

Approaching a practical work session having already spent time engaging with its purpose and associated theories/models, and already having become familiar with the processes to be followed, should mean students enter the laboratory much better prepared to use their time efficiently, and much better informed to reflect on the wider theoretical context of the work.

I found a Swedish paper (Winberg & Berg, 2007) reporting a pair of studies that tested this idea by using a simulation as a prelab activity for undergraduates about to engage with an acid-base titration. The researchers tested this innovation by comparisons between students who completed the prelab before the titration, and those who did not.

The work used two basic measures:

  • types (sophistication) of questions asked by students during the lab. session
  • elicitation of knowledge in interviews after the laboratory activity

The authors found some differences (between those who had completed the prelab and those who had not) in the sophistication of the questions students asked, and in the quality of the knowledge elicited. They used inferential statistics to suggest that at least some of the differences found were statistically significant. From my reading of the paper, these claims were not justified.

A peer reviewed journal (no, really, this time)

This is a paper in a well respected journal (not one of the predatory journals I have often discussed on this site). The Journal of Research in Science Teaching is published by Wiley (a major respected publisher of academic material) and is the official journal of NARST (which used to stand for the National Association for Research in Science Teaching – where 'national' referred to the USA 2). This is a journal that does take peer review very seriously.

The paper is well-written and well-structured. Winberg and Berg set out a conceptual framework for the research that includes a discussion of previous relevant studies. They adopt a theoretical framework based on Perry's model of intellectual development (Taber, 2020). There is considerable detail about how data were collected and analysed. This account is well-argued. (But, you, dear reader, can surely sense a 'but' coming.)

Experimental research into experimental work?

The authors do not seem to explicitly describe their research as an experiment as such (as opposed to adopting some other kind of research strategy such as survey or case study), but the word 'experiment' and variations of it appear in the paper.

For one thing, the authors refer to students' practical work as being experiments,

"Laboratory exercises, especially in higher education contexts, often involve training in several different manipulative skills as well as a high information flow, such as from manuals, instructors, output from the experimental equipment, and so forth. If students do not have prior experiences that help them to sort out significant information or reduce the cognitive effort required to understand what is happening in the experiment, they tend to rely on working strategies that help them simply to cope with the situation; for example, focusing only on issues that are of immediate importance to obtain data for later analysis and reflective thought…"

Winberg & Berg, 2007

Now, some student practical work is experimental, where a student is actively looking to see what happens when they manipulate some variable to test a hypothesis. This type of practical work is sometimes labelled enquiry (or inquiry in US spelling). But a lot of school and university laboratory work is undertaken to learn techniques, or (probably more often) to support the learning of taught theory – where it is usually important the learners know what is meant to happen before they begin the laboratory activity.

Winberg and Berg refer to the 'laboratory exercise' as 'the experiment' as though any laboratory work counts as an experiment. In Winberg and Berg's research, students were asked about their "own [titration] experiment", despite the prelab material involving a simulation of the titration process, in advance of which "the theoretical concepts, ideas, and procedures addressed in the simulation exercise had been treated mainly quantitatively during the preceding 1-week instructional sequence". So, the laboratory titration exercise does not seem to be an experiment in the scientific sense of the term.

School children commonly describe all practical work in the lab as 'doing experiments'. It cannot help students learn what an experiment really is when the word 'experiment' has two quite distinct meanings in the science classroom:

  • an activity where variables are deliberately manipulated and controlled in order to test a hypothesis (the scientific sense);
  • any practical activity undertaken in the laboratory (the everyday classroom sense).

We might describe this second meaning as an alternative conception of 'experiment', a way of understanding that is inconsistent with the scientific meaning. (Just as there are common alternative conceptions of other 'nature of science' concepts such as 'theory'.)

I would imagine Winberg and Berg were well aware of what an experiment is, although their casual use of language might suggest a lack of rigour in thinking with the term. They refer to having "both control and experiment groups" in their studies, and refer to "the experimental chronology" of their research design. So, they certainly seem to think of their work as a kind of experiment.

Experimental design

In a true experiment, a sample is randomly drawn from a population of interest (say, first year undergraduate chemistry students; or, perhaps, first year undergraduate chemistry students attending Swedish Universities, or… 3) and assigned randomly to the conditions being compared. Providing a genuine form of random assignment is used, inferential statistical tests can then guide judgements about whether any differences found between groups at the end of an experiment should be considered statistically significant. 4

"Statistics can only indicate how likely a measured result would occur by chance (as randomisation of units of analysis to different treatments can only make uneven group composition unlikely, not impossible)…Randomisation cannot ensure equivalence between groups (even if it makes any imbalance just as likely to advantage either condition)"

Taber, 2019, p.73

Inferential statistics can be used to test for statistical significance in experiments – as long as the 'units of analysis' (e.g., students) are randomly assigned to the experimental and control conditions.
(Figure from Taber, 2019)

That is, if there are differences that the stats. tests suggest are very unlikely to have happened by chance, then they are very unlikely to be due to an initial difference between the groups in the two conditions – as long as the groups were the result of random assignment. But that is a very important proviso.

There are two aspects to this need for randomisation:

  • to be able to suggest any differences found reflect the effects of the intervention, then there should be random assignment to the two (or more) conditions
  • to be able to suggest the results reflect what would probably be found in a wider population, the sample should be randomly selected from the population of interest 3
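The point about random assignment can be illustrated with a minimal simulation (my own sketch, not anything from the paper, and with entirely invented numbers): if there is no treatment effect at all, a t-test flags 'significant' differences at roughly the nominal 5% rate when groups are formed randomly – but far more often when some systematic factor (such as prior attainment) determines group membership.

```python
# A sketch of why random assignment matters for inferential statistics.
# We simulate a cohort with NO treatment effect, and compare how often a
# t-test reports a 'significant' difference under random vs. systematic
# assignment. All numbers here are invented for illustration.
import random
import statistics

random.seed(1)

def t_statistic(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

def false_positive_rate(assign_randomly, trials=2000, n=60):
    hits = 0
    for _ in range(trials):
        # Each student has a prior-attainment score; the 'outcome' simply
        # reflects prior attainment plus noise - no treatment effect.
        prior = [random.gauss(50, 10) for _ in range(n)]
        if assign_randomly:
            random.shuffle(prior)
            group_a, group_b = prior[:n // 2], prior[n // 2:]
        else:
            # Systematic assignment: imagine timetabling happens to put
            # the stronger students into one session, so the groups
            # already differ before any intervention.
            ranked = sorted(prior)
            group_a, group_b = ranked[n // 2:], ranked[:n // 2]
        outcome_a = [p + random.gauss(0, 5) for p in group_a]
        outcome_b = [p + random.gauss(0, 5) for p in group_b]
        if abs(t_statistic(outcome_a, outcome_b)) > 2.0:  # roughly p < .05
            hits += 1
    return hits / trials

print(false_positive_rate(True))   # near the nominal 5% level
print(false_positive_rate(False))  # far higher, despite no real effect
```

With random assignment the test behaves as advertised; with systematic assignment it 'finds' differences that are really just the initial imbalance between the groups.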

Studies in education seldom meet the requirements for being true experiments
(Figure from Taber, 2019)

In education, it is not always possible to use random assignment, so true experiments are then not possible. However, so-called 'quasi-experiments' may be possible where differences between the outcomes in different conditions may be understood as informative, as long as there is good reason to believe that even without random assignment, the groups assigned to the different conditions are equivalent.

In this specific research, that would mean having good reason to believe that without the intervention (the prelab):

  • students in both groups would have asked overall equivalent (in terms of the analysis undertaken in this study) questions in the lab.;
  • students in both groups would have been judged as displaying overall equivalent subject knowledge.

Often in research where a true experiment is not possible some kind of pre-testing is used to make a case for equivalence between groups.
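The kind of baseline check a quasi-experiment can report might look something like the following sketch (the pre-test scores here are hypothetical, invented purely for illustration, and are not data from Winberg and Berg's study):

```python
# A sketch of a pre-test equivalence check for a quasi-experiment:
# comparing groups on scores collected BEFORE any intervention.
# The scores below are invented for illustration only.
import statistics

treatment_pre = [54, 61, 47, 58, 52, 66, 49, 55, 60, 51]  # hypothetical
control_pre   = [53, 59, 50, 57, 48, 64, 52, 56, 58, 50]  # hypothetical

def summarise(name, scores):
    print(f"{name}: mean={statistics.mean(scores):.1f}, "
          f"sd={statistics.stdev(scores):.1f}, n={len(scores)}")

summarise("treatment", treatment_pre)
summarise("control", control_pre)

# Similar baseline means and spreads do not prove equivalence, but they
# let readers judge whether post-test differences could plausibly be
# attributed to the intervention rather than to initial differences.
diff = statistics.mean(treatment_pre) - statistics.mean(control_pre)
print(f"baseline difference in means: {diff:.2f}")
```

Reporting something like this – which Winberg and Berg could not do, having no pre-test – is what allows readers of a quasi-experiment to take claims of group equivalence seriously.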

Two control groups that were out of control

In Winberg and Berg's research there were two studies where comparisons were made between 'experimental' and 'control' conditions:

Study   | Experimental | Control
Study 1 | n=78: first-year students, following completion of their first chemistry course in 2001 | n=97: students who had been interviewed by the researchers during the same course in the previous year
Study 2 | n=21 (of 58 in cohort) | n=37 (of 58 in same cohort)

In the first study, a comparison was made between the cohort where the innovation was introduced and a cohort from the previous year. All other things being equal, it seems likely these two cohorts were fairly similar. But in education all things are seldom equal, so there is no assurance they were similar enough to be considered equivalent.

In the second study:

"Students were divided into treatment (n = 21) and control (n = 37) groups. Distribution of students between the treatment and control groups was not controlled by the researchers".

Winberg & Berg, 2007

So, some factor(s) external to the researchers divided the cohort into two groups – and the reader is told nothing about the basis for this, nor even if the two groups were assigned to the treatments randomly. 5 The authors report that the cohort "comprised prospective molecular biologists (31%), biologists (51%), geologists (7%), and students who did not follow any specific program (11%)", and so it is possible the division into two unevenly sized groups was based on timetabling constraints, with students attending chemistry lab. sessions according to their availability based on specialism. But that is just a guess. (It is usually better when the reader of a research report is not left to speculate about procedures and constraints.)

What is important for a reader to note is that in these studies:

  • the researchers were not able to assign learners to conditions randomly;
  • nor were the researchers able to offer any evidence of equivalence between groups (such as near identical pre-test scores);
  • so, the requirements for inferring significance from statistical tests were not met;
  • so, claims in the paper about finding statistically significant differences between conditions cannot be justified given the research design;
  • and therefore the conclusions presented in the paper are strictly not valid.

If students are not randomly assigned to conditions, then any statistically unlikely difference found at the end of an experiment cannot be assumed to be due to the intervention, rather than some systematic initial difference between the groups.
(Figure adapted from Taber, 2019)


This is a shame, because this is in many ways an interesting paper, and much thought and care seems to have been taken about the collection and analysis of meaningful data. Yet, drawing conclusions from statistical tests comparing groups that might never have been similar in the first place is like finding that careful use of a vernier scale shows that after a period of watering plant A, plant A is taller than plant B – having been very careful to make sure plant A was watered regularly with carefully controlled volumes, while plant B was not watered at all – when you did not think to check how tall the two plants were before you started watering plant A.

In such a scenario we might be tempted to assume plant A has actually become taller because it had been watered; but that is just applying what we had conjectured should be the case, and we would be mistaking our expectations for experimental evidence.

Work cited:

Notes:

1 The part of the brain where we can consciously manipulate ideas is called the working memory (WM). Research suggests that WM has a very limited capacity in the sense that people can only hold in mind a very small number of different things at once. (These 'things' however are somewhat subjective – a complex idea that is treated as a single 'thing' in the WM of an expert can overload a novice.) This limit to WM is considered to be one of the most substantial constraints on effective classroom learning. This is also, then, one of the key research findings informing the design of effective teaching.

Read about working memory

Read about key ideas for teaching in accordance with learning theory

How fat is your memory? – read about a chemical analogy for working memory


2 The organisation has seemingly spotted that the USA is only one part of the world, and now describes itself as a global organisation for improving science education through research.


3 There is no reason why an experiment cannot be carried out on a very specific population, such as first year undergraduate chemistry students attending a specific Swedish University such as, say, Umeå University. However, if researchers intend their study to have results generalisable beyond their specific research contexts (say, to first year undergraduate chemistry students attending any Swedish University) then it is important to have a representative sample of that population.

Read about populations of interest in research

Read about generalisation from research studies


4 It might be assumed that scientists and researchers know what is meant by random, and how to undertake random assignment. Sadly, the literature suggests that in practice the term 'randomly' is sometimes used in research reports to mean something like 'arbitrarily' (Taber, 2013), which falls short of being random.

Read about randomisation in research


5 Arguably, even if the two groups were assigned randomly, there is only one 'unit of analysis' in each condition, as they were assigned as groups. That is, for statistical purposes, the two groups have size n=1 and n=1, which would not allow statistical significance to be found: e.g., see 'Quasi-experiment or crazy experiment?'

Author: Keith

Former school and college science teacher, teacher educator, research supervisor, and research methods lecturer. Emeritus Professor of Science Education at the University of Cambridge.
