Getting that sinking feeling on reading published studies
Keith S. Taber
Research on prelabs
I was looking for studies which explored the effectiveness of 'prelabs', activities which students are given before entering the laboratory to make sure they are prepared for practical work, and can therefore use their time effectively in the lab. There is much research suggesting that students often learn little from science practical work, in part because of cognitive overload – that is, learners can be so occupied with dealing with the apparatus and materials that they have little capacity left to think about the purpose and significance of the work. 1
Approaching a practical work session having already spent time engaging with its purpose and associated theories/models, and already having become familiar with the processes to be followed, should mean students enter the laboratory much better prepared to use their time efficiently, and much better informed to reflect on the wider theoretical context of the work.
I found a Swedish paper (Winberg & Berg, 2007) reporting a pair of studies that tested this idea by using a simulation as a prelab activity for undergraduates about to engage with an acid-base titration. The researchers tested this innovation by comparisons between students who completed the prelab before the titration, and those who did not.
The work used two basic measures:
- types (sophistication) of questions asked by students during the lab. session
- elicitation of knowledge in interviews after the laboratory activity
The authors found some differences (between those who had completed the prelab and those who had not) in the sophistication of the questions students asked, and in the quality of the knowledge elicited. They used inferential statistics to suggest at least some of the differences found were statistically significant. From my reading of the paper, these claims were not justified.
A peer reviewed journal (no, really, this time)
This is a paper in a well respected journal (not one of the predatory journals I have often discussed on this site). The Journal of Research in Science Teaching is published by Wiley (a major respected publisher of academic material) and is the official journal of NARST (which used to stand for the National Association for Research in Science Teaching – where 'national' referred to the USA 2). This is a journal that does take peer review very seriously.
The paper is well-written and well-structured. Winberg and Berg set out a conceptual framework for the research that includes a discussion of previous relevant studies. They adopt a theoretical framework based on the Perry's model of intellectual development (Taber, 2020). There is considerable detail of how data was collected and analysed. This account is well-argued. (But, you, dear reader, can surely sense a 'but' coming.)
Experimental research into experimental work?
The authors do not seem to explicitly describe their research as an experiment as such (as opposed to adopting some other kind of research strategy such as survey or case study), but the word 'experiment' and variations of it appear in the paper.
For one thing, the authors refer to students' practical work as being experiments,
"Laboratory exercises, especially in higher education contexts, often involve training in several different manipulative skills as well as a high information flow, such as from manuals, instructors, output from the experimental equipment, and so forth. If students do not have prior experiences that help them to sort out significant information or reduce the cognitive effort required to understand what is happening in the experiment, they tend to rely on working strategies that help them simply to cope with the situation; for example, focusing only on issues that are of immediate importance to obtain data for later analysis and reflective thought…"
Winberg & Berg, 2007
Now, some student practical work is experimental, where a student is actively looking to see what happens when they manipulate some variable to test a hypothesis. This type of practical work is sometimes labelled enquiry (or inquiry in US spelling). But a lot of school and university laboratory work is undertaken to learn techniques, or (probably more often) to support the learning of taught theory – where it is usually important that the learners know what is meant to happen before they begin the laboratory activity.
Winberg and Berg refer to the 'laboratory exercise' as 'the experiment' as though any laboratory work counts as an experiment. In Winberg and Berg's research, students were asked about their "own [titration] experiment", despite the prelab material involving a simulation of the titration process, in advance of which "the theoretical concepts, ideas, and procedures addressed in the simulation exercise had been treated mainly quantitatively during the preceding 1-week instructional sequence". So, the laboratory titration exercise does not seem to be an experiment in the scientific sense of the term.
School children commonly describe all practical work in the lab as 'doing experiments'. It cannot help students learn what an experiment really is when the word 'experiment' has two quite distinct meanings in the science classroom:
- experiment (technical) = an empirical test of a hypothesis involving the careful control of variables and observation of the effect of changing the variable specified as the independent variable on the variable hypothesised to be the dependent variable
- experiment (casual) = absolutely any practical activity carried out with laboratory equipment
We might describe this second meaning as an alternative conception of 'experiment', a way of understanding that is inconsistent with the scientific meaning. (Just as there are common alternative conceptions of other 'nature of science' concepts such as 'theory').
I would imagine Winberg and Berg were well aware of what an experiment is, although their casual use of language might suggest a lack of rigour in thinking with the term. They refer to having "both control and experiment groups" in their studies, and refer to "the experimental chronology" of their research design. So, they certainly seem to think of their work as a kind of experiment.
Experimental design
In a true experiment, a sample is randomly drawn from a population of interest (say, first year undergraduate chemistry students; or, perhaps, first year undergraduate chemistry students attending Swedish Universities, or… 3) and assigned randomly to the conditions being compared. Providing a genuine form of random assignment is used, then inferential statistical tests can indicate whether any differences found between groups at the end of an experiment should be considered statistically significant. 4
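To make the two randomisation steps concrete, here is a minimal Python sketch. Everything in it is hypothetical – the cohort of student identifiers, the sample size, and the random seed are illustrative only, and are not taken from Winberg and Berg's study:

```python
import random

# Hypothetical population of interest: student identifiers (illustrative only).
population = [f"student_{i:03d}" for i in range(200)]

rng = random.Random(42)  # seeded only so the sketch is reproducible

# Step 1: draw a random sample from the population of interest.
sample = rng.sample(population, 60)

# Step 2: randomly assign the sampled students to the two conditions.
rng.shuffle(sample)
treatment_group = sample[:30]   # will complete the prelab
control_group = sample[30:]     # will not

print(len(treatment_group), len(control_group))  # 30 30
```

Both steps matter, and they do different jobs: the random sampling supports generalisation to the population, while the random assignment supports attributing any group differences to the treatment.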
"Statistics can only indicate how likely a measured result would occur by chance (as randomisation of units of analysis to different treatments can only make uneven group composition unlikely, not impossible)…Randomisation cannot ensure equivalence between groups (even if it makes any imbalance just as likely to advantage either condition)"
Taber, 2019, p.73
That is, if there are differences that the statistical tests suggest are very unlikely to have happened by chance, then those differences are very unlikely to be due to an initial difference between the groups in the two conditions, as long as the groups were the result of random assignment. But that is a very important proviso.
There are two aspects to this need for randomisation:
- to be able to suggest any differences found reflect the effects of the intervention, then there should be random assignment to the two (or more) conditions
- to be able to suggest the results reflect what would probably be found in a wider population, the sample should be randomly selected from the population of interest 3
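The point made in the quotation from Taber (2019) above, that randomisation can only make uneven group composition unlikely rather than impossible, can be illustrated with a small simulation. This is only a sketch: the 'prior ability' scores are invented, and the group sizes of 21 and 37 merely echo those in Winberg and Berg's second study:

```python
import random
from statistics import mean

rng = random.Random(0)

# Invented 'prior ability' scores for a hypothetical cohort of 58 students.
cohort = [rng.gauss(50, 10) for _ in range(58)]

def baseline_gap(scores, n_treatment=21):
    """Randomly split the cohort into groups of 21 and 37, and return
    the gap between the group means *before* any treatment is applied."""
    shuffled = scores[:]
    rng.shuffle(shuffled)
    treatment, control = shuffled[:n_treatment], shuffled[n_treatment:]
    return mean(treatment) - mean(control)

gaps = [baseline_gap(cohort) for _ in range(10_000)]

# Under random assignment, most splits are nearly balanced at baseline...
close = sum(abs(g) < 2 for g in gaps) / len(gaps)
# ...but a non-trivial minority of splits still start out noticeably uneven.
extreme = sum(abs(g) > 5 for g in gaps) / len(gaps)
print(f"|gap| < 2 points: {close:.0%};  |gap| > 5 points: {extreme:.0%}")
```

Even with genuine randomisation, some splits begin with a sizeable baseline gap; the statistics only tell us such gaps are improbable, not that they are absent – and without randomisation we lose even that assurance.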
In education, it is not always possible to use random assignment, so true experiments are then not possible. However, so-called 'quasi-experiments' may be possible where differences between the outcomes in different conditions may be understood as informative, as long as there is good reason to believe that even without random assignment, the groups assigned to the different conditions are equivalent.
In this specific research, that would mean having good reason to believe that without the intervention (the prelab):
- students in both groups would have asked overall equivalent (in terms of the analysis undertaken in this study) questions in the lab.;
- students in both groups would have been judged as displaying overall equivalent subject knowledge.
Often in research where a true experiment is not possible some kind of pre-testing is used to make a case for equivalence between groups.
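One simple screen of that kind is to compare the groups' pre-test scores in standard-deviation units. The sketch below uses entirely hypothetical scores, and a rough Cohen's d is just one possible (and not necessarily the best) way to argue for or against equivalence:

```python
from statistics import mean, stdev

# Hypothetical pre-test scores for two intact (non-randomised) groups.
treatment_pre = [62, 58, 71, 66, 64, 69, 61, 67, 70, 65]
control_pre   = [51, 57, 49, 55, 60, 52, 48, 56, 54, 50]

def standardised_gap(a, b):
    """Difference in group means, in pooled standard-deviation units
    (a rough Cohen's d, used here as one simple equivalence screen)."""
    pooled_sd = (((len(a) - 1) * stdev(a) ** 2 +
                  (len(b) - 1) * stdev(b) ** 2) /
                 (len(a) + len(b) - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

d = standardised_gap(treatment_pre, control_pre)
# A gap this large at pre-test warns that any post-test difference
# cannot safely be attributed to the intervention.
print(f"standardised baseline gap: d = {d:.2f}")
```

Near-zero pre-test gaps do not prove equivalence (the groups might still differ on unmeasured characteristics), but a large gap like the one in this invented example would immediately undermine any causal reading of post-test differences.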
Two control groups that were out of control
In Winberg and Berg's research there were two studies where comparisons were made between 'experimental' and 'control' conditions:
| Study | Experimental | Control |
| --- | --- | --- |
| Study 1 | n=78: first-year students, following completion of their first chemistry course in 2001 | n=97: students who had been interviewed by the researchers during the same course in the previous year |
| Study 2 | n=21 (of 58 in cohort) | n=37 (of 58 in same cohort) |
In the first study, a comparison was made between the cohort where the innovation was introduced and a cohort from the previous year. All other things being equal, it seems likely these two cohorts were fairly similar. But in education all things are seldom equal, so there is no assurance they were similar enough to be considered equivalent.
In the second study
"Students were divided into treatment (n = 21) and control (n = 37) groups. Distribution of students between the treatment and control groups was not controlled by the researchers".
Winberg & Berg, 2007
So, some factor(s) external to the researchers divided the cohort into two groups – and the reader is told nothing about the basis for this, nor even if the two groups were assigned to the treatments randomly. 5 The authors report that the cohort "comprised prospective molecular biologists (31%), biologists (51%), geologists (7%), and students who did not follow any specific program (11%)", and so it is possible the division into two uneven sized groups was based on timetabling constraints with students attending chemistry lab sessions according to their availability based on specialism. But that is just a guess. (It is usually better when the reader of a research report is not left to speculate about procedures and constraints.)
What is important for a reader to note is that in these studies:
- the researchers were not able to assign learners to conditions randomly;
- nor were the researchers able to offer any evidence of equivalence between groups (such as near identical pre-test scores);
- so, the requirements for inferring significance from statistical tests were not met;
- so, claims in the paper about finding statistically significant differences between conditions cannot be justified given the research design;
- and therefore the conclusions presented in the paper are strictly not valid.
This is a shame, because this is in many ways an interesting paper, and much thought and care seems to have been taken over the collection and analysis of meaningful data. Yet, drawing conclusions from statistical tests comparing groups that might never have been similar in the first place is like finding that careful use of a vernier scale shows that after a period of watering plant A, plant A is taller than plant B – having been very careful to make sure plant A was watered regularly with carefully controlled volumes, while plant B was not watered at all – when you did not think to check how tall the two plants were before you started watering plant A.
In such a scenario we might be tempted to assume plant A has actually become taller because it had been watered; but that is just applying what we had conjectured should be the case, and we would be mistaking our expectations for experimental evidence.
Work cited:
- Taber, K. S. (2013). Non-random thoughts about research. Chemistry Education Research and Practice, 14(4), 359-362. doi:10.1039/c3rp90009f
- Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download the manuscript version of this paper].
- Taber, K. S. (2020). Developing intellectual sophistication and scientific thinking – The schemes of William G. Perry and Deanna Kuhn. In B. Akpan & T. Kennedy (Eds.), Science Education in Theory and Practice: An introductory guide to learning theory (pp. 209-223). Springer.
- Winberg, T. M., & Berg, C. A. R. (2007). Students' cognitive focus during a chemistry laboratory exercise: Effects of a computer-simulated prelab. Journal of Research in Science Teaching, 44(8), 1108-1133. https://doi.org/10.1002/tea.20217
Notes:
1 The part of the brain where we can consciously manipulate ideas is called the working memory (WM). Research suggests that WM has a very limited capacity in the sense that people can only hold in mind a very small number of different things at once. (These 'things', however, are somewhat subjective – a complex idea that is treated as a single 'thing' in the WM of an expert can overload a novice.) This limit to WM is considered to be one of the most substantial constraints on effective classroom learning. This is also, then, one of the key research findings informing the design of effective teaching.
Read about key ideas for teaching in accordance with learning theory
How fat is your memory? – read about a chemical analogy for working memory
2 The organisation has seemingly spotted that the USA is only one part of the world, and now describes itself as a global organisation for improving science education through research.
3 There is no reason why an experiment cannot be carried out on a very specific population, such as first year undergraduate chemistry students attending a specific Swedish University such as, say, Umeå University. However, if researchers intend their study to have results generalisable beyond their specific research contexts (say, to first year undergraduate chemistry students attending any Swedish University) then it is important to have a representative sample of that population.
Read about populations of interest in research
Read about generalisation from research studies
4 It might be assumed that scientists and researchers know what is meant by random, and how to undertake random assignment. Sadly, the literature suggests that in practice the term 'randomly' is sometimes used in research reports to mean something like 'arbitrarily' (Taber, 2013), which falls short of being random.
Read about randomisation in research
5 Arguably, even if the two groups were assigned randomly, there is only one 'unit of analysis' in each condition, as they were assigned as groups. That is, for statistical purposes, the two groups have size n=1 and n=1, which would not allow statistical significance to be found: e.g., see 'Quasi-experiment or crazy experiment?'