The use of dubious comparison conditions in classroom research
A topic in 'Science & Ethics'
Educational research studies are used to inform teachers and others about such matters as effective pedagogy, resources, and the implementation of curriculum. Decisions may be made which affect the classroom experiences and learning of learners.
But educational contexts are complex sites for research, so much of the research undertaken in any one place has limited generalisability.
How likely is it that a study undertaken in one context would produce results which apply beyond that context?
- to another class in the same school?
- to the equivalent class next year?
- to the classroom of another teacher in the school?
- to the next school in the district?
- to a school in another country?
- to students of a different grade level?
- to students studying a different topic?
- to students learning in a different language?
- to students of a different 'ability range'?
- to students of different cultural backgrounds?
If your response is that it depends on what the study is focused on, then consider when you might assume results could transfer in these different scenarios.
Experiments
In the natural sciences, experiments may be used which control all but one factor of interest, to allow investigation of cause and effect. But controlling variables (all variables that might reasonably have some effect) in education is seldom feasible.
Consider an experiment to find out if changing the textbook used for a class of 14 year old science students would improve learning outcomes. If one class were to study with the existing textbook, and another class were to study with the new textbook, what would need to be 'controlled' to be the same in the two classes for you to be confident that any difference in learning outcomes was due to the different textbooks, and not some other factor? How easy would it be to 'control' the two classes in this way, in practice?
Read about experiments in research
Inferential statistics
Some kinds of variation that cannot be controlled can be allowed for by the use of statistical methods. By using sufficient numbers of independent 'units of analysis' assigned to experimental conditions randomly, it is possible to test the outcomes of research by using statistical tests to see if differences are so unlikely that we can say they are statistically significant (which means not likely to be just due to chance effects).
Read about statistical testing in research
Consider the textbook example from above. Imagine that 100 teachers of classes of 14 year old students volunteered to take part in the research – teachers from different schools, with different levels of amenities, in different parts of the country, with diverse socio-economic backgrounds; teachers of different levels of qualification, skill, experience, motivation, classroom rapport, and different teaching styles: if 50 teachers were assigned to each condition (here, textbook) randomly, then the two resulting groups would probably have more-or-less similar profiles. (Of course, random assignments sometimes, by chance, lead to very unequal distributions – but not very often.) This is the kind of approach needed for valid experiments.
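As a minimal sketch of why this works (with entirely made-up numbers), suppose each teacher could be summarised by a single 'experience' score. Repeatedly splitting 100 such teachers into two random groups of 50 shows that the groups are usually well matched, and badly unbalanced assignments are rare:

```python
import random
import statistics

random.seed(1)
# Hypothetical pool of 100 teachers, each with a (toy) 'years of experience' value
teachers = [random.gauss(10, 5) for _ in range(100)]

def assignment_gap(pool):
    """Randomly split the pool into two groups of 50; return the gap in mean experience."""
    shuffled = pool[:]
    random.shuffle(shuffled)
    return abs(statistics.mean(shuffled[:50]) - statistics.mean(shuffled[50:]))

# Try the random assignment 1000 times and see how unbalanced the groups get
gaps = [assignment_gap(teachers) for _ in range(1000)]
print(f"median gap in mean experience: {statistics.median(gaps):.2f} years")
print(f"assignments with a gap over 3 years: {sum(g > 3 for g in gaps)} of 1000")
```

The same logic applies to every background variable at once: randomisation does not guarantee balance in any one assignment, but it makes substantial imbalance improbable, which is exactly what the statistical tests then rely on.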
Read about randomisation in research
This sort of approach can in principle be applied when individual learners are the 'units of analysis'. If there are fifty 14-year-old students in a school, they could be assigned to two groups of 25 randomly. This can do nothing about the other variables, but can at least 'randomise' the pupils. For us to then use statistical tests, we need to be convinced these 'units of analysis' are independent. That is, the way each 'unit' (learner) is affected by the assigned condition (here, the textbook) is not subject to any interference from other 'units' (learners). That means that the 50 pupils are all learning from the assigned textbook without in any way influencing each other in how they engage with and learn from the book.
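To see why this independence matters for the statistics, here is a hypothetical simulation (all numbers invented). There is no real textbook effect at all, but pupils in each class share class-level influences on their scores; a test that nonetheless treats the pupils as independent units reports a 'significant' difference far more often than the nominal 5%:

```python
import math
import random
import statistics

random.seed(3)

def two_sample_p(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(x)/len(x) + statistics.variance(y)/len(y))
    z = abs(statistics.mean(x) - statistics.mean(y)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Hypothetical: NO real textbook effect, but each class of 25 shares a
# class-level influence (pupils affecting one another's learning)
false_positives = 0
trials = 1000
for _ in range(trials):
    class_effect_a = random.gauss(0, 5)   # shared shift for everyone in class A
    class_effect_b = random.gauss(0, 5)   # shared shift for everyone in class B
    a = [random.gauss(50 + class_effect_a, 10) for _ in range(25)]
    b = [random.gauss(50 + class_effect_b, 10) for _ in range(25)]
    if two_sample_p(a, b) < 0.05:
        false_positives += 1
print(f"'Significant' differences with NO real effect: {false_positives} of {trials} "
      f"(about 50 would be expected if pupils really were independent)")
```

When the units are not independent, the test is effectively comparing two classes, not fifty pupils, and the apparent sample size is an illusion.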
Read about units of analysis in research
How reasonable is it to consider the students in a class as learning independently from each other, and not influencing each other's learning? (Does this depend on the nature of the classroom?)
If the textbooks were only to be used for homework, and students were banned from helping each other with homework (and assuming they followed the rule), would that completely avoid the issue of student-student interaction in class contaminating their 'independence' as units of analysis in the research into textbook effectiveness? (Can we assume learning from using the textbook at home is something that does not interact with classroom experiences involving other learners?)
Quasi-experiments
Many 'experiments' that appear in the educational research literature do NOT randomise learners to experimental conditions (e.g., existing textbook versus new textbook, etc.) but work with existing classes. Schools are often prepared to allow research to occur in existing classes, but are less likely to want to rearrange existing student groups just to help researchers.
Such studies, lacking the randomisation needed for a genuine experiment, are often called 'quasi-experiments'. If learners have not been randomised, then it becomes meaningless to ask whether differences in outcomes are statistically significant, as the statistical tests are supposed to compare (i) any actual differences at the end of the study with (ii) what is likely to happen by chance due to the randomisation (which has not occurred).
Imagine you had fifty Pound coins and fifty Euro coins and randomly assigned them to two jars, and then found that the value of the coins was much higher in one jar than in the other. In this situation, statistics can tell you how unlikely that outcome was. But if you just located two jars of coins, and found that one jar contained fifty Pound coins and the other fifty Euro coins, then knowing that such an arrangement would be extremely unlikely by chance may simply suggest that the coins had already been subject to some external ordering! Perhaps some tidy person sorted their coins into different jars.
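The intuition behind the coin example can be made concrete. If every way of dealing 100 coins into two jars of 50 is treated as equally likely, the chance of a perfectly sorted outcome is vanishingly small, which is precisely why finding such jars suggests prior sorting rather than chance:

```python
from math import comb

# Randomly dealing 100 coins (50 Pound, 50 Euro) into two jars of 50 each:
# what is the chance that one jar ends up holding all 50 Pound coins?
total_splits = comb(100, 50)   # equally likely ways to fill the first jar
perfectly_sorted = 2           # all Pounds in jar 1, or all Pounds in jar 2
p = perfectly_sorted / total_splits
print(f"P(perfectly sorted jars by chance) = {p:.1e}")
```

The probability is of the order of 10⁻²⁹, so observing sorted jars is overwhelming evidence that the 'assignment' was not random, which is exactly the inference problem with pre-existing classes.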
Some schools randomly, or at least arbitrarily, assign new students to different classes in a year group so that the classes are supposed to be equivalent. But a class is not just a collection of learners – rather it forms its own character from all the interactions in the group (and between the students and their teachers). My own experience from teaching in schools, college and university is that such classes can actually be very different; just as subsequent groups of students on the same course, enrolled through the same admissions procedures, can vary considerably from year to year. *
If you are an experienced teacher who has worked with parallel groups of supposedly equivalent learners, or has taught the same course over successive years: have you also found that there can be considerable differences between nominally similar classes?
Quasi-experiments can have several teaching groups in each condition – but studies can get published comparing just two classes in the same school – or even sometimes two classes in neighbouring schools! It is common for such studies to use some kind of pre-test to support a suggestion that the classes are similar enough before the study to treat them as equivalent.
Often this only goes as far as showing there are no statistically significant differences at pre-test. As evidence of equivalence, that's a bit like checking that a job applicant does not have a criminal record, and so concluding they must be completely honest and never lie. (That might be the case, but…)
Imagine a quasi-experimental study where the two classes are tested with a relevant instrument before the intervention is applied. It is (very) unlikely the two groups would give precisely the same profile of responses. What if the difference between the average scores in the two groups was said to be non-significant because p>0.05 (perhaps p was 0.13)? Does that convince you that the two groups can be considered initially equivalent?
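One reason a non-significant pre-test is weak evidence of equivalence is low statistical power. In this hypothetical sketch (class sizes and scores are invented), class B genuinely starts ahead by a meaningful margin, yet with classes of 25 the test frequently fails to flag the difference as significant:

```python
import math
import random
import statistics

random.seed(7)

def two_sample_p(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(x)/len(x) + statistics.variance(y)/len(y))
    z = abs(statistics.mean(x) - statistics.mean(y)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Hypothetical classes of 25, where class B genuinely starts 4 points ahead
# (a real difference of about 0.4 standard deviations)
missed = 0
trials = 1000
for _ in range(trials):
    class_a = [random.gauss(50, 10) for _ in range(25)]
    class_b = [random.gauss(54, 10) for _ in range(25)]
    if two_sample_p(class_a, class_b) > 0.05:
        missed += 1
print(f"Real initial difference, yet 'no significant difference' found "
      f"in {missed} of {trials} simulated pre-tests")
```

With samples this small, 'not significantly different' is entirely compatible with a substantial underlying difference, so the pre-test result cannot carry the weight of an equivalence claim.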
If you responded 'yes', perhaps take a look at this page about testing for equivalence in research studies.
If, for good reasons (complexity of teaching and learning; practical constraints of working with schools, limited resources, etc.), a valid experiment is not possible; is it still useful to undertake an 'experimental' study despite knowing that statistical tests cannot strictly offer any guidance as to what to conclude from the results obtained?
Setting up control conditions
Ideally, in an experiment, the only thing that should differ between the conditions is the variable being examined. Let us suppose that variable is the pedagogy being used in class. Pedagogies may be labelled by such terms as 'direct instruction', 'enquiry', 'constructivism', etc. (although most of these terms are used in different ways by different authors, so one always needs to check what an author means by them).
Specific teaching techniques may be the focus – such as jigsaw learning, or predict-observe-explain, or concept mapping for example.
Clearly, if the experimental intervention is, say, using concept mapping in learning, then the comparison class should be studying the same material, in the same amount of time, under conditions as equivalent as possible (classroom, time of day…), but not using concept mapping. Instead of concept mapping, some other relevant learning activity must be used. (Relevant: if instead of concept mapping the other class were basket-weaving, we might think this an unfair test.)
Consider a study to test the effectiveness of using (say) concept mapping to support learning of metabolism in year 12 biology classes; where everything was done the same in both classes: EXCEPT where concept-mapping activities were used in the experimental class these were substituted by a different learning activity in the other class. If the study showed the learners on average acquired a greater understanding of metabolism in the concept mapping condition than in the control condition, and the mean difference was statistically significant, then does this show concept-mapping activities are more effective (for this grade and topic, at least with this teacher in this school) than other learning activities?
Does it change your view if you are told that in the comparison condition the substituted activities included group discussion work, building models, and designing posters with novel ways to represent metabolic information?
Does it change your view if you are told instead that in the comparison condition the substituted activities included copying passages from the board and textbook in silence, and then answering comprehension questions?
Is it possible to draw conclusions about teaching and learning conditions that were not included in the study, or only to compare the actual experimental and comparison conditions?
Comparison conditions: Some examples from practice
The following descriptions from published studies (discussed in Taber, 2019) set out the teaching and learning conditions for comparison ('control') classes that were set up to test the effectiveness of pedagogies such as active learning, advanced organisers, animations, concept cartoons, cooperative learning, enquiry-based instruction…
"In the control group, a teacher directed strategy representing the traditional approach was used . . . where students are completely passive. The teacher used direct teaching and question and answer methods… In this group, the teacher provided instruction through lecture and discussion methods to teach the concepts. The teacher… wrote notes on the chalkboard about the definition of concepts, and passed out worksheets for students to complete. The primary underlying principle was that knowledge takes the form of information that is transmitted to students."
"The control group was taught with a teacher-centered traditional didactic lecture format. Teaching strategies were dependent on teacher expression without consideration for student misconceptions…students were required to use their textbooks; students were passive participants and rarely asked questions; they did not benefit from the library or internet sources; activities such as computer animations or brainstorming were not used; generally the teacher wrote the concepts on the board and then explained them; students listened and took notes as the teacher lectured on the content."
". . .the control group was taught…using teacher-centred traditional didactic lecture format. Teaching strategies were dependent on teacher expression. The students were required to use their textbooks …there are [sic, were] not any student centred active activities [that] depend on constructivism. Students were passive participants during the lessons and they only listened and took notes as the teacher lectured on the content"
"Students in CG were instructed by lecturing method, discussion and sometimes students performed the laboratory activities in that students were passive listeners and teacher's role was to transmit the facts and concepts to the students…Teacher did not give emphasis on students' misconceptions. Students were passive listeners and they were taking notes. In the laboratory activity section, students were required to do experiment by using the handout… like 'cookbook', described the all steps of the experiment"
"….teacher-centered instruction, [where] learning focuses on the mastery of content, with little development of the skills and attitudes necessary for scientific inquiry. The teacher transmits information to students, who receive and memorize it… The curriculum is loaded with many facts and a large number of vocabulary words, which encourages a lecture format of teaching…the control group were instructed via teacher-centered didactic lecture format…The students were instructed with regular chemistry textbooks. They listened to the teacher carefully, took notes and solved algorithmic problems"
"…was taught using the lesson plan based on…the conventional teaching method, which was commonly practised in that school…in which the teacher dominants [sic], whereas the learners remain passive"
"traditional instruction which relied on instructors' explanations with no consideration of the students' misconceptions. The instructor used overhead projector to show the definitions of concepts, explained the facts, solved the questions, meanwhile students took notes through the lessons"
"….using chalk and talk method as commonly known name, the traditional method"
In each of these studies the researchers reported that the learners in the experimental group (doing some sort of active, constructivist, engaging learning) had on average better learning outcomes than the learners in the comparison groups.
What do you think can usefully be learned from such studies?
Given that in these kinds of 'rhetorical experiments' the researchers claim to know in advance that the type of teaching/learning they are testing in the experimental condition (a) is strongly expected to be effective from theoretical considerations, and (b) has repeatedly been found to be effective in other contexts – across various locations, topics, age groups, etc. – how informative are such additional studies?
Given that such studies use up valuable resources, and require some degree of disruption to normal schooling, are they a good choice of focus for research? Are they an ethical focus for research?
Given that in these kinds of 'rhetorical experiments' the researchers claim to know in advance that learning does not occur by information simply being transferred from the teacher, and that passive learners are not effective learners; are the comparison conditions used a sensible basis for judging the effectiveness of a teaching approach or learning activity? Is it ethical to set up a research study involving schools, teachers and children when the researchers do not have a genuine enquiry question, but are trying to demonstrate something they think they already know?
Is it ethical to set up studies that require teachers to teach in ways that the researchers already know are ineffective, just to ensure a research study gives a positive outcome? (Does it make any difference to your judgement if the class would normally be taught as passive learners in that classroom anyway?)
There is little point in experiments which compare learning with widely demonstrated effective teaching-learning approaches against teaching that is deliberately constrained to what is known to be of limited effectiveness (i.e., yet another study suggesting that an engaging, active learning session brings about more learning than an hour listening to the teacher or copying from the textbook). This does not mean such small-scale school-based experimental studies could not be useful if (they met the conditions for genuine experiments, and) they were designed to answer genuine questions – such as comparing between different approaches which are both expected to be effective. This is discussed in Chemical Pedagogy, pp.83-90 (Taber, 2024).
Read more about such rhetorical experiments, here
* To read about one published study that claimed an effective teaching innovation on the basis of the mean class score rising from 79.8 for one year's students to 79.9 for the next cohort, please see Falsifying research conclusions: You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.
Work cited:
- Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. https://doi.org/10.1080/03057267.2019.1658058 [download paper]
- Taber, K. S. (2024). Chemical Pedagogy. Instructional Approaches and Teaching Techniques in Chemistry. Royal Society of Chemistry.
