A concept cartoon to explore learner thinking


Keith S. Taber


I have designed a simple concept cartoon. Concept cartoons are used in teaching, usually as an introductory activity to elicit students' ideas about a topic before proceeding to develop the scientific account. This can be seen as 'diagnostic assessment' or just part of good pedagogy when teaching topics where learners are likely to have alternative conceptions. (So, in science teaching, that means just about any topic!)

Read about concept cartoons

But I am retired and no longer teach classes, so why am I spending my time preparing teaching resources?

Well, I was writing about dialogic teaching, and so devised an outline lesson plan to illustrate what dialogic teaching might look like. The introductory activity was to be a concept cartoon, so I thought I should specify what it might contain – and so then I thought it would help a reader if I actually mocked up the cartoon so it would be clear what I was writing about. That led to:


A concept cartoon provides learners with several competing ideas to discuss (This can be downloaded below)


What happens, and why?

In my concept cartoon the focal question is what will happen when some NaCl is added to water – and why? This is a concept cartoon because there are several characters offering competing ideas to act as foci for learners to discuss and explore. Of course, it is possible to ask learners to engage with a cartoon individually, but they are intended to initiate dialogue between learners. So by talking together learners will each have an audience to ask them to clarify, and to challenge, their thinking and to ensure they try to explain their reasoning.

Of course, there is flexibility in how they can be used. A teacher could ask students to consider the cartoon individually, before moving to small group discussions or whole class discussion work. (It is also possible to move from individual work to pairing up, to forming groups from two pairs, to the teacher then collating ideas from different groups.) During this stage of activity the intention is to let students make their thinking explicit and to consider and compare different views.

Of course, this is a prelude to the teacher persuading everyone in the class of the right answer, and why it is the right answer. Concept cartoons are used where we know student thinking is likely to make that stage more than trivial. Where learners do already have well-entrenched conceptions at odds with the scientific models, we know simply telling them the target curriculum account is unlikely to lead to long-term shifts in their thinking.

And even if they do not, they will be more likely to appreciate, and later recall, the scientific account if the ground is prepared in this way by engaging students with the potential 'explanatory landscape' (thinking about what is to be explained, and what explanation might look like). If they become genuinely engaged with the question then the teacher's presentation of the science is given 'epistemic relevance'. (Inevitably the science curriculum consists of answers to the questions scientists have posed over many years: but in teaching it we may find we are presenting answers to many questions that simply have never occurred to the students. If we can get learners to first wonder about the questions, then that makes the answer more relevant for them – so more likely to be remembered later.)

Is there really likely to be a diversity of opinion?

This example may seem fairly straightforward to a science teacher. Clearly NaCl, sodium chloride (a.k.a. 'common salt' or 'table salt'), is an ionic solid that will dissolve in water as the ions are solvated by the polar water molecules clustering around them. That should also be obvious to advanced students. (Should – but research evidence suggests not always.)
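To make the target account explicit (my own summary, not something shown in the cartoon itself), the dissolving process can be represented as:

NaCl(s) → Na⁺(aq) + Cl⁻(aq)

where 'aq' indicates that each ion becomes hydrated, that is, surrounded by a shell of water molecules attracted to it (oxygen ends oriented towards the sodium cations; hydrogen ends towards the chloride anions).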

What about students who have just learned about ionic bonding and the NaCl crystal structure? What might they think?

Surely, we can dismiss the possibility that salt will not dissolve? Everyone knows it does. The sea is pretty salty, and people often add salt to the water when cooking. And as long as learners know that NaCl is 'salt' there should be no one supporting the option that it does not dissolve. After all, there is a very simple logical syllogism to be applied here:

  • common salt dissolves in water
  • common salt is NaCl
  • so NaCl dissolves in water

Except, of course, learners who know both that salt dissolves in water and that it is NaCl still have to bring both of those points to mind, and coordinate them – and if they are juggling other information at the same time they may have reached the 'working memory capacity' limit.

Moreover, we know that learners often tend to 'compartmentalise' their learning (well, we all do to some extent), so although they may engage with salt in the kitchen or at the dinner table, and learn about salt as NaCl in science lessons, they may not strongly link these two domains. And the rationale offered here by the student in red, that NaCl is strongly bonded, is a decent reason to expect the salt to be insoluble.

Now as I have just made this cartoon up, and do not have any classes to try it out on, I may be making a misjudgement and perhaps no learners would support this option. But I have a sneaking suspicion there might be a few who would!

The other two options are based on things I was told when a teacher. That the solid may dissolve as separate atoms is based on being told by an advanced student that in 'double decomposition' reactions the precipitate was produced when atoms in the solution paired up to transfer electrons. The student knew the solutions reacting (say of potassium iodide and lead nitrate) contained ions, but obviously (to my informant) the ions changed themselves back into atoms before forming new ionic bonds by new electron transfers.
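For reference, the canonical account of that example (my gloss, not the student's account) involves no atoms and no electron transfers at all – the ions already present in the two solutions simply recombine:

Pb(NO₃)₂(aq) + 2KI(aq) → PbI₂(s) + 2KNO₃(aq)

or, in ionic terms, with the potassium and nitrate ions remaining as spectators in solution:

Pb²⁺(aq) + 2I⁻(aq) → PbI₂(s)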

I was quite shocked to have been told that, but perhaps should not have been, as it draws on two very common misconceptions: that the ions in solution revert to atoms (as though atoms were the natural, default state of matter), and that forming an ionic bond requires an electron-transfer event.

(Moreover, another advanced student once told me that when bonds broke, electrons had to go back to their 'own' atom, as it would be odd for an atom to end up with someone else's electron! So, by this logic, of course anions have to return electrons to their rightful owners before ionically bonding elsewhere!)

So, I suspect a fair number of students new to learning about ionic bonding might well expect it to dissolve as atoms rather than ions.

As regards the other option, that the salt dissolves as molecules, I would actually be amazed if quite a few learners in most classes of, say, 13-14-year-olds, did not select this option. It is very common for students to think that, despite its symmetrical crystal structure (visible in the model in the cartoon), NaCl really comprises NaCl units, molecule-like ion pairs – perhaps even seen as simply NaCl 'molecules'.

It becomes the teacher's job to persuade learners this is not so, for example, by considering how much energy is needed to melt NaCl, and the conductivity of the liquid and the aqueous solution. (In my imaginary lesson the next activity was a 'Predict-Observe-Explain' activity involving measuring the conductivity of a salt solution.)


A challenge to science teachers

Perhaps you think the students in your classes would not find this a challenging task, as you have taught them that NaCl is an ionic solid, held together by the attractions between cations and anions? All your students know NaCl dissolves, and that the dissolved species will (very nearly always) be single hydrated ions.

Perhaps you are right, and I am wrong.

Or perhaps you recognise that, given how many students have demonstrated alternative conceptions of ionic bonding in the past (Taber, 1994), some of your own students may find this topic difficult.

As I no longer have classes to teach, I am uploading a copy of the cartoon that can be downloaded in case you want to present this to your classes and see how they get on. It is intended primarily for students who have been introduced to ionic bonding and taught that salts such as NaCl form solids with regular arrangements of charged ions. If they have not yet studied salts dissolving, then perhaps this would be a useful introductory activity for learning that content?

If you have already taught them about salts dissolving, then obviously they should all get the right answer. (But does that mean they will? Is it worth five minutes of class-time to check?)

And if you work with more advanced students who are expected to have mastered ionic bonding some years ago, then we might hope no one in the class would hesitate in selecting the right answer. (But can you be sure? You could present this as something designed for younger students, and ask your students how they would tutor a younger brother or sister who was not sure what the right answer was.)

If you do decide to try this out with your students – I would really like to know how you get on. Perhaps you would even share your experience with other readers by leaving a comment below?



Work cited:


Shock result: more study time leads to higher test scores

(But 'all other things' are seldom equal)


Keith S. Taber


I came across an interesting journal article that reported a quasi-experimental study where different groups of students studied the same topic for different periods of time. One group was given 3 half-hour lessons, another group 5 half-hour lessons, and the third group 8 half-hour lessons. Then they were tested on the topic they had been studying. The researchers found that the average group performance was substantially different across the different conditions. This was tested statistically, but the results were clear enough to be quite impressive when presented visually (as I have below).


Results from a quasi-experiment: it seems more study time can lead to higher achievement

These results seem pretty clear cut. If this research could be replicated in diverse contexts then the findings could have great significance.

  • Is your manager trying to cut course hours to save budget?
  • Does your school want you to teach 'triple science' in a curriculum slot intended for 'double science'?
  • Does your child say they have done enough homework?

Research evidence suggests that, ceteris paribus, learners achieve more by spending more time studying.

Ceteris paribus?

That is ceteris paribus (no, it is not a newly discovered species of whale): all other things being equal. But of course, in the real world they seldom – if ever – are.

If you wondered about the motivation for a study designed to see whether more teaching led to more learning (hardly what Karl Popper would have classed as a suitable 'bold conjecture' on which to base productive research), then I should confess I am being disingenuous. The information I give above is based on the published research, but offers a rather different take on the study from that offered by the authors themselves.

An 'alternative interpretation' one might say.

How useful are DARTs as learning activities?

I came across this study when looking to see if there was any research on the effectiveness of DARTs in chemistry teaching. DARTs are directed activities related to text – that is text-based exercises designed to require learners to engage with content rather than just copy or read it. They have long been recommended, but I was not sure I had seen any published research on their use in science classrooms.

Read about using DARTs in teaching

Shamsulbahri and Zulkiply (2021) undertook a study that "examined the effect of Directed Activity Related to Texts (DARTs) and gender on student achievement in qualitative analysis in chemistry" (p.157). They considered their study to be a quasi-experiment.

An experiment…

Experiment is the favoured methodology in many areas of natural science, and, indeed, the double blind experiment is sometimes seen as the gold standard methodology in medicine – and when possible in the social sciences. This includes education, and certainly in science education the literature reports many, many educational experiments. However, doing experiments well in education is very tricky and many published studies have major methodological problems (Taber, 2019).

Read about experiments in education

…requires control of variables

As we teach in school science, fair testing requires careful control of variables.

So, if I suggest there are some issues that prevent a reader from being entirely confident in the conclusions that Shamsulbahri and Zulkiply reach in their paper, it should be borne in mind that I think it is almost impossible to do a rigorously 'fair' small-scale experiment in education. By small-scale, I mean the kind of study that involves a few classes of learners as opposed to studies that can enrol a large number of classes and randomly assign them to conditions. Even large scale randomised studies are usually compromised by factors that simply cannot be controlled in educational contexts (Taber, 2019), and small scale studies are subject to additional, often (I would argue) insurmountable, 'challenges'.

The study is available on the web, open access, and the paper goes into a good deal of detail about the background to, and aspects of, the study. Here, I am focusing on a few points that relate to my wider concerns about the merits of experimental research into teaching, and there is much of potential interest in the paper that I am ignoring as not directly relevant to my specific argument here. In particular, the authors describe the different forms of DART they used in the study. As, inevitably (considering my stance on the intrinsic problems of small-scale experiments in education), the tone of this piece is critical, I would recommend readers to access the full paper and make up your own minds.

Not a predatory journal

I was not familiar with the journal in which this paper was published – the Malaysian Journal of Learning and Instruction. It describes itself as "a peer reviewed interdisciplinary journal with an international advisory board". It is an open access journal that charges authors for publication. However, the publication fees are modest (US$25 if authors are from countries that are members of The Association of Southeast Asian Nations, and US$50 otherwise). This is an order of magnitude less than is typical for some of the open-access journals that I have criticised here as being predatory – those which do not engage in meaningful peer review, and will publish some very low quality material as long as a fee is paid. 25 dollars seems a reasonable charge for the costs involved in publishing work, unlike the hefty fees charged by many of the less scrupulous journals.

Shamsulbahri and Zulkiply seem, then, to have published in a well-motivated journal and their paper has passed peer review. But this peer thinks that, as with most small-scale experiments into teaching, it is very hard to draw any solid conclusions from this work.

What do the authors conclude?

Shamsulbahri and Zulkiply argue that their study shows the value of DARTs activities in learning. I approach this work with a bias, as I also think DARTs can be very useful. I used different kinds of DARTs extensively in my teaching with 14-16 year olds when I worked in schools.

The authors claim their study,

"provides experimental evidence in support of the claim that the DARTs method has been beneficial as a pedagogical approach as it helps to enhance qualitative analysis learning in chemistry…

The present study however, has shown that the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the experimental method. Using the DARTs method only results in better learning of qualitative analysis component in chemistry, as compared with using the Experimental method only."

Shamsulbahri & Zulkiply, 2021

Yet, despite my bias, which leads me to suspect they are right, I do not think we can infer this much from their quasi-experiment.

I am going to separate out three claims in the quote above:

  1. the DARTs method has been beneficial as a pedagogical approach as it helps to enhance qualitative analysis learning in chemistry
  2. the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the [laboratory1] method
  3. the DARTs method [by itself] results in better learning of qualitative analysis component in chemistry, as compared with using the [laboratory] method only.

I am going to suggest that there are two weak claims here and one strong claim. The weak claims are reasonably well supported (but only as long as they are read strictly as presented and not assumed to extend beyond the study) but the strong claim is not.

Limitations of the experiment

I suggest there are several major limitations of this research design.

What population is represented in the study?

In a true experiment researchers would nominate the population of interest (say, for example, 14-16 year old school learners in Malaysia), and then randomly select participants from this population who would be randomly assigned to the different conditions being compared. Random selection and assignment cannot ensure that the groupings of participants are equivalent, nor that the samples genuinely represent the population; by chance it could happen that, say, the most studious students are assigned to one condition and all the lazy students to another – but that is very unlikely. Random selection and assignment means that there is a strong statistical case to think the outcomes of the experiment probably represent (more or less) what would have happened on a larger scale had it been possible to include the whole population in the experiment.

Read about sampling in research

Obviously, researchers in small-scale experiments are very unlikely to be able to access full populations to sample. Shamsulbahri and Zulkiply did not – and it would be unreasonable to criticise them for this. But this does raise the question of whether what happens in their samples will reflect what would happen with other groups of students. Shamsulbahri and Zulkiply acknowledge their sample cannot be considered typical,

"One limitation of the present study would be the sample used; the participants were all from two local fully residential schools, which were schools for students with high academic performance."

Shamsulbahri & Zulkiply, 2021

So, we have to be careful about generalising from what happened in this specific experiment to what we might expect with different groups of learners. In that regard, two of the claims from the paper that I have highlighted (i.e., the weaker claims) do not directly imply these results can be generalised:

  1. the DARTs method has been beneficial as a pedagogical approach…
  2. the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the [laboratory] method

These are claims about what was found in the study – not inferences about what would happen in other circumstances.

Read about randomisation in studies

Equivalence at pretest?

When it is not possible to randomly assign participants to the different conditions then there is always the possibility that whatever process has been used to assign conditions to groups produces a bias. (An extreme case would be in a school that used setting, that is assigning students to teaching groups according to achievement, if one set was assigned to one condition, and another set to a different condition.)

In quasi-experiments on teaching it is usual to pre-test students and to present analysis to show that at the start of the experiment the groups 'are equivalent'. Of course, it is very unlikely two different classes would prove to be entirely equivalent on a pre-test, so often there is a judgement made of the test results being sufficiently similar across the conditions. In practice, in many published studies, authors settle for the very weak (and inadequate) test of not finding differences so great that they would be very unlikely to occur by chance (Taber, 2019)!

Read about testing for equivalence

Shamsulbahri and Zulkiply did pretest all participants as a screening process to exclude any students who already had good subject knowledge in the topic (qualitative chemical analysis),

"Before the experimental manipulation began, all participants were given a pre-screening test (i.e., the Cation assessment test) with the intention of selecting only the most qualified participants, that is, those who had a low-level of knowledge on the topic….The participants who scored ten or below (out of a total mark of 30) were selected for the actual experimental manipulation. As it turned out, all 120 participants scored 10 and below (i.e., with an average of 3.66 out of 30 marks), which was the requirement that had been set, and thus they were selected for the actual experimental manipulation."

Shamsulbahri & Zulkiply, 2021

But the researchers do not report the mean results for the groups in the three conditions (laboratory1; DARTs; {laboratory+DARTs}) or give any indication of how similar (or not) these were. Nor do these scores seem to have been included as a variable in the analysis of results. The authors seem to be assuming that as no students scored more than one-third marks in the pre-test, then any differences between groups at pre-test can be ignored. (This seems to suggest that scoring 30% or 0% can be considered the same level of prior knowledge in terms of the potential influence on further learning and subsequent post-test scores.) That does not seem a sound assumption.

"It is important to note that there was no issue of pre-test treatment interaction in the context of the present study. This has improved the external validity of the study, since all of the participants were given a pre-screening test before they got involved in the actual experimental manipulation, i.e., in one of the three instructional methods. Therefore, any differences observed in the participants' performance in the post-test later were due to the effect of the instructional method used in the experimental manipulation."

Shamsulbahri & Zulkiply, 2021 (emphasis added)

There seems to be a flaw in the logic here, as the authors seem to be equating demonstrating an absence of high scorers at pre-test with there being no differences between groups which might have influenced learning.2
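To see why this matters, consider some purely hypothetical group means (my invention, for illustration only, though roughly consistent with the reported overall average of 3.66 out of 30): say, 1/30 in one group, 3/30 in a second, and 7/30 in the third. All three groups would pass the 'scored ten or below' screening, since (1 + 3 + 7) ÷ 3 ≈ 3.7, yet they would represent rather different starting points for learning – and could contribute to differences at post-test quite independently of the teaching method used.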

Units of analysis

In any research study, researchers need to be clear regarding what their 'unit of analysis' should be. In this case the extreme options seem to be:

  • 120 units of analysis: 40 students in each of three conditions
  • 3 units of analysis: one teaching group in each condition

The key question is whether individual learners can be considered as being subject to the treatment conditions independently of others assigned to the same condition.

"During the study phase, student participants from the three groups were instructed by their respective chemistry teachers to learn in pairs…"

Shamsulbahri & Zulkiply, 2021

There is a strong argument that when a group of students attend class together, and are taught together, and interact with each other during class, they strictly should not be considered as learning independently of each other. Anyone who has taught parallel classes that are supposedly equivalent will know that classes take on their own personalities as groups, and the behaviour and learning of individual students is influenced by the particular class ethos.

Read about units of analysis

So, rigorous research into class teaching pedagogy should not treat the individual learners as units of analysis – yet it often does. The reason is obvious – it is only possible to do statistical testing when the sample size is large enough, and in small scale educational experiments the sample size is never going to be large enough unless one…hm…pretends/imagines/considers/judges/assumes/hopes?, that each learner is independently subject to the assigned treatment without being substantially influenced by others in that condition.

So, Shamsulbahri and Zulkiply treated their participants as independent units of analysis and based on this find a statistically significant effect of treatment:

'laboratory' vs. 'DARTs' vs. 'laboratory + DARTs'.

That is questionable – but what if, for argument's sake, we accept this assumption that within a class of 40 students the learners can be considered not to influence each other (even their learning partner?) or the classroom more generally sufficiently to make a difference to others in the class?

A confounding variable?

Perhaps a more serious problem with the research design is that there is insufficient control of potentially relevant variables. In order to make a comparison of 'laboratory' vs. 'DARTs' vs. 'laboratory + DARTs', the only relevant difference between the three treatment conditions should be whether the students learn by laboratory activity, DARTs, or both. There should not be any other differences between the groups in the different treatments that might reasonably be expected to influence the outcomes.

Read about confounding variables

But the description of how groups were set up suggests this was not the case:

"….the researchers conducted a briefing session on the aims and experimental details of the study for the school's [schools'?] chemistry teachers…the researchers demonstrated and then guided the school's chemistry teachers in terms of the appropriate procedures to implement the DARTs instructional method (i.e., using the DARTs handout sheets)…The researcher also explained to the school's chemistry teachers the way to implement the combined method …

Participants were then classified into three groups: control group (experimental method), first treatment group (DARTs method) and second treatment group (Combination of experiment and DARTs method). There was an equal number of participants for each group (i.e., 40 participants) as well as gender distribution (i.e., 20 females and 20 males in each group). The control group consisted of the participants from School A, while both treatment groups consisted of participants from School B"


Shamsulbahri & Zulkiply, 2021

Several different teachers seem to have been involved in teaching the classes, and even if it is not entirely clear how the teaching was divided up, it is clear that the group that only undertook the laboratory activities were from a different school than those in the other two conditions.

If we think one teacher can be replaced by another without changing learning outcomes, and that schools are interchangeable such that we would expect exactly the same outcomes if we swapped a class of students from one school for a class from another school, then these variables are unimportant. If, however, we think the teacher doing the teaching and the school from which learners are sampled could reasonably make a difference to the learning achieved, then these are confounding variables which have not been properly controlled.

In my own experience, I do not think different teachers become equivalent even when they are briefed to teach in the same way, and I do not think we can assume schools are equivalent when providing students to participate in learning. These differences, then, undermine our ability to assign any differences in outcomes as due to the differences in pedagogy (that "any differences observed…were due to the effect of the instructional method used").

Another confounding variable

And then I come back to my starting point. Learners did not just experience different forms of pedagogy but also different amounts of teaching. The difference between 3 lessons and 5 lessons might in itself be a factor (that is, even if the pedagogy employed in those lessons had been the same), as might the difference between 5 lessons and 8 lessons. So, time spent studying must be seen as a likely confounding variable. Indeed, it is not just the amount of time, but also the number of lessons, as the brain processes learning between classes and what is learnt in one lesson can be reinforced when reviewed in the next. (So we could not just assume, for example, that students automatically learn the same amount from, say, two 60 min. classes and four 30 min. classes covering the same material.)
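To put some approximate numbers on this, taking the reported lesson allocations at face value: the laboratory-only group received 3 × 30 minutes = 90 minutes of study, the DARTs-only group 5 × 30 minutes = 150 minutes, and the combined group 8 × 30 minutes = 240 minutes. The combined group thus had well over twice the study time of the laboratory-only group before all groups were given the same post-test.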

What can we conclude?

As with many experiments in science teaching, we can accept the results of Shamsulbahri and Zulkiply's study, in terms of what they found in the specific study context, but still not be able to draw strong conclusions of wider significance.

Is the DARTs method beneficial as a pedagogical approach?

I expect the answer to this question is yes, but we need to be careful in drawing this conclusion from the experiment. Certainly the two groups which undertook the DARTs activities outperformed the group which did not. Yet that group was drawn from a different school and taught by a different teacher or teachers. That could have explained why there was less learning. (I am not claiming this is so – the point is we have no way of knowing as different variables are conflated.) In any case, the two groups that did undertake the DARTs activity were both given more lessons and spent substantially longer studying the topic they were tested on, than the class that did not. We simply cannot make a fair comparison here with any confidence.

Did the DARTs method facilitate better learning when it was combined with laboratory work?

There is a stronger comparison here. We still do not know if the two groups were taught by the same teacher/teachers (which could make a difference) or indeed whether the two groups started from a very similar level of prior knowledge. But, at least the two groups were from the same school, and both experienced the same DARTs based instruction. Greater learning was achieved when students undertook laboratory work as well as undertaking DARTs activities compared with students who only undertook the DARTs activity.

The 'combined' group still had more teaching than the DARTs group, but that does not matter here in drawing a logical conclusion because the question being explored is of the form 'does additional teaching input provide additional value?' (Taber, 2019). The question here is not whether one type of pedagogy is better than the other, but simply whether also undertaking practical work adds something over just doing the paper-based learning activities.

Read about levels of control in experimental design

As the sample of learners was not representative of any specific wider population, we cannot assume this result would generalise beyond the participants in the study, although we might reasonably expect this result would be found elsewhere. But that is because we might already assume that learning about a practical activity (qualitative chemical analysis) will be enhanced by adding some laboratory based study!

Does DARTs pedagogy produce more learning about qualitative analysis than laboratory activities?

Shamsulbahri and Zulkiply's third claim was bolder because it was framed as a generalisation: instruction through DARTs produces more learning about qualitative analysis than laboratory-based instruction. That seems quite a stretch from what the study clearly shows us.

What the research does show us with confidence is that a group of 40 students in one school taught by a particular teacher/teaching team with 5 lessons of a specific set of DARTs activities, performed better on a specific assessment instrument than a different group of 40 students in another school taught by a different teacher/teaching team through three lessons of laboratory work following a specific scheme of practical activities.


Confounded variables:

DARTs condition | laboratory condition
a group of 40 students | a different group of 40 students
in one school | in another school
taught by a particular teacher/teaching team | taught by a different teacher/teaching team
with 5 lessons | through 3 lessons
of a specific set of DARTs activities | of laboratory work following a specific scheme of practical activities

(The first group performed better on a specific assessment instrument than the second.)

Test instrument bias?

Even if we thought the post-test used by Shamsulbahri and Zulkiply was perfectly valid as an assessment of topic knowledge, we might be concerned by knowing that learning is situated in a context – we recall better in a context similar to that in which we learned.


How can we best assess students' learning about qualitative analysis?


So:

  • should we be concerned that the form of assessment, a paper-based instrument, is closer in nature to the DARTs learning experience than the laboratory learning experience?

and, if so,

  • might this suggest a bias in the measurement instrument towards one treatment (i.e., DARTs)?

and, if so,

  • might a laboratory-based assessment have favoured the group that did the laboratory based learning over the DARTs group, and led to different outcomes?

and, if so,

  • which approach to assessment has more ecological validity in this case: which type of assessment activity is a more authentic way of testing learning about a laboratory-based activity like qualitative chemical analysis?

A representation of my understanding of the experimental design

Can we generalise?

As always with small scale experiments into teaching, we have to judge the extent to which the specifics of the study might prevent us from generalising the findings – to be able to assume they would generally apply elsewhere.3 Here, we are left to ask to what extent we can

  • ignore any undisclosed difference between the groups in levels of prior learning;
  • ignore any difference between the schools and their populations;
  • ignore any differences in teacher(s) (competence, confidence, teaching style, rapport with classes, etc.);
  • ignore any idiosyncrasies in the DARTs scheme of instruction;
  • ignore any idiosyncrasies in the scheme of laboratory instruction;
  • ignore any idiosyncrasies (and potential biases) in the assessment instrument and its marking scheme and their application;

And, if we decide we can put aside any concerns about any of those matters, we can safely assume that (in learning this topic at this level)

  • 5 sessions of learning by DARTs is more effective than 3 sessions of laboratory learning.

Then we only have to decide if that is because

  • (i) DARTs activities teach more about this topic at this level than laboratory activities, or
  • (ii) some or all of the difference in learning outcomes is simply because 150 minutes of study (broken into five blocks) has more effect than 90 minutes of study (broken into three blocks).

What do you think?


Work cited:

Notes:

1 The authors refer to the conditions as

  • Experimental control group
  • DARTs
  • combination of Experiment + DARTs

I am referring to the first group as 'laboratory' both because it is not clear the students were doing any experiments (that is, testing hypotheses), as the practical activity was learning to undertake standard analytical tests, and, secondly, to avoid confusion (between the educational experiment and the laboratory practicals).


2 I think the reference to "no issue of pre-test treatment interaction" is probably meant to suggest that as all students took the same pre-test it will have had the same effect on all participants. But this not only ignores the potential effect of any differences in prior knowledge reflected in the pre-test scores that might influence subsequent learning, but also the effect of taking the pre-test cannot be assumed to be neutral if for some learners it merely told them they knew nothing about the topic, whilst for others it activated and so reinforced some prior knowledge in the subject. In principle, the interaction between prior knowledge and taking the pretest could have influenced learning at both cognitive and affective levels: that is, both in terms of consolidation of prior learning and cuing for the new learning; and in terms of a learner's confidence in, and attitude towards, learning the topic.


3 Even when we do have a representative sample of a population to test, we can only infer that the outcomes of an experiment reflect what will be most likely for members (schools, learners, classes, teachers…) of the wider population. Individual differences are such that we can never say that what most probably is the case will always be the case.


When an experiment tests a sample drawn at random from a wider population, then the findings of the experiment can be assumed to apply (on average) to the population. (Source: after Taber, 2019).

POEsing assessment questions…

…but not fattening the cow


Keith S. Taber


A well-known Palestinian proverb reminds us that we do not fatten the cow simply by repeatedly weighing it. But, sadly, teachers and others working in education commonly get so fixated on assessment that it seems to become an end in itself.


Images by Clker-Free-Vector-Images, OpenClipart-Vectors and Deedster from Pixabay

A research study using P-O-E

I was reading a report of a study that adopted the predict-observe-explain, P-O-E, technique as a means to elicit "high school students' conceptions about acids and bases" (Kala, Yaman & Ayas, 2013, p.555). As the name suggests, P-O-E asks learners to make a prediction before observing some phenomenon, and then to explain their observations (something that can be specially valuable when the predictions are based on strongly held intuitions which are contrary to what actually happens).

Read about Predict-Observe-Explain


The article on the publisher website

Kala and colleagues begin the introduction to their paper by stating that

"In any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known"

Kala, Yaman & Ayas, 2013, p.555
Constructivism?

Constructivism is a perspective on learning that is informed by research into how people learn and a great many studies into student thinking and learning in science. A key point is how a learner's current knowledge and understanding influences how they make sense of teaching and what they go on to learn. Research shows it is very common for students to have 'alternative conceptions' of science topics, and often these conceptions either survive teaching or distort how it is understood.

The key point is that teachers who teach the science without regard to student thinking will often find that students retain their alternative ways of thinking, so constructivist teaching is teaching that takes into account and responds to the ideas about science topics that students bring to class.

Read about constructivism

Read about constructivist pedagogy

Assessment: summative, formative and diagnostic

If teachers are to take into account, engage with, and try to reshape, learners' ideas about science topics, then they need to know what those ideas are. Now there is a vast literature reporting alternative conceptions in a wide range of science topics, spread across thousands of research reports – but no teacher could possibly find time to study them all. There are books which discuss many examples and highlight some of the most common alternative conceptions (including one of my own, Taber, 2014).



However, in any class studying some particular topic there will nearly always be a spread of different alternative conceptions across the students – including some so idiosyncratic that they have never been reported in any literature. So, although reading about common misconceptions is certainly useful to prime teachers for what to look out for, teachers need to undertake diagnostic assessment to find out about the thinking of their own particular students.

There are many resources available to support teachers in diagnostic assessment, and some activities (such as using concept cartoons) that are especially useful at revealing student thinking.

Read about diagnostic assessment

Diagnostic assessment, assessment to inform teaching, is carried out at the start of a topic, before the teaching, to allow teachers to judge the learners' starting points and any alternative conceptions ('misconceptions') they may have. It can therefore be considered aligned to formative assessment ('assessment for learning'), which is carried out as part of the learning process, rather than summative assessment (assessment of learning), which is used after studying to check, score, grade and certify learning.

P-O-E as a learning activity…

P-O-E can best support learning in topics where it is known learners tend to have strongly held, but unhelpful, intuitions. The predict stage elicits students' expectations – which, when contrary to the scientific account, can be confounded by the observe step. The 'cognitive conflict' generated by seeing something unexpected (made more salient by having been asked to make a formal prediction) is thought to help students concentrate on the actual phenomenon, and to provide 'epistemic relevance' (Taber, 2015).

Epistemic relevance refers to the idea that students are learning about things they are actually curious about, whereas for many students following a conventional science course must be experienced as being presented with the answers to a seemingly never-ending series of questions that had never occurred to them in the first place.

Read about the Predict-Observe-Explain technique

Students are asked to provide an explanation for what they have observed which requires deeper engagement than just recording an observation. Developing explanations is a core scientific practice (and one which is needed before another core scientific practice – testing explanations – is possible).

Read about teaching about scientific explanations

To be most effective, P-O-E is carried out in small groups, as this encourages the sharing, challenging and justifying of ideas: the kind of dialogic activity thought to be powerful in supporting learners in developing their thinking, as well as practising their skills in scientific argumentation. As part of dialogic teaching such an open forum for learners' ideas is not an end in itself, but a preparatory stage for the teacher to marshal the different contributions and develop a convincing argument for how the best account of the phenomenon is the scientific account reflected in the curriculum.

Constructivist teaching is informed by learners' ideas, and therefore relies on their elicitation, but that elicitation is never the end in itself but is a precursor to a customised presentation of the canonical account.

Read about dialogic teaching and learning

…and as a diagnostic activity

Group work also has another function – if the activity is intended to support diagnostic assessment, then the teacher can move around the room listening in to the various discussions and so collecting valuable information on what students think and understand. When assessment is intended to inform teaching it does not need to be about students completing tests and teachers marking them – a key principle of formative assessment is that it occurs as a natural part of the teaching process. It can be based on productive learning activities, and does not need marks or grades – indeed as the point is to help students move on in their thinking, any kind of formal grading whilst learning is in progress would be inappropriate as well as a misuse of teacher time.

Probing students' understandings about acid-base chemistry

The constructivist model of learning applies to us all: students, teachers, professors, researchers. Given what I have written above about P-O-E, about diagnostic assessment, and dialogic approaches to learning, I approached Kala and colleagues' paper with expectations about how they would have carried out their project.

These authors do report that they were able to diagnose aspects of student thinking about acids and bases, and found some learning difficulties and alternative conceptions,

"it was observed that eight of the 27 students had the idea that the "pH of strong acids is the lowest every time," while two of the 27 students had the idea that "strong acids have a high pH." Furthermore, four of the 27 students wrote the idea that the "substance is strong to the extent to which it is burning," while one of the 27 students mentioned the idea that "different acids which have equal concentration have equal pH."

Kala, Yaman & Ayas, 2013, pp.562-3

The key feature seems to be that, as reported in previous research, students conflate acid concentration and acid strength (when it is possible to have a high concentration solution of a weak acid or a very dilute solution of a strong acid).
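A simple worked example (my own illustration, not taken from the paper) shows why the distinction matters. For solutions of equal concentration, say 0.1 mol dm⁻³: hydrochloric acid, a strong acid, is essentially fully ionised, so [H⁺] ≈ 0.1 mol dm⁻³ and pH ≈ 1; whereas ethanoic acid, a weak acid (Ka ≈ 1.8 × 10⁻⁵), is only slightly ionised, so [H⁺] ≈ √(Ka × c) ≈ 1.3 × 10⁻³ mol dm⁻³ and pH ≈ 2.9. Equal concentrations, quite different pH values – and, conversely, a sufficiently concentrated weak acid can have a lower pH than a very dilute strong acid.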

Yet some aspects of this study seemed out of alignment with the use of P-O-E.

The best research style?

One feature was the adoption of a positivistic approach to the analysis,

Although there has been no reported analyzing procedure for the POE, in this study, a different [sic] analyzing approach was offered taking into account students' level of understanding… Data gathered from the written responses to the POE tasks were analyzed and divided into six groups. In this context, while students' prediction were divided into two categories as being correct or wrong, reasons for predictions were divided into three categories as being correct, partially correct, or wrong.

Kala, Yaman & Ayas, 2013, pp.560


Group | Prediction | Reasons
1 | correct | correct
2 | correct | partially correct
3 | correct | wrong
4 | wrong | correct
5 | wrong | partially correct
6 | wrong | wrong

"the written responses to the POE tasks were analyzed and divided into six groups"

There is nothing inherently wrong with doing this, but it aligns the research with an approach that seems at odds with the thinking behind constructivist studies that are intended to interpret a learner's thinking in its own terms, rather than simply compare it with some standard. (I have explored this issue in some detail in a comparison of two research studies into students' conceptions of forces – see Taber, 2013, pp.58-66.)

In terms of research methodology we might say it seem to be conceptualised within the 'wrong' paradigm for this kind of work. It seems positivist (assuming data can be unambiguously fitted into clear categories), nomothetic (tied to 'norms' and canonical answers) and confirmatory (testing thinking as matching model responses or not), rather than interpretivist (seeking to understand student thinking in its own terms rather than just classifying it as right or wrong), idiographic (acknowledging that every learner's thinking is to some extent unique to them) and discovery (exploring nuances and sophistication, rather than simply deciding if something is acceptable or not).

Read about paradigms in educational research

The approach used seemed more suitable for investigating something in the science laboratory than for the complex, interactive, contextualised, and ongoing life of classroom teaching. Kala and colleagues describe their methodology as case study,

"The present study used a case study because it enables the giving of permission to make a searching investigation of an event, a fact, a situation, and an individual or a group…"

Kala, Yaman & Ayas, 2013, pp.558
A case study?

Case study is a naturalistic methodology (rather than involving an intervention, such as an experiment), and is idiographic, reflecting the value of studying the individual case. The case is one from among many instances of its kind (one lesson, one school, one examination paper, etc.), and is considered as a somewhat self-contained entity, yet one that is embedded in a context in which it is to some extent entangled (for example, what happens in a particular lesson is inevitably somewhat influenced by

  • the earlier sequence of lessons that teacher taught that class {the history of that teacher with that class},
  • the lessons the teacher and students came from immediately before this focal lesson,
  • the school in which it takes place,
  • the curriculum set out to be followed…)

Although a lesson can be understood as a bounded case (taking place in a particular room over a particular period of time involving a specified group of people) it cannot be isolated from the embedding context.

Read about case study methodology


Case study – study of one instance from among many


As case study is idiographic, and does not attempt to offer direct generalisation to other situations beyond that case, a case study should be reported with 'thick description' so a reader has a good mental image of the case (and can think about what makes it special – and so what makes it similar to, or different from, other instances the reader may be interested in). But that is lacking in Kala and colleagues' study, as they only tell readers,

"The sample in the present study consisted of 27 high school students who were enrolled in the science and mathematics track in an Anatolian high school in Trabzon, Turkey. The selected sample first studied the acid and base subject in the middle school (grades 6 – 8) in the eighth year. Later, the acid and base topic was studied in high school. The present study was implemented, based on the sample that completed the normal instruction on the acid and base topic."

Kala, Yaman & Ayas, 2013, pp.558-559

The reference to a sample can be understood as something of a 'reveal' of their natural sympathies – 'sample' is the language of positivist studies that assume a suitably chosen sample reflects a wider population of interest. In case study, a single case is selected and described rather than a population sampled. A reader is rather left to guess what population is being sampled here, and indeed precisely what the 'case' is.

Clearly, Kala and colleagues elicited some useful information that could inform teaching, but I sensed that their approach would not have made optimal use of a learning activity (P-O-E) that can give insight into the richness, and, sometimes, subtlety of different students' ideas.

Individual work

Even more surprising was the researchers' choice to ask students to work individually without group discussion.

"The treatment was carried out individually with the sample by using worksheets."

Kala, Yaman & Ayas, 2013, p.559

This is a choice which would surely have compromised the potential of the teaching approach to allow learners to explore, and reveal, their thinking?

I wondered why the researchers had made this choice. As they were undertaking research, perhaps they thought it was a better way to collect data that they could readily analyse – but that seems to be choosing limited data that can be easily characterised over the richer data that engagement in dialogue would surely reveal?

Assessment habits

All became clear near the end of the study when, in the final paragraph, the reader is told,

"In the present study, the data collection instruments were used as an assessment method because the study was done at the end of the instruction/ [sic] on the acid and base topics."

Kala, Yaman & Ayas, 2013, p.571

So, it appears that the P-O-E activity, which is an effective way of generating the kind of rich but complex data that helps a teacher hone their teaching for a particular group, was being adopted, instead, as means of a summative assessment. This is presumably why the analysis focused on the degree of match to the canonical science, rather than engaging in interpreting the different ways of thinking in the class. Again presumably, this is why the highly valuable group aspect of the approach was dropped in favour of individual working – summative assessment needs to not only grade against norms, but do this on the basis of each individual's unaided work.

An activity which offers great potential for formative assessment (as it is a learning activity as well as a way of exploring student thinking); and that offers an authentic reflection of scientific practice (where ideas are presented, challenged, justified, and developed in response to criticism); and that is generally enjoyed by students because it is interactive and the predictions are 'low stakes' making for a fun learning session, was here re-purposed to be a means of assessing individual students once their study of a topic was completed.

Kala and colleagues certainly did identify some learning difficulties and alternative conceptions this way, and this allowed them to evaluate student learning. But I cannot help thinking an opportunity was lost here to explore how P-O-E can be used in a formative assessment mode to inform teaching:

  • diagnostic assessment as formative assessment can inform more effective teaching
  • diagnostic assessment as summative assessment only shows where teaching has failed

Yes, I agree that "in any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known", but the point of that is to inform the teaching and so support student learning. What were Kala and colleagues going to do with their inferences about students' ideas when they used the technique as "an assessment method … at the end of the instruction"?

As the Palestinian adage goes, you do not fatten up the cow by weighing it, just as you do not facilitate learning simply by testing students. To mix my farmyard allusions, this seems to be a study of closing the barn door after the horse has already bolted.


Work cited

Educational fore-hind-sight

Keith S. Taber

Image by kmicican from Pixabay 

Oh dear, a sense of déjà vu. One no sooner writes about the errors of the past,* than it is suggested we commit them again.

Return of the 11-plus?

"Return of the 11-plus: Does Theresa May back selective grammar schools?"

Newspaper headline

"It also became clear that although the process was meant to select on the basis of academic ability, to a large extent the outcomes reflected the socio-economic family background of the children.

Where the independent schools largely served the more wealthy in society, the grammar schools admitted disproportionate numbers of children from so called 'middle-class' families (i.e., parents being lower professional and white-collar workers) rather than so-called working class (e.g., children of unskilled labourers). It was found that scholastic achievement at age 11 was strongly linked to social capital deriving from the home background.

If schools are expected to be agents of social change, rather than a means to reproduce existing social differences (and that of course is an ideological choice), then determining a person's educational, and so possibly professional, future at age eleven, based on an examination that did not compensate for levels of educational opportunity and advantage in the home environment, was clearly inappropriate."

(Taber, 2017: 189)
Source cited:

* First published 22nd January 2017 at http://people.ds.cam.ac.uk/kst24/