case study – Science-Education-Research

The scientific language of an anthropologist

Making unfamiliar cultures familiar using scientific concepts

Keith S. Taber

Clifford Geertz may have been a social scientist, but he clearly thought that some abstract ideas about culture, society and politics were best explained using concepts and terminology from the natural sciences.

word cloud featuring a range of terms from Geertz's writings: Clifford Geertz was a social scientist who referenced a good deal of scientific vocabulary

I first came across the anthropologist Clifford Geertz when teaching research methods to graduate students. Geertz had popularised the notion of the importance of thick description, or rich description, in writing case studies. I acquired his book 'The Interpretation of Cultures' (a collection of his papers and essays) to read more about this. I found Geertz was an engaging and often entertaining author.

"Getting caught, or almost caught, in a vice raid is perhaps not a very generalisable recipe for achieving that mysterious necessity of anthropological field work, rapport, but for me it worked very well."

(From 'Deep play: notes on the Balinese cockfight')

Anthropology: A rather different kind of science – largely based on case studies.

Generalisation in natural science

Case studies are very important in social sciences, in a way that does not really get reflected in natural science.

It has long been recognised that in subjects such as chemistry and physics we can often generalise from a very modest number of specimens. So, any sample of pure water at atmospheric pressure will boil at around 100 °C.¹ All crystals of NaCl have the same cubic structure. All steel wires will stretch when loaded. And so on. Clearly scientists have not examined, say, all the NaCl crystals that have ever formed in the universe, and indeed have only actually ever examined a tiny fraction, in one local area (universally speaking), over a short time period (cosmologically, or even geologically, speaking), so such claims are actually generalisations supported by theoretical assumptions. Our theories give us good reasons to think we understand how and why salt crystals form, and so why the same salt (e.g., NaCl) will always form the same type of crystal.²

Even in biology, where the key foci of interest, organisms, are immensely more complex than salt crystals or steel wires, generalisation is, despite Darwin,³ widespread:

"We might imagine a natural scientist, a logician, and a sceptical philosopher, visiting the local pond. The scientist might proclaim,

"see that frog there, if we were to dissect the poor creature, we would find it has a heart".

The logician might suggest that the scientist cannot be certain of this as she is basing her claim on an inductive process that is logically insecure. Certainly, every frog that has ever been examined sufficiently to determine its internal structure has been found to have a heart, but given that many frogs, indeed the vast majority, have never been specifically examined in this regard, it is not possible to know for certain that such a generalisation is valid. (The sceptic is unable to arbitrate as he simply refuses to acknowledge that he knows there is a frog present, or indeed that he can be sure he is out walking with colleagues who are discussing one, rather than perhaps simply dreaming about the whole episode.)

…I imagine most readers are still siding with the scientist's claim. So, can we be confident this particular frog has a heart, without ourselves being heartless enough to cut it open to see?

(Taber, 2019)

Strictly, in an absolute sense, we cannot know for certain that the entity identified as a frog has a heart. After all,

perhaps it is a visiting alien from another solar system that looks superficially like our frogs but has very different anatomy;
perhaps it is a mechanical automaton disguised as a frog, that is covertly collecting intelligence data for a foreign power;
perhaps it is a perfectly convincing holographic image of a 'late' frog that, since being imaged, was eaten by a predator;
perhaps other logically possible but barely feasible options come to mind?

But if it really is a living (Terran) frog, then we know enough about vertebrate evolution, anatomy and physiology, to be as near to certain it has a functioning heart as we could be certain of just about anything. ⁴

Generalisation in social science

Often, however, this type of generalisation simply does not work in social science contexts. If we find that a particular specimen of gorilla has seven cervical vertebrae then we can probably assume: so do other gorillas. But if we find that one school has 26 teachers, we clearly cannot assume this will apply to the next school we look at. Similarly, the examination results and truancy levels will vary greatly between schools. If we find one 14-year-old learner thinks that plants only respire during the night time, then it is useful to keep this in mind when working with other students, but we cannot simply assume they will also think this.

The distinction here is not absolute, as clearly there are many things that vary between specimens of the same species, which is why many biological studies use large samples and statistics. In general [sic], generalisation gets more problematic as we shift from physical sciences through life sciences to social sciences. And this is partially why case study is so common within the social sciences.

The point is that the assumption that we can usually safely generalise from one NaCl crystal to another, but not from one biology teacher to another, is based on theoretical considerations. These tell us why the shape (but not the mass or temperature) of a crystal transfers from one specimen of a substance to another, and why the teaching style or subject knowledge of one teacher depends on so many factors that it cannot be assumed to transfer to other teachers.

Drawing upon both a quotidian comparison and a scientific simile, Geertz warned against seeing "a remote locality as the world in a teacup or as the sociological equivalent of a cloud chamber".

A case study examines in depth one instance from among many instances of that kind: one teacher's teaching of entropy; one school's schemes of work for lower secondary science; one learner's understanding of photosynthesis; the examples, similes and analogies used in one textbook; …

Read about the use of case study in research

Case studies

Case studies look at a single instance (e.g., one school, one classroom, one lesson, one teaching episode) in great detail. Case studies are used when studying complex phenomena that are embedded in their context and so have to be studied in situ. You can study a crystal in the lab. You can also extract cells from an organism and study them in a Petri dish – but those isolated cells in vitro will only tell you so much about how they normally function in vivo within the original tissue.

Similarly, if you move a teacher and her class out of their normal classroom, embedded in a particular school, in order to study a lesson in a special teaching laboratory in a research institution set up with many cameras and microphones, you cannot assume you will see the lesson that would have taken place in the normal context. Case studies therefore need to be 'naturalistic' (carried out in their usual context) rather than involving deliberate researcher manipulation. Geertz rejected the description of the field research site as a natural laboratory, reasonably asking "what kind of laboratory is it where none of the parameters are manipulatable?"

When I worked in further education I recall an inspection where one colleague told us that the external inspector had arrived at her classroom late, after the lesson had already started, and so had asked the teacher to start the lesson again. This would have enabled the inspector to see the teacher and class act out the start of the lesson, but clearly she could not observe an authentic teaching episode in those circumstances.

Case study is clearly a sensible strategy when we have a particular interest in the specific case (why do this teacher's students get such amazing examination results? why does this school have a virtually zero truancy rate?), but it is in itself a very limited way of learning about the general situation. We learn about the general by a dual-track (and often iterative) process, where we use both surveys to find out about typicality and case studies to understand processes and to identify the questions it is useful to include in surveys.

If case studies are to be useful, they need to offer a detailed account (that 'thick description') of the case, including its context: so to understand something about an observed lesson it may be useful to know about the teacher's experience and qualifications; about the school demographic statistics and ethos; about the curriculum being followed, and other policies in place; and so forth.

As one example, to understand why a science teacher does not challenge a student's clear misconceptions about natural selection (is the teacher not paying attention, or not motivated, or herself ignorant of the science?), it may sometimes be important to know something about the local community and administrative practices. In the UK, a state school teacher (who is legally protected from arbitrary, capricious or disproportionate disciplinary action) is not going to get in trouble for explaining science that is prescribed in the curriculum, even if some parents do not like what is taught; but that may not be true in a very different context where the local population largely holds fundamentalist, anti-science views, and can put direct pressure on school leaders to fire staff.

Beware of unjustified generalisation

This use of 'thick description' provides the context for a reader to better understand the case. However, no matter how detailed a case study is, and regardless of the insight it offers into that case, a single case by itself never provides the grounds for generalisation beyond the case. It can certainly offer useful hypotheses to be tested in other cases – but not safe conclusions!

Geertz was an anthropologist who knew that much field work involves specific researchers (with their idiosyncratic interpretive resources – background knowledge, past experiences, perspectives, beliefs, etc. – and individual personalities and inter-personal skills) spending extended periods of time in very specific contexts – this village, that town, this monastery, that ministry… The investigators were not just meant to observe and record, but also to make sense of (and so interpret) the cultures they were immersed in – but this invites over-generalisation. Geertz warns his readers of this at one point,

"I want to do two things which are quintessentially anthropological: to discuss a curious case from a distant land; and to draw from that case some conclusions of fact and method more far-reaching than any such isolated example can possibly sustain."

(From: 'Politics past, politics present: some notes on the uses of anthropology in understanding the new states')

Using science to make the unfamiliar familiar

One of the features of Geertz's writings that I found interesting was his use of scientific notions. Often on this site I have referred to the role of the teacher in 'making the unfamiliar familiar' and suggested that science communicators (such as teachers, but also journalists, authors of popular science books and so forth) seek to make abstract scientific ideas familiar for their audience by comparing them with something assumed to already be very familiar. As when Geertz suggests that the 'human brain resembles the cabbage'. I have also argued that whilst this may be a very powerful initial teaching move, it needs to be just a first step, or learners are sometimes left with new misconceptions of the science.

Read about 'making the unfamiliar familiar' in teaching

For a science teacher, the scientific idea is the target knowledge to be introduced, and a comparison with something familiar is sought which offers a useful analogue. I list myriad examples on this site – some being science teachers' stock comparisons, some being more original and creative, and indeed some which are perhaps quite obscure. Here are just a few examples:

and so forth…

But this can be flipped when the audience has a strong science knowledge, and so a scientific phenomenon or notion can be used to introduce something less familiar. (As one example, the limited capacity of working memory and the idea of 'chunking' may be introduced by comparison with different triglycerides: see How fat is your memory? A chemical analogy for working memory. But this is only useful if the audience already knows about the basic structure of triglycerides.)

Geertz may have been a social scientist, but he clearly assumed some abstract ideas about culture, society and politics were best explained using concepts and terminology from the natural sciences. So, for example, he made the argument for case study approaches in research,

"The notion that unless a cultural phenomenon is empirically universal it cannot reflect anything about the nature of man is about as logical as the notion that because sickle-cell anaemia is, fortunately, not universal, it cannot tell us anything about human genetic processes. It is not whether phenomena are empirically common that is critical in science – else why should Becquerel have been so interested in the peculiar behaviour of uranium? – but whether they can be made to reveal the enduring natural processes that underlie them. Seeing heaven in a grain of sand⁵ is not a trick that only poets can accomplish."

(From: 'The impact of the concept of culture on the concept of man')

Geertz was also aware of another failing that I have seen many novice (and some experienced) researchers fall into. In education, as in anthropology, we often rely on research participants as informants, but we have to be careful not to confuse what they tell us with direct observations:

'the teacher is careful to involve all learners in the class in answering questions and classroom discussion' (when it should be: the teacher reports that he is careful to involve all learners in the class in answering questions and classroom discussion);
'the learner was very good at using mathematics in physics lessons' (when it should be: the learner thought that she was very good at using mathematics in physics lessons);
'the school had a highly qualified, and select group of teachers who were all enthusiastic subject experts' (or so the headteacher told me).

If such slips seem rather amateurish, it is nonetheless not uncommon to see participant ratings misdescribed: so a statement like '58% of the teachers were highly confident in using the internet in the classroom' may be based on participants responding to a rating-scale item on a questionnaire (asking 'How confident are you…') where 58% of respondents selected the 'highly confident' rating.

So, actually '58% of the teachers rated themselves as highly confident in using the internet in the classroom'. For these two things to be equivalent we have to ourselves be 'highly confident' in a number of regards – some more likely than others. Here are some that come to mind:

the teachers took the questionnaire seriously, and did not just tick boxes arbitrarily to complete the activity quickly (I am sure none of us have ever done that 😎);
the teachers read the question carefully, and ticked the box associated with their genuine rating (i.e., did not tick the wrong box by mistake, perhaps misaligning response boxes for a different item);
the teachers understood and shared the researcher's intended meaning of the item (e.g., researcher and responder mean the same thing by 'confidence in using the internet in the classroom');
the teachers had a stable level of confidence such that a rating reflected more than their feeling at that moment in time (perhaps after an especially successful, or fraught, lesson);
a teacher's assessment of confidence clearly fitted with one of the available response options (here, highly confident – perhaps the only options presented to be selected from were 'highly competent'; 'neither professionally competent nor incompetent'; 'completely hopeless');
the teachers were open and honest about their responses (so, not influenced by how the researcher might perceive them, or who else might gain access to the data and for what purpose);
the teachers were good judges of their own level of confidence (and did not come from a cultural context where it would be shameful to boast, or where exaggeration is expected).

As scientists we tend to come to rely on instrumentation even though it is fallible. We may report distances and temperatures and so forth without feeling we need to add a caveat such as "according to the thermometer" each time.⁶ But instrumentation in social science tends to be subject to more complications. Geertz realised that in his field there was commonly the equivalent of writing that '58% of the teachers were highly confident in using the internet in the classroom' when the data only told us '58% of the teachers rated themselves as highly confident in using the internet in the classroom':

"In finished anthropological writings, including those collected here, this fact – that what we call our data are really our own constructions of other people's constructions of what they and their compatriots are up to – is obscured because most of what we need to comprehend a particular event, ritual, custom, idea, or whatever is insinuated as background information before the thing itself is directly examined."

(From 'Thick description: toward an interpretative theory of culture.')

Some scientific comparisons

In discussing a teknonymous system of reference – where someone who had been named Joe at birth but who is now the father of Bert is commonly known as 'Father of Bert' rather than Joe, at least until Bert and Bertha bring forth Susie (and take on new names themselves accordingly), at which point Joe/Father of Bert is henceforth referred to as 'Grandfather of Susie' – Geertz suggests that what "looks like a celebration of a temporal process is in fact a celebration of the maintenance of what, borrowing a term from physics, Gregory Bateson has aptly called 'steady state'." This is best seen as a simile, as the figurative use of the term 'steady state' is clearly marked (both by the 'scare quotes' and by the acknowledgement that the term is borrowed).

Many of Geertz's figures of speech are metaphors where the comparison being used is not explicitly marked (so the state capital 'was' the nucleus). Another 'doubly marked' simile (by scare quotes and the phrase 'so to speak') concerned the idea of a centre of gravity:

"The two betting systems, though formally incongruent, are not really contradictory to one another, but are part of a single larger system, in which the centre bet is, so to speak, the 'centre of gravity', drawing, the larger it is the more so, the outside bets toward the short-odds end of the scale."

(From 'Deep play: notes on the Balinese cockfight')

Among the examples of Geertz using scientific concepts as the 'familiar' to introduce ideas to his readers that I spotted were:

Not all of these examples seemed to be entirely coherent, or strictly aligned with the technical concept. Geertz was using the ideas as figures of speech, relying on the way a general readership might understand them. Although in some of these cases I wonder how familiar his readership would have been with the scientific idea. We can only 'make the unfamiliar familiar' by comparing what is currently unfamiliar with what is – already – familiar.

Making the unfamiliar familiar, by using something else unfamiliar?

My general argument on this site has been that if the comparison being referred to is not already familiar to an audience, then it cannot help explain the target concept – unless the unfamiliar comparison is first itself explained; which would seem to make it self-defeating as a teaching move.

However, while I think this is generally true, I can see possible exceptions.

One scenario might be where the target idea is seen as so abstract, that the teacher or author feels it is worth first introducing, and explaining, a more concrete or visualisable comparison as a potential stepping stone to the target concept.
Another scenario might be where the teacher or author has a comparison which is considered especially memorable (perhaps controversial or risqué, or a vivid or bizarre image), and so again thinks the indirect route to the target concept may be effective (or, at least, entertaining).
There might also be an argument, at least with some audiences, that because a comparison makes the (engaged) reader process it, it aids understanding and later recall, even when it needs to be explained before it will work as a comparison.

So, for example, typical readers of anthropology reports may know little about the neural organisation of cephalopods, but when they are told that "the octopus, whose tentacles are in large part separately integrated, neurally quite poorly connected with one another and with what in the octopus passes for a brain … nonetheless manages…to get around…", perhaps this elicits reflection on how being an octopus must be such a different experience from being human that the reader pauses for thought, and then (while imagining the octopus moving around without a "smoothly coordinated synergy of parts" but rather "by disjointed movements of this part, then that") pays particular attention to how this offers an "appropriate image [for] cultural organisation".

These are tentacled, sorry, tentative suggestions, and I would imagine they all sometimes apply – but empirical evidence is needed to test out their range of effectiveness. Perhaps this kind of work has been done (I do not recall seeing any studies) but, if not, it should perhaps be part of a research programme exploring the effectiveness of such devices (similes, metaphors, analogies, etc.) in relation to their dimensions and characteristics, modes of presentation, and particular kinds of audiences (Taber, 2025).

Some of Geertz's references could certainly be seen as fitting the wider zeitgeist – references to DNA with its double helix may be seen to tap into a common cultural motif:

"So far as culture patterns, that is, systems or complexes of symbols, are concerned…that they are extrinsic sources of information. By 'extrinsic', I mean only that – unlike genes for example – they lie outside the boundaries of the individual organism …By 'source of information', I mean only that – like genes – they provide a blueprint or template in terms of which processes external to themselves can be given a definite form. As the order of bases in a strand of DNA forms a coded program, a set of instructions, or a recipe, for the synthesis of the structurally complex proteins which shape organic functioning, so culture patterns provide such programmes for the institution of the social and psychological processes which shape human behavior …this comparison of gene and symbol is more than a strained analogy …

"Symbol systems…are to the process of social life as a computer's program is to its operations, the genic helix to the development of the organism…

There is a sense in which a computer's program is an outcome of prior developments in the technology of computing, a particular helix of phylogenetic history…But …one can, in principle anyhow, write out the program, isolate the helix…"

(From 'Religion as a cultural system' and 'After the revolution: The fate of nationalism in the new states')

Geertz also goes beyond simply offering metaphors, as in this extract from an essay review of the classic structuralist anthropology text with a title normally rendered into English as 'The Savage Mind':

"That Lévi-Strauss should have been able to transmute the romantic passions of Tristes Tropiques into the hypermodern intellectualism of La Pensée Sauvage is surely a startling achievement. But there remain the questions one cannot help but ask. Is this transmutation science or alchemy? Is the 'very simple transformation' which produced a general theory out of a personal disappointment real or a sleight of hand? Is it a genuine demolition of the walls which seem to separate mind from mind by showing that the walls are surface structures only, or is it an elaborately disguised evasion necessitated by a failure to breach them when they were directly encountered?"

(From 'The cerebral savage: on the work of Claude Lévi-Strauss')

It is worth noting here that whenever a work is translated from one language to another, there is an interpretive process, as many words do not have direct equivalents (covering precisely the same scope or range, with exactly the same nuances) in other languages. 'Savage' in English suggests (to me at least) aggression, and an association with violence. The French original 'sauvage' could be translated instead as 'wild' or 'untamed' which do not necessarily have the same negative associations. This is why when educational, and other social, research is reported in a language other than that in which data was collected, it is important for investigators to report this, and explain how the authenticity of translation was tested (Taber, 2018).

This device of an extended metaphor, where a comparison is not just mentioned at one point but threaded through a passage, can approach analogy – but without the explicit mapping of analogue-to-target expected in a formal teaching analogy. Here the idea of a property of meaning is compared with physical or chemical properties, but without the techniques the scientist has to identify and quantify such properties:

"And so we hear cultural integration spoken of as a harmony of meaning, cultural change as an instability of meaning, and cultural conflict as an incongruity of meaning, with the implication that the harmony, the instability, or the incongruity are properties of meaning itself, as, say, sweetness is a property of sugar or brittleness of glass.

Yet, when we try to treat these properties as we would sweetness or brittleness, they fail to behave, 'logically', in the expected way. When we look for the constituents of the harmony, the instability, or the incongruity, we are unable to find them resident in that of which they are presumably properties. One cannot run symbolic forms through some sort of cultural assay to discover their harmony content, their stability ratio, or their index of incongruity; one can only look and see if the forms in question are in fact coexisting, changing, or interfering with one another in some way or other, which is like tasting sugar to see if it is sweet or dropping a glass to see if it is brittle, not like investigating the chemical composition of sugar or the physical structure of glass."

(From 'Person, time, and conduct in Bali')

Geertz was clearly not averse to using extended metaphors in his work:

"But, details aside, the point is that there swirl around the emerging governmental institutions of the new states, and the specialised politics they tend to support, a whole host of self-reinforcing whirlpools of primordial discontent, and that the parapolitical maelstrom is in great part an outcome – to continue the metaphor, a backwash – of that process of political development itself."

(From 'The integrative revolution: The primordial sentiments and civil politics in the new states')

Offering manifold comparisons

Sometimes Geertz offers several alternative comparisons for his readers: so, above, the genetic helix is offered in parallel with a computer program, a blueprint for building a bridge, the score of a musical performance, and a recipe for cake. Another example might be:

"The second law of thermodynamics, or

the principle of natural selection, or

the notion of unconscious motivation, or

the organisation of the means of production

does not explain everything, not even everything human, but it still explains something; and our attention shifts to isolating just what that something is, to disentangling ourselves from a lot of pseudoscience to which, in the first flush of its celebrity, it has also given rise."

(From 'Thick description: toward an interpretative theory of culture')

There are several ways to explain the use of this technique. One is that Geertz is sometimes not confident in his comparisons, so offers alternatives – if one does not 'hit home' with a reader, another might. Perhaps this is about diversity and personalisation – after all, each reader brings their own unique set of interpretive resources (based on their idiosyncratic array of knowledge and experience), so if you do not know about the physics or biology, perhaps you do know about the example from psychology, or economics.

Alternatively, I sometimes got a sense that Geertz was simply enjoying the writing process, and not wanting to censor the creative spark as ideas presented themselves to him. Of course, that is a personal interpretation based on my unique set of interpretive resources: I have also sometimes got the feeling that I am getting carried away with my writing – carried along by the 'flow' experience described by Mihaly Csikszentmihalyi – enjoying my own prose (which at least means that a minimum of one person does), and so possibly at risk of writing too much and consequently boring the reader. Reading Geertz gave me the feeling that he enjoyed the writing process, and that he crafted his writing with a concern for style as well as to communicate information.

From a pedagogic point of view, comparisons (similes, metaphors, analogies) are like models: always imperfect reflections of the target. Geertz suggested not only that metaphor was strictly wrong, but that it could be most effective when most wrong! In teaching it is important to highlight the positive and negative analogy (how this model is like a cell or a star or a molecule, and also how it is not). That level of explication would suit a textbook, but elsewhere it could (as well as disrupting the style) come across as too didactic. By offering multiple comparisons, each of which is wrong in different ways, perhaps the common target feature can be highlighted?

"To put the matter this way is to engage in a bit of metaphorical refocusing of one's own, for it shifts the analysis of cultural forms from an endeavour in general parallel to

dissecting an organism,

diagnosing a symptom,

deciphering a code, or

ordering a system

– the dominant analogies in contemporary anthropology – to one in general parallel with penetrating a literary text."

(From 'Deep play: notes on the Balinese cockfight')

"The meanings that symbols, the material vehicles of thought, embody are often elusive, vague, fluctuating, and convoluted, but they are, in principle, as capable of being discovered through systematic investigation – especially if the people who perceive them will cooperate a little –

as the atomic weight of hydrogen or

the function of the adrenal glands."

(From 'Person, time, and conduct in Bali')

Even if this is not the case, we can expect that if the reader does mental work comparing across the multiple comparisons then this will have brought focal attention to the point being made as the reader passes through the passage of text. (Again, there is a useful theme here for any research programme on the use of figures of speech in science communication: how do readers or listeners process multiple comparisons of this kind, and does this figurative device lead to greater understanding?)

Cultural crystallisation

One recurring image in Geertz's writing is that of crystallisation.

"It is the crystallisation of a direct conflict between primordial and civil sentiments – this 'longing not to belong to any other group' – that gives to the problem variously called tribalism, parochialism, communalism, and so on, a more ominous and deeply threatening quality than most of the other, also very serious and intractable, problems the new states face…

"The actual foci around which such discontent tends to crystallise are various"

"In the one case where [a particular pattern of social organisation] might have crystallised, with the Ashanti in Ghana, the power of the central group seems to have, at least temporarily, been broken."

"The pattern that seems to be developing and perhaps crystallising, is one in which a comprehensive national party…comes almost to comprise the state…"

"…raises the spectre of separatism by superimposing a comprehensive political significance upon those antagonisms, and, particularly when the crystallising ethnic blocs outrun state boundaries…"

(From 'The integrative revolution: The primordial sentiments and civil politics in the new states')

Crystallisation occurs when existing parts (ions, molecules) that are in a fluid (in solution, in the molten state) come together into a unified whole as a result of interactions between the system and its surroundings (evaporation of solvent, thermal radiation). Some of these quoted examples might stand up to being developed as analogies (where features of the social phenomenon can be mapped onto features of the physical change), but when used metaphorically the requirement is simply that there is some sense of parallel.

An interesting question might be whether such metaphors are understood differently by subject experts (here, chemists or mineralogists for example) who may be (consciously or otherwise) looking to map a scientific model onto the author's accounts, rather than a general reader who may have a much less technical notion of 'crystallisation' but might find the reference triggers a strong image?

We might also ask if 'crystallisation' has become something of a dead metaphor: that is, has it been used as a metaphor (a comparison with the change of state) so widely that it has taken on a new general meaning (of little more than things coming together)?

Balanced and unbalanced (social) forces

Another motif that I noticed in Geertz's writing was the treatment of social/cultural 'forces' as if they were indeed analogous to physical forces. In the following example, the force metaphor is extended:

"In sum, nineteenth century Balinese politics can be seen as stretched taut between two opposing forces; the centripetal one of state ritual and the centrifugal one of state structure."

(From 'Politics past, politics present: some notes on the uses of anthropology in understanding the new states')

The following example features dissolving (the opposite of crystallisation):

"In Malaya one of the more effective binding forces that has, so far at least, held Chinese and Malays together in a single state despite the tremendous centrifugal tendencies the racial and cultural differences generate is the fear on the part of either group that should the Federation dissolve they may become a clearly submerged minority in some other political framework: the Malays through the turn of the Chinese to Singapore or China; the Chinese through the turn of the Malays to Indonesia."

(From 'The integrative revolution: The primordial sentiments and civil politics in the new states')

There seems to be another extended metaphor here as well, in that following dissolution, there is a danger of becoming submerged in the fluid. This quote also features the notion of 'centrifugal' force, which reappears elsewhere in Geertz's work (see below).

Canonical and alternative conceptions

I associate references to centrifugal force with the common alternative conception that orbiting bodies are subject to balancing forces – a centripetal force that pulls an orbiting body towards the centre, which is balanced or cancelled by a centrifugal force which pulls the object away from the centre. This (incorrect) notion is very common (read about 'Centrifugal force').

From a Newtonian perspective, orbital motion is accelerated motion, which requires a net force – if there were no centripetal force, then the orbiting body would leave orbit (as, incredibly, happened to the moon in the sci-fi series 'Space: 1999' ⁷) and move off along a straight line. So, circular motion requires an unbalanced force (at least if we ignore the way mass affects the geometry of space ⁸).
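To make the Newtonian point concrete, here is the standard textbook relation for a body of mass $m$ moving at constant speed $v$ in a circle of radius $r$ (a summary of school physics, not anything drawn from Geertz):

$$a = \frac{v^{2}}{r}, \qquad F_{\text{net}} = ma = \frac{mv^{2}}{r} \neq 0$$

The net force is not zero and acts towards the centre, so in this account there is no outward force for it to balance; a 'centrifugal' force only appears if the motion is described from within a rotating frame of reference.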


Martin Landau and Barbara Bain – in 'Space: 1999' (created by Gerry and Sylvia Anderson, produced by ITC Entertainment)

I wondered whether the quote above would be interpreted differently according to a person's level of scientific literacy. Two possible readings are:

1. "…one of the more effective binding forces…has…held Chinese and Malays together in a single state despite the tremendous centrifugal tendencies…" because the binding forces were stronger than the centrifugal force.
2. "…one of the more effective binding forces…has…held Chinese and Malays together in a single state despite the tremendous centrifugal tendencies…" because the binding forces balance the centrifugal force.

The implication in the quote is that there is a steady state (if you will pardon the pun), so there must be an equilibrium of forces (that is, option 2). But this is an area where learners commonly hold alternative conceptions: for example, suggesting that gravity must be larger than the reaction force from the floor, or else they would float away; or that in a solid structure the attractive forces between molecules must be greater than the repulsive forces to hold the structure together.

"When [English A level students in a Further Education College] were shown diagrams of stable systems (objects stationary on the ground, or on a table) they did not always recognise that there was an equilibrium of forces acting. Rather, several of the students took the view that the downward force due to gravity was the larger, or only force acting. Two alternative notions were uncovered. One view was that no upward force was needed, as the object was supported instead, or simply that the object could not fall any lower as the ground was in the way. The other view was the downward force had to be greater to hold the object down: if the forces had been balanced there would have been nothing stopping the object from floating away."

(Taber, 1998)

Geertz was writing about society, and using the notion of forces metaphorically, but we know that when a learner is led to activate something in memory this reinforces that prior learning. For someone holding the common misconception that static equilibrium is due to a larger maintaining force overcoming some smaller force, reading Geertz's account is likely to lead to:

triggering prior learning about forces as relevant 'interpretive resources' for making sense of the metaphor;
interpreting the social example in terms of the misconception: binding forces are larger so they hold the state together;
thus rehearsing and reinforcing the prior (mis)understanding of forces!

That is, even though the topic is cultural not physical, and even though Geertz may well have held a perfectly canonical understanding of the physics, his metaphorical language has the potential to reinforce a scientific misconception!

This is not a particular criticism of Geertz: whenever a learner comes across an example that fits their prior conceptions, they are likely to activate that prior knowledge, and so reinforce the prior learning. This is helpful if they have learnt the principles as intended, but can reinforce misconceptions as well as canonical ideas. References to a scientific phenomenon or principle that assume, and so do not make explicit, the scientific ideas, always risk reinforcing existing misconceptions. (The teacher therefore tends to reiterate the core scientific message each time a previously taught principle is referenced in class – what might be called a 'drip-feed' tactic!)

Geertz seemed to be quite keen on the 'centrifugal' reference:

"It is the Alliance…where the strong centrifugal tendencies, as intense as perhaps any state…"
"the integrative power of a generally mid-eastern urban civilisation against the centrifugal tendencies of tribal particularism".

In the following extract, Geertz has two opposed centrifugal influences:

"Yet out of all this low cunning has come not only the most democratic state in the Arab world [Lebanon], but the most prosperous; and one that has in addition been able – with one spectacular exception – to maintain its equilibrium under intense centrifugal pressures from two of the most radically opposed extrastate primordial yearnings extant: that of the Christians, especially the Maronites, to be part of Europe, and that of the Moslems, especially the Sunnis, to be part of pan-Arabia."

(From 'The integrative revolution: The primordial sentiments and civil politics in the new states')

A cursory reading might be that, as these two opposed forces balance, there is an equilibrium between them – but the scientist would realise this must be read as there being a cohesive force strong enough to hold the centre together against the combined effect of these forces. Think perhaps of the famous Magdeburg hemispheres, where two teams of horses were unable to pull apart two hemispheres with a vacuum between them (as the pressure of the air pushing on the outside of the hemispheres applied sufficient force to balance the maximum pull the horses could manage).

Engraving showing Otto von Guericke's 'Magdeburg hemispheres' experiment (Source: https://commons.wikimedia.org/wiki/File:Magdeburg.jpg)

Again, the metaphor might well lead a reader to apply, and so reinforce, their notions of forces acting – whether these notions match the canonical science account or not.

Some other scientific references

Among the other scientific concepts I noticed referenced were:

"A cockfight is …not vertebrate enough to be called a group…"
"…the intense stillness that falls with instant suddenness, rather as though someone had turned off the current…"
"…that is like saying that as a perfectly aseptic environment is impossible, one might as well conduct surgery in a sewer."
"there has almost universally arisen around the developing struggle for governmental power as such a broad penumbra of primordial strife."

Sharing scientific and cultural resources

The very way that language evolves means that words change their meanings, or acquire new ones, and also shift between domains. If scientific terms are used figuratively often enough in non-scientific contexts, then in time they will acquire new, accepted, non-technical meanings. We see this shift from metaphorical to widely accepted meanings in the establishment of idioms, which must sometimes be quite mystifying to those not familiar with them, such as non-native speakers of a language (Taber, 2025): understanding an idiom is not rocket science to the initiated, but the language learner might feel they've missed the boat or are having their leg pulled – and, if already struggling with the language, may consider them the last straw.

Read about idioms in communicating science

Indeed, there is a scholarly equivalent. I suspect many natural scientists may not know what a Procrustean bed is, or the significance of finding yourself between Scylla and Charybdis ("I'll have a chocolate and strawberry Scylla in a cone, and a bottle of 1990 Charybdis please"?), but such references are common in academic writing in some fields.

But scientists are in no position to complain when technical terms drift into figurative use in everyday language. After all, scientists are not above borrowing everyday terms metaphorically, and then, through repeated use, treating them as if they were technical terms. Certainly (as I describe in detail elsewhere, 'The passing of stars: Birth, death, and afterlife in the universe'), references to the 'births' and 'deaths' of stars are now used as formal technical terms in astronomy; but this is nothing new, for 'charge', as in electrical charge, was borrowed from the charge used in early firearms; and the name 'quark' was taken from James Joyce – and calling the quarks 'up', 'down', 'truth'/'top' and 'beauty'/'bottom', and their qualities 'strangeness' and 'charm', gave new meanings to terms taken from common usage. And having been sequestered by physics, these terms have then been borrowed back into popular culture by the likes of Hawkwind and Florence and the Machine. ⁹

So, I have no criticisms of Geertz for using scientific terms figuratively in his writings about culture, even if some of those uses seem a little forced; and even if, inevitably (simply because this is how human memory works), when such terms are used without definition or explication they may activate and reinforce alternative conceptions in readers who already hold misconceptions of the science. A communicator has to draw upon the resources they have available, and which they hope will resonate (sic) with their audience, in order to achieve the challenging task of sharing ideas between minds.

I read Geertz to find out a little more about his area of (social) science, but ended up reflecting especially upon how he used the language of natural science and how this might be understood by non-scientists. It has been suggested there is no privileged meaning to a text, as each reader brings their own personal reading. I do not entirely agree, at least with regard to non-fiction. There is certainly no meaning in the text itself (it is just the representation of the author's ideas and needs to be interpreted), but there is an intended meaning that the author hopes to communicate, and which the author seeks to privilege by using all the rhetorical tools available in the hope that readers will understand the text much as intended. As every teacher likely knows, that is not an automatic or easy task.

Sources:

Chang, H. (2004) Inventing Temperature: Measurement and Scientific Progress. Oxford University Press.
Clifford Geertz (2000) After the revolution: The fate of nationalism in the new states (first published 1971), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.234-254.
Clifford Geertz (2000) Deep play: notes on the Balinese cockfight (first published 1972), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books.
Clifford Geertz (2000) Person, time, and conduct in Bali (first published 1966), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.360-411.
Clifford Geertz (2000) Politics past, politics present: some notes on the uses of anthropology in understanding the new states (first published 1967), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books.
Clifford Geertz (2000) Religion as a cultural system (first published 1966), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.87-125.
Clifford Geertz (2000) The cerebral savage: on the work of Claude Lévi-Strauss (first published 1967), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.345-359.
Clifford Geertz (2000) The impact of the concept of culture on the concept of man (first published 1966), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.33-54.
Clifford Geertz (2000) The integrative revolution: The primordial sentiments and civil politics in the new states (first published 1963), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.255-310.
Clifford Geertz (2000) Thick description: toward an interpretative theory of culture (first published 1973), in The Interpretation of Cultures. Selected Essays (2nd Edition). New York. Basic Books, pp.3-30.
Taber, K. S. (1998) Understanding Chemical Bonding. The development of A level students' understanding of the concept of chemical bonding. PhD Thesis.
[Download the thesis (Warning – large file size)]
Taber, K. S. (2017). Representing evolution in science education: The challenge of teaching about natural selection. In B. Akpan (Ed.), Science Education: A Global Perspective (pp. 71-96). Switzerland: Springer International Publishing.
[Download chapter]
Taber, K. S. (2018). Lost and found in translation: guidelines for reporting research data in an 'other' language. Chemistry Education Research and Practice, 19, 646-652 doi:10.1039/C8RP90006J [Free access]
Taber, K. S. (2019). The Nature of the Chemical Concept: Constructing chemical knowledge in teaching and learning. Cambridge: Royal Society of Chemistry.
Taber, K. S. (2025) What's the point of explaining science in the public domain? Seminar, Centre for Research in Education in Science, Technology, Engineering & Mathematics, King's College London. (Download this paper)

Notes:

¹ We often say exactly 100˚C, but in practice factors such as the container used do make measurable differences – (Chang, 2004) – that we generally ignore.

² Life is not always so simple. Sulphur, for example, forms different crystal structures at different temperatures; and many metals also undergo 'phase transitions' between structures at different temperatures. But we think our theories can also explain this, so we can generalise about, say, the shape of all sulphur crystals formed below 96˚C.

³ That is, before Darwin it was widely believed that species represented clear cut types of beings where in principle clear demarcation lines could be established between different natural kinds. We now understand that even if at any one time this is approximately true (see the figure), taking a broader perspective informed by Darwin's work we find different types of organisms blend into each other and there is no absolute boundary around one species distinguishing it from others. See, for example, 'Can ancestors be illegitimate?'

The scientific perspective on the evolution of living things considers 'deep time', whereas the everyday experience of learners is limited to a 'snapshot' of the species alive at one geological moment (from Taber, 2017).

⁴ This is a tricky area for the science educator. Scientists should always be open to alternative explanations, and even the overthrow of long accepted ideas. But sometimes the evidence is so overwhelming that for all practical purposes we assume we have certain knowledge. There are alternative explanations for the vast evidence for evolution (e.g., an omnipotent creator who wants to mislead us) but these seem so unfeasible and convoluted that we would be foolish to take them too seriously.

Read about the treatment of scientific certainty in the media

When it comes to climate change, we can never be absolutely sure the effects we are seeing are due to the anthropogenic actions we believe to be damaging, but the case is so strong, and the consequences of not changing our behaviours so serious, that no reasonable person should suggest delaying remedial action. This would be like someone playing 'Russian roulette' with a revolver with only one empty chamber. They cannot be sure they would shoot themselves, so why not go ahead and pull the trigger?

Similar arguments relate to the Apollo moon landings. One can imagine a highly convoluted ongoing global conspiracy to fake the landings and all the diverse evidence – but this requires accepting a large number of incredibly infeasible propositions. (Read: 'The moon is a long way off and it is impossible to get there'.)

⁵ The radical poet (and engraver and visionary) William Blake:

"To see a world in a grain of sand

And a heaven in a wild flower,

Hold infinity in the palm of your hand

And eternity in an hour."

⁶ Even in the natural sciences, this depends upon how we think about the instrument used. If the instrument and technique are considered basic, simple and reliable, and 'standard' for the job in hand (part of the 'disciplinary matrix' of an established research field), we may not bother adding 'as measured with the metre rule' or 'according to the calibrated markings on the measuring cylinder', and then describing how we used the rule or cylinder. However, if a technique or instrument is new, or considered problematic, or known to be open to large errors in some contexts, we would be expected to give details.

⁷ Supposedly, according to the premise of 'Space: 1999', by 1999 the people of earth had amassed a vast stockpile of nuclear waste which was stored at one location on the moon. Even more supposedly, this was meant to have exploded with sufficient force to eject the moon from earth orbit and indeed the solar system, but without the moon actually losing its structural integrity. Just as unlikely, the space through which the moon moved was so dense with other planetary systems that the humans stranded on the moon at the time of the accident were able to engage in regular interplanetary adventures. Despite the fact that

"Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space."

Douglas Adams

and that interstellar distances are generally vast, the projectile moon moved fast enough to reach new alien civilisations quickly, but slowly enough to allow some interaction before passing by. (It was just entertainment. Extremely long sequences of episodes in which the moon simply moved through very tenuous gas and the odd dust cloud, incrementally approaching some far star, may have been much more realistic, but would not have made for exciting television.)

Actors Martin Landau and Barbara Bain (seen in the publicity shot for 'Space: 1999' reproduced above) were a married couple who starred in 'Space: 1999', having previously appeared together in the classic series 'Mission: Impossible' – which also featured one Leonard Nimoy (see below), who had already famously ventured into space as Mr Spock.

The 'Mission: Impossible' team. "No Jim, not impossible captain, just very challenging."

⁸ From the perspective of general relativity, an orbiting body is simply following a geodesic in the curved space around a massive body, so gravitational force might be seen as an epiphenomenon: fictitious – a bit like centrifugal force.

⁹

"Copernicus had those Renaissance ladies
Crazy about his telescope
And Galileo had a name that made his
Reputation higher than his hopes
Did none of these astronomers discover
While they were staring out into the dark
That what a lady looks for in her lover
Is charm, strangeness and quark"

From the lyrics of 'Quark, strangeness and charm' (Dave Brock, Robert Newton Calvert)

"The static of your arms, it is the catalyst
Oh the chemical it burns, there is nothing but this
It's the purest element, but it's so volatile
An equation heaven sent, a drug for angels
Strangeness and Charm"

From the lyrics of 'Strangeness and Charm' (Florence Welch, Paul Epworth)

Delusions of educational impact

A 'peer-reviewed' study claims to improve academic performance by purifying the souls of students suffering from hallucinations

Keith S. Taber

The research design is completely inadequate…the whole paper is confused…the methodology seems incongruous…there is an inconsistency…nowhere is the population of interest actually identified…No explanation of the discrepancy is provided…results of this analysis are not reported…the 'interview' technique used in the study is highly inadequate…There is a conceptual problem here…neither the validity nor reliability can be judged…the statistic could not apply…the result is not reported…approach is completely inappropriate…these tables are not consistent…the evidence is inconclusive…no evidence to demonstrate the assumed mechanism…totally unsupported claims…confusion of recommendations with findings…unwarranted generalisation…the analysis that is provided is useless…the research design is simply inadequate…no control condition…such a conclusion is irresponsible
Some issues missed in peer review for a paper in the European Journal of Education and Pedagogy

An invitation to publish without regard to quality?

I received an email from an open-access journal called the European Journal of Education and Pedagogy, with the subject heading 'Publish Fast and Pay Less', which immediately triggered the thought "another predatory journal?" Predatory journals publish submissions for a fee, but do not offer the editorial and production standards expected of serious research journals. In particular, they publish material which clearly falls short of rigorous research despite usually claiming to engage in peer review.

A peer reviewed journal?

Checking out the website, I found the usual assurances that the journal used rigorous peer review:

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general.

We use a double-blind system for peer-reviewing; both reviewers and authors' identities remain anonymous to each other. The paper will be peer-reviewed by two or three experts; one is an editorial staff and the other two are external reviewers."
https://www.ej-edu.org/index.php/ejedu/about

Peer review is critical to the scientific process. Work is only published in (serious) research journals when it has been scrutinised by experts in the relevant field, and any issues raised have been addressed through revisions sufficient to satisfy the editor.

I could not find who the editor(-in-chief) was, but the 'editorial team' of the European Journal of Education and Pedagogy were listed as:

Bea Tomsic Amon, University of Ljubljana, Slovenia
Chunfang Zhou, University of Southern Denmark, Denmark
Gabriel Julien, University of Sheffield, UK
Intakhab Khan, King Abdulaziz University, Saudi Arabia
Mustafa Kayıhan Erbaş, Aksaray University, Turkey
Panagiotis J. Stamatis, University of the Aegean, Greece

I decided to look up the editor based in England, where I am also based, but could not find a web presence for him at the University of Sheffield. Using the ORCID (Open Researcher and Contributor ID) provided on the journal website, I found that his ORCID biography places him at the University of the West Indies and makes no mention of Sheffield.

If the European Journal of Education and Pedagogy is organised like a serious research journal, then each submission is handled by one of this editorial team. However, the reference to "editorial staff" might well imply that, like some other predatory journals I have been approached by (e.g., Are you still with us, Doctor Wu?), the editorial work is actually carried out by office staff, not qualified experts in the field.

That would certainly help explain the publication, in this 'peer-reviewed research journal', of the first paper that piqued my interest enough to motivate me to access and read the text.

The Effects of Using the Tazkiyatun Nafs Module on the Academic Achievement of Students with Hallucinations

The abstract of the paper published in what claims to be a peer-reviewed research journal

The paper initially attracted my attention because it seemed to be about the treatment of a medical condition, so I wondered what it was doing in an education journal. Yet the paper also seemed to be about an intervention to improve academic performance. As I read the paper, I found a number of flaws and issues (some very obvious, some quite serious) that should have been spotted by any qualified reviewer or editor, and which should have led to publication being deferred until these matters were satisfactorily addressed.

This is especially worrying as this paper makes claims relating to the effective treatment of a symptom of potentially serious, even critical, medical conditions through religious education ("a spiritual approach", p.50): claims that might encourage sufferers to defer seeking medical diagnosis and treatment. Moreover, these are claims that are not supported by any evidence presented in this paper that the editor of the European Journal of Education and Pedagogy decided was suitable for publication.

An overview of what is demonstrated, and what is claimed, in the study.

Limitations of peer review

Peer review is not a perfect process: it relies on busy human beings spending time on additional (unpaid) work, and it is only effective if suitable experts can be found that fit with, and are prepared to review, a submission. It is also generally more challenging in the social sciences than in the natural sciences. ¹

That said, one sometimes finds papers published in predatory journals where one would expect any intelligent person with a basic education to notice problems without needing any specialist knowledge at all. The study I discuss here is a case in point.

Purpose of the study

Under the heading 'research objectives', the reader is told,

"In general, this journal [article?] attempts to review the construction and testing of Tazkiyatun Nafs [a Soul Purification intervention] to overcome the problem of hallucinatory disorders in student learning in secondary schools. The general objective of this study is to identify the symptoms of hallucinations caused by subtle beings such as jinn and devils among students who are the cause of disruption in learning as well as find solutions to these problems.

Meanwhile, the specific objective of this study is to determine the effect of the use of Tazkiyatun Nafs module on the academic achievement of students with hallucinations.

To achieve the aims and objectives of the study, the researcher will get answers to the following research questions [sic]:

Is it possible to determine the effect of the use of the Tazkiyatun Nafs module on the academic achievement of students with hallucinations?"
Awang, 2022, p.42

I think I can save readers a lot of time regarding the research question by suggesting that, in this study, at least, the answer is no – if only because the research design is completely inadequate to answer the research question. (I should point out that the author comes to the opposite conclusion: e.g., "the approach taken in this study using the Tazkiyatun Nafs module is very suitable for overcoming the problem of this hallucinatory disorder", p.49.)

Indeed, the whole paper is confused in terms of what it is setting out to do, what it actually reports, and what might be concluded. As one example, the general objective of identifying "the symptoms of hallucinations caused by subtle beings such as jinn and devils" (but surely, the hallucinations are the symptoms here?) seems to have been forgotten, or, at least, does not seem to be addressed in the paper. ²

The study assumes that hallucinations are caused by subtle beings such as jinn and devils possessing the students.
(Image by Tünde from Pixabay)

Methodology

So, this seems to be an intervention study.

Some students suffer from hallucinations.
This is detrimental to their education.
It is hypothesised that the hallucinations are caused by supernatural spirits ("subtle beings that lead to hallucinations"), so, a soul purification module might counter this detriment;
if so, sufferers engaging with the soul purification module should improve their academic performance;
and so the effect of the module is being tested in the study.

Thus we have a kind of experimental study?

No, not according to the author. Indeed, the study only reports data from a small number of unrepresentative individuals with no controls,

"The study design is a case study design that is a qualitative study in nature. This study uses a case study design that is a study that will apply treatment to the study subject to determine the effectiveness of the use of the planned modules and study variables measured many times to obtain accurate and original study results. This study was conducted on hallucination disorders [students suffering from hallucination disorders?] to determine the effectiveness of the Tazkiyatun Nafs module in terms of aspects of student academic achievement."
Awang, 2022, p.42

Case study?

So, the author sees this as a case study. Research methodologies are better understood as clusters of similar approaches rather than unitary categories – but case study is generally seen as naturalistic, rather than involving an intervention by an external researcher. So, case study seems incongruous here. Case study involves the detailed exploration of an instance (of something of interest – a lesson, a school, a course of study, a textbook, …) reported with 'thick description'.

Read about the characteristics of case study research

The case is usually a complex phenomenon which is embedded within a context from which it cannot readily be untangled (for example, a lesson always takes place within a wider context of a teacher working over time with a class on a course of study, within a curricular, and institutional, and wider cultural, context, all of which influence the nature of the specific lesson). So, due to the complex and embedded nature of cases, they are all unique.

"a case study is a study that is full of thoroughness and complex to know and understand an issue or case studied…this case study is used to gain a deep understanding of an issue or situation in depth and to understand the situation of the people who experience it"
Awang, 2022, p.42

A case is usually selected either because that case is of special importance to the researcher (an intrinsic case study – e.g., I studied this school because it is the one I was working in) or because we hope this (unique) case can tell us something about similar (but certainly not identical) other (also unique) cases. In the latter case [sic], an instrumental case study, we are always limited by the extent to which we might expect to be able to generalise beyond the case.

This limited generalisation might suggest we should not work with a single case, but rather look for a suitably representative sample of all cases: but we sometimes choose case study because the complexity of the phenomena suggests we need to use extensive, detailed data collection and analyses to understand the complexity and subtlety of any case. That is, the compromise we choose is to look at one case in depth, because that will at least give us insight into the case, whereas a survey of many cases will inevitably be too superficial to offer any useful insights.

So how does Awang select the case for this case study?

"This study is a case study of hallucinatory disorders. Therefore, the technique of purposive sampling (purposive sampling [sic]) is chosen so that the selection of the sample can really give a true picture of the information to be explored ….

Among the important steps in a research study is the identification of populations and samples. The large group in which the sample is selected is termed the population. A sample is a small number of the population identified and made the respondents of the study. A case or sample of n = 1 was once used to define a patient with a disease, an object or concept, a jury decision, a community, or a country, a case study involves the collection of data from only one research participant…"
Awang, 2022, p.42

Of course, a case study of "a community, or a country" – or of a school, or a lesson, or a professional development programme, or a school leadership team, or a homework policy, or an enrichment activity, or … – would almost certainly be inadequate if it was limited to "the collection of data from only one research participant"!

I do not think this study actually is "a case study of hallucinatory disorders [sic]". Leaving aside the shift from singular ("a case study") to plural ("disorders"), the research does not investigate a/some hallucinatory disorders, but the effect of a soul purification module on academic performance. (Actually, spoiler alert 😉, it does not actually investigate the effect of a soul purification module on academic performance either, but the author seems to think it does.)

If this is a case study, there should be the selection of a case, not a sample. Sometimes we do sample within a case in case study, but only from those identified as part of the case. (For example, if the case was a year group in a school, we may not have resources to interact in depth with several hundred different students.) Perhaps this is pedantry as the reader likely knows what Awang meant by 'sample' in the paper – but semantics is important in research writing: a sample is chosen to represent a population, whereas the choice of case study is an acknowledgement that generalisation back to a population is not being claimed.

However, if "among the important steps in a research study is the identification of populations" then it is odd that nowhere in the paper is the population of interest actually specified!

Things slip our minds. Perhaps Awang intended to define the population, forgot, and then missed this when checking the text – but, hey, that is just the kind of thing the reviewers and editor are meant to notice! Otherwise this looks very like including material from standard research texts to pay lip service to the idea that research design needs to be principled, but without really appreciating what the phrases used actually mean. This impression is also given by the descriptions of how data (for example, from interviews) were analysed – descriptions which are not reflected at all in the results section of the paper. (I am not accusing Awang of this, but because the poor standard of peer review did not raise the question, the author is left vulnerable to such an evaluation.)

The only one research participant?

So, what do we know about the "case or sample of n = 1", the "only one research participant" in this study?

"The actual respondents in this case study related to hallucinatory disorders were five high school students. The supportive respondents in the case study related to hallucination disorders were five counseling teachers and five parents or guardians of students who were the actual respondents."
Awang, 2022, p.42

It is certainly not impossible that a case could comprise a group of five people – as long as those five make up a naturally bounded group – that is a group that a reasonable person would recognise as existing as a coherent entity as they clearly had something in common (they were in the same school class, for example; they were attending the same group therapy session, perhaps; they were a friendship group; they were members of the same extended family diagnosed with hallucinatory disorders…something!) There is no indication here of how these five make up a case.

The identification of the participants as a case might have made sense had the participants collectively undertaken the module as a group, but the reader is told: "This study is in the form of a case study. Each practice and activity in the module are done individually" (p.50). Another justification could have been if the module had been offered in one school, and these five participants were the students enrolled in the programme at that time; but as "analysis of the respondents' academic performance was conducted after the academic data of all respondents were obtained from the respective respondent's school" (p.45), it seems they did not attend a single school.

The results tables and reports in the text refer to "respondent 1" to "respondent 4". In case study, an approach which recognises the individuality and inherent value of the particular case, we would usually assign assumed names to research participants, not numbers. But if we are going to use numbers, should there not be a respondent 5?

The other one research participant?

It seems that there is something odd here.

Both the passage above, and the abstract refer to five respondents. The results report on four. So what is going on? No explanation of the discrepancy is provided. Perhaps:

There only ever were four participants, and the author made a mistake in counting.
There only ever were four participants, and the author made a typographical mistake (well, strictly, six typographical mistakes) in drafting the paper, and then missed this in checking the manuscript.
There were five respondents and the author forgot to include data on respondent 5 purely by accident.
There were five respondents, but the author decided not to report on the fifth deliberately for a reason that is not revealed (perhaps the results did not fit with the desired outcome?)

The significant point is not that there is an inconsistency but that this error was missed by peer reviewers and the editor – if there ever was any genuine peer review. This is the kind of mistake that a school child could spot – so, how is it possible that 'expert reviewers' and 'editorial staff' either did not notice it, or did not think it important enough to query?

Research instruments

Another section of the paper reports the instrumentation used in the study.

"The research instruments for this study were Takziyatun Nafs modules, interview questions, and academic document analysis. All these instruments were prepared by the researcher and tested for validity and reliability before being administered to the selected study sample [sic, case?]."
Awang, 2022, p.42

Of course, it is important to test instruments for validity and reliability (or perhaps authenticity and trustworthiness when collecting qualitative data). But it is also important

to tell the reader how you did this
to report the outcomes

both of which seem to be missing (apart from in regard to part of the implemented module – see below). That is, the reader of a research study wants evidence, not simply promises. Simply telling readers you did this is a bit like meeting a stranger who tells you that you can trust them because they (i.e., say that they) are honest.

Later the reader is told that

"Semi- structured interview questions will be [sic, not 'were'?] developed and validated for the purpose of identifying the causes and effects of hallucinations among these secondary school students…

…this interview process will be [sic, not 'was'] conducted continuously [sic!] with respondents to get a clear and specific picture of the problem of hallucinations and to find the best solution to overcome this disorder using Islamic medical approaches that have been planned in this study"
Awang, 2022, pp.43-44

At the very least, this seems to confuse the plan for the research with a report of what was done. (But again, apparently, the reviewers and editorial staff did not think this needed addressing.) This is also confusing as it is not clear how this aspect of the study relates to the intervention. Were the interviews carried out before the intervention to help inform the design of the modules? (Presumably not, as the modules had already been "tested for validity and reliability before being administered to the selected study sample".) Perhaps there are clear and simple answers to such questions – but the reader will not know because the reviewers and editor did not seem to feel they needed to be posed.

If "Interviews are the main research instrument in this study" (p.43), then one would expect to see examples of the interview schedules – but these are not presented. The paper reports a complex process for analysing interview data, but this is not reflected in the findings reported. The reader is told that the six-stage process leads to the identification and refinement of main and sub-categories. Yet, these categories are not reported in the paper. (But, again, peer reviewers and the editor did not apparently raise this as something to be corrected.) More generally "data analysis used thematic analysis methods" (p.44), so why is there no analysis presented in terms of themes? The results of this analysis are simply not reported.

The reader is told that

"This interview method…aims to determine the respondents' perspectives, as well as look at the respondents' thoughts on their views on the issues studied in this study."
Awang, 2022, p.44

But there is no discussion of participants' perspectives and views in the findings of the study. ² Did the peer reviewers and editor not think this needed addressing before publication?

Even more significantly, in a qualitative study where interviews are supposedly the main research instrument, one would expect to see extracts from the interviews presented as part of the findings to support and exemplify claims being made: yet, there are none. (Did this not strike the peer reviewers and editor as odd: presumably they are familiar with the norms of qualitative research?)

The only quotation from the qualitative data (in this 'qualitative' study) I can find appears in the implications section of the paper:

"Are you aware of the importance of education to you? Realize. Is that lesson really important? Important. The success of the student depends on the lessons in school right or not? That's right"
Respondent 3: Awang, 2022, p.49

This seems a little bizarre, if we accept this is, as reported, an utterance from one of the students, Respondent 3. It becomes more sensible if this is actually condensed dialogue:

"Are you aware of the importance of education to you?"

"Realize."

"Is that lesson really important?"

"Important."

"The success of the student depends on the lessons in school right or not?"

"That's right"

It seems the peer review process did not lead to the suggestion that the material should be formatted according to the norms for presenting dialogue in scholarly texts, by indicating turns. In any case, if that is typical of the 'interview' technique used in the study then it is highly inadequate, as clearly the interviewer is leading the respondent, and this is more an example of indoctrination than open-ended enquiry.

Random sampling of data

Completely incongruous with the description of the purposeful selection of the participants for a case study is the account of how the assessment data was selected for analysis:

"The process of analysis of student achievement documents is carried out randomly by taking the results of current examinations that have passed such as the initial examination of the current year or the year before which is closest to the time of the study."
Awang, 2022, p.44

Did the peer reviewers or editor not question the use of the term 'random' here? It is unclear what is meant by 'random' here, but clearly if the analysis was based on randomly selected data that would undermine the results.

Validating the soul purification module

There is also a conceptual problem here. The Tazkiyatun Nafs modules are the intervention materials (part of what is being studied) – so they cannot also be research instruments (used to study them). Surely, if the Tazkiyatun Nafs modules had been shown to be valid and reliable before carrying out the reported study, as suggested here, then the study would not be needed to evaluate their effectiveness. But, presumably, expert peer reviewers (if there really were any) did not see an issue here.

The reliability of the intervention module

The Tazkiyatun Nafs modules had three components, and the author reports the second of the three was subjected to tests of validity and reliability. It seems that Awang thinks that this demonstrates the validity and reliability of the complete intervention:

"The second part of this module will go through [sic] the process of obtaining the validity and reliability of the module. Proses [sic] to obtain this validity, a questionnaire was constructed to test the validity of this module. The appointed specialists are psychologists, modern physicians (psychiatrists), religious specialists, and alternative medicine specialists. The validity of the module is identified from the aspects of content, sessions, and activities of the Tazkiyatun Nafs module. While to obtain the value of the reliability coefficient, Cronbach's alpha coefficient method was used. To obtain this Cronbach's alpha coefficient, a pilot test was conducted on 50 students who were randomly selected to test the reliability of this module to be conducted."
Awang, 2022, pp.43-44

Now to unpack this, it may be helpful to briefly outline what the intervention involved (as the paper is open access, anyone can access and read the full details in the report).

From the MGM film 'A Night at the Opera' (1935): "The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced"

The description does not start off very helpfully ("The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced" (p.43) put me in mind of the Marx brothers: "The party of the first part shall be known in this contract as the party of the first part"), but some key points are,

"the Tazkiyatun Nafs module was constructed to purify the heart of each respondent leading to the healing of hallucinatory disorders. This liver purification process is done in stages…

"the process of cleansing the patient's soul will be done …all the subtle beings in the patient will be expelled and cleaned and the remnants of the subtle beings in the patient will be removed and washed…

The second process is the process of strengthening and the process of purification of the soul or heart of the patient …All the mazmumah (evil qualities) that are in the heart must be discarded…

The third process is the process of enrichment and the process of distillation of the heart and the practices performed. In this process, there will be an evaluation of the practices performed by the patient as well as the process to ensure that the patient is always clean from all the disturbances and disturbances [sic] of subtle beings to ensure that students will always be healthy and clean from such disturbances…
Awang, 2022, p.45, p.43

Quite how this process of exorcising and distilling and cleansing will occur is not entirely clear (and if the soul is equated with the heart, how is the liver involved?), but it seems to involve reflection and prayer and contemplation of scripture – certainly a very personal and therapeutic process.

And yet its validity and reliability were tested by giving a questionnaire to 50 students randomly selected (from the unspecified population, presumably)? No information is given on how a random selection was made (Taber, 2013) – which allows a reader to be very sceptical that this actually was a random sample from the (un?)identified population, and not just an arbitrary sample of 50 students. (So, that is twice the word 'random' is used in the paper when it seems inappropriate.)
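To make concrete what a reader would need to be told, here is a minimal Python sketch of a documented simple random sample: a defined sampling frame listing every member of the population, a chance mechanism that gives each member an equal probability of selection, and a recorded seed so that the draw can be reproduced and checked. The frame of 400 student identifiers, the function name and the seed value are all hypothetical, chosen only for illustration; nothing in Awang's paper specifies them.

```python
import random


def simple_random_sample(frame, n, seed):
    """Draw n members from a sampling frame without replacement.

    Every member of the frame has an equal chance of selection, and
    recording the seed lets anyone reproduce exactly the same draw.
    (Hypothetical helper, for illustration only.)
    """
    if n > len(frame):
        raise ValueError("sample size exceeds the population")
    rng = random.Random(seed)  # independent generator seeded for reproducibility
    return rng.sample(frame, n)


# Hypothetical sampling frame: every student in an explicitly specified population
frame = [f"student_{i:03d}" for i in range(1, 401)]
sample = simple_random_sample(frame, 50, seed=2022)
print(len(sample), sample[:3])
```

The point is not the code itself but what it forces the researcher to state: which population the frame lists, how large the sample is, and how chance determined who was selected. Without those details a reader cannot distinguish a random sample from a convenient one.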

It hardly matters here, as clearly neither the validity nor the reliability of a spiritual therapy can be judged from a questionnaire (especially when administered to people who have never undertaken the therapy). In any case, the "reliability coefficient" obtained from an administration of a questionnaire ONLY applies to that sample on that occasion. So, the statistic could not apply to the four participants in the study. And, in any case, the result is not reported, so the reader has no idea what the value of Cronbach's alpha was (but then, this was described as a qualitative study!)
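For reference, Cronbach's alpha for a scale of $k$ items is conventionally defined as below, where $\sigma^{2}_{Y_i}$ is the variance of the responses to item $i$ and $\sigma^{2}_{X}$ is the variance of the respondents' total scores. Every term is calculated from a single set of responses, which is why the value describes only the sample that produced those responses, on that occasion.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```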

Moreover, Cronbach's alpha only indicates the internal coherence of the items on a scale (Taber, 2019): so, it only indicates whether the set of questions included in the questionnaire seem to be accessing the same underlying construct in motivating the responses of those surveyed across the set of items. It gives no information about the reliability of the instrument (i.e., whether it would give the same results on another occasion).

This approach to testing validity and reliability is then completely inappropriate and unhelpful. So, even if the outcomes of the testing had been reported (and they are not) they would not offer any relevant evidence. Yet it seems that peer reviewers and editor did not think to question why this section was included in the paper.

Ethical issues

A study of this kind raises ethical issues. It may well be that the research was carried out in an entirely proper and ethical manner, but it is usual in studies with human participants ('human subjects') to make this clear in the published report (Taber, 2014b). A standard issue is whether the participants gave voluntary, informed, consent. This would mean that they were given sufficient information about the study at the outset to be able to decide if they wished to participate, and were under no undue pressure to do so. The 'respondents' were school students: if they were considered minors in the research context (and oddly for a 'case study' such basic details as age and gender are not reported) then parental permission would also be needed, again subject to sufficient briefing and no duress.

However, in this specific research there are also further issues due to the nature of the study. The participants were subject to medical disorders, so how did the researcher obtain information about, and access to, the students without medical confidentiality being broken? Who were the 'gatekeepers' who provided access to the children and their personal data? The researcher also obtained assessment data "from the class teacher or from the Student Affairs section of the student's school" (p.44), so it is important to know that students (and parents/guardians) consented to this. Again, peer review does not seem to have identified this as an issue to address before publication.

There is also the major underlying question about the ethics of a study when recognising that these students were (or could be, as details are not provided) suffering from serious medical conditions, but employing religious education as a treatment ("This method of treatment is to help respondents who suffer from hallucinations caused by demons or subtle beings", p.44). Part of the theoretical framework underpinning the study is the assumption that what is being addressed is "the problem of hallucinations caused by the presence of ethereal beings…" (p.43) yet it is also acknowledged that,

"Hallucinatory disorders in learning that will be emphasized in this study are due to several problems that have been identified in several schools in Malaysia. Such disorders are psychological, environmental, cultural, and sociological disorders. Psychological disorders such as hallucinatory disorders can lead to a more critical effect of bringing a person prone to Schizophrenia. Psychological disorders such as emotional disorders and psychiatric disorders. …Among the causes of emotional disorders among students are the school environment, events in the family, family influence, peer influence, teacher actions, and others."
Awang, 2022, p.41

There seem to be three ways of understanding this apparent discrepancy, which I might gloss:

there are many causes of conditions that involve hallucinations, including, but not only, possession by evil or mischievous spirits;
the conditions that lead to young people having hallucinations may be understood at two complementary levels, at a spiritual level in terms of a need for inner cleansing and exorcising of subtle beings, and in terms of organic disease or conditions triggered by, for example, social and psychological factors;
in the introduction the author has relied on various academic sources to discuss the nature of the phenomenon of students having hallucinations, but he actually has a working assumption that is completely different: hallucinations are due to the presence of jinn or other spirits.

I do not think it is clear which of these positions is being taken by the study's author.

In the first case it would be necessary to identify which causes are present in potential respondents and only recruit those suffering possession for this study (which does not seem to have been done);
In the second case, spiritual treatment would need to complement medical intervention (which would completely undermine the validity of the study as medical treatments for the underlying causes of hallucinations are likely to be the cause of hallucinations ceasing, not the tested intervention);
The third position is clearly problematic in terms of academic scholarship as it is either completely incompetent or deliberately disregards academic norms that require the design of a study to reflect the conceptual framework set out to motivate it.

So, was this tested intervention implemented instead of or alongside formal medical intervention?

If it was alongside medical treatment, then that raises a major confound for the study.
Yet it would clearly be unacceptable to deny sufferers indicated medical treatment in order to test an educational intervention that is in effect a form of exorcism.

Again, it may be there are simple and adequate responses to these questions (although here I really cannot see what they might be), but unfortunately it seems the journal referees and editor did not think to ask for them.

Findings

The key findings presented concern academic performance at school. Core results are presented in tables I and II. Unfortunately these tables are not consistent as they report contradictory results for the academic performance of students before and during periods when they had hallucinations.

They can be made consistent if the reader assumes that two of the columns in table II are mislabelled. If the reader assumes that the column labelled 'before disruption' actually reports the performance 'during disruption' and that the column actually labelled 'during disruption' is something else, then they become consistent. For the results to tell a coherent story and agree with the author's interpretation this 'something else' presumably should be 'after disruption'.

This is a very unfortunate error – and moreover one that is obvious to any careful reader. (So, why was it not obvious to the referees and editor?)

As well as looking at these overall scores, other assessment data is presented separately for each of respondent 1 – respondent 4. These sections comprise presentations of information about grades and class positions, mixed with claims about the effects of the intervention. These claims are not based on any evidence and in many cases are conclusions about 'respondents' in general although they are placed in sections considering the academic assessment data of individual respondents. So, there are a number of problems with these claims:

they are of the nature of conclusions, but appear in the section presenting the findings;
they are about the specific effects of the intervention that the author assumes has influenced academic performance, not the data analysed in these sections;
they are completely unsubstantiated as no data or analysis is offered to support them;
often they make claims about 'respondents' in general, although as part of the consideration of data from individual learners.

Despite this, the paper passed peer-review and editorial scrutiny.

Rhetorical research?

This paper seems to be an example of a kind of 'rhetorical research' where a researcher is so convinced about their pre-existing theoretical commitments that they simply assume they have demonstrated them. Here the assumptions seem to be:

Recovering from suffering hallucinations will increase student performance
Hallucinations are caused by jinn and devils
A spiritual intervention will expel jinn and devils
So, a spiritual intervention will cure hallucinations
So, a spiritual intervention will increase student performance

The researcher provided a spiritual intervention, and the student performance increased, so it is assumed that the scheme is demonstrated. The data presented are certainly consistent with the assumption, but do not in themselves support this scheme. Awang provides evidence that student performance improved in four individuals after they had received the intervention – but there is no evidence offered to demonstrate the assumed mechanism.

A gardener might think that complimenting seedlings will cause them to grow. Perhaps she praises her seedlings every day, and they do indeed grow. Are we persuaded about the efficacy of her method, or might we suspect another cause at work? Would the peer reviewers and editor of the European Journal of Education and Pedagogy be persuaded this demonstrated that compliments cause plant growth? On the evidence of this paper, perhaps they would.

This is what Awang tells readers about the analysis undertaken:

"Each student respondent involved in this study [sic, presumably not, rather the researcher] will use the analysis of the respondent's performance to determine the effect of hallucination disorders on student achievement in secondary school is accurate.

The elements compared in this analysis are as follows: a) difference in mean percentage of achievement by subject, b) difference in grade achievement by subject and c) difference in the grade of overall student achievement. All academic results of the respondents will be analyzed as well as get the mean of the difference between the performance before, during, and after the respondents experience hallucinations.

These results will be used as research material to determine the accuracy of the use of the Tazkiyatun Nafs Module in solving the problem of hallucinations in school and can improve student achievement in academic school."
Awang, 2022, p.45

There is clearly a large jump between the analysis outlined in the second paragraph here, and testing the study hypotheses as set out in the final paragraph. But the author does not seem to notice this (and more worryingly, nor do the journal's reviewers and editor).

So interleaved into the account of findings discussing "mean percentage of achievement by subject…difference in grade achievement by subject…difference in the grade of overall student achievement" are totally unsupported claims. Here is an example for Respondent 1:

"Based on the findings of the respondent's achievement in the grade for Respondent 1 while facing the problem of hallucinations shows that there is not much decrease or deterioration of the respondent's grade. There were only 4 subjects who experienced a decline in grade between before and during hallucination disorder. The subjects that experienced decline were English, Geography, CBC, and Civics. Yet there is one subject that shows a very critical grade change the Civics subject. The decline occurred from grade A to grade E. This shows that Civics education needs to be given serious attention in overcoming this problem of decline. Subjects experiencing this grade drop were subjects involving emotion, language, as well as psychomotor fitness. In the context of psychology, unstable emotional development leads to a decline in the psychomotor and emotional development of respondents.

After the use of the Tazkiyatun Nafs module in overcoming this problem, hallucinatory disorders can be overcome. This situation indicates the development of the respondents during and after experiencing hallucinations after practicing the Tazkiyatun Nafs module. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better. From the above findings there were 5 subjects who experienced excellent improvement in grades. The increase occurred in English, Malay, Geography, and Civics subjects. The best improvement is in the subject of Civic education from grade E to grade B. The improvement in this language subject shows that the respondents' emotions have stabilized. This situation is very positive and needs to be continued for other subjects so that respondents continue to excel in academic achievement in school."
Awang, 2022, p.45 (emphasis added)

The material which I show here as underlined is interjected completely gratuitously. It does not logically fit in the sequence. It is not part of the analysis of school performance. It is not based on any evidence presented in this section. Indeed, nor is it based on any evidence presented anywhere else in the paper!

This pattern is repeated in discussing other aspects of respondents' school performance. Although there is mention of other factors which seem especially pertinent to the dip in school grades ("this was due to the absence of the respondents to school during the day the test was conducted", p.46; "it was an increase from before with no marks due to non-attendance at school", p.46) the discussion of grades is interspersed with (repetitive) claims about the effects of the intervention for which no evidence is offered.

	Respondent 1	Respondent 2	Respondent 3	Respondent 4
§: Differences in Respondents' Grade Achievement by Subject	"After the use of the Tazkiyatun Nafs module in overcoming this problem, hallucinatory disorders can be overcome. This situation indicates the development of the respondents during and after experiencing hallucinations after practicing the Tazkiyatun Nafs module. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.45)	"After the use of the Tazkiyatun Nafs module as a soul purification module, showing the development of the respondents during and after experiencing hallucination disorders is very good. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)	"*The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better*" (p.46)	"*The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better*." (p.46)
§:Differences in Respondent Grades according to Overall Academic Achievement	"Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (pp.46-7)	"Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module. … This excellence also shows that the respondents have recovered from hallucinations after practicing the methods found in the Tazkiayatun Nafs module that has been introduced. In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)	"Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)	"Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module has successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)

Unsupported claims made within findings sections reporting analyses of individual student academic grades: note (a) how these statements included in the analysis of individual school performance data from four separate participants (in a case study – a methodology that recognises and values diversity and individuality) are very similar across the participants; (b) claims about 'respondents' (plural) are included in the reports of findings from individual students.

Awang summarises what he claims the analysis of 'differences in respondents' grade achievement by subject' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective achievement grades. Therefore, this soul purification module should be practiced by every student to help them in stabilizing their soul and emotions and stay away from all the disturbances of the subtle beings that lead to hallucinations"
Awang, 2022, p.46

And, on the next page, Awang summarises what he claims the analysis of 'differences in respondent grades according to overall academic achievement' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective overall academic achievement. Therefore, this soul purification module should be practiced by every student to help them in stabilizing the soul and emotions as well as to stay away from all the disturbances of the subtle beings that lead to hallucination disorder."
Awang, 2022, p.47

So, the analysis of grades is said to demonstrate the value of the intervention, and indeed Awang considers this is reason to extend the intervention beyond the four participants, not just to others suffering hallucinations, but to "every student". The peer review process seems not to have raised queries about

the unsupported claims,
the confusion of recommendations with findings (it is normal to keep to results in a findings section), nor
the unwarranted generalisation from four hallucination sufferers to all students whether healthy or not.

Interpreting the results

There seem to be two stories that can be told about the results:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved.
Narrative 1

Now narrative 1 relies on a very substantial implied assumption – which is that the numbers presented as school performance are comparable over time. So, a control would be useful: such as what happened to the performance scores of other students in the same classes over the same time period. It seems likely they would not have shown the same dip – unless the dip was related to something other than hallucinations – such as the well-recognised dip after long school holidays, or some cultural distraction (a major sports tournament; fasting during Ramadan; political unrest; a pandemic…). Without such a control the evidence is suggestive (after all, being ill, and missing school as a result, is likely to lead to a dip in school performance, so the findings are not surprising), but inconclusive.

Intriguingly, the author tells readers that "student achievement statistics from the beginning of the year to the middle of the current [sic, published in 2022] year in secondary schools in Northern Peninsular Malaysia that have been surveyed by researchers show a decline (Sabri, 2015 [sic])" (p.42), but this is not considered in relation to the findings of the study.

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, as a result of undergoing the soul purification module, their school performance improved.
Narrative 2

Clearly narrative 2 suffers from the same limitation as narrative 1. However, it also demands an extra step in making an inference. I could re-write this narrative:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved.
AND
the recovery was due to engagement with the soul purification module.
Narrative 2'.

That is, even if we accept narrative 1 as likely, to accept narrative 2 we would also need to be convinced that:

a) sufferers from medical conditions leading to hallucinations do not suffer periodic attacks with periods of remission in between; or
b) episodes of hallucinations cannot be due to one-off events (emotional trauma, T.I.A. {transient ischaemic attack or mini-strokes},…) that resolve naturally in time; or
c) sufferers from medical conditions leading to hallucinations do not find they resolve due to maturation; or
d) the four participants in this study did not undertake any change in life-style (getting more sleep, ceasing eating strange fungi found in the woods) unrelated to the intervention that might have influenced the onset of hallucinations; or
e) the four participants in this study did not receive any medical treatment independent of the intervention (e.g., prescribed medication to treat migraine episodes) that might have influenced the onset of hallucinations.

Despite this study being supposedly a case study (where the expectation is there should be 'thick description' of the case and its context), there is no information to help us exclude such options. We do not know the medical diagnoses of the conditions causing the participants' hallucinations, or anything about their lives or any medical treatment that may have been administered. Without such information, the analysis that is provided is useless for answering the research question.

In effect, regardless of all the other issues raised, the key problem is that the research design is simply inadequate to test the research question. But it seems the referees and editor did not notice this shortcoming.

Alleged implications of the research

After presenting his results Awang draws various implications, and makes a number of claims about what had been found in the study:

"After the students went through the treatment session by using the Tazkiayatun Nafsmodule to treat hallucinations, it showed a positive effect on the student respondents. All this was certified by the expert, the student's parents as well as the counselor's teacher." (p.48)
"Based on these findings, shows that hallucinations are very disturbing to humans and the appropriate method for now to solve this problem is to use the Tazkiyatun Nafs Module." (p.48)
"…the use of the Tazkiyatun Nafs module while the respondent is suffering from hallucination disorder is very appropriate…is very helpful to the respondents in restoring their minds and psyche to be calmer and healthier. These changes allow students to focus on their studies as well as allow them to improve their academic performance better." (p.48)
"The use of the Tazkiyatun Nafs Module in this study has led to very positive changes there are attitudes and traits of students who face hallucinations before. All the negative traits like irritability, loneliness, depression,etc. can be overcome completely." (p.49)
"The personality development of students is getting better and perfect with the implementation of the Tazkiaytun Nafs module in their lives." (p.49)
"Results indicate that students who suffer from this hallucination disorder are in a state of high depression, inactivity, fatigue, weakness and pain,and insufficient sleep." (p.49)
"According to the findings of this study, the history of this hallucination disorder started in primary school and when a person is in adolescence, then this disorder becomes stronger and can cause various diseases and have various effects on a person who is disturbed." (p.50)

Given the range of interview data that Awang claims to have collected and analysed, at least some of the claims here are possibly supported by the data. However, none of this data and analysis is available to the reader. ² These claims are not supported by any evidence presented in the paper. Yet peer reviewers and the editor who read the manuscript seem to feel it is entirely acceptable to publish such claims in a research paper, and not present any evidence whatsoever.

Summing up

In summary: as far as these four students were concerned (but perhaps not the fifth participant?), there did seem to be a relationship between periods of experiencing hallucinations and lower school performance (perhaps explained by such factors as "absenteeism to school during the day the test was conducted", p.46):

"the performance shown by students who face chronic hallucinations is also declining and declining. This is all due to the actions of students leaving the teacher's learning and teaching sessions as well as not attending school when this hallucinatory disorder strikes. This illness or disorder comes to the student suddenly and periodically. Each time this hallucination disease strikes the student causes the student to have to take school holidays for a few days due to pain or depression"
Awang, 2022, p.42

However,

these four students do not represent any wider population;
there is no information about the specific nature, frequency, intensity, etcetera, of the hallucinations or diagnoses in these individuals;
there was no statistical test of significance of changes; and
there was no control condition to see if performance dips were experienced by others not experiencing hallucinations at the same time.

Once they had recovered from the hallucinations (and it is not clear on what basis that judgement was made) their scores improved.

The author would like us to believe that the relief from the hallucinations was due to the intervention, but this seems to be (quite literally) an act of faith ³ as no actual research evidence is offered to show that the soul purification module actually had any effect. It is of course possible the module did have an effect (whether for the conjectured or other reasons – such as simply offering troubled children some extra study time in a calm and safe environment and special attention – or because of an expectancy effect if the students were told by trusted authority figures that the intervention would lead to the purification of their hearts and the healing of their hallucinatory disorder) but the study, as reported, offers no strong grounds to assume it did have such an effect.

An irresponsible journal

As hallucinations are often symptoms of organic disease affecting blood supply to the brain, there is a major question of whether treating the condition by religious instruction is ethically sound. For example, hallucinations may indicate a tumour growing in the brain. Yet, if the module was only a complement to proper medical attention, a reader may prefer to suspect that any improvement in the condition (and consequent increased engagement in academic work) may have been entirely unrelated to the module being evaluated.

Indeed, a published research study that claims that soul purification is a suitable treatment for medical conditions presenting with hallucinations is potentially dangerous as it could lead to serious organic disease going untreated. If Awang's recommendations were widely taken up in Malaysia such that students with serious organic conditions were only treated for their hallucinations by soul purification rather than with medication or by surgery it would likely lead to preventable deaths. For a research journal to publish a paper with such a conclusion, where any qualified reviewer or editor could easily see the conclusion is not warranted, is irresponsible.

As the journal website points out,

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general."
https://www.ej-edu.org/index.php/ejedu/about

So, why did the European Journal of Education and Pedagogy not subject this submission to meaningful review to help the author of this study meet the standards of the discipline, and of science in general?

Work cited:

Awang, S. B. (2022). Hallucination Disorders: The Effects of Using the Tazkiyatun Nafs Module on the Academic Achievement of Students with Hallucinations. European Journal of Education and Pedagogy, 3(4), 41-51.
Taber, K. S. (2013). Non-random thoughts about research. Chemistry Education Research and Practice, 14(4), 359-362. doi:10.1039/c3rp90009f
Taber, K. S. (2014a). Methodological issues in science education research: a perspective from the philosophy of science. In M. R. Matthews (Ed.), International Handbook of Research in History, Philosophy and Science Teaching (Vol. 3, pp. 1839-1893). Springer Netherlands.
Taber, K. S. (2014b). Ethical considerations of chemistry education research involving "human subjects". Chemistry Education Research and Practice, 15(2), 109-113.
Taber, K. S. (2018). The Use of Cronbach's Alpha When Developing and Reporting Research Instruments in Science Education. Research in Science Education, 48, 1273-1296. doi:10.1007/s11165-016-9602-2

Notes:

¹ In mature fields in the natural sciences there are recognised traditions ('paradigms', 'disciplinary matrices') in any active field at any time. In general (and of course, there will be exceptions):

at any historical time, there is a common theoretical perspective underpinning work in a research programme, aligned with specific ontological and epistemological commitments;
at any historical time, there is a strong alignment between the active theories in a research programme and the acceptable instrumentation, methodology and analytical conventions.

Put more succinctly, in a mature research field, there is generally broad agreement on how a phenomenon is to be understood, how to go about investigating it, and how to interpret data as research evidence.

This is generally not the case in educational research, which is, at least in part, due to the complexity and so multi-layered nature of the phenomena studied (Taber, 2014a), such as classroom teaching. So, in reviewing educational papers, it is sometimes necessary to find different experts to look at the theoretical and the methodological aspects of the same submission.

² The paper is very strange in that the introductory sections and the conclusions and implications sections have a very broad scope, but the actual research results are restricted to a very limited focus: analysis of school test scores and grades.

It is as if (and it could well be that) a dissertation with a number of evidential strands has been reduced to a paper drawing upon only one aspect of the research evidence, but with material from other sections of the dissertation left unchanged from the original broader study.

³ Readers are told that

"All these acts depend on the sincerity of the medical researcher or fortune-teller seeking the help of Allah S.W.T to ensure that these methods and means are successful. All success is obtained by the permission of Allah alone"
Awang, 2022, p.43

A case study of educational innovation?

Design and Assessment of an Online Prelab Model in General Chemistry

Keith S. Taber

Case study is meant to be naturalistic – whereas innovation sounds like an intervention. But interventions can be the focus of naturalistic enquiry.

One of the downsides of having spent years teaching research methods is that one cannot help but notice how so much published research departs from the ideal models one offers to students. (Which might be seen as a polite way of saying authors often seem to get key things wrong.) I used to teach that how one labelled one's research was less important than how well one explained it. That is, different people would have somewhat different takes on what is, or is not, grounded theory, case study or action research, but as long as an author explained what they had done, and could adequately justify why, the choice of label for the methodology was of secondary importance.

A science teacher can appreciate this: a student who tells the teacher they are doing a distillation when they are actually carrying out reflux, but who clearly explains what they are doing and why, will still be understood (even if the error should be pointed out). On the other hand, if a student uses the right label but holds an alternative conception, this is likely to be a more problematic 'bug' in the teaching-learning system. ¹

That said, each type of research strategy has its own particular weaknesses and strengths, so describing something as an experiment, or a case study, if it did not actually share the essential characteristics of that strategy, can mislead the reader – and sometimes even mislead the authors such that invalid conclusions are drawn.

A 'case study', that really is a case study

I made reference above to action research, grounded theory, and case study – three methodologies which are commonly name-checked in education research. There are a vast number of papers in the literature with one of these terms in the title, and a good many of them do not report work that clearly fits the claimed approach! ²

The case study was published in the *Journal for the Research Center for Educational Technology*

So, I was pleased to read an interesting example of a 'case study' that I felt really was a case study (Llorens-Molina, 2009). 'Design and assessment of an online prelab model in general chemistry: A case study' offered a good example of the methodology, although I suspect some other authors might have been tempted to describe this research differently.

Is it a bird, is it a plane; no it's…

Llorens-Molina's study included an experimental aspect. A cohort of learners was divided into two groups to allow the researcher to compare two different educational treatments; then, measurements were made to compare outcomes quantitatively. That might sound like an experiment. Moreover, this study reported an attempt to innovate in a teaching situation, which gives the work a flavour of action research. Despite this, I agree with Llorens-Molina that the work is best characterised as a case study.

A case study focuses on 'one instance' from among many

What is a case study?

A case study is an in-depth examination of one instance: one example – of something for which there are many examples. The focus of a case study might be one learner, one teacher, one group of students working together on a task, one class, one school, one course, one examination paper, one textbook, one laboratory session, one lesson, one enrichment programme… So, there is great variety in what kind of entity a case study is a study of, but what case studies have in common is that each focuses in detail on that one instance.

Characteristics of case study

Case studies are naturalistic studies, which means they are studies of things as they are, not attempts to change things. The case has to be bounded (a reader of a case study learns what is in the case and what is not) but tends to be embedded in a wider context that impacts upon it. That is, the case is entangled in a context from which it could not easily be extracted and still be the same case. (Imagine moving a teacher with her class from their school to have their lesson in a university where it could be observed by researchers – it would not be 'the same lesson' as would have occurred in situ).

The case study is reported in detail, often in a narrative form (not just statistical summaries) – what is sometimes called 'thick description'. Usually several 'slices' of data are collected – often different kinds of data – and often there is a process of 'triangulation' to check the consistency of the account presented in relation to the different slices of data available. Although case studies can include analysis of quantitative data, they are usually seen as interpretive, as the richness of the data available usually reflects complexity and invites nuance.

Design and Assessment of an Online Prelab Model in General Chemistry

Llorens-Molina's study explored the use of prelabs that are "used to introduce and contextualize laboratory work in learning chemistry" (p.15), and in particular "an alternative prelab model, which consists of an audiovisual tutorial associated with an online test" (p.15).

An innovation

The research investigated an innovation in teaching practice:

"In our habitual practice, a previous lecture at the beginning of each laboratory session, focused almost exclusively on the operational issues, was used. From our teaching experience, we can state that this sort of introductory activity contributes to a "cookbook" way to carry out the laboratory tasks. Furthermore, the lecture takes up valuable time (about half an hour) of each ordinary two-hour session. Given this set-up, the main goal of this research was to design and assess an alternative prelab model, which was designed to enhance the abilities and skills related to an inquiry-type learning environment. Likewise, it would have to allow us to save a significant amount of time in laboratory sessions due to its online nature….
a prelab activity developed …consists of two parts…a digital video recording about a brief tutorial lecture, supported by a slide presentation…[followed by ] an online multiple choice test"
Llorens-Molina, 2009, pp.16-17

Not action research?

The reference to shifting "our habitual practice" indicates this study reports practitioner research. Practitioner studies, such as this, that test a new innovation are often labelled by authors as 'action research'. (Indeed, sometimes the fact that research is carried out by practitioners looking to improve their own practice is seen as sufficient for action research, when actually this is a necessary, but not a sufficient, condition.)

Genuine action research aims at improving practice, not simply seeing if a specific innovation is working. This means action research has an open-ended design, and is cyclical – with iterations of an innovation tested and the outcomes used as feedback to inform changes in the innovation. (Despite this, a surprising number of published studies labelled as action research lack any cyclic element, simply reporting one iteration of an innovation.) Llorens-Molina's study does not have a cyclic design, so would not be well-characterised as action research.

An experimental design?

Llorens-Molina reports that the study was motivated by three hypotheses (p.16):

"Substituting an initial lecture by an online prelab to save time during laboratory sessions will not have negative repercussions in final examination marks.
The suggested online prelab model will improve student autonomy and prerequisite knowledge levels during laboratory work. This can be checked by analyzing the types and quantity of SGQ [student generated questions].
Student self-perceptions about prelab activities will be more favourable than those of usual lecture methods."

To test these hypotheses, the student cohort was divided into two groups, one experiencing the customary approach and the other the innovative approach. This seems very much like an experiment.

It may be useful here to distinguish between two levels of research design: methodology (akin to strategy) and techniques (akin to tactics). In research design, a methodology is chosen to meet the overall aims of the study, and then one or more research techniques are selected consistent with that methodology (Taber, 2013). Experimental techniques may be included in a range of methodologies, but experiment as an overall methodology has some specific features.

In a true experiment there is random assignment to conditions, and often there is an intention to generalise results to a wider population considered to be sampled in the study. Llorens-Molina reports that although inferential statistics were used to test the hypotheses, there was no intention to offer statistical generalisation beyond the case. The cohort of students was not assumed to be a sample representing some wider population (such as, say, undergraduates on chemistry courses in Spain) – and, indeed, clearly such an assumption would not have been justified.

Case study is naturalistic – but an innovation is an intervention in practice…

Case study is said to be naturalistic research – it is a method used to understand and explore things as they are, not to bring about change. Yet, here the focus is an innovation. That seems a contradiction. It would be a contradiction if the study was being carried out by external researchers who had asked the teaching team to change practice for the benefit of their study. However, here it is useful to separate out the two roles of teacher and researcher.

This is a situation that I commonly faced when advising graduates preparing for school teaching who were required to carry out a classroom-based study into an aspect of their school placement practice context as part of their university qualification (the Post-Graduate Certificate in Education, P.G.C.E.). Many of these graduates were unfamiliar with research into social phenomena. Science graduates often brought a model of what worked in the laboratory to their thinking about their projects – and had a tendency to think that transferring the experimental approach to classrooms (where there are usually a large number of potentially relevant variables, many of which cannot be controlled) would be straightforward.

The Cambridge P.G.C.E. teaching team put into place a range of supports to introduce graduates preparing for teaching to the kinds of education research useful for teachers who want to evaluate and improve their own teaching. This included a book written to introduce classroom-based research that drew heavily on analysis of published studies (Taber, 2007; 2013). Part of our advice was that those new to this kind of enquiry might want to consider action research and case study as suitable options for their small-scale projects.

Useful strategies for the novice practitioner-researcher (Figure: diagram used in working with graduates preparing for teaching, from Taber, 2010)

Simplistically, action research might be considered best suited to a project to test an innovation or address a problem (e.g., evaluating a new teaching resource; responding to behavioural issues), and case study best suited to an exploratory study (e.g., what do Y9 students understand about photosynthesis?; what is the nature of peer dialogue during laboratory working in this class?). However, it was often difficult for the graduates to carry out authentic action research as the constraints of the school-based placements seldom allowed them to test successive iterations of the same intervention until they found something like an optimal specification.

Yet, they often were in a good position to undertake a detailed study of one iteration, collecting a range of different data, and so producing a detailed evaluation. That sounds like a case study.

Case study is supposed to be naturalistic – whereas innovation sounds like an intervention. But some interventions in practice can be considered the focus of naturalistic enquiry. My argument was that when a teacher changes the way they do something to try and solve a problem, or simply to find a better way to work, that is a 'natural' part of professional practice. The teacher-researcher, as researcher, is exploring something the fully professional teacher does as a matter of course – seek to develop practice. After all, our graduates were being asked to undertake research to give them the skills expected to meet professional teaching standards, which

"clearly requires the teacher to have both the procedural knowledge to undertake small-scale classroom enquiry, and 'conceptual frameworks' for thinking about teaching and learning that can provide the basis for evaluating their teaching. In other words, the professional teacher needs both the ability to do her own research and knowledge of what existing research suggests"
Taber, 2013, p.8

So, the research is on something that is naturally occurring in the classroom context, rather than an intervention imported into the context in order to answer an external researcher's questions. A case study of an intervention introduced by practitioners themselves can be naturalistic – even if the person implementing the change is the researcher as well as the teacher.

If a teacher-researcher (qua researcher) wishes to enquire into an innovation introduced by the teacher-researcher (qua teacher) then this can be considered as naturalistic enquiry

The case and the context

In Llorens-Molina's study, the case was a sequence of laboratory activities carried out by a cohort of undergraduates undertaking a course of General and Organic Chemistry as part of an Agricultural Engineering programme. So, the case was bounded (the laboratory part of one taught course) and embedded in a wider context – a degree programme in a specific institution in Spain: the Polytechnic University of Valencia.

The primary purpose of the study was to find out about the specific innovation in the particular course that provided the case. This was then what is known as an intrinsic case study. (When a case is studied primarily as an example of a class of cases, rather than primarily for its own interest, it is called an instrumental case study).

Llorens-Molina recognised that what was found in this specific case, in its particular context, could not be assumed to apply more widely. There can be no statistical generalisation to other courses elsewhere. In case study, the intention is to offer sufficient detail of the case for readers to make judgements of its likely relevance to other contexts of interest (so-called 'reader generalisation').

The published report gives a good deal of information about the course as well as much information about how data was collected, and equally important, analysed.

Different slices of data

Case study often uses a range of data sources to develop a rounded picture of the case. In this study the identification of three specific hypotheses (less usual in case studies, which often have more open-ended research questions) led to the collection of three different types of data.

Students were assessed on each of six laboratory activities. A comparison was made between the prelab condition and the existing approach.
Questions asked by students in the laboratories were recorded and analysed to see if the quality/nature of such questions was different in the two conditions. A sophisticated approach was developed to analyse the questions.
Students were asked to rate the prelabs through responding to items on a questionnaire.

This approach allowed the author to go beyond simply reporting whether hypotheses were supported by the analysis, to offer a more nuanced discussion around each feature. Such nuance is not only more informative to the reader of a case study, but reflects how the researcher, as practitioner, has an ongoing commitment to further develop practice and not see the study as an end in itself.

Avoiding the 'equivalence' and the 'misuse of control groups' problems

I particularly appreciate a feature of the research design that many educational studies that claim to be experiments could benefit from. To test his hypotheses Llorens-Molina employed two conditions or treatments, the innovation and a comparison condition, and divided the cohort: "A group with 21 students was split into two subgroups, with 10 and 11 in each one, respectively". Llorens-Molina does not suggest this was based on random assignment, which is necessary for a 'true' experiment.

In many such quasi-experiments (where randomisation to condition is not carried out, and is indeed often not possible) the researchers seek to offer evidence of equivalence before the treatments occur. After all, if the two subgroups are different in terms of past subject attainment or motivation or some other relevant factor (or, indeed, if there is no information to allow a judgement regarding whether this is the case or not), no inferences about an intervention can be drawn from any measured differences. (Although that does not always stop researchers from making such claims regardless: e.g., see Lack of control in educational research.)

Another problem is that if learners are participating in research but are assigned to a control or comparison condition then it could be asked if they are just being used as 'data fodder', and would that be fair to them? This is especially so in those cases (so, not this one) where researchers require that the comparison condition is educationally deficient – many published studies report a control condition where school students have effectively been lectured to, and no discussion work, group work, practical work, digital resources, et cetera, have been allowed, in order to ensure a stark contrast with whatever supposedly innovative pedagogy or resource is being evaluated (Taber, 2019).

These issues are addressed in research designs which have a compensatory structure – in effect the groups switch between being the experimental and comparison condition – as here:

"Both groups carried out the alternative prelab and the previous lecture (traditional practice), alternately. In this way, each subgroup carried out the same number of laboratory activities with either a prelab and previous lecture"
Llorens-Molina, 2009, p.19

This is good practice both from methodological and ethical considerations.

The study used a compensatory design which avoids the need to ensure both groups are equivalent at the start, and does not disadvantage one group. (Figure from Llorens-Molina, 2009, p.22 – published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States license allowing redistribution with attribution)
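To make the compensatory structure concrete, here is a minimal sketch of an alternating schedule. It assumes six laboratory activities (as in the study) and that the two subgroups simply swap conditions from one activity to the next; the actual ordering in Llorens-Molina's figure may differ, and the labels 'A' and 'B' and the function names are hypothetical, introduced only for illustration.

```python
from collections import Counter

# The two treatments: the online prelab (innovation) and the previous lecture (traditional practice).
CONDITIONS = ("prelab", "lecture")


def compensatory_schedule(n_activities=6, groups=("A", "B")):
    """Hypothetical alternating schedule: at each activity the two subgroups
    receive opposite conditions, and both switch at the next activity."""
    schedule = []
    for i in range(n_activities):
        first = CONDITIONS[i % 2]          # group A: prelab, lecture, prelab, ...
        second = CONDITIONS[(i + 1) % 2]   # group B: always the other condition
        schedule.append({groups[0]: first, groups[1]: second})
    return schedule


def condition_counts(schedule):
    """Count how many activities each subgroup experienced under each condition."""
    counts = {}
    for activity in schedule:
        for group, condition in activity.items():
            counts.setdefault(group, Counter())[condition] += 1
    return counts


schedule = compensatory_schedule()
for number, activity in enumerate(schedule, start=1):
    print(number, activity)
print(condition_counts(schedule))
```

Under this assumption every activity is run under both conditions and each subgroup experiences each condition three times, which is what allows comparisons without first establishing that the subgroups were equivalent, and without leaving either subgroup permanently in the comparison condition.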

A case of case study

Do I think this is a model case study that perfectly exemplifies all the claimed characteristics of the methodology? No, and very few studies do. Real research projects, often undertaken in complex contexts with limited resources and intractable constraints, seldom fit such ideal models.

However, unlike some studies labelled as case studies, this study has an explicit bounded case and has been carried out in the spirit of case study that highlights and values the intrinsic worth of individual cases. There is a good deal of detail about aspects of the case. It is in essence a case study, and (unlike what sometimes seems to be the case [sic]) not just called a case study for want of a methodological label. Most educational research studies examine one particular case of something – but (and I do not think this is always appreciated) that does not automatically make them case studies. Because it has been both conceptualised and operationalised as a case study, Llorens-Molina's study is a coherent piece of research.

Given how, in these pages, I have often been motivated to call out studies I have read that I consider have major problems – major enough to undermine the argument for the claimed conclusions of the research – I wanted to recognise a piece of research that I felt offered much to admire.

Work cited:

Llorens-Molina, J.-A. (2009). Design and assessment of an online prelab model in general chemistry: A case study. Journal of the Research Center for Educational Technology, 4(2), 15-31. (Available at https://rcetj.org/index.php/rcetj/article/view/22 )
Taber, K. S. (2007). Classroom-based Research and Evidence-based Practice: A Guide for Teachers. London: Sage.
Taber, K. S. (2010). Preparing teachers for a research-based profession. In M. V. Zuljan & J. Vogrinc (Eds.), Facilitating effective student learning through teacher research and innovation (pp. 19-47). Ljubljana: Faculty of Education, University of Ljubljana.
Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.
Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058

Notes:

¹ I am using language here reflecting a perspective on teaching as being based on a model (whether explicit or not) in the teacher's mind of the learners' current knowledge and understanding and how this will respond to teaching. That expects a great deal of the teacher, so there are often bugs in the system (e.g., the teacher over-estimates prior knowledge) that need to be addressed. This is why being a teacher involves being something of a 'learning doctor'.

Read about the learning doctor perspective on teaching

² I used to teach sessions introducing each of these methodologies when I taught on an Educational Research course. One of the class activities was to examine published papers claiming the focal methodology, asking students to see if studies matched the supposed characteristics of the strategy. This was a course with students undertaking a very diverse range of research projects, and I encouraged them to apply the analysis to papers selected because they were of particular interest and relevance to their own work. Many examples selected by students proved to offer a poor match between the claimed methodology and the actual research design of the study!

POEsing assessment questions…

…but not fattening the cow

Keith S. Taber

A well-known Palestinian proverb reminds us that we do not fatten the cow simply by repeatedly weighing it. But, sadly, teachers and others working in education commonly get so fixated on assessment that it seems to become an end in itself.

Images by Clker-Free-Vector-Images from Pixabay, OpenClipart-Vectors and Deedster from Pixabay

A research study using P-O-E

I was reading a report of a study that adopted the predict-observe-explain, P-O-E, technique as a means to elicit "high school students' conceptions about acids and bases" (Kala, Yaman & Ayas, 2013, p.555). As the name suggests, P-O-E asks learners to make a prediction before observing some phenomenon, and then to explain their observations (something that can be especially valuable when the predictions are based on strongly held intuitions which are contrary to what actually happens).

Read about Predict-Observe-Explain

Kala and colleagues begin the introduction to their paper by stating that

"In any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known"
Kala, Yaman & Ayas, 2013, p.555

Constructivism?

Constructivism is a perspective on learning that is informed by research into how people learn and a great many studies into student thinking and learning in science. A key point is how a learner's current knowledge and understanding influences how they make sense of teaching and what they go on to learn. Research shows it is very common for students to have 'alternative conceptions' of science topics, and often these conceptions either survive teaching or distort how it is understood.

In practice, this means that teachers who teach the science without regard to student thinking will often find that students retain their alternative ways of thinking, so constructivist teaching is teaching that takes into account and responds to the ideas about science topics that students bring to class.

Read about constructivism

Read about constructivist pedagogy

Assessment: summative, formative and diagnostic

If teachers are to take into account, engage with, and try to reshape, learners' ideas about science topics, then they need to know what those ideas are. Now there is a vast literature reporting alternative conceptions in a wide range of science topics, spread across thousands of research reports – but no teacher could possibly find time to study them all. There are books which discuss many examples and highlight some of the most common alternative conceptions (including one of my own, Taber, 2014).

A book that describes and discusses a good many alternative conceptions – but by no means all!

However, in any class studying some particular topic there will nearly always be a spread of different alternative conceptions across the students – including some so idiosyncratic that they have never been reported in any literature. So, although reading about common misconceptions is certainly useful to prime teachers for what to look out for, teachers need to undertake diagnostic assessment to find out about the thinking of their own particular students.

There are many resources available to support teachers in diagnostic assessment, and some activities (such as using concept cartoons) that are especially useful at revealing student thinking.

Read about diagnostic assessment

Diagnostic assessment, assessment to inform teaching, is carried out at the start of a topic, before the teaching, to allow teachers to judge the learners' starting points and any alternative conceptions ('misconceptions') they may have. It can therefore be considered aligned to formative assessment ('assessment for learning') which is carried out as part of the learning process, rather than summative assessment (assessment of learning) which is used after studying to check, score, grade and certify learning.

P-O-E as a learning activity…

P-O-E can best support learning in topics where it is known learners tend to have strongly held, but unhelpful, intuitions. The predict stage elicits students' expectations – which, when contrary to the scientific account, can be confounded by the observe step. The 'cognitive conflict' generated by seeing something unexpected (made more salient by having been asked to make a formal prediction) is thought to help students concentrate on the actual phenomenon, and to provide 'epistemic relevance' (Taber, 2015).

Epistemic relevance refers to the idea that students are learning about things they are actually curious about, whereas for many students following a conventional science course must be experienced as being presented with the answers to a seemingly never-ending series of questions that had never occurred to them in the first place.

Read about the Predict-Observe-Explain technique

Students are asked to provide an explanation for what they have observed, which requires deeper engagement than just recording an observation. Developing explanations is a core scientific practice (and one which is needed before another core scientific practice – testing explanations – is possible).

Read about teaching about scientific explanations

To be most effective, P-O-E is carried out in small groups, as this encourages the sharing, challenging and justifying of ideas: the kind of dialogic activity thought to be powerful in supporting learners in developing their thinking, as well as practising their skills in scientific argumentation. As part of dialogic teaching such an open forum for learners' ideas is not an end in itself, but a preparatory stage for the teacher to marshal the different contributions and develop a convincing argument for how the best account of the phenomenon is the scientific account reflected in the curriculum.

Constructivist teaching is informed by learners' ideas, and therefore relies on their elicitation, but that elicitation is never an end in itself: it is a precursor to a customised presentation of the canonical account.

Read about dialogic teaching and learning

…and as a diagnostic activity

Group work also has another function – if the activity is intended to support diagnostic assessment, then the teacher can move around the room listening in to the various discussions and so collecting valuable information on what students think and understand. When assessment is intended to inform teaching it does not need to be about students completing tests and teachers marking them – a key principle of formative assessment is that it occurs as a natural part of the teaching process. It can be based on productive learning activities, and does not need marks or grades – indeed as the point is to help students move on in their thinking, any kind of formal grading whilst learning is in progress would be inappropriate as well as a misuse of teacher time.

Probing students' understandings about acid-base chemistry

The constructivist model of learning applies to us all: students, teachers, professors, researchers. Given what I have written above about P-O-E, about diagnostic assessment, and dialogic approaches to learning, I approached Kala and colleagues' paper with expectations about how they would have carried out their project.

These authors do report that they were able to diagnose aspects of student thinking about acids and bases, and found some learning difficulties and alternative conceptions,

"it was observed that eight of the 27 students had the idea that the "pH of strong acids is the lowest every time," while two of the 27 students had the idea that "strong acids have a high pH." Furthermore, four of the 27 students wrote the idea that the "substance is strong to the extent to which it is burning," while one of the 27 students mentioned the idea that "different acids which have equal concentration have equal pH."
Kala, Yaman & Ayas, 2013, pp.562-3

The key feature seems to be that, as reported in previous research, students conflate acid concentration and acid strength (when it is possible to have a high concentration solution of a weak acid or a very dilute solution of a strong acid).
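A short worked calculation shows why 'strong' and 'lowest pH' come apart. This sketch is my own illustration, not taken from Kala and colleagues: it assumes a strong monoprotic acid dissociates completely, takes ethanoic (acetic) acid as the weak acid with an approximate textbook value of K_a ≈ 1.8 × 10⁻⁵, and ignores the autoionisation of water.

```python
import math


def ph_strong_monoprotic(conc):
    """pH of a fully dissociated monoprotic acid: [H+] equals the acid concentration."""
    return -math.log10(conc)


def ph_weak_monoprotic(conc, ka):
    """pH of a weak monoprotic acid HA, from Ka = x**2 / (conc - x), where x = [H+].

    Rearranged to x**2 + ka*x - ka*conc = 0 and solved for the positive root.
    Water autoionisation is ignored (reasonable while [H+] >> 1e-7 mol/dm3).
    """
    x = (-ka + math.sqrt(ka * ka + 4 * ka * conc)) / 2
    return -math.log10(x)


KA_ETHANOIC = 1.8e-5  # approximate value, assumed for illustration

dilute_strong = ph_strong_monoprotic(0.001)                # 0.001 mol/dm3 hydrochloric acid
concentrated_weak = ph_weak_monoprotic(1.0, KA_ETHANOIC)   # 1.0 mol/dm3 ethanoic acid

print(f"0.001 mol/dm3 strong acid: pH {dilute_strong:.2f}")
print(f"1.0 mol/dm3 weak acid:     pH {concentrated_weak:.2f}")
```

On these assumptions the dilute strong acid has a pH of 3.00, while the concentrated weak acid comes out at about 2.37, so the weak acid solution has the lower pH, contrary to the idea that the "pH of strong acids is the lowest every time".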

Yet some aspects of this study seemed out of alignment with the use of P-O-E.

The best research style?

One feature was the adoption of a positivistic approach to the analysis,

"Although there has been no reported analyzing procedure for the POE, in this study, a different [sic] analyzing approach was offered taking into account students' level of understanding… Data gathered from the written responses to the POE tasks were analyzed and divided into six groups. In this context, while students' prediction were divided into two categories as being correct or wrong, reasons for predictions were divided into three categories as being correct, partially correct, or wrong."
Kala, Yaman & Ayas, 2013, p.560

Group	Prediction	Reasons
❀	correct	correct
❁	correct	partially correct
✯	correct	wrong
✿	wrong	correct
❂	wrong	partially correct
✤	wrong	wrong

"the written responses to the POE tasks were analyzed and divided into six groups"
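The coding scheme in the quotation is simply the cross product of two prediction categories and three reason categories. A minimal sketch, with group numbers 1 to 6 following the row order of the table above; the numbering and the function name are my own, not the authors'.

```python
from itertools import product

PREDICTION = ("correct", "wrong")
REASONS = ("correct", "partially correct", "wrong")

# Enumerate the 2 x 3 = 6 cells in the same order as the table rows.
GROUPS = {cell: n for n, cell in enumerate(product(PREDICTION, REASONS), start=1)}


def poe_group(prediction, reasons):
    """Return the (hypothetical) group number, 1-6, for a coded P-O-E response."""
    return GROUPS[(prediction, reasons)]


print(len(GROUPS))                              # number of groups
print(poe_group("wrong", "partially correct"))  # fifth row of the table
```

Written out like this, it is clear that each response is reduced to one of six cells defined only by its degree of match with the canonical answer.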

There is nothing inherently wrong with doing this, but it aligns the research with an approach that seems at odds with the thinking behind constructivist studies that are intended to interpret a learner's thinking in its own terms, rather than simply compare it with some standard. (I have explored this issue in some detail in a comparison of two research studies into students' conceptions of forces – see Taber, 2013, pp.58-66.)

In terms of research methodology we might say it seems to be conceptualised within the 'wrong' paradigm for this kind of work. It seems positivist (assuming data can be unambiguously fitted into clear categories), nomothetic (tied to 'norms' and canonical answers) and confirmatory (testing thinking as matching model responses or not), rather than interpretivist (seeking to understand student thinking in its own terms rather than just classifying it as right or wrong), idiographic (acknowledging that every learner's thinking is to some extent unique to them) and discovery (exploring nuances and sophistication, rather than simply deciding if something is acceptable or not).

Read about paradigms in educational research

The approach used seemed more suitable for investigating something in the science laboratory, than the complex, interactive, contextualised, and ongoing life of classroom teaching. Kala and colleagues describe their methodology as case study,

"The present study used a case study because it enables the giving of permission to make a searching investigation of an event, a fact, a situation, and an individual or a group…"
Kala, Yaman & Ayas, 2013, p.558

A case study?

Case study is a naturalistic methodology (rather than involving an intervention, such as an experiment), and is idiographic, reflecting the value of studying the individual case. The case is one from among many instances of its kind (one lesson, one school, one examination paper, etc.), and is considered as a somewhat self-contained entity yet one that is embedded in a context in which it is to some extent entangled (for example, what happens in a particular lesson is inevitably somewhat influenced by

the earlier sequence of lessons that teacher taught that class {the history of that teacher with that class},
the lessons the teacher and students came from immediately before this focal lesson,
the school in which it takes place,
the curriculum set out to be followed…)

Although a lesson can be understood as a bounded case (taking place in a particular room over a particular period of time involving a specified group of people) it cannot be isolated from the embedding context.

Read about case study methodology

Case study – study of one instance from among many

As case study is idiographic, and does not attempt to offer direct generalisation to other situations beyond that case, a case study should be reported with 'thick description' so a reader has a good mental image of the case (and can think about what makes it special – and so what makes it similar to, or different from, other instances the reader may be interested in). But that is lacking in Kala and colleagues' study, as they only tell readers,

"The sample in the present study consisted of 27 high school students who were enrolled in the science and mathematics track in an Anatolian high school in Trabzon, Turkey. The selected sample first studied the acid and base subject in the middle school (grades 6 – 8) in the eighth year. Later, the acid and base topic was studied in high school. The present study was implemented, based on the sample that completed the normal instruction on the acid and base topic."
Kala, Yaman & Ayas, 2013, pp.558-559

The reference to a sample can be understood as something of a 'reveal' of their natural sympathies – 'sample' is the language of positivist studies that assume a suitably chosen sample reflects a wider population of interest. In case study, a single case is selected and described rather than a population sampled. A reader is left to guess what population is being sampled here, and indeed precisely what the 'case' is.

Clearly, Kala and colleagues elicited some useful information that could inform teaching, but I sensed that their approach would not have made optimal use of a learning activity (P-O-E) that can give insight into the richness, and, sometimes, subtlety of different students' ideas.

Individual work

Even more surprising was the researchers' choice to ask students to work individually without group discussion.

"The treatment was carried out individually with the sample by using worksheets."
Kala, Yaman & Ayas, 2013, p.559

This is a choice which would surely have compromised the potential of the teaching approach to allow learners to explore, and reveal, their thinking?

I wondered why the researchers had made this choice. As they were undertaking research, perhaps they thought it was a better way to collect data that they could readily analyse – but that seems to be choosing limited data that can be easily characterised over the richer data that engagement in dialogue would surely reveal?

Assessment habits

All became clear near the end of the paper when, in the final paragraph, the reader is told,

"In the present study, the data collection instruments were used as an assessment method because the study was done at the end of the instruction/ [sic] on the acid and base topics."
Kala, Yaman & Ayas, 2013, p.571

So, it appears that the P-O-E activity, which is an effective way of generating the kind of rich but complex data that helps a teacher hone their teaching for a particular group, was being adopted, instead, as a means of summative assessment. This is presumably why the analysis focused on the degree of match to the canonical science, rather than engaging in interpreting the different ways of thinking in the class. Again presumably, this is why the highly valuable group aspect of the approach was dropped in favour of individual working – summative assessment needs to not only grade against norms, but do this on the basis of each individual's unaided work.

An activity which offers great potential for formative assessment (as it is a learning activity as well as a way of exploring student thinking); and that offers an authentic reflection of scientific practice (where ideas are presented, challenged, justified, and developed in response to criticism); and that is generally enjoyed by students because it is interactive and the predictions are 'low stakes' making for a fun learning session, was here re-purposed to be a means of assessing individual students once their study of a topic was completed.

Kala and colleagues certainly did identify some learning difficulties and alternative conceptions this way, and this allowed them to evaluate student learning. But I cannot help thinking an opportunity was lost here to explore how P-O-E can be used in a formative assessment mode to inform teaching:

diagnostic assessment as formative assessment can inform more effective teaching
diagnostic assessment as summative assessment only shows where teaching has failed

Yes, I agree that "in any teaching or learning approach enlightened by constructivism, it is important to infer the students' ideas of what is already known", but the point of that is to inform the teaching and so support student learning. What were Kala and colleagues going to do with their inferences about students' ideas when they used the technique as "an assessment method … at the end of the instruction"?

As the Palestinian adage goes, you do not fatten up the cow by weighing it, just as you do not facilitate learning simply by testing students. To mix my farmyard allusions, this seems to be a study of closing the barn door after the horse has already bolted.

Work cited

Kala, N., Yaman, F., & Ayas, A. (2013). The effectiveness of predict-observe-explain technique in probing students' understanding about acid-base chemistry: a case for the concepts of pH, pOH, and strength. International Journal of Science and Mathematics Education, 11(3), 555-574. https://doi.org/10.1007/s10763-012-9354-z
Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.
Taber, K. S. (2014). Student Thinking and Learning in Science: Perspectives on the Nature and Development of Learners' Ideas. New York: Routledge.
Taber, K. S. (2015). Epistemic relevance and learning chemistry in an academic context. In I. Eilks & A. Hofstein (Eds.), Relevant Chemistry Education: From Theory to Practice (pp. 79-100). Sense Publishers.