Delusions of educational impact

A 'peer-reviewed' study claims to improve academic performance by purifying the souls of students suffering from hallucinations


Keith S. Taber


The research design is completely inadequate…the whole paper is confused…the methodology seems incongruous…there is an inconsistency…nowhere is the population of interest actually identified…No explanation of the discrepancy is provided…results of this analysis are not reported…the 'interview' technique used in the study is highly inadequate…There is a conceptual problem here…neither the validity nor reliability can be judged…the statistic could not apply…the result is not reported…approach is completely inappropriate…these tables are not consistent…the evidence is inconclusive…no evidence to demonstrate the assumed mechanism…totally unsupported claims…confusion of recommendations with findings…unwarranted generalisation…the analysis that is provided is useless…the research design is simply inadequate…no control condition…such a conclusion is irresponsible

Some issues missed in peer review for a paper in the European Journal of Education and Pedagogy

An invitation to publish without regard to quality?

I received an email from an open-access journal called the European Journal of Education and Pedagogy, with the subject heading 'Publish Fast and Pay Less', which immediately triggered the thought "another predatory journal?" Predatory journals publish submissions for a fee, but do not offer the editorial and production standards expected of serious research journals. In particular, they publish material which clearly falls short of rigorous research despite usually claiming to engage in peer review.

A peer reviewed journal?

Checking out the website I found the usual assurances that the journal used rigorous peer review as:

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general.

We use a double-blind system for peer-reviewing; both reviewers and authors' identities remain anonymous to each other. The paper will be peer-reviewed by two or three experts; one is an editorial staff and the other two are external reviewers."

https://www.ej-edu.org/index.php/ejedu/about

Peer review is critical to the scientific process. Work is only published in (serious) research journals when it has been scrutinised by experts in the relevant field, and any issues raised have been addressed through revisions sufficient to satisfy the editor.

I could not find who the editor(-in-chief) was, but the 'editorial team' of European Journal of Education and Pedagogy were listed as

  • Bea Tomsic Amon, University of Ljubljana, Slovenia
  • Chunfang Zhou, University of Southern Denmark, Denmark
  • Gabriel Julien, University of Sheffield, UK
  • Intakhab Khan, King Abdulaziz University, Saudi Arabia
  • Mustafa Kayıhan Erbaş, Aksaray University, Turkey
  • Panagiotis J. Stamatis, University of the Aegean, Greece

I decided to look up the editor based in England where I am also based but could not find a web presence for him at the University of Sheffield. Using the ORCID (Open Researcher and Contributor ID) provided on the journal website I found his ORCID biography places him at the University of the West Indies and makes no mention of Sheffield.

If the European Journal of Education and Pedagogy is organised like a serious research journal, then each submission is handled by one of this editorial team. However, the reference to "editorial staff" might well imply that, like some other predatory journals I have been approached by (e.g., Are you still with us, Doctor Wu?), the editorial work is actually carried out by office staff, not qualified experts in the field.

That would certainly help explain the publication, in this 'peer-reviewed research journal', of the first paper that piqued my interest enough to motivate me to access and read the text.


The Effects of Using the Tazkiyatun Nafs Module on the Academic Achievement of Students with Hallucinations

The abstract of the paper published in what claims to be a peer-reviewed research journal

The paper initially attracted my attention because it seemed to be about the treatment of a medical condition, so I wondered what it was doing in an education journal. Yet the paper also seemed to be about an intervention to improve academic performance. As I read the paper, I found a number of flaws and issues (some very obvious, some quite serious) that should have been spotted by any qualified reviewer or editor, and which should have indicated that any possible publication be deferred until these matters were satisfactorily addressed.

This is especially worrying as this paper makes claims relating to the effective treatment of a symptom of potentially serious, even critical, medical conditions through religious education ("a  spiritual  approach", p.50): claims that might encourage sufferers to defer seeking medical diagnosis and treatment. Moreover, these are claims that are not supported by any evidence presented in this paper that the editor of the European Journal of Education and Pedagogy decided was suitable for publication.


An overview of what is demonstrated, and what is claimed, in the study.

Limitations of peer review

Peer review is not a perfect process: it relies on busy human beings spending time on additional (unpaid) work, and it is only effective if suitable experts can be found that fit with, and are prepared to review, a submission. It is also generally more challenging in the social sciences than in the natural sciences. 1

That said, one sometimes finds papers published in predatory journals where one would expect any intelligent person with a basic education to notice problems without needing any specialist knowledge at all. The study I discuss here is a case in point.

Purpose of the study

Under the heading 'research objectives', the reader is told,

"In general, this journal [article?] attempts to review the construction and testing of Tazkiyatun Nafs [a Soul Purification intervention] to overcome the problem of hallucinatory disorders in student learning in secondary schools. The general objective of this study is to identify the symptoms of hallucinations caused by subtle beings such as jinn and devils among students who are the cause of disruption in learning as well as find solutions to these problems.

Meanwhile, the specific objective of this study is to determine the effect of the use of Tazkiyatun Nafs module on the academic achievement of students with hallucinations.

To achieve the aims and objectives of the study, the researcher will get answers to the following research questions [sic]:

Is it possible to determine the effect of the use of the Tazkiyatun Nafs module on the academic achievement of students with hallucinations?"

Awang, 2022, p.42

I think I can save readers a lot of time regarding the research question by suggesting that, in this study, at least, the answer is no – if only because the research design is completely inadequate to answer the research question. (I should point out that the author comes to the opposite conclusion: e.g., "the approach taken in this study using the Tazkiyatun Nafs module is very suitable for overcoming the problem of this hallucinatory disorder", p.49.)

Indeed, the whole paper is confused in terms of what it is setting out to do, what it actually reports, and what might be concluded. As one example, the general objective of identifying "the symptoms of hallucinations caused by subtle beings such as jinn and devils" (but surely, the hallucinations are the symptoms here?) seems to have been forgotten, or, at least, does not seem to be addressed in the paper. 2


The study assumes that hallucinations are caused by subtle beings such as jinn and devils possessing the students.
(Image by Tünde from Pixabay)

Methodology

So, this seems to be an intervention study.

  • Some students suffer from hallucinations.
  • This is detrimental to their education.
  • It is hypothesised that the hallucinations are caused by supernatural spirits ("subtle beings that lead to hallucinations"), so, a soul purification module might counter this detriment;
  • if so, sufferers engaging with the soul purification module should improve their academic performance;
  • and so the effect of the module is being tested in the study.

Thus we have a kind of experimental study?

No, not according to the author. Indeed, the study only reports data from a small number of unrepresentative individuals with no controls,

"The study design is a case study design that is a qualitative study in nature. This study uses a case study design that is a study that will apply treatment to the study subject to determine the effectiveness of the use of the planned modules and study variables measured many times to obtain accurate and original study results. This study was conducted on hallucination disorders [students suffering from hallucination disorders?] to determine the effectiveness of the Tazkiyatun Nafs module in terms of aspects of student academic achievement."

Awang, 2022, p.42

Case study?

So, the author sees this as a case study. Research methodologies are better understood as clusters of similar approaches rather than unitary categories – but case study is generally seen as naturalistic, rather than involving an intervention by an external researcher. So, case study seems incongruous here. Case study involves the detailed exploration of an instance (of something of interest – a lesson, a school, a course of study, a textbook, …) reported with 'thick description'.

Read about the characteristics of case study research

The case is usually a complex phenomenon which is embedded within a context from which it cannot readily be untangled (for example, a lesson always takes place within a wider context of a teacher working over time with a class on a course of study, within a curricular, and institutional, and wider cultural, context, all of which influence the nature of the specific lesson). So, due to the complex and embedded nature of cases, they are all unique.

"a case study is a study that is full of thoroughness and complex to know and understand an issue or case studied…this case study is used to gain a deep understanding of an issue or situation in depth and to understand the situation of the people who experience it"

Awang, 2022, p.42

A case is usually selected either because that case is of special importance to the researcher (an intrinsic case study – e.g., I studied this school because it is the one I was working in) or because we hope this (unique) case can tell us something about similar (but certainly not identical) other (also unique) cases. In the latter case [sic], an instrumental case study, we are always limited by the extent we might expect to be able to generalise beyond the case.

This limited generalisation might suggest we should not work with a single case, but rather look for a suitably representative sample of all cases: but we sometimes choose case study because the complexity of the phenomena suggests we need to use extensive, detailed data collection and analyses to understand the complexity and subtlety of any case. That is (i.e., the compromise we choose is), we decide we will look at one case in depth because that will at least give us insight into the case, whereas a survey of many cases will inevitably be too superficial to offer any useful insights.

So how does Awang select the case for this case study?

"This study is a case study of hallucinatory disorders. Therefore, the technique of purposive sampling (purposive sampling [sic]) is chosen so that the selection of the sample can really give a true picture of the information to be explored ….

Among the important steps in a research study is the identification of populations and samples. The large group in which the sample is selected is termed the population. A sample is a small number of the population identified and made the respondents of the study. A case or sample of n = 1 was once used to define a patient with a disease, an object or concept, a jury decision, a community, or a country, a case study involves the collection of data from only one research participant…"

Awang, 2022, p.42

Of course, a case study of "a community, or a country" – or of a school, or a lesson, or a professional development programme, or a school leadership team, or a homework policy, or an enrichment activity, or … – would almost certainly be inadequate if it was limited to "the collection of data from only one research participant"!

I do not think this study actually is "a case study of hallucinatory disorders [sic]". Leaving aside the shift from singular ("a case study") to plural ("disorders"), the research does not investigate a/some hallucinatory disorders, but the effect of a soul purification module on academic performance. (Actually, spoiler alert 😉, it does not actually investigate the effect of a soul purification module on academic performance either, but the author seems to think it does.)

If this is a case study, there should be the selection of a case, not a sample. Sometimes we do sample within a case in case study, but only from those identified as part of the case. (For example, if the case was a year group in a school, we may not have the resources to interact in depth with several hundred different students.) Perhaps this is pedantry, as the reader likely knows what Awang meant by 'sample' in the paper – but semantics is important in research writing: a sample is chosen to represent a population, whereas the choice of case study is an acknowledgement that generalisation back to a population is not being claimed.

However, if "among the important steps in a research study is the identification of populations" then it is odd that nowhere in the paper is the population of interest actually specified!

Things slip our minds. Perhaps Awang intended to define the population, forgot, and then missed this when checking the text – but, hey, that is just the kind of thing the reviewers and editor are meant to notice! Otherwise this looks very like including material from standard research texts to pay lip-service to the idea that research design needs to be principled, but without really appreciating what the phrases used actually mean. This impression is also given by the descriptions of how data (for example, from interviews) were analysed – but which are not reflected at all in the results section of the paper. (I am not accusing Awang of this, but because of the poor standard of peer review not raising the question, the author is left vulnerable to such an evaluation.)

The only one research participant?

So, what do we know about the "case or sample of n = 1 ", the "only one research participant" in this study?

"The actual respondents in this case study related to hallucinatory disorders were five high school students. The supportive respondents in the case study related to hallucination disorders were five counseling teachers and five parents or guardians of students who were the actual respondents."

Awang, 2022, p.42

It is certainly not impossible that a case could comprise a group of five people – as long as those five make up a naturally bounded group – that is, a group that a reasonable person would recognise as existing as a coherent entity as they clearly had something in common (they were in the same school class, for example; they were attending the same group therapy session, perhaps; they were a friendship group; they were members of the same extended family diagnosed with hallucinatory disorders…something!) There is no indication here of how these five make up a case.

The identification of the participants as a case might have made sense had the participants collectively undertaken the module as a group, but the reader is told: "This study is in the form of a case study. Each practice and activity in the module are done individually" (p.50). Another justification could have been if the module had been offered in one school, and these five participants were the students enrolled in the programme at that time but as "analysis of  the  respondents'  academic  performance  was conducted  after  the  academic  data  of  all  respondents  were obtained  from  the  respective  respondent's  school" (p.45) it seems they did not attend a single school.

The results tables and reports in the text refer to "respondent 1" to "respondent 4". In case study, an approach which recognises the individuality and inherent value of the particular case, we would usually assign assumed names to research participants, not numbers. But if we are going to use numbers, should there not be a respondent 5?

The other one research participant?

It seems that there is something odd here.

Both the passage above, and the abstract refer to five respondents. The results report on four. So what is going on? No explanation of the discrepancy is provided. Perhaps:

  • There only ever were four participants, and the author made a mistake in counting.
  • There only ever were four participants, and the author made a typographical mistake (well, strictly, six typographical mistakes) in drafting the paper, and then missed this in checking the manuscript.
  • There were five respondents and the author forgot to include data on respondent 5 purely by accident.
  • There were five respondents, but the author decided not to report on the fifth deliberately for a reason that is not revealed (perhaps the results did not fit with the desired outcome?)

The significant point is not that there is an inconsistency but that this error was missed by peer reviewers and the editor – if there ever was any genuine peer review. This is the kind of mistake that a school child could spot – so, how is it possible that 'expert reviewers' and 'editorial staff' either did not notice it, or did not think it important enough to query?

Research instruments

Another section of the paper reports the instrumentation used in the study.

"The research instruments for this study were Takziyatun Nafs modules, interview questions, and academic document analysis. All these instruments were prepared by the researcher and tested for validity and reliability before being administered to the selected study sample [sic, case?]."

Awang, 2022, p.42

Of course, it is important to test instruments for validity and reliability (or perhaps authenticity and trustworthiness when collecting qualitative data). But it is also important

  • to tell the reader how you did this
  • to report the outcomes

which seems to be missing (apart from in regard to part of the implemented module – see below). That is, the reader of a research study wants evidence not simply promises. Simply telling readers you did this is a bit like meeting a stranger who tells you that you can trust them because they (i.e., say that they) are honest.

Later the reader is told that

"Semi- structured interview questions will be [sic, not 'were'?] developed and validated for the purpose of identifying the causes and effects of hallucinations among these secondary school students…

…this interview process will be [sic, not 'was'] conducted continuously [sic!] with respondents to get a clear and specific picture of the problem of hallucinations and to find the best solution to overcome this disorder using Islamic medical approaches that have been planned in this study"

Awang, 2022, pp.43-44

At the very least, this seems to confuse the plan for the research with a report of what was done. (But again, apparently, the reviewers and editorial staff did not think this needed addressing.) This is also confusing as it is not clear how this aspect of the study relates to the intervention. Were the interviews carried out before the intervention to help inform the design of the modules? (Presumably not, as they had already been "tested for validity and reliability before being administered to the selected study sample".) Perhaps there are clear and simple answers to such questions – but the reader will not know because the reviewers and editor did not seem to feel they needed to be posed.

If "Interviews are the main research instrument in this study" (p.43), then one would expect to see examples of the interview schedules – but these are not presented. The paper reports a complex process for analysing interview data, but this is not reflected in the findings reported. The readers is told that the six stage process leads to the identifications and refinement of main and sub-categories. Yet, these categories are not reported in the paper. (But, again, peer reviewers and the editor did not apparently raise this as something to be corrected.) More generally "data  analysis  used  thematic  analysis  methods" (p.44), so why is there no analysis presented in terms of themes? The results of this analysis are simply not reported.

The reader is told that

"This  interview  method…aims to determine the respondents' perspectives, as well as look  at  the  respondents'  thoughts  on  their  views  on  the issues studied in this study."

Awang, 2022, p.44

But there is no discussion of participants' perspectives and views in the findings of the study. 2 Did the peer reviewers and editor not think this needed addressing before publication?

Even more significantly, in a qualitative study where interviews are supposedly the main research instrument, one would expect to see extracts from the interviews presented as part of the findings to support and exemplify claims being made: yet, there are none. (Did this not strike the peer reviewers and editor as odd: presumably they are familiar with the norms of qualitative research?)

The only quotation from the qualitative data (in this 'qualitative' study) I can find appears in the implications section of the paper:

"Are you aware of the importance of education to you? Realize. Is that lesson really important? Important. The success of the student depends on the lessons in school right or not? That's right"

Respondent 3: Awang, 2022, p.49

This seems a little bizarre, if we accept this is, as reported, an utterance from one of the students, Respondent 3. It becomes more sensible if this is actually condensed dialogue:

"Are you aware of the importance of education to you?"

"Realize."

"Is that lesson really important?"

"Important."

"The success of the student depends on the lessons in school right or not?"

"That's right"

It seems the peer review process did not lead to suggesting that the material should be formatted according to the norms for presenting dialogue in scholarly texts by indicating turns. In any case, if that is typical of the 'interview' technique used in the study then it is highly inadequate, as clearly the interviewer is leading the respondent, and this is more an example of indoctrination than open-ended enquiry.

Random sampling of data

Completely incongruous with the description of the purposeful selection of the participants for a case study is the account of how the assessment data was selected for analysis:

"The  process  of  analysis  of  student  achievement documents is carried out randomly by taking the results of current  examinations  that  have  passed  such  as the  initial examination of the current year or the year before which is closest  to  the  time  of  the  study."

Awang, 2022, p.44

Did the peer reviewers or editor not question the use of the term 'random' here? It is unclear what is meant by it in this context, but clearly if the analysis really was based on randomly selected data, that would undermine the results.

Validating the soul purification module

There is also a conceptual problem here. The Takziyatun Nafs modules are the intervention materials (part of what is being studied) – so they cannot also be research instruments (used to study them). Surely, if the Takziyatun Nafs modules had been shown to be valid and reliable before carrying out the reported study, as suggested here, then the study would not be needed to evaluate their effectiveness. But, presumably, expert peer reviewers (if there really were any) did not see an issue here.

The reliability of the intervention module

The Takziyatun Nafs modules had three components, and the author reports the second of the three was subjected to tests of validity and reliability. It seems that Awang thinks that this demonstrates the validity and reliability of the complete intervention,

"The second part of this module will go through [sic] the process of obtaining the validity and reliability of the module. Proses [sic] to obtain this validity, a questionnaire was constructed to test the validity of this module. The appointed specialists are psychologists, modern physicians (psychiatrists), religious specialists, and alternative medicine specialists. The validity of the module is identified from the aspects of content, sessions, and activities of the Tazkiyatun Nafs module. While to obtain the value of the reliability coefficient, Cronbach's alpha coefficient method was used. To obtain this Cronbach's alpha coefficient, a pilot test was conducted on 50 students who were randomly selected to test the reliability of this module to be conducted."

Awang, 2022, pp.43-44

Now, to unpack this, it may be helpful to briefly outline what the intervention involved (as the paper is open access, anyone can access and read the full details in the report).


From the MGM film 'A Night at the Opera' (1935): "The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced"

The description does not start off very helpfully ("The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced" (p.43) put me in mind of the Marx brothers: "The party of the first part shall be known in this contract as the party of the first part"), but some key points are,

"the Tazkiyatun Nafs module was constructed to purify the heart of each respondent leading to the healing of hallucinatory disorders. This liver purification process is done in stages…

"the process of cleansing the patient's soul will be done …all the subtle beings in the patient will be expelled and cleaned and the remnants of the subtle beings in the patient will be removed and washed…

The second process is the process of strengthening and the process of purification of the soul or heart of the patient …All the mazmumah (evil qualities) that are in the heart must be discarded…

The third process is the process of enrichment and the process of distillation of the heart and the practices performed. In this process, there will be an evaluation of the practices performed by the patient as well as the process to ensure that the patient is always clean from all the disturbances and disturbances [sic] of subtle beings to ensure that students will always be healthy and clean from such disturbances…

Awang, 2022, p.45, p.43

Quite how this process of exorcising and distilling and cleansing will occur is not entirely clear (and if the soul is equated with the heart, how is the liver involved?), but it seems to involve reflection and prayer and contemplation of scripture – certainly a very personal and therapeutic process.

And yet its validity and reliability was tested by giving a questionnaire to 50 students randomly selected (from the unspecified population, presumably)? No information is given on how a random selection was made (Taber, 2013) – which allows a reader to be very sceptical that this actually was a random sample from the (un?)identified population, and not just an arbitrary sample of 50 students. (So, that is twice the word 'random' is used in the paper when it seems inappropriate.)

It hardly matters here, as clearly neither the validity nor the reliability of a spiritual therapy can be judged from a questionnaire (especially when administered to people who have never undertaken the therapy). In any case, the "reliability coefficient" obtained from an administration of a questionnaire ONLY applies to that sample on that occasion. So, the statistic could not apply to the four participants in the study. And, in any case, the result is not reported, so the reader has no idea what the value of Cronbach's alpha was (but then, this was described as a qualitative study!)

Moreover, Cronbach's alpha only indicates the internal coherence of the items on a scale (Taber, 2019): so, it only indicates whether the set of questions included in the questionnaire seem to be accessing the same underlying construct in motivating the responses of those surveyed across the set of items. It gives no information about the reliability of the instrument (i.e., whether it would give the same results on another occasion).
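To illustrate the distinction, here is a minimal sketch (my own, with invented numbers – the paper itself reports neither the questionnaire items nor an alpha value) of how Cronbach's alpha is computed from a single administration of a multi-item instrument. Everything in the calculation comes from how one set of respondents answered one set of items on one occasion:

```python
# A minimal, hypothetical illustration of Cronbach's alpha - not data from Awang (2022).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: one row per respondent, one column per questionnaire item."""
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented responses from five respondents to a four-item Likert-type scale:
responses = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 4],
    [3, 4, 3, 3],
])
print(round(cronbach_alpha(responses), 2))  # ~0.93: the items co-vary for THIS sample
```

A high value here says only that these items hung together for this particular sample on this particular occasion. Nothing in the calculation speaks to whether the instrument would behave consistently with other respondents at other times – let alone to whether a therapeutic module is valid or effective.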

This approach to testing validity and reliability is then completely inappropriate and unhelpful. So, even if the outcomes of the testing had been reported (and they are not) they would not offer any relevant evidence. Yet it seems that peer reviewers and editor did not think to question why this section was included in the paper.

Ethical issues

A study of this kind raises ethical issues. It may well be that the research was carried out in an entirely proper and ethical manner, but it is usual in studies with human participants ('human subjects') to make this clear in the published report (Taber, 2014b). A standard issue is whether the participants gave voluntary, informed, consent. This would mean that they were given sufficient information about the study at the outset to be able to decide if they wished to participate, and were under no undue pressure to do so. The 'respondents' were school students: if they were considered minors in the research context (and oddly for a 'case study' such basic details as age and gender are not reported) then parental permission would also be needed, again subject to sufficient briefing and no duress.

However, in this specific research there are also further issues due to the nature of the study. The participants were suffering from medical disorders, so how did the researcher obtain information about, and access to, the students without medical confidentiality being broken? Who were the 'gatekeepers' who provided access to the children and their personal data? The researcher also obtained assessment data "from the class teacher or from the Student Affairs section of the student's school" (p.44), so it is important to know that students (and parents/guardians) consented to this. Again, peer review does not seem to have identified this as an issue to address before publication.

There is also the major underlying question about the ethics of a study when recognising that these students were (or could be, as details are not provided) suffering from serious medical conditions, but employing religious education as a treatment ("This method of treatment is to help respondents who suffer from hallucinations caused by demons or subtle beings", p.44). Part of the theoretical framework underpinning the study is the assumption that what is being addressed is "the problem of hallucinations caused by the presence of ethereal beings…" (p.43), yet it is also acknowledged that,

"Hallucinatory disorders in learning that will be emphasized in this study are due to several problems that have been identified in several schools in Malaysia. Such disorders are psychological, environmental, cultural, and sociological disorders. Psychological disorders such as hallucinatory disorders can lead to a more critical effect of bringing a person prone to Schizophrenia. Psychological disorders such as emotional disorders and psychiatric disorders. …Among the causes of emotional disorders among students are the school environment, events in the family, family influence, peer influence, teacher actions, and others."

Awang, 2022, p.41

There seem to be three ways of understanding this apparent discrepancy, which I might gloss:

  1. there are many causes of conditions that involve hallucinations, including, but not only, possession by evil or mischievous spirits;
  2. the conditions that lead to young people having hallucinations may be understood at two complementary levels, at a spiritual level in terms of a need for inner cleansing and exorcising of subtle beings, and in terms of organic disease or conditions triggered by, for example, social and psychological factors;
  3. in the introduction the author has relied on various academic sources to discuss the nature of the phenomenon of students having hallucinations, but he actually has a working assumption that is completely different: hallucinations are due to the presence of jinn or other spirits.

I do not think it is clear which of these positions is being taken by the study's author.

  1. In the first case it would be necessary to identify which causes are present in potential respondents and only recruit those suffering possession for this study (which does not seem to have been done);
  2. In the second case, spiritual treatment would need to complement medical intervention (which would completely undermine the validity of the study as medical treatments for the underlying causes of hallucinations are likely to be the cause of hallucinations ceasing, not the tested intervention);
  3. The third position is clearly problematic in terms of academic scholarship as it is either completely incompetent or deliberately disregards academic norms that require the design of a study to reflect the conceptual framework set out to motivate it.

So, was this tested intervention implemented instead of or alongside formal medical intervention?

  • If it was alongside medical treatment, then that raises a major confound for the study.
  • Yet it would clearly be unacceptable to deny sufferers indicated medical treatment in order to test an educational intervention that is in effect a form of exorcism.

Again, it may be there are simple and adequate responses to these questions (although here I really cannot see what they might be), but unfortunately it seems the journal referees and editor did not think to ask for them.  

Findings


Results tables presented in Awang, 2022 (p.45) [Published with a creative commons licence allowing reproduction]: "Based on the findings stated in Table I show that serial respondents experienced a decline in academic achievement while they face the problem of hallucinations. In contrast to Table II which shows an improvement in students' academic achievement  after  hallucinatory  disorders  can  be  resolved." If we assume that columns in the second table have been mislabelled, then it seems the school performance of these four students suffered while they were suffering hallucinations, but improved once they recovered. From this, we can infer…?

The key findings presented concern academic performance at school. Core results are presented in tables I and II. Unfortunately these tables are not consistent as they report contradictory results for the academic performance of students before and during periods when they had hallucinations.

They can be made consistent if the reader assumes that two of the columns in table II are mislabelled. If the reader assumes that the column labelled 'before disruption' actually reports the performance 'during disruption' and that the column actually labelled 'during disruption' is something else, then they become consistent. For the results to tell a coherent story and agree with the author's interpretation this 'something else' presumably should be 'after disruption'.

This is a very unfortunate error – and moreover one that is obvious to any careful reader. (So, why was it not obvious to the referees and editor?)

As well as looking at these overall scores, other assessment data is presented separately for each of respondent 1 – respondent 4. These sections comprise presentations of information about grades and class positions, mixed with claims about the effects of the intervention. These claims are not based on any evidence and in many cases are conclusions about 'respondents' in general, although they are placed in sections considering the academic assessment data of individual respondents. So, there are a number of problems with these claims:

  • they are of the nature of conclusions, but appear in the section presenting the findings;
  • they are about the specific effects of the intervention that the author assumes has influenced academic performance, not the data analysed in these sections;
  • they are completely unsubstantiated as no data or analysis is offered to support them;
  • often they make claims about 'respondents' in general, although as part of the consideration of data from individual learners.

Despite this, the paper passed peer-review and editorial scrutiny.

Rhetorical research?

This paper seems to be an example of a kind of 'rhetorical research' where a researcher is so convinced about their pre-existing theoretical commitments that they simply assume they have demonstrated them. Here the assumptions seem to be:

  1. Recovering from suffering hallucinations will increase student performance
  2. Hallucinations are caused by jinn and devils
  3. A spiritual intervention will expel jinn and devils
  4. So, a spiritual intervention will cure hallucinations
  5. So, a spiritual intervention will increase student performance

The researcher provided a spiritual intervention, and the student performance increased, so it is assumed that the scheme is demonstrated. The data presented is certainly consistent with the assumption, but does not in itself support this scheme without evidence. Awang provides evidence that student performance improved in four individuals after they had received the intervention – but there is no evidence offered to demonstrate the assumed mechanism.

A gardener might think that complimenting seedlings will cause them to grow. Perhaps she praises her seedlings every day, and they do indeed grow. Are we persuaded about the efficacy of her method, or might we suspect another cause at work? Would the peer-reviewers and editor of the European Journal of Education and Pedagogy be persuaded this demonstrated that compliments cause plant growth? On the evidence of this paper, perhaps they would.

This is what Awang tells readers about the analysis undertaken:

"Each student respondent involved in this study [sic, presumably not, rather the researcher] will use the analysis of the respondent's performance to determine the effect of hallucination disorders on student achievement in secondary school is accurate.

The elements compared in this analysis are as follows: a) difference in mean percentage of achievement by subject, b) difference in grade achievement by subject and c) difference in the grade of overall student achievement. All academic results of the respondents will be analyzed as well as get the mean of the difference between the  performance  before, during, and after the  respondents experience  hallucinations. 

These  results  will  be  used  as research material to determine the accuracy of the use of the Tazkiyatun  Nafs  Module  in  solving  the  problem  of hallucinations   in   school   and   can   improve   student achievement in academic school."

Awang, 2022, p.45

There is clearly a large jump between the analysis outlined in the second paragraph here, and testing the study hypotheses as set out in the final paragraph. But the author does not seem to notice this (and more worryingly, nor do the journal's reviewers and editor).

So interleaved into the account of findings discussing "mean percentage of achievement by subject…difference in grade achievement by subject…difference in the grade of overall student achievement" are totally unsupported claims. Here is an example for Respondent 1:

"Based on the findings of the respondent's achievement in the  grade  for  Respondent  1  while  facing  the  problem  of hallucinations  shows  that  there  is  not  much  decrease  or deterioration  of  the  respondent's  grade.  There  were  only  4 subjects who experienced a decline in grade between before and  during  hallucination  disorder.  The  subjects  that experienced  decline  were  English,  Geography,  CBC, and Civics.  Yet  there  is  one  subject  that  shows  a  very  critical grade change the Civics subject. The decline occurred from grade A to grade E. This shows that Civics education needs to be given serious attention in overcoming this problem of decline. Subjects experiencing this grade drop were subjects involving  emotion,  language,  as  well  as  psychomotor fitness.  In  the  context  of  psychology,  unstable  emotional development  leads  to  a  decline  in the psychomotor  and emotional development of respondents.

After  the  use  of  the  Tazkiyatun  Nafs  module  in overcoming  this  problem,  hallucinatory  disorders  can  be overcome.  This  situation  indicates  the  development  of  the respondents  during  and  after  experiencing  hallucinations after  practicing  the  Tazkiyatun  Nafs  module.  The  process that takes place in the Tzkiyatun Nafs module can help the respondent  to  stabilize  his  emotions  and  psyche  for  the better. From the above findings there were 5 subjects who experienced excellent improvement in grades. The increase occurred in English, Malay, Geography, and Civics subjects. The best improvement is in the subject of Civic education from grade E to grade B. The improvement in this language subject  shows  that  the  respondents'  emotions  have stabilized.  This  situation  is  very  positive  and  needs  to  be continued for other subjects so that respondents continue to excel in academic achievement in school.""

Awang, 2022, p.45 (emphasis added)

The material which I show here as underlined is interjected completely gratuitously. It does not logically fit in the sequence. It is not part of the analysis of school performance. It is not based on any evidence presented in this section. Indeed, nor is it based on any evidence presented anywhere else in the paper!

This pattern is repeated in discussing other aspects of respondents' school performance. Although there is mention of other factors which seem especially pertinent to the dip in school grades ("this was due to the absence of the  respondents  to  school  during  the  day  the  test  was conducted", p.46; "it was an increase from before with no marks due to non-attendance at school", p.46) the discussion of grades is interspersed with (repetitive) claims about the effects of the intervention for which no evidence is offered.


§: Differences in Respondents' Grade Achievement by Subject

  • Respondent 1: "After the use of the Tazkiyatun Nafs module in overcoming this problem, hallucinatory disorders can be overcome. This situation indicates the development of the respondents during and after experiencing hallucinations after practicing the Tazkiyatun Nafs module. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.45)
  • Respondent 2: "After the use of the Tazkiyatun Nafs module as a soul purification module, showing the development of the respondents during and after experiencing hallucination disorders is very good. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)
  • Respondent 3: "The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better" (p.46)
  • Respondent 4: "The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)

§: Differences in Respondent Grades according to Overall Academic Achievement

  • Respondent 1: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (pp.46-7)
  • Respondent 2: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module. … This excellence also shows that the respondents have recovered from hallucinations after practicing the methods found in the Tazkiayatun Nafs module that has been introduced. In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
  • Respondent 3: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
  • Respondent 4: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module has successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
Unsupported claims made within findings sections reporting analyses of individual student academic grades: note (a) how these statements, included in the analysis of individual school performance data from four separate participants (in a case study – a methodology that recognises and values diversity and individuality), are very similar across the participants; (b) claims about 'respondents' (plural) are included in the reports of findings from individual students.

Awang summarises what he claims the analysis of 'differences in respondents' grade achievement by subject' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective achievement grades. Therefore, this soul purification module should be practiced by every student to help them in stabilizing their soul and emotions and stay away from all the disturbances of the subtle beings that lead to hallucinations"

Awang, 2022, p.46

And, on the next page, Awang summarises what he claims the analysis of 'differences in respondent grades according to overall academic achievement' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective overall academic achievement. Therefore, this soul purification module should be practiced by every student to help them in stabilizing the soul and emotions as well as to stay away from all the disturbances of the subtle beings that lead to hallucination disorder."

Awang, 2022, p.47

So, the analysis of grades is said to demonstrate the value of the intervention, and indeed Awang considers this is reason to extend the intervention beyond the four participants, not just to others suffering hallucinations, but to "every student". The peer review process seems not to have raised queries about

  • the unsupported claims,
  • the confusion of recommendations with findings (it is normal to keep to results in a findings section), nor
  • the unwarranted generalisation from four hallucination sufferers to all students, whether healthy or not.

Interpreting the results

There seem to be two stories that can be told about the results:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved.  

Narrative 1

Now narrative 1 relies on a very substantial implied assumption – which is that the numbers presented as school performance are comparable over time. So, a control would be useful: such as what happened to the performance scores of other students in the same classes over the same time period. It seems likely they would not have shown the same dip – unless the dip was related to something other than hallucinations – such as the well-recognised dip after long school holidays, or some cultural distraction (a major sports tournament; fasting during Ramadan; political unrest; a pandemic…). Without such a control the evidence is suggestive (after all, being ill, and missing school as a result, is likely to lead to a dip in school performance, so the findings are not surprising), but inconclusive.

Intriguingly, the author tells readers that "student  achievement  statistics  from  the  beginning  of  the year to the middle of the current [sic, published in 2022] year in secondary schools in Northern Peninsular Malaysia that have been surveyed by researchers show a decline (Sabri, 2015 [sic])" (p.42), but this is not considered in relation to the findings of the study.

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, as a result of undergoing the soul purification module, their school performance improved.  

Narrative 2

Clearly narrative 2 suffers from the same limitation as narrative 1. However, it also demands an extra step in making an inference. I could re-write this narrative:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved. 
AND
the recovery was due to engagement with the soul purification module.

Narrative 2'.

That is, even if we accept narrative 1 as likely, to accept narrative 2 we would also need to be convinced that:

  • a) sufferers from medical conditions leading to hallucinations do not suffer periodic attacks with periods of remission in between; or
  • b) episodes of hallucinations cannot be due to one-off events (emotional trauma, T.I.A. {transient ischaemic attack or mini-strokes},…) that resolve naturally in time; or
  • c) sufferers from medical conditions leading to hallucinations do not find they resolve due to maturation; or
  • d) the four participants in this study did not undertake any change in lifestyle (getting more sleep, ceasing eating strange fungi found in the woods) unrelated to the intervention that might have influenced the onset of hallucinations; or
  • e) the four participants in this study did not receive any medical treatment independent of the intervention (e.g., prescribed medication to treat migraine episodes) that might have influenced the onset of hallucinations

Despite this study being supposedly a case study (where the expectation is there should be 'thick description' of the case and its context), there is no information to help us exclude such options. We do not know the medical diagnoses of the conditions causing the participants' hallucinations, or anything about their lives or any medical treatment that may have been administered. Without such information, the analysis that is provided is useless for answering the research question.

In effect, regardless of all the other issues raised, the key problem is that the research design is simply inadequate to test the research question. But it seems the referees and editor did not notice this shortcoming.

Alleged implications of the research

After presenting his results Awang draws various implications, and makes a number of claims about what had been found in the study:

  • "After the students went through the treatment session by using the Tazkiayatun Nafsmodule to treat hallucinations, it showed a positive effect on the student respondents. All this was certified by the expert, the student's parents as well as the  counselor's  teacher." (p.48)
  • "Based on these findings, shows that hallucinations are very disturbing to humans and the appropriate method for now to solve this problem is to use the Tazkiyatun Nafs Module." (p.48)
  • "…the use of the Tazkiyatun Nafs module while the  respondent  is  suffering  from  hallucination  disorder  is very  appropriate…is very helpful to the respondents in restoring their minds and psyche to be calmer and healthier. These changes allow  students  to  focus  on  their  studies  as  well  as  allow them to improve their academic performance better." (p.48)
  • "The use of the Tazkiyatun Nafs Module in this study has led to very positive changes there are attitudes and traits of students  who  face  hallucinations  before.  All  the  negative traits  like  irritability, loneliness,  depression,etc.  can  be overcome  completely." (p.49)
  • "The personality development of students is getting better and perfect with the implementation of the Tazkiaytun Nafs module in their lives." (p.49)
  • "Results  indicate that  students  who  suffer  from  this hallucination  disorder are in  a  state  of  high  depression, inactivity, fatigue, weakness and pain,and insufficient sleep." (p.49)
  • "According  to  the  findings  of  this study,  the  history  of  this  hallucination  disorder  started in primary  school  and  when  a  person  is  in  adolescence,  then this  disorder  becomes  stronger  and  can  cause  various diseases  and  have  various  effects  on  a  person who  is disturbed." (p.50)

Given the range of interview data that Awang claims to have collected and analysed, at least some of the claims here are possibly supported by the data. However, none of this data and analysis is available to the reader. 2 These claims are not supported by any evidence presented in the paper. Yet peer reviewers and the editor who read the manuscript seem to feel it is entirely acceptable to publish such claims in a research paper, and not present any evidence whatsoever.

Summing up

In summary: as far as these four students were concerned (but not perhaps the fifth participant?), there did seem to be a relationship between periods of experiencing hallucinations and lower school performance (perhaps explained by such factors as "absenteeism to school during the day the test was conducted", p.46),

"the performance shown by students who face chronic hallucinations is also declining and  declining.  This  is  all  due  to  the  actions  of  students leaving the teacher's learning and teaching sessions as well as  not  attending  school  when  this  hallucinatory  disorder strikes.  This  illness or  disorder  comes  to  the  student suddenly  and  periodically.  Each  time  this  hallucination  disease strikes the student causes the student to have to take school  holidays  for  a  few  days  due  to  pain  or  depression"

Awang, 2022, p.42

However,

  • these four students do not represent any wider population;
  • there is no information about the specific nature, frequency, intensity, etcetera, of the hallucinations or diagnoses in these individuals;
  • there was no statistical test of significance of changes; and
  • there was no control condition to see if performance dips were experienced by others not experiencing hallucinations at the same time.

Once they had recovered from the hallucinations (and it is not clear on what basis that judgement was made) their scores improved.

The author would like us to believe that the relief from the hallucinations was due to the intervention, but this seems to be (quite literally) an act of faith 3 as no actual research evidence is offered to show that the soul purification module actually had any effect. It is of course possible the module did have an effect (whether for the conjectured or other reasons – such as simply offering troubled children some extra study time in a calm and safe environment and special attention – or because of an expectancy effect if the students were told by trusted authority figures that the intervention would lead to the purification of their hearts and the healing of their hallucinatory disorder) but the study, as reported, offers no strong grounds to assume it did have such an effect.

An irresponsible journal

As hallucinations are often symptoms of organic disease affecting blood supply to the brain, there is a major question of whether treating the condition by religious instruction is ethically sound. For example, hallucinations may indicate a tumour growing in the brain. Yet, if the module was only a complement to proper medical attention, a reader might well suspect that any improvement in the condition (and consequent increased engagement in academic work) was entirely unrelated to the module being evaluated.

Indeed, a published research study that claims that soul purification is a suitable treatment for medical conditions presenting with hallucinations is potentially dangerous, as it could lead to serious organic disease going untreated. If Awang's recommendations were widely taken up in Malaysia, such that students with serious organic conditions were treated for their hallucinations only by soul purification, rather than with medication or surgery, this would likely lead to preventable deaths. For a research journal to publish a paper with such a conclusion, when any qualified reviewer or editor could easily see the conclusion is not warranted, is irresponsible.

As the journal website points out,

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general."

https://www.ej-edu.org/index.php/ejedu/about

So, why did the European Journal of Education and Pedagogy not subject this submission to meaningful review to help the author of this study meet the standards of the discipline, and of science in general?


Work cited:

Notes:

1 In mature fields in the natural sciences there are recognised traditions ('paradigms', 'disciplinary matrices') shaping any active research field at any given time. In general (and, of course, there will be exceptions):

  • at any historical time, there is a common theoretical perspective underpinning work in a research programme, aligned with specific ontological and epistemological commitments;
  • at any historical time, there is a strong alignment between the active theories in a research programme and the acceptable instrumentation, methodology and analytical conventions.

Put more succinctly: in a mature research field there is generally broad agreement on how a phenomenon is to be understood, how to go about investigating it, and how to interpret data as research evidence.

This is generally not the case in educational research – which is at least in part due to the complexity, and so the multi-layered nature, of the phenomena studied (Taber, 2014a): phenomena such as classroom teaching. So, in reviewing educational papers, it is sometimes necessary to find different experts to look at the theoretical and the methodological aspects of the same submission.


2 The paper is very strange in that the introductory sections and the conclusions and implications sections have a very broad scope, but the actual research results are restricted to a very limited focus: analysis of school test scores and grades.

It is as if (and it could well be that) a dissertation with a number of evidential strands has been reduced to a paper drawing upon only one aspect of the research evidence, but with material from other sections of the dissertation left unchanged from the original, broader study.


3 Readers are told that

"All  these  acts depend on the sincerity of the medical researcher or fortune-teller seeking the help of Allah S.W.T to ensure that these methods and means are successful. All success is obtained by the permission of Allah alone"

Awang, 2022, p.43


A case study of educational innovation?

Design and Assessment of an Online Prelab Model in General Chemistry


Keith S. Taber


Case study is meant to be naturalistic – whereas innovation sounds like an intervention. But interventions can be the focus of naturalistic enquiry.

One of the downsides of having spent years teaching research methods is that one cannot help but notice how so much published research departs from the ideal models one offers to students. (Which might be seen as a polite way of saying authors often seem to get key things wrong.) I used to teach that how one labelled one's research was less important than how well one explained it. That is, different people would have somewhat different takes on what is, or is not, grounded theory, case study or action research, but as long as an author explained what they had done, and could adequately justify why, the choice of label for the methodology was of secondary importance.

A science teacher can appreciate this: a student who tells the teacher they are doing a distillation when they are actually carrying out reflux, but clearly explains what they are doing and why, will still be understood (even if the error should be pointed out). On the other hand, if a student has the right label but an alternative conception, this is likely to be a more problematic 'bug' in the teaching-learning system. 1

That said, each type of research strategy has its own particular weaknesses and strengths, so describing something as an experiment, or a case study, when it does not actually share the essential characteristics of that strategy can mislead the reader – and sometimes even mislead the authors, such that invalid conclusions are drawn.

A 'case study', that really is a case study

I made reference above to action research, grounded theory, and case study – three methodologies which are commonly name-checked in education research. There are a vast number of papers in the literature with one of these terms in the title, and a good many of them do not report work that clearly fits the claimed approach! 2


The case study was published in the Journal for the Research Center for Educational Technology

So, I was pleased to read an interesting example of a 'case study' that I felt really was a case study: 'Design and assessment of an online prelab model in general chemistry: A case study' (Llorens-Molina, 2009) – although I suspect some other authors might have been tempted to describe this research differently.

Is it a bird, is it a plane; no it's…

Llorens-Molina's study included an experimental aspect. A cohort of learners was divided into two groups to allow the researcher to compare two different educational treatments; measurements were then made to compare outcomes quantitatively. That might sound like an experiment. Moreover, this study reported an attempt to innovate in a teaching situation, which gives the work a flavour of action research. Despite this, I agree with Llorens-Molina that the work is best characterised as a case study.

Read about experiments

Read about action research


A case study focuses on 'one instance' from among many


What is a case study?

A case study is an in-depth examination of one instance – one example of something for which there are many examples. The focus of a case study might be one learner, one teacher, one group of students working together on a task, one class, one school, one course, one examination paper, one text book, one laboratory session, one lesson, one enrichment programme… So, there is great variety in what kind of entity a case study is a study of, but what case studies have in common is that they each focus in detail on that one instance.

Read about case study methodology


Characteristics of case study


Case studies are naturalistic studies, which means they are studies of things as they are, not attempts to change things. The case has to be bounded (a reader of a case study learns what is in the case and what is not) but tends to be embedded in a wider context that impacts upon it. That is, the case is entangled in a context from which it could not easily be extracted and still be the same case. (Imagine moving a teacher with her class from their school to have their lesson in a university where it could be observed by researchers – it would not be 'the same lesson' as would have occurred in situ).

The case study is reported in detail, often in a narrative form (not just statistical summaries) – what is sometimes called 'thick description'. Usually several 'slices' of data are collected – often different kinds of data – and often there is a process of 'triangulation' to check the consistency of the account presented in relation to the different slices of data available. Although case studies can include analysis of quantitative data, they are usually seen as interpretive as the richness of data available usually reflects complexity and invites nuance.



Design and Assessment of an Online Prelab Model in General Chemistry

Llorens-Molina's study explored the use of prelabs that are "used to introduce and contextualize laboratory work in learning chemistry" (p.15), and in particular "an alternative prelab model, which consists of an audiovisual tutorial associated with an online test" (p.15).

An innovation

The research investigated an innovation in teaching practice,

"In our habitual practice, a previous lecture at the beginning of each laboratory session, focused almost exclusively on the operational issues, was used. From our teaching experience, we can state that this sort of introductory activity contributes to a "cookbook" way to carry out the laboratory tasks. Furthermore, the lecture takes up valuable time (about half an hour) of each ordinary two-hour session. Given this set-up, the main goal of this research was to design and assess an alternative prelab model, which was designed to enhance the abilities and skills related to an inquiry-type learning environment. Likewise, it would have to allow us to save a significant amount of time in laboratory sessions due to its online nature….

a prelab activity developed …consists of two parts…a digital video recording about a brief tutorial lecture, supported by a slide presentation…[followed by] an online multiple choice test"

Llorens-Molina, 2009, pp.16-17

Not action research?

The reference to shifting "our habitual practice" indicates that this study reports practitioner research. Practitioner studies such as this, testing an innovation, are often labelled by their authors as 'action research'. (Indeed, sometimes the mere fact that research is carried out by practitioners looking to improve their own practice is seen as sufficient for action research – when actually this is a necessary, but not a sufficient, condition.)

Genuine action research aims at improving practice, not simply seeing if a specific innovation is working. This means action research has an open-ended design, and is cyclical – with iterations of an innovation tested and the outcomes used as feedback to inform changes to the innovation. (Despite this, a surprising number of published studies labelled as action research lack any cyclic element, simply reporting one iteration of an innovation.) Llorens-Molina's study does not have a cyclic design, so would not be well characterised as action research.

An experimental design?

Llorens-Molina reports that the study was motivated by three hypotheses (p.16):

  • "Substituting an initial lecture by an online prelab to save time during laboratory sessions will not have negative repercussions in final examination marks.
  • The suggested online prelab model will improve student autonomy and prerequisite knowledge levels during laboratory work. This can be checked by analyzing the types and quantity of SGQ [student generated questions].
  • Student self-perceptions about prelab activities will be more favourable than those of usual lecture methods."

To test these hypotheses the student cohort was divided into two groups, split between the customary and the innovative approach. This seems very much like an experiment.

It may be useful here to draw a distinction between two levels of research design – methodology (akin to strategy) and techniques (akin to tactics). In research design, a methodology is chosen to meet the overall aims of the study, and then one or more research techniques are selected consistent with that methodology (Taber, 2013). Experimental techniques may be included in a range of methodologies, but experiment as an overall methodology has some specific features.

Read about Research design

In a true experiment there is random assignment to conditions, and often there is an intention to generalise results to a wider population considered to be sampled in the study. Llorens-Molina reports that although inferential statistics were used to test the hypotheses, there was no intention to offer statistical generalisation beyond the case. The cohort of students was not assumed to be a sample representing some wider population (such as, say, undergraduates on chemistry courses in Spain) – and, indeed, clearly such an assumption would not have been justified.

Case study is naturalistic – but an innovation is an intervention in practice…

Case study is said to be naturalistic research – it is a method used to understand and explore things as they are, not to bring about change. Yet here the focus is an innovation. That seems a contradiction. It would be a contradiction if the study was being carried out by external researchers who had asked the teaching team to change practice for the benefit of their study. However, here it is useful to separate out the two roles of teacher and researcher.

This is a situation I commonly faced when advising graduates preparing for school teaching, who were required to carry out a classroom-based study into an aspect of their school placement context as part of their university qualification (the Post-Graduate Certificate in Education, P.G.C.E.). Many of these graduates were unfamiliar with research into social phenomena. Science graduates often brought a model of what worked in the laboratory to their thinking about their projects – and had a tendency to think that transferring the experimental approach to classrooms (where there are usually a large number of potentially relevant variables, many of which cannot be controlled) would be straightforward.

Read 'Why do natural scientists tend to make poor social scientists?'

The Cambridge P.G.C.E. teaching team put into place a range of supports to introduce graduates preparing for teaching to the kinds of education research useful for teachers who want to evaluate and improve their own teaching. This included a book written to introduce classroom-based research that drew heavily on analysis of published studies (Taber, 2007; 2013). Part of our advice was that those new to this kind of enquiry might want to consider action research and case study as suitable options for their small-scale projects.


Useful strategies for the novice practitioner-researcher (Figure: diagram used in working with graduates preparing for teaching, from Taber, 2010)

Simplistically, action research might be considered best suited to a project to test an innovation or address a problem (e.g., evaluating a new teaching resource; responding to behavioural issues), and case study best suited to an exploratory study (e.g., what do Y9 students understand about photosynthesis?; what is the nature of peer dialogue during laboratory work in this class?). However, it was often difficult for the graduates to carry out authentic action research, as the constraints of the school-based placements seldom allowed them to test successive iterations of the same intervention until they found something like an optimal specification.

Yet, they often were in a good position to undertake a detailed study of one iteration, collecting a range of different data, and so producing a detailed evaluation. That sounds like a case study.

Case study is supposed to be naturalistic – whereas innovation sounds like an intervention. But some interventions in practice can be considered the focus of naturalistic enquiry. My argument was that when a teacher changes the way they do something to try to solve a problem, or simply to find a better way to work, that is a 'natural' part of professional practice. The teacher-researcher, as researcher, is exploring something the fully professional teacher does as a matter of course – seeking to develop practice. After all, our graduates were being asked to undertake research to give them the skills expected to meet professional teaching standards, which

"clearly requires the teacher to have both the procedural knowledge to undertake small-scale classroom enquiry, and 'conceptual frameworks' for thinking about teaching and learning that can provide the basis for evaluating their teaching. In other words, the professional teacher needs both the ability to do her own research and knowledge of what existing research suggests"

Taber, 2013, p.8

So, the research is on something that is naturally occurring in the classroom context, rather than an intervention imported into the context in order to answer an external researcher's questions. A case study of an intervention introduced by practitioners themselves can be naturalistic – even if the person implementing the change is the researcher as well as the teacher.


If a teacher-researcher (qua researcher) wishes to enquire into an innovation introduced by the teacher-researcher (qua teacher) then this can be considered as naturalistic enquiry


The case and the context

In Llorens-Molina's study, the case was a sequence of laboratory activities carried out by a cohort of undergraduates undertaking a course of General and Organic Chemistry as part of an Agricultural Engineering programme. So, the case was bounded (the laboratory part of one taught course) and embedded in a wider context – a degree programme in a specific institution in Spain: the Polytechnic University of Valencia.

The primary purpose of the study was to find out about the specific innovation in the particular course that provided the case. This was then what is known as an intrinsic case study. (When a case is studied primarily as an example of a class of cases, rather than primarily for its own interest, it is called an instrumental case study).

Llorens-Molina recognised that what was found in this specific case, in its particular context, could not be assumed to apply more widely. There can be no statistical generalisation to other courses elsewhere. In case study, the intention is to offer sufficient detail of the case for readers to make judgements of its likely relevance to other contexts of interest (so-called 'reader generalisation').

The published report gives a good deal of information about the course, as well as much information about how data was collected and, equally importantly, analysed.

Different slices of data

Case study often uses a range of data sources to develop a rounded picture of the case. In this study the identification of three specific hypotheses (less usual in case studies, which often have more open-ended research questions) led to the collection of three different types of data.

  • Students were assessed on each of six laboratory activities. A comparison was made between the prelab condition and the existing approach.
  • Questions asked by students in the laboratories were recorded and analysed to see if the quality/nature of such questions was different in the two conditions. A sophisticated approach was developed to analyse the questions.
  • Students were asked to rate the prelabs through responding to items on a questionnaire.

This approach allowed the author to go beyond simply reporting whether hypotheses were supported by the analysis, to offer a more nuanced discussion around each feature. Such nuance is not only more informative to the reader of a case study, but reflects how the researcher, as practitioner, has an ongoing commitment to further develop practice and not see the study as an end in itself.

Avoiding the 'equivalence' and the 'misuse of control groups' problems

I particularly appreciate a feature of the research design that many educational studies that claim to be experiments could benefit from. To test his hypotheses Llorens-Molina employed two conditions or treatments, the innovation and a comparison condition, and divided the cohort: "A group with 21 students was split into two subgroups, with 10 and 11 in each one, respectively". Llorens-Molina does not suggest this was based on random assignment, which is necessary for a 'true' experiment.

In many such quasi-experiments (where randomisation to condition is not carried out, and is indeed often not possible) the researchers seek to offer evidence of equivalence before the treatments occur. After all, if the two subgroups are different in terms of past subject attainment or motivation or some other relevant factor (or, indeed, if there is no information to allow a judgement regarding whether this is the case or not), no inferences about an intervention can be drawn from any measured differences. (Although that does not always stop researchers from making such claims regardless: e.g., see Lack of control in educational research.)

Another problem is that if learners are participating in research but are assigned to a control or comparison condition, then it could be asked whether they are just being used as 'data fodder' – and would that be fair to them? This is especially so in those cases (so, not this one) where researchers require that the comparison condition is educationally deficient – many published studies report a control condition where school students have effectively been lectured to, and no discussion work, group work, practical work, digital resources, et cetera, have been allowed, in order to ensure a stark contrast with whatever supposedly innovative pedagogy or resource is being evaluated (Taber, 2019).

These issues are addressed in research designs which have a compensatory structure – in effect the groups switch between being the experimental and comparison condition – as here:

"Both groups carried out the alternative prelab and the previous lecture (traditional practice), alternately. In this way, each subgroup carried out the same number of laboratory activities with either a prelab and previous lecture"

Llorens-Molina, 2009, p.19

This is good practice both from methodological and ethical considerations.


The study used a compensatory design which avoids the need to ensure both groups are equivalent at the start, and does not disadvantage one group. (Figure from Llorens-Molina, 2009, p.22 – published under a creative commons Attribution-NonCommercial-NoDerivs 3.0 United States license allowing redistribution with attribution)
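
To make the compensatory structure concrete, here is a minimal sketch (in Python; the exact ordering of sessions is my own assumption for illustration, not a detail taken from the paper) of an allocation schedule in which the two subgroups alternate between the two conditions across the six laboratory activities:

```python
# A minimal sketch of a compensatory (crossover) allocation: two subgroups alternate
# between the 'online prelab' and 'previous lecture' conditions across six lab sessions.
# The exact ordering is an assumption for illustration, not taken from Llorens-Molina (2009).
conditions = ("online prelab", "previous lecture")

def crossover_schedule(n_sessions: int = 6) -> list[tuple[str, str]]:
    """Return (subgroup A condition, subgroup B condition) for each session,
    alternating so that each subgroup experiences each condition equally often."""
    schedule = []
    for session in range(n_sessions):
        a = conditions[session % 2]
        b = conditions[(session + 1) % 2]
        schedule.append((a, b))
    return schedule

for i, (a, b) in enumerate(crossover_schedule(), start=1):
    print(f"Lab activity {i}: subgroup A -> {a}; subgroup B -> {b}")
```

With an even number of sessions, each subgroup experiences each condition the same number of times – which is what removes the need to demonstrate baseline equivalence between the subgroups, and avoids disadvantaging either group.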

A case of case study

Do I think this is a model case study that perfectly exemplifies all the claimed characteristics of the methodology? No, and very few studies do. Real research projects, often undertaken in complex contexts with limited resources and intractable constraints, seldom fit such ideal models.

However, unlike some studies labelled as case studies, this study has an explicit bounded case and has been carried out in the spirit of case study that highlights and values the intrinsic worth of individual cases. There is a good deal of detail about aspects of the case. It is in essence a case study, and (unlike what sometimes seems to be the case [sic]) not just called a case study for want of a methodological label. Most educational research studies examine one particular case of something – but (and I do not think this is always appreciated) that does not automatically make them case studies. Because it has been both conceptualised and operationalised as a case study, Llorens-Molina's study is a coherent piece of research.

Given how, in these pages, I have often been motivated to call out studies that I consider have major problems – major enough to undermine the argument for the claimed conclusions of the research – I wanted to recognise a piece of research that I felt offered much to admire.


Work cited:

Notes:

1 I am using language here reflecting a perspective on teaching as being based on a model (whether explicit or not) in the teacher's mind of the learners' current knowledge and understanding and how this will respond to teaching. That expects a great deal of the teacher, so there are often bugs in the system (e.g., the teacher over-estimates prior knowledge) that need to be addressed. This is why being a teacher involves being something of a 'learning doctor'.

Read about the learning doctor perspective on teaching


2 I used to teach sessions introducing each of these methodologies when I taught on an Educational Research course. One of the class activities was to examine published papers claiming the focal methodology, asking students to see if the studies matched the supposed characteristics of the strategy. This was a course with students undertaking a very diverse range of research projects, and I encouraged them to apply the analysis to papers selected because they were of particular interest and relevance to their own work. Many examples selected by students proved to offer a poor match between the claimed methodology and the actual research design of the study!

Psychological skills, academic achievement and…swimming

Keith S. Taber

'Psychological Skills in Relation to Academic Achievement through Swimming Context'

 


Original image by Clker-Free-Vector-Images from Pixabay

I was intrigued by the title of an article I saw in a notification: "Psychological Skills in Relation to Academic Achievement through Swimming Context". In part, it was the 'swimming context' – despite never having been very athletic or sporty (which is not to say I did not enjoy sports, just that I was never particularly good at any), I have always been a regular and enthusiastic swimmer.  Not a good swimmer, mind (too splashy, too easily veering off-line) – but an enthusiastic one. But I was also intrigued by the triad of psychological skills, academic achievement, and swimming.

Perhaps I had visions of students' psychological skills being tested in relation to their academic achievement as they pounded up and down the pool. So, I was tempted to follow this up.

Investigating psychological skills and academic achievement

The abstract of the paper by Bayyat and colleagues reported three aims for their study:

"This study aimed to investigate:

  • (1) the level of psychological skills among students enrolled in swimming courses at the Physical Education faculties in the Jordanian Universities.
  • (2) the relation between their psychological skills and academic achievement.
  • (3) the differences in these psychological skills according to gender."

Bayyat et al., 2021: 4535

The article was published in a journal called 'Psychology and Education', which, its publishers* suggest, is "a quality journal devoted to basic research, theory, and techniques and arts of practice in the general field of psychology and education".

A peer reviewed journal

The peer review policy reports this is a double-blind peer-reviewed journal. This means other academics have critiqued and evaluated a submission prior to its being accepted for publication. Peer review is a necessary (but not sufficient) condition for high quality research journals.

Journals with high standards use expert peer reviewers, and the editors use their reports to both reject low-quality submissions, and to seek to improve high-quality submissions by providing feedback to authors about points that are not clear, any missing information, incomplete chains of argumentation, and so forth. In the best journals editors only accept submissions after reviewers' criticisms have been addressed to the satisfaction of reviewers (or authors have made persuasive arguments for why some criticism does not need addressing).

(Read about peer review)

The authors here report that

"The statistical analysis results revealed an average level of psychological skills, significant differences in psychological skills level in favor of female students, A students and JU[**], and significant positive relation between psychological skills and academic achievement."

Bayyat et al., 2021: 4535

Rewriting slightly, it seems that, according to this study:

  • the students in the study had average levels of psychological skills;
  • the female students had higher levels of psychological skills than their male peers; and
  • there was some kind of positive correlation between psychological skills and academic achievement.

Anyone reading a research paper critically asks themselves questions such as

  • 'what do they mean by that?';
  • 'how did they measure that?';
  • 'how did they reach that conclusion?'; and
  • 'who does this apply to?'

Females are better – but can we generalise?

In this study it was reported that

"By comparing psychological skills between male and female participants, results revealed significant differences in favor [sic] of female participants"

"All psychological skills' dimensions of female participants were significant in favor [sic] of females compared to their male peers. They were more focused, concentrated, confident, motivated to achieve their goals, and sought to manage their stress."

Bayyat et al., 2021: 4541, 4545

"It's our superior psychological skills that make us this good!" (Image by CristianoCavina from Pixabay)

A pedant (such as the present author) might wonder if "psychological skills' dimensions of female participants" [cf. psychological skills' dimensions of male participants?] would not be inherently likely to be in favour of females, but it is clear from the paper that this is intended to refer to the finding that females (as a group) got significantly higher ratings than males (as a group) on the measures of 'psychological skills'.

If we for the moment (but please read on below…) accept these findings as valid, an obvious question is the extent to which these results might generalise beyond the study. That is, to what extent would these findings being true for the participants of this study imply that the same thing would be found more widely (e.g., among all students in Jordanian universities? among all university students? among all adult Jordanians? among all humans?)

Statistical generalisation

Statistical generalisation (From Taber, 2019)

Two key concepts here are the population and the sample. The population is the group that we wish our study to be about (e.g., chemistry teachers in English schools, 11-year olds in New South Wales…), and the sample is the group who actually provide data. In order to generalise to the population from the sample it is important that the sample is large enough and representative of the population (which of course may be quite difficult to ascertain).

(Read about sampling in research)

(Read about generalisation)

In this study the reader is told that "The population of this study was undergraduate male and female students attending both intermediate and advanced swimming courses" (Bayyat et al., 2021: 4536). Taken at face value this might raise the question of why a sample was drawn exclusively from Jordan – unless of course this is the only national context where students attend intermediate or advanced swimming courses. *** However, it was immediately clarified that "They consisted of (n= 314) students enrolled at the schools of Sport Sciences at three state universities". That is, the population was actually undergraduate male and female students from schools of Sport Sciences at three Jordanian state universities attending both intermediate and advanced swimming courses.

"The Participants were an opportunity sample of 260 students" (Bayyat et al., 2021: 4536). So in terms of sample size, 260, the sample made up most of the population – almost 83%. This is in contrast to many educational studies where the samples may necessarily only reflect a small proportion of the population. In general, representatives of a sample is more important than size as skew in the sample undermines statistical generalisations (whereas size, for a representative sample, influences the magnitude of the likely error ****) – but a reader is likely to feel that when over four-fifths of the population were sampled it is less critical that a convenience sample was used.

This still does not ensure that the results can be generalised to the population (students from schools of Sport Sciences at three Jordanian state universities attending 'both' intermediate and advanced swimming courses), but psychologically it seems quite convincing.

Ontology: What are we dealing with?

The study is only useful if it is about something that readers think is important – and if it is clear what it is about. The authors tell us their study is about

  • Psychological Skills
  • Academic Achievement

which would seem to be things educators should be interested in. We do need to know, however, how the authors understand these constructs: what do they mean by 'a Psychological Skill' and 'Academic Achievement'? Most people would probably think they have a pretty good idea what these terms mean, but that is no assurance at all that different people would agree on this.

So, in reading this paper it is important to know what the authors themselves mean by these terms – so a reader can check they understand these terms in a sufficiently similar way.

What is academic achievement?

The authors suggest that

"academic achievement reflects the learner's accomplishment of specific goals by the end of an educational experience in a determined amount of time"

Bayyat et al., 2021: 4535


This seems to be the extent of the general characterisation of this construct in the paper *****.

What are psychological skills?

The authors tell readers that

"Psychological skills (PS) are a group of skills and abilities that enhances peoples' performance and achievement…[It has been] suggested that PS includes a whole set of trainable skills including emotional control and self-confidence"

Bayyat et al., 2021: 4535


For the purposes of this particular study, they

"identified the psychological skills related to the swimming context such as; leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention"

Bayyat et al., 2021: 4536


So the relevant skills are considered to be:

  • leadership
  • emotional stability
  • sport achievement motivation
  • self-confidence
  • stress management
  • attention

I suspect that there would not be complete consensus among psychologists or people working in education over whether all of these constructs actually are 'skills'. Someone who did not consider these (or some of these) characteristics as skills would need to read the authors' claims arising from the study about 'psychological skills' accordingly (i.e., perhaps as being about something other than skills) but as the authors have been clear about their use of the term, this should not confuse or mislead readers.

Epistemology: How do we know?

Having established what is meant by 'psychological skills' and 'academic achievement' a reader would want to know how these were measured in the present study – do the authors use techniques that allow them to obtain valid and reliable measures of 'psychological skills' and 'academic achievement'?

How is academic achievement measured?

The authors inform readers that

"To calculate students' academic achievement, the instructors of the swimming courses conducted a valid and reliable assessment as a pre-midterm, midterm, and final exam throughout the semester…The assessment included performance tests and theoretical tests (paper and pencil tests) for each level"

Bayyat et al., 2021: 4538


Although the authors claim their assessments are valid and reliable, a careful reader will note that the methodology here does not match the definition of "accomplishment of specific goals by the end of an educational experience" (emphasis added) – as only the final examinations took place at the end of the programme. On that point, then, there is a lack of internal consistency in the study. This might not matter to a reader who did not think academic achievement needed to be measured at the end of a course of study.

Information on the "Academic achievement assessment tool", comprising six examinations (pre-midterm, midterm, and final examinations at each of the intermediate and advanced levels) is included as an appendix – good practice that allows a reader to interrogate the instrument.

Although this appendix is somewhat vague on precise details, it offers a surprise to someone (i.e., me) with a traditional notion of what is meant by 'academic achievement' – as both theory and practical aspects are included. Indeed, most of the marks seem to be given for practical swimming proficiency. So, the 'Intermediate swimming Pre-midterm exam' has a maximum of 20 marks available – with breast stroke leg technique and arm technique each scored out of ten marks.

The 'Advanced swimming midterm exam' is marked out of 30, with 10 marks each available for the 200m crawl (female), individual medley (female) and life guarding techniques. This seems to suggest that 20 of the 30 marks available can only be obtained by being female, but this point does not seem to be clarified. Presumably (?) male students had a different task that the authors considered equivalent.

How are psychological skills measured?

In order to measure psychological skills the authors set out "to develop and validate a questionnaire" (p.4536). Designing a new instrument is a complex and challenging affair. The authors report how they

"generated a 40 items-questionnaire reflecting the psychological skills previously mentioned [leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention] by applying both deductive and inductive methods. The items were clear, understandable, reflect the real-life experience of the study population, and not too long in structure."

Bayyat et al., 2021: 4538

So, items were written which it was thought would reflect the focal skills of interest. (Unfortunately there are no details of what the authors mean by "applying both deductive and inductive methods" to generate the items.) Validity was assured by asking a panel of people considered to have expertise to critique the items:

"the scale was reviewed and assessed by eight qualified expert judges from different related fields (sport psychology, swimming, teaching methodology, scientific research methodology, and kinesiology). They were asked to give their opinion of content representation of the suggested PS [psychological skills], their relatedness, clarity, and structure of items. According to the judges' reviews, we omitted both leadership and emotional stability domains, in addition to several items throughout the questionnaire. Other items were rephrased, and some items were added. Again, the scale was reviewed by four judges, who agreed on 80% of the items."

So, construct validity was a kind of face validity, in that people considered to be experts thought the final set of items would elicit the constructs intended, but there was no attempt to see if responses correlated in any way with any actual measurements of the 'skills'.

Readers of the paper wondering if they should be convinced by the study would need to judge whether the expert panel had the right specialisms to evaluate scale items for 'psychological skills', and might find some of the areas of expertise (i.e.,

  • sport psychology
  • swimming
  • teaching methodology
  • scientific research methodology
  • kinesiology)

more relevant than others.

Self-reports

If respondents responded honestly, their responses would have reflected their own estimates of their 'skills' – at least to the extent that their interpretation of the items matched that of the experts. (That is, there was no attempt to investigate how members of the population of interest would understand what was meant by the items.)

Here are some examples of the items in the instrument:

Construct ('psychological skill') – example item:

  • self-confidence – "I manage my time effectively while in class"
  • sports motivation achievement – "I do my best to control everything related to swimming lessons."
  • attention – "I can pay attention and focus on different places in the pool while carrying out swimming tasks"
  • stress-management – "I am not afraid to perform any difficult swimming skill, no matter what"

Examples of statements students were asked to rate in order to measure their 'psychological skills' (source: Bayyat et al., 2021: 4539-4541)

Analysis of data

The authors report various analyses of their data that lead to the conclusions they reach. If a critical reader was convinced about matters so far, they would still need to believe that the analyses undertaken were

  • appropriate, and
  • completed competently, and
  • correctly interpreted.

Drawing conclusions

However, as a reader I personally would have too many quibbles with the conceptualisation and design of instrumentation to consider the analysis in much detail.

To my mind, at least, the measure of 'academic achievement' seems to be largely an evaluation of swimming skills. They are obviously important in a swimming course, but I do not consider this a valid measure of academic achievement. That is not a question of suggesting academic achievement is better or more important than practical or athletic achievements, but it is surely something different (akin to me claiming to have excellent sporting achievement on the basis of holding a PhD in education).

The measure of psychological skills does not convince me either. I am not sure some of the focal constructs can really be called 'skills' (self-confidence? motivation?), but even if they were, there is no attempt to directly measure skill. At best, the questionnaire offers self-reports of how students perceive (or perhaps wish to be seen as perceiving) their characteristics.

It is quite common in research to see the term 'questionnaire' used for an instrument that is intended to test knowledge or skill – but questionnaires are not the right kind of instrument for that job.

(Read about questionnaires)

Significant positive relation between psychological skills and academic achievement?

So, I do not think this methodology would allow anyone to find a "significant positive relation between psychological skills and academic achievement" – only a relationship between students' self-ratings on some psychological characteristics and swimming achievement. (That may reflect an interesting research question, and could perhaps be a suitable basis for a study, but it is not what this study claims to be about.)

Significant differences in psychological skills level in favor of female students?

In a similar way, although it is interesting that females tended to score higher on the questionnaire scales, this shows they had higher self-ratings on average, and tells us nothing about their actual skills.

It may be that the students have great insight into these constructs and their own characteristics and so make very accurate ratings on these scales – but with absolutely no evidential basis for thinking this there are no grounds for making such a claim.

An alternative interpretation of the results is that on average the male students under-rate their 'skills' compared to their female peers. That is the 'skills' could be much the same across gender, but there might be a gender-based difference in perception. (I am not suggesting that is the case, but the evidence presented in the paper can be explained just as well by that possibility.)

An average level of psychological skills?

Finally, we might ask what is meant by

"The statistical analysis results revealed an average level of psychological skills…"

"Results of this study revealed that the participants acquired all four psychological skills at a moderate level."

Bayyat et al., 2021: 4535, 4545

Even leaving aside that what is being measured is something other than psychological skills, it is hard to see how these statements can be justified. This was the first administration of a new instrument being applied to a sample of a very specific population.


Image by Igor Drondin from Pixabay

The paper reports standard deviations for the ratings on the items in the questionnaire, so – as would be expected – there were distributions of results: spreads with different students giving different ratings. Within the sample tested, some of the students will have given higher than median ratings on an item, and some will have given lower than median ratings – although, on average, the ratings for that particular item would have been average for this sample (by definition!).

So, assuming this claim (of average/moderate levels of psychological skills) was not meant as a tautology, the authors seem to be suggesting that the ratings given on this administration of the instrument align with what would typically be obtained across other administrations. Of course, they have absolutely no way of knowing whether that is the case without collecting data from samples of other populations.

What the authors actually seem to be basing these claims (of average/moderate levels of psychological skills) on is that the average responses on these scales did not give a very high or very low rating in terms of the raw scale. Yet, with absolutely no reference data for how other groups of people might respond on the same instrument, that offers little useful information. At best, it suggests something welcome about the instrument itself (ideally one would wish items to elicit a spread of responses, rather than having most responses rated very high or very low), but nothing about the students sampled.

On this point the authors seem to be treating the scale as calibrated in terms of some nominal standard (e.g. 'a rating of 3-4 would be the norm'), when there is no inherent interpretation of particular ratings of items in such a scale that can just be assumed – rather this would be a matter that would need to be explored empirically.
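
To make this last point concrete, here is a minimal sketch (in Python, using two entirely invented reference distributions; none of these numbers come from Bayyat and colleagues) of why the same raw mean can look 'high' against one reference group and 'low' against another – which is why claims about 'moderate' levels need empirical norms rather than a reading off the raw scale:

```python
# A minimal sketch (hypothetical data, not from Bayyat et al.) illustrating why a raw
# mean rating on a 1-5 scale has no inherent interpretation without reference data.
import numpy as np

rng = np.random.default_rng(0)

# Suppose the sample's observed mean rating on some scale is 3.4 (an invented figure)
observed_mean = 3.4

# Two hypothetical reference groups that might, in principle, respond to the same instrument
lenient_raters = rng.normal(loc=4.2, scale=0.4, size=10_000).clip(1, 5)    # typically rate high
stringent_raters = rng.normal(loc=2.6, scale=0.4, size=10_000).clip(1, 5)  # typically rate low

for name, ref in [("lenient reference group", lenient_raters),
                  ("stringent reference group", stringent_raters)]:
    percentile = (ref < observed_mean).mean() * 100
    print(f"Against a {name}, a mean of {observed_mean} sits at about the {percentile:.0f}th percentile")

# The same raw mean looks 'low' against one reference distribution and 'high' against the
# other - so calling it a 'moderate level' requires empirical norms, not just the raw scale.
```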

The research paper as an argument

The research paper is a very specific genre of writing. It is an argument for new knowledge claims. The conclusions of the paper rest on a chain of argument that starts with the conceptualisation of the study and moves through research design, data collection, analysis, and interpretation. For a reader, any link in that chain of argument which is not convincing potentially invalidates the knowledge claim(s) being made. Thus the standards expected of research papers are very high.


Research writing

In sum then, this was an intriguing study, but it did not convince me (even if it apparently convinced the peer reviewers and editor of Psychology and Education). I am not sure it was really about psychological skills or academic achievement

…but at least it was clearly set in the context of swimming.


Work cited:

Bayyat, M. M., Orabi, S. M., Al-Tarawneh, A. D., Alleimon, S. M., & Abaza, S. N. (2021). Psychological Skills in Relation to Academic Achievement through Swimming Context. Psychology and Education, 58(5), 4535-4551.

Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download manuscript version]


* Despite searching the different sections of the journal site, I was unable to find who publishes the journal. However, searching outside the site I found a record of the publisher of this journal being 'Auricle Technologies, Pvt., Ltd'.

** It transpired later in the paper that 'JU' referred to students at the University of Jordan: one of three universities involved in the study.

*** I think literally this means those who participated in the study were students attending both an intermediate swimming course and an advanced swimming course – but I read this to mean those who participated in the study were students attending either an intermediate or advanced swimming course. This latter interpretation is consistent with information given elsewhere in the paper: "All schools of sports sciences at the universities of Jordan offer mandatory, reliable, and valid swimming programs. Students enroll in one of three swimming courses consequently: the basic, intermediate, and advanced levels". (Bayyat et al., 2021: 4535, emphasis added)

**** That is, if the sample is unrepresentative of the population, there is no way to know how biased the sample might be. However, if there is a representative sample, then although there will still likely be some error (the results for the sample will not be precisely what the results across the whole population would be) it is possible to calculate the likely size of this error (e.g., say ±3%) which will be smaller when a higher proportion of the population are sampled.
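
For readers who want to see how such an error estimate plays out with the numbers above, here is a minimal sketch (in Python; the 95% confidence level and the worst-case proportion of 0.5 are my own assumptions for illustration, not figures from the paper) using the standard formula with a finite population correction:

```python
# A minimal sketch (my own illustration, not a calculation from the paper) of estimating
# the likely sampling error for a representative sample, including the finite population
# correction that shrinks the error when a large proportion of the population is sampled.
import math

def margin_of_error(sample_size: int, population_size: int,
                    proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion (95% confidence by default),
    with the finite population correction applied."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

# Figures from the study: 260 sampled from a population of 314
print(f"±{margin_of_error(260, 314) * 100:.1f} percentage points")      # roughly ±2.5%

# For comparison, the same sample size drawn from a much larger population
print(f"±{margin_of_error(260, 100_000) * 100:.1f} percentage points")  # roughly ±6%
```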

***** It is possible some text that was intended to be at this point has gone missing during production – as, oddly, the following sentence is

facilisi, veritus invidunt ei mea (Times New Roman, 10)

Bayyat et al., 2021: 4535

which seems to be an accidental retention of text from the journal's paper template.

A case of hybrid research design?

When is "a case study" not a case study? Perhaps when it is (nearly) an experiment?

Keith S. Taber

I read this interesting study exploring learners shifting conceptions of the particulate nature of gases.

Mamombe, C., Mathabathe, K. C., & Gaigher, E. (2020). The influence of an inquiry-based approach on grade four learners' understanding of the particulate nature of matter in the gaseous phase: a case study. EURASIA Journal of Mathematics, Science and Technology Education, 16(1), 1-11. doi:10.29333/ejmste/110391

Key features:

  • Science curriculum context: the particulate nature of matter in the gaseous phase
  • Educational context: Grade 4 students in South Africa
  • Pedagogic context: Teacher-initiated inquiry approach (compared to a 'lecture' condition/treatment)
  • Methodology: "qualitative pre-test/post-test case study design" – or possibly a quasi-experiment?
  • Population/sample: the sample comprised 116 students from four grade four classes, two from each of two schools

This study offers some interesting data, providing evidence of how students represent their conceptions of the particulate nature of gases. What most intrigued me about the study was its research design, which seemed to reflect an unusual hybrid of quite distinct methodologies.

In this post I look at whether the study is indeed a case study as the authors suggest, or perhaps a kind of experiment. I also make some comments about the teaching model of the states of matter presented to the learners, and raise the question of whether the comparison condition (lecturing 8-9 year old children about an abstract scientific model) is appropriate, and indeed ethical.

Learners' conceptions of the particulate nature of matter

This paper is well worth reading for anyone who is not familiar with existing research (such as that cited in the paper) describing how children make sense of the particulate nature of matter, something that many find counter-intuitive. As a taster, I reproduce here two figures from the paper (which is published open access under a creative commons license* that allows sharing and adaptation of copyright material with due acknowledgement).

Figures © 2020 by the authors of the cited paper *

Conceptions are internal, and only directly available to the epistemic subject, the person holding the conception. (Indeed, some conceptions may be considered implicit, and so not even available to direct introspection.) In research, participants are asked to represent their understandings in the external 'public space' – often in talk, here by drawing (Taber, 2013). The drawings have to be interpreted by the researchers (during data analysis). In this study the researchers also collected data from group work during learning (in the enquiry condition) and by interviewing students.

What kind of research design is this?

Mamombe and colleagues describe their study as "a qualitative pre-test/post-test case study design with qualitative content analysis to provide more insight into learners' ideas of matter in the gaseous phase" (p. 3), yet it has many features of an experimental study.

The study was

"conducted to explore the influence of inquiry-based education in eliciting learners' understanding of the particulate nature of matter in the gaseous phase"

p.1

The experiment compared two pedagogical treatments:

  • "inquiry-based teaching…teacher-guided inquiry method" (p.3) guided by "inquiry-based instruction as conceptualized in the 5Es instructional model" (p.5)
  • "direct instruction…the lecture method" (p.3)

These pedagogic approaches were described:

"In the inquiry lessons learners were given a lot of materials and equipment to work with in various activities to determine answers to the questions about matter in the gaseous phase. The learners in the inquiry lessons made use of their observations and made their own representations of air in different contexts."

"the teacher gave probing questions to learners who worked in groups and constructed different models of their conceptions of matter in the gaseous phase. The learners engaged in discussion and asked the teacher many questions during their group activities. Each group of learners reported their understanding of matter in the gaseous phase to the class"

p.5, p.1

"In the lecture lessons learners did not do any activities. They were taught in a lecturing style and given all the notes and all the necessary drawings.

In the lecture classes the learners were exposed to lecture method which constituted mainly of the teacher telling the learners all they needed to know about the topic PNM [particulate nature of matter]. …During the lecture classes the learners wrote a lot of notes and copied a lot of drawings. Learners were instructed to paste some of the drawings in their books."

pp.5-6

The authors report that,

"The learners were given clear and neat drawings which represent particles in the gaseous, liquid and solid states…The following drawing was copied by learners from the chalkboard."

p.6

Figure used to teach learners in the 'lecture' condition. Figure © 2020 by the authors of the cited paper *

A teaching model of the states of matter

This figure shows increasing separation between particles on moving from solid to liquid to gas. It is not a canonical figure: in reality the spacing in a liquid is not substantially greater than in a solid (indeed, in ice floating on water the spacing is greater in the solid than in the liquid), whereas the difference in spacing between the two fluid states is under-represented.

Such figures do not show the very important dynamic aspect: that in a solid, particles can usually only oscillate around a fixed position (a very low rate of diffusion notwithstanding); whereas in a liquid, particles can move around, but movement is restricted by the close arrangement of (and intermolecular forces between) the particles; and in a gas there is a significant mean free path between collisions, along which particles move with virtually constant velocity. A static figure like this, then, does not show the critical differences in particle interactions which are core to the basic scientific model.

Perhaps even more significantly, Figure 2 suggests there is the same level of order in the three states, whereas the difference in ordering between a solid and a liquid is much more significant than any change in particle spacing.

In teaching, choices have to be made about how to represent science (through teaching models) to learners who are usually not ready to take on board the full details and complexity of scientific knowledge. Here, Figure 2 represents a teaching model where it has been decided to emphasise one aspect of the scientific model (particle spacing) by distorting the canonical model, and to neglect other key features of the basic scientific account (particle movement and arrangement).

External teachers taught the classes

The teaching was undertaken by two university lecturers

"Two experienced teachers who are university lecturers and well experienced in teacher education taught the two classes during the intervention. Each experienced teacher taught using the lecture method in one school and using the teacher-guided inquiry method in the other school."

p.3

So, in each school there was one class taught by each approach (enquiry/lecture) by a different visiting teacher, and the teachers 'swapped' the teaching approaches between schools (a sensible measure to balance possible differences between the skills/styles of the two teachers).

The research design included a class in each treatment in each of two schools

An experiment, or a case study?

Although the study compared progression in learning across the two teaching treatments through an analysis of learner diagrams, it also included interviews, as well as learners' "notes during class activities" (which one would expect to be fairly uniform within each class in the 'lecture' treatment).

The outcome

The authors do not consider their study to be an experiment, despite setting up two conditions for teaching, comparing outcomes between the two conditions, and drawing conclusions accordingly:

"The results of the inquiry classes of the current study revealed a considerable improvement in the learners' drawings…The results of the lecture group were however, contrary to those of the inquiry group. Most learners in the lecture group showed continuous model in their post-intervention results just as they did before the intervention…only a slight improvement was observed in the drawings of the lecture group as compared to their pre-intervention results"

pp.8-9

These statements can be read in two ways – either

  • a description of events (it just happened that with these particular classes the researchers found better outcomes in the enquiry condition), or
  • as the basis for a generalised inference.

An experiment would be designed to test a hypothesis (this study does not seem to have an explicit hypothesis, nor explicit research questions). Participants would be assigned randomly to conditions (Taber, 2019), or, at least, classes would be randomly assigned (although then strictly each class should be considered as a single unit of analysis offering much less basis for statistical comparisons). No information is given in the paper on how it was decided which classes would be taught by which treatment.
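For illustration, here is a minimal sketch (purely hypothetical, not anything reported in the paper) of how intact classes might be randomly assigned to conditions within each school; the school and class labels, and the seed, are invented:

```python
# Purely illustrative sketch of random assignment of intact classes to
# conditions within each school; the labels below are hypothetical and
# nothing like this is reported in the paper.
import random

random.seed(2020)  # recording a seed keeps the assignment auditable

schools = {
    "School A": ["class 1", "class 2"],
    "School B": ["class 1", "class 2"],
}

assignment = {}
for school, classes in schools.items():
    shuffled = random.sample(classes, k=len(classes))
    assignment[school] = {"inquiry": shuffled[0], "lecture": shuffled[1]}

print(assignment)
# Even with random assignment, each intact class remains a single unit of
# analysis, so four classes give little basis for statistical comparison.
```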

Representativeness

A study could be carried out with the participation of a complete population of interest (e.g., all of the science teachers in one secondary school), but more commonly a sample is selected from a population of interest. In a true experiment, the sample has to be selected randomly from the population (Taber, 2019), which is seldom possible in educational studies.

The study investigated a sample of 'grade four learners'

In Mamombe and colleagues' study the sample is described. However, there is no explicit reference to the population from which the sample is drawn. Yet the use of the term 'sample' (rather than just, say, 'participants') implies that they did have a population in mind.

The aim of the study is given as "to explore the influence of inquiry-based education in eliciting learners' understanding of the particulate nature of matter in the gaseous phase" (p.1), which could be considered to imply that the population is 'learners'. The title of the paper could be taken to suggest the population of interest is more specific: "grade four learners". However, the authors make no attempt to argue that their sample is representative of any particular population, and therefore have no basis for statistical generalisation beyond the sample (whether to learners, or to grade four learners, or to grade four learners in RSA, or to grade four learners in farm schools in RSA, or…).

Indeed, only descriptive statistics are presented: there is no attempt to use tests of statistical significance to infer whether the difference in outcomes between conditions found in the sample would probably also have been found in the wider population.

(That is, inferential statistics are commonly used to suggest 'we found a statistically significant better outcome in one condition in our sample, so, had we been able to include the entire population in our study, we would probably have found better mean outcomes in that same condition'.)
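To illustrate what such an inferential test might look like, here is a minimal sketch with invented counts (these numbers are hypothetical and are not the paper's data), comparing how many learners in each condition were coded as showing a particulate rather than a continuous conception after teaching:

```python
# Illustrative only: invented post-test counts, NOT data from the paper.
from scipy.stats import chi2_contingency

#                  particulate  continuous
post_test_counts = [
    [30, 18],   # inquiry condition (hypothetical)
    [15, 33],   # lecture condition (hypothetical)
]

chi2, p_value, dof, expected = chi2_contingency(post_test_counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")

# A small p-value would suggest the difference found in the sample is unlikely
# to be due to chance alone; but that inference only extends to a wider
# population if the sample can be taken as representative of it.
```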

This may be one reason why Mamombe and colleagues do not consider their study to be an experiment. The authors acknowledge limitations in their study (as there always are in any study) including that "the sample was limited to two schools and two science education specialists as instructors; the results should therefore not be generalized" (p.9).

Yet, of course, if the results cannot be generalised beyond these four classes in two schools, this undermines the usefulness of the study (and the grounds for the recommendations the authors make for teaching based on their findings in the specific research contexts).

If considered as an experiment, the study suffers from other inherent limitations (Taber, 2019). There were likely novelty effects, and even though there was no explicit hypothesis, it is clear that the authors expected enquiry to be a productive approach, so expectancy effects may have been operating.

Analytical framework

In an experiment it is important to have an objective means of measuring outcomes, and this should be determined before data are collected. (Read about 'Analysis' in research studies.) In this study, methods used in previously published work were adopted, and the authors tell us that "A coding scheme was developed based on the findings of previous research…and used during the coding process in the current research" (p.6).

But they then go on to report,

"Learners' drawings during the pre-test and post-test, their notes during class activities and their responses during interviews were all analysed using the coding scheme developed. This study used a combination of deductive and inductive content analysis where new conceptions were allowed to emerge from the data in addition to the ones previously identified in the literature"

p.6

An emerging analytical frame is perfectly appropriate in 'discovery' research, where a pre-determined conceptualisation of how data are to be understood is not employed. However, in 'confirmatory' research, testing a specific idea, the analysis is operationalised prior to collecting data. The use of qualitative data does not exclude a hypothesis-testing, confirmatory study, as qualitative data can be analysed quantitatively (as is done in this study), but using codes that link back to a hypothesis being tested, rather than emergent codes. (Read about 'Approaches to qualitative data analysis'.)
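As a rough illustration of that distinction, the sketch below (a simplification: the codes and coded drawings are invented stand-ins, not the authors' actual scheme or data) tallies drawings against a predetermined, deductive coding scheme while also collecting any codes that emerge inductively during analysis:

```python
# Sketch of combined deductive/inductive coding of learner drawings.
# Codes and coded 'drawings' are simplified stand-ins, not the authors' scheme.
from collections import Counter

predetermined_codes = {"particulate", "continuous"}   # deductive: fixed in advance

# Each entry stands for the code an analyst assigned to one drawing.
coded_drawings = [
    "particulate", "continuous", "particulate",
    "particles_in_continuous_medium",   # a code not in the original scheme
    "continuous", "particulate",
]

tallies = Counter(coded_drawings)
deductive = {code: tallies[code] for code in predetermined_codes}
emergent = {code: n for code, n in tallies.items() if code not in predetermined_codes}

print("Counts against the predetermined scheme:", deductive)
print("Emergent codes added during analysis:", emergent)
```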

Much of Mamombe and colleagues' description of their work aligns with an exploratory, discovery approach to enquiry, yet the gist of the study is to compare student representations against a model of correct/acceptable and alternative conceptions so as to test the relative effectiveness of two pedagogic treatments (i.e., an experiment). That is a 'nomothetic' approach that assumes standard categories of response.

Overall, the authors' account of how they collected and analysed data seems to suggest a hybrid approach, with elements of both a confirmatory approach (suitable for an experiment) and elements of a discovery approach (more suitable for case study). It might seem this is a kind of mixed-methods study with both confirmatory/nomothetic and discovery/idiographic aspects – responding to two different types of research question in the same study.

Yet there do not actually seem (**) to be two complementary strands to the research (one exploring the richness of students' ideas, the other comparing variables – i.e., type of teaching versus degree of learning), but rather an attempt to hybridise distinct approaches based on incongruent fundamental (paradigmatic) assumptions about research. (** Having explicit research questions stated in the paper could have clarified this issue for a reader.)

So, do we have a case study?

Mamombe and colleagues may have chosen to frame their study as a kind of case study because of the issues raised above in regard to considering it an experiment. However, it is hard to see how it qualifies as a case study (even if the editor and peer reviewers of the EURASIA Journal of Mathematics, Science and Technology Education presumably felt this description was appropriate).

Mamombe and colleagues do use multiple data sources, which is a common feature of case study. However, in other ways the study does not meet the usual criteria for case study. (Read more about 'Case study'.)

For one thing, case study is naturalistic. The method is used to study a complex phenomenon (e.g., a teacher teaching a class) that is embedded in a wider context (e.g., a particular school, timetable, cultural context, etc.) such that it cannot be excised for clinical examination (e.g., moving the lesson to a university campus for easy observation) without changing it. Here, there was an intervention, imposed from the outside, with external agents acting as the class teachers.

Even more fundamentally – what is the 'case'?

A case has to have a recognisable ('natural') boundary, albeit one that has some permeability in relation to its context. A classroom, class, year group, teacher, school, school district, etcetera, can be the subject of a case study. Two different classes in one school, combined with two other classes from another school, does not seem to make a bounded case.

In case study, the case has to be defined (not so in this study); and it should be clear it is a naturally occurring unit (not so here); and the case report should provide 'thick description' (not provided here) of the case in its context. Mamombe and colleagues' study is simply not a case study as usually understood: not a "qualitative pre-test/post-test case study design" or any other kind of case study.

That kind of mislabelling does not in itself invalidate research – but it may indicate some confusion in the basic paradigmatic underpinnings of a study. That seems to be the case [sic] here, as suggested above.

Suitability of the comparison condition: lecturing

A final issue of note about the methodology in this study is the nature of one of the two conditions used as a pedagogic treatment. In a true experiment, this condition (against which the enquiry condition was contrasted) would be referred to as the control condition. In a quasi-experiment (where randomisation of participants to conditions is not carried out) this would usually be referred to as the comparison condition.

At one point Mamombe and colleagues refer to this pedagogic treatment as 'direct instruction' (p.3), although this term has become ambiguous, as it is used to mean quite different things by different authors. In the paper this treatment is also referred to as the lecture condition.

Is the comparison condition ethical?

Parental consent was given for students contributing data for analysis in the study, but parents would likely have trusted the professional judgement of the researchers to ensure their children were taught appropriately. Readers are informed that "the learners whose parents had not given consent also participated in all the activities together with the rest of the class" (p.3), so it seems some children in the lecture treatment were subjected to the inferior teaching approach despite this lack of consent, as they were studying "a prescribed topic in the syllabus of the learners" (p.3).

I have been very critical of a certain kind of 'rhetorical' research report (Taber, 2019) which

  • begins by extolling the virtues of some kind of active / learner-centred / progressive / constructivist pedagogy; explaining why it would be expected to provide effective teaching; and citing numerous studies that show its proven superiority across diverse teaching contexts;
  • then compares this with passive modes of learning, based on the teacher talking and giving students notes to copy, which is often characterised as 'traditional' but is said to be ineffective in supporting student learning;
  • then describes how the authors set up an experiment to test the (superior) pedagogy in some specific context, using as a comparison condition the very passive learning approach they have already criticised as being ineffective in supporting learning.

My argument is that such research is unethical:

  • It is not genuine science as the researchers are not testing a genuine hypothesis, but rather looking to demonstrate something they are already convinced of (which does not mean they could not be wrong, but in research we are trying to develop new knowledge).
  • It is not a proper test of the effectiveness of the progressive pedagogy as it is being compared against a teaching approach the authors have already established is sub-standard.

Most critically, young people are subjected to teaching that the researchers already believe they know will disadvantage them, just for the sake of their 'research', to generate data for reporting in a research journal. Sadly, such rhetorical studies are still often accepted for publication despite their methodological weaknesses and ethical flaws.

I am not suggesting that Mamombe, Mathabathe and Gaigher have carried out such a rhetorical study (i.e., one that poses a pseudo-question where from the outset only one outcome is considered feasible). They do not make strong criticisms of the lecturing approach, and even note that it produces some learning in their study:

"Similar to the inquiry group, the drawings of the learners were also clearer and easier to classify after teaching"

"although the inquiry method was more effective than the lecture method in eliciting improved particulate conception and reducing continuous conception, there was also improvement in the lecture group"

p.9, p.10

I have no experience of the South African education context, so I do not know what is typical pedagogy in primary schools there, nor the range of teaching approaches that grade 4 students there might normally experience (in the absence of external interventions such as reported in this study).

It is for the "two experienced teachers who are university lecturers and well experienced in teacher education" (p.3) to have judged whether a lecture approach based on teacher telling, children making notes and copying drawings, but with no student activities, can be considered an effective way of teaching 8-9 year old children a highly counter-intuitive, abstract, science topic. If they consider this good teaching practice (i.e., if it is the kind of approach they would recommend in their teacher education roles) then it is quite reasonable for them to have employed this comparison condition.

However, if these experienced teachers and teacher educators, and the researchers designing the study, considered that this was poor pedagogy, then there is a real question for them to address as to why they thought it was appropriate to implement it, rather than compare the enquiry condition with an alternative teaching approach that they would have expected to be effective.

Sources cited:

* Material reproduced from Mamombe, Mathabathe & Gaigher, 2020 is © 2020 licensee Modestum Ltd., UK. That article is an open access article distributed under the terms and conditions of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) [This post, excepting that material, is © 2020, Keith S. Taber.]

An introduction to research in education:

Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.