The sugger strikes back!

An update on the 'first annual International Survey of Research Centres and Institutes'


Keith S. Taber (masquerading as a learned academic)


if he wanted me to admit I had been wrong, Hussain could direct me to the released survey results and assure me that the data collected for the survey was not being used for other purposes. That is, he should have given me grounds to think the survey was a genuine piece of research and not 'sugging'


Some months ago I published an article in this blog about a message I received from an organisation called Acaudio, which has a website where academics can post audio recordings promoting their research. The message invited me to participate in "the first annual International Survey of Research Centres and Institutes". I was suspicious of this invitation for a number of reasons, as I discuss in 'The first annual International Survey of Gullible Research Centres and Institutes'.

Several things suggested to me that this was not a genuine piece of academic research, including the commitment that "We will release the results over the next month" which seemed so unrealistic as to have been written either by someone with no experience of collecting and analysing large scale survey data – or someone with no intention of actually following through on the claim.

Sugging?

Having taken a look at the survey questions, I felt pretty sure this was an example of what has been labelled as 'sugging'. Sugging is a widely recognised, and indeed widely adopted, unethical practice of collecting marketing information by framing it as a survey. The Market Research Society explains that,

Sugging is a market research industry term, meaning 'selling under the guise of research'. Sugging occurs when individuals or companies pretend to be market researchers conducting a research, when in reality they are trying to build databases, generate sales leads or directly sell product or services….

The practices of sugging and frugging [fundraising under the guise of market research] bring discredit on the profession of research… and mislead members of the public when they are being asked for their co-operation…

Failing to clearly specify the purpose for which the data is being collected is also a breach of…the first principle of the Data Protection Act 1998.

https://www.mrs.org.uk/standards/suggingfaq

Although I thought the chances of the results of the first annual International Survey of Research Centres and Institutes actually being released within the month, or even within a few months to allow for a modest level of over-promising, were pretty minuscule, I did think I should wait a few months and then do a search to see if such a report had appeared. I did not think I was likely to find such a report released into the public domain, but any scientist has to be open-minded enough to consider they might be wrong – and certainly in my own case I've collected enough empirical evidence over the years to know I am not just, in principle, fallible.

Acaudio doth protest too much, methinks

But (being fallible) I'd rather forgotten about this and had not got round to doing a web search. Until, that is, I was prompted to do so by receiving an email from the company founder, Hussain Ayed, who had had his attention drawn to my blog, and was – understandably perhaps – not happy about my criticisms:



Hussain's letter did not address my specific points from the blog (as he did not want to "get into the nitty gritty of it all"), but assured me his company was genuinely trying to do useful work, and there was no scamming.

Of course, I had not suggested Acaudio, the organisation, was itself a 'scam': in my earlier article I had pointed out that Acaudio was offering a free, open-access, service which was likely to be useful to academics – and even briefly pointed out some positive features of their website.

But Acaudio's 'survey' was a different matter. It did not meet the basic requirements for a serious academic study, and it asked questions that seemed clearly designed around potential selling points for a company offering services to increase research impact (so, perhaps, Acaudio).



And it promised a fantastic time-scale. Perhaps a very large organisation, with staff fully dedicated to analysis and reporting, could have released international survey results within a month of collecting data – perhaps? But Acaudio was a company with a single registered company officer, reporting just one employee.

Given the scale of the organisation, what Acaudio have achieved with their website in a relatively short time is highly impressive. But…

…where is that survey report?

I replied to Hussain, as below.

Dear Hussain Ayed

Thank you for your message.

I have not written "a comprehensive attack on [your] company" and do not have a sufficient knowledge-base to have done so. I have indeed, however, published a blog article criticising your marketing techniques based on the direct evidence in messages you have sent me. In particular, I claimed that,

(i) (despite being registered as a UK based company) you did not adhere to the UK regulations concerning direct marketing. (I assume you are not seeking to challenge this given the evidence of your own emails)

(ii) that you were also 'sugging': undertaking marketing under the guise of carrying out a survey.

If I understand your complaint, you are suggesting in regard to point (ii) that you really were carrying out a survey for the public good (rather than to collect information for your own commercial purposes) and that any apparent failure of rigour in this regard actually resulted from a lack of relevant expertise within the company. If so, perhaps you will send me, or tell me where I can access, the published outcome of the survey (due to be available by the middle of June 2023 according to your earlier message). I have looked online for this, but a Google search (using the term "International Survey of Research Centres and Institutes") failed to locate the report.

Can you offer me an assurance that information collected for the survey was ONLY used for the analysis that led to the published survey report (assuming there is one you can point me to), and that this information was not retained by your organisation as a basis for contacting individuals with regard to your company's services? If you can offer appropriate assurances then I will be happy to add an inserted edit into the blog to include a statement along the lines that the company assures me that all information collected was only used for the purposes of producing a survey report, and was not retained or used in any other way by the company.

So, to summarise regarding point (ii), if this survey was not a scam, please (a) point me to the outcomes, and (b) give me these assurances about not collecting information under false pretences.

You also have the right to reply directly. If you really think anything in my article amounted to "misleading bits of 'evidence' " then please do correct this. You are free to submit a response in the comments section at the bottom of the page. If you wish to do that, I will be happy to publish your reply (subject to my usual restrictions which I am sure should not be any impediment to you – so, I will not publish anything I think might be libellous of a third party, nor anything with obscenity/profanity etc. Sadly, I do sometimes have to reject comments of these kinds.)

I recognise that comments have less prominence than the blog article they follow, and that indeed some readers may not get that far in their engagement with an article. Therefore, if you do submit a reply I am happy to also add a statement at the HEAD of my article to point out to readers that there is a reply on behalf of the company beneath the article, so my readers see that notice BEFORE proceeding to read my own account. I am not looking for people/organisations to criticise for the sake of it, but have become concerned about the extent of unethical practice in the name of academic work (such as the marketing of predatory journals and conferences) and do point out some of the examples that come my way. I believe such bad practice is very damaging, and especially so for students who are new to the academic world, and for those working in under-resourced contexts who may be under extreme pressure to achieve 'tenure'. People spend their limited funds on getting published in journals that have no serious peer review (and so are not taken seriously by most academics), or presenting at conferences which 'invite' contributions from anyone prepared to pay the fees. I do not spend time looking for such bad practice: it arrives in my inbox on a fairly frequent basis.

Perhaps your intentions are indeed honourable, and perhaps you are doing good work. Perhaps you are indeed "working to tackle inequality in higher education and academia", which obviously would be valuable, although I am not sure how this is achieved by working with groups at Cambridge such as the Bioelectronic Systems Tech Group – unless you perhaps charge fees to those in wealthy institutions to allow you to offer a free service for those elsewhere? If you do: good on you. Even so, I would strongly suggest you 'clean up your act' as far as your marketing is concerned, and make sure your email campaigns are within the law. By failing to follow the regulations you present your organisation as either being unprofessional (giving the impression no one knows what they are doing) or dodgy (if you* know the regulations, but are choosing not to follow them). *I assume you are responsible for the marketing strategy, but even if someone else is doing this for you, I suspect you (as the only registered company officer) would be considered ultimately responsible for not following the regulations.

If you are genuine about wishing to learn more about undertaking quality surveys, there are many sources of information. My pages on research methods might be a place to get some introductory background, but if this is to be a major part of your company's activity I would really suggest you should employ someone with expertise, or retain a consultant who works in that area.

Thank you for the offer to work with you, but I am retired and have too many existing projects to work on – and in any case you should work with someone you genuinely respect, not someone that you consider only to "masquerade as a learned academic" and who has "shaky morals".

Best wishes

Keith

My key point was that if he wanted me to admit I had been wrong, Hussain could direct me to the released survey results and assure me that the data collected for the survey was not being used for other purposes. That is, he should have given me grounds to think the survey was a genuine piece of research and not 'sugging'.

The findings of the survey are 'reserved'

Later that day, I got the following reply:



So, it seems the research report that was supposed to have been released ("over the next month" – according to Acaudio's email dated 15th May 2023) was not available, and – furthermore – would not be made available to me.

  • A key principle of scientific research is that the outcomes are published – that is made available to the public: and not "reserved" for certain people the researchers select!
  • A key feature of ethical research is that a commitment is made to make outcomes available (as Acaudio did) and this is followed through (as Acaudio did not).
What is the research data being used for?

Hussain also failed to offer any assurances that the data collected under the claim (pretence, surely) of carrying out survey research was not being used for commercial purposes – as a basis for evaluating the potential merits of approaching different respondents to tender for services. I cannot prove that Acaudio was using the collected information for such purposes, but if my suspicions were misplaced (and if Hussain really wanted to persuade me that the survey was not intended as a scam) it would have been very easy to simply include a sentence in his response to that effect – to have assured me that the research data was being analysed anonymously and handled separately from the company's marketing data with a suitable 'ethical wall' between.1

That is, Hussain could have simply got into enough of the "nitty gritty" to have offered an assurance of following an ethical protocol, instead of choosing to insult me…as I pointed out to him:-


Dear Hussain

Thank you for your message.

So, the 'survey' results (if indeed any such document actually exists) that you indicated to me would be released by mid-June are still not actually available in the public domain. As you say: 'Hmm'.

You are right, that I would have no right to ask you to provide me with anything – except that YOU ASKED ME to believe I misjudged you, and to withdraw my public criticisms; and so I ASKED YOU to provide the evidence to persuade me by (i) proving there was a survey analysis with published results, and (ii) giving an assurance that you did not use, for your company's marketing purposes, data supposedly collected for publishable research. There is of course no reason why you should have provided either the results or the assurances, unless you actually did feel I had judged Acaudio too harshly and you wanted to give me reason to acknowledge this. The only thing that might give me "some sort of power over [you]" in this regard is your suggestion to me that I might wish to "take back the claims that [I] made". Can I remind you: you contacted me. You contacted me, unsolicited, in December 2022, and then again in May 2023. This morning, you contacted me again specifically to suggest my suggestions of wrong-doing were misjudged. But you will not back that up, so you have simply reinforced my earlier inferences.

For some reason that is not clear to me, you think that my mind is on money – that is presumably why I spend some of my valuable time highlighting poor academic practices on a personal website that brings in no income and is financed from my personal funds. Perhaps that is the company director finding it hard to get inside the mind of a retired teacher who worked his entire career in the public sector? (That is not meant as an insult – I probably have the reverse difficulty in understanding the motivations of the commercial mind. Perhaps that is why these are "things that are beyond [my] understanding"?) I do not have any problem with you setting up a company to make money (good luck to you if you work hard and treat people with due respect), and think it is perfectly possible for an organisation to both make money and produce public goods – I am not against commercial organisations per se. My 'vested interests' relate to commitments to certain values that I think underpin both good science and academic activities more broadly. A key one is honesty (which is one fundamental aspect of treating people with due respect). We are all entitled (perhaps even have a duty?) to make the strongest arguments for our positions, but when people knowingly misrepresent (e.g., "We will release the results over the next month" but no publication is forthcoming) in order to advance their interests, this undermines the scholarly community. Anyone can be wrong. Anyone can be mistaken. Anyone can fail in a venture. (Such as promising a report, genuinely intending to produce one, but finding the task was more complex than anticipated. Had that been your response, I might have found this feasible. Instead, you promised to release the results, but now you claim you have "every right to ignore [my] request for the outcomes". Yes, that is so – if the commitment you made means nothing.)
As long as we can trust each other to be open and honest the system will eventually self-correct in cases when there are false (but honestly motivated) claims. Yet, these days, academics are flooded with offers and claims that are not mistaken, but deliberately misleading. That is what I find so troublesome that I take time to call out examples. That may seem strange to you, but you have to remember I have worked as a school, college, and university teacher all my working life, so I identify at a very deep level with the basic values underpinning the quest for knowledge and learning. When I get an email from someone claiming they are doing a survey, but which seems to be an attempt to market services, I do take it personally. I do not like to be lied to. I do not like to be treated as a fool. And I do not like the thought that perhaps less experienced colleagues and graduate students may take such approaches at face value and not appreciate they are being scammed. Can does not equate to should: you may have "the ability to write and say what [you] want", but that does not mean you have the right to deliberately mislead people.

You say you will not be engaging with me any more. Fine. You started this correspondence with your unsolicited approaches. I will be very happy if you remove me from your marketing list (that I did not sign up for) and do not contact me again. That might be in both our interests.

And despite all this, I wish you well. Whatever your mistakes in the past, if you do genuinely wish to make a difference in the way you suggest, then I hope you are successful. But please, if you believe in your company and the contribution it can make, seek to be totally honest with potential clients. If you are in this for the long term, then developing trust and a strong reputation for ethical business practices will surely create a fund of social capital that will pay dividends as you build up the organisation. Whereas producing emails of the kind you have sent me today is likely to be counter-productive and just alienate people: using ad hominem points – I am masquerading as a learned academic, out of touch, arrogant, unfit and entitled; with shaky morals and vested interests; things are beyond my understanding; I write nonsense – simply suggests you have no substantive points to support your position. By doing this you automatically cede the higher ground. And, moreover, is that really the way you want your company represented in its communications?

Best wishes

Keith 


As I wrote above, Acaudio seem to be doing a really good job in setting up a platform where researchers can post accounts of their research – and given the scale of the organisation – I assume much (if not all) of that is down to Hussain. That, he can be proud of.

However, using the appearance of an international survey as a cover for collecting data that can be used to market a company's services is widely recognised as a dishonest and unethical (if not illegal 2) practice. I think he should be less proud of himself in that regard.

If Hussain still wants to maintain that his request for contributions to the first annual International Survey of Research Centres and Institutes was intended as a genuine attempt at academic research, rather than just a marketing scam, then he still has the option of publishing a report of the study so that the academic community can evaluate the extent to which the survey meets the norms of genuine research; and so that, at very least, he will have met one key criterion of academic research (publication).

This would also show that Acaudio are prepared to meet their side of the contract they offered to potential respondents (i.e., please contribute to this survey – in consideration we will release the results over the next month). Any reputable business should be looking to make good on its promises.


Notes

1 The idea of an ethical wall (sometimes referred to as a 'Chinese wall') is important in businesses where there is the potential for conflicts of interest. Consider, for example, firms of lawyers that may have multiple clients, and where information offered in confidence by one client could have commercial value for another. The firm is expected to have protocols in place so that information about one client is not either leaked to another client, or (deliberately or inadvertently) influences advice given to another client. To avoid inadvertent influence, it may be necessary to ensure staff working with one client are not involved in work for another client that may be seen to have conflicting interests.

A company may hire a market research organisation to carry out market research to inform them about future strategies – so the people analysing the data have no bias due to preferred outcomes, and no temptation to misuse the data for direct marketing purposes. The commissioned report will not identify particular respondents. Then there is an ethical wall between the market researchers who report on the overall state of the market, and the client company's marketing and sales section.

My reference to the small size of Acaudio is not intended as an inherent criticism. My original point was that such a small company was unlikely to have the capacity to carry out a meaningful international survey (which does not imply the intention to do so was necessarily inauthentic – Acaudio might have simply overstretched itself).

However, a very small company might well have inherent difficulties in carrying out genuine research which did not leak information about specific respondents to those involved in sales.

Many surveys invite people to offer their email if they wish for feedback or to make themselves available for follow-up interviews – but offer an assurance the email address will not be used for other purposes, and need not be given to participate. Acaudio's survey required identifying information.2 This is a strong indicator that the primary purpose was not scholarly research.



2 The Data Protection Act 2018 concerns personal information:

"Everyone responsible for using personal data has to follow strict rules called 'data protection principles'. They must make sure the information is:

  • used fairly, lawfully and transparently
  • used for specified, explicit purposes
  • used in a way that is adequate, relevant and limited to only what is necessary
  • accurate and, where necessary, kept up to date
  • kept for no longer than is necessary
  • handled in a way that ensures appropriate security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage"
GOV.UK

Acaudio's survey is nominally about research institutes not individual people.

However, it asks questions such as

  • "How satisfied are you with…"
  • "How much time do you spend…"
  • "Do you feel like…"
  • "What are the biggest challenges you face…"
  • "Who do you feel is…"
  • "How effective do you think…"
  • "Do you agree…"
  • "What would you consider..."
  • "How much would you consider…"
  • "Would you be interested in…"
  • "How do you decide…"
  • "What do you hope…"

This is information about a person, moreover a person of known email address:

" 'personal data' means any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier…"

Information Commissioner's Office

So, if information collected by this survey was used for purposes other than the survey itself –

  • say perhaps for identifying sales leads {e.g., "How satisfied are you with the level of awareness people have of your centre / institute?" "How effective do you think your current promotion methods are?"; "How important is building an audience for the work of the research centre / institute?"};
  • and/or profiling potential clients
    • in terms of level of resource that might be available to buy services {e.g., "How much would you consider to be a reasonable amount to spend on promotional activities?"},
    • or priorities for research impact strategies {e.g., "What mediums [sic] would you consider using to promote your research centre / institute?"; "Do you agree it is important to have a dedicated person to take care of promotional activities?"}

– would that not be a breach of UK data protection law?


Educational experiments – making the best of an unsuitable tool?

Can small-scale experimental investigations of teaching carried out in a couple of arbitrary classrooms really tell us anything about how to teach well?


Keith S. Taber


Undertaking valid educational experiments involves (often, insurmountable) challenges, but perhaps this grid (shown larger below) might be useful for researchers who do want to do genuinely informative experimental studies into teaching?


Applying experimental method to educational questions is a bit like trying to use a precision jeweller's screwdriver to open a tin of paint: you may get the tin open eventually, but you will probably have deformed the tool in the process whilst making something of a mess of the job.


In recent years I seem to have developed something of a religious fervour about educational research studies of the kind that claim to be experimental evaluations of pedagogies, classroom practices, teaching resources, and the like. I think this all started when, having previously largely undertaken interpretive studies (for example, interviewing learners to find out what they knew and understood about science topics) I became part of a team looking to develop, and experimentally evaluate, classroom pedagogy (i.e., the epiSTEMe project).

As a former school science teacher, I had taught learners about the basis of experimental method (e.g., control of variables) and I had read quite a number of educational research studies based on 'experiments', so I was pretty familiar with the challenges of doing experiments in education. But being part of a project which looked to actually carry out such a study made a real impact on me in this regard. Well, that should not be surprising: there is a difference between watching the European Cup Final on the TV, and actually playing in the match, just as reading a review of a concert in the music press is not going to impact you as much as being on stage performing.

Let me be quite clear: the experimental method is of supreme value in the natural sciences; and, even if not all natural science proceeds that way, it deserves to be an important focus of the science curriculum. Even in science, the experimental strategy has its limitations. 1 But experiment is without doubt a precious and powerful tool in physics and chemistry that has helped us learn a great deal about the natural world. (In biology, too, but even here there are additional complications due to the variations within populations of individuals of a single 'kind'.)

But transferring experimental method from the laboratory to the classroom to test hypotheses about teaching is far from straightforward. Most of the published experimental studies drawing conclusions about matters such as effective pedagogy, need to be read with substantive and sometimes extensive provisos and caveats; and many of them are simply invalid – they are bad experiments (Taber, 2019). 2

The experiment is a tool that has been designed, and refined, to help us answer questions when:

  • we are dealing with non-sentient entities that are indifferent to outcomes;
  • we are investigating samples or specimens of natural kinds;
  • we can identify all the relevant variables;
  • we can measure the variables of interest;
  • we can control all other variables which could have an effect.

These points simply do not usually apply to classrooms and other learning contexts. 3 (This is clearly so, even if educational researchers often either do not appreciate these differences, or simply pretend they can ignore them.)


The reason why experiments are to be preferred to interpretive ('qualitative') studies is that supposedly experiments can lead to definite conclusions (by testing hypotheses), whereas studies that rely on the interpretation of data (such as classroom observations, interviews, analysis of classroom talk, etc.) are at best suggestive. This would be a fair point when an experimental study genuinely met the control-of-variables requirements for being a true experiment – although often, even then, to draw generalisable conclusions that apply to a wide population one has to be confident one is working with a random or representative sample, and use inferential statistics which can only offer a probabilistic conclusion.

My creed…researchers should prefer to undertake competent work

My proselytising about this issue is based on having come to think that:

  • most educational experiments do not fully control relevant variables, so are invalid;
  • educational experiments are usually subject to expectancy effects that can influence outcomes;
  • many (perhaps most) educational experiments have too few independent units of analysis to allow the valid use of inferential statistics;
  • most large-scale educational experiments can not assure that samples are fully representative of populations, so strictly cannot be generalised;
  • many experiments are rhetorical studies that deliberately compare a condition supposedly being tested (but actually assumed to be effective) with a teaching condition known to fall short of good teaching practice;
  • an invalid experiment tells us nothing that we can rely upon;
  • a detailed case study of a learning context which offers rich description of teaching and learning potentially offers useful insights;
  • given a choice between undertaking a competent study of a kind that can offer useful insights, and undertaking a bad experiment which cannot provide valid conclusions, researchers should prefer to undertake competent work;
  • what makes work scientific is not the choice of methodology per se, but the adoption of a design that fits the research constraints and offers a genuine opportunity for useful learning.

However, experiments seem very popular in education, and often seem to be the methodology of choice for researchers into pedagogy in science education.

Read: Why do natural scientists tend to make poor social scientists?

This fondness of experiments will no doubt continue, so here are some thoughts on how to best draw useful implications from them.

A guide to using experiments to inform education

It seems there are two very important dimensions that can be used to characterise experimental research into teaching – relating to the scale and focus of the research.


Two dimensions used to characterise experimental studies of teaching


Scale of studies

A large-scale study has a large number 'units of analysis'. So, for example, if the research was testing out the value of using, say, augmented reality in teaching about predator-prey relationships, then in such a study there would need to be a large number of teaching-learning 'units' in the augmented learning condition and a similarly large number of teaching-learning 'units' in the comparison condition. What a unit actually is would vary from study to study. Here a unit might be a sequence of three lessons where a teacher teaches the topic to a class of 15-16 year-old learners (either with, or without, the use of augmented reality).

For units of analysis to be analysed statistically they need to be independent from each other – so different students learning together from the same teacher in the same classroom at the same time are clearly not learning independently of each other. (This seems obvious – but in many published studies this inconvenient fact is ignored as it is 'unhelpful' if researchers wish to use inferential statistics but are only working with a small number of classes. 4)
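This point can be illustrated with a small simulation (entirely hypothetical numbers, not drawn from any study discussed here). Suppose there is no teaching effect at all, but each intact class shares some class-level influence (the teacher, the time of day, the mix of students). A naive t-test that treats each student as an independent unit will 'detect' a difference between two such classes far more often than the nominal 5% error rate:

```python
# Sketch: why students in intact classes are not independent units.
# Two classes, NO real difference in teaching effectiveness, but each
# class shares a class-level effect. All parameter values are illustrative.
import random
import statistics

random.seed(42)  # make the simulation reproducible

def naive_t_statistic(n_students=30, class_sd=1.0, student_sd=1.0):
    # Each class shares one class-level effect (teacher, timetable slot, ...)
    c1 = random.gauss(0, class_sd)
    c2 = random.gauss(0, class_sd)
    class1 = [c1 + random.gauss(0, student_sd) for _ in range(n_students)]
    class2 = [c2 + random.gauss(0, student_sd) for _ in range(n_students)]
    # Naive two-sample t statistic treating each STUDENT as a unit
    m1, m2 = statistics.fmean(class1), statistics.fmean(class2)
    v1, v2 = statistics.variance(class1), statistics.variance(class2)
    se = ((v1 + v2) / n_students) ** 0.5
    return abs(m1 - m2) / se

trials = 1000
# Count how often |t| exceeds ~2 (roughly the 5% two-tailed criterion)
false_positives = sum(naive_t_statistic() > 2.0 for _ in range(trials))
print(false_positives / trials)  # far above the nominal 0.05
```

Because the class-level effect is shared by every student in a class, the student-level standard error badly understates the real uncertainty, and the spurious 'significance' rate is many times the nominal 5%. A valid analysis would treat each class mean as a single unit – leaving only one unit per condition, too few for any inferential test.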

Read about units of analysis in research

So, a study which compared teaching and learning in two intact classes can usually only be considered to have one unit of analysis in each condition (making statistical tests completely irrelevant 5, though this does not stop them often being applied anyway). There are a great many small-scale studies in the literature where there are only one or a few units in each condition.

Focus of study

The other dimension shown in the figure concerns the focus of a study. By the focus, I mean whether the researchers are interested in teaching and learning in some specific local context, or want to find out about some general population.

Read about what is meant by population in research

Studies may be carried out in a very specific context (e.g., one school; one university programme) or across a wide range of contexts. That might seem simply to relate to the scale of the study, just discussed. But by focus I mean whether the research question of interest concerns just a particular teaching and learning context (which may be quite appropriate when practitioner-researchers explore their own professional contexts, for example), or is meant to help us learn about a more general situation.


local focus | general focus
Why does school X get such outstanding science examination scores? | Is there a relationship between teaching pedagogy employed and science examination results in English schools?
Will jig-saw learning be a productive way to teach my A level class about the properties of the transition elements? | Is jig-saw learning an effective pedagogy for use in A level chemistry classes?
Some hypothetical research questions relating either to a specific teaching context, or a wider population. (n.b. The research literature includes a great many studies that claim to explore general research questions by collecting data in a single specific context.)

If that seems a subtle distinction between two quite similar dimensions, it is worth noting that the research literature contains a great many studies that take place in one context (small-scale studies) but which claim (implicitly or explicitly) to be of general relevance. So, many authors, peer reviewers, and editors clearly seem to think one can generalise from such small-scale studies.

Generalisation

Generalisation is the ability to draw general conclusions from specific instances. Natural science does this all the time. If this sample of table salt has the formula NaCl, then all samples of table salt do; if the resistance of this copper wire goes up when the wire is heated, the same will be found with other specimens as well. This usually works well when dealing with things we think are 'natural kinds' – that is, where all the examples (all samples of NaCl, all pure copper wires) have the same essence.

Read about generalisation in research

Education deals with teachers, classes, lessons, schools…social kinds that lack that kind of equivalence across examples. You can swap any two electrons in a structure and it will make absolutely no difference. Does anyone think you can swap the teachers between two classes and safely assume it will not have an effect?

So, by focus I mean whether the point of the research is to find out about the research context in its own right (context-directed research) or to learn something that applies to a general category of phenomena (theory-directed research).

These two dimensions, then, lead to a model with four quadrants.

Large-scale research to learn about the general case

In the top-right quadrant is research which focuses on the general situation and is larger-scale. In principle 6 this type of research can address a question such as 'is this pedagogy (teaching resource, etc.) generally effective in this population', as long as

  • the samples are representative of the wider population of interest, and
  • those sampled are randomly assigned to conditions, and
  • the number of units supports statistical analysis.

The sleight of hand employed in many studies is to select a convenience sample (say, two classes of thirteen-year-old students at my local school) yet to claim the research is about, and so offers conclusions about, a wider population (thirteen-year-old learners).

Read about some examples of samples used to investigate populations


When an experiment tests a sample drawn at random from a wider population, then the findings of the experiment can be assumed to (probably) apply (on average) to the population. (Taber, 2019)

Even when a population is properly sampled, it is important not to assume that something which has been found to be generally effective in a population will be effective throughout that population. Schools, classes, courses, learners, topics, etc., vary. If it has been found that, say, teaching the reactivity series through enquiry generally works in the population of English classes of 13-14 year-old students, then a teacher of an English class of 13-14 year-old students might sensibly think this is an approach worth adopting, but cannot assume it will be effective in her classroom, with a particular group of students.

To implement something that has been shown to generally work might be considered research-based teaching, as long as the approach is dropped or modified if indications are it is not proving effective in this particular context. That is, there is nothing (please note, UK Department for Education, and Ofsted) 'research-based' about continuing with a recommended approach in the face of direct empirical evidence that it is not working in your classroom.

Large-scale research to learn about the range of effectiveness

However, even large-scale studies where there are genuinely sufficient units of analysis for statistical analysis may not logically support the kinds of generalisation in the top-right quadrant. For that, researchers need either a random sample of the full population (seldom viable given that people and institutions must have a choice about whether to participate 7), or a sample which is known to be representative of the population in terms of the relevant characteristics – which means knowing a lot about

  • (i) the population,
  • (ii) the sample, and
  • (iii) which variables might be relevant!

Imagine you wanted to undertake a survey of physics teachers in some national context, and you knew you could not reach all of that population, so you needed to survey a sample. How could you possibly know that the teachers in your sample were representative of the wider population on whatever variables might potentially be pertinent to the survey (level of qualification? years of experience? degree subject? type of school/college taught in? gender?…)?

But perhaps a large-scale study that attracts a diverse enough sample may still be very useful if it collects sufficient data about the individual units of analysis, and so can begin to look at patterns in how specific local conditions relate to teaching effectiveness. That is, even if the sample cannot be considered representative enough for statistical generalisation to the population, such a study might be able to offer some insights into whether an approach seems to work well in mixed-ability classes, or top sets, or girls' schools, or in areas of high social deprivation, or…

In practice, there are very few experimental research studies which are large-scale, in the sense of having enough different teachers/classes as units of analysis to sit in either of these quadrants of the chart. Educational research is rarely funded at a level that makes this possible. Most researchers are constrained by the available resources to only work with a small number of accessible classes or schools.

So, what use are such studies for producing generalisable results?

Small-scale research to incrementally extend the range of effectiveness

A single small-scale study can contribute to a research programme exploring the range of application of an innovation, as if it were part of a large-scale study with a diverse sample. But this means such studies need to be explicitly conceptualised and planned as part of such a programme.

At the moment it is common for research papers to say something like

"…lots of research studies, from all over the place, report that asking students to

(i) first copy science texts omitting all the vowels, and then

(ii) re-constitute them in full by working from the reduced text, writing it out with vowels added to produce viable words and sentences,

is an effective way of supporting the learning of science concepts; but no one has yet reported testing this pedagogic method when twelve-year-old students are studying the topic of acids in South Cambridgeshire in a teaching laboratory with moveable stools and West-facing windows.

In this ground-breaking study, we report an experiment to see if this constructivist, active-learning, teaching approach leads to greater science learning among twelve-year-old students studying the topic of acids in South Cambridgeshire in a teaching laboratory with moveable stools and West-facing windows…"

Over time, the research literature becomes populated with studies of enquiry-based science education, jig-saw learning, use of virtual reality, etc., referring to a range of national contexts, variously aged students, diverse science topics, and so on – but this all tends to be piecemeal. A coordinated programme of research could lead to researchers both (a) giving rich descriptions of the contexts used, and (b) selecting contexts strategically to build up a picture across ranges of contexts,

"When there is a series of studies testing the same innovation, it is most useful if collectively they sample in a way that offers maximum information about the potential range of effectiveness of the innovation. There are clearly many factors that may be relevant. It may be useful for replication studies of effective innovations to take place with groups of different socio-economic status, or in different countries with different curriculum contexts, or indeed in countries with different cultural norms (and perhaps very different class sizes; different access to laboratory facilities) and languages of instruction …. It may be useful to test the range of effectiveness of some innovations in terms of the ages of students, or across a range of quite different science topics. Such decisions should be based on theoretical considerations.

Given the large number of potentially relevant variables, there will be a great many combinations of possible sets of replication conditions. A large number of replications giving similar results within a small region of this 'phase space' means each new study adds little to the field. If all existing studies report positive outcomes, then it is most useful to select new samples that are as different as possible from those already tested. …

When existing studies suggest the innovation is effective in some contexts but not others, then the characteristics of samples/context of published studies can be used to guide the selection of new samples/contexts (perhaps those judged as offering intermediate cases) that can help illuminate the boundaries of the range of effectiveness of the innovation."

Taber, 2019

The research programme would be co-ordinated, not by a central agency or authority, but by each contributing researcher/research team (i) taking into account the 'state of play' at the start of their research; (ii) making strategic decisions accordingly when selecting contexts for their own work; and (iii) reporting the context in enough detail to allow later researchers to see how the study fits into the ongoing programme.

This has to be a more scientific approach than simply picking a convenient context where researchers expect something to work well; undertaking a small-scale local experiment (perhaps setting up a substandard control condition to be sure of a positive outcome); and then reporting along the lines of "this widely demonstrated effective pedagogy works here too" – or, if it does not, perhaps putting the study aside without publication. As the philosopher of science Karl Popper reminded us, science proceeds through the testing of bold conjectures: an 'experiment' where you already know the outcome is actually a demonstration. Demonstrations are useful in teaching, but do not contribute to research. What can contribute is an experiment in a context where there is genuine reason to be unsure whether an innovation will be an improvement, and where the comparison condition reflects good teaching practice, so offering a meaningful test.

Small-scale research to inform local practice

Now, I would be the first to admit that I am not optimistic that such an approach will be developed by researchers; and even if it is, it will take time for useful patterns to arise that offer genuine insights into the range of convenience of different pedagogies.

Does this mean that small-scale studies in a single context are really a waste of research resources and an unmerited inconvenience for those working in such contexts?

Well, I have time for studies in my final (bottom-left) quadrant. Schools, classrooms, teachers and classes all vary considerably, and what works well in a highly selective boys-only fee-paying school with a class size of 16 may not be as effective in a co-educational class of 32 mixed-ability students in an under-resourced school in an area of social deprivation – and vice versa, of course! So there is often value in testing out ideas (even recommended 'research-based' ones) in specific contexts to inform practice in those contexts. These are likely to be genuine experiments, as the investigators are really motivated to find out what can improve practice in that context.

Often such experiments will not get published,

  • perhaps because the researchers are teachers with higher priorities than writing for publication;
  • perhaps because it is assumed such local studies are not generalisable (but they could sometimes be moved into the previous category if suitably conceptualised and reported);
  • perhaps because the investigators have not sought permissions for publication (part of the ethics of research) – permissions usually not needed when teachers seek innovations to improve practice as part of their professional work;
  • perhaps because it has been decided that it would be inappropriate to set up control conditions which are not expected to benefit those being asked to participate;
  • but also because when trying out something new in a classroom, one needs to be open to make ad hoc modifications to, or even abandon, an innovation if it seems to be having a deleterious effect.

Evaluation of effectiveness here usually comes down to professional judgement, which might, in part, rely on the researcher's close (and partially tacit) familiarity with the research context – rather than statistical testing, which assumes a large random sample of a population, being misused to generalise small, non-random, local results to that population.

I am here describing 'action research', which is highly useful for informing local practice, but which is not ideally suited for formal reporting in academic journals.

Read about action research

So, I suspect there may be an irony here.

There may be a great many small-scale experiments undertaken in schools and colleges which inform good teaching practice in their contexts without ever being widely reported; whilst there are a great many similar-scale, often 'forced', experiments, carried out by visiting researchers with little personal stake in the research context, reporting the general effectiveness of teaching approaches based on a misuse of statistics. I wonder which approach best reflects the true spirit of science?

Source cited:


Notes:

1 For example:

Even in the natural sciences, we can never be absolutely sure that we have controlled all relevant variables (after all, if we already knew for sure which variables were relevant, we would not need to do the research). But usually existing theory gives us a pretty good idea what we need to control.

Experiments are never a simple test of the specified hypothesis, as the experiment is likely to depend upon the theory of instrumentation and the quality of instruments. Consider an extreme case such as the discovery of the Higgs boson at CERN: the conclusions relied on complex theory that informed the design of the apparatus, very challenging precision engineering, complex mathematical models for interpreting data, and corresponding computer software specifically programmed to carry out that analysis.

The experimental results are a test of a hypothesis (e.g., that a certain particle would be found at events below some calculated energy level) subject to the provisos that

  • the theory of the instrument and its design is correct; and
  • the materials of the apparatus (an apparatus as complex and extensive as a small city) have no serious flaws; and
  • the construction of the instrumentation precisely matches the specifications; and
  • the modelling of how the detectors will function (including the decay in their performance over time) is accurate; and
  • the analytical techniques designed to interpret the signals are valid; and
  • the programming of the computers carries out the analysis as intended.

It almost requires an act of faith to have confidence in all this (and I am confident there is no one scientist anywhere in the world with a good enough understanding of, and familiarity with, all these aspects of the experiment to be able to give assurances on all these areas!)


CREST {Critical Reading of Empirical Studies} evaluation form: when you read a research study, do you consider the cumulative effects of doubts you may have about different aspects of the work?

I would hope, at least, that professional scientists and engineers might be a little more aware than many students of this complex chain of argumentation needed to support robust conclusions – for students often seem overconfident in the overall value of research conclusions, given any doubts they may have about aspects of the work reported.

Read about the Critical Reading of Empirical Studies Tool


Galileo Galilei was one of the first people to apply the telescope to study the night sky (image by Dorothe from Pixabay)


A historical example is Galileo's observations of astronomical phenomena such as the Jovian moons (he spotted the four largest: Io, Europa, Ganymede and Callisto) and the irregular surface of the moon. Some of his contemporaries rejected these findings on the basis that they were made using an apparatus, the new-fangled telescope, that they did not trust. Whilst this is now widely seen as arrogant and/or ignorant, arguably, if you did not understand how a telescope could magnify, and did not trust the quality of the lenses not to produce distortions, then it was quite reasonable to be sceptical of findings which ran counter to a theory of the 'heavens' that had been generally accepted for many centuries.


2 I have discussed a number of examples on this site. For example:

Falsifying research conclusions: You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.

Why ask teachers to 'transmit' knowledge…if you believe that "knowledge is constructed in the minds of students"?

Shock result: more study time leads to higher test scores (But 'all other things' are seldom equal)

Experimental pot calls the research kettle black: Do not enquire as I do, enquire as I tell you

Lack of control in educational research: Getting that sinking feeling on reading published studies


3 For a detailed discussion of these and other challenges of doing educational experiments, see Taber, 2019.


4 Consider these two situations.

A researcher wants to find out if a new textbook 'Science for the modern age' leads to more learning among the Grade 10 students she teaches than the traditional book 'Principles of the natural world'. Imagine there are fifty Grade 10 students already divided into two classes. The teacher flips a coin and randomly assigns one of the classes to the innovative book, the other being assigned the traditional book by default. We will assume she has a suitable test to assess each student's learning at the end of the experiment.

The teacher teaches the two classes the same curriculum through the same scheme of work. She presents a mini-lecture to a class, then sets them some questions to discuss using the textbook. At the end of the (three-part!) lesson, she leads a class discussion drawing on students' suggested answers.

Being a science teacher, who believes in replication, she decides to repeat the exercise the following year. Unfortunately there is a pandemic, and all the students are sent into lockdown at home. So, the teacher assigns the fifty students by lot into two groups, and emails one group the traditional book, and the other the innovative text. She teaches all the students online as one cohort: each lesson she gives them a mini-lecture, then sets some reading from their (assigned) book and a set of questions to work through using the text, asking them to upload their individual answers for her to see.

With regard to experimental method, in the first cohort she has only two independent units of analysis – so she may note that the average outcome scores are higher in one group, but cannot read too much into that. However, in the second year, the fifty students can be considered to be learning independently, and as they have been randomly assigned to conditions, she can treat the assessment scores as coming from 25 units of analysis in each condition (and so may sensibly apply statistics to see if there is a statistically significant difference in outcomes).
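The contrast can be sketched in a few lines of code. The second-year design, with fifty independently-learning students assigned by lot, supports an independent-samples t-statistic; the score lists below are invented purely for illustration:

```python
import math
import random
from statistics import mean, stdev

random.seed(2021)

# Fifty students assigned by lot to conditions (as in the lockdown cohort):
students = list(range(50))
random.shuffle(students)
innovative_group, traditional_group = students[:25], students[25:]

# Hypothetical end-of-topic test scores (out of 10), one per student:
innovative_scores = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6, 7, 9, 8, 6, 7,
                     8, 7, 6, 9, 7, 8, 7, 6, 8, 7]
traditional_scores = [6, 7, 5, 8, 6, 7, 5, 6, 7, 5, 6, 8, 7, 5, 6,
                      7, 6, 5, 8, 6, 7, 6, 5, 7, 6]

def t_statistic(a, b):
    """Student's t for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 +
                  (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# 25 independent units per condition: a t-test is defensible here.
print(t_statistic(innovative_scores, traditional_scores))

# In the first-year design there were two intact classes: however many
# students sat the test, there were only *two* independent units of
# analysis (one class average per condition) - no basis for inference.
```

The point is that the unit of analysis, not the number of test scripts collected, determines whether the inferential statistics are defensible: the same fifty scores justify a t-test in one design and not in the other.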


5 Inferential statistical tests are usually used to see if the difference in outcomes across conditions is 'significant'. Perhaps the average score in a class with an innovation is 5.6, compared with an average score in the control class of 5.1. The average score is higher in the experimental condition, but is the difference enough to matter?

Well, actually, if the question is whether the difference is big enough to be likely to make a difference in practice, then researchers should calculate the 'effect size', which will suggest whether the difference found should be considered small, moderate or large. This should ideally be calculated regardless of whether inferential statistics are being used.

Inferential statistical tests are often used to see if the result is generalisable to the wider population – but, as suggested above, this is strictly only valid if the population of interest has been randomly sampled – which virtually never happens in educational studies, as it is usually not feasible.

Often researchers will still do the calculation, based on the sets of outcome scores in the two conditions, to see if they can claim a statistically significant difference – but the test only indicates how likely or unlikely the observed difference between the outcomes would be by chance, given that the units of analysis have been randomly assigned to the conditions. So, if there are 50 learners each randomly assigned to the experimental or control condition, this makes sense. That is sometimes the case, but nearly always the researchers work with existing classes and do not have the option of randomly mixing the students up. [See the example in note 4.] In such a situation, the statistics are not informative. (That does not stop them often being reported in published accounts as if they were.)


6 That is, if it is possible to address such complications as participant expectations, and equitable teacher familiarity with the different conditions they are assigned to (Taber, 2019).

Read about expectancy effects


7 A usual ethical expectation is that participants voluntarily (without duress) offer informed consent to participate.

Read about voluntary informed consent


Is your heart in the research?

Someone else's research, that is


Keith S. Taber


Imagine you have a painful and debilitating illness. Your specialist tells you there is no conventional treatment known to help. However, there is a new – experimental – procedure: a surgery that may offer relief. But it has not yet been fully tested. If you are prepared to sign up for a study to evaluate this new procedure, then you can undergo surgery.

You are put under and wheeled into the operating theatre. Whilst you experience – rather, do not experience – the deep, sleepless rest of anaesthesia, the surgeon saws through your breastbone, prises open your ribcage with a retractor (hopefully avoiding breaking any ribs), reaches in, and gently lifts up your heart.

The surgeon pauses, perhaps counts to five, then carefully replaces your heart between the lungs. The ribcage is closed, and you are sewn up without any actual medical intervention having taken place. You had been randomly assigned to the control group.


How can we test whether surgical interventions are really effective without blind controls?

Is it right to carry out sham operations on sick people just for the sake of research?

Where is the balance of interests?

(Image from Pixabay)


Research ethics

A key aspect of planning, executing and reviewing research is ethical scrutiny. Planning, obviously, needs to take into account ethical considerations and guidelines. But even the best-laid plans 'of mice and men' (or of, say, people investigating mice) may not allow for all eventualities (after all, if we knew for sure what was going to happen in a study, it would not be research – and it would be unethical to spend precious public resources on it), so the ethical imperative does not stop once we have approval and permissions. And even then, we may find that we cannot fully mitigate unexpected eventualities – which is something to be reported and discussed to help inform future research.

Read about research ethics

When preparing students setting out on research, instruction about research ethics is vital. It is possible to teach about rules, and policies, and guidelines and procedures – but real research contexts are often complex, and ethical thinking cannot be algorithmic or a matter of adopting slogans and following heuristics. In my teaching I would include discussion of past cases of research studies that raised ethical questions for students to discuss and consider.

One might think that, as research ethics is so important, it would be difficult to find many published studies which were not exemplars of good practice – but attitudes to, and guidance on, ethics have developed over time, and there are many past studies which, if not clearly unethical in today's terms, at least present problematic cases. (That is without the 'doublethink' that allows some contemporary researchers, in a single paper, both to claim that active learning methods should be studied because passive learning activities are known to be ineffective, and then to report how they required teachers to instruct control classes through passive learning.)

Indeed, ethical decision-making may not always be straight-forward – as it often means balancing different considerations, and at a point where any hoped-for potential benefits of the research must remain uncertain.

Pretending to operate on ill patients

I recently came across an example of a medical study which I thought raised some serious questions, and which I might well have included in my teaching of research ethics as a case for discussion, had I known about it before I retired.

The research apparently involved surgeons opening up a patient's ribcage (not a trivial procedure), and lifting out the person's heart in order to carry out a surgical intervention…or not,

"In the late 1950s and early 60s two different surgical teams, one in Kansas City and one in Seattle, did double-blind trials of a ligation procedure – the closing of a duct or tube using a clip – for very ill patients suffering from severe angina, a condition in which pain radiates from the chest to the outer extremities as a result of poor blood supply to the heart. The surgeons were not told until they arrived in the operating theatre which patients were to receive a real ligation and which were not. All the patients, whether or not they were getting the procedure, had their chest cracked open and their heart lifted out. But only half the patients actually had their arteries rerouted so that their blood could more efficiently bathe its pump …"

Slater, 2018

The quote is taken from a book by Lauren Slater which sets out a history of drug use in psychiatry. Slater is a psychotherapist who has written a number of books about aspects of mental health conditions and treatments.

Fair testing

In order to make a fair experiment, the double-blind procedure sought to treat the treatment and control groups the same in all respects, apart from the actual ligation of selected blood vessels that comprised the mooted intervention. The patients did not know (at least, in one of the studies) that they might not have the real operation. Their physicians were not told who was getting the treatment. Even the surgeons only found out who was in each group when the patient arrived in theatre.

It was necessary for those in the control group to think they were having an intervention, and to undergo the sham surgery, so that they formed a fair comparison with those who got the ligation.

Read about control of variables

It was necessary to have a double-blind study (neither the patients themselves, nor the physicians looking after them, were told which patients were, and which were not, getting the treatment), because there is a great deal of research showing that people's beliefs and expectations make substantial differences to outcomes. This is a real problem in educational research when researchers want to test classroom practices such as new teaching schemes, resources or innovative pedagogies (Taber, 2019). The teacher almost certainly knows whether she is teaching the experimental or control group, and usually the students have a pretty good idea. (If every previous lesson has been based on teacher presentations and note-taking, and suddenly they are doing group discussion work and making videos, they are likely to notice.)

Read about expectancy effects

It was important to undertake a study because there was no clear objective evidence to show whether the new procedure actually improved patient outcomes (or possibly even made matters worse). Doctors reported seeing treated patients do better – but could only guess how they might have done without surgery. Without proper studies, many thousands of people might ultimately undergo an ineffective surgery, with all the associated risks and costs, without getting any benefit.

Simply comparing treated patients with matched untreated patients would not do the job, as there can be a strong placebo effect of believing one is getting a treatment. (It is likely that at least some alternative therapies largely work because a practitioner with good social skills spends time engaging with the patient and their concerns, and the client expects a positive outcome.)

If any positive effects of heart surgery were due to the placebo effect, then perhaps a highly coloured sugar pill prescribed with confidence by a physician could have the same effect without operating theatres, surgical teams, hospital stays… (For that matter, a faith healer who pretended to operate without actually breaking the skin, and revealed a piece of material {perhaps concealed in a pocket or sleeve} presented as an extracted mass of diseased tissue or a foreign body, would be just as effective if the patient believed in the procedure.)

So, I understood the logic here.

Do no harm

All the same – this seemed an extreme intervention. Even today, anaesthesia is not very well understood in detail: it involves giving a patient drugs that could kill them in carefully controlled sub-lethal doses – when how much would actually be lethal (and what would be insufficient to fully sedate) varies from person to person. There are always risks involved.


"All the patients, whether or not they were getting the procedure had their chest cracked open and their heart lifted out."

(Image by Starllyte from Pixabay)


Open heart surgery exposes someone to infection risks. Cracking open the chest is a big deal. It can take two months for the disrupted tissues to heal. Did the research really require opening up the chest and lifting the heart for the control group?

Could this really ever have been considered ethical?

I might have been much more cynical had I not known of other, hm, questionable medical studies. I recall hearing a BBC radio documentary in the 1990s about American physicians who deliberately gave patients radioactive materials without their knowledge, just to explore the effects. Perhaps most infamously there was the Tuskegee Syphilis study where United States medical authorities followed the development of disease over decades without revealing the full nature of the study, or trying to treat any of those infected. Compared with these violations, the angina surgery research seemed tame.

But do not believe everything you read…

According to the notes at the back of Slater's book, her reference was another secondary source (Moerman, 2002) – that is, someone writing about what the research reports said, not the actual 'primary' accounts in the research journals.

So, I looked on-line for the original accounts. I found a 1959 study, by a team from the University of Washington School of Medicine. They explained that:

"Considerable relief of symptoms has been reported for patient with angina pectoris subjected to bilateral ligation of the internal mammary arteries. The physiologic basis for the relief of angina afforded by this rather simple operation is not clear."

Cobb, Thomas, Dillard, Merendino & Bruce, 1959

It was not clear why clamping these blood vessels in the chest should make a substantial difference to blood flow to the heart muscles – despite various studies which had subjected a range of dogs (who were not complaining of the symptoms of angina, and did not need any surgery) to surgical interventions followed by invasive procedures in order to measure any modifications in blood flow (Blair, Roth & Zintel, 1960).

Would you like your aorta clamped, and the blood drained from the left side of your heart, for the sake of a research study?

That raises another ethical issue – the extent of pain and suffering and morbidity it is fair to inflict on non-human animals (which are never perfect models for human anatomy and physiology) to progress human medicine. Some studies explored the details of blood circulation in dogs. Would you like your aorta clamped, and the blood drained from the left side of your heart, for the sake of a research study? Moreover, in order to test the effectiveness of the ligation procedure, in some studies healthy dogs had to have the blood supply to the heart muscles disrupted to give them similarly compromised heart function as the human angina sufferers. 1

But, hang on a moment. I think I passed over something rather important in that last quote: "this rather simple operation"?

"Considerable relief of symptoms has been reported for patient with angina pectoris subjected to bilateral ligation of the internal mammary arteries. The physiologic basis for the relief of angina afforded by this rather simple operation is not clear."

Cobb and colleagues' account of the procedure contradicted one of my assumptions,

 At the time of operation, which was performed under local anesthesia [anaesthesia], the surgeon was handed a randomly selected envelope, which contained a card instructing him whether or not to ligate the internal mammary arteries after they had been isolated.

Cobb et al, 1959

It seems my inference that the procedure was carried out under general anaesthetic was wrong. Never assume! Surgery under local anaesthetic is not a trivial enterprise, but carries much less risk than general anaesthetic.
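The envelope procedure Cobb and colleagues describe amounts to simple random allocation, with the assignment kept concealed from everyone until the surgeon opens the envelope in theatre. As a minimal illustrative sketch of that allocation logic (the Python code and all names here are my own invention, not anything the 1959 team used):

```python
import random

def assign_envelopes(n_patients, seed=None):
    """Prepare a shuffled set of 'cards' (one per sealed envelope),
    splitting patients as evenly as possible between the real
    ligation and the sham procedure, so that no one can predict
    which assignment any given patient will receive.
    Purely illustrative - not the 1959 study's actual method."""
    rng = random.Random(seed)
    cards = (["ligate"] * (n_patients // 2)
             + ["sham"] * (n_patients - n_patients // 2))
    rng.shuffle(cards)  # the order, not the counts, is randomised
    return cards

# With 17 patients this gives 8 'ligate' and 9 'sham' envelopes -
# the same split Cobb's team reported.
envelopes = assign_envelopes(17)
print(envelopes.count("ligate"), envelopes.count("sham"))  # prints: 8 9
```

The point of the sealed envelope is that even the surgeon remains ignorant of the assignment until after the arteries have been isolated, which is what makes the allocation compatible with a double-blind evaluation.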

Yet, surely, even back then, no surgeon was going to open up the chest and handle the heart under a local anaesthetic? Cobb and colleagues wrote:

"The surgical procedures commonly used in the therapy of coronary-artery disease have previously been "major" operations utilizing thoracotomy and accompanied by some morbidity and a definite mortality. … With the advent of internal-mammary-artery ligation and its alleged benefit, a unique opportunity for applying the principles of a double-blind evaluation to a surgical procedure has been afforded

Cobb, Thomas, Dillard, Merendino & Bruce, 1959

So, the researchers were arguing that, previously, surgical interventions for this condition were major operations that did involve opening up the chest (thorax) – thoracotomy – where sham surgery would not have been ethical; but the new procedure they were testing – "this rather simple operation" was different.

Effects of internal-mammary-artery ligation on 17 patients with angina pectoris were evaluated by a double-blind technic. Eight patients had their internal mammary arteries ligated; 9 had skin incisions only. 

Cobb et al, 1959

They describe "a 'placebo' procedure consisting of parasternal skin incisions" – that is, some cuts were made into the skin next to the breast bone. Skin incisions are somewhat short of open heart surgery.

The description given by the Kansas team (from the Departments of Medicine and Surgery, University of Kansas Medical Center, Kansas City) also differs from Slater's third-hand account in this important way:

"The patients were operated on under local anesthesia. The surgeon, by random sampling, selected those in whom bilateral internal mammary artery and vein ligation (second interspace) was to be carried out and those in whom a sham procedure was to be performed. The sham procedure consisted of a similar skin incision with exposure of the internal mammary vessels, but without ligation."

Dimond, Kittle & Crockett, 1960

This description of the surgery seemed quite different from that offered by Slater.

These teams seemed to be reporting a procedure that could be carried out without exposing the lungs or the heart and opening their protective covers ("in this technique…the pericardium and pleura are not entered or disturbed", Glover, et al, 1957), and which could be superficially forged by making a few cuts into the skin.


"The performance of bilateral division of the internal mammary arteries as compared to other surgical procedures for cardiac disease is safe, simple and innocuous in capable hands."

Glover, Kitchell, Kyle, Davila & Trout, 1958

The surgery involved making cuts into the skin of the chest to access, and close off, arteries taking blood to (more superficial) chest areas in the hope it would allow more to flow to the heart muscles; the sham surgery, the placebo, involved making similar incisions, but without proceeding to change the pattern of arterial blood flow.

The sham surgery did not require general anaesthesia and involved relatively superficial wounds – and offered a research technique that did not need to cause suffering to, and the sacrifice of, perfectly healthy dogs. So, that's all ethical then?

The first-hand research reports at least give a different impression of the balance of costs and potential benefits to stakeholders than I had originally drawn from Lauren Slater's account.

Getting consent for sham surgery

A key requirement for ethical research with human participants is obtaining voluntary informed consent. Unlike dogs, humans can assent to research procedures, and it is generally considered that research should not be undertaken without such consent.

Read about voluntary informed consent

Of course, there is nuance and complication. The kind of research where investigators drop large denomination notes to test the honesty of passers-by – where the 'participants' are in a public place and will not be identified or identifiable – is not usually seen as needing such consent (which would clearly undermine any possibility of getting authentic results). But is it acceptable to observe people using public toilets without their knowledge and consent (as was described in one published study I used as a teaching example)?

The extent to which a lay person can fully understand the logic and procedures explained to them when seeking consent can vary. The extent to which most participants would need, or even want, to know the full details of the study can vary. When children of various ages are involved, the extent to which consent can be given on their behalf by parents or teachers raises interesting questions.


"I'm looking for volunteers to have a procedure designed to make it look like you've had surgery"

Image by mohamed_hassan from Pixabay


There is much nuance and there are many complications – and this is an area to which researchers need to give very careful consideration.

  • How many ill patients would volunteer for sham surgery to help someone else's research?
  • Would that answer change, if the procedure being tested would later be offered to them?
  • What about volunteering for a study where you have a 50-50 chance of getting the real surgery or the placebo treatment?

In Cobb's study, the participants had all volunteered – but we might wonder if the extent of the information they were given amounted to what was required for informed consent,

The subjects were informed of the fact that this procedure had not been proved to be of value, and yet many were aware of the enthusiastic report published in the Reader's Digest. The patients were told only that they were participating in an evaluation of this operation; they were not informed of the double-blind nature of the study.

Cobb et al, 1959

So, it seems the patients thought they were having an operation that had been mooted to help angina sufferers – and indeed some of them were, but others just got taken into surgery to get a few wounds that suggested something more substantive had been done.

Was that ethical? (I doubt it would be allowed anywhere today.)

The outcome of these studies was that although the patients getting the ligation surgery did appear to get relief from their angina – so did those just getting the skin incisions. The placebo seemed just as good as the re-plumbing.

In hindsight, does this make the studies more worthwhile and seem more ethical? This research has probably prevented a great many people having an operation to have some of their vascular system blocked when that does not seem to make any difference to angina. Does that advance in medical knowledge justify the deceit involved in leading people to think they would get an experimental surgical treatment when they might just get an experimental control treatment?


Ethical principles and guidelines can help us judge the merits of a study

Coda – what did the middle man have to say?

I wondered how a relatively minor sham procedure under local anaesthetic became characterised as "the patients, whether or not they were getting the procedure had their chest cracked open and their heart lifted out" – a description which gave a vivid impression of a major intervention.


The heart is pretty well integrated into the body – how easy is it to lift an intact, fully connected, working heart out of position?

Image by HANSUAN FABREGAS from Pixabay


I wondered to what extent it would even be possible to lift the heart out from the chest whilst it remained connected with the major vessels passing the blood it was pumping, and the nerves supplying it, and the vessels supplying blood to its own muscles (the ones that were considered compromised enough to make the treatment being tested worth considering). Some sources I found on-line referred to the heart being 'lifted' during open-heart procedures to give the surgeon access to specific sites: but that did not mean taking the heart out of the body. Having the heart 'lifted out' seemed more akin to Aztec sacrificial rites than medical treatment.

Although all surgery involves some risk, the actual procedure being investigated seemed of a relatively routine nature. I actually attended a 'minor' operation which involved cutting into the chest when my late wife was prepared for kidney dialysis. Usually a site for venal access is prepared in the arm well in advance, but it was decided my wife needed to be put on dialysis urgently. A temporary hole was cut into her neck to allow the surgeon to connect a tube (a central venous catheter) to a vein, and another hole into her chest so that the catheter would exit in her chest, where the tap could be kept sterile, bandaged to the chest. This was clearly not considered a high risk operation (which is not to say I think I could have coped with having this done to me!) as I was asked by the doctors to stay in the room with my wife during the procedure, and I did not need to 'scrub' or 'gown up'.

Bilateral internal mammary artery ligation seemed a procedure on that kind of level, accessing blood vessels through incisions made in the skin. However, if Lauren Slater had read about some of the earlier procedures that did require opening the chest, or if she had read the papers describing how the dogs were investigated to trace blood flow through connected vessels, measure changes in flow, and prepare them for induced heart conditions, I could appreciate the potential for confusion. Yet she did not cite the primary research, but rather Daniel Moerman, an Emeritus Professor of Anthropology at the University of Michigan-Dearborn, who has written a book about placebo treatments in medicine.

Moerman does write about the bilateral internal mammary artery ligation, and the two sham surgery studies I found in my search. Moerman describes the operation:

"It was quite simple, and since the arteries were not deep in the body, could be performed under local anaesthetic."

Moerman, 2002

He also refers to the subjective reports on one of the patients assigned to the placebo condition in one of the studies, who claimed to feel much better immediately after the procedure:

"This patient's arteries were not ligated…But he did have two scars on his chest…"

Moerman, 2002

But nobody cracked open his chest, and no one handled his heart.

There are still ethical issues here, but understanding the true (almost superficial) nature of the sham surgery clearly changes the balance of concerns. If there is a moral to this article, it is perhaps the importance of being fully informed before reaching judgement about the ethics of a research study.


Work cited:
  • Blair, C. R., Roth, R. F., & Zintel, H. A. (1960). Measurement of coronary artery blood-flow following experimental ligation of the internal mammary artery. Annals of Surgery, 152(2), 325.
  • Cobb, L. A., Thomas, G. I., Dillard, D. H., Merendino, K. A., & Bruce, R. A. (1959). An evaluation of internal-mammary-artery ligation by a double-blind technic. New England Journal of Medicine, 260(22), 1115-1118.
  • Dimond, E. G., Kittle, C. F., & Crockett, J. E. (1960). Comparison of internal mammary artery ligation and sham operation for angina pectoris. The American Journal of Cardiology, 5(4), 483-486.
  • Glover, R. P., Davila, J. C., Kyle, R. H., Beard, J. C., Trout, R. G., & Kitchell, J. R. (1957). Ligation of the internal mammary arteries as a means of increasing blood supply to the myocardium. Journal of Thoracic Surgery, 34(5), 661-678. https://doi.org/10.1016/S0096-5588(20)30315-9
  • Glover, R. P., Kitchell, J. R., Kyle, R. H., Davila, J. C., & Trout, R. G. (1958). Experiences with myocardial revascularization by division of the internal mammary arteries. Diseases of the Chest, 33(6), 637-657. https://doi.org/10.1378/chest.33.6.637
  • Moerman, D. E. (2002). Meaning, Medicine, and the "Placebo Effect". Cambridge: Cambridge University Press.
  • Slater, L. (2018). The Drugs that Changed our Minds: The history of psychiatry in ten treatments. London: Simon & Schuster.
  • Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. https://doi.org/10.1080/03057267.2019.1658058 [Download this paper.]


Note:

1 Finding out whether the ligation procedure protected a dog required stressing the blood supply to the heart itself,

"An attempt has been made to evaluate the degree of protection preliminary ligation of the internal mammary artery may afford the experimental animal when subjected to the production of sudden, acute myocardial infarction by ligation of the anterior descending coronary artery at its origin. …

It was hoped that survival in the control group would approximate 30 per cent so that infarct size could be compared with that of the "protected" group of animals. The "protected" group of dogs were treated in the same manner but in these the internal mammary arteries were ligated immediately before, at 24 hours, and at 48 hours before ligation of the anterior descending coronary.

In 14 control dogs, the anterior descending coronary artery with the aforementioned branch to the anterolateral aspect of the left ventricle was ligated. Nine of these animals went into ventricular fibrillation and died within 5 to 20 minutes. Attempts to resuscitate them by defibrillation and massage were to no avail. Four others died within 24 hours. One dog lived 2 weeks and died in pulmonary edema."

Glover, Davila, Kyle, Beard, Trout & Kitchell, 1957

Pulmonary oedema involves a build-up of fluid in the lungs that restricts gaseous exchange and prevents effective breathing. The dog that survived longest (if it was kept conscious) will have experienced death as if by slow suffocation or drowning.

Why ask teachers to 'transmit' knowledge…

…if you believe that "knowledge is constructed in the minds of students"?


Keith S. Taber


While the students in the experimental treatment undertook open-ended enquiry, the learners in the control condition undertook practical work to demonstrate what they had already been told was the case – a rhetorical exercise that reflected the research study they were participating in


A team of researchers chose to compare a teaching approach they believed met the requirements for good science instruction, and which they knew had already been demonstrated to be effective pedagogy in other studies, with teaching they believed was not suitable for bringing about conceptual change.
(Ironically, they chose a research design more akin to the laboratory activities in the substandard control condition, than to the open-ended enquiry that was part of the pedagogy they considered effective!)

An imaginary conversation 1 with a team of science education researchers.

When we critically read a research paper, we interrogate the design of the study, and the argument for new knowledge claims that are being made. Authors of research papers need to anticipate the kinds of questions readers (editors, reviewers, and the wider readership on publication) will be asking as they try to decide if they find the study convincing.

Read about writing-up research

In effect, there is an asynchronous conversation.

Here I engage in 'an asynchronous conversation' with the authors of a research paper I was interrogating:

What was your study about?

"This study investigated the effect of the Science Writing Heuristic (SWH) approach on grade 9 students' understanding of chemical change and mixture concepts [in] a Turkish public high school."

Kingir, Geban & Gunel, 2013

I understand this research was set up as a quasi-experiment – what were the conditions being compared?

"Students in the treatment group were instructed by the SWH approach, while those in the comparison group were instructed with traditionally designed chemistry instruction."

Kingir, Geban & Gunel, 2013

Constructivism

Can you tell me about the theoretical perspective informing this study?

"Constructivism is increasingly influential in guiding student learning around the world. However, as knowledge is constructed in the minds of students, some of their commonsense ideas are personal, stable, and not congruent with the scientifically accepted conceptions… Students' misconceptions [a.k.a. alternative conceptions] and learning difficulties constitute a major barrier for their learning in various chemistry topics"

Kingir, Geban & Gunel, 2013

Read about constructivist pedagogy

Read about alternative conceptions

'Traditional' teaching versus 'constructivist' teaching

So, what does this suggest about so-called traditional teaching?

"Since prior learning is an active agent for student learning, science educators have been focused on changing these misconceptions with scientifically acceptable ideas. In traditional science teaching, it is difficult for the learners to change their misconceptions…According to the conceptual change approach, learning is the interaction between prior knowledge and new information. The process of learning depends on the degree of the integration of prior knowledge with the new information.2"

Kingir, Geban & Gunel, 2013

And does the Science Writing Heuristic Approach contrast to that?

"The Science Writing Heuristic (SWH) approach can be used to promote students' acquisition of scientific concepts. The SWH approach is grounded on the constructivist philosophy because it encourages students to use guided inquiry laboratory activities and collaborative group work to actively negotiate and construct knowledge. The SWH approach successfully integrates inquiry activities, collaborative group work, meaning making via argumentation, and writing-to-learn strategies…

The negotiation activities are the central part of the SWH because learning occurs through the negotiation of ideas. Students negotiate meaning from experimental data and observations through collaboration within and between groups. Moreover, the student template involves the structure of argumentation known as question, claim, and evidence. …Reflective writing scaffolds the integration of new ideas with prior learning. Students focus on how their ideas changed through negotiation and reflective writing, which helps them confront their misconceptions and construct scientifically accepted conceptions"

Kingir, Geban & Gunel, 2013

What is already known about SWH pedagogy?

It seems like the SWH approach should be effective at supporting student learning. So, has this not already been tested?

"There are many international studies investigating the effectiveness of the SWH approach over the traditional approach … [one team] found that student-written reports had evidence of their science learning, metacognitive thinking, and self-reflection. Students presented reasons and arguments in the meaning-making process, and students' self-reflections illustrated the presence of conceptual change about the science concepts.

[another team] asserted that using the SWH laboratory report format in lieu of a traditional laboratory report format was effective on acquisition of scientific conceptions, elimination of misconceptions, and learning difficulties in chemical equilibrium.

[Another team] found that SWH activities led to greater understanding of grade 6 science concepts when compared to traditional activities. The studies conducted at the postsecondary level showed similar results as studies conducted at the elementary level…

[In two studies] it was demonstrated that the SWH approach can be effective on students' acquisition of chemistry concepts. SWH facilitates conceptual change through a set of argument-based inquiry activities. Students negotiate meaning and construct knowledge, reflect on their own understandings through writing, and share and compare their personal meanings with others in a social context"

Kingir, Geban & Gunel, 2013

What was the point of another experimental test of SWH?

So, it seems that from a theoretical point of view, so-called traditional teaching is likely to be ineffective in bringing about conceptual learning in science, whilst a constructivist approach based on the Science Writing Heuristic is likely to support such learning. Moreover, you are aware of a range of existing studies which suggest that in practice the Science Writing Heuristic is indeed an effective basis for science teaching.

So, what was the point of your study?

"The present study aimed to investigate the effect of the SWH approach compared to traditional chemistry instruction on grade 9 students' understanding of chemical change and mixture concepts."

Kingir, Geban & Gunel, 2013

Okay, I would certainly accept that just because a teaching approach has been found effective with one age group, or in one topic, or in one cultural context, we cannot assume those findings can be generalised and will necessarily apply in other teaching contexts (Taber, 2019).

Read about generalisation from studies

What happened in the experimental condition?

So, what happened in the two classes taught in the experimental condition?

"The teacher asked students to form their own small groups (n=5) and introduced to them the SWH approach …they were asked to suggest a beginning question…, write a claim, and support that claim with evidence…

they shared their questions, claims, and evidence in order to construct a group question, claim, and evidence. …each group, in turn, explained their written arguments to the entire class. … the rest of the class asked them questions or refuted something they claimed or argued. …the teacher summarized [and then] engaged students in a discussion about questions, claims, and evidence in order to make students aware of the meaning of those words. The appropriateness of students' evidence for their claims, and the relations among questions, claims, and evidence were also discussed in the classroom…

The teacher then engaged students in a discussion about …chemical change. First, the teacher attempted to elicit students' prior understanding about chemical change through questioning…The teacher asked students to write down what they wanted to learn about chemical change, to share those items within their group, and to prepare an investigation question with a possible test and procedure for the next class. While students constructed their own questions and planned their testing procedure, the teacher circulated through the groups and facilitated students' thinking through questioning…

Each group presented their questions to the class. The teacher and the rest of the class evaluated the quality of the question in relation to the big idea …The groups' procedures were discussed and revised prior to the actual laboratory investigation…each group tested their own questions experimentally…The teacher asked each student to write a claim about what they thought happened, and support that claim with the evidence. The teacher circulated through the classroom, served as a resource person, and asked …questions

…students negotiated their individual claims and evidence within their groups, and constructed group claims and evidence… each group…presented … to the rest of the class."

Kingir, Geban & Gunel, 2013

What happened in the control condition?

Okay, I can see that the experimental groups experienced the kind of learning activities that both educational theory and previous research suggests are likely to engage them and develop their thinking.

So, what did you set up to compare with the Science Writing Heuristic Approach as a fair test of its effectiveness as a pedagogy?

"In the comparison group, the teacher mainly used lecture and discussion[3] methods while teaching chemical change and mixture concepts. The chemistry textbook was the primary source of knowledge in this group. Students were required to read the related topic from the textbook prior to each lesson….The teacher announced the goals of the lesson in advance, wrote the key concepts on the board, and explained each concept by giving examples. During the transmission of knowledge, the teacher and frequently used the board to write chemical formula[e] and equations and draw some figures. In order to ensure that all of the students understood the concepts in the same way, the teacher asked questions…[that] contributed to the creation of a discussion[3] between teacher and students. Then, the teacher summarized the concepts under consideration and prompted students to take notes. Toward the end of the class session, the teacher wrote some algorithmic problems [sic 4] on the board and asked students to solve those problems individually….the teacher asked a student to come to the board and solve a problem…

The …nature of their laboratory activities was traditional … to verify what students learned in the classroom. Prior to the laboratory session, students were asked to read the procedures of the laboratory experiment in their textbook. At the laboratory, the teacher explained the purpose and procedures of the experiment, and then requested the students to follow the step-by-step instructions for the experiment. Working in groups (n=5), all the students conducted the same experiment in their textbook under the direct control of the teacher. …

The students were asked to record their observations and data. They were not required to reason about the data in a deeper manner. In addition, the teacher asked each group to respond to the questions about the experiment included in their textbook. When students failed to answer those questions, the teacher answered them directly without giving any hint to the students. At the end of the laboratory activity, students were asked to write a laboratory report in traditional format, including purpose, procedure, observations and data, results, and discussion. The teacher asked questions and helped students during the activity to facilitate their connection of laboratory activity with what they learned in the classroom."

Kingir, Geban & Gunel, 2013

The teacher variable

Often in small scale research studies in education, a different teacher teaches each group and so the 'teacher variable' confounds the experiment (Taber, 2019). Here, however, you avoid that problem 5, as you had a sample of four classes, and two different teachers were involved, each teaching one class in each condition?

"In order to facilitate the proper instruction of the SWH approach in the treatment group, the teachers were given training sessions about its implementation prior to the study. The teachers were familiar with the traditional instruction. One of the teachers was teaching chemistry for 20 years, while the other was teaching chemistry for 22 years at a high school. The researcher also asked the teachers to teach the comparison group students in the same way they taught before and not to do things specified for the treatment group."

Kingir, Geban & Gunel, 2013

Was this research ethical?

As this is an imaginary conversation, not all of the questions I might like to ask are actually addressed in the paper. In particular, I would love to know how the authors would justify that their study was ethical, considering that the control condition they set up deliberately excluded features of pedagogy that they themselves claim are necessary to support effective science learning:

"In traditional science teaching, it is difficult for the learners to change their misconceptions"

The authors believe that "learning occurs through the negotiation of ideas", and their experimental condition provides plenty of opportunity for that. The control condition is designed to avoid the explicit elicitation of learners' ideas, dialogic talk, or peer interactions when reading, listening, writing notes or undertaking exercises. If the authors' beliefs are correct (and they are broadly consistent with a wide consensus across the global science education research community), then the teaching in the comparison condition is not suitable for facilitating conceptual learning.

Even if we think it is conceivable that highly experienced teachers, working in a national context where constructivist teaching has long been official education policy, had somehow previously managed to only teach in an ineffective way: was it ethical to ask these teachers to teach one of their classes poorly even after providing them with professional development enabling them to adopt a more engaging approach better aligned with our understanding of how science can be effectively taught?

Read about unethical control conditions

Given that the authors already believed that –

  • "Students' misconceptions and learning difficulties constitute a major barrier for their learning in various chemistry topics"
  • "knowledge is constructed in the minds of students"
  • "The process of learning depends on the degree of the integration of prior knowledge with the new information"
  • "learning occurs through the negotiation of ideas"
  • "The SWH approach successfully integrates inquiry activities, collaborative group work, meaning making" – A range of previous studies have shown that SWH effectively supports student learning

– why did they not test the SWH approach against existing good practice, rather than implement a control pedagogy they knew should not be effective, so setting up two classes of learners (who do not seem to have been asked to consent to being part of the research) to fail?

Read about the expectation for voluntary informed consent

Why not set up a genuinely informative test of the SWH pedagogy, rather than setting up conditions for manufacturing a foregone conclusion?


When it has already been widely established that a pedagogy is more effective than standard practice, there is little point further testing it against what is believed to be ineffective instruction.

Read about level of control in experiments


How can it be ethical to ask teachers to teach in a way that is expected to be ineffective?

  • transmission of knowledge
  • follow the step-by-step instructions
  • not required to reason in a deeper manner
  • individual working

A rhetorical experiment?

Is this not just a 'rhetorical' experiment engineered to produce a desired outcome (a demonstration), rather than an open-ended enquiry (a genuine experiment)?

A rhetorical experiment is not designed to produce substantially new knowledge: but rather to create the conditions for a 'positive' result (Figure 8 from Taber, 2019).

Read about rhetorical experiments


A technical question

Any study of a teaching innovation requires the commitment of resources and some disruption of teaching. Therefore any research study which has inherent design faults that will prevent it producing informative outcomes can be seen as a misuse of resources, and an unproductive disruption of school activities, and so, if only in that sense, unethical.

As the research was undertaken with "four intact classes" is it possible to apply any statistical tests that can offer meaningful results, when there are only two units of analysis in each condition? [That is, I think not.]

The researchers claim to have 117 degrees of freedom when applying statistical tests to draw conclusions. They seem to assume that each of the 122 children can be considered to be a separate unit of analysis. But is it reasonable to assume that c.30 children taught together in the same intact class by the same teacher (and working in groups for at least part of the time) are independently experiencing the (experimental or control) treatment?

Surely, the students within a class influence each other's learning (especially during group-work), so the outcomes of statistical tests that rely on treating each learner as an independent unit of analysis are invalid (Taber, 2019). This is especially so in the experimental treatment where dialogue (and "the negotiation of ideas") through group-work, discussion, and argumentation were core parts of the instruction.
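To illustrate the point numerically (a hypothetical sketch, not the authors' analysis — the intraclass correlation values and the use of Kish's design-effect formula are my own illustration):

```python
# Hypothetical illustration: why intact classes undermine student-level tests.
# Kish's 'design effect' shows how clustering inflates the variance of
# estimates relative to a simple random sample of the same size.

def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC, for clusters of size m."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of genuinely independent observations after clustering."""
    return n_total / design_effect(cluster_size, icc)

# 122 students in four intact classes of roughly 30: even a modest
# intraclass correlation (illustrative values of 0.1-0.2) shrinks the
# effective sample far below the 122 assumed by student-level tests.
for icc in (0.0, 0.1, 0.2):
    n_eff = effective_sample_size(122, 30, icc)
    print(f"ICC = {icc:.1f}: effective n = {n_eff:.0f}")
```

Whatever the actual intraclass correlation in this study, a student-level analysis that assumes 122 independent observations will overstate the degrees of freedom, and so understate the p-values, whenever learners within a class influence one another.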

Read about units of analysis

Sources cited:

  • Ausubel, D. P. (1968). Educational Psychology: A cognitive view. Holt, Rinehart & Winston.
  • Kingir, S., Geban, O., & Gunel, M. (2013). Using the Science Writing Heuristic Approach to Enhance Student Understanding in Chemical Change and Mixture. Research in Science Education, 43(4), 1645-1663. https://doi.org/10.1007/s11165-012-9326-x
  • Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. https://doi.org/10.1080/03057267.2019.1658058

Notes:

1 I have used direct quotes from the published report in Research in Science Education (but I have omitted citations to other papers), with some emphasis added. Please refer to the full report of the study for further details. I have attempted to extract relevant points from the paper to develop an argument here. I have not deliberately distorted the published account by selection and/or omission, but clearly am only reproducing small extracts. I would recommend readers might access the original study in order to make up their own minds.


2 The next statement is "If individuals know little about the subject matter, new information is easily embedded in their cognitive structure (assimilation)." This is counter to the common thinking that learning about an unfamiliar topic is more difficult, and learning is made meaningful when it can be related to prior knowledge (Ausubel, 1968).

Read about making the unfamiliar familiar


3 The term 'discussion' might suggest an open-ended exchange of ideas and views. This would be a dialogic technique typical of constructivist approaches. From the wider context it seems likely something more teacher-directed and closed than this was meant here – but this is an interpretation which goes beyond the description available in the original text.

Read about dialogic learning


4 Researchers into problem-solving consider that a problem has to require a learner to do more than simply recall and apply previously learned knowledge and techniques – so an 'algorithmic problem' might be considered an oxymoron. However, it is common for teachers to refer to algorithmic exercises as 'problems' even though they do not require going beyond application of existing learning.


5 This design does avoid the criticism that one of the teachers may have just been more effective at teaching the topic to this age group, as both teachers teach in both conditions.

This does not entirely remove potential confounds as teachers interact differently with different classes, and with only four teacher-class combinations it could well be that there is better rapport in the two classes in one or other condition. It is very hard to see how this can be addressed (except by having a large enough sample of classes to allow inferential statistics to be used rigorously – which is not feasible in small scale studies).

A potentially more serious issue is 'expectancy' effects. There is much research in education and other social contexts to show that people's beliefs and expectations influence outcomes of studies – and this can make a substantial difference. If the two teachers were unconvinced by the newfangled and progressive approach being tested, then this could undermine their ability to effectively teach that way.

On the other hand, although it is implied that these teachers normally teach in the 'traditional' way, actually constructivist approaches are recommended in Turkey, and are officially sanctioned, and widely taught in teacher education and development courses. If the teachers accepted the arguments for believing the SWH was likely to be more effective at bringing about conceptual learning than the methods they were asked to adopt in the comparison classes, that would further undermine that treatment as a fair control condition.

Read about expectancy effects in research

Again, there is very little researchers can do about this issue as they cannot ensure that teachers participating in research studies are equally confident in the effectiveness of different treatments (and why should they be – the researchers are obviously expecting a substantive difference*), and this is a major problem in studies into teaching innovations (Taber, 2019).

* This is clear from their paper. Is it likely that they would have communicated this to the teachers? "The teachers were given training sessions about [SWH's] implementation prior to the study." Presumably, even if somehow these experienced teachers had previously managed to completely avoid or ignore years of government policy and guidance intending to persuade them of the value of constructivist approaches, the researchers could not have offered effective "training sessions" without explaining the rationales of the overall approach, and for the specific features of the SWH that they wanted teachers to adopt.


A drafted man is like a draft horse because…

A case of analogy in scientific discovery


Keith S. Taber


How is a drafted man like a draft horse (beyond both having been required to give service)?

"The phthisical soldier is to his messmates
what
the glandered horse is to its yoke fellow"

Jean-Antoine Villemin quoted by Goetz, 2013

Analogy in science

I have discussed many examples of analogies in these pages. Often, these are analogies intended to help communicate scientific ideas – to introduce some scientific concept by suggesting it is similar to something already familiar. However, analogy is important in the practice of science itself – not just when teaching about or communicating science to the general public. Scientific discoveries are often made by analogical thinking – perhaps this as-yet-unexplained phenomenon is a bit like that other well-conceptualised phenomenon?

Analogies are more than just similes (simply suggesting that X is like Y; say, that the brain is like a telephone exchange 1) because they are based on an explicit structural mapping. That is, there are parallels between the relationships within the analogue and the relationships within the target concept.

So,

  • to say that the atom is a tiny solar system would just be a metaphor, and
  • to simply state that the atom is like a tiny solar system would be a simile;
  • but to say that the atom is like a tiny solar system because both consist of a more massive central body orbited by much less massive bodies would be an analogy. 2

Read about analogies in science

A medical science analogy

Thomas Goetz describes how, in the nineteenth century, Jean-Antoine Villemin suspected that the disease known as phthisis (tuberculosis, 'T.B.') was passed between people, and that this tended to occur when people were living in crowded conditions. Villemin was an army surgeon and the disease was very common among soldiers, even though they tended to be drawn from younger, healthier members of the population. (This phenomenon continued into the twentieth century long after the cause of the infection was understood. 3)


Heavy horses: it is not just the workload of draught horses that risks their health 4
(Image by Daniel Borker from Pixabay)


Villemin knew that a horse disease, glanders, was often found to spread among horses that were yoked closely together to work in teams, and he suspected something similar was occurring among the enlisted men due to their living and working in close quarters.

"…Jean-Antoine Villemin, a French army surgeon…in the 1860s conducted a series of experiments testing whether tuberculosis could be transmitted from one animal to another. Villemin's interest began when he observed how tuberculosis seemed to affect young men who moved to the city, even though they were previously healthy in their rural homes. He compared the effect to how glanders, a horse disease, seemed to spread when a team [of horses] was yoked together. "The phthisical soldier is to his messmates what the glandered horse is to its yoke fellow", Villemin conjectured."

Goetz, 2013, p.104

To a modern reader this seems an unremarkable suggestion, but that would be an ahistorical evaluation. Glanders is an infectious disease, and so is tuberculosis, so being in close contact with an infected conspecific is clearly a risk factor for being infected. Yet, when Villemin was practising medicine it was not accepted that tuberculosis was infectious, and infectious agents such as bacteria and viruses had not been identified.

Before the identification of the bacterium Mycobacterium tuberculosis as the infectious agent, there was no specific test to demarcate tuberculosis from other diseases. This mattered as although T.B. tends to especially affect the pulmonary system, it can cause a wide range of problems for an infected person. Scrofula, causing swollen lymph nodes, was historically seen as quite distinct from consumption, recognised by bloody coughing, but these are now both recognised as the results of Mycobacterium tuberculosis infection (when the bacterium moves from the lungs into the lymphatic system it leads to the symptoms of scrofula). The bacterium can spread through the bloodstream to cause systemic disease. However, a person may be infected with the bacterium for years before becoming ill. Before the advent of 'germ theory', and the ability to identify specific 'germs', the modern account of tuberculosis as a complex condition with diverse symptoms caused by a single infectious agent was not at all obvious.

The contexts of discovery and justification

Although the analogy with glanders was suggestive to Villemin, this was just the formation of a hypothesis: that T.B. could be passed from one person to another via some form of material transfer during close contact. The context of discovery was the recognition of an analogy, but the context of justification needed to be the laboratory.

Sacrifices for medical science

The basic method for testing the hypothesis consisted of taking diseased animals (today we would say infected, but that was not yet accepted), excising diseased material from their bodies, or taking samples of tissue from diseased people, and introducing it into the bodies of healthy animals. If the healthy animals quickly showed signs of disease, when similar controls remained healthy, it seemed likely that the transfer of material from the diseased animal was the cause.

Although the microbes responsible for T.B. and similar diseases had not been found, autopsy showed irregularities in diseased bodies. The immune system acts to localise the infection and contain it within tissue nodules or granulomas known as 'tubercles'. These tubercles are large enough to be detected and recognised post-mortem.

It was therefore possible to harvest diseased material and introduce it into healthy animals:

"If one shaves a narrow area on the ear of a rabbit or at the groin or on the chest under the elbow of a dog, and then creates a subcutaneous wound so small and so shallow that it does not yield the slightest drop of blood, and then one introduces into this wound, such that it cannot escape, a pinhead-sized packet of tuberculous material obtained from a man, a cow or a rabbit that has already been rendered tuberculous; or if, alternatively, one uses a Pravaz [hypodermic] syringe to instil, under the skin of the animal, a few droplets of sputum from a patient with phthisis…"

Villemin, 1868/2015, p.256

Villemin reports that the tiny wound quickly heals, and then the introduced material cannot be felt beneath the site of introduction. However after a few days:

"a slight swelling is observed, accompanied in some cases by redness and warmth, and one observes the progressive development of a local tubercle of a size between that of a hemp seed and that of a cobnut. When they reach a certain volume, these tubercles generally ulcerate. In some cases, there is an inflammatory reaction…"

Villemin, 1868/2015, p.256

Despite these signs, the animals remain in reasonable health – for a while,

"Only after 15, 20 or 30 days does it become evident that they are losing weight, and have lost their appetite, gaiety and vivacity of movement. Some, after going into decline for a certain period, regain some weight. Others gradually weaken, falling into the doldrums, often suffering from debilitating diarrhoea, finally succumbing to their illness in a state of emaciation."

Villemin, 1868/2015, p.256
In the doldrums

The doldrums refers to oceanic waters within about five degrees of the equator where there are often 'lulls' or calms with no substantial winds. Sailing ships relied on winds to make progress, and ships that were in the doldrums might be becalmed for extended periods, and so unable to make progress, leaving crews listless and frustrated – and possibly running out of essential supplies.

"Down dropt the breeze, the sails dropt down, 'Twas sad as sad could be; And we did speak only to break The silence of the sea! 

All in a hot and copper sky, The bloody Sun, at noon, Right up above the mast did stand, No bigger than the Moon. 

Day after day, day after day, We stuck, nor breath nor motion; As idle as a painted ship Upon a painted ocean. 

Water, water, every where, And all the boards did shrink; Water, water, every where, Nor any drop to drink." 

Extract from The Rime of the Ancient Mariner, 1834, Samuel Taylor Coleridge

So, the inoculated animals 'fell into the doldrums', metaphorically speaking.

Read about metaphors in science


Under a hot and copper sky
(Image by Youssef Jheir from Pixabay)

The needs of the many are outweighed by the needs of humans

It was widely considered entirely acceptable to sacrifice the lives and well-being of animals in this way, to generate knowledge that it was hoped might help reduce human suffering. 'Animal rights' had not become a mainstream cause (even if animals had occasionally been subject to legal prosecution and sometimes found guilty in European courts – suggesting they had responsibilities if not rights).

Similar experiments were later carried out by Robert Koch in his own investigations of T.B. and other diseases soon after. Indeed, Goetz notes that when working on anthrax in 1875,

"As Koch's experiments went on, his backyard menagerie began to thin out; his daughter, Gertrud, grew concerned that she was losing all her pets."

Goetz, 2013, p.27

"Let us hope that daddy can draw conclusions from his experiments soon…"
(Image by Adina Voicu from Pixabay )

Although animals are still used in medical research today, there is much more concern about their welfare, and researchers are expected to avoid causing more animal suffering and death than is considered strictly necessary. 5 Wherever possible, alternatives to animal experimentation are preferred.

Inadmissible analogies?

One of the arguments made against animal studies is that, as different species are by definition different in their anatomy and physiology, non-human animals are imperfect models for human disease processes. One argument that Villemin faced was that his inoculations between animals were most successful in rabbits, when, it was claimed, rabbits were widely tubercular in the normal population. In other words, it was suggested that Villemin only found evidence of disease in his inoculated test animals because they probably already had the disease anyway.

That suggests the need for some sort of experimental control, and Villemin reported that

"…despite routine sequestration and the tortures that the vivisectionists force them to endure, rabbits are almost never tuberculous. I have explored more than a hundred lungs from these rodents from markets and I found none to be tuberculous."

Villemin, 1868/2015, p.257
Indirect evidence

Villemin had made an analogy between disease transfer between horses and disease transfer between humans. His experiments did not directly test disease transfer between humans – as that would have been considered unethical (and so "absolutely forbidden") even at a time when animal (i.e., non-human animal) research was not widely questioned:

"I believe that I have experimentally demonstrated that phthisis, like syphilis and glanders, is communicable by inoculation. It can be inoculated from humans to certain animals, and from these animals to others of the same species. Can it be inoculated between humans? It is absolutely forbidden for us to provide experimental proof of this, but all the evidence is in favour of an affirmative response."

Villemin, 1868/2015, p.265

So, Villemin did not demonstrate that T.B. could be transferred between people, but only that analogous transfers occurred. In a sense, then, the context of justification, as well as the context of discovery, relied on analogies. Despite this, the indirect evidence was strong, and Villemin's failure to persuade most of the wider scientific community of his arguments likely reflected the general paradigmatic beliefs at the time: that disease was caused by hereditary weakness, or through broad environmental conditions, rather than by minute amounts of material being transferred between bodies.


Mycobacterium tuberculosis – the infectious agent in tuberculosis – could only be detected once suitable microscopes were available – Koch published his discovery of the bacterium in 1882.

(source: Wikipedia Commons)


Koch was able to be more persuasive because he was also able to actually identify a microbe present in diseased bodies, as well as show inoculation led to the microbe being found in the inoculated animal. That shift in thinking required the acceptance of a different kind of analogy: that the presence, or absence, of a bacterium in the tissues mapped onto being infected with, or free from, a disease.

The mapping runs between Mycobacterium tuberculosis – a microscopic 'germ', only visible under the microscope – and tuberculosis – a widespread and often fatal disease of people and other mammals:

  • bacterium present in tissues ↕ infected
  • bacterium absent in tissues ↕ not infected

In a sense, diagnosis through microbiological methods relies on a kind of analogy.

Sources cited:
  • Daniel, T. M. (2015). Jean-Antoine Villemin and the infectious nature of tuberculosis. The International Journal of Tuberculosis and Lung Disease, 19(3), 267-268. https://doi.org/10.5588/ijtld.06.0636
  • Frith, J. (2014). History of Tuberculosis. Part 1 – Phthisis, consumption and the White Plague. Journal of Military and Veterans' Health, 22(2), 29-35.
  • Goetz, T. (2013). The Remedy. Robert Koch, Arthur Conan Doyle, and the quest to cure tuberculosis. Gotham Books.
  • Surget, A. (2022). Being between Scylla and Charybdis: designing animal studies in neurosciences and psychiatry – too ethical to be ethical? In Seminar series: Berlin-Bordeaux Working Group Translating Validity in Psychiatric Research.
  • Taber, K. S. (2013). Upper Secondary Students' Understanding of the Basic Physical Interactions in Analogous Atomic and Solar Systems. Research in Science Education, 43(4), 1377-1406. doi:10.1007/s11165-012-9312-3
  • Villemin, J. A. (1868/2015). On the virulence and specificity of tuberculosis [De la virulence et de la spécificité de la tuberculose]. The International Journal of Tuberculosis and Lung Disease, 19(3), 256-266. https://doi.org/10.5588/ijtld.06.0636-v

Notes

1 As analogies link to what is familiar, they tend to reflect cultural contexts. At one time the mind was referred to as being like a slate. The once-common comparison of the brain to a telephone exchange has now been largely displaced by the comparison to a computer.


2 Whilst this is a common teaching analogy, it is also problematic if it is taught without considering the negative aspects of the analogy (e.g. electrons repel each other, unlike planets; planets vary in mass, etc.), and if the target concept is not clearly presented as one (simplified) model of atomic structure. See Taber, 2013.


3 "During both World War I and World War II in the US Army, tuberculosis was the leading cause of discharge [i.e., from the service]. Annual incidence of tuberculosis in the military of Western countries is very low, however in the last several decades microepidemics have occurred in small close knit units on US and British Naval warships and land based units deployed overseas. Living and working in close quarters and overseas deployment to tuberculosis-endemic areas of the world such as Afghanistan, Iraq and South-East Asia remain significant risk factors for tuberculosis infection in military personnel, particularly multidrug resistant tuberculosis."

Frith, 2014, p.29

4 Some horses have been bred to be fast runners, and others to be capable of pulling heavy loads. (That is, some have been artificially selected to be like sprinters or cyclists, and others to be like weightlifters or shot-putters.) The latter are variously called draft (U.S. spelling) / draught (British spelling) horses, dray horses, carthorses, work horses or heavy horses. When a load was too heavy to be moved by a single horse, several would be harnessed together into a team – providing more power. Ironically, the term 'horsepower' was popularised by James Watt – whose name has since been given to the modern international (S.I.) unit of power – in marketing his steam engines. According to the Institute of Physics,

Whilst the peak mechanical power of a single horse can reach up to 15 horsepower, it is estimated that a typical horse can only sustain an output of 1 horsepower (746 W) for three hours and, if working for an eight-hour day, a horse might output only three quarters of one horsepower. 

https://spark.iop.org/why-one-horsepower-more-power-one-horse

5 Alexandre Surget (Associate Professor at University of Tours, France) has even argued that the guidelines adopted in animal experiments are sometimes counter-productive as they encourage experiments with too few animals, and consequently too little statistical power, to support robust conclusions – in effect sacrificing animals without reasonable expectations of securing sound knowledge (Surget, 2022).

Any research that makes demands of resources and the input of others, but which is designed in such a way that it is unlikely to produce reliable new knowledge, can be considered unethical.

Read about research ethics


Out of the womb of darkness

Medical ethics in 20th Century movies


Keith S. Taber


The hero of the film, Dr Holden, is presented as a scientist. Here he is trying to collect some data.
(still from 'The Night of the Demon')

"The Night of the Demon" is a 1957 British film about an American professor who visits England to investigate a supposed satanic cult. It was just shown on English television. It was considered a horror film at the time of its release, although the short scenes that actually feature a (supposedly real? merely imagined? *) monster are laughable today (think Star Trek's Gorn in the original series, and consider whether it is believable as anything other than an actor wearing a lizard suit – and you get the level of horror involved). [*Apparently the director, Jacques Tourneur, never intended a demon to be shown, but the film's producer decided to add footage showing the monster in the opening scenes, potentially undermining the whole point of the film: but giving the publicity department something they could work with. 6]


A real scary demon (in 1959) and a convincing alien (in 1967)?
(stills from 'The Night of the Demon' and ' Star Trek' episode 'Arena')
[Move the slider to see more of each image.]

The film's protagonist is a psychologist, Dr. John Holden, who dismisses stories of demons and witchcraft and the like, and has made a career studying people's beliefs about such superstitions. Dr Holden's visit to Britain deliberately coincided with a conference at which he was to present, as well as coincidentally with the death of one of his colleagues (who had been subject to a hex for investigating the cult).


'Night of the Demon' (Dir.  Jacques Tourneur) movie poster: Sabre Film Production.
[As was common at the time, although the film was in monochrome, the publicity was coloured. Whether the colour painting of the monster looks even less scary than the version in the film itself is a moot point.]

The film works much better as a kind of psychological thriller examining the power of beliefs than as horror. (Director: 1 – Producer, 0.) That, if we believe something enough, it can have real effects is well acknowledged – but this does not need a supernatural explanation. People can be 'scared to death' by what they imagine, and by how they respond to their fears. Researchers expecting a positive outcome from their research are likely to inadvertently behave in ways that lead to this very result: hence the use of double-blind studies in medical trials, so that the researchers do not know which patients are receiving which treatment.

Read about expectancy effects in research

While the modern viewer will find little of suspense in the film, I did metaphorically at least 'recoil with shock' from one moment of 'horror'. At the conference a patient (Rand Hobart) is wheeled in on a trolley – someone suspected of having committed a murder associated with the cult, whom the authorities had allowed to be questioned by the researchers…at the conference.


"The authorities have lent me this suspected murderer for the benefit of dramatic effect and for plot development purposes"
(still from 'The Night of the Demon').

A variety of movie posters were produced for the film 6 – arguably this one reflects the genuinely horrific aspect of the story. To a modern viewer this might also appear the most honest representation of the film, as the demon given prominence in some versions of the poster barely features in the film.

Holden's British colleague, Professor O'Brien, explains to the delegates,

"For a period of time this man has been as you see him here. He fails to respond to any normal stimulation. His experience, whatever it was, which we hope here to discover, has left him in a state of absolute catatonic immobility. When I first investigated this case, the problem of how to hypnotise an unresponsive person was the major one. Now the proceedings may be somewhat dramatic, but they are necessary. The only way of bringing his mind out of the womb of darkness into which it has retreated to protect itself, is by therapeutic shock, electrical or chemical. For our purposes we are today using pentothal [? 1] and later methylamphetamine."

Introducing a demonstration of non-consensual use of drugs on a prisoner/patient

"Okay, we'll give him a barbiturate, then we'll hypnotise him, then a stimulant, and if that does not kill him, surely he will simply, calmly and rationally, tell us what so traumatised him that he has completely withdrawn into his subconscious."
(Still from 'The Night of the Demon')


After an injection, Hobart comes out of his catatonic state, becomes aware of his surroundings, and panics.

The dignity of the accused: Hobart is forced out of his 'state of absolute catatonic immobility' to discover he is an exhibit at a scientific conference.
(Still from 'The Night of the Demon'.)

He is physically restrained, and examined by Holden (supposedly the 'hero' of the piece), who then hypnotises him.



He is then given an injection of methylamphetamine before being questioned by O'Brien and Holden. He becomes agitated (what, after being forcibly given 'speed'?), breaks free, and leaps, out of a conveniently placed window, to his death.

Now, of course, this is all just fiction – a story. No one is really drugged, and Hobart is played by an actor who is unharmed. (I can be fairly sure of that as the part was played by Brian Wilde who much later turned up alive and well as prison officer 'Mr Barrowclough' in BBC's Ronnie Barker vehicle 'Porridge'.)


The magic of the movies – people do not stay dead, and there are no professional misconduct charges brought against our hero.
(Stills from 'The Night of the Demon' and from BBC series 'Porridge'. 3)
[Move the slider to see more of each image.]

Yet this is not some fantastical film (the Gorn's distant cousin aside) but played for realism. Would a psychiatric patient and murder suspect have been released to be paraded and demonstrated at a conference on the paranormal in 1957? I expect not. Would the presenters have been allowed to drug Hobart without his consent?

Read about voluntary, informed, consent

An adult cannot normally be medicated without their consent unless they are considered to lack the ability to make responsible decisions for themselves. Today, it might be possible to give a patient drugs without consent if they have been sectioned under the Mental Health Act (1983) and it was considered the action was necessary for their safety or for the safety of others. Hobart was certainly not an immediate threat to anyone before he was brought out of his trance.

However, even if this enforced use of drugs was sanctioned, this would not be done in a public place with dozens of onlookers. 4 And it would not be done (in the U.K. at least!) simply to question someone about a crime.5 Presumably, the makers of the film either thought that this scene reflected something quite reasonable, or, at least, that the cinema-going public would find this sufficiently feasible to suspend disbelief. If this fictitious episode did not reflect acceptable ethical standards at the time, it would seem to tell us something about public perceptions of the attitude of those in authority (whether the actual authorities who were meant to have a duty of care to a person under arrest, or those designated with professional roles and academic titles) to human rights.

Today, however, professionals such as researchers, doctors, and even teachers, are prepared for their work with a strong emphasis on professional ethics. In medical care, the interest of the patient themselves comes first. In research, informants are voluntary participants in our studies, who offer us the gift of data, and are not subjects of our enquiries to be treated simply as available material for our work.

Yet, actually, this is largely a modern perspective that has developed in recent decades, and sadly there are many real stories, even in living memory, of professionals deciding that people (and this usually meant people with less standing or power in their society) should be drugged, or shocked, or operated on, without their consent and even against their explicit wishes; for what is seen as their own, or even what is judged as some greater, good; in circumstances where it would be totally unacceptable in most countries these days.

So, although this is not really a horror film by today's measures, I hope any other researchers (or medical practitioners) who were watching the film shared my own reaction to this scene: 'no, they cannot do that!'

At least, they could not do that today.

Read about research ethics


Notes

1 This sounds to me like 'pentatyl', but I could not find any reference to a therapeutic drug of that name. Fentanyl is a powerful analgesic which, like amphetamines, is abused for recreational use – but it was only introduced into practice the year after the film was made. The line most likely refers to sodium thiopental, known as pentothal, and much used (in movies and television, at least) as a truth serum.5 As it is a barbiturate, and so is used in anaesthesia, it does not seem an obvious drug of choice for waking someone from a catatonic state.


2 The script is based loosely on a 1911 M. R. James short story, 'Casting the Runes' that does not include the episode discussed.


3 I have flipped this image (as can be seen from the newspaper) to put Wilde (playing alongside Ronnie Barker, standing, and Richard Beckinsale) on the right-hand side of the picture.


4 Which is not to claim that such a public demonstration would have been unlikely at another time and place. Execution was still used in the U.K. until 1964 (during my lifetime), although by that time being found guilty of vagrancy (being unemployed and hanging around {unfortunate pun unintended}) for the second time was no longer a capital offence. However, after 1868 executions were no longer carried out in public.

It was not unknown for the corpses of executed criminals to be subject to public dissection in Renaissance [sic, ironically] Europe.


5 Fiction, of course, has myriad scenes where 'truth drugs' are used to obtain secrets from prisoners – but usually those carrying out the torture are the 'bad guys', either criminals or agents of what is represented in the story as an enemy or dystopian state.


6 Some variations on a theme. (For some reason, for its slightly cut U.S. release 'The Night of the Demon' was called 'The Curse of the Demon'.) The various representations of the demon and the prominence given to it seem odd to a modern viewer given how little the demon actually features in the film.

The references to actually seeing demons and monsters from hell on the screen, "the most terrifying story ever told", and "scenes of terror never before imagined" raise the question of whether the copywriters were expected to watch a film before producing their copy.

Passive learners in unethical control conditions

When 'direct instruction' just becomes poor instruction


Keith S. Taber


An experiment that has been set up to ensure the control condition fails, and so compares an innovation with a substandard teaching condition, can – at best – only show the innovation is not as bad as the substandard teaching

One of the things which angers me when I read research papers is examples of what I think of as 'rhetorical research' that use unethical control conditions (Taber, 2019). That is, educational research which sets up one group of students to be taught in a way that clearly disadvantages them, in order to ensure the success of an experimental teaching approach:

"I am suggesting that some of the experimental studies reported in the literature are rhetorical in the … sense that the researchers clearly expect to demonstrate a well- established effect, albeit in a specific context where it has not previously been demonstrated. The general form of the question 'will this much-tested teaching approach also work here' is clearly set up expecting the answer 'yes'. Indeed, control conditions may be chosen to give the experiment the best possible chance of producing a positive outcome for the experimental treatment."

Taber, 2019, p.108

This irks me for two reasons. The first, obviously, is that researchers have been prepared to (ab)use learners as 'data fodder' and subject them to poor learning contexts in order to have the best chance of getting positive results for the innovation supposedly being 'tested'. However, it also annoys me because this is inherently a poor research design (and so a poor use of resources), as it severely limits what can be found out. An experiment that compares an innovation with a substandard teaching condition can, at best, show the innovation is not as ineffective as the substandard teaching in the control condition; it cannot tell us whether the innovation is at least as effective as existing good practice.

This irritation is compounded when the work I am reading is not some amateur report thrown together for a predatory journal, but an otherwise serious study published in a good research outlet. That was certainly the case for a paper I read today in Research in Science Education (the journal of the Australasian Science Education Research Association) on problem-based learning (Tarhan, Ayar-Kayali, Urek & Acar, 2008).

Rhetorical studies?

Genuine research is undertaken to find something out. The researchers in this enquiry claim:

"This research study aims to examine the effectiveness of a [sic] problem-based learning [PbBL] on 9th grade students' understanding of intermolecular forces (dipole- dipole forces, London dispersion forces and hydrogen bonding)."

Tarhan, et al., 2008, p.285

But they choose to compare PbBL with a teaching approach that they expect to be ineffective. Here the researchers might have asked "how does teaching year 9 students about intermolecular forces through problem-based learning compare with current good practice?" After all, even if PbBL worked quite well, if it is not quite as effective as the way teachers are currently teaching the topic then, all other things being equal, there is no reason to shift to it; whereas if it outperforms even our best current approaches, then there is a reason to recommend it to teachers and roll out associated professional development opportunities.


Problem-based learning (third column) uses a problem (i.e., a task which cannot be solved simply by recalling prior learning or employing an algorithmic routine) as the focus and motivation for learning about a topic

Of course, that over-simplifies the situation, as in education, 'all other things' never are equal (every school, class, teacher…is unique). An approach that works best on average will not work best everywhere. But knowing what works best on average (that is, taken across the diverse range of teaching and learning contexts) is certainly a very useful starting point when teachers want to consider what might work best in their own classrooms.

Rhetorical research is poor research, as it is set up (deliberately or inadvertently) to demonstrate a particular outcome, and, so, has built-in bias. In the case of experimental studies, this often means choosing an ineffective instructional approach for the comparison class. Why else would researchers select a control condition they know is not suitable for bringing about the educational outcomes they are testing for?

Problem-Based Learning in a 9th Grade Chemistry Class

Tarhan and colleagues' study was undertaken in one school with 78 students divided into two groups. One group was taught through a sequence based on problem-based learning that involved students undertaking research in groups, gently supported and steered by the teacher. The approach allowed student dialogue, which is believed to be valuable in learning, and motivated students to be actively engaged in enquiry. When such an approach is well judged it has the potential to count as 'scaffolding' of learning. This seems a very worthwhile innovation – well worth developing and evaluating.

Of course, work in one school cannot be assumed to generalise elsewhere, and small-scale experimental work of this kind is open to major threats to validity, such as expectancy effects and researcher bias – but this is unfortunately always true of these kinds of studies (which are often all educational researchers are resourced to carry out). Finding out what works best in some educational context at least potentially contributes to building up an overall picture (Taber, 2019). 1

Why is this rhetorical research?

I consider this rhetorical research because of the claims the authors make at the start of the study:

"Research in science education therefore has focused on applying active learning techniques, which ensure the affective construction of knowledge, prevent the formation of alternate conceptions, and remedy existing alternate conceptions…Other studies suggest that active learning methods increase learning achievement by requiring students to play a more active role in the learning process…According to active learning principles, which emphasise constructivism, students must engage in researching, reasoning, critical thinking, decision making, analysis and synthesis during construction of their knowledge."

Tarhan, et al., 2008, pp.285-286

If they genuinely believed that, then to test the effectiveness of their PbBL activity, Tarhan and colleagues needed to compare it with some other teaching condition that they are confident can "ensure the affective construction of knowledge, prevent the formation of alternate conceptions, and remedy existing alternate conceptions… requir[e] students to play a more active role in the learning process…[and] engage in researching, reasoning, critical thinking, decision making, analysis and synthesis during construction of their knowledge." A failure to do that means that the 'experiment' has been biased – it has been set up to ensure the control condition fails.

Unethical research?

"In most educational research experiments of [this] type…potential harm is likely to be limited to subjecting students (and teachers) to conditions where teaching may be less effective, and perhaps demotivating. This may happen in experimental treatments with genuine innovations (given the nature of research). It can also potentially occur in control conditions if students are subjected to teaching inputs of low effectiveness when better alternatives were available. This may be judged only a modest level of harm, but – given that the whole purpose of experiments to test teaching innovations is to facilitate improvements in teaching effectiveness – this possibility should be taken seriously."

Taber, 2019, p.94

The same teacher taught both classes: "Both of the groups were taught by the same chemistry teacher, who was experienced in active learning and PbBL" (p.288). This would seem to reduce the 'teacher effect' – outcomes being affected because the teacher of one class is more effective than the teacher of another. (Reduce, rather than eliminate, as different teachers have different styles, skills, and varied expertise: so, most teachers are more suited to, and competent in, some teaching approaches than others.)

So, this teacher was certainly capable of teaching in the ways that Tarhan and colleagues claim as necessary for effective learning ("active learning techniques"). However, the control condition sets up the opposite of active learning, so-called passive learning:

"In this study, the control group was taught the same topics as the experimental group using a teacher-centred traditional didactic lecture format. Teaching strategies were dependent on teacher expression and question-answer format. However, students were passive participants during the lessons and they only listened and took notes as the teacher lectured on the content.

The lesson was begun with teacher explanation about polar and nonpolar covalent bonding. She defined formation of dipole-dipole forces between polar molecules. She explained that because of the difference in electronegativities between the H and Cl atoms for HCl molecule is 0.9, they are polar molecules and there are dipole-dipole forces between HCl molecules. She also stated that the intermolecular dipole-dipole forces are weaker than intramolecular bonds such as covalent and ionic bonding. She gave the example of vaporisation and decomposition of HCl. She explained that while 16 kJ/mol of energy is needed to overcome the intermolecular attraction between HCl molecules in liquid HCl during vaporisation process of HCl, 431 kJ/mol of energy is required to break the covalent bond between the H and Cl atoms in the HCl molecule. In the other lesson, the teacher reminded the students of dipole-dipole forces and then considered London dispersion forces as weak intermolecular forces that arise from the attractive force between instantaneous dipole in nonpolar molecules. She gave the examples of F2, Cl2, Br2, I2 and said that because the differences in electronegativity for these examples are zero, these molecules are non-polar and had intermolecular London dispersion forces. The effects of molecular size and mass on the strengths of London dispersion forces were discussed on the same examples. She compared the strengths of dipole-dipole forces and London dispersion forces by explaining the differences in melting and boiling points for polar (MgO, HCl and NO) and non-polar molecules (F2, Cl2, Br2, and I2). The teacher classified London dispersion forces and dipole- dipole as van der Waals forces, and indicated that there are both London dispersion forces and dipole-dipole forces between polar molecules and only London dispersion forces between nonpolar molecules. 
In the last lesson, teacher called attention to the differences in boiling points of H2O and H2S and defined hydrogen bonds as the other intermolecular forces besides dipole-dipole and London dispersion forces. Strengths of hydrogen bonds depending on molecular properties were explained and compared in HF, NH3 and H2O. She gave some examples of intermolecular forces in daily life. The lesson was concluded with a comparison of intermolecular forces with each other and intramolecular forces."

Tarhan, et al., 2008, p.293

Lecturing is not ideal for teaching university students. It is generally not suitable for teaching school children (and it is not consistent with what is expected in Turkish schools).

This was a lost opportunity to seriously evaluate teaching through PbBL by comparing it with teaching that followed the national policy recommendations. Moreover, it was a dereliction of educators' duty never to deliberately disadvantage learners. It is reasonable to experiment with children's learning when you feel there is a good chance of positive outcomes: it is not acceptable to deliberately set up learners to fail (e.g., by organising 'passive' learning when you claim to believe effective learning activities are necessarily 'active').

Isn't this 'direct instruction'?

Now, perhaps the account of the teaching given by Tarhan and colleagues might seem to fit the label of 'direct instruction'. Whilst Tarhan et al. claim constructivist teaching is clearly necessary for effective learning, there are some educators who claim that constructivist approaches are inferior, and that a more direct approach, 'direct instruction', is more likely to lead to learning gains.

This has been a lively debate, but often the various commentators use terminology differently and argue across each other (Taber, 2010). The proponents of direct instruction often criticise teaching that expects learners to take nearly all the responsibility for learning, with minimal teacher support. I would also criticise that (except perhaps in the case of graduate research students once they have demonstrated their competence, including knowing when to seek supervisory guidance). That is quite unlike genuine constructivist teaching which is optimally guided (Taber, 2011): where the teacher manages activities, constantly monitors learner progress, and intervenes with various forms of direction and support as needed. Tarhan and colleagues' description of their problem-based learning experimental condition appears to have had this kind of guidance:

"The teacher visited each group briefly, and steered students appropriately by using some guiding questions and encouraging them to generate their hypothesis. The teacher also stimulated the students to gain more information on topics such as the polar structure of molecules, differences in electronegativity, electron number, atom size and the relationship between these parameters and melting-boiling points…The teacher encouraged students to discuss the differences in melting and boiling points for polar and non-polar molecules. The students came up with [their] research questions under the guidance of the teacher…"

Tarhan, et al., 2008, pp.290-291

By contrast, descriptions of effective direct instruction do involve tightly planned teaching with carefully scripted teacher moves of the kind quoted in the account, above, of the control condition. (But any wise teacher knows that lessons can only be scripted as a provisional plan: the teacher has to constantly check the learners are making sense of teaching as intended, and must be prepared to change pace, repeat sections, re-order or substitute activities, invent new analogies and examples, and so forth.)

However, this instruction is not simply a one-way transfer of information, but rather a teacher-led process that engages students in active learning to process the material being introduced by the teacher. If this is done by breaking the material into manageable learning quanta, each of which students engage with in dialogic learning activities before proceeding to the next, then this is constructivist teaching (even if it may also be considered by some as 'direct instruction'!)


Effective teaching moves between teacher input and student activities and is not just the teacher communicating information to the learners.

By contrast, the lecture format adopted by Tarhan's team was based on the teacher offering a multi-step argument (delivered over several lessons) and asking the learners to follow and retain an extensive presentation.

"The lesson was begun with teacher explanation …

She defined …

She explained…

She also stated…

She gave the example …

She explained that …

the teacher reminded the students …

She gave the examples of …

She compared…

The teacher classified …

and indicated that …

[the] teacher called attention to …

She gave some examples of …"

Tarhan, et al., 2008, p.293

This is a description of the transmission of information through a communication channel: not an account of teaching which engages with students' thinking and guides them to new understandings.

Ethical review

Despite the paper having been published in a major journal, Research in Science Education, there seems to be no mention that the study design had been through any kind of institutional ethical review before the research began. Moreover, there is no reference to the learners, or their parents/guardians, having been asked for, or having given, voluntary, informed, consent, as is usually required in research with human participants. Indeed, Tarhan and colleagues refer to the children as the 'subjects' of their research, not participants in their study.

Perhaps ethical review was not expected in the national context (at least, in 2008). Certainly, it is difficult to imagine how voluntary, informed, consent would be obtained if parents were to be informed that half of the learners would be deliberately subject to a teaching approach the researchers claim lacks any of the features "students must engage in…during construction of their knowledge".

PbBL is better than…deliberately teaching in a way designed to limit learning

Tarhan and colleagues, unsurprisingly, report that on a post-test the students who were taught through PbBL out-performed those students who were lectured at. It would have been very surprising (and so potentially more interesting, and, perhaps, even useful, research!) had they found anything else, given the way the research was biased.

So, to summarise:

  1. At the outset of the paper it is reported that it is already established that effective learning requires students to engage in active learning tasks.
  2. Students in the experimental conditions undertook learning through a PbBL sequence designed to engage them in active learning.
  3. Students in the control condition were subject to a sequence of lecturing inputs designed to ensure they were passive.
  4. Students in the active learning condition outperformed the students in the passive learning condition.

Which I suggest can be considered both rhetorical research, and unethical.


The study can be considered both rhetorical and unfair to the learners assigned to be in the control group

Read about rhetorical experiments

Read about unethical control conditions


Work cited:

Note:

1 There is a major issue which is often ignored in studies of this type (where a pedagogical innovation is trialled in a single school area, school or classroom). Finding that problem-based learning (or whatever) is effective in one school when teaching one topic to one year group does not allow us to generalise to other classrooms, schools, countries, educational levels, topics and disciplines.

Indeed, as every school, every teacher, every class, etc., is unique in some ways, it might be argued that one only really finds out if an approach will work well 'here' by trying it out 'here' – and whether it is universally applicable by trying it everywhere. Clearly academic researchers cannot carry out such a programme, but individual teachers and departments can try out promising approaches for themselves (i.e., context-directed research, such as 'action research').

We might ask if there is any point in researchers carrying out studies of the type discussed in this article, where they start by saying an approach has been widely demonstrated, and then test it in what seems an arbitrarily chosen (or, more likely, convenient) curriculum and classroom context – given that we cannot generalise from individual studies, and it is not viable to test every possible context.

However, there are some sensible guidelines for how a series of such studies into the same type of pedagogic innovation in different contexts can be made more useful by (a) helping to determine the range of contexts where an approach is effective (through what we might call 'incremental generalisation'), and (b) documenting the research contexts in sufficient detail to support readers in making judgements about the degree of similarity with their own teaching context (Taber, 2019).

Read about replication studies

Read about incremental generalisation

Didactic control conditions

Another ethically questionable science education experiment?


Keith S. Taber


This seems to be a rhetorical experiment, where an educational treatment that is already known to be effective is 'tested' to demonstrate that it is more effective than suboptimal teaching – by asking a teacher to constrain her teaching of the students assigned to an unethical comparison condition

one group of students was deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal, so as to provide a low baseline that would be outperformed by the intervention, simply to replicate a much-demonstrated finding

In a scientific experiment, an intervention is made into the natural state of affairs to see if it produces a hypothesised change. A key idea in experimental research is control of variables: in the ideal experiment only one thing is changed. In the control condition all relevant variables are fixed so that there is a fair test between the experimental treatment and the control.

Although there are many published experimental studies in education, such research can rarely claim to have fully controlled all potentially relevant variables: there are (nearly always, always?) confounding factors that simply cannot be controlled.

Read about confounding variables

Experimental research in education, then, (nearly always, always?) requires some compromising of the pure experimental method.

Where those compromises are substantial, we might ask if experiment was the wrong choice of methodology: even if a good experiment is often the best way to test an idea, a bad experiment may be less informative than, for example, a good case study.

That is primarily a methodological matter, but testing educational innovations and using control conditions in educational studies also raises ethical issues. After all, an experiment means experimenting with real learners' educational experiences. This can certainly be sometimes justified – but there is (or should be) an ethical imperative:

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn.

If researchers want to test a genuinely innovative teaching approach or learning resource, then they have to be confident it has a reasonable chance of being effective before asking learners to participate in a study where they will be subjected to an untested teaching input.

It is equally the case that students assigned to a control condition should never be deliberately subjected to inferior teaching simply in order to help make a strong contrast with an experimental approach being tested. Yet, reading some studies leads to a strong impression that some researchers do seek to constrain teaching to a control group to help bias studies towards the innovation being tested (Taber, 2019). That is, such studies are not genuinely objective, open-minded investigations to test a hypothesis, but 'rhetorical' studies set up to confirm and demonstrate the researchers' prior assumptions. We might say these studies do not reflect true scientific values.


A general scheme for a 'rhetorical experiment'

Read about rhetorical experiments


I have raised this issue in the research literature (Taber, 2019), so when I read experimental studies in education I am minded to check whether any control condition has been set up with a concern to ensure that the interests of all study participants (in both experimental and control conditions) have been properly considered.

Jigsaw cooperative learning in elementary science: physical and chemical changes

I was reading a study called "A jigsaw cooperative learning application in elementary science and technology lessons: physical and chemical changes" (Tarhan, Ayyıldız, Ogunc & Sesen, 2013) published in a respectable research journal (Research in Science & Technological Education).

Tarhan and colleagues adopted a common type of research design, and the journal referees and editor presumably were happy with the design of their study. However, I think the science education community should collectively be more critical about the setting up of control conditions which require students to be deliberately taught in ways that are considered to be less effective (Taber, 2019).


Jigsaw learning involves students working in co-operative groups, and in undertaking peer-teaching

Jigsaw learning is a pedagogic technique which can be seen as a constructivist, student-centred, dialogic, form of 'active learning'. It is based on collaborative groupwork and includes an element of peer-tutoring. In this paper the technique is described as "jigsaw cooperative learning", and the article authors explain that "cooperative learning is an active learning approach in which students work together in small groups to complete an assigned task" (p.185).

Read about jigsaw learning

Random assignment

The study used an experimental design to compare learning outcomes in two classes taught the same topic in two different ways. Many studies that compare two classes are problematic because whole extant classes are assigned to conditions, which means that the unit of analysis should be the class (experimental condition, n=1; control condition, n=1). Yet, despite this, such studies commonly analyse results as if each learner were an independent unit of analysis (e.g., experimental condition, n=c.30; control condition, n=c.30), which is necessary to obtain statistical results, but unfortunately means that inferences drawn from those statistics are invalid (Taber, 2019). Such studies offer examples of where there seems little point in doing an experiment badly, as the very design makes it intrinsically impossible to obtain a valid, statistically significant, outcome.
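The unit-of-analysis point can be made concrete with a small sketch (purely illustrative numbers, not data from any study): once whole classes are assigned to conditions, aggregating to the proper unit leaves a single observation per condition, and no significance test is possible.

```python
from statistics import mean

# Hypothetical pretest scores for two intact classes, one class per
# condition - invented values for illustration only.
experimental_class = [55, 60, 62, 58, 64, 59, 61, 57]
control_class = [54, 59, 63, 56, 62, 58, 60, 55]

# Student-level analysis treats every learner as an independent unit...
n_students = (len(experimental_class), len(control_class))

# ...but where whole classes are assigned to conditions, the proper
# unit of analysis is the class, leaving one class mean per condition:
class_means = ([mean(experimental_class)], [mean(control_class)])
n_classes = (len(class_means[0]), len(class_means[1]))

print(n_students)  # (8, 8) - the (invalid) basis for the usual statistics
print(n_classes)   # (1, 1) - no within-condition variance can be
                   # estimated, so no significance test is possible
```

With one 'observation' per condition there is nothing for a statistical test to work on, which is why analysing students as if they were independent units, though common, does not give valid inferences here.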


Experimental designs may be categorised as true experiments, quasi-experiments and natural experiments (Taber, 2019).

Tarhan and colleagues, however, randomly assigned the learners to the two conditions, so they can genuinely claim that their study is a true experiment: experimental condition, n=30; control condition, n=31.

Initial equivalence between groups

Assigning students in this way also helped ensure the two groups started from a similar base. Often such experimental studies use a pre-test to compare the groups before teaching. However, often the researchers simply test for a statistical difference between the groups, and treat a result that does not reach statistical significance as showing equivalence (Taber, 2019). That is, if a statistical test shows p≥0.05 (in effect, the initial difference between the groups is not very unlikely to have occurred by chance), this is taken as evidence of equivalence. That is like saying we will consider two teachers to be of 'equivalent' height as long as there is no more than 30 cm difference in their height!

In effect

'not very different'

is being seen as a synonym for

'near enough the same'


Some analogies for how equivalence is determined in some studies: read about testing for initial equivalence
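The gap between 'not significantly different' and 'near enough the same' can also be illustrated numerically. This is a toy sketch with invented numbers (using a normal approximation from Python's standard library rather than the t distribution a real analysis would use); it is not a reanalysis of any study discussed here:

```python
from statistics import NormalDist

def approx_two_sided_p(mean_diff, sd, n_per_group):
    """Approximate two-sided p-value for a difference between two group means,
    using a normal approximation (a rough sketch, not a full t test)."""
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Toy numbers: pretest scores with standard deviation 10, two groups of 30.
# A 4-point gap between the group means is arguably educationally meaningful...
p = approx_two_sided_p(mean_diff=4, sd=10, n_per_group=30)
print(f"p = {p:.2f}")  # ...yet p > 0.05, so the gap is 'not significant'
```

With groups this small, a pre-test gap that many teachers would consider meaningful can still fail to reach significance, which is precisely why a non-significant difference is weak evidence of equivalence.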

However, the pretest in Tarhan and colleagues' study found that the difference between the two groups' performances was at a level likely to occur by chance (not merely something more than 5% of the time, but) 87% of the time. This is a much more convincing basis for seeing the two groups as initially similar.

So, there are two ways in which the Tarhan et al. study seemed better thought-through than many small scale experiments in teaching I have read.

Comparing two conditions

The research was carried out with "sixth grade students in a public elementary school in Izmir, Turkey" (p.184). The focus was learning about physical and chemical changes.

The experimental condition

At the outset of the study, the authors suggest it is already known that

  • "Jigsaw enhances cooperative learning" (p.185)
  • "Jigsaw promotes positive attitudes and interests, develops communication skills between students, and increases learning achievement in chemistry" (p.186)
  • "the jigsaw technique has the potential to improve students' attitude towards science"
  • development of "students' understanding of chemical equilibrium in a first year general chemistry course [was more successful] in the jigsaw class…than …in the individual learning class"

It seems the approach being tested was already demonstrated to be effective in a range of contexts. Based on the existing research, then, we could already expect well-implemented jigsaw learning to be effective in facilitating student learning.

Similarly, the authors tell the readers that the broader category of cooperative learning has been well established as successful,

"The benefits of cooperative learning have been well documented as being

higher academic achievement,

higher level of reasoning and critical thinking skills,

deeper understanding of learned material,

better attention and less disruptive behavior in class,

more motivation to learn and achieve,

positive attitudes to subject matter,

higher self-esteem and

higher social skills."

Tarhan et al., 2013, p.185

What is there not to like here? So, what was this highly effective teaching approach compared with?

What is being compared?

Tarhan and colleagues tell readers that:

"The experimental group was taught via jigsaw cooperative learning activities developed by the researchers and the control group was taught using the traditional science and technology curriculum."

Tarhan et al., 2013, p.189
A different curriculum?

This seems an unhelpful statement as it does not seem to compare like with like:


| condition    | curriculum                                    | pedagogy                                                            |
|--------------|-----------------------------------------------|---------------------------------------------------------------------|
| experimental | ?                                             | jigsaw cooperative learning activities developed by the researchers |
| control      | traditional science and technology curriculum | ?                                                                   |

A genuine experiment would look to control variables, so would not simultaneously vary both curriculum and pedagogy

The study uses a common test to compare learning in the two conditions, so the study only makes sense as an experimental test of jigsaw learning if the same curriculum is being followed in both conditions. Otherwise, there is no prima facie reason to think that the post-test is equally fair in testing what has been taught in the two conditions. 1

The control condition

The paper includes an account of the control condition which seems to make it clear that both groups were taught "the same content", which is helpful as to have done otherwise would have seriously undermined the study.

The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group.

Tarhan et al., 2013, p.194

So, it seems:


| condition    | curriculum                                                                                                                             | pedagogy                                                                                      |
|--------------|----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| experimental | [by inference: "traditional science and technology curriculum"]                                                                         | jigsaw cooperative learning activities developed by the researchers                           |
| control      | traditional science and technology curriculum [the same content as for the experimental group to achieve the same learning objectives] | teacher-centred didactic lecture format: instructor explained the subject and asked questions |
|              | controlled variable                                                                                                                     | independent variable                                                                          |

An experiment relies on control of variables and would not simultaneously vary both curriculum and pedagogy

The statement is helpful, but might be considered ambiguous, as "this instruction", which "included lectures, discussions and problem solving", seems to relate to what had been "taught via detailed instruction in the experimental group". 2

But this seems incongruent with the wider textual context. The experimental group were taught by a jigsaw learning technique – not lectures, discussions and problem solving. Nor, for that matter, were the experimental group taught via 'detailed instruction', if this means the teacher presenting the curriculum content. So, this phrasing seems unhelpfully confusing (to me, at least – presumably the journal referees and editor thought it was clear enough).

So, this probably means the "lectures, discussions and problem solving" were part of the control condition where "the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes".

'Lectures' certainly fit with that description.

However, genuine 'discussion' work is a dialogic teaching method and would not seem to fit within a "teacher-centered didactic lecture format". But perhaps 'discussion' simply refers to how the "teacher used the blackboard and asked some questions" that members of the class were invited to answer?

Read about dialogic teaching

Writing-up research is a bit like teaching in that, in presenting to a particular audience, one works with a mental model of what that audience already knows and understands, and how they use specific terms, and this model is never likely to be perfectly accurate:

  • when teaching, the learners tend to let you know this, whereas,
  • when writing, this kind of immediate feedback is lacking.

Similarly, problem-solving would not seem to fit within a "teacher-centered didactic lecture format". 'Problem-solving' engages high-level cognitive and metacognitive skills, because a 'problem' is a task that students cannot respond to simply by recalling what they have been told and applying learnt algorithms. Problem-solving requires planning and applying strategies to test out ideas and synthesise knowledge. Yet teachers and textbooks commonly refer to questions that merely test recall and comprehension, or direct application of learnt techniques, as 'problems', when they are better understood as 'exercises', as they do not pose authentic problems.

The imprecise use of terms that may be understood differently across diverse contexts is characteristic of educational discourse, so Tarhan and colleagues may have simply used the labels that are normally applied in the context where they are working. It should also be noted that as the researchers are based in Turkey they are presumably finding the best English translations they can for the terms used locally.

Read about the challenges of translation in research writing

So, it seems we have:


| Experimental condition | in one of the conditions? | Control condition |
|------------------------|---------------------------|-------------------|
| Jigsaw learning (set out in some detail in the paper) – an example of cooperative learning, "an active learning approach in which students work together in small groups" | detailed instruction? discussions (= teacher questioning?) problem solving? (= practice exercises?) | teacher-centred didactic lecture format… the teacher used the blackboard and asked some questions… a regular textbook… the instructor explained the subject, the students listened and took notes |

The independent variable – teaching methodology

The teacher variable

One of the major problems with some educational experiments comparing different teaching approaches is the confound of the teacher. If

  • class A is taught through approach 'a' by teacher 1, and
  • class B is taught through approach 'b' by teacher 2

then, even if there is a good case that class A and class B start off as 'equivalent' in terms of readiness to learn about the focal topic, any differences in study outcomes could be as much down to the different teachers (and we all know that different teachers are not equivalent!) as to the different teaching methodologies.

At first sight this is easily solved by having the same teacher teach both classes (as in the study discussed here). That certainly seems to help. But, a little thought suggests it is not a foolproof approach (Taber, 2019).

Teachers inevitably have better rapport with some classes than others (even when those classes are shown to be technically 'equivalent'), simply because that is the nature of how diverse personalities interact. 3 Even the most professional teachers find they prefer teaching some classes to others, enjoy the teaching more, and seem to get better results (even when the classes are supposed to be equivalent).

In an experiment, there is no reason why the teacher would work better with a class assigned the experimental condition; it might just as well be the control condition. However, this is still a confound, and there is no obvious solution to it, except having multiple classes and teachers in each condition, such that the statistics can offer a guide on whether outcomes are sufficiently unlikely for these kinds of effect to be reasonably discounted.

Different teachers also have different styles, approaches and skill sets – so the same teacher will not be equally suited to every teaching approach and pedagogy. Again, this does not necessarily advantage the experimental condition; but, again, it is something that can only be addressed by having a diverse range of teachers in each condition (Taber, 2019).

So, although having the same teacher teach both classes might be expected to be the preferred approach, the same teacher is not exactly the same teacher in different classes, or teaching in different ways.

And what do participants expect will happen?

Moreover, expectancy effects can be very influential in education. Expecting something to work, or not work, has been shown to have real effects on outcomes. It may not be true, as some motivational gurus like to pretend, that we can all of us achieve anything if only we believe: but we are more likely to be successful when we believe we can succeed. When confident, we tend to be more motivated, less easily deterred, and (given the human capacity for perceiving with confirmation bias) more likely to judge we are making good progress. So, any research design which communicates to teachers and students (directly, or through the teacher's or researcher's enthusiasm) an expectation of success in some innovation is more likely to lead to success. This is a potential confound that is not even readily addressed by having large numbers of classes and teachers (Taber, 2019)!

Read about expectancy effects

The authors report that

Before implementation of the study, all students and their families were informed about the aims of the study and the privacy of their personal information. Permission for their children attend the study was obtained from all families.

Tarhan et al., 2013, p.194

This is as it should be. School children are not data-fodder for researchers, and they should always be asked for, and give, voluntary informed consent when recruited to a research project. However, researchers need to be open and honest about their work, whilst also being careful about how they present their research aims. We can imagine a possible form of invitation,

We would like to invite you to be part of a study where some of you will be subject to traditional learning through a teacher-centred didactic lecture format, where the teacher will give you notes and ask you questions, and some of you will learn by a different approach that has been shown to enhance learning, promote positive attitudes and interests, develop communication skills, increase achievement, support higher levels of reasoning and critical thinking skills, lead to deeper understanding of learned material…

An honest, but unhelpful, briefing for students and parents

If this was how the researchers understood the background to their study, then this would be a fair and honest briefing. Yet, this would clearly set up strong expectations in the student groups!

A suitable teacher

Tarhan and colleagues report that

"A teacher experienced in active learning was trained in how to implement the instruction based on jigsaw cooperative learning. The teacher and researchers discussed the instructional plans before implementing the activities."

Tarhan et al., 2013, p.189

So, the teacher who taught both classes, using jigsaw cooperative learning in one class and a teacher-centred didactic lecture approach in the other, was "experienced in active learning". So, it seems that

  • the researchers were already convinced that active learning approaches were far superior to teaching via a lecture approach
  • the teacher had experience in teaching though more engaging, effective student-centred active learning approaches

despite this, a control condition was set up that required the teacher, in effect, to de-skill, and to teach in a way the researchers were well aware research suggested was inferior, for the sake of carrying out an experiment to demonstrate in a specific context what had already been well demonstrated elsewhere.

In other words, it seems that one group of students was deliberately disadvantaged by asking an experienced and skilled teacher to teach in a way all concerned knew was sub-optimal, so as to provide a low baseline that would be outperformed by the intervention, simply to replicate a much-demonstrated finding. When seen in that way, this is surely unethical research.

The researchers may not have been consciously conceptualising their design in those terms, but it is hard to see this as a fair test of the jigsaw learning approach – it can show it is better than suboptimal teaching, but does not offer a comparison with an example of the kind of teaching that is recommended in the national context where the research took place.

Unethical, but not unusual

I am not seeking to pick out Tarhan and colleagues in particular for designing an unethical study, because they are not unique in adopting this approach (Taber, 2019): indeed, they are following a common formula (an experimental 'paradigm' in the sense the term is used in psychology).

Tarhan and colleagues have produced a study that is interesting and informative, which seems well planned, and which is strongly motivated when considered as part of a tradition of such studies. Clearly, the referees and journal editor were not minded to question the procedure. The problem is that, as a science education community, we have allowed this tradition to continue, such that a form of study that was originally genuinely open-ended (in that it examined under-researched teaching approaches of untested efficacy) has not been modified as published study after published study has slowly turned those untested teaching approaches into well-researched and repeatedly demonstrated ones.

So much so, that such studies are now in danger of simply being rhetorical research – where (as in this case) the authors tell readers at the outset that it is already known that what they are going to test is widely shown to be effective good practice. Rhetorical research is set up to produce an expected result, and so is not authentic research. A real experiment tests a genuine hypothesis rather than demonstrating a commonplace. A question researchers might ask themselves could be

'how surprised would I be if this leads to a negative outcome'?

If the answer is

'that would be very surprising'

then they should consider modifying their research so it is likely to be more than minimally informative.

Finding out that jigsaw learning achieved learning objectives better/as well as/not so well as, say, P-O-E (predict-observe-explain) activities might be worth knowing: that it is better than deliberately constrained teaching does not tell us very much that is not obvious.

I do think this type of research design is highly questionable and takes unfair advantage of students. It fails to meet my suggested guideline that

  • researchers should never ask learners to participate in a study condition they have good reason to expect will damage their opportunities to learn

The problem of generalisation

Of course, one fair response is that, despite all the claims of the superiority of constructivist, active, cooperative (etc.) learning approaches, the diversity of educational contexts means we cannot simply generalise from an experiment in one context and assume the results apply elsewhere.

Read about generalising from research

That is, the research literature shows us that jigsaw learning is an effective teaching approach, but we cannot be certain it will be effective in the particular context of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey.

Strictly that is true! But we should ask:

do we not know this because

  1. research shows great variation in whether jigsaw learning is effective, as it differs according to contexts and conditions; or
  2. although jigsaw learning has consistently been shown to be effective in many different contexts, no one has yet tested it in the specific case of teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey?

It seems clear from the paper that the researchers are presenting the second case (in which case the study would actually have been of more interest and importance had it been found that, in this context, jigsaw learning was not effective).

Given there are very good reasons to expect a positive outcome, there seems no need to 'stack the odds' by using deliberately detrimental control conditions.

Even had situation 1 applied, it seems of limited value to know that jigsaw learning is more effective (in teaching about chemical and physical changes to sixth grade students in a public elementary school in Izmir, Turkey) than an approach we already recognise is suboptimal.

An ethical alternative

This does not mean that there is no value in research that explores well-established teaching approaches in new contexts. However, unless the context is very different from where the approach has already been widely demonstrated, there is little value in comparing it with approaches that are known to be sub-optimal (which in Turkey, a country where constructivist 'reform' teaching approaches are supposed to be the expected standard, seem to often be labelled as 'traditional').

Detailed case studies of the implementation of a reform pedagogy in new contexts that collect rich 'process' data to explore challenges to implementation and to identify especially effective specific practices would surely be more informative? 4

If researchers do feel the need to do experiments, then rather than comparing known-to-be-effective approaches with suboptimal ones in the hope of demonstrating what everyone already knows, why not use comparison conditions that really test the innovation? Of course jigsaw learning outperformed lecturing in an elementary school – but how might it have compared with another constructivist approach?

I have described the constructivist science teacher as a kind of learning doctor. Like medical doctors, our first tenet should be to do no harm. So, if researchers want to set up experimental comparisons, they have a duty to try to set up two different approaches that they believe are likely to benefit the learners (whichever condition they are assigned to):

  • not one condition that advantages one group of students
  • and another which deliberately disadvantages another group of students for the benefit of a 'positive' research outcome.

If you already know the outcome then it is not genuine research – and you need a better research question.


Work cited:

Notes:

1 Imagine teaching one class about acids by jigsaw learning, and teaching another about the nervous system by some other pedagogy – and then comparing the pedagogies by administering a test – about acids! The class in the jigsaw condition might well do better, without it being reasonable to assume this reflects more effective pedagogy.

So, I am tempted to read this as simply a drafting/typographical error that has been missed, and suspect the authors intended to refer to something like the traditional approach to teaching the science and technology curriculum. Otherwise the experiment is fatally flawed.

Yet, one purpose of the study was to find out

"Does jigsaw cooperative learning instruction contribute to a better conceptual understanding of 'physical and chemical changes' in sixth grade students compared to the traditional science and technology curriculum?"

Tarhan et al., 2013, p.187

This reads as if the researchers felt the curriculum was not sufficiently matched to what they saw as the most important learning objectives in the topic of physical and chemical changes, so they undertook some curriculum development, as well as designing a teaching unit accordingly, to be taught by jigsaw learning pedagogy. If so, the experiment is testing

traditional curriculum x traditional pedagogy

vs.

reformed curriculum x innovative pedagogy

making it impossible to disentangle the two components.

This suggests the researchers are testing the combination of curriculum and pedagogy, and doing so with a test biased towards the experimental condition. This seems illogical, but I have actually worked in a project where we faced a similar dilemma. In the epiSTEMe project we designed innovative teaching units for lower secondary science and maths. In both physics units we incorporated innovative aspects to the curriculum.

  • In the forces unit material on proportionality was introduced, with examples (car stopping distance) normally not taught at that grade level (Y7);
  • In the electricity unit the normal physics content was embedded in an approach designed to teach aspects of the nature of science.

In the forces unit, the end-of-topic test covered material that was included in the project-designed units but unlikely to be taught in the control classes. There was evidence that, on average, students in the project classes did better on the test.

In the electricity unit, the nature of science objectives were not tested as these would not necessarily have been included in teaching control classes. On average, there was very little difference in learning about electrical circuits in the two conditions. There was however a very wide range of class performances – oddly just as wide in the experimental condition (where all classes had a common scheme of work, common activities, and common learning materials) as in the control condition where teachers taught the topic in their customary ways.


2 It could be read either as


1

| Control | Experimental |
|---------|--------------|
| The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. | …detailed instruction in the experimental group. This instruction included lectures, discussions and problem solving. |
| During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group. | |

What was 'this instruction' which included lectures, discussions and problem solving?

or


2

| Control | Experimental |
|---------|--------------|
| The control group was instructed via a teacher-centered didactic lecture format. Throughout the lesson, the same science and technology teacher presented the same content as for the experimental group to achieve the same learning objectives, which were taught via detailed instruction in the experimental group. | …detailed instruction in the experimental group. |
| This [sic] instruction included lectures, discussions and problem solving. During this process, the teacher used the blackboard and asked some questions related to the subject. Students also used a regular textbook. While the instructor explained the subject, the students listened to her and took notes. The instruction was accomplished in the same amount of time as for the experimental group. | |

What was 'this instruction' which included lectures, discussions and problem solving?

3 A class, of course, is not a person, but a collection of people, so perhaps does not have a 'personality' as such. However, for teachers, classes do take on something akin to a personality.

This is not just an impression. It was pointed out above that if a researcher wants to treat each learner as a unit of analysis (necessary to use inferential statistics when only working with a small number of classes) then learners, not intact classes, should be assigned to conditions. However, even a newly formed class will soon develop something akin to a personality. This will certainly be influenced by the individual learners present, but develops through the history of their evolving mutual interactions, and is not just a function of the sum of their individual characteristics.

So, even when a class is formed by random assignment of learners at the start of a study, it is still strictly questionable whether these students should be seen as independent units for analysis (Taber, 2019).


4 I suspect that science educators have a justified high regard for experimental method in the natural sciences, which sometimes blinkers us to its limitations in social contexts where there are myriad interacting variables and limited controls.

Read: Why do natural scientists tend to make poor social scientists?


Delusions of educational impact

A 'peer-reviewed' study claims to improve academic performance by purifying the souls of students suffering from hallucinations


Keith S. Taber


The research design is completely inadequate…the whole paper is confused…the methodology seems incongruous…there is an inconsistency…nowhere is the population of interest actually identified…No explanation of the discrepancy is provided…results of this analysis are not reported…the 'interview' technique used in the study is highly inadequate…There is a conceptual problem here…neither the validity nor reliability can be judged…the statistic could not apply…the result is not reported…approach is completely inappropriate…these tables are not consistent…the evidence is inconclusive…no evidence to demonstrate the assumed mechanism…totally unsupported claims…confusion of recommendations with findings…unwarranted generalisation…the analysis that is provided is useless…the research design is simply inadequate…no control condition…such a conclusion is irresponsible

Some issues missed in peer review for a paper in the European Journal of Education and Pedagogy

An invitation to publish without regard to quality?

I received an email from an open-access journal called the European Journal of Education and Pedagogy, with the subject heading 'Publish Fast and Pay Less' which immediately triggered the thought "another predatory journal?" Predatory journals publish submissions for a fee, but do not offer the editorial and production standards expected of serious research journals. In particular, they publish material which clearly falls short of rigorous research despite usually claiming to engage in peer review.

A peer reviewed journal?

Checking out the website I found the usual assurances that the journal used rigorous peer review as:

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general.

We use a double-blind system for peer-reviewing; both reviewers and authors' identities remain anonymous to each other. The paper will be peer-reviewed by two or three experts; one is an editorial staff and the other two are external reviewers."

https://www.ej-edu.org/index.php/ejedu/about

Peer review is critical to the scientific process. Work is only published in (serious) research journals when it has been scrutinised by experts in the relevant field, and any issues raised responded to in terms of revisions sufficient to satisfy the editor.

I could not find who the editor(-in-chief) was, but the 'editorial team' of European Journal of Education and Pedagogy were listed as

  • Bea Tomsic Amon, University of Ljubljana, Slovenia
  • Chunfang Zhou, University of Southern Denmark, Denmark
  • Gabriel Julien, University of Sheffield, UK
  • Intakhab Khan, King Abdulaziz University, Saudi Arabia
  • Mustafa Kayıhan Erbaş, Aksaray University, Turkey
  • Panagiotis J. Stamatis, University of the Aegean, Greece

I decided to look up the editor based in England (where I am also based), but could not find a web presence for him at the University of Sheffield. Using the ORCID (Open Researcher and Contributor ID) provided on the journal website, I found that his ORCID biography places him at the University of the West Indies and makes no mention of Sheffield.

If the European Journal of Education and Pedagogy is organised like a serious research journal, then each submission is handled by one of this editorial team. However, the reference to "editorial staff" might well imply that, like some other predatory journals I have been approached by (e.g., Are you still with us, Doctor Wu?), the editorial work is actually carried out by office staff, not qualified experts in the field.

That would certainly help explain the publication, in this 'peer-reviewed research journal', of the first paper that piqued my interest enough to motivate me to access and read the text.


The Effects of Using the Tazkiyatun Nafs Module on the Academic Achievement of Students with Hallucinations

The abstract of the paper published in what claims to be a peer-reviewed research journal

The paper initially attracted my attention because it seemed to be about the treatment of a medical condition, so I wondered what it was doing in an education journal. Yet the paper also seemed to be about an intervention to improve academic performance. As I read the paper, I found a number of flaws and issues (some very obvious, some quite serious) that should have been spotted by any qualified reviewer or editor, and which should have indicated that any possible publication be deferred until these matters were satisfactorily addressed.

This is especially worrying, as this paper makes claims relating to the effective treatment of a symptom of potentially serious, even critical, medical conditions through religious education ("a spiritual approach", p.50): claims that might encourage sufferers to defer seeking medical diagnosis and treatment. Moreover, these are claims that are not supported by any evidence presented in this paper that the editor of the European Journal of Education and Pedagogy decided was suitable for publication.


An overview of what is demonstrated, and what is claimed, in the study.

Limitations of peer review

Peer review is not a perfect process: it relies on busy human beings spending time on additional (unpaid) work, and it is only effective if suitable experts can be found that fit with, and are prepared to review, a submission. It is also generally more challenging in the social sciences than in the natural sciences. 1

That said, one sometimes finds papers published in predatory journals where one would expect any intelligent person with a basic education to notice problems without needing any specialist knowledge at all. The study I discuss here is a case in point.

Purpose of the study

Under the heading 'research objectives', the reader is told,

"In general, this journal [article?] attempts to review the construction and testing of Tazkiyatun Nafs [a Soul Purification intervention] to overcome the problem of hallucinatory disorders in student learning in secondary schools. The general objective of this study is to identify the symptoms of hallucinations caused by subtle beings such as jinn and devils among students who are the cause of disruption in learning as well as find solutions to these problems.

Meanwhile, the specific objective of this study is to determine the effect of the use of Tazkiyatun Nafs module on the academic achievement of students with hallucinations.

To achieve the aims and objectives of the study, the researcher will get answers to the following research questions [sic]:

Is it possible to determine the effect of the use of the Tazkiyatun Nafs module on the academic achievement of students with hallucinations?"

Awang, 2022, p.42

I think I can save readers a lot of time regarding the research question by suggesting that, in this study, at least, the answer is no – if only because the research design is completely inadequate to answer the research question. (I should point out that the author comes to the opposite conclusion: e.g., "the approach taken in this study using the Tazkiyatun Nafs module is very suitable for overcoming the problem of this hallucinatory disorder", p.49.)

Indeed, the whole paper is confused in terms of what it is setting out to do, what it actually reports, and what might be concluded. As one example, the general objective of identifying "the symptoms of hallucinations caused by subtle beings such as jinn and devils" (but surely, the hallucinations are the symptoms here?) seems to have been forgotten, or, at least, does not seem to be addressed in the paper. 2


The study assumes that hallucinations are caused by subtle beings such as jinn and devils possessing the students.
(Image by Tünde from Pixabay)

Methodology

So, this seems to be an intervention study.

  • Some students suffer from hallucinations.
  • This is detrimental to their education.
  • It is hypothesised that the hallucinations are caused by supernatural spirits ("subtle beings that lead to hallucinations"), so, a soul purification module might counter this detriment;
  • if so, sufferers engaging with the soul purification module should improve their academic performance;
  • and so the effect of the module is being tested in the study.

Thus we have a kind of experimental study?

No, not according to the author. Indeed, the study only reports data from a small number of unrepresentative individuals with no controls,

"The study design is a case study design that is a qualitative study in nature. This study uses a case study design that is a study that will apply treatment to the study subject to determine the effectiveness of the use of the planned modules and study variables measured many times to obtain accurate and original study results. This study was conducted on hallucination disorders [students suffering from hallucination disorders?] to determine the effectiveness of the Tazkiyatun Nafs module in terms of aspects of student academic achievement."

Awang, 2022, p.42

Case study?

So, the author sees this as a case study. Research methodologies are better understood as clusters of similar approaches rather than unitary categories – but case study is generally seen as naturalistic, rather than involving an intervention by an external researcher. So, case study seems incongruous here. Case study involves the detailed exploration of an instance (of something of interest – a lesson, a school, a course of study, a textbook, …) reported with 'thick description'.

Read about the characteristics of case study research

The case is usually a complex phenomenon which is embedded within a context from which it cannot readily be untangled (for example, a lesson always takes place within a wider context of a teacher working over time with a class on a course of study, within a curricular, and institutional, and wider cultural, context, all of which influence the nature of the specific lesson). So, due to the complex and embedded nature of cases, they are all unique.

"a case study is a study that is full of thoroughness and complex to know and understand an issue or case studied…this case study is used to gain a deep understanding of an issue or situation in depth and to understand the situation of the people who experience it"

Awang, 2022, p.42

A case is usually selected either because that case is of special importance to the researcher (an intrinsic case study – e.g., I studied this school because it is the one I was working in) or because we hope this (unique) case can tell us something about similar (but certainly not identical) other (also unique) cases. In the latter case [sic], an instrumental case study, we are always limited by the extent we might expect to be able to generalise beyond the case.

This limited generalisation might suggest we should not work with a single case, but rather look for a suitably representative sample of all cases: but we sometimes choose case study because the complexity of the phenomena suggests we need to use extensive, detailed data collection and analyses to understand the complexity and subtlety of any case. That is (i.e., the compromise we choose is), we decide we will look at one case in depth because that will at least give us insight into the case, whereas a survey of many cases will inevitably be too superficial to offer any useful insights.

So how does Awang select the case for this case study?

"This study is a case study of hallucinatory disorders. Therefore, the technique of purposive sampling (purposive sampling [sic]) is chosen so that the selection of the sample can really give a true picture of the information to be explored ….

Among the important steps in a research study is the identification of populations and samples. The large group in which the sample is selected is termed the population. A sample is a small number of the population identified and made the respondents of the study. A case or sample of n = 1 was once used to define a patient with a disease, an object or concept, a jury decision, a community, or a country, a case study involves the collection of data from only one research participant…

Awang, 2022, p.42

Of course, a case study of "a community, or a country" – or of a school, or a lesson, or a professional development programme, or a school leadership team, or a homework policy, or an enrichment activity, or … – would almost certainly be inadequate if it was limited to "the collection of data from only one research participant"!

I do not think this study actually is "a case study of hallucinatory disorders [sic]". Leaving aside the shift from singular ("a case study") to plural ("disorders"), the research does not investigate a/some hallucinatory disorders, but the effect of a soul purification module on academic performance. (Actually, spoiler alert 😉, it does not actually investigate the effect of a soul purification module on academic performance either, but the author seems to think it does.)

If this is a case study, there should be the selection of a case, not a sample. Sometimes we do sample within a case in case study, but only from those identified as part of the case. (For example, if the case was a year group in a school, we may not have resources to interact in depth with several hundred different students.) Perhaps this is pedantry, as the reader likely knows what Awang meant by 'sample' in the paper – but semantics is important in research writing: a sample is chosen to represent a population, whereas the choice of case study is an acknowledgement that generalisation back to a population is not being claimed.

However, if "among the important steps in a research study is the identification of populations" then it is odd that nowhere in the paper is the population of interest actually specified!

Things slip our minds. Perhaps Awang intended to define the population, forgot, and then missed this when checking the text – but, hey, that is just the kind of thing the reviewers and editor are meant to notice! Otherwise this looks very like including material from standard research texts to pay lip-service to the idea that research design needs to be principled, but without really appreciating what the phrases used actually mean. This impression is also given by the descriptions of how data (for example, from interviews) were analysed – but which are not reflected at all in the results section of the paper. (I am not accusing Awang of this, but because of the poor standard of peer review not raising the question, the author is left vulnerable to such an evaluation.)

The only one research participant?

So, what do we know about the "case or sample of n = 1 ", the "only one research participant" in this study?

"The actual respondents in this case study related to hallucinatory disorders were five high school students. The supportive respondents in the case study related to hallucination disorders were five counseling teachers and five parents or guardians of students who were the actual respondents."

Awang, 2022, p.42

It is certainly not impossible that a case could comprise a group of five people – as long as those five make up a naturally bounded group – that is, a group that a reasonable person would recognise as existing as a coherent entity as they clearly had something in common (they were in the same school class, for example; they were attending the same group therapy session, perhaps; they were a friendship group; they were members of the same extended family diagnosed with hallucinatory disorders…something!) There is no indication here of how these five make up a case.

The identification of the participants as a case might have made sense had the participants collectively undertaken the module as a group, but the reader is told: "This study is in the form of a case study. Each practice and activity in the module are done individually" (p.50). Another justification could have been if the module had been offered in one school, and these five participants were the students enrolled in the programme at that time but as "analysis of  the  respondents'  academic  performance  was conducted  after  the  academic  data  of  all  respondents  were obtained  from  the  respective  respondent's  school" (p.45) it seems they did not attend a single school.

The results tables and reports in the text refer to "respondent 1" to "respondent 4". In case study, an approach which recognises the individuality and inherent value of the particular case, we would usually assign assumed names to research participants, not numbers. But if we are going to use numbers, should there not be a respondent 5?

The other one research participant?

It seems that there is something odd here.

Both the passage above and the abstract refer to five respondents. The results report on four. So, what is going on? No explanation of the discrepancy is provided. Perhaps:

  • There only ever were four participants, and the author made a mistake in counting.
  • There only ever were four participants, and the author made a typographical mistake (well, strictly, six typographical mistakes) in drafting the paper, and then missed this in checking the manuscript.
  • There were five respondents and the author forgot to include data on respondent 5 purely by accident.
  • There were five respondents, but the author decided not to report on the fifth deliberately for a reason that is not revealed (perhaps the results did not fit with the desired outcome?)

The significant point is not that there is an inconsistency but that this error was missed by peer reviewers and the editor – if there ever was any genuine peer review. This is the kind of mistake that a school child could spot – so, how is it possible that 'expert reviewers' and 'editorial staff' either did not notice it, or did not think it important enough to query?

Research instruments

Another section of the paper reports the instrumentation used in the study.

"The research instruments for this study were Takziyatun Nafs modules, interview questions, and academic document analysis. All these instruments were prepared by the researcher and tested for validity and reliability before being administered to the selected study sample [sic, case?]."

Awang, 2022, p.42

Of course, it is important to test instruments for validity and reliability (or perhaps authenticity and trustworthiness when collecting qualitative data). But it is also important

  • to tell the reader how you did this
  • to report the outcomes

which seems to be missing (apart from in regard to part of the implemented module – see below). That is, the reader of a research study wants evidence not simply promises. Simply telling readers you did this is a bit like meeting a stranger who tells you that you can trust them because they (i.e., say that they) are honest.

Later the reader is told that

"Semi- structured interview questions will be [sic, not 'were'?] developed and validated for the purpose of identifying the causes and effects of hallucinations among these secondary school students…

…this interview process will be [sic, not 'was'] conducted continuously [sic!] with respondents to get a clear and specific picture of the problem of hallucinations and to find the best solution to overcome this disorder using Islamic medical approaches that have been planned in this study

Awang, 2022, pp.43-44

At the very least, this seems to confuse the plan for the research with a report of what was done. (But again, apparently, the reviewers and editorial staff did not think this needed addressing.) This is also confusing as it is not clear how this aspect of the study relates to the intervention. Were the interviews carried out before the intervention to help inform the design of the modules? (Presumably not, as the modules had already been "tested for validity and reliability before being administered to the selected study sample".) Perhaps there are clear and simple answers to such questions – but the reader will not know, because the reviewers and editor did not seem to feel they needed to be posed.

If "Interviews are the main research instrument in this study" (p.43), then one would expect to see examples of the interview schedules – but these are not presented. The paper reports a complex process for analysing interview data, but this is not reflected in the findings reported. The reader is told that the six-stage process leads to the identification and refinement of main and sub-categories. Yet, these categories are not reported in the paper. (But, again, peer reviewers and the editor did not apparently raise this as something to be corrected.) More generally, "data analysis used thematic analysis methods" (p.44), so why is there no analysis presented in terms of themes? The results of this analysis are simply not reported.

The reader is told that

"This  interview  method…aims to determine the respondents' perspectives, as well as look  at  the  respondents'  thoughts  on  their  views  on  the issues studied in this study."

Awang, 2022, p.44

But there is no discussion of participants' perspectives and views in the findings of the study. 2 Did the peer reviewers and editor not think this needed addressing before publication?

Even more significantly, in a qualitative study where interviews are supposedly the main research instrument, one would expect to see extracts from the interviews presented as part of the findings to support and exemplify claims being made: yet, there are none. (Did this not strike the peer reviewers and editor as odd: presumably they are familiar with the norms of qualitative research?)

The only quotation from the qualitative data (in this 'qualitative' study) I can find appears in the implications section of the paper:

"Are you aware of the importance of education to you? Realize. Is that lesson really important? Important. The success of the student depends on the lessons in school right or not? That's right"

Respondent 3: Awang, 2022, p.49

This seems a little bizarre, if we accept this is, as reported, an utterance from one of the students, Respondent 3. It becomes more sensible if this is actually condensed dialogue:

"Are you aware of the importance of education to you?"

"Realize."

"Is that lesson really important?"

"Important."

"The success of the student depends on the lessons in school right or not?"

"That's right"

It seems the peer review process did not lead to suggesting that the material should be formatted according to the norms for presenting dialogue in scholarly texts by indicating turns. In any case, if that is typical of the 'interview' technique used in the study then it is highly inadequate, as clearly the interviewer is leading the respondent, and this is more an example of indoctrination than open-ended enquiry.

Random sampling of data

Completely incongruous with the description of the purposeful selection of the participants for a case study is the account of how the assessment data was selected for analysis:

"The process of analysis of student achievement documents is carried out randomly by taking the results of current examinations that have passed such as the initial examination of the current year or the year before which is closest to the time of the study."

Awang, 2022, p.44

Did the peer reviewers or editor not question the use of the term 'random' here? It is unclear what is meant by 'random' here, but clearly if the analysis was based on randomly selected data, that would undermine the results.
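For contrast, a defensible 'random' selection in a methods section is one a reader could audit: an explicit sampling frame, a stated seed, and sampling without replacement. A minimal sketch of how that might be documented (the population identifiers here are my own hypothetical illustration, not from the paper):

```python
import random

# Hypothetical sampling frame: an explicit list of anonymised student IDs.
population = [f"student_{i:03d}" for i in range(1, 201)]

# Reporting the seed makes the draw reproducible, so others can audit it.
rng = random.Random(42)
sample = rng.sample(population, k=50)  # sampling without replacement

print(len(sample))  # 50 distinct students drawn from the stated frame
```

Nothing about this is sophisticated; the point is simply that a genuinely random selection can be described precisely, whereas the paper gives the reader no way to distinguish 'random' from 'arbitrary'.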

Validating the soul purification module

There is also a conceptual problem here. The Takziyatun Nafs modules are the intervention materials (part of what is being studied) – so they cannot also be research instruments (used to study them). Surely, if the Takziyatun Nafs modules had been shown to be valid and reliable before carrying out the reported study, as suggested here, then the study would not be needed to evaluate their effectiveness. But, presumably, expert peer reviewers (if there really were any) did not see an issue here.

The reliability of the intervention module

The Takziyatun Nafs modules had three components, and the author reports the second of the three was subjected to tests of validity and reliability. It seems that Awang thinks that this demonstrates the validity and reliability of the complete intervention,

"The second part of this module will go through [sic] the process of obtaining the validity and reliability of the module. Proses [sic] to obtain this validity, a questionnaire was constructed to test the validity of this module. The appointed specialists are psychologists, modern physicians (psychiatrists), religious specialists, and alternative medicine specialists. The validity of the module is identified from the aspects of content, sessions, and activities of the Tazkiyatun Nafs module. While to obtain the value of the reliability coefficient, Cronbach's alpha coefficient method was used. To obtain this Cronbach's alpha coefficient, a pilot test was conducted on 50 students who were randomly selected to test the reliability of this module to be conducted."

Awang, 2022, pp.43-44

Now, to unpack this, it may be helpful to briefly outline what the intervention involved (as the paper is open access, anyone can access and read the full details in the report).


From the MGM film 'A Night at the Opera' (1935): "The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced"

The description does not start off very helpfully ("The introduction of the module will elaborate on the introduction, rationale, and objectives of this module introduced" (p.43) put me in mind of the Marx brothers: "The party of the first part shall be known in this contract as the party of the first part"), but some key points are,

"the Tazkiyatun Nafs module was constructed to purify the heart of each respondent leading to the healing of hallucinatory disorders. This liver purification process is done in stages…

"the process of cleansing the patient's soul will be done …all the subtle beings in the patient will be expelled and cleaned and the remnants of the subtle beings in the patient will be removed and washed…

The second process is the process of strengthening and the process of purification of the soul or heart of the patient …All the mazmumah (evil qualities) that are in the heart must be discarded…

The third process is the process of enrichment and the process of distillation of the heart and the practices performed. In this process, there will be an evaluation of the practices performed by the patient as well as the process to ensure that the patient is always clean from all the disturbances and disturbances [sic] of subtle beings to ensure that students will always be healthy and clean from such disturbances…

Awang, 2022, p.45, p.43

Quite how this process of exorcising and distilling and cleansing will occur is not entirely clear (and if the soul is equated with the heart, how is the liver involved?), but it seems to involve reflection and prayer and contemplation of scripture – certainly a very personal and therapeutic process.

And yet its validity and reliability was tested by giving a questionnaire to 50 students randomly selected (from the unspecified population, presumably)? No information is given on how a random selection was made (Taber, 2013) – which allows a reader to be very sceptical that this actually was a random sample from the (un?)identified population, and not just an arbitrary sample of 50 students. (So, that is twice the word 'random' is used in the paper when it seems inappropriate.)

It hardly matters here, as clearly neither the validity nor the reliability of a spiritual therapy can be judged from a questionnaire (especially when administered to people who have never undertaken the therapy). In any case, the "reliability coefficient" obtained from an administration of a questionnaire ONLY applies to that sample on that occasion. So, the statistic could not apply to the four participants in the study. And, in any case, the result is not reported, so the reader has no idea what the value of Cronbach's alpha was (but then, this was described as a qualitative study!)

Moreover, Cronbach's alpha only indicates the internal coherence of the items on a scale (Taber, 2019): that is, it only indicates whether the set of questions included in the questionnaire seem to be tapping the same underlying construct across the responses of those surveyed. It gives no information about the reliability of the instrument (i.e., whether it would give the same results on another occasion).
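This limitation is visible in the statistic itself: alpha is computed entirely from a single administration, from the item variances and the variance of the total scores, so nothing in it can speak to stability over time. A minimal sketch (the function name and the toy data are my own illustration):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))

    Computed from one administration only, so it indexes internal
    consistency of the items, not test-retest reliability.
    """
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Two items whose scores broadly rise and fall together across
# four respondents -> high internal consistency.
scores = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0]])
print(round(cronbach_alpha(scores), 3))  # 0.889
```

Note that the formula never compares two occasions of measurement – which is exactly why quoting an alpha (even had one been reported) could not establish 'reliability' in the sense of the instrument giving the same results on another occasion.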

This approach to testing validity and reliability is then completely inappropriate and unhelpful. So, even if the outcomes of the testing had been reported (and they are not) they would not offer any relevant evidence. Yet it seems that peer reviewers and editor did not think to question why this section was included in the paper.

Ethical issues

A study of this kind raises ethical issues. It may well be that the research was carried out in an entirely proper and ethical manner, but it is usual in studies with human participants ('human subjects') to make this clear in the published report (Taber, 2014b). A standard issue is whether the participants gave voluntary, informed, consent. This would mean that they were given sufficient information about the study at the outset to be able to decide if they wished to participate, and were under no undue pressure to do so. The 'respondents' were school students: if they were considered minors in the research context (and oddly for a 'case study' such basic details as age and gender are not reported) then parental permission would also be needed, again subject to sufficient briefing and no duress.

However, in this specific research there are also further issues due to the nature of the study. The participants were subject to medical disorders, so how did the researcher obtain information about, and access to, the students without medical confidentiality being broken? Who were the 'gatekeepers' who provided access to the children and their personal data? The researcher also obtained assessment data "from the class teacher or from the Student Affairs section of the student's school" (p.44), so it is important to know that students (and parents/guardians) consented to this. Again, peer review does not seem to have identified this as an issue to address before publication.

There is also the major underlying question about the ethics of a study when recognising that these students were (or could be, as details are not provided) suffering from serious medical conditions, but employing religious education as a treatment ("This method of treatment is to help respondents who suffer from hallucinations caused by demons or subtle beings", p.44). Part of the theoretical framework underpinning the study is the assumption that what is being addressed is "the problem of hallucinations caused by the presence of ethereal beings…" (p.43) yet it is also acknowledged that,

"Hallucinatory disorders in learning that will be emphasized in this study are due to several problems that have been identified in several schools in Malaysia. Such disorders are psychological, environmental, cultural, and sociological disorders. Psychological disorders such as hallucinatory disorders can lead to a more critical effect of bringing a person prone to Schizophrenia. Psychological disorders such as emotional disorders and psychiatric disorders. …Among the causes of emotional disorders among students are the school environment, events in the family, family influence, peer influence, teacher actions, and others."

Awang, 2022, p.41

There seem to be three ways of understanding this apparent discrepancy, which I might gloss:

  1. there are many causes of conditions that involve hallucinations, including, but not only, possession by evil or mischievous spirits;
  2. the conditions that lead to young people having hallucinations may be understood at two complementary levels, at a spiritual level in terms of a need for inner cleansing and exorcising of subtle beings, and in terms of organic disease or conditions triggered by, for example, social and psychological factors;
  3. in the introduction the author has relied on various academic sources to discuss the nature of the phenomenon of students having hallucinations, but he actually has a working assumption that is completely different: hallucinations are due to the presence of jinn or other spirits.

I do not think it is clear which of these positions is being taken by the study's author.

  1. In the first case it would be necessary to identify which causes are present in potential respondents and only recruit those suffering possession for this study (which does not seem to have been done);
  2. In the second case, spiritual treatment would need to complement medical intervention (which would completely undermine the validity of the study as medical treatments for the underlying causes of hallucinations are likely to be the cause of hallucinations ceasing, not the tested intervention);
  3. The third position is clearly problematic in terms of academic scholarship as it is either completely incompetent or deliberately disregards academic norms that require the design of a study to reflect the conceptual framework set out to motivate it.

So, was this tested intervention implemented instead of or alongside formal medical intervention?

  • If it was alongside medical treatment, then that raises a major confound for the study.
  • Yet it would clearly be unacceptable to deny sufferers indicated medical treatment in order to test an educational intervention that is in effect a form of exorcism.

Again, it may be there are simple and adequate responses to these questions (although here I really cannot see what they might be), but unfortunately it seems the journal referees and editor did not think to ask for them.  

Findings


Results tables presented in Awang, 2022 (p.45) [Published with a creative commons licence allowing reproduction]: "Based on the findings stated in Table I show that serial respondents experienced a decline in academic achievement while they face the problem of hallucinations. In contrast to Table II which shows an improvement in students' academic achievement after hallucinatory disorders can be resolved." If we assume that columns in the second table have been mislabelled, then it seems the school performance of these four students suffered while they were suffering hallucinations, but improved once they recovered. From this, we can infer…?

The key findings presented concern academic performance at school. Core results are presented in tables I and II. Unfortunately these tables are not consistent as they report contradictory results for the academic performance of students before and during periods when they had hallucinations.

They can be made consistent if the reader assumes that two of the columns in table II are mislabelled. If the reader assumes that the column labelled 'before disruption' actually reports the performance 'during disruption' and that the column actually labelled 'during disruption' is something else, then they become consistent. For the results to tell a coherent story and agree with the author's interpretation this 'something else' presumably should be 'after disruption'.

This is a very unfortunate error – and moreover one that is obvious to any careful reader. (So, why was it not obvious to the referees and editor?)

As well as looking at these overall scores, other assessment data is presented separately for each of respondent 1 – respondent 4. These sections comprise presentations of information about grades and class positions, mixed with claims about the effects of the intervention. These claims are not based on any evidence, and in many cases are conclusions about 'respondents' in general although they are placed in sections considering the academic assessment data of individual respondents. So, there are a number of problems with these claims:

  • they are of the nature of conclusions, but appear in the section presenting the findings;
  • they are about the specific effects of the intervention that the author assumes has influenced academic performance, not the data analysed in these sections;
  • they are completely unsubstantiated as no data or analysis is offered to support them;
  • often they make claims about 'respondents' in general, although as part of the consideration of data from individual learners.

Despite this, the paper passed peer-review and editorial scrutiny.

Rhetorical research?

This paper seems to be an example of a kind of 'rhetorical research', where a researcher is so convinced about their pre-existing theoretical commitments that they simply assume they have demonstrated them. Here the assumptions seem to be:

  1. Recovering from suffering hallucinations will increase student performance
  2. Hallucinations are caused by jinn and devils
  3. A spiritual intervention will expel jinn and devils
  4. So, a spiritual intervention will cure hallucinations
  5. So, a spiritual intervention will increase student performance

The researcher provided a spiritual intervention, and the students' performance increased, so it is assumed that the scheme is demonstrated. The data presented is certainly consistent with the scheme, but does not in itself support it. Awang provides evidence that student performance improved in four individuals after they had received the intervention – but there is no evidence offered to demonstrate the assumed mechanism.

A gardener might think that complimenting seedlings will cause them to grow. Perhaps she praises her seedlings every day, and they do indeed grow. Are we persuaded about the efficacy of her method, or might we suspect another cause at work? Would the peer-reviewers and editor of the European Journal of Education and Pedagogy be persuaded this demonstrated that compliments cause plant growth? On the evidence of this paper, perhaps they would.

This is what Awang tells readers about the analysis undertaken:

Each student respondent involved in this study [sic, presumably not, rather the researcher] will use the analysis of the respondent's performance to determine the effect of hallucination disorders on student achievement in secondary school is accurate.

The elements compared in this analysis are as follows: a) difference in mean percentage of achievement by subject, b) difference in grade achievement by subject and c) difference in the grade of overall student achievement. All academic results of the respondents will be analyzed as well as get the mean of the difference between the performance before, during, and after the respondents experience hallucinations.

These results will be used as research material to determine the accuracy of the use of the Tazkiyatun Nafs Module in solving the problem of hallucinations in school and can improve student achievement in academic school.

Awang, 2022, p.45

There is clearly a large jump between the analysis outlined in the second paragraph here, and testing the study hypotheses as set out in the final paragraph. But the author does not seem to notice this (and more worryingly, nor do the journal's reviewers and editor).

So, interleaved into the account of findings discussing "mean percentage of achievement by subject… difference in grade achievement by subject… difference in the grade of overall student achievement" are totally unsupported claims. Here is an example for Respondent 1:

"Based on the findings of the respondent's achievement in the grade for Respondent 1 while facing the problem of hallucinations shows that there is not much decrease or deterioration of the respondent's grade. There were only 4 subjects who experienced a decline in grade between before and during hallucination disorder. The subjects that experienced decline were English, Geography, CBC, and Civics. Yet there is one subject that shows a very critical grade change the Civics subject. The decline occurred from grade A to grade E. This shows that Civics education needs to be given serious attention in overcoming this problem of decline. Subjects experiencing this grade drop were subjects involving emotion, language, as well as psychomotor fitness. In the context of psychology, unstable emotional development leads to a decline in the psychomotor and emotional development of respondents.

After the use of the Tazkiyatun Nafs module in overcoming this problem, hallucinatory disorders can be overcome. This situation indicates the development of the respondents during and after experiencing hallucinations after practicing the Tazkiyatun Nafs module. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better. From the above findings there were 5 subjects who experienced excellent improvement in grades. The increase occurred in English, Malay, Geography, and Civics subjects. The best improvement is in the subject of Civic education from grade E to grade B. The improvement in this language subject shows that the respondents' emotions have stabilized. This situation is very positive and needs to be continued for other subjects so that respondents continue to excel in academic achievement in school."

Awang, 2022, p.45 (emphasis added)

The material which I show here as underlined is interjected completely gratuitously. It does not logically fit in the sequence. It is not part of the analysis of school performance. It is not based on any evidence presented in this section. Indeed, nor is it based on any evidence presented anywhere else in the paper!

This pattern is repeated in discussing other aspects of respondents' school performance. Although there is mention of other factors which seem especially pertinent to the dip in school grades ("this was due to the absence of the  respondents  to  school  during  the  day  the  test  was conducted", p.46; "it was an increase from before with no marks due to non-attendance at school", p.46) the discussion of grades is interspersed with (repetitive) claims about the effects of the intervention for which no evidence is offered.


§: Differences in Respondents' Grade Achievement by Subject

  • Respondent 1: "After the use of the Tazkiyatun Nafs module in overcoming this problem, hallucinatory disorders can be overcome. This situation indicates the development of the respondents during and after experiencing hallucinations after practicing the Tazkiyatun Nafs module. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.45)
  • Respondent 2: "After the use of the Tazkiyatun Nafs module as a soul purification module, showing the development of the respondents during and after experiencing hallucination disorders is very good. The process that takes place in the Tzkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)
  • Respondent 3: "The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better" (p.46)
  • Respondent 4: "The process that takes place in the Tazkiyatun Nafs module can help the respondent to stabilize his emotions and psyche for the better." (p.46)

§: Differences in Respondent Grades according to Overall Academic Achievement

  • Respondent 1: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (pp.46-7)
  • Respondent 2: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module. … This excellence also shows that the respondents have recovered from hallucinations after practicing the methods found in the Tazkiayatun Nafs module that has been introduced. In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
  • Respondent 3: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)
  • Respondent 4: "Based on the findings of the study after the hallucination disorder was overcome showed that the development of the respondents was very positive after going through the treatment process using the Tazkiyatun Nafs module…In general, the use of the Tazkiyatun Nafs module has successfully changed the learning lifestyle and achievement of the respondents from poor condition to good and excellent achievement." (p.47)

Unsupported claims made within findings sections reporting analyses of individual student academic grades: note (a) how these statements, included in the analysis of individual school performance data from four separate participants (in a case study – a methodology that recognises and values diversity and individuality), are very similar across the participants; (b) claims about 'respondents' (plural) are included in the reports of findings from individual students.

Awang summarises what he claims the analysis of 'differences in respondents' grade achievement by subject' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective achievement grades. Therefore, this soul purification module should be practiced by every student to help them in stabilizing their soul and emotions and stay away from all the disturbances of the subtle beings that lead to hallucinations"

Awang, 2022, p.46

And, on the next page, Awang summarises what he claims the analysis of 'differences in respondent grades according to overall academic achievement' shows:

"The use of the Tazkiyatun Nafs module in this study helped the students improve their respective overall academic achievement. Therefore, this soul purification module should be practiced by every student to help them in stabilizing the soul and emotions as well as to stay away from all the disturbances of the subtle beings that lead to hallucination disorder."

Awang, 2022, p.47

So, the analysis of grades is said to demonstrate the value of the intervention, and indeed Awang considers this is reason to extend the intervention beyond the four participants, not just to others suffering hallucinations, but to "every student". The peer review process seems not to have raised queries about

  • the unsupported claims,
  • the confusion of recommendations with findings (it is normal to keep to results in a findings section), nor
  • the unwarranted generalisation from four hallucination sufferers to all students, whether healthy or not.

Interpreting the results

There seem to be two stories that can be told about the results:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved.  

Narrative 1

Now narrative 1 relies on a very substantial implied assumption – which is that the numbers presented as school performance are comparable over time. So, a control would be useful: such as what happened to the performance scores of other students in the same classes over the same time period. It seems likely they would not have shown the same dip – unless the dip was related to something other than hallucinations – such as the well-recognised dip after long school holidays, or some cultural distraction (a major sports tournament; fasting during Ramadan; political unrest; a pandemic…). Without such a control the evidence is suggestive (after all, being ill, and missing school as a result, is likely to lead to a dip in school performance, so the findings are not surprising), but inconclusive.

Intriguingly, the author tells readers that "student achievement statistics from the beginning of the year to the middle of the current [sic, published in 2022] year in secondary schools in Northern Peninsular Malaysia that have been surveyed by researchers show a decline (Sabri, 2015 [sic])" (p.42), but this is not considered in relation to the findings of the study.

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, as a result of undergoing the soul purification module, their school performance improved.  

Narrative 2

Clearly narrative 2 suffers from the same limitation as narrative 1. However, it also demands an extra step in making an inference. I could re-write this narrative:

When the four students suffered hallucinations, this led to a deterioration in their school performance. Later, once they had recovered from the episodes of hallucinations, their school performance improved. 
AND
the recovery was due to engagement with the soul purification module.

Narrative 2'.

That is, even if we accept narrative 1 as likely, to accept narrative 2 we would also need to be convinced that:

  • a) sufferers from medical conditions leading to hallucinations do not suffer periodic attacks with periods of remission in between; or
  • b) episodes of hallucinations cannot be due to one-off events (emotional trauma, T.I.A. {transient ischaemic attack or mini-strokes},…) that resolve naturally in time; or
  • c) sufferers from medical conditions leading to hallucinations do not find they resolve due to maturation; or
  • d) the four participants in this study did not undertake any change in lifestyle (getting more sleep, ceasing eating strange fungi found in the woods) unrelated to the intervention that might have influenced the onset of hallucinations; or
  • e) the four participants in this study did not receive any medical treatment independent of the intervention (e.g., prescribed medication to treat migraine episodes) that might have influenced the onset of hallucinations.

Despite this study being supposedly a case study (where the expectation is there should be 'thick description' of the case and its context), there is no information to help us exclude such options. We do not know the medical diagnoses of the conditions causing the participants' hallucinations, or anything about their lives or any medical treatment that may have been administered. Without such information, the analysis that is provided is useless for answering the research question.

In effect, regardless of all the other issues raised, the key problem is that the research design is simply inadequate to test the research question. But it seems the referees and editor did not notice this shortcoming.

Alleged implications of the research

After presenting his results Awang draws various implications, and makes a number of claims about what had been found in the study:

  • "After the students went through the treatment session by using the Tazkiayatun Nafs module to treat hallucinations, it showed a positive effect on the student respondents. All this was certified by the expert, the student's parents as well as the counselor's teacher." (p.48)
  • "Based on these findings, shows that hallucinations are very disturbing to humans and the appropriate method for now to solve this problem is to use the Tazkiyatun Nafs Module." (p.48)
  • "…the use of the Tazkiyatun Nafs module while the respondent is suffering from hallucination disorder is very appropriate…is very helpful to the respondents in restoring their minds and psyche to be calmer and healthier. These changes allow students to focus on their studies as well as allow them to improve their academic performance better." (p.48)
  • "The use of the Tazkiyatun Nafs Module in this study has led to very positive changes there are attitudes and traits of students who face hallucinations before. All the negative traits like irritability, loneliness, depression, etc. can be overcome completely." (p.49)
  • "The personality development of students is getting better and perfect with the implementation of the Tazkiaytun Nafs module in their lives." (p.49)
  • "Results indicate that students who suffer from this hallucination disorder are in a state of high depression, inactivity, fatigue, weakness and pain, and insufficient sleep." (p.49)
  • "According to the findings of this study, the history of this hallucination disorder started in primary school and when a person is in adolescence, then this disorder becomes stronger and can cause various diseases and have various effects on a person who is disturbed." (p.50)

Given the range of interview data that Awang claims to have collected and analysed, at least some of the claims here are possibly supported by the data. However, none of this data and analysis is available to the reader. 2 These claims are not supported by any evidence presented in the paper. Yet peer reviewers and the editor who read the manuscript seem to feel it is entirely acceptable to publish such claims in a research paper, and not present any evidence whatsoever.

Summing up

In summary: as far as these four students were concerned (but not perhaps the fifth participant?), there did seem to be a relationship between periods of experiencing hallucinations and lower school performance (perhaps explained by such factors as "absenteeism to school during the day the test was conducted", p.46):

"the performance shown by students who face chronic hallucinations is also declining and declining. This is all due to the actions of students leaving the teacher's learning and teaching sessions as well as not attending school when this hallucinatory disorder strikes. This illness or disorder comes to the student suddenly and periodically. Each time this hallucination disease strikes the student causes the student to have to take school holidays for a few days due to pain or depression"

Awang, 2022, p.42

However,

  • these four students do not represent any wider population;
  • there is no information about the specific nature, frequency, intensity, etcetera, of the hallucinations or diagnoses in these individuals;
  • there was no statistical test of significance of changes; and
  • there was no control condition to see if performance dips were experienced by others not experiencing hallucinations at the same time.
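One of these points deserves emphasis: with only four participants, no significance test could have rescued the claims anyway. A minimal sketch (using invented before/after scores, not data from Awang's paper) shows that an exact paired sign-flip test on n = 4 can never reach the conventional p < .05 threshold, however consistent the improvement:

```python
# Hypothetical illustration (invented numbers, not data from Awang's paper):
# even if every one of four students improved, an exact paired sign-flip
# (permutation) test cannot reach p < .05 with n = 4.
from itertools import product

before = [55, 48, 62, 40]   # invented mean scores 'during hallucinations'
after  = [70, 67, 75, 58]   # invented mean scores 'after the intervention'

diffs = [a - b for a, b in zip(after, before)]
observed = sum(diffs)       # total improvement across the four students

# Under the null hypothesis each difference is equally likely to be
# positive or negative; enumerate all 2^4 = 16 sign assignments.
null = [sum(s * d for s, d in zip(signs, diffs))
        for signs in product([1, -1], repeat=len(diffs))]

p = sum(abs(t) >= abs(observed) for t in null) / len(null)
print(p)  # 0.125 -- the smallest two-sided p-value attainable with n = 4
```

So even under the most favourable data imaginable, a reviewer asking for a significance test would have found this sample far too small to license any statistical conclusion, let alone a recommendation for "every student".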

Once they had recovered from the hallucinations (and it is not clear on what basis that judgement was made) their scores improved.

The author would like us to believe that the relief from the hallucinations was due to the intervention, but this seems to be (quite literally) an act of faith 3 as no actual research evidence is offered to show that the soul purification module actually had any effect. It is of course possible the module did have an effect (whether for the conjectured or other reasons – such as simply offering troubled children some extra study time in a calm and safe environment and special attention – or because of an expectancy effect if the students were told by trusted authority figures that the intervention would lead to the purification of their hearts and the healing of their hallucinatory disorder) but the study, as reported, offers no strong grounds to assume it did have such an effect.

An irresponsible journal

As hallucinations are often symptoms of organic disease affecting blood supply to the brain, there is a major question of whether treating the condition by religious instruction is ethically sound. For example, hallucinations may indicate a tumour growing in the brain. Moreover, if the module was only offered as a complement to proper medical attention, a reader might well suspect that any improvement in the condition (and consequent increased engagement in academic work) was entirely unrelated to the module being evaluated.

Indeed, a published research study that claims that soul purification is a suitable treatment for medical conditions presenting with hallucinations is potentially dangerous as it could lead to serious organic disease going untreated. If Awang's recommendations were widely taken up in Malaysia such that students with serious organic conditions were only treated for their hallucinations by soul purification rather than with medication or by surgery it would likely lead to preventable deaths. For a research journal to publish a paper with such a conclusion, where any qualified reviewer or editor could easily see the conclusion is not warranted, is irresponsible.

As the journal website points out,

"The process of reviewing is considered critical to establishing a reliable body of research and knowledge. The review process aims to make authors meet the standards of their discipline, and of science in general."

https://www.ej-edu.org/index.php/ejedu/about

So, why did the European Journal of Education and Pedagogy not subject this submission to meaningful review to help the author of this study meet the standards of the discipline, and of science in general?


Work cited:

Notes:

1 In mature fields in the natural sciences there are recognised traditions ('paradigms', 'disciplinary matrices') in any active field at any time. In general (and of course, there will be exceptions):

  • at any historical time, there is a common theoretical perspective underpinning work in a research programme, aligned with specific ontological and epistemological commitments;
  • at any historical time, there is a strong alignment between the active theories in a research programme and the acceptable instrumentation, methodology and analytical conventions.

Put more succinctly, in a mature research field, there is generally broad agreement on how a phenomenon is to be understood; and how to go about investigating it, and how to interpret data as research evidence.

This is generally not the case in educational research – which is in part at least due to the complexity and, so, multi-layered nature, of the phenomena studied (Taber, 2014a): phenomena such as classroom teaching. So, in reviewing educational papers, it is sometimes necessary to find different experts to look at the theoretical and the methodological aspects of the same submission.


2 The paper is very strange in that the introductory sections and the conclusions and implications sections have a very broad scope, but the actual research results are restricted to a very limited focus: analysis of school test scores and grades.

It is as if (and it could well be that) a dissertation with a number of evidential strands has been reduced to a paper drawing upon only one aspect of the research evidence, but with material from other sections of the dissertation left unchanged from the original broader study.


3 Readers are told that

"All these acts depend on the sincerity of the medical researcher or fortune-teller seeking the help of Allah S.W.T to ensure that these methods and means are successful. All success is obtained by the permission of Allah alone"

Awang, 2022, p.43


The mystery of the disappearing authors


Can an article be simultaneously out of scope, and limited in scope?

Keith S. Taber

Not only had two paragraphs from the abstract gone missing, along with the figures, but the journal article had also lost two-thirds of its authors.

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies.

One of the articles that IS on an educational topic is called 'An overview of the first year Undergraduate Medical Students [sic] Feedback on the Point of Care Ultrasound Curriculum' (Mohialdin, 2018a), by Vian Mohialdin, an Associate Professor of Pathology and Molecular Medicine at McMaster University in Ontario.

A single-authored paper by Prof. Mohialdin

Review articles

Research journals tend to distinguish between different types of articles, and most commonly:

  • papers that report empirical studies,
  • articles which set out theoretical perspectives/positions, and
  • articles that offer reviews of the existing literature on a topic.

'An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum' is classified as a review article.

A review article?

Typically, review articles cite a good deal of previous literature. Prof. Mohialdin cites a modest number of previous publications – just 10. Now one might suspect that perhaps the topic of point-of-care ultrasound in undergraduate medical education is a fairly specialist topic, and perhaps even a novel topic, in which case there may not be much literature to review. But a review of ultrasound in undergraduate medical education published a year earlier (Feilchenfeld, Dornan, Whitehead & Kuper, 2017) cited over a hundred works.

Actually a quick inspection of Mohialdin's paper reveals it is not a review article at all, as it reports a single empirical study. Either the journal has misclassified the article, or the author submitted it as a review article and the journal did not query this. To be fair, the journal website does note that classification into article types "is subjective to some degree". 1

So, is it a good study?

Not a full paper

Well, that is not easy to evaluate, as the article is less than two pages in length, whereas most research studies in education are much more substantial. Even the abstract of the article seems lacking (see the table below, left hand column). An abstract of a research paper is usually expected to very briefly report something about the research sample/population (who participated in the study?); the research design/methodology (is it an experiment, a survey…); and the results (what did the researchers find out?). The abstract of Prof. Mohialdin's paper misses all these points and so tells readers nothing about the research.

The main text also lacks some key information. The study is a type of research report that is sometimes called a 'practice paper' – the article reports some teaching innovation carried out by practitioners in their own teaching context. The text does give some details of what the practice was – but simply writing about practice is not usually considered sufficient for a research paper. At the least, there needs to be some evaluation of the innovation.

The research design for the evaluation is limited to two sentences under the section heading 'Conclusion/Result Result'. (Mohialdin, 2018a, p.1)

Here there has been some evaluation, but the report is very sketchy, and so might seem inadequate for a research report. Under a rather odd section heading, the reader is informed,

"A questionnaire was handed to the first year undergraduate medical students at the end of session four, to evaluate their hands on ultrasound session experience."

Mohialdin, 2018a, p.1

That one sentence comprises the account of data collection.

The questionnaire is not reproduced for readers. Nor is it described (how many questions, what kinds of questions?) Nor is its development reported. There is not any indication of how many of the 150 students in the population completed the questionnaire, whether ethical procedures were followed 2, where the students completed the questionnaire (for example, was this undertaken in a class setting where participants were being observed by the teaching staff, or did they take it away with them "at the end of session four" to complete in private?) or whether they were able to respond anonymously (rather than have their teachers be able to identify who made which responses).

Perhaps there are perfectly appropriate responses to these questions – but as the journal peer reviewers and editor do not seem to have asked, the reader is left in the dark.

Invisible analytical techniques

Similarly, details of the analysis undertaken are, again, sketchy. A reader is told:

"Answers were collected and data was [sic] analyzed into multiple graphs (as illustrated on this poster)."

Mohialdin, 2018a, p.1

Now that sounds promising, except either the author forgot to submit the graphs with the text, or the journal somehow managed to lose them in production. 3 (And as I've found out, even the most prestigious and well established publishers can lose work they have accepted for publication!)

So, readers are left with no idea what questions were asked, nor what responses were offered, that led to the graphs – that are not provided.

There were also comments – presumably [sic – it would be good to be told] in response to open-ended items on the questionnaire.

"The comments that we [sic, not I] got from this survey were mainly positive; here are a few of the constructive comments that we [sic] received:…

We [sic] also received some comments about recommendations and ways to improve the sessions (listed below):…"

Mohialdin, 2018a, pp.1-2

A reader might ask who decided which comments should be counted as positive (e.g., was it a rater independent of the team who implemented the innovation?), and what does 'mainly' mean here (e.g., 90 of 100 responses? 6 of 11?).

So, in summary, there is no indication of what was asked, who exactly responded, or how the analysis was carried out. As the Journal of Chemistry: Education Research and Practice claims to be a peer reviewed journal one might expect reviewers to have recommended at least that such information (along with the missing graphs) should be included before publication might be considered.

There is also another matter that one would expect peer reviewers, and especially the editor, to have noticed.

Not in scope

Research journals usually have a scope – a range of topics they publish articles on. This is normally made clear in the information on journal websites. Despite its name, the Journal of Chemistry: Education Research and Practice does not restrict itself to chemistry education, but invites work on all aspects of the chemical sciences, and indeed most of its articles are not educational.

Outside the scope of the journal?

But 'An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum' is not about chemistry education or chemistry in a wider sense. Ultrasound diagnostic technology falls under medical physics, not a branch of chemistry. And, more pointedly, teaching medical students to use ultrasound to diagnose medical conditions falls under medical education – as the reference to 'Medical Students' in the article title rather gives away. So, it is odd that this article was published where it was, as it should have been rejected from this particular journal as being out of scope.

Despite the claims of the Journal of Chemistry: Education Research and Practice to be a peer reviewed journal (that means that all submissions are supposedly sent out to, and scrutinised and critiqued by, qualified experts on the topic, who make recommendations about whether something is of sufficient quality for publication, and, if so, whether changes should be made first – like perhaps including graphs that are referred to, but missing), the editor managed to decide the submission should be published just seven days after it was submitted for consideration.

The chemistry journal accepted the incomplete report of the medical education study, to be described as a review article, one week after submission.

The journal article as a truncated conference poster?

The reference to "multiple graphs (as illustrated on this poster)" (my emphasis) suggested that the article was actually the text (if not the figures) of a poster presented at a conference, and a quick search revealed that Mohialdin, Wainman and Shali had presented on 'An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum' at an experimental biology (sic, not chemistry) conference.

A poster at a conference is not considered a formal publication, so there is nothing inherently wrong with publishing the same material in a journal – although often posters report either quite provisional or relatively inconsequential work so it is unusual for the text of a poster to be considered sufficiently rigorous and novel to justify appearing in a research journal in its original form. It is notable that despite being described by Prof. Mohialdin as a 'preliminary' study, the journal decided it was of publishable quality.

Although norms vary between fields, it is generally the case that a conference poster is seen as something quite different from a journal article. There is a limited amount of text and other material that can be included on a poster if it is to be readable. Conferences often have poster sessions where authors are invited to stand by their poster and engage with readers – so anyone interested can ask follow-up questions to supplement the often limited information given on the poster itself.

By contrast, a journal article has to stand on its own terms (as the authors cannot be expected to pop round for a conversation when you decide to read it). It is meant to present an argument for some new knowledge claim(s): an argument that depends on the details of the research conceptualisation, design, and data analysis. So what may seem as perfectly adequate in a poster may well not be sufficient to satisfy journal peer review.

The abstract of the conference poster was published in a journal (Mohialdin, Wainman & Shali, 2018), and I have reproduced that abstract below, together with the corresponding text from the journal paper.


The first three paragraphs of the journal paper (Mohialdin, 2018a) and of the conference poster abstract (Mohialdin, Wainman & Shali, 2018) are identical:

"With the technological progress of different types of portable Ultrasound machines, there is a growing demand by all health care providers to perform bedside Ultrasonography, also known as Point of Care Ultrasound (POCUS). This technique is becoming extremely useful as part of the Clinical Skills/Anatomy teaching in the undergraduate Medical School Curriculum.

Teaching/training health care providers how to use these portable Ultrasound machines can complement their physical examination findings and help in a more accurate diagnosis, which leads to a faster and better improvement in patient outcomes. In addition, using portable Ultrasound machines can add more safety measurements to every therapeutic/diagnostic procedure when it is done under an Ultrasound guide. It is also considered as an extra tool in teaching Clinical Anatomy to Medical students. Using an Ultrasound is one of the different imaging modalities that health care providers depend on to reach their diagnosis, while also being the least invasive method.

We thought investing in training the undergraduate Medical students on the basic Ultrasound scanning skills as part of their first year curriculum will help build up the foundation for their future career."

The conference poster abstract then ends with two further paragraphs that do not appear in the journal paper:

"The research we report in this manuscript is a preliminary qualitative study. And provides the template for future model for teaching a hand on Ultrasound for all health care providers in different learning institutions.

A questionnaire was handed to the first year medical students to evaluate their hands on ultrasound session experience. Answers were collected and data was [sic] analyzed into multiple graphs."
Abstract text from Mohialdin's journal paper, alongside the abstract of the co-authored work presented at the Experimental Biology 2018 Meeting, as published in The FASEB Journal (the journal of the Federation of American Societies for Experimental Biology). (See note 4 for another version of the abstract.)

The abstract includes some very brief information about what the researchers did – information that is strangely missing from the journal article's abstract. Journals usually put limits on the word count of abstracts, but surely the poster's abstract was not considered too long for this journal? It seems that someone (the author? the editor?) simply dropped the final two paragraphs – arguably the two paragraphs most relevant for readers.

The lost authors?

Not only had two paragraphs from the abstract gone missing, along with the figures, but the journal article had also lost two-thirds of its authors.

A poster with multiple authors

Now in the academic world authorship of research reports is not an arbitrary matter (Taber, 2018). An author is someone who has made a substantial intellectual contribution to the work (regardless of how much of the writing-up they undertake, or whether they are present when the work is presented at a conference). That is a simple principle, although it may lead to disputes as it needs to be interpreted when applied; but, in most academic fields, there are conventions regarding what kind of contribution is judged significant and substantive enough for authorship.

It may well be that Prof. Mohialdin was the principal investigator on this study and that the contributions of Prof. Wainman and Prof. Shali were more marginal, so that it was not obvious whether or not they should be considered authors when reporting the study. But it is less easy to see how they qualified for authorship on the poster but not on the journal article with the same title, which seems (?) to be the text of the poster (i.e., it describes itself as being the poster). [It is even more difficult to see how they could be authors of the poster when it was presented at one conference, but not when it was presented somewhere else. 4]

Of course, one trivial suggestion might be that Wainman and Shali contributed only the final two paragraphs of the abstract, and the graphs, and that without these the – thus reduced – version in the journal only deserved one author according to the normal academic authorship conventions. That is clearly not an acceptable rationale, as academic studies have to be understood more holistically than that!

Perhaps Wainman and Shali asked to have their names left off the paper as they did not want to be published in a journal of chemistry that would publish a provisional and incomplete account of a medical education practice study classified as a review article. Maybe they suspected that this would hardly enhance their scholarly reputations?

Work cited:
  • Feilchenfeld, Z., Dornan, T., Whitehead, C., & Kuper, A. (2017). Ultrasound in undergraduate medical education: a systematic and critical review. Medical Education. 51: 366-378. doi: 10.1111/medu.13211
  • Mohialdin, V. (2018a) An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum. Journal of Chemistry: Education Research and Practice, 2 (2), 1-2.
  • Mohialdin, V. (2018b). An overview of the first year undergraduate medical students feedback on the point of care ultrasound curriculum. Journal of Health Education Research & Development, 6, 30.
  • Mohialdin, V., Wainman, B. & Shali, A. (2018) An overview of the first year Undergraduate Medical Students Feedback on the Point of Care Ultrasound Curriculum. The FASEB Journal, 32 (S1: Experimental Biology 2018 Meeting Abstracts), 636.4
  • Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.
  • Taber, K. S. (2018). Assigning Credit and Ensuring Accountability. In P. A. Mabrouk & J. N. Currano (Eds.), Credit Where Credit Is Due: Respecting Authorship and Intellectual Property (Vol. 1291, pp. 3-33). Washington, D.C.: American Chemical Society. [The publisher appears to have made this open access]

Footnotes:

1 The following section appears as part of the instructions for authors:

"Article Types

Journal of Chemistry: Education Research and Practice accepts Original Articles, Review, Mini Review, Case Reports, Editorial, and Letter to the Editor, Commentary, Rapid Communications and Perspectives, Case in Images, Clinical Images, and Conference Proceedings.

In general the Manuscripts are classified in to following [sic] groups based on the criteria noted below [I could not find these]. The author(s) are encouraged to request a particular classification upon submitting (please include this in the cover letter); however the Editor and the Associate Editor retain the right to classify the manuscript as they see fit, and it should be understood by the authors that this process is subjective to some degree. The chosen classification will appear in the printed manuscript above the manuscript title."

https://opastonline.com/journal/journal-of-chemistry-education-research-and-practice/author-guidelines

2 The ethical concerns in this kind of research are minimal, and in an area like medical education one might feel there is a moral imperative for future professionals to engage in activities to innovate and to evaluate such innovations. However, there is a general principle that all participants in research should give voluntary, informed consent.

(Read about Research Ethics here).

According to the policy statement on the author's (/authors'?) University's website (Research involving human participants, Sept. 2002) at the time of this posting (November, 2021) McMaster University "endorses the ethical principles cited in the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (1998)".

According to Article 2.1 of that document, Research Ethics Board Review is required for any research involving "living human participants". There are some exemptions, including (Article 2.5): "Quality assurance and quality improvement studies, program evaluation activities, and performance reviews, or testing within normal educational requirements when used exclusively for assessment, management or improvement purposes" (my emphasis).

My reading, then, is that this work would not have required approval following formal ethical review had it been used exclusively for internal purposes, but that publication of the work as research means it should have been subject to Research Ethics Board Review before being carried out. This is certainly in line with advice to teachers who invite their own students to participate in research into their teaching that may be reported later (in a thesis, at a conference, etc.) (Taber, 2013, pp.244-248).


3 Some days ago, I wrote to the Journal of Chemistry: Education Research and Practice (in reply to an invitation to publish in the journal), with a copy of the email direct to the editor, asking where I could find the graphs referred to in this paper, but have not yet had a response. If I do get a reply I will report this in the comments below.


4 Since drafting this post, I have found another publication with the same title published in an issue of another journal reporting conference proceedings (Mohialdin, 2018b):

A third version of the publication (Mohialdin, 2018b).

The piece begins with the same material as in the table above. It ends with the following account of empirical work:

A questionnaire was handed to the first year undergraduate medical students at the end of session four, to evaluate their hands on ultrasound session experience. Answers were collected and data was [sic] analyzed into multiple graphs. The comments that we [sic] got from this survey were mainly positive; here are a few of the constructive comments that we [sic] received: This was a great learning experience; it was a great learning opportunity; very useful, leaned [sic] a lot; and loved the hand on experience.

Mohialdin, 2018b, p.30

There is nothing wrong with the same poster being presented at multiple conferences, and this is quite a common academic strategy. Mohialdin (2018b) reports from a conference in Japan, whereas Mohialdin, Wainman & Shali (2018) refers to a US meeting – but it is not clear why the author lists differ when the two presentations would seem to report the same research. Indeed, the commonality between Mohialdin (2018b) and Mohialdin, Wainman & Shali (2018) makes it reasonable to assume they are the same report (poster).

Profs. Wainman and Shali should be authors of any report of this study if, and only if, they made substantial intellectual contributions to the work reported – and, surely, either they did, or they did not.

Responding to a misconception about my own teaching

Keith S. Taber

There are many postings here about things that learners said, and so presumably thought, about curriculum topics that would likely surprise, if not shock, the teachers who had taught them those topics. I am certainly not immune from being misunderstood. Today, I reflect on how someone seems to have understood some of my own teaching, and indeed seriously objected to it.

When I have called-out academic malpractice in this blog the targets have usually been conference organisers or journal administrators using misleading (or downright dishonest) techniques, or publishers mistreating authors. I feel somewhat uneasy about publicly contradicting a junior scholar. However, I also do not appreciate being publicly described as deliberately misleading a student, as has happened here, and my direct challenge to the blog author was rejected.

The accusation

A while back some Faculty colleagues referred me to a blog that included the following comments:

In the Faculty of Education students pursuing the MPhil or PhD take a research ethics lecture that presents the Tuskegee Syphilis Study as ethically sound, but only up to the year 1947 when penicillin was actively being used to treat syphilis. According to the Cambridge lecturer, that's the point when the study became unethical.

When I interrupted his lecture to object to his presentation, I was told by that lecturer that he'd never received any objections in his many, many years of teaching the same slides on the same course. That was not true. He knew and the Faculty knows and yet that false information continues to be disseminated to students, many of whom will go on to complete research in developing countries where their only reference for their ethical or unethical behavior is this lecture.

I am not named, but virtually anyone in my Faculty, or anyone having taken graduate studies there in the last few years, would surely know who was being discussed. As is pointed out in our Educational Research course, and in the Research Methods strand of other graduate courses, if you want to avoid someone being identified in your writing, it is not enough simply to avoid naming them. I can be fairly confident the author of the comments above should have known that: it is a point made in the very lecture being criticised.

This blog posting seems to have received quite a lot of attention among students at the University Faculty where I worked. Yet the two claims here are simply not correct. The teaching is seriously misrepresented, and I certainly did not lie to this student.

The blog invited me to 'Leave a Reply', so I did. My comments were subject to moderation – and the next morning I found a response in my email in-box. My comments would not be posted, and the claims would not be amended: I was welcome to post my reply elsewhere, but not at the site where I was being criticised. So, here goes:

The (rejected) reply

I hope you are well.

I was directed to your blog by a group of scholars in the Faculty (of Education at Cambridge). It is an impressive blog. However, I was rather surprised by some of what you have posted. I was the lecturer you refer to in your posting who taught the lecture on research ethics. I do indeed remember you interrupting me when I was presenting the Tuskegee syphilis study as an example of unethical research. I always encouraged students to participate in class, and would have welcomed your input at the end of my treatment of that example.

However, having read your comments here, I do need to challenge your account. I do not consider that the Tuskegee syphilis study was initially ethically sound, and I do not (and did not) teach that. I certainly did make the point that even if the study had been ethical until antibiotics were widely available, continuing it beyond that point would have been completely unjustifiable. But that was certainly not the only reason the study was unethical. Perhaps this would have been clearer if you had let me finish my own comments before interjecting – but even so I really do not understand how you could have interpreted the teaching that way.

Scheme (an annotated version of 'the ethical field', Taber, 2013a, Figure 9.1) used to summarise ethical issues in the Tuskegee syphilis study in my Educational Research lecture on ethical considerations of research.

The reference to 1947 in the posting quoted above relates to the 'continue' issue under research quality – the research (which involved medical staff periodically observing, but not treating, diseased {black, poor, mostly illiterate} men who had not been told of the true nature of their condition) was continued even when effective, safe treatment was available and any claims to the information being collected having potential to inform medical practice became completely untenable.

I may well have commented that no one had ever raised any objections to the presentation when I had given the lecture on previous occasions over a number of years – because that is true. No one had previously raised any concerns with me regarding my teaching of this example (or any aspect of the lecture as far as I can recall). I am not sure why you seem to so confidently assume otherwise: regarding this, you are simply wrong.

Usually in that lecture I would present a brief account of the Milgram 'learning' experiment, which would often lead to extended discussion about the ethical problems of that research in relation to its motivation and what was usefully learnt from it. Then, later in the session, I would talk about the Tuskegee study, which normally passed without comment. I had always assumed that was because the study is so obviously and seriously problematic that no one would see any reason to disagree with my critique. Then I would go on to discuss other issues and studies. I can assure you that no one had previously, before you, raised any concerns about my teaching of this example with me. If anyone in earlier cohorts had any concerns about this example they would have been welcome to talk to me about them – either in class, or privately afterwards. No one ever did.

I have no reason to believe that colleagues at Cambridge are deliberately disseminating false information to students, but then I do not audit other teaching officers' lectures, and I cannot speak for them. However, I can speak for myself, just as you rightly speak up for yourself. I have certainly always taken care to do my best not to teach things that are not the case. Of course, as a school and college science teacher I was often teaching models and simplifications, and not the 'whole' truth, but that is the nature of pedagogy, and is something we should make clear to learners (i.e., that they are being taught models and simplifications that can later in their studies be developed through more sophisticated treatments).

In a similar way, I used simplifications and models in my research methods lectures at Cambridge – for example, in terms of the 'shape' of a research project, or contrasting paradigms, or types of qualitative analysis, and so on, but would make explicit to the class that this is what they were: 'teaching models'. I entered the teaching profession to make a positive difference; to help learners develop, and to acquire new understandings and perspectives and skills; not to misinform people. I very much suspect that on occasions I must have got some things wrong, but, if so, such errors would always have been honest mistakes. I have never knowingly taught something that I thought was untrue.

So, whilst I admire your courage in standing up for what you believe, and I certainly wish you well, what you have written is not correct, and I trust my response will be posted so that your inaccurate remarks will not go unchallenged. I suspect that you are not being deliberately untruthful (you accuse me of telling you something I knew was not true: I try to be charitable and give people the benefit of doubt, so I would like to think that you were writing your comments in good faith), but I do not understand how you managed to come to the interpretation of my teaching that you did, and wish that you would have at least heard me out before interrupting the class, as that may have clarified my position for you. The Tuskegee syphilis study was a racist, unethical study that misled and abused some of those people with the lowest levels of economic and political power in society: people (not just the men subjected to the study, but also their families) who were betrayed by those employed by the public health service that they trusted (and should have been able to trust) to look after their interests. I do not see how anyone could consider it an ethically sound study, and I struggle to see why you would think anyone could.

Your claim that I lied about not having previously received complaints about my teaching of this topic before is simply untrue – it is a falsehood that I hope you will be prepared to correct.

What should a 'constructivist' teacher make of this?

I should be careful about criticising a student for thinking I was teaching something quite different from what I thought I was teaching. I have spent much of my career telling other teachers that learners will make sense of our teaching in terms of the interpretive resources they have available, and so they may interpret our teaching in unexpected ways. Learners will always be biased to understand in terms of their expectations and past experiences. We see it all the time in science teaching, as many of the posts here demonstrate.

I have described learning as being an incremental, interpretive, and so iterative, process and not a simple transfer of understanding (Taber, 2014). Teaching (indeed, communication) is always a representation of thinking in a publicly accessible form (speech, gesture, text, diagrams {what sense does the figure above make out of the context of the lecture?}, models, etc.) – and whatever meaning may have informed the production of the representation, the representation itself does not have or contain meaning: the person accessing that representation has to impose their own interpretation to form a meaning (Taber, 2013b). Having taught and written about these ideas, I would be a hypocrite to claim that a learner could not misinterpret my own teaching – as if I could communicate perfectly to a room full of students from all around the world, with different life experiences and varied disciplinary backgrounds!

Even so, I am still struggling to understand the interpretation put on my teaching in this case, despite going back to revisit the teaching materials a number of times. Most of the points I was making must have been completely disregarded for someone to think I did not consider the study, which ran from 1932 to 1972 (Jones, 1993), unethical until 1947. So, even for someone who claims to be a constructivist teacher and knows there is always a risk of learners misconceiving teaching, this example seems an extreme case.

The confident claim that I was being untruthful when I said I had not previously received complaints about my teaching of this example is even harder to understand. It is at least a good reminder for me not to assume that I know what students are thinking, or that they know what I am thinking, or that they can readily access the intended meaning in my teaching. I've made those points to others enough times, so I will try to see this incident as a useful reminder to follow my own advice.

Sources cited:

A case of hybrid research design?

When is "a case study" not a case study? Perhaps when it is (nearly) an experiment?

Keith S. Taber

I read this interesting study exploring learners' shifting conceptions of the particulate nature of gases.

Mamombe, C., Mathabathe, K. C., & Gaigher, E. (2020). The influence of an inquiry-based approach on grade four learners' understanding of the particulate nature of matter in the gaseous phase: a case study. EURASIA Journal of Mathematics, Science and Technology Education, 16(1), 1-11. doi:10.29333/ejmste/110391

Key features:

  • Science curriculum context: the particulate nature of matter in the gaseous phase
  • Educational context: Grade 4 students in South Africa
  • Pedagogic context: Teacher-initiated inquiry approach (compared to a 'lecture' condition/treatment)
  • Methodology: "qualitative pre-test/post-test case study design" – or possibly a quasi-experiment?
  • Population/sample: the sample comprised 116 students from four grade four classes, two from each of two schools

This study offers some interesting data, providing evidence of how students represent their conceptions of the particulate nature of gases. What most intrigued me about the study was its research design, which seemed to reflect an unusual hybrid of quite distinct methodologies.

In this post I look at whether the study is indeed a case study as the authors suggest, or perhaps a kind of experiment. I also make some comments about the teaching model of the states of matter presented to the learners, and raise the question of whether the comparison condition (lecturing 8-9 year old children about an abstract scientific model) is appropriate, and indeed ethical.

Learners' conceptions of the particulate nature of matter

This paper is well worth reading for anyone who is not familiar with existing research (such as that cited in the paper) describing how children make sense of the particulate nature of matter, something that many find counter-intuitive. As a taster for this, I reproduce here two figures from the paper (which is published open access under a Creative Commons license* that allows sharing and adaptation of copyright material with due acknowledgement).

Figures © 2020 by the authors of the cited paper *

Conceptions are internal, and only directly available to the epistemic subject, the person holding the conception. (Indeed, some conceptions may be considered implicit, and so not even available to direct introspection.) In research, participants are asked to represent their understandings in the external 'public space' – often in talk, here by drawing (Taber, 2013). The drawings have to be interpreted by the researchers (during data analysis). In this study the researchers also collected data from group work during learning (in the enquiry condition) and by interviewing students.

What kind of research design is this?

Mamombe and colleagues describe their study as "a qualitative pre-test/post-test case study design with qualitative content analysis to provide more insight into learners' ideas of matter in the gaseous phase" (p. 3), yet it has many features of an experimental study.

The study was

"conducted to explore the influence of inquiry-based education in eliciting learners' understanding of the particulate nature of matter in the gaseous phase"

p.1

The experiment compared two pedagogical treatments:

  • "inquiry-based teaching…teacher-guided inquiry method" (p.3) guided by "inquiry-based instruction as conceptualized in the 5Es instructional model" (p.5)
  • "direct instruction…the lecture method" (p.3)

These pedagogic approaches were described:

"In the inquiry lessons learners were given a lot of materials and equipment to work with in various activities to determine answers to the questions about matter in the gaseous phase. The learners in the inquiry lessons made use of their observations and made their own representations of air in different contexts."

"the teacher gave probing questions to learners who worked in groups and constructed different models of their conceptions of matter in the gaseous phase. The learners engaged in discussion and asked the teacher many questions during their group activities. Each group of learners reported their understanding of matter in the gaseous phase to the class"

p.5, p.1

"In the lecture lessons learners did not do any activities. They were taught in a lecturing style and given all the notes and all the necessary drawings.

In the lecture classes the learners were exposed to lecture method which constituted mainly of the teacher telling the learners all they needed to know about the topic PNM [particulate nature of matter]. …During the lecture classes the learners wrote a lot of notes and copied a lot of drawings. Learners were instructed to paste some of the drawings in their books."

pp.5-6

The authors report that,

"The learners were given clear and neat drawings which represent particles in the gaseous, liquid and solid states…The following drawing was copied by learners from the chalkboard."

p.6
Figure used to teach learners in the 'lecture' condition. Figure © 2020 by the authors of the cited paper *
A teaching model of the states of matter

This figure shows increasing separation between particles moving from solid to liquid to gas. It is not a canonical figure: in reality the spacing in a liquid is not substantially greater than in a solid (indeed, in ice floating on water the spacing is greater in the solid state), whereas the difference in spacing between the two fluid states is under-represented.

Such figures also do not show the very important dynamic aspect: in a solid, particles can usually only oscillate around a fixed position (a very low rate of diffusion notwithstanding); in a liquid, particles can move around, but movement is restricted by the close arrangement of (and intermolecular forces between) the particles; whereas in a gas there is a significant mean free path between collisions, where particles move with virtually constant velocity. A static figure like this, then, does not show the critical differences in particle interactions which are core to the basic scientific model.

Perhaps even more significantly, Figure 2 suggests there is the same level of order in the three states, whereas the difference in ordering between a solid and a liquid is much more significant than any change in particle spacing.

In teaching, choices have to be made about how to represent science (through teaching models) to learners who are usually not ready to take on board the full details and complexity of scientific knowledge. Here, Figure 2 represents a teaching model where it has been decided to emphasise one aspect of the scientific model (particle spacing) by distorting the canonical model, and to neglect other key features of the basic scientific account (particle movement and arrangement).

External teachers taught the classes

The teaching was undertaken by two university lecturers

"Two experienced teachers who are university lecturers and well experienced in teacher education taught the two classes during the intervention. Each experienced teacher taught using the lecture method in one school and using the teacher-guided inquiry method in the other school."

p.3

So, in each school there was one class taught by each approach (enquiry/lecture) by a different visiting teacher, and the teachers 'swapped' the teaching approaches between schools (a sensible measure to balance possible differences between the skills/styles of the two teachers).

The research design included a class in each treatment in each of two schools

An experiment; or a case study?

Although the study compared progression in learning across two teaching treatments using an analysis of learner diagrams, it also included interviews, as well as learners' "notes during class activities" (which one would expect to be fairly uniform within each class in the 'lecture' treatment).

The outcome

The authors do not consider their study to be an experiment, despite setting up two teaching conditions, comparing outcomes between them, and drawing conclusions accordingly:

"The results of the inquiry classes of the current study revealed a considerable improvement in the learners' drawings…The results of the lecture group were however, contrary to those of the inquiry group. Most learners in the lecture group showed continuous model in their post-intervention results just as they did before the intervention…only a slight improvement was observed in the drawings of the lecture group as compared to their pre-intervention results"

pp.8-9

These statements can be read in two ways – either

  • as a description of events (it just happened that, with these particular classes, the researchers found better outcomes in the enquiry condition), or
  • as the basis for a generalised inference.

An experiment would be designed to test a hypothesis (this study does not seem to have an explicit hypothesis, nor explicit research questions). Participants would be assigned randomly to conditions (Taber, 2019), or, at least, classes would be randomly assigned (although then strictly each class should be considered as a single unit of analysis offering much less basis for statistical comparisons). No information is given in the paper on how it was decided which classes would be taught by which treatment.

Representativeness

A study could be carried out with the participation of a complete population of interest (e.g., all of the science teachers in one secondary school), but more commonly a sample is selected from a population of interest. In a true experiment, the sample has to be selected randomly from the population (Taber, 2019) which is seldom possible in educational studies.

The study investigated a sample of 'grade four learners'

In Mamombe and colleagues' study the sample is described. However, there is no explicit reference to the population from which the sample is drawn. Yet the use of the term 'sample' (rather than just, say, 'participants') implies that they did have a population in mind.

The aim of the study is given as "to explore the influence of inquiry-based education in eliciting learners' understanding of the particulate nature of matter in the gaseous phase" (p.1), which could be considered to imply that the population is 'learners'. The title of the paper could be taken to suggest the population of interest is more specific: "grade four learners". However, the authors make no attempt to argue that their sample is representative of any particular population, and therefore have no basis for statistical generalisation beyond the sample (whether to learners, or to grade four learners, or to grade four learners in RSA, or to grade four learners in farm schools in RSA, or…).

Indeed only descriptive statistics are presented: there is no attempt to use tests of statistical significance to infer whether the difference in outcomes between conditions found in the sample would probably have also been found in the wider population.

(That is, inferential statistics are commonly used to suggest: 'we found a statistically significant better outcome in one condition in our sample, so, in the hypothetical situation that we had been able to include the entire population in our study, we would probably have found better mean outcomes in that same condition'.)
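To sketch the mechanics of such an inferential test, a chi-squared test of independence could ask whether an association between teaching condition and post-test drawing category is unlikely to have arisen by chance. This is only an illustration of the kind of analysis the paper does not attempt: the function and the counts below are entirely hypothetical, not taken from the study.

```python
# Hypothetical illustration (counts invented, NOT from the paper):
# post-test drawing categories for two conditions.
#              'particulate'   'continuous'
# inquiry          18               7
# lecture           9              16

def chi_squared(table):
    """Chi-squared statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

stat = chi_squared([[18, 7], [9, 16]])
# For a 2x2 table (df = 1), the 5% critical value is about 3.84
print(f"chi-squared = {stat:.2f}; significant at 5%: {stat > 3.841}")
```

Note that, as discussed above in relation to randomisation, where intact classes are assigned to conditions the class is strictly the unit of analysis, so treating individual learners as independent observations (as this sketch does) would itself be questionable.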

This may be one reason why Mamombe and colleagues do not consider their study to be an experiment. The authors acknowledge limitations in their study (as there always are in any study) including that "the sample was limited to two schools and two science education specialists as instructors; the results should therefore not be generalized" (p.9).

Yet, of course, if the results cannot be generalised beyond these four classes in two schools, this undermines the usefulness of the study (and the grounds for the recommendations the authors make for teaching based on their findings in the specific research contexts).

If considered as an experiment, the study suffers from other inherent limitations (Taber, 2019). There were likely novelty effects, and even though there was no explicit hypothesis, it is clear that the authors expected enquiry to be a productive approach, so expectancy effects may have been operating.

Analytical framework

In an experiment it is important to have an objective means of measuring outcomes, and this should be determined before data are collected. (Read about 'Analysis' in research studies.) In this study, methods used in previously published work were adopted, and the authors tell us that "A coding scheme was developed based on the findings of previous research…and used during the coding process in the current research" (p.6).

But they then go on to report,

"Learners' drawings during the pre-test and post-test, their notes during class activities and their responses during interviews were all analysed using the coding scheme developed. This study used a combination of deductive and inductive content analysis where new conceptions were allowed to emerge from the data in addition to the ones previously identified in the literature"

p.6

An emerging analytical frame is perfectly appropriate in 'discovery' research, where a pre-determined conceptualisation of how data are to be understood is not employed. However, in 'confirmatory' research, testing a specific idea, the analysis is operationalised prior to collecting data. The use of qualitative data does not exclude a hypothesis-testing, confirmatory study, as qualitative data can be analysed quantitatively (as is done in this study), but using codes that link back to the hypothesis being tested, rather than emergent codes. (Read about 'Approaches to qualitative data analysis'.)

Much of Mamombe and colleagues' description of their work aligns with an exploratory, discovery approach to enquiry, yet the gist of the study is to compare student representations in relation to a model of correct/acceptable or alternative conceptions, to test the relative effectiveness of two pedagogic treatments (i.e., an experiment). That is a 'nomothetic' approach that assumes standard categories of response.

Overall, the authors' account of how they collected and analysed data seems to suggest a hybrid approach, with elements of both a confirmatory approach (suitable for an experiment) and a discovery approach (more suitable for case study). It might seem this is a kind of mixed methods study with both confirmatory/nomothetic and discovery/idiographic aspects – responding to two different types of research question in the same study.

Yet there do not actually seem (**) to be two complementary strands to the research (one exploring the richness of students' ideas, the other comparing variables – i.e., type of teaching versus degree of learning), but rather an attempt to hybridise distinct approaches based on incongruent fundamental (paradigmatic) assumptions about research. (** Having explicit research questions stated in the paper could have clarified this issue for a reader.)

So, do we have a case study?

Mamombe and colleagues may have chosen to frame their study as a kind of case study because of the issues raised above in regard to considering it an experiment. However, it is hard to see how it qualifies as a case study (even if the editor and peer reviewers of the EURASIA Journal of Mathematics, Science and Technology Education presumably felt this description was appropriate).

Mamombe and colleagues do use multiple data sources, which is a common feature of case study. However, in other ways the study does not meet the usual criteria for case study. (Read more about 'Case study'.)

For one thing, case study is naturalistic. The method is used to study a complex phenomenon (e.g., a teacher teaching a class) that is embedded in a wider context (e.g., a particular school, timetable, cultural context, etc.), such that it cannot be excised for clinical examination (e.g., moving the lesson to a university campus for easy observation) without changing it. Here, there was an intervention, imposed from the outside, with external agents acting as the class teachers.

Even more fundamentally – what is the 'case'?

A case has to have a recognisable ('natural') boundary, albeit one that has some permeability in relation to its context. A classroom, class, year group, teacher, school, school district, etcetera, can be the subject of a case study. Two different classes in one school, combined with two other classes from another school, does not seem to make a bounded case.

In case study, the case has to be defined (not done in this study); it should be clear it is a naturally occurring unit (not so here); and the case report should provide 'thick description' of the case in its context (not provided here). Mamombe and colleagues' study is simply not a case study as usually understood: not a "qualitative pre-test/post-test case study design" or any other kind of case study.

That kind of mislabelling does not in itself invalidate research – but it may indicate some confusion in the basic paradigmatic underpinnings of a study. That seems to be the case [sic] here, as suggested above.

Suitability of the comparison condition: lecturing

A final issue of note about the methodology in this study is the nature of one of the two conditions used as a pedagogic treatment. In a true experiment, this condition (against which the enquiry condition was contrasted) would be referred to as the control condition. In a quasi-experiment (where randomisation of participants to conditions is not carried out) this would usually be referred to as the comparison condition.

At one point Mamombe and colleagues refer to this pedagogic treatment as 'direct instruction' (p.3), although this term has become ambiguous as it has been shown to mean quite different things to different authors. This is also referred to in the paper as the lecture condition.

Is the comparison condition ethical?

Parental consent was given for students contributing data for analysis in the study, but parents would likely trust the professional judgement of the researchers to ensure their children were taught appropriately. Readers are informed that "the learners whose parents had not given consent also participated in all the activities together with the rest of the class" (p.3) so it seems some children in the lecture treatment were subject to the inferior teaching approach despite this lack of consent, as they were studying "a prescribed topic in the syllabus of the learners" (p.3).

I have been very critical of a certain kind of 'rhetorical' research (Taber, 2019) report which

  • begins by extolling the virtues of some kind of active / learner-centred / progressive / constructivist pedagogy; explaining why it would be expected to provide effective teaching; and citing numerous studies that show its proven superiority across diverse teaching contexts;
  • then compares this with passive modes of learning, based on the teacher talking and giving students notes to copy, which is often characterised as 'traditional' but is said to be ineffective in supporting student learning;
  • then describes how the authors set up an experiment to test the (superior) pedagogy in some specific context, using as a comparison condition the very passive learning approach they have already criticised as being ineffective in supporting learning.

My argument is that such research is unethical:

  • It is not genuine science as the researchers are not testing a genuine hypothesis, but rather looking to demonstrate something they are already convinced of (which does not mean they could not be wrong, but in research we are trying to develop new knowledge).
  • It is not a proper test of the effectiveness of the progressive pedagogy as it is being compared against a teaching approach the authors have already established is sub-standard.

Most critically, young people are subjected to teaching that the researchers already believe they know will disadvantage them, just for the sake of their 'research', to generate data for reporting in a research journal. Sadly, such rhetorical studies are still often accepted for publication despite their methodological weaknesses and ethical flaws.

I am not suggesting that Mamombe, Mathabathe and Gaigher have carried out such a rhetorical study (i.e., one that poses a pseudo-question where from the outset only one outcome is considered feasible). They do not make strong criticisms of the lecturing approach, and even note that it produces some learning in their study:

"Similar to the inquiry group, the drawings of the learners were also clearer and easier to classify after teaching"

"although the inquiry method was more effective than the lecture method in eliciting improved particulate conception and reducing continuous conception, there was also improvement in the lecture group"

p.9, p.10

I have no experience of the South African education context, so I do not know what is typical pedagogy in primary schools there, nor the range of teaching approaches that grade 4 students there might normally experience (in the absence of external interventions such as reported in this study).

It is for the "two experienced teachers who are university lecturers and well experienced in teacher education" (p.3) to have judged whether a lecture approach based on teacher telling, children making notes and copying drawings, but with no student activities, can be considered an effective way of teaching 8-9 year old children a highly counter-intuitive, abstract, science topic. If they consider this good teaching practice (i.e., if it is the kind of approach they would recommend in their teacher education roles) then it is quite reasonable for them to have employed this comparison condition.

However, if these experienced teachers and teacher educators, and the researchers designing the study, considered that this was poor pedagogy, then there is a real question for them to address as to why they thought it was appropriate to implement it, rather than compare the enquiry condition with an alternative teaching approach that they would have expected to be effective.

Sources cited:

* Material reproduced from Mamombe, Mathabathe & Gaigher, 2020 is © 2020 licensee Modestum Ltd., UK. That article is an open access article distributed under the terms and conditions of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) [This post, excepting that material, is © 2020, Keith S. Taber.]

An introduction to research in education:

Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.