Can we be sure that fun in the sun alters water chemistry?

Minimalist sampling and experimental variables


Keith S. Taber


Dirty water

I was reading the latest edition of Education in Chemistry and came across an article entitled "Fun in the sun alters water chemistry. How swimming and tubing are linked to concerning rises in water contaminants" (Notman, 2023). This was not an article about teaching, but a report of some recent chemistry research summarised for teachers. [Teaching materials relating to this article can be downloaded from the RSC website.]

I have to admit to not having understood what 'tubing' was (I plead 'age') apart from its everyday sense of referring collectively to tubes, such as those that connect Bunsen burners to gas supplies, and was intrigued by what kinds of tubes were contaminating the water.

The research basically reported on the presence of higher levels of contaminants in the same body of water at Clear Creek, Colorado on a public holiday when many people used the water for recreational pursuits (perhaps even for 'tubing'?) than on a more typical day.

This seems logical enough: more people in the water; more opportunities for various substances to enter the water from them. I have my own special chemical sensor which supports this finding. I go swimming in the local hotel pool, and even though people are supposed to shower before entering the pool: not everyone does (or at least, not effectively). Sometimes one can 'taste' 1 the change when someone gets in the water without washing off perfume or scented soap residue. Indeed, occasionally the water 'tastes' 1 different after people enter the pool area wearing strong perfume, even if they do not use the pool or come into direct contact with the water!

The scientists reported finding various substances they assumed were being excreted 2 by the people using the water – substances such as antihistamines and cocaine – as well as indicators of various sunscreens and cosmetics. (They also found higher levels of "microbes associated with humans", although this was not reported in Education in Chemistry.)


I'm not sure why I bother having a shower BEFORE I go for a swim in there… (Image by sandid from Pixabay)


It makes sense – but is there a convincing case?

Now this all seems very reasonable, as the results fit into a narrative that seems theoretically feasible: a large number of people entering the fresh water of Clear Creek are likely to pollute it sufficiently (if not to rename it Turbid Creek) for detection with the advanced analytical tools available to the modern chemist (including "an inductively coupled plasma mass spectrometer and a liquid chromatography high resolution mass spectrometer").

However, reading on, I was surprised to learn that the sampling in this study was decidedly dodgy.

"The scientists collected water samples during a busy US public holiday in September 2022 and on a quiet weekday afterwards."

I am not sure how this (natural) experiment would rate as a design for a school science investigation. I would certainly have been very critical if any educational research study I had been asked to evaluate relied on sampling like this. Even if large numbers of samples were taken from various places in the water over an extended period during these two days, this procedure has a major flaw. This is because the level of control of other possibly relevant factors is minimal.

Read about control in experimental research

The independent variable is whether the samples were collected on a public holiday when there was much use of the water for leisure, or on a day with much less leisure use. The dependent variables measured were levels of substances in the water that would not be considered part of the pristine natural composition of river water. A reasonable hypothesis is that there would be more contamination when more people were using the water, and that was exactly what was found. But is this enough to draw any strong conclusions?

Considering the counterfactual

A useful test is to ask whether we would have been convinced that people do not contaminate the water had the analysis shown no significant difference between water samples on the two days. That is, to examine a 'counterfactual' situation (one that is not the case, but might have been).

In this counterfactual scenario, would similar levels of detected contaminants be enough to convince us the hypothesis was misguided – or might we look to see if there was some other factor which might explain this unexpected (given how reasonable the hypothesis seems) result and rescue our hypothesis?

Had pollutant levels been equally high on both days, might we have sought ('ad hoc') to explain that through other factors:

  • Maybe it was sunnier on the second day with high U.V. levels which led to more breakdown of organic debris in the river?
  • Perhaps there was a spill of material up-river 3 which masked any effect of the swimmers (and, er, tubers?)
  • Perhaps rainfall between the two sampling dates had increased the flow of the river and raised its level, washing more material into the water?
  • Perhaps the wind direction was different and material was being blown in from nearby agricultural land on the second day.
  • Perhaps the water temperature was different?
  • Perhaps a local industry owner tends to illegally discharge waste into the river when the plant is operating on normal working days?
  • Perhaps spawning season had just started for some species, or some species was emerging from a larval state on the river bed and disturbing the debris on the bottom?
  • Perhaps passing migratory birds were taking the opportunity to land in the water for some respite, and washing off parasites as well as dust.
  • Perhaps a beaver's dam had burst upstream 3 ?
  • Perhaps (for any panspermia fans among readers) an asteroid covered with organic residues had landed in the river?
  • Or…

But: if we might consider some of those factors to potentially explain a lack of effect we were expecting, then we should equally consider them as possible alternative causes for an effect we predicted.

  • Maybe it was sunnier on the first day with high U.V. levels which led to more breakdown of organic debris in the river?
  • Perhaps a local industry owner tends to illegally discharge waste into the river on public holidays because the work force are off site and there will be no one to report this?
  • … etc.

Lack of control of confounding variables

Now, in environmental research, as in research into teaching, we cannot control conditions in the way we can in a laboratory. We cannot ensure the temperature and wind direction and biota activity in a river are the same. Indeed, one thing about any natural environment that we can be fairly sure of is that biological activity (and so the substances released by such activity) varies seasonally, and according to changing weather conditions, and in different ways for different species.

So, as in educational research, there are often potentially confounding variables which can undermine our experiments:

In quasi-experiments or natural experiments, a more complex design than simply comparing outcome measures is needed. …this means identifying and measuring any relevant variables. …Often…there are other variables which it is recognised could have an effect, other than the dependent variable: 'confounding' variables.

Taber, 2019, p.85 [Download this article]

  • independent variable – class of day (busy holiday versus quiet working day)
  • dependent variables – concentrations of substances and organisms considered to indicate contamination
  • confounding variables – anything that might feasibly influence the levels of concentrations of substances and organisms considered to indicate contamination – other than the class of day

In a controlled experiment any potential confounding variables are held at fixed levels, but in 'natural experiments' this is not possible.

Read about confounding variables in research

Sufficient sampling?

The best we can do to mitigate the lack of control is rigorous sampling. If water samples from a range of days when there was a high level of leisure activity were compared with samples from a range of days when there was a low level of leisure activity, this would be more convincing than just one day from each category – especially so if these were randomly selected days. It is still possible that factors such as wind direction and water temperature could bias findings, but it becomes less likely – and with random sampling of days it is possible to estimate how likely such chance factors are to have an effect. Then we can at least apply models that suggest whether observed differences in outcomes exceed the level likely due to chance effects (as the sketch below illustrates).

Read about sampling in research
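To make this concrete, here is a minimal sketch (in Python, with entirely invented numbers – not data from the Clear Creek study) of how measurements from several randomly selected busy days and quiet days might be compared using a simple permutation test, to estimate whether the observed difference exceeds what chance alone would plausibly produce:

```python
import random

# Hypothetical mean contaminant concentrations (ng/L) for several randomly
# selected busy (holiday) days and quiet weekdays. Invented for illustration.
busy_days = [42.0, 55.3, 48.1, 61.7, 50.2]
quiet_days = [30.5, 27.9, 35.2, 33.0, 29.4]

def mean(values):
    return sum(values) / len(values)

observed_diff = mean(busy_days) - mean(quiet_days)

# Permutation test: repeatedly shuffle all the measurements and re-split them
# into two groups of the original sizes, to see how often a difference at
# least as large as the observed one arises purely by chance.
pooled = busy_days + quiet_days
n_busy = len(busy_days)
n_permutations = 10_000
count_extreme = 0

random.seed(1)  # for a reproducible illustration
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = mean(pooled[:n_busy]) - mean(pooled[n_busy:])
    if diff >= observed_diff:
        count_extreme += 1

p_value = count_extreme / n_permutations
print(f"Observed difference: {observed_diff:.1f} ng/L")
print(f"Approximate one-sided p-value: {p_value:.3f}")
```

With only one sampling day in each category there is, of course, nothing to shuffle: whatever produced the difference – people, weather, an upstream spill – simply is the result.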

I would like to think that any educational study that had this limitation would be questioned in peer review. The Education in Chemistry article cited the original research, although I could not immediately find this. The work does not seem to have been published in a research journal (at least, not yet) but was presented at a conference, and is discussed in a video published by the American Chemical Society on YouTube.

"With Labor Day approaching, many people are preparing to go tubing and swimming at local streams and rivers. These delightful summertime activities seem innocuous, but do they have an impact on these waterways? Today, scientists report preliminary [sic] results from the first holistic study of this question 4, which shows that recreation can alter the chemical and microbial fingerprint of streams, but the environmental and health ramifications are not yet known."

American Chemical Society Meeting Newsroom, 2023

In the video, Noor Hamdan, of Johns Hopkins University, reports that "we are thinking of collecting more samples and doing some more statistical analysis to really, really make sure that humans are significantly impacting a stream".

This seems very wise, as it is only too easy to be satisfied with very limited data when it seems to fit with your expectations. Indeed that is one of the everyday ways of thinking that science challenges by requiring more rigorous levels of argument and evidence. In the meantime, Noor Hamdan suggests people using the water should use mineral-based rather than organic-based sunscreens, and she "recommend[s] not peeing in rivers". No, I am fairly sure 'tubing' is not meant as a euphemism for that. 5


Work cited:

Notes:


1 Perhaps more correctly, smell, though it is perceived as tasting – most of the flavour we taste in food is due to volatile substances evaporating in the mouth cavity and diffusing to be detected in the nose lining.


2 The largest organ of excretion for humans is the skin. The main mechanism for excreting the detected contaminating substances into the water (if perhaps not the only pertinent one, according to the researchers) was sweating. Physical exertion (such as swimming) tends to be associated with higher levels of sweating. We do not notice ourselves sweating when the sweat evaporates as fast as it is released – nor, of course, when we are immersed in water.


One of those irregular verbs?

I perspire.

You sweat.

She excretes through her skin

(Image by Sugar from Pixabay)


3 The video suggests that sampling took place both upriver and downriver of the Creek which would offer some level of control for the effect of completely independent influxes into the water – unless they occurred between the sampling points.


4 There seem to be plenty of studies of the effects of water quality on leisure use of waterways: but not on the effects of the recreational use of waterways on their quality.


5 Just in case any readers were also ignorant about this, it apparently refers to using tyre inner tubes (or similar) as flotation devices. This suggests a new line of research. People who float around in inner tubes will tend to sweat less than those actively swimming – but are potentially harmful substances leached from the inner tubes themselves?


Join an email discussion list for those teaching chemistry


The sugger strikes back!

An update on the 'first annual International Survey of Research Centres and Institutes'


Keith S. Taber (masquerading as a learned academic)


if he wanted me to admit I had been wrong, Hussain could direct me to the released survey results and assure me that the data collected for the survey was not being used for other purposes. That is, he should have given me grounds to think the survey was a genuine piece of research and not 'sugging'


Some months ago I published an article in this blog about a message I received from an organisation called Acaudio, which hosts a website where academics can post audio recordings promoting their research, inviting me to participate in "the first annual International Survey of Research Centres and Institutes". I was suspicious of this invitation for a number of reasons, as I discussed in 'The first annual International Survey of Gullible Research Centres and Institutes'.

Several things suggested to me that this was not a genuine piece of academic research, including the commitment that "We will release the results over the next month" which seemed so unrealistic as to have been written either by someone with no experience of collecting and analysing large scale survey data – or someone with no intention of actually following through on the claim.

Sugging?

Having taken a look at the survey questions, I felt pretty sure this was an example of what has been labelled as 'sugging'. Sugging is a widely recognised, and indeed widely adopted, unethical practice of collecting marketing information by framing it as a survey. The Market Research Society explains that,

Sugging is a market research industry term, meaning 'selling under the guise of research'. Sugging occurs when individuals or companies pretend to be market researchers conducting a research, when in reality they are trying to build databases, generate sales leads or directly sell product or services….

The practices of sugging and frugging [fundraising under the guise of market research] bring discredit on the profession of research… and mislead members of the public when they are being asked for their co-operation…

Failing to clearly specify the purpose for which the data is being collected is also a breach of…the first principle of the Data Protection Act 1998.

https://www.mrs.org.uk/standards/suggingfaq

Although I thought the chances of the results of the first annual International Survey of Research Centres and Institutes actually being released within the month, or even within a few months to allow for a modest level of over-promising, were pretty minuscule, I did think I should wait a few months and then do a search to see if such a report had appeared. I did not think I was likely to find such a report released into the public domain, but any scientist has to be open-minded enough to consider they might be wrong – and certainly in my own case I've collected enough empirical evidence over the years to know I am not just, in principle, fallible.

Acaudio doth protest too much, methinks

But (being fallible) I'd rather forgotten about this and had not got round to doing a web search. Until, that is, I was prompted to do so by receiving an email from the company founder, Hussain Ayed, who had had his attention drawn to my blog, and was – understandably perhaps – not happy about my criticisms:



Hussain's letter did not address my specific points from the blog (as he did not want to "get into the nitty gritty of it all"), but assured me his company was genuinely trying to do useful work, and there was no scamming.

Of course, I had not suggested Acaudio, the organisation, was itself a 'scam': in my earlier article I had pointed out that Acaudio was offering a free, open-access service which was likely to be useful to academics – and even briefly pointed out some positive features of their website.

But Acaudio's 'survey' was a different matter. It did not meet the basic requirements for a serious academic study, and it asked questions that seemed clearly designed around potential selling points for a company offering services to increase research impact (such as, perhaps, Acaudio).



And it promised a fantastic time-scale. Perhaps a very large organisation, with staff fully dedicated to analysis and reporting, could have released international survey results within a month of collecting data – perhaps? But Acaudio was a company with one company officer that reported employing one person.

Given the scale of the organisation, what Acaudio have achieved with their website in a relatively short time is highly impressive. But…

…where is that survey report?

I replied to Hussain, as below.

Dear Hussain Ayed

Thank you for your message.

I have not written "a comprehensive attack on [your] company" and do not have a sufficient knowledge-base to have done so. I have indeed, however, published a blog article criticising your marketing techniques based on the direct evidence in messages you have sent me. In particular, I claimed that,

(i) (despite being registered as a UK based company) you did not adhere to the UK regulations concerning direct marketing. (I assume you are not seeking to challenge this given the evidence of your own emails)

(ii) that you were also 'sugging': undertaking marketing under the guise of carrying out a survey.

If I understand your complaint, you are suggesting in regard to point (ii) that you really were carrying out a survey for the public good (rather than to collect information for your own commercial purposes) and that any apparent failure of rigour in this regard actually resulted from a lack of relevant expertise within the company. If so, perhaps you will send me, or tell me where I can access, the published outcome of the survey (due to be available by the middle of June 2023 according to your earlier message). I have looked online for this, but a Google search (using the term "International Survey of Research Centres and Institutes") failed to locate the report.

Can you offer me an assurance that information collected for the survey was ONLY used for the analysis that led to the published survey report (assuming there is one you can point me to), and that this information was not retained by your organisation as a basis for contacting individuals with regard to your company's services? If you can offer appropriate assurances then I will be happy to add an inserted edit into the blog to include a statement along the lines that the company assures me that all information collected was only used for the purposes of producing a survey report, and was not retained or used in any other way by the company.

So, to summarise regarding point (ii), if this survey was not a scam, please (a) point me to the outcomes, and (b) give me these assurances about not collecting information under false pretences.

You also have the right to reply directly. If you really think anything in my article amounted to "misleading bits of 'evidence' " then please do correct this. You are free to submit a response in the comments section at the bottom of the page. If you wish to do that, I will be happy to publish your reply (subject to my usual restrictions which I am sure should not be any impediment to you – so, I will not publish anything I think might be libellous of a third party, nor anything with obscenity/profanity etc. Sadly, I do sometimes have to reject comments of these kinds.)

I recognise that comments have less prominence than the blog article they follow, and that indeed some readers may not get that far in their engagement with an article. Therefore, if you do submit a reply I am happy to also add a statement at the HEAD of my article to point out to readers that there is a reply on behalf of the company beneath the article, so my readers see that notice BEFORE proceeding to read my own account. I am not looking for people/organisations to criticise for the sake of it, but have become concerned about the extent of unethical practice in the name of academic work (such as the marketing of predatory journals and conferences) and do point out some of the examples that come my way. I believe such bad practice is very damaging, and especially so for students who are new to the academic world, and for those working in under-resourced contexts who may be under extreme pressure to achieve 'tenure'. People spend their limited funds on getting published in journals that have no serious peer review (and so are not taken seriously by most academics), or presenting at conferences which 'invite' contributions from anyone prepared to pay the fees. I do not spend time looking for such bad practice: it arrives in my inbox on a fairly frequent basis.

Perhaps your intentions are indeed honourable, and perhaps you are doing good work. Perhaps you are indeed "working to tackle inequality in higher education and academia", which obviously would be valuable, although I am not sure how this is achieved by working with groups at Cambridge such as the Bioelectronic Systems Tech Group – unless you perhaps charge fees to those in wealthy institutions to allow you to offer a free service for those elsewhere? If you do: good on you. Even so, I would strongly suggest you 'clean up your act' as far as your marketing is concerned, and make sure your email campaigns are within the law. By failing to follow the regulations you present your organisation as either being unprofessional (giving the impression no one knows what they are doing) or dodgy (if you* know the regulations, but are choosing not to follow them). *I assume you are responsible for the marketing strategy, but even if someone else is doing this for you, I suspect you (as the only registered company officer) would be considered ultimately responsible for not following the regulations.

If you are genuine about wishing to learn more about undertaking quality surveys, there are many sources of information. My pages on research methods might be a place to get some introductory background, but if this is to be a major part of your company's activity I would really suggest you should employ someone with expertise, or retain a consultant who works in that area.

Thank you for the offer to work with you, but I am retired and have too many existing projects to work on – and in any case you should work with someone you genuinely respect, not someone that you consider only to "masquerade as a learned academic" and who has "shaky morals".

Best wishes

Keith

My key point was that if he wanted me to admit I had been wrong, Hussain could direct me to the released survey results and assure me that the data collected for the survey was not being used for other purposes. That is, he should have given me grounds to think the survey was a genuine piece of research and not 'sugging'.

The findings of the survey are 'reserved'

Later that day, I got the following reply:



So, it seems the research report that was supposed to have been released ("over the next month" – according to Acaudio's email dated 15th May 2023) was not available, and – furthermore – would not be made available to me.

  • A key principle of scientific research is that the outcomes are published – that is, made available to the public, and not "reserved" for certain people the researchers select!
  • A key feature of ethical research is that a commitment is made to make outcomes available (as Acaudio did) and this is followed through (as Acaudio did not).
What is the research data being used for?

Hussain also failed to offer any assurances that the data collected under the claim (pretence, surely) of carrying out survey research was not being used for commercial purposes – as a basis for evaluating the potential merits of approaching different respondents to tender for services. I cannot prove that Acaudio was using the collected information for such purposes, but if my suspicions were misplaced (and if Hussain really wanted to persuade me that the survey was not intended as a scam) it would have been very easy to simply include a sentence in his response to that effect – to have assured me that the research data was being analysed anonymously and handled separately from the company's marketing data with a suitable 'ethical wall' between.1

That is, Hussain could have simply got into enough of the "nitty gritty" to have offered an assurance of following an ethical protocol, instead of choosing to insult me… as I pointed out to him:


Dear Hussain

Thank you for your message.

So, the 'survey' results (if indeed any such document actually exists) that you indicated to me would be released by mid-June are still not actually available in the public domain. As you say: 'Hmm'.

You are right, that I would have no right to ask you to provide me with anything – except that YOU ASKED ME to believe I misjudged you, and to withdraw my public criticisms; and so I ASKED YOU to provide the evidence to persuade me by (i) proving there was a survey analysis with published results, and (ii) giving an assurance that you did not use, for your company's marketing purposes, data supposedly collected for publishable research. There is of course no reason why you should have provided either the results or the assurances, unless you actually did feel I had judged Acaudio too harshly and you wanted to give me reason to acknowledge this. The only thing that might give me "some sort of power over [you]" in this regard is your suggestion to me that I might wish to "take back the claims that [I] made". Can I remind you: you contacted me. You contacted me, unsolicited, in December 2022, and then again in May 2023. This morning, you contacted me again specifically to suggest my suggestions of wrong-doing were misjudged. But you will not back that up, so you have simply reinforced my earlier inferences.

For some reason that is not clear to me, you think that my mind is on money – that is presumably why I spend some of my valuable time highlighting poor academic practices on a personal website that brings in no income and is financed from my personal funds. Perhaps that is the company director finding it hard to get inside the mind of a retired teacher who worked his entire career in the public sector? (That is not meant as an insult – I probably have the reverse difficulty in understanding the motivations of the commercial mind. Perhaps that is why these are "things that are beyond [my] understanding"?) I do not have any problem with you setting up a company to make money (good luck to you if you work hard and treat people with due respect), and think it is perfectly possible for an organisation to both make money and produce public goods – I am not against commercial organisations per se. My 'vested interests' relate to commitments to certain values that I think underpin both good science and academic activities more broadly. A key one is honesty (which is one fundamental aspect of treating people with due respect). We are all entitled (perhaps even have a duty?) to make the strongest arguments for our positions, but when people knowingly misrepresent (e.g., "We will release the results over the next month" but no publication is forthcoming) in order to advance their interests, this undermines the scholarly community. Anyone can be wrong. Anyone can be mistaken. Anyone can fail in a venture. (Such as promising a report, genuinely intending to produce one, but finding the task was more complex than anticipated. Had that been your response, I might have found this feasible. Instead, you promised to release the results, but now you claim you have "every right to ignore [my] request for the outcomes". Yes, that is so – if the commitment you made means nothing.) As long as we can trust each other to be open and honest the system will eventually self-correct in cases when there are false (but honestly motivated) claims. Yet, these days, academics are flooded with offers and claims that are not mistaken, but deliberately misleading. That is what I find so troublesome that I take time to call out examples. That may seem strange to you, but you have to remember I have worked as a school, college, and university teacher all my working life, so I identify at a very deep level with the basic values underpinning the quest for knowledge and learning. When I get an email from someone claiming they are doing a survey, but which seems to be an attempt to market services, I do take it personally. I do not like to be lied to. I do not like to be treated as a fool. And I do not like the thought that perhaps less experienced colleagues and graduate students may take such approaches at face value and not appreciate they are being scammed. Can does not equate to should: you may have "the ability to write and say what [you] want", but that does not mean you have the right to deliberately mislead people. You say you will not be engaging with me any more. Fine. You started this correspondence with your unsolicited approaches. I will be very happy if you remove me from your marketing list (that I did not sign up for) and do not contact me again. That might be in both our interests.

And despite all this, I wish you well. Whatever your mistakes in the past, if you do genuinely wish to make a difference in the way you suggest, then I hope you are successful. But please, if you believe in your company and the contribution it can make, seek to be totally honest with potential clients. If you are in this for the long term, then developing trust and a strong reputation for ethical business practices will surely create a fund of social capital that will pay dividends as you build up the organisation. Whereas producing emails of the kind you have sent me today is likely to be counter-productive and just alienate people: using ad hominem points – I am masquerading as a learned academic, out of touch, arrogant, unfit and entitled; with shaky morals and vested interests; things are beyond my understanding; I write nonsense – simply suggests you have no substantive points to support your position. By doing this you automatically cede the higher ground. And, moreover, is that really the way you want your company represented in its communications?

Best wishes

Keith 


As I wrote above, Acaudio seem to be doing a really good job in setting up a platform where researchers can post accounts of their research – and given the scale of the organisation – I assume much (if not all) of that is down to Hussain. That, he can be proud of.

However, using the appearance of an international survey as a cover for collecting data that can be used to market a company's services is widely recognised as a dishonest and unethical (if not illegal 2) practice. I think he should be less proud of himself in that regard.

If Hussain still wants to maintain that his request for contributions to the first annual International Survey of Research Centres and Institutes was intended as a genuine attempt at academic research, rather than just a marketing scam, then he still has the option of publishing a report of the study so that the academic community can evaluate the extent to which the survey meets the norms of genuine research; and so that, at very least, he will have met one key criterion of academic research (publication).

This would also show that Acaudio are prepared to meet their side of the contract they offered to potential respondents (i.e., please contribute to this survey – in consideration we will release the results over the next month). Any reputable business should be looking to make good on its promises.


Notes

1 The idea of an ethical wall (sometimes referred to as a 'Chinese wall') is important in businesses where there is the potential for conflicts of interest. Consider, for example, firms of lawyers that may have multiple clients, and where information offered in confidence by one client could have commercial value for another. The firm is expected to have protocols in place so that information about one client is not either leaked to another client, or (deliberately or inadvertently) influences advice given to another client. To avoid inadvertent influence, it may be necessary to ensure staff working with one client are not involved in work for another client that may be seen to have conflicting interests.

A company may hire a market research organisation to carry out market research to inform them about future strategies – so the people analysing the data have no bias due to preferred outcomes, and no temptation to misuse the data for direct marketing purposes. The commissioned report will not identify particular respondents. Then there is an ethical wall between the market researchers who report on the overall state of the market, and the client company's marketing and sales section.

My reference to the small size of Acaudio is not intended as an inherent criticism. My original point was that such a small company was unlikely to have the capacity to carry out a meaningful international survey (which does not imply the intention to do so was necessarily inauthentic – Acaudio might have simply overstretched itself).

However, a very small company might well have inherent difficulties in carrying out genuine research which did not leak information about specific respondents to those involved in sales.

Many surveys invite people to offer their email if they wish for feedback or to make themselves available for follow-up interviews – but offer an assurance the email address will not be used for other purposes, and need not be given to participate. Acaudio's survey required identifying information.2 This is a strong indicator that the primary purpose was not scholarly research.



2 The Data Protection Act 2018 concerns personal information:

"Everyone responsible for using personal data has to follow strict rules called 'data protection principles'. They must make sure the information is:

  • used fairly, lawfully and transparently
  • used for specified, explicit purposes
  • used in a way that is adequate, relevant and limited to only what is necessary
  • accurate and, where necessary, kept up to date
  • kept for no longer than is necessary
  • handled in a way that ensures appropriate security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage"
GOV.UK

Acaudio's survey is nominally about research institutes, not individual people.

However, it asks questions such as

  • "How satisfied are you with…"
  • "How much time do you spend…"
  • "Do you feel like…"
  • "What are the biggest challenges you face…"
  • "Who do you feel is…"
  • "How effective do you think…"
  • "Do you agree…"
  • "What would you consider..."
  • "How much would you consider…"
  • "Would you be interested in…"
  • "How do you decide…"
  • "What do you hope…"

This is information about a person, moreover a person of known email address:

" 'personal data' means any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier…"

Information Commissioner's Office

So, if information collected by this survey was used for purposes other than the survey itself –

  • say perhaps for identifying sales leads {e.g., "How satisfied are you with the level of awareness people have of your centre / institute?" "How effective do you think your current promotion methods are?"; "How important is building an audience for the work of the research centre / institute?"};
  • and/or profiling potential clients
    • in terms of level of resource that might be available to buy services {e.g., "How much would you consider to be a reasonable amount to spend on promotional activities?"},
    • or priorities for research impact strategies {e.g., "What mediums [sic] would you consider using to promote your research centre / institute?"; "Do you agree it is important to have a dedicated person to take care of promotional activities?"}

– would that not be a breach of UK data protection law?


The sins of scientific specialisation


Keith S. Taber


As long ago as 1932, Albert Einstein warned about the dangers of scientific specialisation. Indeed, he drew on a Biblical analogy for the situation:

"The area of scientific investigation has been enormously extended, and theoretical knowledge has become vastly more profound in every department of science. But the assimilative power of the human intellect is and remains strictly limited. Hence it was inevitable that the activity of the individual investigator should be confined to a smaller and smaller section of human knowledge. Worse still, this specialisation makes it increasingly difficult to keep even our general understanding of science as a whole, without which the true spirit of research is inevitably handicapped, in step with scientific progress. A situation is developing similar to the one symbolically represented in the Bible by the story of the tower of Babel. Every serious scientific worker is painfully conscious of this involuntary relegation to an ever-narrowing sphere of knowledge, which threatens to deprive the investigator of his broad horizon and degrades him to the level of a mechanic."

Albert Einstein, 1932

Einstein suggested that the true scientist needs to have a basic grasp of current knowledge across the natural sciences to retain what he labels the 'true spirit' of science. I doubt many scientists would agree with this today, as, inevitably, few if any professional research scientists today could claim sufficient "general understanding of science as a whole" to, by Einstein's criterion here, avoid "the true spirit of research" being handicapped. Moreover, I doubt there are many (any?) who could claim to be the kind of polymaths that were still found two to three centuries ago, when some individuals made substantive contributions to research across a range of scientific disciplines.

The level of the mechanic?

I am sure Einstein did not intend to be derogatory about mechanics per se, but he, in effect, made a distinction between the work of the scientist and the technician. The technician may sometimes be a supreme craftsperson with highly developed technê (technical knowledge) and finely tuned skills. Scientists depend upon technicians, and often lack their expertise and level of skill in carrying out procedures.

School science teachers rely heavily on their school laboratory technicians (in those countries where they exist) and often would actually lack the knowledge and skills to source and prepare and maintain all the materials and apparatus used in practical work in their classes. But the research scientist is primarily concerned with a different, more theoretical, form of knowledge development: epistêmê.

Professional teachers and classroom technicians

This is a distinction that resonates with many teachers. Professional teachers should be assumed to have developed a form of professional knowledge that is highly complex and enables them to critically use theory to interpret nuanced teaching situations, and make informed decisions. Too often, however, teaching is seen and discussed as only a craft where teachers can be trained and should have imposed on them detailed guidance about what and how to teach.

I have certainly seen this in England, where sometimes civil servants take advice from a small group of supposed experts 1 to develop general 'guidance' that they then think should be applied as a matter of policy by professional teachers in their various, diverse, teaching contexts. Similarly, formal inspections, where a small number of visitors spend a few days in a school or college, are used to make judgements and recommendations that are given more weight than the collective experience of the professional staff embedded in that unique teaching context.

Of course technê and epistêmê are rudderless without another domain of knowledge: that which helps us acquire the wisdom to live a good life – phronêsis (Martínez Sainz, 2015). The vision of the education system as something that can be subjected to atomistic, objective, evaluation and ranking perhaps reflects the values of a society that has somewhat lost sight of the most important aims of education. We do want informed citizens who have high levels of skills and who can contribute to the workforce – but unless these competent and employed people also go on to live meaningful and satisfying lives, that is all rather pointless. That is not a call to 'turn on, tune in, drop out' (as might have been suggested when I was young) but perhaps to turn on, tune in, and balance priorities: having a 'good' job is certainly worthwhile, but it only really is a 'good job' if it helps the individual live a good life.

Authorship – taking responsibility for scientific work

The technician/scientist distinction is very clear in some academic fields when it comes to publication. To be an author on a research report should signify two very important things (Taber, 2018):

  • an author has substantially contributed intellectually to the work reported;
  • an author takes responsibility for what has been reported.

Regarding the first point, it is usually thought that when reporting research purely technical contributions (no matter how essential) do not amount to authorship. Someone who transcribes a thousand hours of interviews verbatim into a database for a researcher to interrogate does not get considered as an author for the resulting paper even if they actually spent ten times as long working with the data as the person who did the analysis – as their contribution is technical, not intellectual.

But the other side of the authorship is that authors have to stand by the work they put their name to. That does not mean their conclusions have to stand for ever – but in claiming authorship of a research report they are giving personal assurance that it is honestly reported and reflects work undertaken with proper standards of care (including proper attention to research ethics).

Read about research authorship

But, in modern science, we often find papers with a dozen, a hundred, even a thousand authors. The authors of high energy physics papers may come from theoretical and experimental physics, statistics, engineering, computer programming, … Presumably each author has made a substantial intellectual contribution to the work reported (even when in extreme cases there are so many authors that if they had all been involved in the writing process they would, on average, have contributed about a sentence each).

Each of those authors knows a good deal about their specialism – but each relies completely on the experts in other fields to be on top of their own areas. No one author could offer assurances about all the science that the paper's conclusions depend upon. For example, the authors named because they programmed the computers to interpret signals rely completely upon the theoretical physicists to tell them what patterns they were looking for. In Einstein's terms, "the true spirit of research is inevitably handicapped". The many authors of such a paper are indeed like the proverbial committee of blind people preparing a description of an elephant by coordinating and compiling a series of partial reports.


Researchers at CERN characterise the elephant boson? (Image by Mote Oo Education from Pixabay)

It is as if a research report were like the outcome of a complex algorithm, with each step (e.g., "multiply the previous answer by 0.017") coded in a different language, and carried out by a team, each of whom only understood one of the languages involved. As long as everyone is fully competent, then the outcome should be valid, but a misstep will not be noticed and corrected by anyone else – and will invalidate the answer.


Making the unfamiliar familiar…by comparing it to Babel

Teachers and scientists often find they need to communicate something unfamiliar, and perhaps abstract, to an audience, and look to offer a comparison with something more familiar. For this to work well, it is important that the analogue, or metaphor, or other comparison, is actually already familiar to the audience.

Read about making the unfamiliar, familiar

Einstein offers an analogy: modern science reflects the story of the Tower of Babel.

Read about scientific analogies

Einstein presumably thought that his readers were likely to be familiar with the Tower of Babel. It has a reputation for being a place of debauchery, as in the lyric to (my 'friend') Elton's song,

"It's party time for the guys in the tower of Babel
Sodom meet Gomorrah, Cain meet Abel
Have a ball y'all
See the letches crawl
With the call girls under the table
Watch them dig their graves
'Cause Jesus don't save the guys
In the tower of Babel"

Extract from Bernie Taupin's lyrics for 'Tower of Babel', a song from the Elton John album 'Captain Fantastic and the Brown Dirt Cowboy'

Taupin here conflates several biblical stories for dramatic effect (and suggests that the sins were so extreme that the sinners were beyond salvation, despite Jesus's promise to save all who truly repent). According to the Bible, the people of Sodom and Gomorrah were so wicked that God destroyed the cities. (The term 'sodomy' derives from Sodom.) A sense of the level of wickedness is suggested by how the mob demanded the two Angels sent by God be handed over to be sexually abused… 2

But the alleged 'sins' of the people in the Tower of Babel were quite different in nature.

Pride comes before the falls

The original account is indeed, as Einstein suggested, Biblical. According to the narrative in Genesis, the descendants of Adam and Eve were populating the world, and formed a settlement where they set about building a city with a brick tower to reach into the sky.


The Tower of Babel by Pieter Bruegel the Elder (1563) (Source: Wikimedia) and the radio telescope at Jodrell Bank near Manchester (Image by petergaunt2 from Pixabay)


Supposedly, God saw this, and was concerned at how the people working together in this way could achieve so much, and pondered that "this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them". God responded by disrupting society by confusing the people's common language, so they could no longer understand each other, and they abandoned the city and tower, and spread into different communities with their own languages. (This is reflected – at least, in a 'mirror universe' sense – in the New Testament account of how the Holy Spirit enabled the apostles to have the 'gift of tongues' so they could spread the Gospel without impediments from language barriers.)

The tower is believed to be one of a number of large towers known as ziggurats which functioned as both temples and astronomical observatories in Babylonian society (Freely, 2011). So, the Tower of Babel might be considered as something like our Jodrell Bank, or the Hubble telescope of its day.

So, the wrong-doing of the people in the Tower seems to be having made rapid progress towards a technological civilisation, made possible because everyone shared the same language and could effectively cooperate. That may seem an odd thing to be punished for, but this is in the tradition of the Old Testament account of a God that had already exiled humans from the paradise of the Garden of Eden as punishment for the sin (the 'fall' of humanity) of disobediently eating fruit from the tree of knowledge.


Talk, it's only talk
Babble, burble, banter
Bicker, bicker, bicker
Brouhaha, balderdash, ballyhoo
It's only talk
Back talk

From Adrian Belew's lyrics for the King Crimson song 'Elephant Talk'


The tower only became known as Babel in retrospect, from a term referring to confused talk, as in 'to babble'. This also inspired the name of the fictional 'Babel Fish' which, according to Douglas Adams, was probably the oddest thing in the Universe (as well as the basis for a mooted proof for the non-existence of God),

"It feeds on brainwave energy received not from its own carrier, but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish."

Douglas Adams, from 'The Hitchhiker's Guide to the Galaxy'
Have scientists been dispersed from a golden age of mutual comprehension?

Einstein's analogy has some bite then: we develop knowledge together when we communicate well, but once we are split into small specialist groups, each with their own technical concepts and terminology, this disrupts our ability to progress our science and technology. Whether that is a good thing, or not, depends on what we do with the science, and what kinds of technologies result. This is where we need phronêsis as well as technê and epistêmê.


Wise progress in society relies on different forms of knowledge (after Figure 2.2 from Taber, 2019)


Einstein himself would later put much effort into the cause of nuclear disarmament – having encouraged the United States to develop nuclear weapons in the context of World War 2, he later worked hard to campaign against nuclear proliferation. (Einstein wanted the US and other countries to hand over their nuclear arsenals to an international body.)


Hiroshima after the U.S. bombing

(Source: Wikimedia)


One wonders how Einstein might have reflected on his 1932 Tower of Babel analogy by the end of his life, after the destruction of the cities of Hiroshima and Nagasaki, and the subsequent development of the (even more destructive) hydrogen bomb? After all, as Adams reflects, the poor old Babel fish:

"by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation".


Sodom and Gomorrah afire by Jacob de Wet II, 1680 (Source: Wikimedia); and an atomic bomb explodes (Image by Gerd Altmann from Pixabay)


Work cited:
  • Einstein, Albert (1932), In honor of Arnold Berliner's seventieth birthday. In Ideas and Opinions (1994), New York: The Modern Library.
  • Freely, J. (2011) Light from the East. How the science of medieval Islam helped to shape the Western World. I. B. Tauris & Co Ltd
  • Kierkegaard, Søren (1843/2014) Fear and Trembling. (Translated, Alastair Hannay) Penguin Classics.
  • Martínez Sainz, G. (2015). Teaching human rights in Mexico. A case study of educators' professional knowledge and practices [Unpublished Ph.D. thesis, University of Cambridge].
  • Taber, Keith S. (2018). Assigning Credit and Ensuring Accountability. In P. A. Mabrouk & J. N. Currano (Eds.), Credit Where Credit Is Due: Respecting Authorship and Intellectual Property (Vol. 1291, pp. 3-33). Washington, D.C.: American Chemical Society. [Can be downloaded here]
  • Taber, Keith S. (2019) MasterClass in Science Education: Transforming teaching and learning. London: Bloomsbury.

Notes

1 Perhaps 'supposed' is a little unfair in many cases? But, often official documents are drafted by civil servants and published as authored by faceless departments – so we may never know who the experts were; what they advised; and whether it was acted on. * So, the current English National Curriculum for science includes some 'howlers' – an incorrect statement of the principle of conservation of energy; labelling of some mixtures as being 'substances' – for which no individual has to take responsibility (perhaps explaining why the Department for Education is happy to let them stand until a major revision is due).

Read about scientific errors in the English National Curriculum

* An exception to this general pattern occurred with the 'Key Stage 3 Strategy' materials, which actually included some materials acknowledged as authored by two highly respected science educators (genuine experts!), Robin Millar and John Gilbert.


Fear and loathing in Sodom

2 According to the Biblical account, the Angels led Lot and his daughters away to safety before God destroyed the cities – with fire and sulphur. (Lot's wife famously looked back, having not had the benefit of learning from the Orpheus myth, and was lost.)

Lot had offered hospitality to the angels in his house, but the mob arrived and demanded the angels be handed over so the mob could 'know' them. Lot refused, but offered his two virgin daughters instead for the crowd to do with as they wished. (The substitution was rejected.) I imagine Søren Kierkegaard (1843) could have made much of this story, as it has echoes of Abraham's (no matter how reluctant) willingness to sacrifice his much-loved son Isaac to God; although one might argue that Lot's dilemma was more nuanced as he was dealing with a kind of 'trolley problem', risking his daughters to try to protect guests he had offered the safety of his house, rather than simply blindly obeying an order.


Sacrifice of Isaac (c. 1603) by Caravaggio (public domain, accessed from Wikimedia Commons), an episode open to multiple interpretations (Kierkegaard, 1843)


"It wasn't only me who blew their brains
I certainly admit to putting chains
Around their necks so they couldn't move
But there were others being quite crude
That was quite a gang waiting for the bang
I only take the blame for lighting the fuse

Now you say I'm responsible for killing them
I say it was God, He was willing them"

From the Lyrics of the song 'It Wasn't Me' (written by Steve Harley), from the Steve Harley & Cockney Rebel album 'The Best Years of Our Lives'.


Creeping bronzes

Evidence of journalistic creep in 'surprising' Benin bronzes claim


Keith S. Taber


How certain can we be about the origin of metals used in historic artefacts? (Image by Monika from Pixabay)


Science offers reliable knowledge of the natural world – but not absolutely certain knowledge. Conclusions from scientific studies follow from the results, but no research can offer absolutely certain conclusions as there are always provisos.

Read about critical reading of research

Scientists tend to know this, something emphasised for example by Albert Einstein (1940), who described scientific theories (used to interpret research results) as "hypothetical, never completely final, always subject to question and doubt".

When scientists talk to one another within some research programme they may use a shared linguistic code where they can omit the various conditionals ('likely', 'it seems', 'according to our best estimates', 'assuming the underlying theory', 'within experimental error', and the rest) as these are understood, and so may be left unspoken, thus increasing economy of language.

When scientists explain their work to a wider public, such conditionals may also be left out to keep the account simple, but really should be mentioned. A particular trope that annoyed me when I was younger was the high frequency of links in science documentaries that told me "this could only mean…" (Taber, 2007) when honest science is always framed more along the lines of "this would seem to mean…", "this could possibly mean…", "this suggested the possibility"…

Read about scientific certainty in the media

Journalistic creep

By journalistic creep I mean the tendency for some journalists who act as intermediaries between research scientists and the public to keep the story simple by omitting important provisos. Science teachers will appreciate this, as they often have to decide which details can be included in a presentation without losing or confusing the audience. A useful mantra may be:

Simplification may be necessary – but oversimplification can be misleading

A slightly different type of journalistic creep occurs within stories themselves. Sometimes the banner headline and the introduction to a piece report definitive, certain scientific results – but reading on (for those that do!) reveals nuances not acknowledged at the start. Teachers will again appreciate this tactic: offer the overview with the main point, before going back to fill in the more subtle aspects. But then, teachers have (somewhat) more control over whether the audience engages with the full account.

I am not intending to criticise journalists in general here, as scientists themselves have a tendency to do something similar when it comes to finding titles for papers that will attract attention by perhaps suggesting something more certain (or, sometimes, poetic or even controversial) than can be supported by the full report.


An example of a Benin Bronze (a brass artefact from what is now Nigeria) in the British [sic] Museum

(British Museum, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons)


Where did the Benin bronzes' metal come from?

The title of a recent article in the RSC's magazine for teachers, Education in Chemistry, proclaimed a "Surprise origin for Benin bronzes".1 The article started with the claim:

"Geochemists have confirmed that most of the Benin bronzes – sculptured heads, plaques and figurines made by the Edo people in West Africa between the 16th and 19th centuries – are made from brass that originated thousands of miles away in the German Rhineland."

So, this was something that scientists had apparently confirmed as being the case.

Reading on, one finds that

  • it has been "long suspected that metal used for the artworks was melted-down manillas that the Portuguese brought to West Africa"
  • scientists "analysed 67 manillas known to have been used in early Portuguese trade. The manillas were recovered from five shipwrecks in the Atlantic and three land sites in Europe and Africa"
  • they "found strong similarities between the manillas studied and the metal used in more than 700 Benin bronzes with previously published chemical compositions"
  • and "the chemical composition of the copper in the manillas matched copper ores mined in northern Europe"
  • and "suggests that modern-day Germany, specifically the German Rhineland, was the main source of the metal".

So, there is a chain of argument here which seems quite persuasive, but to move from this to it being "confirmed that most of the Benin bronzes…are made from brass that originated …in the German Rhineland" seems an example of journalistic creep.

The reference to "the chemical composition of the copper [sic] in the manillas" is unclear as, according to the original research paper, the manillas analysed were:

"chemically different from each other. Although most manillas analysed here …are brasses or leaded brasses, sometimes with small amounts of tin, a few specimens are leaded copper with little or no zinc."

Skowronek et al., 2023

The key data presented in the paper concerned the ratios of different lead isotopes (205Pb:204Pb; 206Pb:204Pb; 207Pb:204Pb; 208Pb:204Pb; see the reproduced figure below) in

  • ore from different European locations (according to published sources)
  • sampled Benin bronze (as reported from earlier research), and
  • sampled recovered manillas

and the ratios of different elements (Ni:As; Sb:As; Bi:As) in previously sampled Benin bronzes and sampled manillas.

The tendency to treat a chain of argument, where each link seems reasonably persuasive, as supporting fairly certain conclusions is logically flawed (it is like concluding, from the knowledge that one's chance of dying on any particular day is very low, that one must be immortal). Yet it seems reflected in something I have noticed with some research students: often their overall confidence in the conclusions of a research paper they have scrutinised is higher than their confidence in some of the distinct component parts of that study.


An example of a student's evaluation of a research study


This is like being told by a mechanic that your cycle brakes have a 20% chance of failing in the next year; the tyres 30%; the chain 20%; and the frame 10%; and concluding from this that there is only about a 20% chance of having any kind of failure in that time!
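To make the arithmetic explicit, here is a minimal sketch (assuming, purely for illustration, that the mechanic's four estimates are independent): the chance of escaping any failure is the product of the individual 'survival' probabilities, so the chance of at least one failure is far higher than any single estimate suggests.

```python
# A minimal sketch: combining several independent sources of doubt.
# The four probabilities are the mechanic's estimates from the example above;
# treating them as independent is an assumption made for simplicity.
p_fail = {"brakes": 0.20, "tyres": 0.30, "chain": 0.20, "frame": 0.10}

p_no_failure = 1.0
for component, p in p_fail.items():
    p_no_failure *= (1 - p)  # probability this component does NOT fail in the year

p_any_failure = 1 - p_no_failure
print(f"Chance of at least one failure: {p_any_failure:.0%}")  # ~60%, not ~20%
```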

A definite identification?

The peer reviewed research paper which reports the study discussed in the Education in Chemistry article informs readers that

"In the current study, documentary sources and geochemical analyses are used to demonstrate that the source of the early Portuguese "tacoais" manillas and, ultimately, the Benin Bronzes was the German Rhineland."

"…this study definitively identifies the Rhineland as the principal source of manillas at the opening of the Portuguese trade…"

Skowronek et al., 2023

which sounds pretty definitive; but, interestingly, the study did not rely on chemical analysis alone, but also drew on 'documentary' evidence. In effect, historical evidence provided another link in the argument, by suggesting the range of possible sources of the alloy that should be considered in any chemical comparisons. This assumes there were no mining and smelting operations providing metal for the trade with Africa which have not been well documented by historians. That seems a reasonable assumption, but it adds another proviso to the conclusions.

The researchers reported that

Pre-18th century manillas share strong isotopic similarities with Benin's famous artworks. Trace elements such as antimony, arsenic, nickel and bismuth are not as similar as the lead isotope data…. The greater data derivation suggests that manillas were added to older brass or bronze scrap pieces to produce the Benin works, an idea proposed earlier.

and acknowledge that

Millions of these artifacts were sent to West Africa where they likely provided the major, virtually the only, source of brass for West African casters between the 15th and the 18th centuries, including serving as the principal metal source of the Benin Bronzes. However, the difference in trace elemental patterns between manillas and Benin Bronzes does not allow postulating that they have been the only source.

The figure below is taken from the research report.


Part of Figure 2 from the open access paper (© 2023 Skowronek et al. – distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)

The chart shows results from sampled examples of Benin bronzes (blue circles); compared with the values of the same isotope ratios from different copper ore sites (squares) and manillas sampled from different archaeological sites (triangles).


The researchers feel that the pattern of clustering of results (in this, and other similar comparisons between lead isotope ratios) from the Benin bronzes, compared with those from the sampled manillas, and the ore sites, allows them to identify the source of metal re-purposed by the Edo craftspeople to make the bronzes.

It is certainly the case that the blue circles (which refer to the artworks) and the green squares (which refer to copper ore samples from Rhineland) do seem to generally cluster in a similar region of the graph – and that some of the samples taken from the manillas also seem to fit this pattern.

I can see why this might strongly suggest the Rhineland (certainly more so than Wales) as the source of the copper believed to be used in manillas which were traded in Africa and are thought to have been later melted down as part of the composition of alloy used to make the Benin bronzes.

Whether that makes for either

  • definitive identification of the Rhineland as the principal source of manillas (Skowronek paper), or
  • confirmation that most of the Benin bronzes are made from brass that originated thousands of miles away in the German Rhineland (EiC)

seems somewhat less certain. Just as scientific claims should be.


A conclusion for science education

It is both human nature and often good journalistic or pedagogic practice to begin with a clear, uncomplicated statement of what is to be communicated. But we also know that what is heard or read first may be better retained in memory than what follows. It also seems that people in general tend to apply the wrong kind of calculus when there are multiple sources of doubt – being more likely to estimate overall doubt as the mean or modal level of the several discrete sources of doubt, rather than as something that accumulates step by step.

It seems there is a major issue here for science education in training young people in critically questioning claims, looking for the relevant provisos, and understanding how to integrate levels of doubt (or, similarly, risk) that are distributed over a sequence of phases in a process.


All research conclusions (in any empirical study in any discipline) rely on a network of assumptions and interpretations, any one of which could be a weak link in the chain of logic. This is my take on some of the most critical links and assumptions in the Benin bronzes study. One could easily further complicate this scheme (for example, I have ignored the assumptions about the validity of the techniques and calibration of the instrumentation used to find the isotopic composition of metal samples).


Work cited:

Note:

1 It is not clear to me what the surprise was – but perhaps this is meant to suggest the claim may be surprising to readers of the article. The study discussed was premised on the assumption that the Benin Bronzes were made from metal largely re-purposed from manillas traded from Europe, which had originally been cast in one of the known areas in Europe with metal-working traditions. The researchers included the Rhineland as one of the potential regional sites they were considering. So, it was surely a surprise only in the sense that rolling a die and having it land on 4, rather than say 2 or 5, would be a surprise.

But then, would you be just as likely to read an article entitled "Benin bronzes found to have anticipated origin"?


Educational experiments – making the best of an unsuitable tool?

Can small-scale experimental investigations of teaching, carried out in a couple of arbitrary classrooms, really tell us anything about how to teach well?


Keith S. Taber


Undertaking valid educational experiments involves (often insurmountable) challenges, but perhaps this grid (shown larger below) might be useful for researchers who do want to do genuinely informative experimental studies into teaching?


Applying experimental method to educational questions is a bit like trying to use a precision jeweller's screwdriver to open a tin of paint: you may get the tin open eventually, but you will probably have deformed the tool in the process whilst making something of a mess of the job.


In recent years I seem to have developed something of a religious fervour about educational research studies of the kind that claim to be experimental evaluations of pedagogies, classroom practices, teaching resources, and the like. I think this all started when, having previously largely undertaken interpretive studies (for example, interviewing learners to find out what they knew and understood about science topics) I became part of a team looking to develop, and experimentally evaluate, classroom pedagogy (i.e., the epiSTEMe project).

As a former school science teacher, I had taught learners about the basis of experimental method (e.g., control of variables) and I had read quite a number of educational research studies based on 'experiments', so I was pretty familiar with the challenges of doing experiments in education. But being part of a project which looked to actually carry out such a study made a real impact on me in this regard. Well, that should not be surprising: there is a difference between watching the European Cup Final on the TV, and actually playing in the match, just as reading a review of a concert in the music press is not going to impact you as much as being on stage performing.

Let me be quite clear: the experimental method is of supreme value in the natural sciences; and, even if not all natural science proceeds that way, it deserves to be an important focus of the science curriculum. Even in science, the experimental strategy has its limitations. 1 But experiment is without doubt a precious and powerful tool in physics and chemistry that has helped us learn a great deal about the natural world. (In biology, too, but even here there are additional complications due to the variations within populations of individuals of a single 'kind'.)

But transferring experimental method from the laboratory to the classroom to test hypotheses about teaching is far from straightforward. Most of the published experimental studies drawing conclusions about matters such as effective pedagogy, need to be read with substantive and sometimes extensive provisos and caveats; and many of them are simply invalid – they are bad experiments (Taber, 2019). 2

The experiment is a tool that has been designed, and refined, to help us answer questions when:

  • we are dealing with non-sentient entities that are indifferent to outcomes;
  • we are investigating samples or specimens of natural kinds;
  • we can identify all the relevant variables;
  • we can measure the variables of interest;
  • we can control all other variables which could have an effect;

These points simply do not usually apply to classrooms and other learning contexts. 3 (This is clearly so, even if educational researchers often either do not appreciate these differences, or simply pretend they can ignore them.)

Applying experimental method to educational questions is a bit like trying to use a precision jeweller's screwdriver to open a tin of paint: you may get the tin open eventually, but you will probably have deformed the tool in the process whilst making something of a mess of the job.

The reason why experiments are to be preferred to interpretive ('qualitative') studies is that supposedly experiments can lead to definite conclusions (by testing hypotheses), whereas studies that rely on the interpretation of data (such as classroom observations, interviews, analysis of classroom talk, etc.) are at best suggestive. This would be a fair point when an experimental study genuinely met the control-of-variables requirements for being a true experiment – although often, even then, to draw generalisable conclusions that apply to a wide population one has to be confident one is working with a random or representative sample, and use inferential statistics which can only offer a probabilistic conclusion.

My creed…researchers should prefer to undertake competent work

My proselytising about this issue is based on having come to think that:

  • most educational experiments do not fully control relevant variables, so are invalid;
  • educational experiments are usually subject to expectancy effects that can influence outcomes;
  • many (perhaps most) educational experiments have too few independent units of analysis to allow the valid use of inferential statistics;
  • most large-scale educational experiments cannot ensure that samples are fully representative of populations, so strictly cannot be generalised;
  • many experiments are rhetorical studies that deliberately compare a condition (supposedly being tested but actually) assumed to be effective with a teaching condition known to fall short of good teaching practice;
  • an invalid experiment tells us nothing that we can rely upon;
  • a detailed case study of a learning context which offers rich description of teaching and learning potentially offers useful insights;
  • given a choice between undertaking a competent study of a kind that can offer useful insights, and undertaking a bad experiment which cannot provide valid conclusions, researchers should prefer to undertake competent work;
  • what makes work scientific is not the choice of methodology per se, but the adoption of a design that fits the research constraints and offers a genuine opportunity for useful learning.

However, experiments seem very popular in education, and often seem to be the methodology of choice for researchers into pedagogy in science education.

Read: Why do natural scientists tend to make poor social scientists?

This fondness for experiments will no doubt continue, so here are some thoughts on how best to draw useful implications from them.

A guide to using experiments to inform education

It seems there are two very important dimensions that can be used to characterise experimental research into teaching – relating to the scale and focus of the research.


Two dimensions used to characterise experimental studies of teaching


Scale of studies

A large-scale study has a large number of 'units of analysis'. So, for example, if the research was testing out the value of using, say, augmented reality in teaching about predator-prey relationships, then in such a study there would need to be a large number of teaching-learning 'units' in the augmented learning condition and a similarly large number of teaching-learning 'units' in the comparison condition. What a unit actually is would vary from study to study. Here a unit might be a sequence of three lessons where a teacher teaches the topic to a class of 15-16 year-old learners (either with, or without, the use of augmented reality).

For units of analysis to be analysed statistically they need to be independent from each other – so different students learning together from the same teacher in the same classroom at the same time are clearly not learning independently of each other. (This seems obvious – but in many published studies this inconvenient fact is ignored as it is 'unhelpful' if researchers wish to use inferential statistics but are only working with a small number of classes. 4)

Read about units of analysis in research

So, a study which compared teaching and learning in two intact classes can usually only be considered to have one unit of analysis in each condition (making statistical tests completely irrelevant 5, though this does not stop them often being applied anyway). There are a great many small-scale studies in the literature where there are only one or a few units in each condition.
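As a minimal sketch of why this matters (with invented scores, purely for illustration): once the scores from two intact classes are aggregated to the genuinely independent unit (the class), there is only a single value per condition, and nothing left for an inferential test to work on.

```python
# Invented scores for two intact classes (hypothetical data, for illustration only).
experimental_class = [7, 8, 6, 9, 7, 8]  # all taught together by one teacher
comparison_class = [5, 6, 7, 5, 6, 6]    # all taught together by another teacher

# Treating each student as an independent unit ignores the fact that classmates
# share a teacher, lessons and peer interactions. Aggregating to the class level
# (the genuinely independent unit) leaves one value per condition:
experimental_units = [sum(experimental_class) / len(experimental_class)]
comparison_units = [sum(comparison_class) / len(comparison_class)]

# With n = 1 per condition there is no within-condition variance to estimate,
# so no inferential statistical test can be meaningfully applied.
print(experimental_units, comparison_units)
```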

Focus of study

The other dimension shown in the figure concerns the focus of a study. By the focus, I mean whether the researchers are interested in teaching and learning in some specific local context, or want to find out about some general population.

Read about what is meant by population in research

Studies may be carried out in a very specific context (e.g., one school; one university programme) or across a wide range of contexts. That seems to simply relate to the scale of the study, just discussed. But by focus I mean whether the research question of interest concerns just a particular teaching and learning context (which may be quite appropriate when practitioner-researchers explore their own professional contexts, for example), or is meant to help us learn about a more general situation.


| Local focus | General focus |
| --- | --- |
| Why does school X get such outstanding science examination scores? | Is there a relationship between teaching pedagogy employed and science examination results in English schools? |
| Will jig-saw learning be a productive way to teach my A level class about the properties of the transition elements? | Is jig-saw learning an effective pedagogy for use in A level chemistry classes? |

Some hypothetical research questions relating either to a specific teaching context, or to a wider population. (n.b. The research literature includes a great many studies that claim to explore general research questions by collecting data in a single specific context.)

If that seems a subtle distinction between two quite similar dimensions, then it is worth noting that the research literature contains a great many studies that take place in one context (small-scale studies) but which claim (implicitly or explicitly) to be of general relevance. So, many authors, peer reviewers, and editors clearly seem to think one can generalise from such small-scale studies.

Generalisation

Generalisation is the ability to draw general conclusions from specific instances. Natural science does this all the time. If this sample of table salt has the formula NaCl, then all samples of table salt do; if the resistance of this copper wire goes up when the wire is heated the same will be found with other specimens as well. This usually works well when dealing with things we think are 'natural kinds' – that is where all the examples (all samples of NaCl, all pure copper wires) have the same essence.

Read about generalisation in research

Education deals with teachers, classes, lessons, schools…social kinds that lack that kind of equivalence across examples. You can swap any two electrons in a structure and it will make absolutely no difference. Does any one think you can swap the teachers between two classes and safely assume it will not have an effect?

So, by focus I mean whether the point of the research is to find out about the research context in its own right (context-directed research) or to learn something that applies to a general category of phenomena (theory-directed research).

These two dimensions, then, lead to a model with four quadrants.

Large-scale research to learn about the general case

In the top-right quadrant is research which focuses on the general situation and is larger-scale. In principle 6 this type of research can address a question such as 'is this pedagogy (teaching resource, etc.) generally effective in this population', as long as

  • the samples are representative of the wider population of interest, and
  • those sampled are randomly assigned to conditions, and
  • the number of units supports statistical analysis.

The sleight of hand employed in many studies is to select a convenience sample (two classes of thirteen-year-old students at my local school) yet to claim the research is about, and so offers conclusions about, a wider population (thirteen-year-old learners).

Read about some examples of samples used to investigate populations


When an experiment tests a sample drawn at random from a wider population, then the findings of the experiment can be assumed to (probably) apply (on average) to the population. (Taber, 2019)

Even when a population is properly sampled, it is important not to assume that something which has been found to be generally effective in a population will be effective throughout the population. Schools, classes, courses, learners, topics, etc. vary. If it has been found that, say, teaching the reactivity series through enquiry generally works in the population of English classes of 13-14 year-old students, then a teacher of an English class of 13-14 year-old students might sensibly think this is an approach to adopt, but cannot assume it will be effective in her classroom, with a particular group of students.

To implement something that has been shown to generally work might be considered research-based teaching, as long as the approach is dropped or modified if indications are it is not proving effective in this particular context. That is, there is nothing (please note, UK Department for Education, and Ofsted) 'research-based' about continuing with a recommended approach in the face of direct empirical evidence that it is not working in your classroom.

Large-scale research to learn about the range of effectiveness

However, even large-scale studies where there are genuinely sufficient units of analysis for statistical analysis may not logically support the kinds of generalisation in the top-right quadrant. For that, researchers need either a random sample of the full population (seldom viable given people and institutions must have a choice to participate or not 7), or a sample which is known to be representative of the population in terms of the relevant characteristics – which means knowing a lot about

  • (i) the population,
  • (ii) the sample, and
  • (iii) which variables might be relevant!

Imagine you wanted to undertake a survey of physics teachers in some national context, and you knew you could not reach all that population so you needed to survey a sample. How could you possibly know that the teachers in your sample were representative of the wider population on whatever variables might potentially be pertinent to the survey (level of qualification?; years of experience?; degree subject?; type of school/college taught in?; gender?…)

But perhaps a large-scale study that attracts a diverse enough sample may still be very useful if it collects sufficient data about the individual units of analysis, and so can begin to look at patterns in how specific local conditions relate to teaching effectiveness. That is, even if the sample cannot be considered representative enough for statistical generalisation to the population, such a study might be able to offer some insights into whether an approach seems to work well in mixed-ability classes, or top sets, or girls' schools, or in areas of high social deprivation, or…

In practice, there are very few experimental research studies which are large-scale, in the sense of having enough different teachers/classes as units of analysis to sit in either of these quadrants of the chart. Educational research is rarely funded at a level that makes this possible. Most researchers are constrained by the available resources to only work with a small number of accessible classes or schools.

So, what use are such studies for producing generalisable results?

Small-scale research to incrementally extend the range of effectiveness

A single small-scale study can contribute to a research programme to explore the range of application of an innovation as if it was part of a large-scale study with a diverse sample. But this means such studies need to be explicitly conceptualised and planned as part of such a programme.

At the moment it is common for research papers to say something like

"…lots of research studies, from all over the place, report that asking students to

(i) first copy science texts omitting all the vowels, and then

(ii) re-constituting them in full by working from the reduced text, by writing it out adding vowels that produce viable words and sentences,

is an effective way of supporting the learning of science concepts; but no one has yet reported testing this pedagogic method when twelve year old students are studying the topic of acids in South Cambridgeshire in a teaching laboratory with moveable stools and West-facing windows.

In this ground-breaking study, we report an experiment to see if this constructivist, active-learning, teaching approach leads to greater science learning among twelve year old students studying the topic of acids in South Cambridgeshire in a teaching laboratory with moveable stools and West-facing windows…"

Over time, the research literature becomes populated with studies of enquiry-based science education, jig-saw learning, use of virtual reality, etc., etc., and these tend to refer to a range of national contexts, variously aged students, diverse science topics, and so on; but this all tends to be piecemeal. A coordinated programme of research could lead to researchers both (a) giving rich descriptions of the contexts used, and (b) selecting contexts strategically to build up a picture across ranges of contexts.

"When there is a series of studies testing the same innovation, it is most useful if collectively they sample in a way that offers maximum information about the potential range of effectiveness of the innovation.There are clearly many factors that may be relevant. It may be useful for replication studies of effective innovations to take place with groups of different socio-economic status, or in different countries with different curriculum contexts, or indeed in countries with different cultural norms (and perhaps very different class sizes; different access to laboratory facilities) and languages of instruction …. It may be useful to test the range of effectiveness of some innovations in terms of the ages of students, or across a range of quite different science topics. Such decisions should be based on theoretical considerations.

Given the large number of potentially relevant variables, there will be a great many combinations of possible sets of replication conditions. A large number of replications giving similar results within a small region of this 'phase space' means each new study adds little to the field. If all existing studies report positive outcomes, then it is most useful to select new samples that are as different as possible from those already tested. …

When existing studies suggest the innovation is effective in some contexts but not others, then the characteristics of samples/context of published studies can be used to guide the selection of new samples/contexts (perhaps those judged as offering intermediate cases) that can help illuminate the boundaries of the range of effectiveness of the innovation."

Taber, 2019

Not that the research programme would be co-ordinated by a central agency or authority, but by each contributing researcher/research team (i) taking into account the 'state of play' at the start of their research; (ii) making strategic decisions accordingly when selecting contexts for their own work; (iii) reporting the context in enough detail to allow later researchers to see how that study fits into the ongoing programme.

This has to be a more scientific approach than simply picking a convenient context where researchers expect something to work well; undertaking a small-scale local experiment (perhaps setting up a substandard control condition to be sure of a positive outcome); and then reporting along the lines of "this widely demonstrated effective pedagogy works here too", or, if it does not, perhaps putting the study aside without publication. As the philosopher of science Karl Popper reminded us, science proceeds through the testing of bold conjectures: an 'experiment' where you already know the outcome is actually a demonstration. Demonstrations are useful in teaching, but do not contribute to research. What can contribute is an experiment in a context where there is reason to be unsure whether an innovation will be an improvement or not, and where the comparison reflects good teaching practice to offer a meaningful test.

Small-scale research to inform local practice

Now, I would be the first to admit that I am not optimistic that such an approach will be developed by researchers; and even if it is, it will take time for useful patterns to arise that offer genuine insights into the range of convenience of different pedagogies.

Does this mean that small-scale studies in a single context are really a waste of research resources and an unmerited inconvenience for those working in such contexts?

Well, I have time for studies in my final (bottom-left) quadrant. Given that schools, classrooms, teachers and classes all vary considerably, and that what works well in a highly selective boys-only fee-paying school with a class size of 16 may not be as effective in a co-educational class of 32 mixed-ability students in an under-resourced school in an area of social deprivation (and vice versa, of course!), there is often value in testing out ideas (even recommended 'research-based' ones) in specific contexts to inform practice in that context. These are likely to be genuine experiments, as the investigators are really motivated to find out what can improve practice in that context.

Often such experiments will not get published,

  • perhaps because the researchers are teachers with higher priorities than writing for publication;
  • perhaps because it is assumed such local studies are not generalisable (but they could sometimes be moved into the previous category if suitably conceptualised and reported);
  • perhaps because the investigators have not sought permissions for publication (part of the ethics of research), usually not necessary for teachers seeking innovations to improve practice as part of their professional work;
  • perhaps because it has been decided inappropriate to set up control conditions which are not expected to be of benefit to those being asked to participate;
  • but also because when trying out something new in a classroom, one needs to be open to make ad hoc modifications to, or even abandon, an innovation if it seems to be having a deleterious effect.

Evaluation of effectiveness here usually comes down to professional judgement, which might, in part, rely on the researcher's close (and partially tacit) familiarity with the research context – rather than statistical testing (which assumes a large random sample of a population) being used invalidly to generalise small, non-random, local results to that population.

I am here describing 'action research', which is highly useful for informing local practice, but which is not ideally suited for formal reporting in academic journals.

Read about action research

So, I suspect there may be an irony here.

There may be a great many small-scale experiments undertaken in schools and colleges which inform good teaching practice in their contexts, without ever being widely reported; whilst there are a great many similar scale, often 'forced' experiments, carried out by visiting researchers with little personal stake in the research context, reporting the general effectiveness of teaching approaches, based on misuse of statistics. I wonder which approach best reflects the true spirit of science?

Source cited:


Notes:

1 For example:

Even in the natural sciences, we can never be absolutely sure that we have controlled all relevant variables (after all, if we already knew for sure which variables were relevant, we would not need to do the research). But usually existing theory gives us a pretty good idea what we need to control.

Experiments are never a simple test of the specified hypothesis, as the experiment is likely to depend upon the theory of instrumentation and the quality of instruments. Consider an extreme case such as the discovery of the Higgs boson at CERN: the conclusions relied on complex theory that informed the design of the apparatus, and very challenging precision engineering, as well as complex mathematical models for interpreting data, and corresponding computer software specifically programmed to carry out that analysis.

The experimental results are a test of a hypothesis (e.g., that a certain particle would be found at events below some calculated energy level) subject to the provisos that

  • the theory of the instrument and its design is correct; and
  • the materials of the apparatus (an apparatus as complex and extensive as a small city) have no serious flaws; and
  • the construction of the instrumentation precisely matches the specifications;
  • and the modelling of how the detectors will function (including their decay in performance over time) is accurate; and
  • the analytical techniques designed to interpret the signals are valid;
  • the programming of the computers carries out the analysis as intended.

It almost requires an act of faith to have confidence in all this (and I am confident there is no one scientist anywhere in the world who has a good enough understanding of, and familiarity with, all these aspects of the experiment to be able to give assurances on all these areas!)


CREST {Critical Reading of Empirical Studies} evaluation form: when you read a research study, do you consider the cumulative effects of doubts you may have about different aspects of the work?

I would hope at least that as professional scientists and engineers they might be a little more aware of this complex chain of argumentation needed to support robust conclusions than many students – for students often seem to be overconfident in the overall value of research conclusions given any doubts they may have about aspects of the work reported.

Read about the Critical Reading of Empirical Studies Tool


Galileo Galilei was one of the first people to apply the telescope to study the night sky (image by Dorothe from Pixabay)


A historical example is Galileo's observations of astronomical phenomena such as the Jovian moons (he spotted the four largest: Io, Europa, Ganymede and Callisto) and the irregular surface of the Moon. Some of his contemporaries rejected these findings on the basis that they were made using an apparatus, the new-fangled telescope, that they did not trust. Whilst this is now widely seen as being arrogant and/or ignorant, arguably if you did not understand how a telescope could magnify, and you did not trust the quality of the lenses not to produce distortions, then it was quite reasonable to be sceptical of findings which were counter to a theory of the 'heavens' that had been generally accepted for many centuries.


2 I have discussed a number of examples on this site. For example:

Falsifying research conclusions: You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.

Why ask teachers to 'transmit' knowledge…if you believe that "knowledge is constructed in the minds of students"?

Shock result: more study time leads to higher test scores (But 'all other things' are seldom equal)

Experimental pot calls the research kettle black: Do not enquire as I do, enquire as I tell you

Lack of control in educational research: Getting that sinking feeling on reading published studies


3 For a detailed discussion of these and other challenges of doing educational experiments, see Taber, 2019.


4 Consider these two situations.

A researcher wants to find out if a new textbook 'Science for the modern age' leads to more learning among the Grade 10 students she teaches than the traditional book 'Principles of the natural world'. Imagine there are fifty Grade 10 students already divided into two classes. The teacher flips a coin and randomly assigns one of the classes to the innovative book, the other being assigned the traditional book by default. We will assume she has a suitable test to assess each student's learning at the end of the experiment.

The teacher teaches the two classes the same curriculum using the same scheme of work. She presents a mini-lecture to a class, then sets them some questions to discuss using the textbook. At the end of the (three-part!) lesson, she leads a class discussion drawing on students' suggested answers.

Being a science teacher, who believes in replication, she decides to repeat the exercise the following year. Unfortunately there is a pandemic, and all the students are sent into lock-down at home. So, the teacher assigns the fifty students by lot into two groups, and emails one group the traditional book, and the other the innovative text. She teaches all the students on line as one cohort: each lesson giving them a mini-lecture, then setting them some reading from their (assigned) book, and a set of questions to work through using the text, asking them to upload their individual answers for her to see.

With regard to experimental method, in the first cohort she has only two independent units of analysis – so she may note that the average outcome scores are higher in one group, but cannot read too much into that. However, in the second year, the fifty students can be considered to be learning independently, and as they have been randomly assigned to conditions, she can treat the assessment scores as being from 25 units of analysis in each condition (and so may sensibly apply statistics to see if there is a statistically significant difference in outcomes).
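As a minimal sketch of what that second-year analysis might look like (with invented scores, and assuming, purely for illustration, a standard independent-samples t-test from scipy; other tests might be equally defensible):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented data: 25 individually randomised students per condition (illustration only).
innovative = rng.normal(loc=5.6, scale=1.2, size=25)   # assigned 'Science for the modern age'
traditional = rng.normal(loc=5.1, scale=1.2, size=25)  # assigned 'Principles of the natural world'

# Because students were individually randomised and learn/submit work independently,
# each student can reasonably be treated as a unit of analysis.
t_statistic, p_value = stats.ttest_ind(innovative, traditional)
print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")
```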


5 Inferential statistical tests are usually used to see if the difference in outcomes across conditions is 'significant'. Perhaps the average score in a class with an innovation is 5.6, compared with an average score in the control class of 5.1. The average score is higher in the experimental condition, but is the difference enough to matter?

Well, actually, if the question is whether the difference is big enough to be likely to make a difference in practice, then researchers should calculate the 'effect size', which will suggest whether the difference found should be considered small, moderate or large. This should ideally be calculated regardless of whether inferential statistics are being used or not.
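For instance, here is a minimal sketch using Cohen's d (one common effect-size measure; the means are the illustrative values above and the pooled standard deviation is invented for the example):

```python
# Cohen's d: the difference between group means in units of the pooled standard deviation.
mean_innovation, mean_control = 5.6, 5.1   # illustrative means from the text
pooled_sd = 1.2                            # hypothetical; in practice computed from both groups

cohens_d = (mean_innovation - mean_control) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")  # about 0.4, conventionally a small-to-moderate effect
```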

Inferential statistical tests are often used to see if the result is generalisable to the wider population – but, as suggested above, this is strictly only valid if the population of interest have been randomly sampled – which virtually never happens in educational studies as it is usually not feasible.

Often researchers will still do the calculation, based on the sets of outcome scores in the two conditions, to see if they can claim a statistically significant difference – but the test will only suggest how likely or unlikely the difference between the outcomes is, if the units of analysis have been randomly assigned to the conditions. So, if there are 50 learners each randomly assigned to experimental or control condition this makes sense. That is sometimes the case, but nearly always the researchers work with existing classes and do not have the option of randomly mixing the students up. [See the example in the previous note 4.] In such a situation, the stats. are not informative. (That does not stop them often being reported in published accounts as if they are useful.)
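To illustrate what random assignment buys you, here is a minimal sketch of a randomisation (permutation) test (with invented scores; the procedure, not the numbers, is the point): the observed difference in means is compared against the differences that arise when the same scores are repeatedly re-shuffled between the two conditions, which is only a meaningful comparison if the real units were themselves randomly assigned.

```python
import random

# Invented outcome scores for two conditions (illustration only).
experimental = [6, 7, 8, 5, 7, 9, 6, 8]
control = [5, 6, 5, 7, 6, 4, 6, 5]

observed_diff = sum(experimental) / len(experimental) - sum(control) / len(control)

pooled = experimental + control
n_exp = len(experimental)
count_as_extreme = 0
n_shuffles = 10_000

for _ in range(n_shuffles):
    random.shuffle(pooled)
    shuffled_diff = (sum(pooled[:n_exp]) / n_exp
                     - sum(pooled[n_exp:]) / (len(pooled) - n_exp))
    if shuffled_diff >= observed_diff:
        count_as_extreme += 1

# Proportion of random re-assignments giving a difference at least as large as observed.
print(count_as_extreme / n_shuffles)
```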


6 That is, if it is possible to address such complications as participant expectations, and equitable teacher familiarity with the different conditions they are assigned to (Taber, 2019).

Read about expectancy effects


7 A usual ethical expectation is that participants voluntarily (without duress) offer informed consent to participate.

Read about voluntary informed consent


The best science education journal

Where is the best place to publish science education research?


Keith S. Taber



| Outlet | Description | Notes |
| --- | --- | --- |
| International Journal of Science Education | Top-tier general international science education journal | Historically associated with the European Science Education Research Association |
| Science Education | Top-tier general international science education journal | |
| Journal of Research in Science Teaching | Top-tier general international science education journal | Associated with NARST |
| Research in Science Education | Top-tier general international science education journal | Associated with the Australasian Science Education Research Association |
| Studies in Science Education | Leading journal for publishing in-depth reviews of topics in science education | |
| Research in Science and Technological Education | Respected general international science education journal | |
| International Journal of Science and Maths Education | Respected general international science education journal | Founded by the National Science and Technology Council, Taiwan |
| Science Education International | Publishes papers that focus on the teaching and learning of science in school settings ranging from early childhood to university education | Published by the International Council of Associations for Science Education |
| Science & Education | Has foci of historical, philosophical, and sociological perspectives on science education | Associated with the International History, Philosophy, and Science Teaching Group |
| Journal of Science Teacher Education | Concerned with the preparation and development of science teachers | Associated with the Association for Science Teacher Education |
| International Journal of Science Education, Part B – Communication and Public Engagement | Concerned with research into science communication and public engagement / understanding of science | |
| Cultural Studies of Science Education | Concerned with science education as a cultural, cross-age, cross-class, and cross-disciplinary phenomenon | |
| Journal of Science Education and Technology | Concerns the intersection between science education and technology | |
| Disciplinary and Interdisciplinary Science Education Research | Concerned with science education within specific disciplines and between disciplines | Affiliated with the Faculty of Education, Beijing Normal University |
| Journal of Biological Education | For research specifically within biology education | Published for the Royal Society of Biology |
| Journal of Chemical Education | A long-standing journal of chemistry education, which includes a section for Chemistry Education Research papers | Published by the American Chemical Society |
| Chemistry Education Research and Practice | The leading research journal for chemistry education | Published by the Royal Society of Chemistry |

Some of the places to publish research in science education

I was recently asked which was the best journal in which to seek publication of science education research. This was a fair question, given that I had been warning of the large number of low-quality journals now diluting the academic literature.

I had been invited to give a seminar talk to the Physics Education and Scholarship Section in the Department of Physics at Durham University. I had been asked to talk on the theme of 'Publishing research in science education'.

The talk considered the usual processes involved in submitting a paper to a research journal and the particular responsibilities involved for authors, editors and reviewers. In the short time available I said a little about ethical issues, including difficulties that can arise when scholars are not fully aware of, or decide to ignore, the proper understanding of academic authorship 1 . I also discussed some of the specific issues that can arise when those with research training in the natural sciences undertake educational research without any further preparation (for example, see: Why do natural scientists tend to make poor social scientists?), such as underestimating the challenge of undertaking valid experiments in educational contexts.

I had not intended to offer advice on specific journals for the very good reasons that

  • there are a lot of journals
  • my experience of them is very uneven
  • I have biases!
  • knowledge of journals can quickly become out of date when publishers change policies, or editorial teams change

However, it was pointed out that there does not seem to be anywhere where such advice is readily available, so I made some comments based on my own experience. I later reflected that some such guidance could be useful, especially to those new to research in the area.

I do, in the 'Research methodology' section of the site, offer some advice to the new researcher on 'Publishing research', that includes some general advice on things to consider when thinking about where to send your work:

Read about 'Selecting a research journal: Selecting an outlet for your research articles'

Although I name check some journals there, I did not think I should offer strong guidance for the reasons I give above. However, taking on board the comment about the lack of guidance readily available, I thought I would make some suggestions here, with the full acknowledgement that this is a personal perspective, and that the comments facility below will allow other views and potential correctives to my biases! If I have missed an important journal, or seem to have made a misjudgement, then please tell me and (more importantly) other readers who may be looking for guidance.

Publishing in English?

My focus here is on English language journals. There are many important journals that publish in other languages such as Spanish. However, English is often seen as the international language for reporting academic research, and most of the journals with the greatest international reach work in the English language.

These journals publish work from all around the world, which therefore includes research into contexts where the language of instruction is NOT English, and where data is collected, and often analysed, in the local language. In these cases, reporting research in English requires translating material (curriculum materials, questions posed to participants, quotations from learners etc.) into English. That is perfectly acceptable, but translation is a skilled and nuanced activity, and needs to be acknowledged and reported, and some assurance of the quality of translation offered (Taber, 2018).

Read about guidelines for good practice regarding translation in reporting research

Science research journal or science education journal?

Sometimes science research journals will publish work on science education. However, not all science journals will consider this, and even for those that do, this tends to be an occasional event.

With the advent of open-access, internet-accessible publishing, some academic publishers are offering journals with very wide scope (presumably as it is considered that, in the digital age, it is easier to find research without it needing to be in a specialist journal). However, authors should be wary of journals that have titles implying a specialist scientific focus but which seem to accept material from a wide range of fields, as this is one common indicator of predatory journals – that is, journals which do not use robust peer review (despite what they may claim) and have low quality standards.

Read about predatory journals

There are some scientific journals with an interdisciplinary flavour which are not education journals per se, but are open to suitable submissions on educational topics. The one I am most familiar with (disclosure of interest: I am on the Editorial Board) is Foundations of Chemistry (published by Springer).



Science Education Journal or Education Journal?

Then, there is the question of whether to publish work in specialist science education journals or one of the many more general education journals. (There are too many to discuss them here.) General education journals will sometimes publish work from within science education, as long as they feel it is of high enough general interest to their readership. This may in part be a matter of presentation – if the paper is written so it is only understandable to subject specialists, and only makes recommendations for specialists in science education, it is unlikely to seem suitable for a more general journal.

On the other hand, just because research has been undertaken in a science teaching and learning context, this may not make it of particular interest to science educators if the research aims, conceptualisation, conclusions and recommendations concern general educational issues, and anything that may be specific to science teaching and learning is ignored in the research – that is, if a science classroom was chosen just as a matter of convenience, but the work could have been just as well undertaken in a different curriculum context (Taber, 2013).

Research Journal or Professional Journal?

Another general question is whether it is best to send one's work to an academic research journal (offering more kudos for the author{s} if published) or a journal widely read by practitioners (but usually considered less prestigious when a scholar's academic record is examined for appointment and promotion). These different types of output usually have different expectations about the tone and balance of articles:

Read about Research journals and practitioner journals

Some work is highly theoretical, or is focussed on moving forward a research field – and is unlikely to be seen as suitable for a teacher's journal. Other useful work may have developed and evaluated new educational resources, but without critically exploring any educational questions in any depth. Information about this project would likely be of great interest to teachers, but is unlikely to meet the criteria to be accepted for publication in a research journal.

But what about a genuine piece of research that would be of interest to other researchers in the field, but also leads to strong recommendations for policy and practice? Here you do not have to choose one or other option. Although you cannot publish the same article in different journals, a research report sent to an academic journal and an article for teachers would be sufficiently different, with different emphases and weightings. For example, a professional journal does not usually want a critical literature review and discussion of details of data analysis, or long lists of references. But it may value vignettes that teachers can directly relate to, as well as exemplification of how recommendation might be followed through – information that would not fit in the research report.

Ideally, the research report would be completed and published first, and the article for the professional audience would refer to (and cite) this, so that anyone who does want to know more about the theoretical background and technical details can follow up.

Some examples of periodicals aimed at teachers (and welcoming work written by classroom teachers) include the School Science Review, (published by the Association for Science Education), Physics Education (published by the Institute of Physics) and the Royal Society of Chemistry's magazine Education in Chemistry. Globally, there are many publications of this kind, often with a national focus serving teachers working in a particular curriculum context by offering articles directly relevant to the specifics of the local education contexts.

The top science education research journals

Having established our work does fit in science education as a field, and would be considered academic research, we might consider sending it to one of these journals

  • International Journal of Science Education (IJSE)
  • Science Education (SE)
  • Journal of Research in Science Teaching (JRST)
  • Research in Science Education (RiSE)


To my mind these are the top general research journals in the field.

IJSE is the journal I have most worked with, having published quite a few papers in the journal, and having reviewed a great many. I have been on the Editorial Board for about 20 years, so I may be biased here.2 IJSE started as the European Journal of Science Education and has long had an association with the European Science Education Research Association (ESERA – not to be confused with ASERA).

Strictly this journal is now known as IJSE Part A, as there is also a Part B which has a particular focus on 'Communication and Public Engagement' (see below). IJSE is published by Taylor and Francis / Routledge.

SE is published by Wiley.

JRST is also published by Wiley, and is associated with NARST.

RISE is published by Springer, and is associated with the Australasian Science Education Research Association (ASERA – not to be confused with ESERA)

N.A.R.S.T. originally stood for the National Association for Research in Science Teaching, where the nation referred to was the USA. However, having re-branded itself as "a global organization for improving science teaching and learning through research" it is now simply known as NARST. In a similar way ESERA describes itself as "an European organisation focusing on research in science education with worldwide membership" and ASERA claims it "draws together researchers in science education from Australia, New Zealand and more broadly".


The top science education reviews journal

Another 'global' journal I hold in high esteem is Studies in Science Education (published by Taylor & Francis / Routledge) 3 .

This journal, originally established at the University of Leeds and associated with the world famous Centre for Studies in Science Education 4, is the main reviews journal in science education. It publishes substantive, critical reviews of areas of science education, and some of the most influential articles in the field have been published here.

Studies in Science Education also has a tradition of publishing detailed scholarly book reviews.


In my view, getting your work published in any of these five journals is something to be proud of. I think people in many parts of the world tend to know IJSE best, but I believe that in the USA it is often considered to be less prestigious than JRST and SE. At one time RISE seemed to have a somewhat parochial focus, and (my impression is) attracted less work from outside Australasia and its region – but that has changed now. 'Studies' seems to be better known in some contexts than others, but it is the only high-status general science education journal that publishes full-length reviews (both systematic, and thematic perspectives), with many of its contributions exceeding the normal word-length limits of other top science education journals. This is the place to send an article based on that literature review chapter that thesis examiners praised for its originality and insight!



There are other well-established general journals of merit, for example Research in Science and Technological Education (published by Taylor & Francis / Routledge, and originally based at the University of Hull) and the International Journal of Science and Maths Education (published by Springer, and founded by the National Science and Technology Council, Taiwan). The International Council of Associations for Science Education publishes Science Education International.

There are also journals with particular foci within the field of science education.

More specialist titles

There are also a number of well-regarded international research journals in science education which have particular specialisms or flavours.


Science & Education (published by Springer) is associated with the International History, Philosophy, and Science Teaching Group 5. As the name might suggest, the journal has a particular concern with the nature of science, and "publishes research using historical, philosophical, and sociological approaches in order to improve teaching, learning, and curricula in science and mathematics".


The Journal of Science Teacher Education (published by Taylor & Francis / Routledge), as the name suggests, is concerned with the preparation and development of science teachers. The journal is associated with the USA-based Association for Science Teacher Education.


As suggested above, IJSE has a companion journal (also published by Taylor & Francis / Routledge), International Journal of Science Education, Part B – Communication and Public Engagement.


Cultural Studies of Science Education (published by Springer) has a particular focus on  science education "as a cultural, cross-age, cross-class, and cross-disciplinary phenomenon".


The Journal of Science Education and Technology (published by Springer) has a focus on the intersection between science education and technology.


Disciplinary and Interdisciplinary Science Education Research has a particular focus on science taught within and across disciplines. 6 Whereas most of the journals described here are now hybrid (which means articles will usually be behind a subscription/pay-wall, unless the author pays a publication fee), DISER is an open-access journal, with publication costs paid on behalf of authors by the sponsoring organisation: the Faculty of Education, Beijing Normal University.

This relatively new journal reflects the increasing awareness of the importance of cross-disciplinary, interdisciplinary and transdisciplinary research in science itself. This is also reflected in notions of whether (or to what extent) science education should be considered part of a broader STEM education, and there are now journals styled as STEM education journals.


Science as part of STEM?

Read about STEM in the curriculum


Research within teaching and learning disciplines

Whilst both the Institute of Physics and the American Institute of Physics publish physics education journals (Physics Education and The Physics Teacher, respectively) neither publishes full length research reports of the kind included in research journals. The American Physical Society does publish Physical Review Physics Education Research as part of its set of Physical Review Journals. This is an on-line journal that is Open Access, so authors have to pay a publication fee.


The Journal of Biological Education (published by Taylor and Francis/Routledge) is the education journal of the Royal Society of Biology.


The Journal of Chemical Education is a long-established journal published by the American Chemical Society. It is not purely a research journal, but it does have a section for educational research and has published many important articles in the field. 7


Chemistry Education Research and Practice (published by the Royal Society of Chemistry, RSC) is purely a research journal, and can be considered the top international journal for research specifically in chemistry education. (Perhaps this is why there is a predatory journal deliberately given the confusingly similar name Journal of Chemistry Education Research and Practice.)

As CERP is sponsored by the RSC (which as a charity looks to use income to support educational and other valuable work), all articles in CERP are accessible for free on-line, and there are no publication charges for authors.


Not an exhaustive list!

These are the journals I am most familiar with, which focus on science education (or a science discipline education), publish serious peer-reviewed research papers, and can be considered international journals.

I know there are other discipline-based journals (e.g., biochemistry education, geology education) and indeed I expect there are many worthwhile places to publish that have slipped my mind or about which I am ignorant. Many regional or national journals have high standards and publish much good work. However, when it comes to research papers (rather than articles aimed primarily at teachers) academics usually get more credit when they publish in higher status international journals. It is these outlets that can best attract highly qualified editors and reviewers, so peer review feedback tends to be most helpful 8, and the general standard of published work tends to be of a decent quality – both in terms of technical aspects, and its significance and originality.

There is no reason why work published in English is more important than work published in other languages, but the wide convention of publishing research for an international audience in English means that work published in English language journals probably gets wider attention globally. I have published a small number of pieces in other languages, but I am limited by my own competence in only one language. This reflects my personal failings more than the global state of science education publishing!

A personal take – other viewpoints are welcome

So, this is my personal (belated) response to the question about where one should seek to publish research in science education. I have tried to give a fair account, but it is no doubt biased by my own experiences (and recollections), and so inadvertently subject to distortions and omissions.

I welcome any comments (below) to expand upon, or seek to correct, my suggested list, which might indeed make this a more useful listing for readers who are new to publishing their work. If you have had good (or bad) experiences with science education journals included in, or omitted from, my list, please share…


Sources cited:

Notes

1 Academic authorship is understood differently to how the term 'author' is usually used: in most contexts, the author is the person who prepared (wrote, typed, dictated) a text. In academic research, the authors of the research paper are those who made a substantial direct intellectual contribution to the work being reported. That is, an author need not contribute to the writing-up phase (though all authors should approve the text) as long as they have made a proper contribution to the substance of the work. Most journals have clear expectations that all deserving authors, and only those people, should be named as authors.

Read about academic authorship


2 For many years the journal was edited by the late Prof. John Gilbert, whom I first met sometime in the 1984-5 academic year when I applied to join the University of Surrey/Roehampton Institute part-time teachers' programme in the Practice of Science Education, and he – as one of the course directors – interviewed me. I was later privileged to work with John on some projects – so this might be considered as a 'declaration of interest'.


3 Again, I must declare an interest. For some years I acted as the Book Reviews editor for the journal.


4 The centre was the base for the highly influential Children's Learning in Science Project, which undertook much research and publication in the field under the direction of the late Prof. Ros Driver.


5 Another declaration of interest: at the time of writing I am on the IHPST Advisory Board for the journal.


6 Declaration of interest: I am a member of DISER's Editorial Board.


7 I have recently shown some surprise at one research article published in JChemEd where major problems seem to have been missed in peer review. This is perhaps simply an aberration, or may reflect the challenge of including peer-reviewed academic research in a hybrid publication that also publishes a range of other kinds of articles.


8 Peer-review evaluates the quality of submissions, in part to inform publication decisions, but also to provide feedback to authors on areas where they can improve a manuscript prior to publication.

Read about peer review


Download this post


The first annual International Survey of Gullible Research Centres and Institutes

When is a 'survey' not really a survey? Perhaps, when it is a marketing tool.


Keith S. Taber


A research survey seeks information about a population by collecting data from a sample.
Acaudio's 'survey' seems to seek information about whether particular respondents might be persuaded to buy their services.

Today I received an invitation to contribute to something entitled "the first annual International Survey of Research Centres and Institutes". Despite this impressive title, I decided not to do so.

This was not because I had any doubts about whether it really was 'the first…' (has there never previously been an annual International Survey of Research Centres and Institutes?) Nor was it because I had been invited to represent 'The Science and Technology Education Research Group' which I used to lead – but not since retiring from my Faculty duties.

My main reason for not participating was because I suspected this was a scam. I imagined this might be marketing apparently masquerading as academic research. I include the provisos 'suspected' and 'apparently' as I was not quite sure whether this was actually a poor attempt to mislead participants or just a misjudged attempt at witty marketing. That is, I was not entirely sure if recipients of the invitation were supposed to think this was a serious academic survey.



There is a carpet company that claims that no one knows more about floors than … insert here any of a number of their individual employees. Their claims – taken together – are almost logically impossible, and certainly incredible. I am sure most people let this wash over them – but I actually find it disconcerting that I am not sure if the company is (i) having a logical joke I am supposed to enjoy ('obviously you are not meant to believe claims in adverts, so how about this…'), or (ii) simply lying to me, assuming that I will be too stupid to spot the logical incoherence.

Read 'Floored or flawed knowledge?: A domain with a low ceiling'

Why is this not serious academic research?

My first clue that this 'survey' was not a serious attempt at research was that the invitation was from an email address of 'playlist.manager@acaudio.com', rather than from an academic institute or a learned society. Of course, commercial organisations can do serious academic research, though usually when they are hired to do so on behalf of a named academically-focussed organisation. The invitation made no mention of any reputable academic sponsor.

I clicked on the link to the survey to check for the indicators one finds in quality research. Academic research is subject to ethical norms, such as seeking voluntary informed consent, and any invitation to engage in bona fide academic research will provide information to participants up front (either on the front page of the survey or via a link that can be accessed before starting to respond to any questions). One would expect to be informed, at a minimum:

  • who is carrying out the research (and who for, if it is commissioned) and for what purpose;
  • how data will be used – for example, usually it is expected that any information provided will be treated as confidential, securely stored, and only used in ways that protect the anonymity of participants.

This was missing. Commercial organisations sometimes see information you provide differently, as being a resource that they can potentially sell on. (Thus the recent legislation regulating what can or cannot be done with personal information that is collected by organisations.)

Hopefully, potential participants will be informed about the population being sampled and something of the methodology being applied. In an ideal world an International Survey of Research Centres and Institutes would identify and seek data from all Research Centres and Institutes, internationally. That would be an immense undertaking – and is clearly not viable. Consider:

  • How many 'research centres' are initiated, and how many close down or fade away, internationally, each year?
  • Do they all even have websites? (If not, how are they to be identified?)
  • If so, spread over how many languages?

Even attempting a meaningful annual survey of all such organisations would require a substantial, well-resourced research team working full-time on the task. Rather, a viable survey would collect data from a sample of all research centres and research institutes, internationally. So, some indication of how a sample has been formed, or how potential participants were identified, might be expected.

Read about sampling a population of interest

One of the major limitations of many surveys of large populations is that even if a decent sample size is achieved, such surveys are unlikely to reach a representative sample, or even provide any useful indicators of whether the sample might be representative. For example, information provided by 'a sample of 80 science teachers' tells us next to nothing about 'science teachers' in general if we have no idea how representative that sample is.

It can be a different matter when surveys are undertaken of small, well-defined, populations. A researcher looking to survey the students in one school, for example (perhaps for a consultation about a mooted change in school dress policy), is likely to be in a position to make sure all in the population have the opportunity to respond, and perhaps encourage a decent response rate. They may even be able to see if, for example, respondents reflect the wider population in some important ways (for example, if one got responses from 400/1000 students, one would usually be reasonably pleased, but less so if hardly any of the responses were in, say, the two youngest year groups).

In such a situation there is likely to be a definitive list of members of the population, and a viable mechanism to reach them all. In more general surveys, this is seldom the case. Elections (which can be considered akin to surveys) might be seen as a particular type of exception. The electoral register potentially lists all those enfranchised to vote, and includes a postal address where each voter can be informed of a forthcoming poll. In this situation, there is a considerable administrative cost of maintaining the register – considered worth paying to support the democratic process – and a legal requirement to register: yet, even here, no one imagines the roll is ever complete and entirely up-to-date.
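To make the school example above concrete, here is a minimal illustrative sketch (in Python, using entirely invented year-group labels and counts – not data from any real survey) of the kind of check a researcher might run to see whether respondents broadly reflect the population; the 'under-represented' threshold is an arbitrary assumption chosen purely for illustration.

```python
# Illustrative only: hypothetical counts for a school of 1000 students,
# of whom 400 returned a questionnaire about a mooted dress-policy change.
population = {"Year 7": 200, "Year 8": 200, "Year 9": 200, "Year 10": 200, "Year 11": 200}
respondents = {"Year 7": 15, "Year 8": 25, "Year 9": 110, "Year 10": 120, "Year 11": 130}

pop_total = sum(population.values())
resp_total = sum(respondents.values())

print(f"Overall response rate: {resp_total / pop_total:.0%}\n")
print(f"{'Group':<8}{'% of population':>18}{'% of respondents':>19}")
for group in population:
    pop_share = population[group] / pop_total
    resp_share = respondents[group] / resp_total
    # Flag groups whose share of respondents is under half their share of
    # the population (an arbitrary threshold, just to make the point visible).
    flag = "  <-- under-represented" if resp_share < 0.5 * pop_share else ""
    print(f"{group:<8}{pop_share:>18.0%}{resp_share:>19.0%}{flag}")
```

A sample recruited like this might still be useful, but any 'general' claims about the school's students would need to be heavily qualified – which is exactly the problem, on a much larger scale, for a self-selecting 'international survey'.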

  • How many of the extant Research Centres and Research Institutes, internationally, had been invited to participate in this survey?
  • And did these invitations reflect the diversity of Research Centres and Institutes, internationally?
    • By geographical location?
    • By discipline?

No such information was provided.

The time-scale for an International Survey of Research Centres and Institutes

To be fair, the invitation email did suggest the 'researchers' would share outcomes with the participants:

"We will release the results over the next month".

But that time-scale actually seemed to undermine the possibility that this initiative was meant as a serious survey. Anyone who has ever undertaken any serious research knows: it takes time.

When planning the stages of a research project, you should keep in mind that everything will likely take longer than you expect…

even when you allow for that.

Not entirely frivolous advice given to research students

Often with surveys, the initial response is weak (filling in other people's questionnaires is seldom anyone's top priority), and it becomes necessary to undertake additional rounds of eliciting participation. It is good practice to promise to provide feedback; but to offer to do this within a month seems, well, foolhardy.

Except, of course, Acaudio are not a research organisation, and the purpose of the 'survey' was, I suggest, not academic research. As becomes clear from the questions asked, this is marketing 'research': a questionnaire to support Acaudio's own marketing.

What does this company do?

Acaudio offer a platform for allowing researchers to upload short audio summaries of their research. Researchers can do this for free. The platform is open-access, allowing anyone to listen. The library is collated with play-lists and search functions. The company provides researchers with data on access to their recordings.

This sounds useful, and indeed 'too good to be true' as there are no charges for the service. Clearly, of itself, that would be a lousy business model.

The website explains:

"We also collaborate with publishers and companies. While our services are licensed to these organizations, generating revenue, this approach is slightly different from our collaboration with you as researchers. However, it enables us to maintain the platform as fully open access for our valued users."

https://acaudio.com/faq

So, having established the website, and built up a library of recordings hosted for free (the 'loss leader' as they say 1), the company is now generating income by entering into commercial arrangements with organisations. Another page on their website claims the company has 'signed' 1000 journals and 2000 research centers [sic]. So, alongside the free service, the company is preparing content on behalf of clients to publicise, in effect advertise, their research for them. Nothing terrible there, although one would hope that the research that has the most impact gets that impact on merit, not because some journals and research centres can pay to bring more attention to their work. This business seems similar to those magazines that offer to feature your research in a special glossy article – for a price.

Read 'Research features…but only if you can afford it'

One would like to think that publicly funded researchers, at least, spend the public's money on the actual research, not on playing the impact indicators game by commissioning glossy articles in magazines which would not be any serious scholar's preferred source of information on research. Sadly, since the advent of the Research Assessment Exercise (and its evolution into the 'Research Excellence Framework') vast amounts of useful resource have been spent both on rating research and on playing the games needed to get the best ratings (and so the consequent research income). As is usually the case with anything of this kind (one could even include formal school examinations!), even if the original notion is well-intentioned,

  • the measurement process comes to distort what it is measuring;
  • those seen as competing spend increasing resources in trying to outdo each other in terms of the specifics of the assessment indicators/criteria.

So, as research impact is now considered measurable, and as it is (supposedly) measured, and contributes to university income, there is a temptation to spend money on things that might increase impact. It becomes less important whether a study has the potential to increase human health and happiness; and more important to get it the kind of public/'end user' attention that might ultimately lead to evidence of 'impact' – as this will increase income, and allow the research to continue (and, who knows, perhaps eventually even increase human health and happiness).

What do Acaudio want to know?

Given that background, the content of the survey questionnaire makes perfect sense. After collecting some information on your research centre, there are various questions such as

  • How satisfied are you with the level of awareness people have of your centre / institute?
  • How important is it that the general public are aware of the work your centre / institute does?

I suspect most heads of research centres think it is important people know of their work, and are not entirely satisfied that enough people do. (I suspect academic researchers generally tend to think that their own research is actually (i) more important than most other people realise and (ii) deserving of more attention than it gets. That's human nature, surely? Any self-effacing and modest scholars are going to have to learn to sell themselves better, or, if not, they are perhaps unlikely to be made centre/institute heads.)

There are questions about how much time is spent promoting the research centre, and whether this is enough (clearly, one would always want to do more, surely?), and the challenges of doing this, and who is responsible (I suspect most heads of centres feel some such responsibility, without considering it is how they most want to spend their limited time for research and scholarship).

Perhaps the core questions are:

  • Do you agree it is important to have a dedicated person to take care of promotional activities?
  • How much would you consider to be a reasonable amount to spend on promotional activities?

These questions will presumably help Acaudio decide whether you can easily be persuaded to sign up for their help, and what kind of budget you might have for this. (The responses for the latter include an option for spending more than $5000 each year on promotional activities!)

I am guessing that, even at $5000+ p.a., they would not actually provide a person dedicated to 'tak[ing] care of promotional activities' for you, but rather a person dedicated to adding your promotional activities to their existing portfolio of assigned clients!

So, this is a marketing questionnaire.

Is this dishonest?

It seems misleading to call a marketing questionnaire 'the first annual International Survey of Research Centres and Institutes' unless Acaudio are making a serious attempt to undertake a representative survey of Research Centres and Institutes, internationally, and they do intend to publish a full analysis of the findings. "We will release the results over the next month" sounds like a promise to publish, so I will look out with interest for an announcement that the results have indeed been made available.

Lies, delusional lies, and ill-judged attempts at humour

Of course, lying is not simply telling untruths. A person who claims to be Napoleon or Joan of Arc is not lying if that person actually believes that is who they are. Someone who claims they are the best person to run your country is not necessarily lying simply because the claim is false. If the Acaudio people genuinely think they are really doing an International Survey of Research Centres and Institutes then their invitation is not dishonest even if it might betray any claim to know much about academic research.


"I'm [an actor playing] Spartacus";"I'm [an actor playing another character who is not Spartacus, but is pretending to be] Spartacus"; "I'm [another actor playing another character who is also not Spartacus, but is also pretending to be] Spartacus"… [Still from Universal Pictures Home Entertainment movie 'Spartacus']


Nor is it lying, when there is no intent to deceive. Something said sarcastically or as a joke, or in the context of a theatrical performance, is not a lie as long as it is expected that the audience share the conceit and do not confuse it for an authentic knowledge claim. Kirk Douglas, Tony Curtis, and their fellow actors playing rebellious Roman slaves, all knew they were not Spartacus, and that anyone in a cinema watching their claims to be the said Spartacus would recognise these were actors playing parts in a film – and that indeed in the particular context of a whole group of people all claiming to be Spartacus, the aim even in the fiction was actually NOT to identify Spartacus, but to confuse the whole issue (even if being crucified as someone who was only possibly Spartacus might be seen as a Pyrrhic victory 2).

So, given that the claim to be undertaking the first annual International Survey of Research Centres and Institutes was surely, and fairly obviously, an attempt to identify research centres that (a) might be persuaded to purchase Acaudio's services and (b) had budget to pay for those services, I am not really sure this was an attempt to deceive. Perhaps it was a kind of joke, intended to pull in participants, rather than a serious attempt to fool them.

That said, any organisation hoping for credibility among the academic community surely needs to be careful about its reputation. Sending out scam emails that claim to be seeking participants for a research survey that is really a marketing questionnaire seems pretty dubious practice, even if there was no serious attempt to follow through by disguising the questionnaire as a serious piece of research. You might initially approach the questionnaire thinking it was genuine research, but as you worked through it, it SHOULD have dawned on you that this information was being collected because (i) it is of commercial value to Acaudio, and not (ii) to answer any theoretically motivated research questions.

  • So, is this dishonest? Well, it is not what it claims to be.
  • Does this intend to deceive? If it did, then it was not well designed to hide its true purpose.
  • Is it malpractice? Well, there are rules in the U.K. about marketing emails:

"You're only allowed to send marketing emails to individual customers if they've given you permission.

Emails or text messages must clearly indicate:

  • who you are
  • that you're selling something

Every marketing email you send must give the person the ability to opt out of (or 'unsubscribe from') further emails."

https://www.gov.uk/marketing-advertising-law/direct-marketing

The email from Hussain Ayed, Founder, Acaudio, told me who he, and his organisation, are, but

  • did not clearly suggest he was selling something: he was inviting me to contribute to a research survey (illegal?)
  • Nor was there any option to opt out of further messages (illegal?)
  • And I am not aware of having invited approaches from this company – which might be why it was masquerading as a request to contribute to research (illegal?)

I checked my email system to see if I'd had any previous communication with this company, and found in my junk folder a previous approach, "invit[ing Keith, again] to talk about some of the research being done at The Science and Technology Education Research Group on Acaudio…". It seems my email software can recognise cold calling – as long as it does not claim to be an invitation to respond to a research study.



The earlier email claimed it was advertising the free service…but then invited me to arrange a time to talk to them for 'roughly' 20 minutes. That seems odd, both because the website seems to provide all the information needed; and then why would they commit 20 minutes of their representative's time to talk about a free service? Presumably, they wanted to sell me their premium service. The email footer also gave a business address in E9, London – so the company should know about the UK laws about direct marketing that Acaudio seems to be flouting.

Perhaps not enough people responded to give them 20 minutes of their time, so the new approach skips all that and asks instead for people to "give us 2-3 minutes of your time to fill in the survey [sic 3]".


Original image by Mohamed Hassan from Pixabay


Would you buy a second hand account of research from this man?

In summary, if someone is looking to buy in this kind of support in publicising their work, and has the budget(!), and feels it is acceptable to spend research funds on such services, then perhaps they might fill in the questionnaire and await the response. But I am not sure I would want to get involved with companies which use marketing scams in this way. After all, if they cannot even start a conversation by staying within the law, and being honest about their intentions, then that does not bode well for being able to trust them going forward into a commercial arrangement.


Update (15th October, 2023): Were the outcomes of the first annual International Survey of Research Centres and Institutes published? See 'The sugger strikes back! An update on the 'first annual International Survey of Research Centres and Institutes'


Notes

1 When a shop offers a product at a much discounted price, below the price needed to 'break even', so as to entice people into the shop where they will hopefully buy other goods (at a decent mark-up for the seller), the goods sold at a loss are the 'loss leaders'.

Goods may also be sold at a loss when they are selling very slowly, to make space on the shop floor and in the storeroom for new stock that it is hoped will generate profit. Date-sensitive goods may be sold at a loss because they will soon not be saleable at all (such as perishables) or only at even greater discounts (such as models about to be replaced by manufacturers' updated versions – e.g., iPhones). But loss leader goods are priced low to get people to view other stock (so they might be displayed prominently in the window, but only found deep in the shop).


2 In their wars against the armies of King Pyrrhus of Epirus, the Romans lost battles, but in doing so inflicted such heavy and unsustainable losses on the nominally victorious invading army that Pyrrhus was forced to abandon his campaign.

At the end of the slave revolt (a historical event on which the film 'Spartacus' is based) the Romans are supposed to have decided to execute the rebel leader, the escaped gladiator Spartacus, and return the other rebels to slavery. Supposedly, when the Roman official tried to identify Spartacus, each of the recaptured slaves in turn claimed he was Spartacus, thus thwarting identification. So, the ever pragmatic Romans crucified them all.


3 The set of questions is actually a questionnaire, which is used to collect data for the survey. A survey (a type of methodology) does not necessarily imply using a questionnaire (a data collection technique), as a survey could instead be carried out using, for example, an observation schedule (a different data collection technique).

Read about surveys

Read about questionnaires


Are physics teachers unaware of the applications of physics to other sciences?

Confounding conceptual integration


Keith S. Taber


Tuysuz and colleagues seem to have found chemistry and physics teachers have a different attitude to the importance of integrating concepts from across the subjects.


Conceptual integration?

Conceptual integration is very important in science. That is, science doesn't consist of a large set of unrelated facts; rather, the ability to subsume a great many phenomena under a limited number of ideas is valued. James Clerk Maxwell is widely remembered for showing that electricity, magnetism and radiation such as light (that is, what we now call electromagnetic radiation) were intimately related, and today theoretical physicists seek a 'Grand Unified Theory' that would account for all the forces in nature. Equally, the apparent incompatibility of the two major scientific ideas of the early twentieth century – general relativity and quantum mechanics – is widely recognised as suggesting a fundamental problem in our current best understanding of the world.

So, conceptual integration can be seen as a scientific value: something scientists expect to find in nature 1 and something they seek through their research.

Learners may not appreciate this. When I was teaching physics and chemistry I was quite surprised to see how little some students who studied both subjects would notice, or indeed expect, ideas taught in one course to link to those in another (e.g., Taber, 1998).

A demarcation criterion?

I have even, only partially tongue-in-cheek, suggested that a criterion for identifying an authentic science education would be that it emphasises the connections within science, both within and across disciplines (Taber, 2006). 2

Sadly, there has been limited attention to this theme within science education, and very little research. I was therefore pleased to find a reference to a Turkish study on the topic. 3

A study with teachers-in-preparation

Tuysuz, Bektas, Geban, Ozturk and Yalvac (2016) undertook an interview study with students preparing for school science teaching. One of their findings was:

"Generally speaking, while the pre-service chemistry teachers think that physics concepts should be used in the chemistry lessons, the pre-service physics teachers believe that these two subjects' concepts generally are not related to each other."

Tuysuz, Bektas, Geban, Ozturk & Yalvac, 2016

Reading this in isolation might seem to suggest that those preparing for chemistry teaching (and therefore, likely, chemistry teachers) saw more value in emphasising conceptual integration in teaching than those preparing for physics teaching (and therefore, likely, physics teachers).

Why might physics teachers give less value to conceptual integration?

It is easy to try to think of possible reasons for this:

  • Conjecture 1: chemistry teachers are aware of how chemistry draws upon physical concepts, and so are more minded to emphasise links between the subjects than physics teachers. 4
  • Conjecture 2: physicists, and so physics teachers, are more arrogant about their discipline than other scientists (cf. "All science is either physics or stamp collecting" – as Ernest Rutherford supposedly claimed!)
  • Conjecture 3: chemists are more likely to have also studied other science disciplines at a high level (and so are well placed to appreciate conceptual integration across sciences), whereas physics specialists are more likely to have mainly focussed on mathematics as a subsidiary subject rather than other sciences.

I imagine other possibilities will have occurred to readers, but before spending too much time on explaining Tuysuz and colleagues' findings, it is worth considering how they came to this conclusion.

Not an experiment

Tuysuz and colleagues do not claim to have undertaken an experimental study, but rather claim their work is phenomenology. It did not use a large, randomly selected (and, so, likely to be representative) sample of populations of pre-service science teachers (as would be needed for an experiment), but rather used a convenience sample of six students who were accessible and willing to help: three pre-service physics teachers and three pre-service chemistry teachers.

Read about sampling populations in research

It is not unusual for educational studies to be based on very small samples, as this allows for in-depth work. If you want to know what a person really thinks about a topic, you need to establish rapport and trust with them, and encourage them to talk in some detail – not just offer a rating to some item on a questionnaire. Small samples are perfectly proper in such studies.

What is questionable, is whether it is really meaningful to tease out differences between two identified groups (e.g., pre-service chemistry teachers; pre-service physics teachers) based on such samples. We cannot generalise without representative samples, so, when Tuysuz, Bektas, Geban, Ozturk and Yalvac write "Generally speaking…", their study does not really support such generalisation. The authors are only reporting what they found in their particular sample, and so the reader needs to contextualise their claim in terms of further details of the study, i.e., the reader needs to read the claim as

"Generally speaking, while the three pre-service chemistry teachers who volunteered to talk to us from this one teacher preparation programme think that physics concepts should be used in the chemistry lessons, the three pre- service physics teachers who volunteered to talk to us from this one programme believe that these two subjects' concepts generally are not related to each other."

Put in those terms, this is a very localised and limited kind of 'generally'.

This does not undermine the potential value of the study. That any future school science teachers might think that "these two subjects' concepts generally are not related to each other" is a worrying finding.

A confounded design

Another reason why it is important not to read Tuysuz's study as suggesting a general difference between teacher candidates in physics and chemistry is because of a major confound in the study design. If the research had been intended as an experiment, where the investigators have to control variables so that there is only one difference between the different conditions, this would have been a critical flaw in the design.

The pre-service physics teachers and the pre-service chemistry teachers were taking parallel, but distinct, courses during the study. The authors report that the teaching approaches were different in the two subject areas. In particular, the paper reports that in the case of the pre-service chemistry teachers conceptual integration was explicitly discussed. The chemists – but not the physicists – were taught that conceptual integration was important. When interviewed, the chemists (who had been taught about conceptual integration) suggested conceptual integration was more important than the physicists (who had not been taught about conceptual integration) did!

  • This might have been because of their different subject specialisms;
  • It might have been because of the differences in the practice teaching courses taken by the two groups, such as perhaps the specific engagement of the chemists (but not the physicists) with ideas about conceptual integration during their course;
  • It might have been due to an interaction between these two factors (that is, perhaps neither difference by itself would have led to this finding);
  • And it might have simply reflected the ideas and past experiences of the particular three students in the chemists group, and the particular three students in the physicists group.

Tuysuz and colleagues found that, 'generally speaking', three students (who were chemistry specialists and had been taught about conceptual integration) had a different attitude towards the importance of conceptual integration in teaching science than did three other students (who were physics specialists and had not been taught about conceptual integration).

Read about confounding variables in research

The researchers might have just as readily reported that:

"Generally speaking, while the pre-service science teachers who had discussed conceptual integration in their course think that physics concepts should be used in the chemistry lessons, the pre-service science teachers who had not been taught about this believe that these two subjects' concepts generally are not related to each other."

Of course, such a conclusion would be equally misleading as both factors (subject specialism and presence/absence of explicit teaching input) vary simultaneously between the two groups of students, so it is inappropriate to suggest a general difference due to either factor in isolation.
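To see why the two candidate explanations cannot be teased apart, here is a minimal sketch (in Python, with invented ratings for six hypothetical interviewees, arranged only to mirror the confounded structure of the design – these are not the study's data): because specialism and the teaching input vary together perfectly, grouping by either factor partitions the sample identically and so yields exactly the same 'difference'.

```python
# Illustrative only: six hypothetical interviewees with invented ratings (1-5)
# of how important they consider conceptual integration to be in teaching.
# Specialism and the teaching input are perfectly confounded: every chemist
# was taught about conceptual integration, and no physicist was.
participants = [
    {"specialism": "chemistry", "taught_integration": True,  "rating": 5},
    {"specialism": "chemistry", "taught_integration": True,  "rating": 4},
    {"specialism": "chemistry", "taught_integration": True,  "rating": 5},
    {"specialism": "physics",   "taught_integration": False, "rating": 2},
    {"specialism": "physics",   "taught_integration": False, "rating": 3},
    {"specialism": "physics",   "taught_integration": False, "rating": 2},
]

def mean_rating_by(factor):
    """Mean rating for each level of the chosen factor."""
    groups = {}
    for p in participants:
        groups.setdefault(p[factor], []).append(p["rating"])
    return {level: round(sum(ratings) / len(ratings), 2) for level, ratings in groups.items()}

# Both groupings split the six people into exactly the same two sets of three,
# so either factor 'explains' the difference equally well.
print(mean_rating_by("specialism"))          # {'chemistry': 4.67, 'physics': 2.33}
print(mean_rating_by("taught_integration"))  # {True: 4.67, False: 2.33}
```

Only a design in which the two factors varied independently (for example, some physicists taught about conceptual integration and some chemists not) could begin to separate their effects.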


Work cited:

Notes

1 Although science is meant to be based on objective observations of the natural world, scientists approach their work with certain fundamental assumptions about nature. These might include beliefs that

  • an objective account of nature is in principle possible (that different observers can observe the same things), and
  • that there is at some level a consistent nature to the universe (there are fixed laws which continue to apply over time)

assumptions that are needed for science to be meaningful. As these things are assumed prior to undertaking any scientific observations they can be considered metaphysical commitments (Taber, 2013).

[Download 'Conceptual frameworks, metaphysical commitments and worldviews']

Another metaphysical commitment generally shared by scientists as a common worldview is that the complex and diverse phenomena we experience can be explained by a limited number of underlying principles and laws. From this perspective, progress in science leads to increased integration between topics.


2 The term 'demarcation criterion' is often used in relation to deciding what should be considered a science (e.g., usually, astronomy is considered a science, and so is biochemistry; but not astrology or psychoanalysis). A famous example of a demarcation criterion, due to Karl Popper, is that a scientific conjecture is one which is in principle capable of being refuted.

Astronomers can use their theories and data to predict the date of the next solar eclipse, for example. If the eclipse did not occur when predicted, that would be considered a falsification.

By contrast, if a psychotherapist suggested a person had personality issues due to repressed, unresolved, feelings about their parents, then this cannot be refuted. (The client may claim to have positive and untroubled relationships with the parents, but the therapist does not consider this a refutation as the feelings have been repressed, so they are not consciously available to the client. The problem can only be detected indirectly by signs which the therapist knows how to interpret.)


3 I became aware of the study discussed here when reading the work in progress of Louise Vong, who has been doing some research on this important topic.


4 Physics concepts are widely applied in chemistry, but not vice versa. So, this is suggesting that chemistry teachers have more need to refer to physics in teaching their subject than the converse.

However, we could also have looked to explain the opposite finding (had it been reported that pre-service physics teachers paid more attention to conceptual integration than pre-service chemistry teachers) by suggesting physics teachers have more reason to refer to chemistry topics when discussing examples of applications of concepts being taught, than chemistry teachers have to refer to physics topics.


Is your heart in the research?

Someone else's research, that is


Keith S. Taber


Imagine you have a painful and debilitating illness. Your specialist tells you there is no conventional treatment known to help. However, there is a new – experimental – procedure: a surgery that may offer relief. But it has not yet been fully tested. If you are prepared to sign up for a study to evaluate this new procedure, then you can undergo surgery.

You are put under and wheeled into the operating theatre. Whilst you experience – rather, do not experience – the deep, sleepless rest of anaesthesia, the surgeon saws through your breastbone, prises open your ribcage with a retractor (hopefully avoiding breaking any ribs), reaches in, and gently lifts up your heart.

The surgeon pauses, perhaps counts to five, then carefully replaces your heart between the lungs. The ribcage is closed, and you are sewn up without any actual medical intervention. You had been randomly assigned to the control group.


How can we test whether surgical interventions are really effective without blind controls?

Is it right to carry out sham operations on sick people just for the sake of research?

Where is the balance of interests?

(Image from Pixabay)


Research ethics

A key aspect of planning, executing and reviewing research is ethical scrutiny. Planning, obviously, needs to take into account ethical considerations and guidelines. But even the best laid plans 'of mice and men' (or, of, say, people investigating mice) may not allow for all eventualities (after all, if we knew what was going to happen for sure in a study, it would not be research – and it would be unethical to spend precious public resources on the study), so the ethical imperative does not stop once we have got approval and permissions. And even then, we may find that we cannot fully mitigate unexpected eventualities – which is something to be reported and discussed to help inform future research.

Read about research ethics

When preparing students setting out on research, instruction about research ethics is vital. It is possible to teach about rules, and policies, and guidelines and procedures – but real research contexts are often complex, and ethical thinking cannot be algorithmic or a matter of adopting slogans and following heuristics. In my teaching I would include discussion of past cases of research studies that raised ethical questions for students to discuss and consider.

One might think that as research ethics is so important, it would be difficult to find many published studies which were not exemplars of good practice – but attitudes to, and guidance on, ethics have developed over time, and there are many past studies which, if not clearly unethical in today's terms, at least present problematic cases. (That is without the 'doublethink' that allows some contemporary researchers to, in a single paper, both claim active learning methods should be studied because it is known that passive learning activities are not effective, yet then report how they required teachers to instruct classes through passive learning to act as control groups.)

Indeed, ethical decision-making may not always be straight-forward – as it often means balancing different considerations, and at a point where any hoped-for potential benefits of the research must remain uncertain.

Pretending to operate on ill patients

I recently came across an example of a medical study which I thought raised some serious questions, and which I might well have included in my teaching of research ethics as a case for discussion, had I known about it before I retired.

The research apparently involved surgeons opening up a patient's ribcage (not a trivial procedure), and lifting out the person's heart in order to carry out a surgical intervention…or not,

"In the late 1950s and early 60s two different surgical teams, one in Kansas City and one in Seattle, did double-blind trials of a ligation procedure – the closing of a duct or tube using a clip – for very ill patients suffering from severe angina, a condition in which pain radiates from the chest to the outer extremities as a result of poor blood supply to the heart. The surgeons were not told until they arrived in the operating theatre which patients were to receive a real ligation and which were not. All the patients, whether or not they were getting the procedure, had their chest cracked open and their heart lifted out. But only half the patients actually had their arteries rerouted so that their blood could more efficiently bathe its pump …"

Slater, 2018

The quote is taken from a book by Lauren Slater which sets out a history of drug use in psychiatry. Slater is a psychotherapist who has written a number of books about aspects of mental health conditions and treatments.

Fair testing

In order to make a fair experiment, the double-blind procedure sought to treat the treatment and control groups the same in all respects, apart from the actual procedure of ligation of selected blood vessels that comprised the mooted intervention. The patients did not know (at least, in one of the studies) they might not have the real operation. Their physicians were not told who was getting the treatment. Even the surgeons only found out who was in each group when the patient arrived in theatre.

It was necessary for those in the control group to think they were having an intervention, and to undergo the sham surgery, so that they formed a fair comparison with those who got the ligation.

Read about control of variables
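As a minimal sketch of the allocation logic (not the original teams' actual procedure, though the sealed-envelope arrangement quoted further below works this way in spirit), one can prepare a concealed allocation list in advance, so that nobody recruiting or caring for a patient can predict which arm a patient will join; the function name, arm labels and fixed seed here are purely illustrative assumptions.

```python
import random

def make_allocation_envelopes(n_patients, seed=None):
    """Prepare a concealed allocation list: roughly half 'ligation', half 'sham',
    shuffled so the sequence cannot be predicted at recruitment (illustrative only)."""
    rng = random.Random(seed)
    arms = ["ligation"] * (n_patients // 2) + ["sham"] * (n_patients - n_patients // 2)
    rng.shuffle(arms)
    # Each 'envelope' is opened only when the patient is already in theatre.
    return {f"patient_{i + 1:02d}": arm for i, arm in enumerate(arms)}

envelopes = make_allocation_envelopes(17, seed=1959)  # 17 patients, as in Cobb et al.
print(envelopes["patient_01"])  # the assigned arm is revealed only at the point of surgery
```

Concealing the sequence in this way is what stops anyone's expectations – the patient's, the physician's, or the surgeon's before theatre – from leaking into who ends up in which group.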

It was necessary to have double-blind study (neither the patients themselves, nor the physicians looking after them, were told which patients were, and which were not, getting the treatment), because there is a great deal of research which shows that people's beliefs and expectations make substantial differences to outcomes. This is a real problem in educational research when researchers want to test classroom practices such as new teaching schemes or resources or innovative pedagogies (Taber, 2019). The teacher almost certainly knows whether she is teaching the experimental or control group, and usually the students have a pretty good idea. (If every previous lesson has been based on teacher presentations and note-taking, and suddenly they are doing group discussion work and making videos, they are likely to notice.)

Read about expectancy effects

It was important to undertake a study, because there was not clear objective evidence to show whether the new procedure actually improved patient outcomes (or possibly even made matters worse). Doctors reported seeing treated patients do better – but could only guess how they might have done without surgery. Without proper studies, many thousands of people might ultimately undergo an ineffective surgery, with all the associated risks and costs, without getting any benefit.

Simply comparing treated patients with matched untreated patients would not do the job, as there can be a strong placebo effect of believing one is getting a treatment. (It is likely that at least some alternative therapies largely work because a practitioner with good social skills spends time engaging with the patient and their concerns, and the client expects a positive outcome.)

If any positive effects of heart surgery were due to the placebo effect, then perhaps a highly coloured sugar pill prescribed with confidence by a physician could have the same effect without operating theatres, surgical teams, hospital stays… (For that matter, a faith healer who pretended to operate without actually breaking the skin, and revealed a piece of material {perhaps concealed in a pocket or sleeve} presented as an extracted mass of diseased tissue or a foreign body, would be just as effective if the patient believed in the procedure.)

So, I understood the logic here.

Do no harm

All the same – this seemed an extreme intervention. Even today, anaesthesia is not very well understood in detail: it involves giving a patient drugs that could kill them in carefully controlled sub-lethal doses – when how much would actually be lethal (and what would be insufficient to fully sedate) varies from person to person. There are always risks involved.


"All the patients, whether or not they were getting the procedure had their chest cracked open and their heart lifted out."

(Image by Starllyte from Pixabay)


Open heart surgery exposes someone to infection risks. Cracking open the chest is a big deal. It can take two months for the disrupted tissues to heal. Did the research really require opening up the chest and lifting the heart for the control group?

Could this really ever have been considered ethical?

I might have been much more cynical had I not known of other, hm, questionable medical studies. I recall hearing a BBC radio documentary in the 1990s about American physicians who deliberately gave patients radioactive materials without their knowledge, just to explore the effects. Perhaps most infamously there was the Tuskegee Syphilis study, where United States medical authorities followed the development of disease over decades without revealing the full nature of the study, or trying to treat any of those infected. Compared with these violations, the angina surgery research seemed tame.

But do not believe everything you read…

According to the notes at the back of Slater's book, her reference was another secondary source (Moerman, 2002) – that is, someone writing about what the research reports said, not the actual 'primary' accounts in the research journals.

So, I looked on-line for the original accounts. I found a 1959 study, by a team from the University of Washington School of Medicine. They explained that:

"Considerable relief of symptoms has been reported for patient with angina pectoris subjected to bilateral ligation of the internal mammary arteries. The physiologic basis for the relief of angina afforded by this rather simple operation is not clear."

Cobb, Thomas, Dillard, Merendino & Bruce, 1959

It was not clear why clamping these blood vessels in the chest should make a substantial difference to blood flow to the heart muscles – despite various studies which had subjected a range of dogs (who were not complaining of the symptoms of angina, and did not need any surgery) to surgical interventions followed by invasive procedures in order to measure any modifications in blood flow (Blair, Roth & Zintel, 1960).

Would you like your aorta clamped, and the blood drained from the left side of your heart, for the sake of a research study?

That raises another ethical issue – the extent of pain and suffering and morbidity it is fair to inflict on non-human animals (which are never perfect models for human anatomy and physiology) to progress human medicine. Some studies explored the details of blood circulation in dogs. Would you like your aorta clamped, and the blood drained from the left side of your heart, for the sake of a research study? Moreover, in order to test the effectiveness of the ligation procedure, in some studies healthy dogs had to have the blood supply to the heart muscles disrupted to give them compromised heart function similar to that of the human angina sufferers. 1

But, hang on a moment. I think I passed over something rather important in that last quote: "this rather simple operation"?

"Considerable relief of symptoms has been reported for patient with angina pectoris subjected to bilateral ligation of the internal mammary arteries. The physiologic basis for the relief of angina afforded by this rather simple operation is not clear."

Cobb and colleagues' account of the procedure contradicted one of my assumptions,

 At the time of operation, which was performed under local anesthesia [anaesthesia], the surgeon was handed a randomly selected envelope, which contained a card instructing him whether or not to ligate the internal mammary arteries after they had been isolated.

Cobb et al, 1959

It seems my inference that the procedure was carried out under general anaesthetic was wrong. Never assume! Surgery under local anaesthetic is not a trivial enterprise, but carries much less risk than general anaesthetic.

Yet, surely, even back then, no surgeon was going to open up the chest and handle the heart under a local anaesthetic? Cobb and colleagues wrote:

"The surgical procedures commonly used in the therapy of coronary-artery disease have previously been "major" operations utilizing thoracotomy and accompanied by some morbidity and a definite mortality. … With the advent of internal-mammary-artery ligation and its alleged benefit, a unique opportunity for applying the principles of a double-blind evaluation to a surgical procedure has been afforded

Cobb, Thomas, Dillard, Merendino & Bruce, 1959

So, the researchers were arguing that, previously, surgical interventions for this condition were major operations that did involve opening up the chest (thorax) – thoracotomy – where sham surgery would not have been ethical; but the new procedure they were testing – "this rather simple operation" was different.

Effects of internal-mammary-artery ligation on 17 patients with angina pectoris were evaluated by a double-blind technic. Eight patients had their internal mammary arteries ligated; 9 had skin incisions only. 

Cobb et al, 1959

They describe "a 'placebo' procedure consisting of parasternal skin incisions"– that is some cuts were made into the skin next to the breast bone. Skin incisions are somewhat short of open heart surgery.

The description given by the Kansas team (from the Departments of Medicine and Surgery, University of Kansas Medical Center, Kansas City) also differs from Slater's third-hand account in this important way:

"The patients were operated on under local anesthesia. The surgeon, by random sampling, selected those in whom bilateral internal mammary artery and vein ligation (second interspace) was to be carried out and those in whom a sham procedure was to be performed. The sham procedure consisted of a similar skin incision with exposure of the internal mammary vessels, but without ligation."

Dimond, Kittle & Crocket, 1960

This description of the surgery seemed quite different from that offered by Slater.

These teams seemed to be reporting a procedure that could be carried out without exposing the lungs or the heart and opening their protective covers ("in this technique…the pericardium and pleura are not entered or disturbed", Glover, et al, 1957), and which could be superficially forged by making a few cuts into the skin.


"The performance of bilateral division of the internal mammary arteries as compared to other surgical procedures for cardiac disease is safe, simple and innocuous in capable hands."

Glover, Kitchell, Kyle, Davila & Trout, 1958

The surgery involved making cuts into the skin of the chest to access, and close off, arteries taking blood to (more superficial) chest areas in the hope it would allow more to flow to the heart muscles; the sham surgery, the placebo, involved making similar incisions, but without proceeding to change the pattern of arterial blood flow.

The sham surgery did not require general anaesthesia and involved relatively superficial wounds – and offered a research technique that did not need to cause suffering to, and the sacrifice of, perfectly healthy dogs. So, that's all ethical then?

The first-hand research reports at least give a different impression of the balance of costs and potential benefits to stakeholders than I had originally drawn from Lauren Slater's account.

Getting consent for sham surgery

A key requirement for ethical research with human participants is that they give voluntary informed consent. Unlike dogs, humans can assent to research procedures, and it is generally considered that research should not be undertaken without such consent.

Read about voluntary informed consent

Of course, there is nuance and complication. The kind of research where investigators drop large-denomination notes to test the honesty of passers-by – where the 'participants' are in a public place and will not be identified or identifiable – is not usually seen as needing such consent (which would clearly undermine any possibility of getting authentic results). But is it acceptable to observe people using public toilets without their knowledge and consent (as was described in one published study I used as a teaching example)?

The extent to which a lay person can fully understand the logic and procedures explained to them when seeking consent can vary. The extent to which most participants would need, or even want to, know full details of the study can vary. When children of various ages are involved, the extent to which consent can be given on their behalf by a parent or teacher raises interesting questions.


"I'm looking for volunteers to have a procedure designed to make it look like you've had surgery"

Image by mohamed_hassan from Pixabay


There is much nuance and many complications – and this is an area to which researchers need to give very careful consideration.

  • How many ill patients would volunteer for sham surgery to help someone else's research?
  • Would that answer change, if the procedure being tested would later be offered to them?
  • What about volunteering for a study where you have a 50-50 chance of getting the real surgery or the placebo treatment?

In Cobb's study, the participants had all volunteered – but we might wonder if the extent of the information they were given amounted to what was required for informed consent,

The subjects were informed of the fact that this procedure had not been proved to be of value, and yet many were aware of the enthusiastic report published in the Reader's Digest. The patients were told only that they were participating in an evaluation of this operation; they were not informed of the double-blind nature of the study.

Cobb et al, 1959

So, it seems the patients thought they were having an operation that had been mooted to help angina sufferers – and indeed some of them were, but others just got taken into surgery to get a few wounds that suggested something more substantive had been done.

Was that ethical? (I doubt it would be allowed anywhere today?)

The outcome of these studies was that although the patients getting the ligation surgery did appear to get relief from their angina – so did those just getting the skin incisions. The placebo seemed just as good as the re-plumbing.

In hindsight, does this make the studies more worthwhile and seem more ethical? This research has probably prevented a great many people having an operation to have some of their vascular system blocked when that does not seem to make any difference to angina. Does that advance in medical knowledge justify the deceit involved in leading people to think they would get an experimental surgical treatment when they might just get an experimental control treatment?


Ethical principles and guidelines can help us judge the merits of a study

Coda – what did the middle man have to say?

I wondered how a relatively minor sham procedure under local anaesthetic became characterised as "the patients, whether or not they were getting the procedure had their chest cracked open and their heart lifted out" – a description which gave a vivid impression of a major intervention.


The heart is pretty well integrated into the body – how easy is it to lift an intact, fully connected, working heart out of position?

Image by HANSUAN FABREGAS from Pixabay


I wondered to what extent it would even be possible to lift the heart out from the chest whilst it remained connected with the major vessels passing the blood it was pumping, and the nerves supplying it, and the vessels supplying blood to its own muscles (the ones that were considered compromised enough to make the treatment being tested worth considering). Some sources I found on-line referred to the heart being 'lifted' during open-heart procedures to give the surgeon access to specific sites: but that did not mean taking the heart out of the body. Having the heart 'lifted out' seemed more akin to Aztec sacrificial rites than medical treatment.

Although all surgery involves some risk, the actual procedure being investigated seemed of a relatively routine nature. I actually attended a 'minor' operation which involved cutting into the chest when my late wife was prepared for kidney dialysis. Usually a site for venous access is prepared in the arm well in advance, but it was decided my wife needed to be put on dialysis urgently. A temporary hole was cut into her neck to allow the surgeon to connect a tube (a central venous catheter) to a vein, and another hole into her chest so that the catheter would exit in her chest, where the tap could be kept sterile, bandaged to the chest. This was clearly not considered a high-risk operation (which is not to say I think I could have coped with having this done to me!) as I was asked by the doctors to stay in the room with my wife during the procedure, and I did not need to 'scrub' or 'gown up'.

Bilateral internal mammary artery ligation seemed a procedure on that kind of level, accessing blood vessels through incisions made in the skin. However, if Lauren Slater had read up on some of the earlier procedures that did require opening the chest, or if she had read the papers describing how the dogs were investigated to trace blood flow through connected vessels, measure changes in flow, and prepare them for induced heart conditions, I could appreciate the potential for confusion. Yet she did not cite the primary research, but rather Daniel Moerman, an Emeritus Professor of Anthropology at the University of Michigan-Dearborn, who has written a book about placebo treatments in medicine.

Moerman does write about the bilateral internal mammary artery ligation, and the two sham surgery studies I found in my search. Moerman describes the operation:

"It was quite simple, and since the arteries were not deep in the body, could be performed under local anaesthetic."

Moerman, 2002

He also refers to the subjective reports on one of the patients assigned to the placebo condition in one of the studies, who claimed to feel much better immediately after the procedure:

"This patient's arteries were not ligated…But he did have two scars on his chest…"

Moerman, 2002

But nobody cracked open his chest, and no one handled his heart.

There are still ethical issues here, but understanding the true (almost superficial) nature of the sham surgery clearly changes the balance of concerns. If there is a moral to this article, it is perhaps the importance of being fully informed before reaching judgement about the ethics of a research study.


Work cited:
  • Blair, C. R., Roth, R. F., & Zintel, H. A. (1960). Measurement of coronary artery blood-flow following experimental ligation of the internal mammary artery. Annals of Surgery, 152(2), 325.
  • Cobb, L. A., Thomas, G. I., Dillard, D. H., Merendino, K. A., & Bruce, R. A. (1959). An evaluation of internal-mammary-artery ligation by a double-blind technic. New England Journal of Medicine, 260(22), 1115-1118.
  • Dimond, E. G., Kittle, C. F., & Crockett, J. E. (1960). Comparison of internal mammary artery ligation and sham operation for angina pectoris. The American Journal of Cardiology, 5(4), 483-486.
  • Glover, R. P., Davila, J. C., Kyle, R. H., Beard, J. C., Trout, R. G., & Kitchell, J. R. (1957). Ligation of the internal mammary arteries as a means of increasing blood supply to the myocardium. Journal of Thoracic Surgery, 34(5), 661-678. https://doi.org/10.1016/S0096-5588(20)30315-9
  • Glover, R. P., Kitchell, J. R., Kyle, R. H., Davila, J. C., & Trout, R. G. (1958). Experiences with Myocardial Revascularization By Division of the Internal Mammary Arteries. Diseases of the Chest, 33(6), 637-657. https://doi.org/10.1378/chest.33.6.637
  • Moerman, D. E. (2002). Meaning, Medicine, and the "Placebo Effect". Cambridge: Cambridge University Press.
  • Slater, L. (2018). The Drugs that Changed our Minds: The history of psychiatry in ten treatments. London: Simon & Schuster.
  • Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download this paper.]


Note:

1 To find out if the ligation procedure protected a dog required stressing the blood supply to the heart itself,

"An attempt has been made to evaluate the degree of protection preliminary ligation of the internal mammary artery may afford the experimental animal when subjected to the production of sudden, acute myocardial infarction by ligation of the anterior descending coronary artery at its origin. …

It was hoped that survival in the control group would approximate 30 per cent so that infarct size could be compared with that of the "protected" group of animals. The "protected" group of dogs were treated in the same manner but in these the internal mammary arteries were ligated immediately before, at 24 hours, and at 48 hours before ligation of the anterior descending coronary.

In 14 control dogs, the anterior descending coronary artery with the aforementioned branch to the anterolateral aspect of the left ventricle was ligated. Nine of these animals went into ventricular fibrillation and died within 5 to 20 minutes. Attempts to resuscitate them by defibrillation and massage were to no avail. Four others died within 24 hours. One dog lived 2 weeks and died in pulmonary edema."

Glover, Davila, Kyle, Beard, Trout & Kitchell, 1957

Pulmonary oedema involves fluid build-up in the lungs that restricts gaseous exchange and prevents effective breathing. The dog that survived longest (if it was kept conscious) will have experienced death as if by slow suffocation or drowning.

Why ask teachers to 'transmit' knowledge…

…if you believe that "knowledge is constructed in the minds of students"?


Keith S. Taber


While the students in the experimental treatment undertook open-ended enquiry, the learners in the control condition undertook practical work to demonstrate what they had already been told was the case – a rhetorical exercise that reflected the research study they were participating in


A team of researchers chose to compare a teaching approach they believed met the requirements for good science instruction, and which they knew had already been demonstrated to be effective pedagogy in other studies, with teaching they believed was not suitable for bringing about conceptual change.
(Ironically, they chose a research design more akin to the laboratory activities in the substandard control condition, than to the open-ended enquiry that was part of the pedagogy they considered effective!)

An imaginary conversation 1 with a team of science education researchers.

When we critically read a research paper, we interrogate the design of the study, and the argument for new knowledge claims that are being made. Authors of research papers need to anticipate the kinds of questions readers (editors, reviewers, and the wider readership on publication) will be asking as they try to decide if they find the study convincing.

Read about writing-up research

In effect, there is an asynchronous conversation.

Here I engage in 'an asynchronous conversation' with the authors of a research paper I was interrogating:

What was your study about?

"This study investigated the effect of the Science Writing Heuristic (SWH) approach on grade 9 students' understanding of chemical change and mixture concepts [in] a Turkish public high school."

Kingir, Geban & Gunel, 2013

I understand this research was set up as a quasi-experiment – what were the conditions being compared?

"Students in the treatment group were instructed by the SWH approach, while those in the comparison group were instructed with traditionally designed chemistry instruction."

Kingir, Geban & Gunel, 2013

Constructivism

Can you tell me about the theoretical perspective informing this study?

"Constructivism is increasingly influential in guiding student learning around the world. However, as knowledge is constructed in the minds of students, some of their commonsense ideas are personal, stable, and not congruent with the scientifically accepted conceptions… Students' misconceptions [a.k.a. alternative conceptions] and learning difficulties constitute a major barrier for their learning in various chemistry topics"

Kingir, Geban & Gunel, 2013

Read about constructivist pedagogy

Read about alternative conceptions

'Traditional' teaching versus 'constructivist' teaching

So, what does this suggest about so-called traditional teaching?

"Since prior learning is an active agent for student learning, science educators have been focused on changing these misconceptions with scientifically acceptable ideas. In traditional science teaching, it is difficult for the learners to change their misconceptions…According to the conceptual change approach, learning is the interaction between prior knowledge and new information. The process of learning depends on the degree of the integration of prior knowledge with the new information.2"

Kingir, Geban & Gunel, 2013

And does the Science Writing Heuristic Approach contrast to that?

"The Science Writing Heuristic (SWH) approach can be used to promote students' acquisition of scientific concepts. The SWH approach is grounded on the constructivist philosophy because it encourages students to use guided inquiry laboratory activities and collaborative group work to actively negotiate and construct knowledge. The SWH approach successfully integrates inquiry activities, collaborative group work, meaning making via argumentation, and writing-to-learn strategies…

The negotiation activities are the central part of the SWH because learning occurs through the negotiation of ideas. Students negotiate meaning from experimental data and observations through collaboration within and between groups. Moreover, the student template involves the structure of argumentation known as question, claim, and evidence. …Reflective writing scaffolds the integration of new ideas with prior learning. Students focus on how their ideas changed through negotiation and reflective writing, which helps them confront their misconceptions and construct scientifically accepted conceptions"

Kingir, Geban & Gunel, 2013

What is already known about SWH pedagogy?

It seems like the SWH approach should be effective at supporting student learning. So, has this not already been tested?

"There are many international studies investigating the effectiveness of the SWH approach over the traditional approach … [one team] found that student-written reports had evidence of their science learning, metacognitive thinking, and self-reflection. Students presented reasons and arguments in the meaning-making process, and students' self-reflections illustrated the presence of conceptual change about the science concepts.

[another team] asserted that using the SWH laboratory report format in lieu of a traditional laboratory report format was effective on acquisition of scientific conceptions, elimination of misconceptions, and learning difficulties in chemical equilibrium.

[Another team] found that SWH activities led to greater understanding of grade 6 science concepts when compared to traditional activities. The studies conducted at the postsecondary level showed similar results as studies conducted at the elementary level…

[In two studies] it was demonstrated that the SWH approach can be effective on students' acquisition of chemistry concepts. SWH facilitates conceptual change through a set of argument-based inquiry activities. Students negotiate meaning and construct knowledge, reflect on their own understandings through writing, and share and compare their personal meanings with others in a social context"

Kingir, Geban & Gunel, 2013

What was the point of another experimental test of SWH?

So, it seems that from a theoretical point of view, so-called traditional teaching is likely to be ineffective in bringing about conceptual learning in science, whilst a constructivist approach based on the Science Writing Heuristic is likely to support such learning. Moreover, you are aware of a range of existing studies which suggest that in practice the Science Writing Heuristic is indeed an effective basis for science teaching.

So, what was the point of your study?

"The present study aimed to investigate the effect of the SWH approach compared to traditional chemistry instruction on grade 9 students' understanding of chemical change and mixture concepts."

Kingir, Geban & Gunel, 2013

Okay, I would certainly accept that just because a teaching approach has been found effective with one age group, or in one topic, or in one cultural context, we cannot assume those findings can be generalised and will necessarily apply in other teaching contexts (Taber, 2019).

Read about generalisation from studies

What happened in the experimental condition?

So, what happened in the two classes taught in the experimental condition?

"The teacher asked students to form their own small groups (n=5) and introduced to them the SWH approach …they were asked to suggest a beginning question…, write a claim, and support that claim with evidence…

they shared their questions, claims, and evidence in order to construct a group question, claim, and evidence. …each group, in turn, explained their written arguments to the entire class. … the rest of the class asked them questions or refuted something they claimed or argued. …the teacher summarized [and then] engaged students in a discussion about questions, claims, and evidence in order to make students aware of the meaning of those words. The appropriateness of students' evidence for their claims, and the relations among questions, claims, and evidence were also discussed in the classroom…

The teacher then engaged students in a discussion about …chemical change. First, the teacher attempted to elicit students' prior understanding about chemical change through questioning…The teacher asked students to write down what they wanted to learn about chemical change, to share those items within their group, and to prepare an investigation question with a possible test and procedure for the next class. While students constructed their own questions and planned their testing procedure, the teacher circulated through the groups and facilitated students' thinking through questioning…

Each group presented their questions to the class. The teacher and the rest of the class evaluated the quality of the question in relation to the big idea …The groups' procedures were discussed and revised prior to the actual laboratory investigation…each group tested their own questions experimentally…The teacher asked each student to write a claim about what they thought happened, and support that claim with the evidence. The teacher circulated through the classroom, served as a resource person, and asked …questions

…students negotiated their individual claims and evidence within their groups, and constructed group claims and evidence… each group…presented … to the rest of the class."

Kingir, Geban & Gunel, 2013
What happened in the control condition?

Okay, I can see that the experimental groups experienced the kind of learning activities that both educational theory and previous research suggest are likely to engage them and develop their thinking.

So, what did you set up to compare with the Science Writing Heuristic Approach as a fair test of its effectiveness as a pedagogy?

"In the comparison group, the teacher mainly used lecture and discussion[3] methods while teaching chemical change and mixture concepts. The chemistry textbook was the primary source of knowledge in this group. Students were required to read the related topic from the textbook prior to each lesson….The teacher announced the goals of the lesson in advance, wrote the key concepts on the board, and explained each concept by giving examples. During the transmission of knowledge, the teacher and frequently used the board to write chemical formula[e] and equations and draw some figures. In order to ensure that all of the students understood the concepts in the same way, the teacher asked questions…[that] contributed to the creation of a discussion[3] between teacher and students. Then, the teacher summarized the concepts under consideration and prompted students to take notes. Toward the end of the class session, the teacher wrote some algorithmic problems [sic 4] on the board and asked students to solve those problems individually….the teacher asked a student to come to the board and solve a problem…

The …nature of their laboratory activities was traditional … to verify what students learned in the classroom. Prior to the laboratory session, students were asked to read the procedures of the laboratory experiment in their textbook. At the laboratory, the teacher explained the purpose and procedures of the experiment, and then requested the students to follow the step-by-step instructions for the experiment. Working in groups (n=5), all the students conducted the same experiment in their textbook under the direct control of the teacher. …

The students were asked to record their observations and data. They were not required to reason about the data in a deeper manner. In addition, the teacher asked each group to respond to the questions about the experiment included in their textbook. When students failed to answer those questions, the teacher answered them directly without giving any hint to the students. At the end of the laboratory activity, students were asked to write a laboratory report in traditional format, including purpose, procedure, observations and data, results, and discussion. The teacher asked questions and helped students during the activity to facilitate their connection of laboratory activity with what they learned in the classroom."

Kingir, Geban & Gunel, 2013

The teacher variable

Often in small scale research studies in education, a different teacher teaches each group and so the 'teacher variable' confounds the experiment (Taber, 2019). Here, however, you avoid that problem 5, as you had a sample of four classes, and two different teachers were involved, each teaching one class in each condition?

"In order to facilitate the proper instruction of the SWH approach in the treatment group, the teachers were given training sessions about its implementation prior to the study. The teachers were familiar with the traditional instruction. One of the teachers was teaching chemistry for 20 years, while the other was teaching chemistry for 22 years at a high school. The researcher also asked the teachers to teach the comparison group students in the same way they taught before and not to do things specified for the treatment group."

Kingir, Geban & Gunel, 2013

Was this research ethical?

As this is an imaginary conversation, not all of the questions I might like to ask are actually addressed in the paper. In particular, I would love to know how the authors would justify that their study was ethical, considering that the control condition they set up deliberately excluded features of pedagogy that they themselves claim are necessary to support effective science learning:

"In traditional science teaching, it is difficult for the learners to change their misconceptions"

The authors believe that "learning occurs through the negotiation of ideas", and their experimental condition provides plenty of opportunity for that. The control condition is designed to avoid the explicit elicitation of learners' ideas, dialogic talk, or peer interactions when reading, listening, writing notes or undertaking exercises. If the authors' beliefs are correct (and they are broadly consistent with a wide consensus across the global science education research community), then the teaching in the comparison condition is not suitable for facilitating conceptual learning.

Even if we think it is conceivable that highly experienced teachers, working in a national context where constructivist teaching has long been official education policy, had somehow previously managed to only teach in an ineffective way: was it ethical to ask these teachers to teach one of their classes poorly even after providing them with professional development enabling them to adopt a more engaging approach better aligned with our understanding of how science can be effectively taught?

Read about unethical control conditions

Given that the authors already believed that –

  • "Students' misconceptions and learning difficulties constitute a major barrier for their learning in various chemistry topics"
  • "knowledge is constructed in the minds of students"
  • "The process of learning depends on the degree of the integration of prior knowledge with the new information"
  • "learning occurs through the negotiation of ideas"
  • "The SWH approach successfully integrates inquiry activities, collaborative group work, meaning making" – A range of previous studies have shown that SWH effectively supports student learning

– why did they not test the SWH approach against existing good practice, rather than implement a control pedagogy they knew should not be effective, so setting up two classes of learners (who do not seem to have been asked to consent to being part of the research) to fail?

Read about the expectation for voluntary informed consent

Why not set up a genuinely informative test of the SWH pedagogy, rather than setting up conditions for manufacturing a foregone conclusion?


When it has already been widely established that a pedagogy is more effective than standard practice, there is little point further testing it against what is believed to be ineffective instruction.

Read about levels of control in experiments


How can it be ethical to ask teachers to teach in a way that is expected to be ineffective?

  • transmission of knowledge
  • follow the step-by-step instructions
  • not required to reason in a deeper manner
  • individual working

A rhetorical experiment?

Is this not just a 'rhetorical' experiment engineered to produce a desired outcome (a demonstration), rather than an open-ended enquiry (a genuine experiment)?

A rhetorical experiment is not designed to produce substantially new knowledge: but rather to create the conditions for a 'positive' result (Figure 8 from Taber, 2019).

Read about rhetorical experiments


A technical question

Any study of a teaching innovation requires the commitment of resources and some disruption of teaching. Therefore any research study which has inherent design faults that will prevent it producing informative outcomes can be seen as a misuse of resources, and an unproductive disruption of school activities, and so, if only in that sense, unethical.

As the research was undertaken with "four intact classes", is it possible to apply any statistical tests that can offer meaningful results, when there are only two units of analysis in each condition? [That is, I think not.]

The researchers claim to have 117 degrees of freedom when applying statistical tests to draw conclusions. They seem to assume that each of the 122 children can be considered to be a separate unit of analysis. But is it reasonable to assume that c.30 children taught together in the same intact class by the same teacher (and working in groups for at least part of the time) are independently experiencing the (experimental or control) treatment?

Surely, the students within a class influence each other's learning (especially during group-work), so the outcomes of statistical tests that rely on treating each learner as an independent unit of analysis are invalid (Taber, 2019). This is especially so in the experimental treatment where dialogue (and "the negotiation of ideas") through group-work, discussion, and argumentation were core parts of the instruction.
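To illustrate the point, here is a minimal sketch (using invented, simulated numbers, not anything from the study) of how much the apparent statistical power depends on whether individual pupils or intact classes are treated as the units of analysis:

```python
# A minimal sketch with simulated data (not the study's results): two intact classes
# of 30 pupils per condition, each class given its own class-level 'ethos' effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_class(class_effect, n_pupils=30):
    # pupil score = overall mean + class-level effect + individual variation
    return 50 + class_effect + rng.normal(0, 10, n_pupils)

# two classes per condition; class-level effects differ even with no treatment effect
treatment_classes = [simulate_class(e) for e in rng.normal(0, 5, 2)]
comparison_classes = [simulate_class(e) for e in rng.normal(0, 5, 2)]

# Analysis A: every pupil treated as an independent unit (about 118 degrees of freedom)
t_pupils, p_pupils = stats.ttest_ind(np.concatenate(treatment_classes),
                                     np.concatenate(comparison_classes))

# Analysis B: the intact class as the unit of analysis (only two means per condition)
t_classes, p_classes = stats.ttest_ind([c.mean() for c in treatment_classes],
                                       [c.mean() for c in comparison_classes])

print(f"pupils as units:  t = {t_pupils:.2f}, p = {p_pupils:.3f}")
print(f"classes as units: t = {t_classes:.2f}, p = {p_classes:.3f}")
```

With only two classes in each condition, the class-level comparison has almost no power; treating the 120 or so pupils as if they were independent units manufactures apparent precision that the design cannot really support.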

Read about units of analysis

Sources cited:

  • Ausubel, D. P. (1968). Educational Psychology: A cognitive view. Holt, Rinehart & Winston.
  • Kingir, S., Geban, O., & Gunel, M. (2013). Using the Science Writing Heuristic Approach to Enhance Student Understanding in Chemical Change and Mixture. Research in Science Education, 43(4), 1645-1663. https://doi.org/10.1007/s11165-012-9326-x
  • Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download]

Notes:

1 I have used direct quotes from the published report in Research in Science Education (but I have omitted citations to other papers), with some emphasis added. Please refer to the full report of the study for further details. I have attempted to extract relevant points from the paper to develop an argument here. I have not deliberately distorted the published account by selection and/or omission, but clearly am only reproducing small extracts. I would recommend readers might access the original study in order to make up their own minds.


2 The next statement is "If individuals know little about the subject matter, new information is easily embedded in their cognitive structure (assimilation)." This is counter to the common thinking that learning about an unfamiliar topic is more difficult, and learning is made meaningful when it can be related to prior knowledge (Ausubel, 1968).

Read about making the unfamiliar familiar


3 The term 'discussion' might suggest an open-ended exchange of ideas and views. This would be a dialogic technique typical of constructivist approaches. From the wider context it seems likely that something more teacher-directed and closed than this was meant here – but this is an interpretation which goes beyond the description available in the original text.

Read about dialogic learning


4 Researchers into problem-solving consider that a problem has to require a learner to do more than simply recall and apply previously learned knowledge and techniques – so an 'algorithmic problem' might be considered an oxymoron. However, it is common for teachers to refer to algorithmic exercises as 'problems' even though they do not require going beyond application of existing learning.


5 This design does avoid the criticism that one of the teachers may have just been more effective at teaching the topic to this age group, as both teachers teach in both conditions.

This does not entirely remove potential confounds as teachers interact differently with different classes, and with only four teacher-class combinations it could well be that there is better rapport in the two classes in one or other condition. It is very hard to see how this can be addressed (except by having a large enough sample of classes to allow inferential statistics to be used rigorously – which is not feasible in small scale studies).

A potentially more serious issue is 'expectancy' effects. There is much research in education and other social contexts to show that people's beliefs and expectations influence outcomes of studies – and this can make a substantial difference. If the two teachers were unconvinced by the newfangled and progressive approach being tested, then this could undermine their ability to effectively teach that way.

On the other hand, although it is implied that these teachers normally teach in the 'traditional' way, actually constructivist approaches are recommended in Turkey, and are officially sanctioned, and widely taught in teacher education and development courses. If the teachers accepted the arguments for believing the SWH was likely to be more effective at bringing about conceptual learning than the methods they were asked to adopt in the comparison classes, that would further undermine that treatment as a fair control condition.

Read about expectancy effects in research

Again, there is very little researchers can do about this issue as they cannot ensure that teachers participating in research studies are equally confident in the effectiveness of different treatments (and why should they be – the researchers are obviously expecting a substantive difference*), and this is a major problem in studies into teaching innovations (Taber, 2019).

* This is clear from their paper. Is it likely that they would have communicated this to the teachers? "The teachers were given training sessions about [SWH's] implementation prior to the study." Presumably, even if somehow these experienced teachers had previously managed to completely avoid or ignore years of government policy and guidance intending to persuade them of the value of constructivist approaches, the researchers could not have offered effective "training sessions" without explaining the rationales of the overall approach, and for the specific features of the SWH that they wanted teachers to adopt.


Shock result: more study time leads to higher test scores

(But 'all other things' are seldom equal)


Keith S. Taber


I came across an interesting journal article that reported a quasi-experimental study where different groups of students studied the same topic for different periods of time. One group was given 3 half-hour lessons, another group 5 half-hour lessons, and the third group 8 half-hour lessons. Then they were tested on the topic they had been studying. The researchers found that the average group performance was substantially different across the different conditions. This was tested statistically, but the results were clear enough to be quite impressive when presented visually (as I have below).


Results from a quasi-experiment: it seems more study time can lead to higher achievement

These results seem pretty clear cut. If this research could be replicated in diverse contexts then the findings could have great significance.

  • Is your manager trying to cut course hours to save budget?
  • Does your school want you to teach 'triple science' in a curriculum slot intended for 'double science'?
  • Does your child say they have done enough homework?

Research evidence suggests that, ceteris paribus, learners achieve more by spending more time studying.

Ceteris paribus?

That is ceteris paribus (no, it is not a newly discovered species of whale): all other things being equal. But of course, in the real world they seldom – if ever – are.

If you wondered about the motivation for a study designed to see whether more teaching led to more learning (hardly what Karl Popper would have classed as a suitable 'bold conjecture' on which to base productive research), then I should confess I am being disingenuous. The information I give above is based on the published research, but offers a rather different take on the study from that offered by the authors themselves.

An 'alternative interpretation' one might say.

How useful are DARTs as learning activities?

I came across this study when looking to see if there was any research on the effectiveness of DARTs in chemistry teaching. DARTs are directed activities related to text – that is text-based exercises designed to require learners to engage with content rather than just copy or read it. They have long been recommended, but I was not sure I had seen any published research on their use in science classrooms.

Read about using DARTs in teaching

Shamsulbahri and Zulkiply (2021) undertook a study that "examined the effect of Directed Activity Related to Texts (DARTs) and gender on student achievement in qualitative analysis in chemistry" (p.157). They considered their study to be a quasi-experiment.

An experiment…

Experiment is the favoured methodology in many areas of natural science, and, indeed, the double-blind experiment is sometimes seen as the gold standard methodology in medicine – and when possible in the social sciences. This includes education, and certainly in science education the literature reports many, many educational experiments. However, doing experiments well in education is very tricky and many published studies have major methodological problems (Taber, 2019).

Read about experiments in education

…requires control of variables

As we teach in school science, fair testing requires careful control of variables.

So, if I suggest there are some issues that prevent a reader from being entirely confident in the conclusions that Shamsulbahri and Zulkiply reach in their paper, it should be borne in mind that I think it is almost impossible to do a rigorously 'fair' small-scale experiment in education. By small-scale, I mean the kind of study that involves a few classes of learners as opposed to studies that can enrol a large number of classes and randomly assign them to conditions. Even large-scale randomised studies are usually compromised by factors that simply cannot be controlled in educational contexts (Taber, 2019), and small-scale studies are subject to additional, often (I would argue) insurmountable, 'challenges'.

The study is available on the web, open access, and the paper goes into a good deal of detail about the background to, and aspects of, the study. Here, I am focusing on a few points that relate to my wider concerns about the merits of experimental research into teaching, and there is much of potential interest in the paper that I am ignoring as not directly relevant to my specific argument here. In particular, the authors describe the different forms of DART they used in the study. As, inevitably (considering my stance on the intrinsic problems of small-scale experiments in education), the tone of this piece is critical, I would recommend that readers access the full paper and make up their own minds.

Not a predatory journal

I was not familiar with the journal in which this paper was published – the Malaysian Journal of Learning and Instruction. It describes itself as "a peer reviewed interdisciplinary journal with an international advisory board". It is an open access journal that charges authors for publication. However, the publication fees are modest (US$25 if authors are from countries that are members of The Association of Southeast Asian Nations, and US$50 otherwise). This is an order of magnitude less than is typical for some of the open-access journals that I have criticised here as being predatory – those which do not engage in meaningful peer review, and will publish some very low quality material as long as a fee is paid. 25 dollars seems a reasonable charge for the costs involved in publishing work, unlike the hefty fees charged by many of the less scrupulous journals.

Shamsulbahri and Zulkiply seem, then, to have published in a well-motivated journal and their paper has passed peer review. But this peer thinks that, as with most small-scale experiments into teaching, it is very hard to draw any solid conclusions from this work.

What do the authors conclude?

Shamsulbahri and Zulkiply argue that their study shows the value of DARTs activities in learning. I approach this work with a bias, as I also think DARTs can be very useful. I used different kinds of DARTs extensively in my teaching with 14-16 year olds when I worked in schools.

The authors claim their study,

"provides experimental evidence in support of the claim that the DARTs method has been beneficial as a pedagogical approach as it helps to enhance qualitative analysis learning in chemistry…

The present study however, has shown that the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the experimental method. Using the DARTs method only results in better learning of qualitative analysis component in chemistry, as compared with using the Experimental method only."

Shamsulbahri & Zulkiply, 2021

Yet, despite my bias, which leads me to suspect they are right, I do not think we can infer this much from their quasi-experiment.

I am going to separate out three claims in the quote above:

  1. the DARTs method has been beneficial as a pedagogical approach as it helps to enhance qualitative analysis learning in chemistry
  2. the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the [laboratory1] method
  3. the DARTs method [by itself] results in better learning of qualitative analysis component in chemistry, as compared with using the [laboratory] method only.

I am going to suggest that there are two weak claims here and one strong claim. The weak claims are reasonably well supported (but only as long as they are read strictly as presented and not assumed to extend beyond the study) but the strong claim is not.

Limitations of the experiment

I suggest there are several major limitations of this research design.

What population is represented in the study?

In a true experiment researchers would nominate the population of interest (say, for example, 14-16 year old school learners in Malaysia), and then randomly select participants from this population who would be randomly assigned to the different conditions being compared. Random selection and assignment cannot ensure that the groupings of participants are equivalent, nor that the samples genuinely represent the population: by chance it could happen that, say, the most studious students are assigned to one condition and all the lazy students to another – but that is very unlikely. Random selection and assignment means that there is a strong statistical case to think the outcomes of the experiment probably represent (more or less) what would have happened on a larger scale had it been possible to include the whole population in the experiment.
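As a purely illustrative sketch (the population, the numbers and the condition labels here are invented, not taken from the study), the logic of random selection followed by random assignment might look like this:

```python
# An illustrative sketch (invented names and numbers): randomly select a sample
# from a nominated population, then randomly assign the sample to conditions.
import random

random.seed(1)

# the nominated population of interest (hypothetical)
population = [f"student_{i}" for i in range(10_000)]

# random selection of participants from the population
sample = random.sample(population, 120)

# random assignment of the selected participants to three conditions
random.shuffle(sample)
conditions = {
    "laboratory": sample[:40],
    "DARTs": sample[40:80],
    "laboratory + DARTs": sample[80:],
}

for name, group in conditions.items():
    print(f"{name}: {len(group)} participants, e.g. {group[:2]}")
```

Neither step guarantees equivalent groups, but together they provide the statistical warrant for generalising from sample to population that a quasi-experiment with intact classes lacks.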

Read about sampling in research

Obviously, researchers in small-scale experiments are very unlikely to be able to access full populations to sample. Shamsulbahri and Zulkiply did not – and it would be unreasonable to criticise them for this. But this does raise the question of whether what happens in their samples will reflect what would happen with other groups of students. Shamsulbahri and Zulkiply acknowledge their sample cannot be considered typical,

"One limitation of the present study would be the sample used; the participants were all from two local fully residential schools, which were schools for students with high academic performance."

Shamsulbahri & Zulkiply, 2021

So, we have to be careful about generalising from what happened in this specific experiment to what we might expect with different groups of learners. In that regard, two of the claims from the paper that I have highlighted (i.e., the weaker claims) do not directly imply these results can be generalised:

  1. the DARTs method has been beneficial as a pedagogical approach…
  2. the DARTs method facilitated better learning of the qualitative analysis component of chemistry when it was combined with the [laboratory] method

These are claims about what was found in the study – not inferences about what would happen in other circumstances.

Read about randomisation in studies

Equivalence at pretest?

When it is not possible to randomly assign participants to the different conditions then there is always the possibility that whatever process has been used to assign conditions to groups produces a bias. (An extreme case would be in a school that used setting, that is assigning students to teaching groups according to achievement, if one set was assigned to one condition, and another set to a different condition.)

In quasi-experiments on teaching it is usual to pre-test students and to present analysis to show that at the start of the experiment the groups 'are equivalent'. Of course, it is very unlikely two different classes would prove to be entirely equivalent on a pre-test, so often there is a judgement made of the test results being sufficiently similar across the conditions. In practice, in many published studies, authors settle for the very weak (and inadequate) test of not finding differences so great that they would be very unlikely to occur by chance (Taber, 2019)!
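As a rough illustration (with invented numbers, not data from any real study), a non-significant pre-test comparison is weak evidence of equivalence simply because small groups give little power to detect real differences:

```python
# An illustrative sketch (invented numbers): two small hypothetical classes whose
# true mean pre-test scores genuinely differ, compared with an ordinary t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

class_a = rng.normal(4, 5, 20)   # true mean 4 (out of 30)
class_b = rng.normal(7, 5, 20)   # true mean 7 - a real difference in prior knowledge

t, p = stats.ttest_ind(class_a, class_b)
print(f"sample means: {class_a.mean():.1f} vs {class_b.mean():.1f}, p = {p:.2f}")

# Even if p comes out above 0.05, that does not show the classes start from the
# same place; with samples this small the test often lacks the power to detect a
# difference that is really there. 'Not significantly different' is not 'equivalent'.
```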

Read about testing for equivalence

Shamsulbahri and Zulkiply did pretest all participants as a screening process to exclude any students who already had good subject knowledge in the topic (qualitative chemical analysis),

"Before the experimental manipulation began, all participants were given a pre-screening test (i.e., the Cation assessment test) with the intention of selecting only the most qualified participants, that is, those who had a low-level of knowledge on the topic….The participants who scored ten or below (out of a total mark of 30) were selected for the actual experimental manipulation. As it turned out, all 120 participants scored 10 and below (i.e., with an average of 3.66 out of 30 marks), which was the requirement that had been set, and thus they were selected for the actual experimental manipulation."

Shamsulbahri & Zulkiply, 2021

But the researchers do not report the mean results for the groups in the three conditions (laboratory1; DARTs; {laboratory+DARTs}) or give any indication of how similar (or not) these were. Nor do these scores seem to have been included as a variable in the analysis of results. The authors seem to be assuming that as no students scored more than one-third of the marks in the pre-test, then any differences between groups at pre-test can be ignored. (This seems to suggest that scoring 30% or 0% can be considered the same level of prior knowledge in terms of the potential influence on further learning and subsequent post-test scores.) That does not seem a sound assumption.

"It is important to note that there was no issue of pre-test treatment interaction in the context of the present study. This has improved the external validity of the study, since all of the participants were given a pre-screening test before they got involved in the actual experimental manipulation, i.e., in one of the three instructional methods. Therefore, any differences observed in the participants' performance in the post-test later were due to the effect of the instructional method used in the experimental manipulation."

Shamsulbahri & Zulkiply, 2021 (emphasis added)

There seems to be a flaw in the logic here, as the authors seem to be equating demonstrating an absence of high scorers at pre-test with there being no differences between groups which might have influenced learning. 2

Units of analysis

In any research study, researchers need to be clear regarding what their 'unit of analysis' should be. In this case the extreme options seem to be:

  • 120 units of analysis: 40 students in each of three conditions
  • 3 units of analysis: one teaching group in each condition

The key question is whether individual learners can be considered as being subject to the treatment conditions independently of others assigned to the same condition.

"During the study phase, student participants from the three groups were instructed by their respective chemistry teachers to learn in pairs…"

Shamsulbahri & Zulkiply, 2021

There is a strong argument that when a group of students attend class together, and are taught together, and interact with each other during class, they strictly should not be considered as learning independently of each other. Anyone who has taught parallel classes that are supposedly equivalent will know that classes take on their own personalities as groups, and the behaviour and learning of individual students is influenced by the particular class ethos.

Read about units of analysis

So, rigorous research into class teaching pedagogy should not treat the individual learners as units of analysis – yet it often does. The reason is obvious – it is only possible to do statistical testing when the sample size is large enough, and in small scale educational experiments the sample size is never going to be large enough unless one…hm…pretends/imagines/considers/judges/assumes/hopes?, that each learner is independently subject to the assigned treatment without being substantially influenced by others in that condition.

So, Shamsulbahri and Zulkiply treated their participants as independent units of analysis and based on this find a statistically significant effect of treatment:

'laboratory' vs. 'DARTs' vs. 'laboratory+DARTs'.

That is questionable – but what if, for argument's sake, we accept this assumption that within a class of 40 students the learners can be considered not to influence each other (even their learning partner?) or the classroom more generally sufficiently to make a difference to others in the class?

A confounding variable?

Perhaps a more serious problem with the research design is that there is insufficient control of potentially relevant variables. In order to make a comparison of 'laboratory' vs. 'DARTs' vs. 'laboratory+DARTs' then the only relevant difference between the three treatment conditions should be whether the students learn by laboratory activity, DARTs, or both. There should not be any other differences between the groups in the different treatments that might reasonably be expected to influence the outcomes.

Read about confounding variables

But the description of how groups were set up suggests this was not the case:

"….the researchers conducted a briefing session on the aims and experimental details of the study for the school's [schools'?] chemistry teachers…the researchers demonstrated and then guided the school's chemistry teachers in terms of the appropriate procedures to implement the DARTs instructional method (i.e., using the DARTs handout sheets)…The researcher also explained to the school's chemistry teachers the way to implement the combined method …

Participants were then classified into three groups: control group (experimental method), first treatment group (DARTs method) and second treatment group (Combination of experiment and DARTs method). There was an equal number of participants for each group (i.e., 40 participants) as well as gender distribution (i.e., 20 females and 20 males in each group). The control group consisted of the participants from School A, while both treatment groups consisted of participants from School B"


Shamsulbahri & Zulkiply, 2021

Several different teachers seem to have been involved in teaching the classes, and even if it is not entirely clear how the teaching was divided up, it is clear that the group that only undertook the laboratory activities was from a different school than those in the other two conditions.

If we think one teacher can be replaced by another without changing learning outcomes, and that schools are interchangeable such that we would expect exactly the same outcomes if we swapped a class of students from one school for a class from another school, then these variables are unimportant. If, however, we think the teacher doing the teaching and the school from which learners are sampled could reasonably make a difference to the learning achieved, then these are confounding variables which have not been properly controlled.

In my own experience, I do not think different teachers become equivalent even when they are briefed to teach in the same way, and I do not think we can assume schools are equivalent when providing students to participate in learning. These differences, then, undermine our ability to assign any differences in outcomes as due to the differences in pedagogy (that "any differences observed…were due to the effect of the instructional method used").

Another confounding variable

And then I come back to my starting point. Learners did not just experience different forms of pedagogy but also different amounts of teaching. The difference between 3 lessons and 5 lessons might in itself be a factor (that is, even if the pedagogy employed in those lessons had been the same), as might the difference between 5 lessons and 8 lessons. So, time spent studying must be seen as a likely confounding variable. Indeed, it is not just the amount of time, but also the number of lessons, as the brain processes learning between classes and what is learnt in one lesson can be reinforced when reviewed in the next. (So we could not just assume, for example, that students automatically learn the same amount from, say, two 60 min. classes and four 30 min. classes covering the same material.)

What can we conclude?

As with many experiments in science teaching, we can accept the results of Shamsulbahri and Zulkiply's study, in terms of what they found in the specific study context, but still not be able to draw strong conclusions of wider significance.

Is the DARTs method beneficial as a pedagogical approach?

I expect the answer to this question is yes, but we need to be careful in drawing this conclusion from the experiment. Certainly the two groups which undertook the DARTs activities outperformed the group which did not. Yet that group was drawn from a different school and taught by a different teacher or teachers. That could have explained why there was less learning. (I am not claiming this is so – the point is we have no way of knowing as different variables are conflated.) In any case, the two groups that did undertake the DARTs activity were both given more lessons and spent substantially longer studying the topic they were tested on, than the class that did not. We simply cannot make a fair comparison here with any confidence.

Did the DARTs method facilitate better learning when it was combined with laboratory work?

There is a stronger comparison here. We still do not know if the two groups were taught by the same teacher/teachers (which could make a difference) or indeed whether the two groups started from a very similar level of prior knowledge. But, at least the two groups were from the same school, and both experienced the same DARTs based instruction. Greater learning was achieved when students undertook laboratory work as well as undertaking DARTs activities compared with students who only undertook the DARTs activity.

The 'combined' group still had more teaching than the DARTs group, but that does not matter here in drawing a logical conclusion because the question being explored is of the form 'does additional teaching input provide additional value?' (Taber, 2019). The question here is not whether one type of pedagogy is better than the other, but simply whether also undertaking practical work adds something over just doing the paper-based learning activities.

Read about levels of control in experimental design

As the sample of learners was not representative of any specific wider population, we cannot assume this result would generalise beyond the participants in the study, although we might reasonably expect this result would be found elsewhere. But that is because we might already assume that learning about a practical activity (qualitative chemical analysis) will be enhanced by adding some laboratory-based study!

Does DARTs pedagogy produce more learning about qualitative analysis than laboratory activities?

Shamsulbahri and Zulkiply's third claim was bolder because it was framed as a generalisation: instruction through DARTs produces more learning about qualitative analysis than laboratory-based instruction. That seems quite a stretch from what the study clearly shows us.

What the research does show us with confidence is that a group of 40 students in one school taught by a particular teacher/teaching team with 5 lessons of a specific set of DARTs activities, performed better on a specific assessment instrument than a different group of 40 students in another school taught by a different teacher/teaching team through three lessons of laboratory work following a specific scheme of practical activities.


a group of 40 students  |  a different group of 40 students
in one school  |  in another school
taught by a particular teacher/teaching team  |  taught by a different teacher/teaching team
with 5 lessons  |  through 3 lessons
of a specific set of DARTs activities  |  of laboratory work following a specific scheme of practical activities

(the group described on the left performed better on a specific assessment instrument than the group described on the right)

Confounded variables

Test instrument bias?

Even if we thought the post-test used by Shamsulbahri and Zulkiply was perfectly valid as an assessment of topic knowledge, we might be concerned by knowing that learning is situated in a context – we recall better in a context similar to the one in which we learned.


How can we best assess students' learning about qualitative analysis?


So:

  • should we be concerned that the form of assessment, a paper-based instrument, is closer in nature to the DARTs learning experience than to the laboratory learning experience?

and, if so,

  • might this suggest a bias in the measurement instrument towards one treatment (i.e., DARTs)?

and, if so,

  • might a laboratory-based assessment have favoured the group that did the laboratory-based learning over the DARTs group, and led to different outcomes?

and, if so,

  • which approach to assessment has more ecological validity in this case: which type of assessment activity is a more authentic way of testing learning about a laboratory-based activity like qualitative chemical analysis?

A representation of my understanding of the experimental design

Can we generalise?

As always with small scale experiments into teaching, we have to judge the extent to which the specifics of the study might prevent us from generalising the findings – to be able to assume they would generally apply elsewhere. 3 Here, we are left to ask to what extent we can

  • ignore any undisclosed difference between the groups in levels of prior learning;
  • ignore any difference between the schools and their populations;
  • ignore any differences in teacher(s) (competence, confidence, teaching style, rapport with classes, etc.);
  • ignore any idiosyncrasies in the DARTs scheme of instruction;
  • ignore any idiosyncrasies in the scheme of laboratory instruction;
  • ignore any idiosyncrasies (and potential biases) in the assessment instrument and its marking scheme and their application;

And, if we decide we can put aside any concerns about any of those matters, we can safely assume that (in learning this topic at this level)

  • 5 sessions of learning by DARTs is more effective than 3 sessions of laboratory learning.

Then we only have to decide if that is because

  • (i) DARTs activities teach more about this topic at this level than laboratory activities, or
  • (ii) some or all of the difference in learning outcomes is simply because 150 minutes of study (broken into five blocks) has more effect than 90 minutes of study (broken into three blocks).

What do you think?


Work cited:

Notes:

1 The authors refer to the conditions as

  • Experimental control group
  • DARTs
  • combination of Experiment + DARTs

I am referring to the first group as 'laboratory' for two reasons: first, because it is not clear the students were doing any experiments (that is, testing hypotheses), as the practical activity was learning to undertake standard analytical tests; and, second, to avoid confusion between the educational experiment and the laboratory practicals.


2 I think the reference to "no issue of pre-test treatment interaction" is probably meant to suggest that, as all students took the same pre-test, it will have had the same effect on all participants. But this not only ignores the potential effect of any differences in prior knowledge reflected in the pre-test scores (which might influence subsequent learning), but also assumes the effect of taking the pre-test was neutral – when for some learners it may merely have told them they knew nothing about the topic, whilst for others it activated, and so reinforced, some prior knowledge of the subject. In principle, the interaction between prior knowledge and taking the pre-test could have influenced learning at both cognitive and affective levels: that is, both in terms of consolidation of prior learning and cuing for the new learning; and in terms of a learner's confidence in, and attitude towards, learning the topic.


3 Even when we do have a representative sample of a population to test, we can only infer that the outcomes of an experiment reflect what will be most likely for members (schools, learners, classes, teachers…) of the wider population. Individual differences are such that we can never say that what most probably is the case will always be the case.
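
To put a rough number on that caveat, here is a small illustrative sketch in Python (the average benefit and the class-to-class variation are invented figures, not data from any study discussed here): an innovation that clearly helps 'on average' across a population of classes can still leave a sizeable minority of individual classes worse off.

```python
# A minimal simulation (invented, purely illustrative numbers) of the point in note 3:
# an intervention that raises the average outcome across a population of classes does
# not guarantee a better outcome for every individual class.
import random

random.seed(1)

N_CLASSES = 10_000        # hypothetical population of classes
AVERAGE_BENEFIT = 5.0     # assumed mean gain from the intervention (test-score points)
CLASS_VARIATION = 10.0    # assumed spread of gains from class to class

# Draw the gain each class would actually experience.
gains = [random.gauss(AVERAGE_BENEFIT, CLASS_VARIATION) for _ in range(N_CLASSES)]

mean_gain = sum(gains) / N_CLASSES
share_worse_off = sum(g < 0 for g in gains) / N_CLASSES

print(f"mean gain across the population: {mean_gain:.1f} points")
print(f"proportion of classes that would do worse: {share_worse_off:.0%}")
# With these invented figures roughly three classes in ten lose out, even though
# 'on average' the intervention clearly helps.
```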


When an experiment tests a sample drawn at random from a wider population, then the findings of the experiment can be assumed to apply (on average) to the population. (Source: after Taber, 2019).

Experimental pot calls the research kettle black

Do not enquire as I do, enquire as I tell you


Keith S. Taber


Sotakova, Ganajova and Babincakova (2020) rightly criticised experiments into enquiry-based science teaching on the grounds that such studies often used control groups where the teaching methods had "not been clearly defined".

So, how did they respond to this challenge?

Consider a school science experiment where students report comparing the rates of reaction of 1 cm strips of magnesium ribbon dropped into:
(a) 100 ml of hydrochloric acid of 0.2 mol/dm³ concentration at a temperature of 28 °C; and
(b) some unspecified liquid.


This is a bit like someone who wants to check they are not diabetic, but – being worried they are – dips the test strip in a glass of tap water rather than their urine sample.


Basic premises of scientific enquiry and reporting are that

  • when carrying out an experiment one should carefully manage the conditions (which is easier in laboratory research than in educational enquiry) and
  • one should offer detailed reports of the work carried out.

In science there is an ideal that a research report should be detailed enough to allow other competent researchers to repeat the original study and verify the results reported. That repeating and checking of existing work is referred to as replication.

Replication in science

In practice, replication is more problematic than this ideal suggests, for both principled and pragmatic reasons.

It is difficult to communicate tacit knowledge

It has been found that when a researcher develops some new technique, the official report in the literature is often inadequate to allow researchers elsewhere to repeat the work based only on the published account. The sociologist of science Harry Collins (1992) has explored how there may be minor (but critical) details about the setting-up of apparatus or laboratory procedures that the original researchers did not feel were significant enough to report – or even that the researchers had not been explicitly aware of. Replication may require scientists to physically visit each other's laboratories to learn new techniques.

This should not be surprising, as the chemist and philosopher Michael Polanyi (1962/1969) long ago argued that science relied on tacit knowledge (sometimes known as implicit knowledge) – a kind of 'green fingers' of the laboratory, where people learn ways of doing things more as 'muscle memory' than as formal procedural rules.

Novel knowledge claims are valued

The other problem with replication is that there is little to be gained for scientists by repeating other people's work if they believe it is sound, as journals put a premium on research papers that claim to report original work. Even if it proves possible to publish a true replication (at best, in a less prestigious journal), the replication study will just be an 'also ran' in the scientific race.


Copies need not apply!

Scientific kudos and rewards go to those who produce novel work: originality is a common criterion used when evaluating reports submitted to research journals.

(Image by Tom from Pixabay)


Historical studies (Shapin & Schaffer, 2011) show that what actually tends to happen is that scientists – deliberately – do not exactly replicate published studies, but rather make adjustments to produce a modified version of the reported experiment. A scientist's mindset is not to confirm, but to seek a new, publishable result:

  • they say it works for tin, so let's try manganese?
  • they did it in frogs, let's see if it works in toads?
  • will we still get that effect closer to the boiling point?
  • the outcome in broad spectrum light has been reported, but might monochromatic light of some particular frequency be more efficient?
  • they used glucose, we can try fructose

This extends (or finds the limits of) the range of application of scientific ideas, and allows the researchers to seek publication of new claims.

I have argued that the same logic is needed in experimental studies of teaching approaches, but this requires researchers to detail the context of their studies rather better than many do (e.g., not just 'twelve-year-olds in a public school in country X'):

"When there is a series of studies testing the same innovation, it is most useful if collectively they sample in a way that offers maximum information about the potential range of effectiveness of the innovation. There are clearly many factors that may be relevant. It may be useful for replication studies of effective innovations to take place with groups of different socio-economic status, or in different countries with different curriculum contexts, or indeed in countries with different cultural norms (and perhaps very different class sizes; different access to laboratory facilities) and languages of instruction …It may be useful to test the range of effectiveness of some innovations in terms of the ages of students, or across a range of quite different science topics. Such decisions should be based on theoretical considerations.

…If all existing studies report positive outcomes, then it is most useful to select new samples that are as different as possible from those already tested…When existing studies suggest the innovation is effective in some contexts but not others, then the characteristics of samples/context of published studies can be used to guide the selection of new samples/contexts (perhaps those judged as offering intermediate cases) that can help illuminate the boundaries of the range of effectiveness of the innovation."

Taber, 2019, pp.104-105

When scientists do relish replication

The exception that tests the 'scientists do not simply replicate' rule is when it is suspected that a research finding is wrong. Then, an attempt at replication might be used to show a published account is flawed.

For example, when 'cold fusion' was announced with much fanfare (ahead of the peer-reviewed publications reporting the research), many scientists simply thought it highly unlikely that atomic energy generation was going to be possible in fairly standard glassware (not that unlike the beakers and flasks used in school science) at room temperature, and so saw the challenge as finding out what the original researchers had got wrong.

"When it was claimed that power could be generated by 'cold fusion', scientists did not simply accept this, but went about trying it for themselves…Over a period of time, a (near) consensus developed that, when sufficient precautions were made to measure energy inputs and outputs accurately, there was no basis for considering a new revolutionary means of power generation had been discovered.

Taber, 2020, p.18

Of course, one failed replication might just mean the second team did not quite do the experiment correctly, so it may take a series of failed replications to make the point. In this situation, being the first failed replication of many (so being first to correct the record in the literature) may bring prestige – but this also invites the risk of being the only failed replication (so, perhaps, being judged a poorly executed replication) if other researchers subsequently confirm the findings of the original study!

So, a single attempt at replication is neither enough to definitively verify nor to reject a published result. What all this does show is that the simple notion that there are crucial or critical experiments in science which, once reported, immediately 'prove' something for all time is a naïve oversimplification of how science works.

Experiments in education

Experiments are often the best way to test ideas about natural phenomena. They tend to be much less useful in education as there are often many potentially relevant variables that usually cannot be measured, let alone controlled, even if they can be identified.

  • Without proper control, you do not have a meaningful experiment.
  • Without a detailed account of the different treatments, and so how the comparison condition is different from the experimental condition, you do not have a useful scientific report, but little more than an anecdote.
Challenges of experimental work in classrooms

Despite this, the research literature includes a vast number of educational studies claiming to be experiments to test this innovation or that (Taber, 2019). Some are very informative. But many are so flawed in design or execution that their conclusions rely more on the researchers' expectations than on a logical chain of argument from robust evidence. They often use poorly managed experimental conditions to find differences in learning outcomes between groups of students that are initially not equivalent. 1 (Poorly managed? Because there are severe – practical and ethical – limits on the variables you can control in a school or college classroom.)

Read about expectancy effects in research

Statistical tests are then used which would be informative had there been a genuinely controlled experiment with identical starting points and only the variable of interest differing between the two conditions. Results are claimed by ignoring the inconvenient fact that the statistical tests used, strictly, do not apply in the actual conditions studied! Worse than this, occasionally the researchers think they should have got a positive result and so claim one even when the statistical tests suggest otherwise (e.g., read 'Falsifying research conclusions')! In order to try and force a result, a supposed innovation may be compared with control conditions that have been deliberately framed to ensure the learners in that condition are not taught well!

Read about unethical control conditions

A common problem is that it is not possible to randomise students to conditions, so classes, rather than students, are assigned to treatments randomly. As there are usually only a few classes in each condition (indeed, often only one class in each condition), there are not enough 'units of analysis' to validly use statistical tests. A common solution to this common problem is…to do the tests anyway, as if there had been randomisation of learners. 2 The computer that crunches the numbers follows a program that has been written on the assumption researchers will not cheat, so it churns out statistical results and (often) reports significant outcomes due to a misuse of the tests. 3

This is a bit like someone who wants to check they are not diabetic, but – being worried they are – dips the test strip in a glass of tap water rather than their urine sample. They cannot blame the technology for getting it wrong if they do not follow the proper procedures.

I have been trying to make a fuss about these issues for some time, because a lot of the results presented in the educational literature are based upon experimental studies that, at best, do not report the research in enough detail, and often, when there is enough detail to be scrutinised, fall well short of valid experiments.

I have a hunch that many people with scientific training are so convinced of the superiority of the experimental method, that they tacitly assume it is better to do invalid experiments into teaching, than adopt other approaches which (whilst not as inherently convincing as a well-designed and executed experiment) can actually offer useful insights in the complex and messy context of classrooms. 4

Read: why do natural scientists tend to make poor social scientists?

So, it is uplifting when I read work which seems to reflect my concerns about the reliance on experiments in those situations where good experiments are not feasible. In that regard, I was reading a paper reporting a study into enquiry-based teaching (Sotakova, Ganajova & Babincakova, 2020) where the authors made the very valid criticism:

"The ambiguous results of research comparing IBSE [enquiry-based science education] with other teaching methods may result from the fact that often, [sic] teaching methods used in the control groups have not been clearly defined, merely referred to as "traditional teaching methods" with no further specification, or there has been no control group at all."

Sotakova, Ganajova & Babincakova, 2020, p.500

Quite right!


The pot calling the kettle black

idiom "that means people should not criticise someone else for a fault that they have themselves" 5 (https://dictionary.cambridge.org/dictionary/english/pot-calling-the-kettle-black)

(Images by OpenClipart-Vectors from Pixabay)


Now, I do not want to appear to be the pot calling the kettle black myself, so before proceeding I should acknowledge that I was part of a major funded research project exploring a teaching innovation in lower secondary science and maths teaching. Despite a large grant, the need to enrol enough classes to randomise to treatments (so as to allow statistical testing) meant that we had very limited opportunities to observe, and so detail, the teaching in the control condition. That condition was basically the teachers doing their normal teaching, whilst the teachers of the experimental classes were asked to follow a particular scheme of work.


Results from a randomised trial showing the range of within-condition outcomes (After Figure 5, Taber, 2019)

In the event, the electricity module I was working on produced almost identical mean outcomes to the control condition (see the figure). The spread of outcomes was large in both conditions – so, clearly, there were substantial differences between individual classes that influenced learning. Indeed, these differences were even more extreme in the condition where the teachers were supposed to be teaching the same content, in the same order, with the same materials and activities, than in the control condition where teachers were free to do whatever they thought best!

The main thing I learned from this experience is that experiments into teaching are highly problematic.

Anyway, Sotakova, Ganajova and Babincakova were quite right to point out that experiments with poorly defined control conditions are inadequate. Consider a school science experiment designed by students who report comparing the rates of reaction of 1 cm strips of magnesium ribbon dropped into

  • (a) 100 ml of hydrochloric acid of 0.2 mol/dm³ concentration at a temperature of 28 °C; and
  • (b) some unspecified liquid.

A science teacher might be disappointed with the students concerned, given the limited informativeness of such an experiment – yet highly qualified science education researchers often report analogous experiments where some highly specified teaching is compared with instruction that is not detailed at all.

The pot decides to follow the example of the kettle

So, what did Sotakova and colleagues do?

"Pre-test and post-test two-group design was employed in the research…Within a specified period of time, an experimental intervention was performed within the experimental group while the control group remained unaffected. The teaching method as an independent variable was manipulated to identify its effect on the dependent variable (in this case, knowledge and skills). Both groups were tested using the same methods before and after the experiment…both groups proceeded to revise the 'Changes in chemical reactions' thematic unit in the course of 10 lessons"

Sotakova, Ganajova & Babincakova, 2020, pp.501, 505.

In the experimental condition, enquiry-based methods were used in five distinct activities as a revision approach (an example activity is detailed in the paper). What about the control conditions?

"…in the control group IBSE was not used at all…In the control group, teachers revised the topic using methods of their choice, e.g. questions & answers, oral and written revision, textbook studying, demonstration experiments, laboratory work."

Sotakova, Ganajova & Babincakova, 2020, pp.502, 505

So, the 'control' condition involved the particular teachers in that condition doing as they wished. The only control seems to be that they were asked not to use enquiry. Otherwise, anything went – and that anything was not necessarily typical of what other teachers might have done. 6

This might have involved any of a number of different activities, such as

  • questions and answers
  • oral and written revision
  • textbook studying
  • demonstration experiments
  • laboratory work

or combinations of them. Call me picky (or a blackened pot), but did these authors not complain that

"The ambiguous results of research comparing IBSE [enquiry-based science education] with other teaching methods may result from the fact that often…teaching methods used in the control groups have not been clearly defined…"

Sotakova, Ganajova & Babincakova, 2020, p.500

Hm.


Work cited

Notes:

1 A very common approach is to use a pre-test to check for significant differences between classes before the intervention. Where differences between groups do not reach the usual criterion for being statistically significant (probability, p<0.05) the groups are declared 'equivalent'. That is, a negative result in a test for unlikely differences is treated inappropriately as an indicator of equivalence (Taber, 2019).

Read about testing for initial equivalence
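
As an illustration of why that inference is inappropriate, here is a small sketch in Python (the pre-test marks are invented, not taken from any study discussed here): with two small intact classes, a difference of two marks in mean prior attainment can easily fail to reach p < 0.05, so reporting 'no significant difference' as 'equivalence' mistakes a lack of statistical power for evidence that the groups started equal.

```python
# A minimal sketch (invented pre-test scores) of the problem described in note 1:
# with small intact classes, a real difference in prior attainment can easily fail
# to reach p < 0.05, yet 'no significant difference' is then reported as though it
# showed the groups were equivalent.
from scipy import stats

# Hypothetical pre-test marks (out of 20) for two classes of ten students.
class_a = [6, 7, 8, 9, 10, 10, 11, 12, 13, 14]    # mean 10
class_b = [8, 9, 10, 11, 12, 12, 13, 14, 15, 16]  # mean 12

difference = sum(class_b) / len(class_b) - sum(class_a) / len(class_a)
t_stat, p_value = stats.ttest_ind(class_a, class_b)

print(f"difference in mean pre-test score: {difference:.1f} marks")
print(f"two-tailed p-value: {p_value:.2f}")
# p comes out around 0.1 here: 'not statistically significant', but that only reflects
# the lack of power with n = 10 per class - it is not evidence that the classes were
# equivalent. A genuine equivalence test (e.g. two one-sided tests against a
# pre-specified 'negligible difference' margin) asks a different question.
```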


2 So, for example, a valid procedure may be to enter the mean class scores on some instrument as data, but what are actually entered are the individual students' scores, as though the students can be treated as independent units rather than members of a treatment class.

Some statistical tests lead to a number (the statistic) which is then compared with the critical value that reaches statistical significance, as listed in a table. The number selected from the table depends on the number of 'degrees of freedom' in the experimental design. Often that should be determined by the number of classes involved in the experiment – but if instead the number of learners is used, a much smaller value of the calculated statistic will seem to reach significance.
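
To see how much the choice of unit matters, here is a small sketch in Python (the scores are entirely invented): the very same data give a 'non-significant' result when the three classes per condition are the units of analysis, but an apparently 'significant' one when the sixty individual students are treated as independent units.

```python
# A minimal sketch (entirely invented scores) of the 'units of analysis' problem:
# whole classes are assigned to conditions, so the defensible comparison is between
# class means (three units per condition), but analysing the same scores as if the
# individual students were independent units gives many more degrees of freedom.
from scipy import stats

# Three intact classes per condition; each class mean is hypothetical, and every
# class is given the same spread of ten students around its mean.
within_class_spread = [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9]
control_class_means = [50, 54, 58]
treatment_class_means = [56, 60, 64]

control_students = [m + d for m in control_class_means for d in within_class_spread]
treatment_students = [m + d for m in treatment_class_means for d in within_class_spread]

# Class as the unit that was (quasi-)randomised: 3 vs 3 class means.
_, p_classes = stats.ttest_ind(control_class_means, treatment_class_means)
# Common but invalid shortcut: every student treated as an independent unit.
_, p_students = stats.ttest_ind(control_students, treatment_students)

print(f"classes as units  (df = 4):  p = {p_classes:.2f}")   # about 0.14 - not significant
print(f"students as units (df = 58): p = {p_students:.3f}")  # about 0.001 - 'significant'
```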


3 Some of these studies would surely have given positive outcomes even if they had been able to randomise students to conditions or used a robust test for initial equivalence – but we cannot use that as a justification for ignoring the flaws in the experiment. That would be like claiming a laboratory result was obtained with dilute acid when actually concentrated acid was used – and then justifying the claim by arguing that the same result might have occurred with dilute acid.


4 Consider, for example, a case study that involves researchers in observing teaching, interviewing students and teachers, documenting classroom activities, recording classroom dialogue, collecting samples of student work, etc. This type of enquiry can offer a good deal of insight into the quality of teaching and learning in the class and the processes at work during instruction (and so whether specific outcomes seem to be causally linked to features of the innovation being tested).

Critics of so-called qualitative methods quite rightly point out that such approaches cannot actually show any one approach is better than others – only experiments can do that. Ideally, we need both types of study as they complement each other offering different kinds of information.

The problem with many experiments reported in the education literature is that because of the inherent challenges of setting up genuinely fair testing in educational contexts they are not comparing like with like, and often it is not even clear what the comparison is with! Probably this can only be avoided in very large scale (and so expensive) studies where enough different classrooms can be randomly assigned to each condition to allow statistics to be used.

Why do researchers keep undertaking small scale experimental studies that often lack proper initial equivalence between conditions, and that often have inadequate control of variables? I suggest they will continue to do so as long as research journals continue to publish the studies (and allow them to claim definitive conclusions) regardless of their problems.


5 At a time when cooking was done on open fires, using wood that produced much smoke, the idiom was likely easily understood. In an age of ceramic hobs and electric kettles the saying has become anachronistic.

From the perspective of thermal physics, black cooking pots (rather than shiny reflective surfaces) may be a sensible choice.


6 So, the experimental treatment was being compared with the current standard practice of the teachers assigned to the control condition. It would not matter so much that this varies between teachers, nor that we do not know what that practice is, if we could be confident that the teachers in the control condition were (or were very probably) a representative sample of the wider population of teachers – such as a sufficiently large number of teachers randomly chosen from the wider population (Taber, 2019). Then we would at least know whether the enquiry based approach was an improvement on current common practice.

All we actually know is how the experimental condition fared in comparison with the unknown practices of a small number of teachers who may or may not have been representative of the wider population.