Methodological and procedural flaws in published study

A letter to the editor of the Journal of Chemical Education

the authors draw a conclusion which is contrary to the results of their data analysis and so is invalid and misleading

I have copied below the text of a letter I wrote to the editor of the Journal of Chemical Education, to express my concern about the report of a study published in that journal. I was invited to formally submit the letter for consideration for publication. I did. Following peer review it was rejected.

Often when I see apparent problems in published research, I discuss them here. Usually, the journals concerned are predatory, and do not seem to take peer review seriously. That does not apply here. The Journal of Chemical Education is a long-established, well-respected periodical published by a national learned scientific society: the American Chemical Society. Serious scientific journals often do publish comments from readers about published articles, and even exchanges between correspondents and the original authors of the work commented on. I therefore thought it was more appropriate to express my concerns directly to the journal. 𝛂 On this occasion, after peer review, the editor decided my letter was not suitable for publication. 𝛃

I am aware of the irony – I am complaining about an article that passed peer review in a posting that publishes a letter which was submitted for, but rejected after, peer review. Readers should bear that in mind. The editor will have carefully considered the submission and the referee recommendations and reports, and decided to decline publication based on journal policy and the evaluation of my submission.

However, having read the peer reviewers' comments (which were largely positive about the submission and tended to agree with my critique 𝜸), I saw no reason to change my mind. If such work is allowed to stand in the literature without comment, it provides a questionable example for other researchers, and, as the abstracts and conclusions of research papers are often read in isolation (so, here, without readers being aware that the conclusions contradict the results), it distorts the research literature.

To my reading, the published study sets aside accepted scientific standards and values – though I very much suspect inadvertently. Perhaps the authors' enthusiasm for their teaching innovation affected their judgement and dulled their critical faculties. We are all prone to that; but one would normally expect such a major problem to have been spotted in peer review, allowing the authors the opportunity to put this right before publication.

Read about falsifying research conclusions


Methodological and procedural flaws in published study

Abstract

A recent study reported in the journal is presented as an experimental test of a teaching innovation. Yet the research design does not meet the conditions for an experiment as there is insufficient control of variables and no random assignment to conditions. The study design used does not allow a comparison of student scores in the 'experimental' and 'control' conditions to provide a valid test of the innovation. Moreover, the authors draw a conclusion which is contrary to the results of their data analysis and so is invalid and misleading. While the authors may well feel justified in putting aside the outcome of their statistical analysis, this goes against good scientific norms and practice.

Dear Editor

I am writing regarding a recent article published in J.Chem.Ed. 1, as I feel the reporting of this study, as published, is contrary to good scientific practice. The article, 'Implementation of the Student-Centered Team-Based Learning Teaching Method in a Medicinal Chemistry Curriculum' reports an innovation in pedagogy, and as such is likely to be of wide interest to readers of the journal. I welcome both this kind of work in developing pedagogy and its reporting to inform others; however, I think the report contravenes normal scientific standards.

Although the authors do not specify the type of research methodology they use, they do present their analysis in terms of 'experimental' and 'control' groups (e.g., p.1856), so it is reasonable to consider that they see this as a kind of experimental research. There are many serious challenges in applying experimental methods to social research, and it is not always feasible to address all such challenges in educational research designs 2 – but perhaps any report of educational experimental research should acknowledge the relevant limitations.

A true experiment requires units of analysis (e.g., students) to be assigned to conditions randomly, as this avoids (or, strictly, reduces the likelihood of) systematic differences between groups; a minimal sketch of what such assignment involves is given after the list below. Here the comparison is across different cohorts. These may be largely similar, but that cannot simply be assumed. (Strictly, the comparison group should not be labelled a 'control' group.2) There is clearly a risk of confounding variables.

  • Perhaps admission standards are changing over time?
  • Perhaps the teaching team has been acquiring teaching experience and expertise over time regardless of the innovation?
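
What the missing step would look like can be made concrete. The sketch below is a minimal illustration in Python, using an invented class roster (the roster, group sizes and variable names are my assumptions, not anything from the published study): it simply shows that random assignment allocates individuals to conditions by chance, which is what makes systematic between-group differences unlikely – and which has no equivalent when successive cohorts are compared.

    # Minimal sketch of random assignment (invented roster, not the study's students).
    import random

    students = [f"student_{i:02d}" for i in range(1, 61)]  # hypothetical roster of 60
    random.seed(1)            # seeded only so the illustration is reproducible
    random.shuffle(students)  # chance, not cohort membership, decides each student's condition

    experimental_group = students[:30]
    comparison_group = students[30:]  # strictly a 'comparison', not a 'control', group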

Moreover, if I have correctly interpreted the information on p.1858 about how, after the introduction of the innovation, student course scores derived in part from the novel activities of the new approach, then there is no reason to assume that the methodology for assigning scores is equivalent to that used in the 'control' (comparison) condition. The authors seem simply to assume that the change in scoring methodology will not of itself change the score profile. Without evidence that assessment is equivalent across cohorts, this is an unsupportable assumption.

As it is not possible to 'blind' teachers and students to conditions, there is a very real risk of expectancy effects, which have often been shown to operate when researchers are positive about an innovation – when introducing the investigated innovation, teachers

  • may have a new burst of enthusiasm,
  • may focus more than usual on this aspect of their work,
  • may be more sensitive to students' responses to teaching, and so forth.

(None of this needs to be deliberate to potentially influence outcomes.) Although (indeed, perhaps because) there is often little that can be done in a teaching situation to address these challenges to experimental designs, it seems appropriate for suitable caveats to be included in a published report. I would have expected to see such caveats here.

However, a specific point that I feel must be challenged concerns the presentation of results on p.1860. When designing an experiment, it is important to specify, before collecting data, how one will know what to conclude from the results. The adoption of inferential statistics is surely a commitment to accepting the outcomes of the analysis undertaken. Li and colleagues tell readers that "We used a t test to test whether the SCTBL method can create any significant difference in grades among control groups and the experimental group" and that "there is no significant difference in average score". This is despite the new approach requiring an "increased number of study tasks, and longer preclass preview time" (pp.1860-1).
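
To make that commitment concrete: the sketch below (Python with SciPy; the scores are invented, and the code is only a generic independent-samples t test, not a reconstruction of the authors' actual calculation) shows the decision rule that adopting such a test entails. If the p-value does not fall below the significance level fixed in advance, the analysis provides no warrant for claiming that the groups differ.

    # Illustrative only: invented scores, NOT the published study's data.
    from scipy import stats

    comparison_cohort = [78, 81, 79, 83, 80, 77, 82, 79, 81, 80]    # hypothetical exam scores
    experimental_cohort = [80, 82, 79, 84, 81, 78, 83, 80, 82, 81]  # hypothetical exam scores

    alpha = 0.05  # significance level chosen before inspecting the data
    result = stats.ttest_ind(experimental_cohort, comparison_cohort)

    if result.pvalue < alpha:
        print(f"p = {result.pvalue:.3f}: a statistically significant difference")
    else:
        # The situation reported in the paper: no significant difference, so the
        # analysis does not support a claim that the innovation raised scores.
        print(f"p = {result.pvalue:.3f}: no significant difference detected")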

I would not suggest this is necessarily a good enough reason for Li and colleagues to give up on their innovation, as they have lived experience of how it is working, and that may well offer good grounds for continuing to implement, refine, and evaluate it. As the authors themselves note, evaluation "should not only consider scores" (p.1858).

However, from a scientific point of view, this is a negative result. That certainly should not preclude publication (it is recognised that there is a bias against publishing negative results, which distorts the literature in many fields), but it suggests, at the very least, that more work is needed before a positive conclusion can be drawn.

Therefore, I feel it is scientifically invalid for the authors to argue that as "the average score showed a constant [i.e., non-significant] upward trend, and a steady [i.e., non-significant] increase was found" they can claim their teaching "method brought about improvement in the class average, which provides evidence for its effectiveness in medicinal chemistry". Figure 4 reiterates this: a superficially impressive graphic, even if it omits the 2018 data, actually shows just how little scores changed once it is noticed that the x-axis has a range of only 79.4-80.4 (%, presumably). The size of the variation across four cohorts (<1%, "an obvious improvement trend"?) is not only found not to be significant, but can be compared with how 25% of student scores apparently derived from different types of assessment in the different conditions. 3
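
For a rough sense of scale only (the paper does not report within-cohort standard deviations, so the spread assumed below is purely my illustrative guess), a shift of under one percentage point in cohort means corresponds to a standardised effect far below what is conventionally regarded as even a 'small' effect:

    # Rough scale check: every number here is an assumption for illustration.
    largest_shift_in_means = 1.0     # < 1 percentage point across the four cohorts
    assumed_within_cohort_sd = 10.0  # a plausible spread for percentage exam scores

    # Cohen's d expresses the difference in means in standard-deviation units;
    # 0.2 is the usual threshold for a 'small' effect.
    cohens_d = largest_shift_in_means / assumed_within_cohort_sd
    print(f"Standardised effect size (Cohen's d) = {cohens_d:.2f}")  # 0.10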

To reiterate, this is an interesting study, reporting valuable work. There might be very good reasons to continue the new pedagogic approach even if it does not increase student scores. However, I would argue that it is simply scientifically inadmissible to design an experiment where data will be analysed by statistical tests, and then to offer a conclusion contrary to the results of those tests. A reader who skipped to the end of the paper would find "To conclude, our results suggest that the SCTBL method is an effective way to improve teaching quality and student achievement" (p.1861) but that is to put aside the results of the analysis undertaken.


Keith S. Taber

Emeritus Professor of Science Education, University of Cambridge

References

1 Li, W., Ouyang, Y., Xu, J., & Zhang, P. (2022). Implementation of the Student-Centered Team-Based Learning Teaching Method in a Medicinal Chemistry Curriculum. Journal of Chemical Education, 99(5), 1855-1862. https://doi.org/10.1021/acs.jchemed.1c00978

2 Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. https://doi.org/10.1080/03057267.2019.1658058

3 I felt there was some ambiguity regarding what figures 4a and 4b actually represent. The labels suggest they refer to "Assessment levels of pharmaceutical engineering classes [sic] in 2017-2020" and "Average scores of the medicinal chemistry course in the control group and the experimental group" (which might, by inspection, suggest that achievement on the medicinal chemistry course is falling behind shifts across the wider programme), but the references in the main text suggest that both figures refer only to the medicinal chemistry course, not the wider pharmaceutical engineering programme. Similarly, although the label for (b) refers to 'average scores' for the course, the text suggests the statistical tests were only applied to 'exam scores' (p.1858), which would amount to only 60% of the marks comprising the course scores (at least in 2018-2020; the information on how course scores were calculated for the 2017 cohort does not seem to be provided, but clearly could not follow the methodology reported for the 2018-2020 cohorts). So, given that (a) and (b) do not seem consistent, it may be that the 'average scores' in (b) refer only to examination scores and not overall course scores. If so, that would at least suggest the general assessment methodology was comparable, as long as the setting and marking of examinations are equivalent across different years. However, even then, a reader would take a lot of persuasion that examination papers and marking are so consistent over time that changes of a third or half a percentage point between cohorts exceed likely measurement error.


Read: Falsifying research conclusions. You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.


Notes:

𝛂 This is the approach I have taken previously. For example, a couple of years ago a paper was published in the Royal Society of Chemistry's educational research journal, Chemistry Education Research and Practice, which to my reading had similar issues, including claiming "that an educational innovation was effective despite outcomes not reaching statistical significance" (Taber, 2020).

Taber, K. S. (2020). Comment on "Increasing chemistry students' knowledge, confidence, and conceptual understanding of pH using a collaborative computer pH simulation" by S. W. Watson, A. V. Dubrovskiy and M. L. Peters, Chem. Educ. Res. Pract., 2020, 21, 528. Chemistry Education Research and Practice. doi:10.1039/D0RP00131G


𝛃 I wrote directly to the editor, Prof. Tom Holme, on 12th July 2022. I received a reply the next day, inviting me to submit my letter through the journal's manuscript submission system. I did this on the 14th.

I received the decision letter on 15th September. (The "manuscript is not suitable for publication in the Journal of Chemical Education in its present form.") The editor offered to consider a resubmission of "a thoroughly rewritten manuscript, with substantial modification, incorporating the reviewers' points and including any additional data they recommended". I decided that, although I am sure the letter could have been improved in some respects, any new manuscript sufficiently different to be considered a "thoroughly rewritten manuscript, with substantial modification" would not so clearly make the important points I felt needed to be made.


𝜸 There were four reviewers. The editor informed me that the initial reviews led to a 'split' perspective, so a fourth referee was invited.

  • Referee 1 recommended that the letter be published as submitted.
  • Referee 2 recommended that the letter be published as submitted.
  • Referee 3 recommended major revisions should be undertaken.
  • Referee 4 recommended rejection.

Read more about peer review and editorial decisions

