peer review - Science-Education-Research

The best science education journal

Where is the best place to publish science education research?

Keith S. Taber

Outlet	Description	Notes
International Journal of Science Education	Top-tier general international science education journal	Historically associated with the European Science Education Research Association
Science Education	Top-tier general international science education journal
Journal of Research in Science Teaching	Top-tier general international science education journal	Associated with NARST
Research in Science Education	Top-tier general international science education journal	Associated with the Australasian Science Education Research Association
Studies in Science Education	Leading journal for publishing in-depth reviews of topics in science education
Research in Science and Technological Education	Respected general international science education journal
International Journal of Science and Maths Education	Respected general international science education journal	Founded by the National Science and Technology Council, Taiwan
Science Education International	Publishes papers that focus on the teaching and learning of science in school settings ranging from early childhood to university education	Published by the International Council of Associations for Science Education
Science & Education	Has foci of historical, philosophical, and sociological perspectives on science education	Associated with the International History, Philosophy, and Science Teaching Group
Journal of Science Teacher Education	Concerned with the preparation and development of science teachers	Associated with the Association for Science Teacher Education
International Journal of Science Education, Part B – Communication and Public Engagement	Concerned with research into science communication and public engagement / understanding of science
Cultural Studies of Science Education	Concerned with science education as a cultural, cross-age, cross-class, and cross-disciplinary phenomenon
Journal of Science Education and Technology	Concerned with the intersection between science education and technology
Disciplinary and Interdisciplinary Science Education Research	Concerned with science education within specific disciplines and between disciplines	Affiliated with the Faculty of Education, Beijing Normal University
Journal of Biological Education	For research specifically within biology education	Published for the Royal Society of Biology
Journal of Chemical Education	A long-standing journal of chemistry education, which includes a section for Chemistry Education Research papers	Published by the American Chemical Society
Chemistry Education Research and Practice	The leading research journal for chemistry education	Published by the Royal Society of Chemistry

Some of the places to publish research in science education

I was recently asked which was the best journal in which to seek publication of science education research. This was a fair question, given that I had been warning of the large number of low-quality journals now diluting the academic literature.

I had been invited to give a seminar talk to the Physics Education and Scholarship Section in the Department of Physics at Durham University. I had been asked to talk on the theme of 'Publishing research in science education'.

The talk considered the usual processes involved in submitting a paper to a research journal and the particular responsibilities involved for authors, editors and reviewers. In the short time available I said a little about ethical issues, including difficulties that can arise when scholars are not fully aware of, or decide to ignore, the proper understanding of academic authorship ¹. I also discussed some of the specific issues that can arise when those with research training in the natural sciences undertake educational research without any further preparation (for example, see: Why do natural scientists tend to make poor social scientists?), such as underestimating the challenge of undertaking valid experiments in educational contexts.

I had not intended to offer advice on specific journals for the very good reasons that

there are a lot of journals
my experience of them is very uneven
I have biases!
knowledge of journals can quickly become out of date when publishers change policies, or editorial teams change

However, it was pointed out that there does not seem to be anywhere where such advice is readily available, so I made some comments based on my own experience. I later reflected that some such guidance could be useful, especially to those new to research in the area.

I do, in the 'Research methodology' section of the site, offer some advice to the new researcher on 'Publishing research', that includes some general advice on things to consider when thinking about where to send your work:

Read about 'Selecting a research journal: Selecting an outlet for your research articles'

Although I name check some journals there, I did not think I should offer strong guidance for the reasons I give above. However, taking on board the comment about the lack of guidance readily available, I thought I would make some suggestions here, with the full acknowledgement that this is a personal perspective, and that the comments facility below will allow other views and potential correctives to my biases! If I have missed an important journal, or seem to have made a misjudgement, then please tell me and (more importantly) other readers who may be looking for guidance.

Publishing in English?

My focus here is on English language journals. There are many important journals that publish in other languages such as Spanish. However, English is often seen as the international language for reporting academic research, and most of the journals with the greatest international reach work in the English language.

These journals publish work from all around the world, which therefore includes research into contexts where the language of instruction is NOT English, and where data is collected, and often analysed, in the local language. In these cases, reporting research in English requires translating material (curriculum materials, questions posed to participants, quotations from learners etc.) into English. That is perfectly acceptable, but translation is a skilled and nuanced activity, and needs to be acknowledged and reported, and some assurance of the quality of translation offered (Taber, 2018).

Read about guidelines for good practice regarding translation in reporting research

Science research journal or science education journal?

Sometimes science research journals will publish work on science education. However, not all science journals will consider this, and even for those that do, this tends to be an occasional event.

With the advent of open-access, internet-accessible publishing, some academic publishers are offering journals with very wide scope (presumably as it is considered that in the digital age it is easier to find research without it needing to be in a specialist journal). However, authors should be wary of journals that have titles implying a specialist scientific focus but which seem to accept material from a wide range of fields, as this is one common indicator of predatory journals – that is, journals which do not use robust peer review (despite what they may claim) and have low quality standards.

Read about predatory journals

There are some scientific journals with an interdisciplinary flavour which are not education journals per se, but are open to suitable submissions on educational topics. The one I am most familiar with (disclosure of interest: I am on the Editorial Board) is Foundations of Chemistry (published by Springer).

"Foundations of Chemistry is an international journal is an interdisciplinary forum in which chemists, biochemists, philosophers, historians, educators and sociologists discuss conceptual and fundamental issues which relate to the `central science' of chemistry."

Science Education Journal or Education Journal?

Then, there is the question of whether to publish work in specialist science education journals or one of the many more general education journals. (There are too many to discuss them here.) General education journals will sometimes publish work from within science education, as long as they feel it is of high enough general interest to their readership. This may in part be a matter of presentation – if the paper is written so it is only understandable to subject specialists, and only makes recommendations for specialists in science education, it is unlikely to seem suitable for a more general journal.

On the other hand, just because research has been undertaken in a science teaching and learning context, this may not make it of particular interest to science educators if the research aims, conceptualisation, conclusions and recommendations concern general educational issues, and anything that may be specific to science teaching and learning is ignored in the research – that is, if a science classroom was chosen just as a matter of convenience, but the work could have been just as well undertaken in a different curriculum context (Taber, 2013).

Research Journal or Professional Journal?

Another general question is whether it is best to send one's work to an academic research journal (offering more kudos for the author(s) if published) or a journal widely read by practitioners (but usually considered less prestigious when a scholar's academic record is examined for appointment and promotion). These different types of output usually have different expectations about the tone and balance of articles:

Read about Research journals and practitioner journals

Some work is highly theoretical, or is focussed on moving forward a research field – and is unlikely to be seen as suitable for a teachers' journal. Other useful work may have developed and evaluated new educational resources, but without critically exploring any educational questions in any depth. Information about such a project would likely be of great interest to teachers, but is unlikely to meet the criteria to be accepted for publication in a research journal.

But what about a genuine piece of research that would be of interest to other researchers in the field, but also leads to strong recommendations for policy and practice? Here you do not have to choose one or other option. Although you cannot publish the same article in different journals, a research report sent to an academic journal and an article for teachers would be sufficiently different, with different emphases and weightings. For example, a professional journal does not usually want a critical literature review and discussion of details of data analysis, or long lists of references. But it may value vignettes that teachers can directly relate to, as well as exemplification of how recommendations might be followed through – information that would not fit in the research report.

Ideally, the research report would be completed and published first, and the article for the professional audience would refer to (and cite) this, so that anyone who does want to know more about the theoretical background and technical details can follow up.

Some examples of periodicals aimed at teachers (and welcoming work written by classroom teachers) include the School Science Review (published by the Association for Science Education), Physics Education (published by the Institute of Physics) and the Royal Society of Chemistry's magazine Education in Chemistry. Globally, there are many publications of this kind, often with a national focus serving teachers working in a particular curriculum context by offering articles directly relevant to the specifics of the local education contexts.

The top science education research journals

Having established that our work does fit in science education as a field, and would be considered academic research, we might consider sending it to one of these journals:

International Journal of Science Education (IJSE)
Science Education (SE)
Journal of Research in Science Teaching (JRST)
Research in Science Education (RiSE)

To my mind these are the top general research journals in the field.

IJSE is the journal I have most worked with, having published quite a few papers in the journal, and having reviewed a great many. I have been on the Editorial Board for about 20 years, so I may be biased here.² IJSE started as the European Journal of Science Education and has long had an association with the European Science Education Research Association (ESERA – not to be confused with ASERA).

Strictly this journal is now known as IJSE Part A, as there is also a Part B which has a particular focus on 'Communication and Public Engagement' (see below). IJSE is published by Taylor and Francis / Routledge.

SE is published by Wiley.

JRST is also published by Wiley, and is associated with NARST.

RiSE is published by Springer, and is associated with the Australasian Science Education Research Association (ASERA – not to be confused with ESERA).

N.A.R.S.T. originally stood for the National Association for Research in Science Teaching, where the Nation referred to was the USA. However, having re-branded itself as "a global organization for improving science teaching and learning through research" it is now simply known as NARST. In a similar way ESERA describes itself as "an European organisation focusing on research in science education with worldwide membership" and ASERA claims it "draws together researchers in science education from Australia, New Zealand and more broadly".

The top science education reviews journal

Another 'global' journal I hold in high esteem is Studies in Science Education (published by Taylor & Francis / Routledge) ³.

This journal, originally established at the University of Leeds and associated with the world famous Centre for Studies in Science Education ⁴, is the main reviews journal in science education. It publishes substantive, critical reviews of areas of science education, and some of the most influential articles in the field have been published here.

Studies in Science Education also has a tradition of publishing detailed scholarly book reviews.

In my view, getting your work published in any of these five journals is something to be proud of. I think people in many parts of the world tend to know IJSE best, but I believe that in the USA it is often considered to be less prestigious than JRST and SE. At one time RiSE seemed to have a somewhat parochial focus, and (my impression is) attracted less work from outside Australasia and its region – but that has changed now. 'Studies' seems to be better known in some contexts than others, but it is the only high status general science education journal that publishes full-length reviews (both systematic, and thematic perspectives), with many of its contributions exceeding the normal word-length limits of other top science education journals. This is the place to send an article based on that literature review chapter that thesis examiners praised for its originality and insight!

There are other well-established general journals of merit, for example Research in Science and Technological Education (published by Taylor & Francis / Routledge, and originally based at the University of Hull) and the International Journal of Science and Maths Education (published by Springer, and founded by the National Science and Technology Council, Taiwan). The International Council of Associations for Science Education publishes Science Education International.

There are also journals with particular foci within the field of science education.

More specialist titles

There are also a number of well-regarded international research journals in science education with particular specialisms or flavours.

Science & Education (published by Springer) is associated with the International History, Philosophy, and Science Teaching Group ⁵. As the name might suggest, it has a focus on the nature of science within science education, and "publishes research using historical, philosophical, and sociological approaches in order to improve teaching, learning, and curricula in science and mathematics".

The Journal of Science Teacher Education (published by Taylor & Francis / Routledge), as the name suggests, is concerned with the preparation and development of science teachers. The journal is associated with the USA-based Association for Science Teacher Education.

As suggested above, IJSE has a companion journal (also published by Taylor & Francis / Routledge): International Journal of Science Education, Part B – Communication and Public Engagement.

Cultural Studies of Science Education (published by Springer) has a particular focus on science education "as a cultural, cross-age, cross-class, and cross-disciplinary phenomenon".

The Journal of Science Education and Technology (published by Springer) has a focus on the intersection between science education and technology.

Disciplinary and Interdisciplinary Science Education Research has a particular focus on science taught within and across disciplines. ⁶ Whereas most of the journals described here are now hybrid (which means articles will usually be behind a subscription/pay-wall, unless the author pays a publication fee), DISER is an open-access journal, with publication costs paid on behalf of authors by the sponsoring organisation: the Faculty of Education, Beijing Normal University.

This relatively new journal reflects the increasing awareness of the importance of cross-disciplinary, interdisciplinary and transdisciplinary research in science itself. This is also reflected in notions of whether (or to what extent) science education should be considered part of a broader STEM education, and there are now journals styled as STEM education journals.

Science as part of STEM?

Read about STEM in the curriculum

Research within teaching and learning disciplines

Whilst both the Institute of Physics and the American Institute of Physics publish physics education journals (Physics Education and The Physics Teacher, respectively), neither publishes full-length research reports of the kind included in research journals. The American Physical Society does publish Physical Review Physics Education Research as part of its set of Physical Review Journals. This is an on-line journal that is Open Access, so authors have to pay a publication fee.

The Journal of Biological Education (published by Taylor and Francis/Routledge) is the education journal of the Royal Society of Biology.

The Journal of Chemical Education is a long-established journal published by the American Chemical Society. It is not purely a research journal, but it does have a section for educational research and has published many important articles in the field. ⁷

Chemistry Education Research and Practice (published by the Royal Society of Chemistry, RSC) is purely a research journal, and can be considered the top international journal for research specifically in chemistry education. (Perhaps this is why there is a predatory journal knowingly called the Journal of Chemistry Education Research and Practice.)

As CERP is sponsored by the RSC (which as a charity looks to use income to support educational and other valuable work), all articles in CERP are accessible for free on-line, but there are no publication charges for authors.

Not an exhaustive list!

These are the journals I am most familiar with, which focus on science education (or a science discipline education), publish serious peer-reviewed research papers, and can be considered international journals.

I know there are other discipline-based journals (e.g., biochemistry education, geology education) and indeed I expect there are many worthwhile places to publish that have slipped my mind or about which I am ignorant. Many regional or national journals have high standards and publish much good work. However, when it comes to research papers (rather than articles aimed primarily at teachers) academics usually get more credit when they publish in higher status international journals. It is these outlets that can best attract highly qualified editors and reviewers, and so peer review feedback tends to be most helpful⁸, and the general standard of published work tends to be of a decent quality – both in terms of technical aspects, and its significance and originality.

There is no reason why work published in English is more important than work published in other languages, but the wide convention of publishing research for an international audience in English means that work published in English language journals probably gets wider attention globally. I have published a small number of pieces in other languages, but am primarily limited by my own restricted competence to only one language. This reflects my personal failings more than the global state of science education publishing!

A personal take – other viewpoints are welcome

So, this is my personal (belated) response to the question about where one should seek to publish research in science education. I have tried to give a fair account, but it is no doubt biased by my own experiences (and recollections), and so inadvertently subject to distortions and omissions.

I welcome any comments (below) to expand upon, or seek to correct, my suggested list, which might indeed make this a more useful listing for readers who are new to publishing their work. If you have had good (or bad) experiences with science education journals included in, or omitted from, my list, please share…

Sources cited:

Taber, K. S. (2013). Three levels of chemistry educational research. Chemistry Education Research and Practice, 14(2), 151-155. doi:10.1039/C3RP90003G. [Free access]
Taber, K. S. (2018). Lost and found in translation: guidelines for reporting research data in an 'other' language. Chemistry Education Research and Practice, 19, 646-652. doi:10.1039/C8RP90006J. [Free access]

Notes

¹ Academic authorship is understood differently to how the term 'author' is usually used: in most contexts, the author is the person who prepared (wrote, typed, dictated) a text. In academic research, the authors of the research paper are those who made a substantial direct intellectual contribution to the work being reported. That is, an author need not contribute to the writing-up phase (though all authors should approve the text) as long as they have made a proper contribution to the substance of the work. Most journals have clear expectations that all deserving authors, and only those people, should be named as authors.

Read about academic authorship

² For many years the journal was edited by the late Prof. John Gilbert, who I first met sometime in the 1984-5 academic year when I applied to join the University of Surrey/Roehampton Institute part-time teachers' programme in the Practice of Science Education, and he – as one of the course directors – interviewed me. I was later privileged to work with John on some projects – so this might be considered as a 'declaration of interest'.

³ Again, I must declare an interest. For some years I acted as the Book Reviews editor for the journal.

⁴ The centre was the base for the highly influential Children's Learning in Science Project, which undertook much research and publication in the field under the direction of the late Prof. Ros Driver.

⁵ Another declaration of interest: at the time of writing I am on the IHPST Advisory Board for the journal.

⁶ Declaration of interest: I am a member of DISER's Editorial Board.

⁷ I have recently shown some surprise at one research article published in JChemEd where major problems seem to have been missed in peer review. This is perhaps simply an aberration, or may reflect the challenge of including peer-reviewed academic research in a hybrid publication that also publishes a range of other kinds of articles.

⁸ Peer-review evaluates the quality of submissions, in part to inform publication decisions, but also to provide feedback to authors on areas where they can improve a manuscript prior to publication.

Read about peer review


Psychological skills, academic achievement and…swimming

Keith S. Taber

'Psychological Skills in Relation to Academic Achievement through Swimming Context'

Original image by Clker-Free-Vector-Images from Pixabay

I was intrigued by the title of an article I saw in a notification: "Psychological Skills in Relation to Academic Achievement through Swimming Context". In part, it was the 'swimming context' – despite never having been very athletic or sporty (which is not to say I did not enjoy sports, just that I was never particularly good at any), I have always been a regular and enthusiastic swimmer. Not a good swimmer, mind (too splashy, too easily veering off-line) – but an enthusiastic one. But I was also intrigued by the triad of psychological skills, academic achievement, and swimming.

Perhaps I had visions of students' psychological skills being tested in relation to their academic achievement as they pounded up and down the pool. So, I was tempted to follow this up.

Investigating psychological skills and academic achievement

The abstract of the paper by Bayyat and colleagues reported three aims for their study:

"This study aimed to investigate:

(1) the level of psychological skills among students enrolled in swimming courses at the Physical Education faculties in the Jordanian Universities.

(2) the relation between their psychological skills and academic achievement.

(3) the differences in these psychological skills according to gender."

Bayyat et al., 2021: 4535

The article was published in a journal called 'Psychology and Education', which, its publishers* suggest, is "a quality journal devoted to basic research, theory, and techniques and arts of practice in the general field of psychology and education".

A peer reviewed journal

The peer review policy reports this is a double-blind peer-reviewed journal. This means other academics have critiqued and evaluated a submission prior to its being accepted for publication. Peer review is a necessary (but not sufficient) condition for high quality research journals.

Journals with high standards use expert peer reviewers, and the editors use their reports to both reject low-quality submissions, and to seek to improve high-quality submissions by providing feedback to authors about points that are not clear, any missing information, incomplete chains of argumentation, and so forth. In the best journals editors only accept submissions after reviewers' criticisms have been addressed to the satisfaction of reviewers (or authors have made persuasive arguments for why some criticism does not need addressing).

(Read about peer review)

The authors here report that

"The statistical analysis results revealed an average level of psychological skills, significant differences in psychological skills level in favor of female students, A students and JU[**], and significant positive relation between psychological skills and academic achievement."
Bayyat et al., 2021: 4535

Rewriting slightly it seems that, according to this study:

the students in the study had average levels of psychological skills;
the female students had higher levels of psychological skills than their male peers; and
there was some kind of positive correlation between psychological skills and academic achievement.

Anyone reading a research paper critically asks themselves questions such as

'what do they mean by that?';
'how did they measure that?';
'how did they reach that conclusion?'; and
'who does this apply to?'

Females are better – but can we generalise?

In this study it was reported that

"By comparing psychological skills between male and female participants, results revealed significant differences in favor [sic] of female participants"
"All psychological skills' dimensions of female participants were significant in favor [sic] of females compared to their male peers. They were more focused, concentrated, confident, motivated to achieve their goals, and sought to manage their stress."
Bayyat et al., 2021: 4541, 4545

"It's our superior psychological skills that make us this good!" (Image by CristianoCavina from Pixabay)

A pedant (such as the present author) might wonder if "psychological skills' dimensions of female participants" [cf. psychological skills' dimensions of male participants?] would not be inherently likely to be in favour of females, but it is clear from the paper that this is intended to refer to the finding that females (as a group) got significantly higher ratings than males (as a group) on the measures of 'psychological skills'.

If we for the moment (but please read on below…) accept these findings as valid, an obvious question is the extent to which these results might generalise beyond the study. That is, to what extent would these findings being true for the participants of this study imply the same thing would be found more widely (e.g., among all students in Jordanian Universities? among all university students? among all adult Jordanians? among all humans?).

Statistical generalisation (From Taber, 2019)

Two key concepts here are the population and the sample. The population is the group that we wish our study to be about (e.g., chemistry teachers in English schools, 11-year-olds in New South Wales…), and the sample is the group who actually provide data. In order to generalise to the population from the sample it is important that the sample is large enough and representative of the population (which of course may be quite difficult to ascertain).

(Read about sampling in research)

(Read about generalisation)

In this study the reader is told that "The population of this study was undergraduate male and female students attending both intermediate and advanced swimming courses" (Bayyat et al., 2021: 4536). Taken at face value this might raise the question of why a sample was drawn exclusively from Jordan – unless of course this is the only national context where students attend intermediate or advanced swimming courses. *** However, it was immediately clarified that "They consisted of (n= 314) students enrolled at the schools of Sport Sciences at three state universities". That is, the population was actually undergraduate male and female students from schools of Sport Sciences at three Jordanian state universities attending both intermediate and advanced swimming courses.

"The Participants were an opportunity sample of 260 students" (Bayyat et al., 2021: 4536). So in terms of sample size, 260, the sample made up most of the population – almost 83%. This is in contrast to many educational studies where the samples may necessarily only reflect a small proportion of the population. In general, the representativeness of a sample is more important than its size, as skew in the sample undermines statistical generalisations (whereas size, for a representative sample, influences the magnitude of the likely error ****) – but a reader is likely to feel that when over four-fifths of the population were sampled it is less critical that a convenience sample was used.

This still does not assure us that the results can be generalised to the population (students from schools of Sport Sciences at three Jordanian state universities attending 'both' intermediate and advanced swimming courses), but psychologically it seems quite convincing.
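For readers who like to check the arithmetic, the following is a minimal Python sketch (my illustration, not anything taken from the paper). It computes the sampling fraction from the two figures reported by the authors (314 and 260), and then uses a standard textbook approximation for the margin of error of a proportion, with a finite population correction, to show how sample size affects the likely error. The function name margin_of_error, the choice of p = 0.5 and the 95% z-value of 1.96 are my assumptions. Importantly, the formula assumes a simple random sample, which an opportunity sample is not, so it illustrates only the effect of size and says nothing about any skew in who took part.

```python
import math

# Figures reported in the paper (Bayyat et al., 2021: 4536)
population_size = 314  # students enrolled on the swimming courses
sample_size = 260      # students who actually provided data

sampling_fraction = sample_size / population_size
print(f"Sampling fraction: {sampling_fraction:.1%}")  # prints "Sampling fraction: 82.8%"


def margin_of_error(n, N, p=0.5, z=1.96):
    """Approximate margin of error for an estimated proportion.

    ASSUMPTION: a simple random sample of size n drawn from a finite
    population of size N. This is a textbook approximation chosen for
    illustration; it is not a method used in the paper.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # The finite population correction shrinks the error as n approaches N
    finite_population_correction = math.sqrt((N - n) / (N - 1))
    return z * standard_error * finite_population_correction


# Hypothetical smaller samples, compared with the sample actually obtained
for n in (50, 150, sample_size):
    print(f"n = {n}: margin of error is about {margin_of_error(n, population_size):.3f}")
```

Under these assumptions the likely error shrinks as the sample approaches the whole population, and would vanish if everyone were sampled. What no sample size can fix is bias in how participants came to be included, which is why representativeness matters more than size.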

Ontology: What are we dealing with?

The study is only useful if it is about something that readers think is important – and it is clear what it is about. The authors tell us their study is about

Psychological Skills
Academic Achievement

which would seem to be things educators should be interested in. We do need to know, however, how the authors understand these constructs: what do they mean by 'a Psychological Skill' and 'Academic achievement'? Most people would probably think they have a pretty good idea what these terms might mean, but that is no assurance at all that different people would agree on this.

So, in reading this paper it is important to know what the authors themselves mean by these terms – so a reader can check they understand these terms in a sufficiently similar way.

What is academic achievement?

The authors suggest that

"academic achievement reflects the learner's accomplishment of specific goals by the end of an educational experience in a determined amount of time"
Bayyat et al., 2021: 4535

This seems to be the extent of the general characterisation of this construct in the paper *****.

What are psychological skills?

The authors tell readers that

"Psychological skills (PS) are a group of skills and abilities that enhances peoples' performance and achievement…[It has been] suggested that PS includes a whole set of trainable skills including emotional control and self-confidence"
Bayyat et al., 2021: 4535

For the purposes of this particular study, they

"identified the psychological skills related to the swimming context such as; leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention"
Bayyat et al., 2021: 4536

So the relevant skills are considered to be:

leadership
emotional stability
sport achievement motivation
self-confidence
stress management
attention

I suspect that there would not be complete consensus among psychologists or people working in education over whether all of these constructs actually are 'skills'. Someone who did not consider these (or some of these) characteristics as skills would need to read the authors' claims arising from the study about 'psychological skills' accordingly (i.e., perhaps as being about something other than skills) but as the authors have been clear about their use of the term, this should not confuse or mislead readers.

Epistemology: How do we know?

Having established what is meant by 'psychological skills' and 'academic achievement' a reader would want to know how these were measured in the present study – do the authors use techniques that allow them to obtain valid and reliable measures of 'psychological skills' and 'academic achievement'?

How is academic achievement measured?

The authors inform readers that

"To calculate students' academic achievement, the instructors of the swimming courses conducted a valid and reliable assessment as a pre-midterm, midterm, and final exam throughout the semester…The assessment included performance tests and theoretical tests (paper and pencil tests) for each level"
Bayyat et al., 2021: 4538

Although the authors claim their assessments are valid and reliable, a careful reader will note that the methodology here does not match the definition of "accomplishment of specific goals by the end of an educational experience" (emphasis added) – as only the final examinations took place at the end of the programme. On that point, then, there is a lack of internal consistency in the study. This might not matter to a reader who did not think academic achievement needed to be measured at the end of a course of study.

Information on the "Academic achievement assessment tool", comprising six examinations (pre-midterm, midterm, and final examinations at each of the intermediate and advanced levels) is included as an appendix – good practice that allows a reader to interrogate the instrument.

Although this appendix is somewhat vague on precise details, it offers a surprise to someone (i.e., me) with a traditional notion of what is meant by 'academic achievement' – as both theory and practical aspects are included. Indeed, most of the marks seem to be given for practical swimming proficiency. So, the 'Intermediate swimming Pre-midterm exam' has a maximum of 20 marks available – with breast stroke leg technique and arm technique each scored out of ten marks.

The 'Advanced swimming midterm exam' is marked out of 30, with 10 marks each available for the 200m crawl (female), individual medley (female) and life guarding techniques. This seems to suggest that 20 of the 30 marks available can only be obtained by being female, but this point does not seem to be clarified. Presumably (?) male students had a different task that the authors considered equivalent.

How are psychological skills measured?

In order to measure psychological skills the authors proceeded "to develop and validate a questionnaire" (p.4536). Designing a new instrument is a complex and challenging affair. The authors report how they

"generated a 40 items-questionnaire reflecting the psychological skills previously mentioned [leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention] by applying both deductive and inductive methods. The items were clear, understandable, reflect the real-life experience of the study population, and not too long in structure."
Bayyat et al., 2021: 4538

So, items were written which it was thought would reflect the focal skills of interest. (Unfortunately there are no details of what the authors mean by "applying both deductive and inductive methods" to generate the items.) Validity was assured by asking a panel of people considered to have expertise to critique the items:

"the scale was reviewed and assessed by eight qualified expert judges from different related fields (sport psychology, swimming, teaching methodology, scientific research methodology, and kinesiology). They were asked to give their opinion of content representation of the suggested PS [psychological skills], their relatedness, clarity, and structure of items. According to the judges' reviews, we omitted both leadership and emotional stability domains, in addition to several items throughout the questionnaire. Other items were rephrased, and some items were added. Again, the scale was reviewed by four judges, who agreed on 80% of the items."

So, construct validity was a kind of face validity, in that people considered to be experts thought the final set of items would elicit the constructs intended, but there was no attempt to see if responses correlated in any way with any actual measurements of the 'skills'.

Readers of the paper wondering if they should be convinced by the study would need to judge if the expert panel had the right specialisms to evaluate scale items for 'psychological skills', and might find some of the areas of expertise (i.e.,

sport psychology
swimming
teaching methodology
scientific research methodology
kinesiology)

more relevant than others.

Self-reports

If respondents responded honestly, their responses would have reflected their own estimates of their 'skills' – at least to the extent that their interpretation of the items matched that of the experts. (That is, there was no attempt to investigate how members of the population of interest would understand what was meant by the items.)

Here are some examples of the items in the instrument:

Construct ('psychological skill')	Example item
self-confidence	I manage my time effectively while in class
sports motivation achievement	I do my best to control everything related to swimming lessons.
attention	I can pay attention and focus on different places in the pool while carrying out swimming tasks
stress-management	I am not afraid to perform any difficult swimming skill, no matter what

Examples of statements students were asked to rate in order to measure their 'psychological skills' (source: Bayyat et al., 2021: 4539-4541)

Analysis of data

The authors report various analyses of their data that lead to the conclusions they reach. If a critical reader was convinced about matters so far, they would still need to believe that the analyses undertaken were

appropriate, and
completed competently, and
correctly interpreted.

Drawing conclusions

However, as a reader I personally would have too many quibbles with the conceptualisation and design of instrumentation to consider the analysis in much detail.

To my mind, at least, the measure of 'academic achievement' seems to be largely an evaluation of swimming skills. They are obviously important in a swimming course, but I do not consider this a valid measure of academic achievement. That is not a question of suggesting academic achievement is better or more important than practical or athletic achievements, but it is surely something different (akin to me claiming to have excellent sporting achievement on the basis of holding a PhD in education).

The measure of psychological skills does not convince me either. I am not sure some of the focal constructs can really be called 'skills' (self-confidence? motivation?), but even if they were, there is no attempt to directly measure skill. At best, the questionnaire offers self-reports of how students perceive (or perhaps wish to be seen as perceiving) their characteristics.

It is quite common in research to see the term 'questionnaire' used for an instrument that is intended to test knowledge or skill – but questionnaires are not the right kind of instrument for that job.

(Read about questionnaires)

Significant positive relation between psychological skills and academic achievement?

So, I do not think this methodology would allow anyone to find a "significant positive relation between psychological skills and academic achievement" – only a relationship between students' self-ratings on some psychological characteristics and swimming achievement. (That may reflect an interesting research question, and could perhaps be a suitable basis for a study, but is not what this study claims to be about.)

Significant differences in psychological skills level in favor of female students?

In a similar way, although it is interesting that females tended to score higher on the questionnaire scales, this shows they had higher self-ratings on average, and tells us nothing about their actual skills.

It may be that the students have great insight into these constructs and their own characteristics and so make very accurate ratings on these scales – but with absolutely no evidential basis for thinking this there are no grounds for making such a claim.

An alternative interpretation of the results is that on average the male students under-rate their 'skills' compared to their female peers. That is the 'skills' could be much the same across gender, but there might be a gender-based difference in perception. (I am not suggesting that is the case, but the evidence presented in the paper can be explained just as well by that possibility.)

An average level of psychological skills?

Finally, we might ask what is meant by

"The statistical analysis results revealed an average level of psychological skills…"
"Results of this study revealed that the participants acquired all four psychological skills at a moderate level."
Bayyat et al., 2021: 4535, 4545

Even leaving aside that what is being measured is something other than psychological skills, it is hard to see how these statements can be justified. This was the first administration of a new instrument being applied to a sample of a very specific population.

The paper reports standard deviations for the ratings on the items in the questionnaire, so – as would be expected – there were distributions of results: spreads with different students giving different ratings. Within the sample tested, some of the students will have given higher than median ratings on an item, some will have given lower than median ratings – although on average the ratings for that particular item would have been – average for this sample (that is, by definition!). So, assuming this claim (of average/moderate levels of psychological skills) was not meant as a tautology, the authors might seem to be suggesting that the ratings given on this administration of the instrument align with what would be typically obtained, that is from across other administrations.

Of course, they have absolutely no way of knowing that is the case without collecting data from samples of other populations.

What the authors actually seem to be basing these claims (of average/moderate levels of psychological skills) on is that the average responses on these scales did not give a very high or very low rating in terms of the raw scale. Yet, with absolutely no reference data for how other groups of people might respond on the same instrument that offers little useful information. At best, it suggests something welcome about the instrument itself (ideally one would wish items to elicit a spread of responses, rather than having most responses rated very high or very low) but nothing about the students sampled.

On this point the authors seem to be treating the scale as calibrated in terms of some nominal standard (e.g. 'a rating of 3-4 would be the norm'), when there is no inherent interpretation of particular ratings of items in such a scale that can just be assumed – rather this would be a matter that would need to be explored empirically.

The research paper as an argument

The research paper is a very specific genre of writing. It is an argument for new knowledge claims. The conclusions of the paper rest on a chain of argument that starts with the conceptualisation of the study and moves through research design, data collection, analysis, and interpretation. As a reader, any link in the argument chain that is not convincing potentially invalidates the knowledge claim(s) being made. Thus the standards expected for research papers are very high.

In sum then, this was an intriguing study, but did not convince me (even if it apparently convinced the peer reviewers and editor of Psychology and Education). I am not sure it was really about psychological skills, or academic achievement

…but at least it was clearly set in the context of swimming.

Work cited:

Bayyat, M. M., Orabi, S. M., Al-Tarawneh, A. D., Alleimon, S. M., & Abaza, S. N. (2021). Psychological Skills in Relation to Academic Achievement through Swimming Context. Psychology and Education, 58(5), 4535-4551.

Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download manuscript version]

* Despite searching the different sections of the journal site, I was unable to find who publishes the journal. However, searching outside the site I found a record of the publisher of this journal being 'Auricle Technologies, Pvt., Ltd'.

** It transpired later in the paper that 'JU' referred to students at the University of Jordan: one of three universities involved in the study.

*** I think literally this means those who participated in the study were students attending both an intermediate swimming course and an advanced swimming course – but I read this to mean those who participated in the study were students attending either an intermediate or advanced swimming course. This latter interpretation is consistent with information given elsewhere in the paper: "All schools of sports sciences at the universities of Jordan offer mandatory, reliable, and valid swimming programs. Students enroll in one of three swimming courses consequently: the basic, intermediate, and advanced levels". (Bayyat et al., 2021: 4535, emphasis added)

**** That is, if the sample is unrepresentative of the population, there is no way to know how biased the sample might be. However, if there is a representative sample, then although there will still likely be some error (the results for the sample will not be precisely what the results across the whole population would be) it is possible to calculate the likely size of this error (e.g., say ±3%) which will be smaller when a higher proportion of the population are sampled.

***** It is possible some text that was intended to be at this point has gone missing during production – as, oddly, the following sentence is

facilisi, veritus invidunt ei mea (Times New Roman, 10)
Bayyat et al., 2021: 4535

which seems to be an accidental retention of text from the journal's paper template.

Why write about Cronbach's alpha?

Keith S. Taber

What is Cronbach's alpha?

It is a statistic that is commonly quoted by researchers when reporting the use of scales and questionnaires.

Why carry out a study of the use of this statistic?

I am primarily a qualitative researcher, so do not usually use statistics in my own work. However, I regularly came across references to alpha in manuscripts I was asked to review for journals, and in manuscripts submitted to the journal I was editing myself (i.e., Chemistry Education Research and Practice).

I did not really understand what alpha was, or what it was supposed to demonstrate, or what value was desirable – which made it difficult to evaluate that aspect of a manuscript which was citing the statistic. So, I thought I had better find out more about it.

So, what is Cronbach's alpha?

It is a statistic that tests for internal consistency in scales. It should only be applied to a scale intended to measure a unidimensional factor – something it is assumed can be treated as a single underlying variable (perhaps 'confidence in physics learning', 'enjoyment of school science practicals', or 'attitude to genetic medicine').

If someone developed a set of questionnaire items intended to find out, say, how skeptical a person was regarding scientific claims in the news, and administered the items to a sample of people, then alpha would offer a measure of the similarity of the set of items in terms of the patterns of responses from that sample. As the items are meant to be measuring a single underlying factor, they should all elicit similar responses from any individual respondent. If they do, then alpha would approach 1 (its maximum value).
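As a minimal sketch of how the statistic is usually calculated, the code below uses the standard variance-based formula: the number of items, the variance of each item across respondents, and the variance of respondents' total scores. The function name `cronbach_alpha` and both small data sets are invented here purely for illustration.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row per respondent,
    each row holding that respondent's scores on every item."""
    n_items = len(responses[0])
    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score across the scale.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: every respondent gives the same rating to all four items.
identical_items = [
    [1, 1, 1, 1],
    [3, 3, 3, 3],
    [5, 5, 5, 5],
    [2, 2, 2, 2],
]

# Hypothetical data: ratings are similar across items, but not identical.
similar_items = [
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 1],
]

alpha_identical = cronbach_alpha(identical_items)
alpha_similar = cronbach_alpha(similar_items)
print(round(alpha_identical, 3), round(alpha_similar, 3))
```

In the first data set the items elicit exactly the same response from each person, so alpha reaches its maximum of 1. In the second it is high, but below 1.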

Does alpha not measure reliability?

Often, studies state that alpha is measuring reliability – as internal consistency is sometimes considered a kind of reliability. However, more often in research what we mean by reliability is that repeating the measurements later will give us (much) the same result – and alpha does not tell us about that kind of reliability.

I think there is a kind of metaphorical use of 'reliability' here. The technique derives from an approach used to test equivalence based on dividing the items in a scale into two subsets*, and seeing whether analysis of the two subsets gives comparable results – so one could see if the result from the 'second' measure reliably reproduced that from the 'first' (but of course the ordering of the two calculations is arbitrary, and the two subsets of items were actually administered at the same time as part of a single scale).

* In calculating alpha, all possible splits are taken into account.

Okay, so that's what alpha is – but, still, why carry out a study of the use of this statistic?

Once I understood what alpha was, I was able to see that many of the manuscripts I was reviewing did not seem to be using it appropriately. I got the impression that alpha was not well understood among researchers even though it was commonly used. I felt it would be useful to write a paper that both highlighted the issues and offered guidance on good practice in applying and reporting alpha.

In particular, studies would often cite alpha for broad features like 'understanding of chemistry' where it seems obvious that we would not expect understanding of pH, understanding of resonance in benzene, understanding of oxidation numbers, and understanding of the mass spectrometer, to be the 'same' thing (or if they are, we could save a lot of time and effort by reducing exams to a single question!)

It was also common for studies using instruments with several different scales to not only quote alpha for each scale (which is appropriate), but to also give an overall alpha for the whole instrument even though it was intended to be multidimensional. So imagine a questionnaire which had a section on enjoyment of physics, another on self-confidence in genetics, and another on attitudes to science-fiction elements in popular television programmes: why would a researcher want to claim there was a high level of internal consistency across what are meant to be such distinct scales?

There was also incredible diversity in how different authors describe different values of alpha they might calculate – so the same value of alpha might be 'acceptable' in one study, 'fairly high' in another, and 'excellent' in a third (see figure 1).

Fig. 1 Qualitative descriptors used for values/ranges of values of Cronbach's alpha reported in papers in leading science education journals (The Use of Cronbach's Alpha When Developing and Reporting Research Instruments in Science Education)

Some authors also suggested that a high value of alpha for an instrument implied it was unidimensional – that all the items were measuring the same things – which is not the case.

But isn't it the number that matters: we want alpha to be as high as possible, and at least 0.7?

Yes, and no. And no, and no.

But the number matters?

Yes of course, but it needs to be interpreted for a reader: not just 'alpha was 0.73'.

But the critical value is 0.7, is that right?

No.

It seems extremely common for authors to assume that they need alpha to reach, or exceed, 0.7 for their scale to be acceptable. But that value seems to be completely arbitrary (and was not what Cronbach was suggesting).

Well, it's a convention, just as p<0.05 is commonly taken as a critical value.

But it is not just like that. Alpha is very sensitive to how many items are included in a scale. If there are only a few items, then a value of, say, 0.6 might well be sensibly judged acceptable. In any case it is nearly always possible to increase alpha by adding more items till you reach 0.7.
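The sensitivity to the number of items can be sketched with the standardised form of alpha, which depends only on the number of items and the mean correlation between pairs of items. The mean correlation of 0.3 used below is an arbitrary illustrative value, and `standardized_alpha` is a name chosen here.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardised alpha from the item count and mean inter-item correlation."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)

# Hold the quality of the items fixed (mean correlation 0.3) and vary only the count.
for n_items in (3, 6, 10, 20):
    print(n_items, round(standardized_alpha(n_items, 0.3), 2))
```

With items of the same quality throughout, three items fall short of 0.7 while six exceed it, so the threshold says as much about scale length as about consistency.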

But only if the added items genuinely fit for the scale?

Sadly, no.

Adding a few items that are similar to each other, but not really fitting the scale, would usually increase alpha. So adding 'I like Manchester United', 'Manchester United are the best soccer team', and 'Manchester United are great' as items to be responded to in a scale about self-efficacy in science learning would likely increase alpha.

Are you sure: have you tried it?

Well, no. But, as I pointed out above, instruments often contain unrelated scales, and authors would sometimes calculate an overall alpha that the computer found to be greater than that of each of the component scales. That would suggest the instrument as a whole was more internally consistent than any of its scales – at least that would be the implication if it were assumed that a larger alpha means a higher internal consistency, without factoring in how alpha tends to be larger the more items are included in the calculation.
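A small numerical sketch shows how an overall alpha can exceed that of each component scale. It assumes a hypothetical instrument of three five-item scales, with items correlating 0.4 within a scale and only 0.15 across scales. These correlations are invented for illustration, and with cross-scale correlations close to zero the overall value would not necessarily come out higher.

```python
def alpha_from_correlations(corr):
    """Standardised alpha from a square inter-item correlation matrix."""
    n_items = len(corr)
    off_diagonal = [corr[i][j] for i in range(n_items)
                    for j in range(n_items) if i != j]
    mean_r = sum(off_diagonal) / len(off_diagonal)
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

def block_correlations(n_scales, items_per_scale, within_r, between_r):
    """Correlation matrix for several scales whose items correlate
    within_r inside a scale and between_r across scales."""
    n_items = n_scales * items_per_scale
    matrix = []
    for i in range(n_items):
        row = []
        for j in range(n_items):
            if i == j:
                row.append(1.0)
            elif i // items_per_scale == j // items_per_scale:
                row.append(within_r)
            else:
                row.append(between_r)
        matrix.append(row)
    return matrix

# One five-item scale on its own, then the whole fifteen-item instrument.
single_scale_alpha = alpha_from_correlations(block_correlations(1, 5, 0.4, 0.15))
whole_instrument_alpha = alpha_from_correlations(block_correlations(3, 5, 0.4, 0.15))
print(round(single_scale_alpha, 3), round(whole_instrument_alpha, 3))
```

Under these assumptions the deliberately multidimensional instrument returns a higher alpha than any one of its scales, simply because fifteen items enter the calculation rather than five.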

But still, it is clear that the bigger alpha the better?

Up to a point.

But consider a scale with five items where everybody responds to each item in exactly the same way (not, that is, that different people respond in the same way as each other, just that whatever response a person gives to one item – e.g., 2 on a scale of 1-7 – they also give to the other items). Then alpha should be 1, as high as it can get. But Cronbach would suggest you are wasting researcher and participant effort by having many items if they all elicit the same response. The point of scales having several items is that we assume no one item directly catches perfectly what we are trying to measure. Whether they do or not, there is no point in multiple items that are effectively equivalent.

Was it necessary to survey science education journals to make the point?

I did not originally think so.

My draft manuscript made the argument by drawing on some carefully selected examples of published papers in relation to the different issues I felt needed to be highlighted and discussed. I think the draft manuscript effectively made the point that there were papers getting published in good journals that quoted alpha but seemed to simply assume it demonstrated something (unexplained) to readers, and/or used alpha when their instrument was clearly meant to be multidimensional, and/or took 0.7 as a definitive cut-off regardless of the number of items concerned, and/or quoted alpha values for overall instruments as well as for the distinct scales as if that added some evidence of instrument quality, or claimed a high value of alpha for an instrument demonstrated it was unidimensional.

So why did you then spend time reviewing examples across four journals over a whole year of publication?

Although I did not think this was necessary, when the paper was reviewed for publication a journal reviewer felt the paper was too anecdotal: that just because a few papers included weak practice, that may not have been especially significant. I think there was also a sense that a paper critiquing a research technique did not fit in the usual categories of study published in the journal, but a study with more empirical content (even if the data were published papers) better fitted the journal.

At that point I could have decided to try and get the paper published elsewhere, but Research in Science Education is a good journal and I wanted the paper in a good science education journal. Adding the survey took extra work, but satisfied the journal.

I still think the paper would have made a contribution without the survey BUT the extra work did strengthen the paper. In retrospect, I am happy that I responded to review comments in that way – as it did actually show just how frequently alpha is used in science education, and the wide variety of practice in reporting the statistic. Peer review is meant to help authors improve their work, and I think it did here.

Has the work had impact?

I think so, but…

The study has been getting a lot of citations, and it is always good to think someone notices a study, given the work it involves. Perhaps a lot of people have genuinely thought about their use of alpha as a result of reading the paper, and perhaps there are papers out there which do a better job of using and reporting alpha as a result of authors reading my study. (I would like to think so.)

However, I have also noticed that a lot of papers citing this study as an authority for using alpha in the reported research are still doing the very things I was criticising, and sometimes directly justifying poor practice by citing my study! These authors either had not actually read the study (but were just looking for something about alpha to cite) or perhaps did not fully appreciate the points made.

Oh well, I think it was Oscar Wilde who said there is only one thing in academic life worse than being miscited…