Study reports that non-representative sample of students has average knowledge of earthquakes

When is a cross-sectional study not a cross-sectional study?


Keith S. Taber


A biomedical paper?

I only came to this paper because I was criticising the Biomedical Journal of Scientific & Technical Research's claimed Impact Factor, which seems to be a fabrication. I saw this particular paper being featured in a recent tweet from the journal and wondered how it fitted in a biomedical journal. The paper is on an important topic – what young people know about how to respond to an earthquake – but I was not sure why it belonged in this particular journal.

Respectable journals normally have a clear scope (i.e., the range of topics within which they consider submissions for publication) – whereas predatory journals are often primarily interested in publishing as many papers as possible (and so attracting publication fees from as many authors as possible) and so may have no qualms about publishing material that would seem to be out of scope.

This paper reports a questionnaire about secondary age students' knowledge of earthquakes. It would seem to be an education study, possibly even a science education study, rather than a 'biomedical' study. (The journal invites papers from a wide range of fields 1, some of which – geology, chemical engineering – are not obviously 'biomedical' in nature; but not education.)

The paper reports research (so I assume is classed as 'research' in terms of the scale of charges) and comes from Bangladesh (which I assume the journal publishers consider a low income country), and so it would seem that the authors would have been charged $799 to be published in this journal. Part of what authors are supposed to get for that fee is for editors to arrange peer review to provide evaluation of, feedback on, and recommendations for improving, their work.

Peer review

Respectable journals employ rigorous peer review to ensure that only work of quality is published.

Read about peer review

According to the Biomedical Journal of Scientific & Technical Research website:

Peer review process is the system used to assess the quality of a manuscript before it is published online. Independent professionals/experts/researchers in the relevant research area are subjected to assess the submitted manuscripts for originality, validity and significance to help editors determine whether a manuscript should be published in their journal. 

This Peer review process helps in validating the research works, establish a method by which it can be evaluated and increase networking possibilities within research communities. Despite criticisms, peer review is still the only widely accepted method for research validation

Only the articles that meet good scientific standards, explanations, records and proofs of their work presented with Bibliographic reasoning (e.g., acknowledge and build upon other work in the field, rely on logical reasoning and well-designed studies, back up claims with evidence etc.) are accepted for publication in the Journal.

https://biomedres.us/peer-review-process.php

Which seems reassuring. It seems 'Preventive Practice on Earthquake Preparedness Among Higher Level Students of Dhaka City' should then only have been published after evaluation in rigorous peer review. Presumably any weaknesses in the submission would have been highlighted in the review process, helping the authors to improve their work before publication. Presumably, the (unnamed) editor did not approve publication until peer reviewers were satisfied the paper made a valid new contribution to knowledge and, accordingly, recommended publication. 2


The paper was, apparently, submitted; screened by editors; sent to selected expert peer reviewers; evaluated by reviewers, so reports could be returned to the editor who collated them, and passed them to the authors with her/his decision; revised as indicated; checked by editors and reviewers, leading to a decision to publish; copy edited, allowing proofs to be sent to authors for checking; and published, all in less than three weeks.

Although supposedly published in July 2021, the paper seems to be assigned to an issue published a year before it was submitted

One might wonder, though, whether a journal which seems to advertise with an inflated Impact Factor can be trusted to follow the procedures it claims. So, I had a quick look at the paper.

The abstract begins:

The present study was descriptive Cross-sectional study conducted in Higher Secondary Level Students of Dhaka, Bangladesh, during 2017. The knowledge of respondent seems to be average regarding earthquake. There is a found to have a gap between knowledge and practice of the respondents.

Gurung & Khanum, 2021, p.29274

Sampling a population (or not)

So, this seems to be a survey, and the population sampled was Higher Secondary Level Students of Dhaka, Bangladesh. Dhaka has a population of about 22.5 million people. I could not readily find out how many of these might be considered 'Higher Secondary Level', but clearly it will be many, many thousands – I would imagine about half a million as a 'ball-park' figure.


Dhaka has a large population of 'higher secondary level students'
(Image by Mohammad Rahmatullah from Pixabay)

For a survey of a population to be valid it needs to be based on a sample which is large enough to minimise errors in extrapolating to the full population, and (even more importantly) the sample needs to be representative of the population.

Read about sampling

Here:

"Due to time constrain the sample of 115."

Gurung & Khanum, 2021, p.29276

So, the sample size was limited to 115 because of time constraints. This would likely lead to large errors in inferring population statistics from the sample, but could at least give some indication of the population as long as the 115 were known to be reasonably representative of the wider population being surveyed.
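
Just to give a sense of scale (this is my own back-of-the-envelope sketch, not anything reported in the paper): even if the 115 students had been a genuine simple random sample of the population, an estimated proportion would carry a margin of error of roughly nine percentage points, before we even begin to worry about representativeness.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample of size n drawn from a large population."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative values only: n = 115 as in the study; p = 0.5 is the
# 'worst case' proportion, which gives the widest interval.
print(round(margin_of_error(115), 3))   # ~0.091, i.e. about ±9 percentage points
```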

The reader is told:

"the study sample was from Mirpur Cantonment Public School and College , (11 and 12 class)."

Gurung & Khanum, 2021, p.29275

It seems very unlikely that a sample taken from any one school among hundreds could be considered representative of the age cohort across such a large city.

Is the school 'typical' of Dhaka?

The school website has the following evaluation by the school's 'sponsor':

"…one the finest academic institutions of Bangladesh in terms of aesthetic beauty, uncompromised quality of education and, most importantly, the sheer appeal among its learners to enrich themselves in humanity and realism."

Major General Md Zahirul Islam

The school Principal notes:

"Our visionary and inspiring teachers are committed to provide learners with all-rounded educational experiences by means of modern teaching techniques and incorporation of diverse state-of-the-art technological aids so that our students can prepare themselves to face the future challenges."

Lieutenant Colonel G M Asaduzzaman

While both of these officers would be expected to be advocates for the school, this does not give a strong impression that the researchers have sought a school that is typical of Dhaka schools.

It also seems unlikely that this sample of 115 reflects all of the students in these grades. According to the school website, there are 7 classes in each of these two grades, so the 115 students were drawn from 14 classes. Interestingly, in each year 5 of the 7 classes are following a science programme 3 – alongside one business studies and one humanities class. The paper does not report which programme(s) were being followed by the students in the sample. Indeed no information is given regarding how the 115 were selected. (Did the researchers just administer the research instrument to the first students they came across in the school? Were all the students in these grades asked to contribute, and only 115 returned responses?)

Yet, if the paper was seen and evaluated by "independent professionals/experts/researchers in the relevant research area" they seem to have not questioned whether such a small and unrepresentative sample invalidated the study as being a survey of the population specified.

Cross-sectional studies

A cross-sectional study examines and compares different slices of a population – so here, different grades. Yet only two grades were sampled, and these were adjacent grades – 11 and 12 – which is not usually ideal to make comparisons across ages.

There could be a good reason to select two grades that are adjacent in this way. However, the authors do not present separate data for year 11 and year 12, but rather pool it. So they make no comparisons between these two year groups. This "Cross-sectional study" was then NOT actually a cross-sectional study.

If the paper did get sent to "independent professionals/experts/researchers in the relevant research area" for review, it seems these experts missed that error.

Theory and practice?

The abstract of the paper claims

"There is a found to have a gap between knowledge and practice of the respondents. The association of the knowledge and the practice of the students were done in which after the cross-tabulation P value was 0.810 i.e., there is not any [statistically significant?] association between knowledge and the practice in this study."

Gurung & Khanum, 2021, p.29274

This seems to suggest that student knowledge (what they knew about earthquakes) was compared in some way with practice (how they acted during an earthquake or earthquake warning). But the authors seem to have only collected data with (what they label) a questionnaire. They do not have any data on practice. The distinction they seem to really be making is between

  • knowledge about earthquakes, and
  • knowledge about what to do in the event of an earthquake.

That might be a useful thing to examine, but any "independent professionals/experts/researchers in the relevant research area" asked to look at the submission do not seem to have noted that the authors do not investigate practice, and so needed to change the descriptions they use and the claims they make.

Average levels of knowledge

Another point that any expert reviewer 'worth their salt' would have queried is the use of descriptors like 'average' in evaluating students' responses. The study concluded that

"The knowledge of earthquake and its preparedness among Higher Secondary Student were average."

Gurung & Khanum, 2021, p.29280

But how do the authors know what counts as 'average'?

This might mean that there is some agreed standard here described in extant literature – but, if so, this is not revealed. It might mean that the same instrument had previously been used to survey nationally or internationally to offer a baseline – but this is not reported. Some studies on similar themes carried out elsewhere are referred to, but it is not clear they used the same instrumentation or analytical scheme. Indeed, the reader is explicitly told very little about the instrument used:

"Semi-structured both open ended and close ended questionnaire was used for this study."

Gurung & Khanum, 2021, p.29276

The authors seem to have forgotten to discuss the development, validation and contents of the questionnaire – and any experts asked to evaluate the submission seem to have forgotten to look for this. I would actually suggest that the authors did not really use a questionnaire, but rather an assessment instrument.

Read about questionnaires

A questionnaire is used to survey opinions, views and so forth – and there are no right or wrong answers. (What type of music do you like? Oh jazz, sorry that's not the right answer.) As the authors evaluated and scored the student responses this was really an assessment.

The authors suggest:

"In this study the poor knowledge score was 15 (13%), average 80 (69.6%) and good knowledge score 20 (17.4%) among the 115 respondents. Out of the 115 respondents most of the respondent has average knowledge and very few 20 (17.4%) has good knowledge about earthquake and the preparedness of it."

Gurung & Khanum, 2021, p.29280

Perhaps this means that the authors had used some principled (but not revealed) technique to decide what counted as poor, average and good.

Score | Description
15    | poor knowledge
80    | average knowledge
20    | good knowledge

Descriptors applied to student scores on the 'questionnaire'

Alternatively, perhaps "poor knowledge score was 15 (13%), average 80 (69.6%) and good knowledge score 20 (17.4%)" is reporting what was found in terms of the distribution in this sample – that is, they empirically found these outcomes in this distribution.

Well, not actually these outcomes, of course, as that would suggest that a score of 20 is better than a score of 80, but presumably that is just a typographic error that was somehow missed by the authors when they made their submission, then missed by the editor who screened the paper for suitability (if there is actually an editor involved in the 'editorial' process for this journal), then missed by expert reviewers asked to scrutinise the manuscript (if there really were any), then missed by production staff when preparing proofs (i.e., one would expect this to have been raised as an 'author query' on proofs 4), and then missed again by authors when checking the proofs for publication.

If so, the authors found that most respondents got fairly typical scores, and fewer scored at the tails of the distribution – as one would expect. On any particular assessment, the average performance is (as the authors report here)…average.
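
To illustrate the point (a purely hypothetical sketch, not the authors' procedure, which is not described in the paper): if scores are simply banded around the sample mean, then most respondents will inevitably land in the middle band, whatever the overall level of knowledge happens to be.

```python
import random
import statistics

random.seed(1)

# Invented scores for 115 respondents on some 20-mark assessment; the actual
# instrument and scoring scheme are not reported in the paper.
scores = [random.gauss(12, 3) for _ in range(115)]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)

# One common (but here entirely arbitrary) banding: mean ± 1 standard deviation.
poor = sum(s < mean - sd for s in scores)
good = sum(s > mean + sd for s in scores)
average = len(scores) - poor - good

# Roughly two-thirds of an approximately normal distribution falls within one
# standard deviation of the mean, so 'most' respondents are 'average' by construction.
print(poor, average, good)
```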


Work cited:
  • Gurung, N. and Khanum, H. (2021) Preventive Practice on Earthquake Preparedness Among Higher Level Students of Dhaka City. Biomedical Journal of Scientific & Technical Research, July 2020, 37(2), pp. 29274-29281

Note:

1 The Biomedical Journal of Scientific & Technical Research defines its scope as including:

  • Agri and Aquaculture 
  • Biochemistry
  • Bioinformatics & Systems Biology 
  • Biomedical Sciences
  • Clinical Sciences
  • Chemical Engineering
  • Chemistry
  • Computer Science 
  • Economics & Accounting 
  • Engineering
  • Environmental Sciences
  • Food & Nutrition
  • General Science
  • Genetics & Molecular Biology
  • Geology & Earth Science
  • Immunology & Microbiology
  • Informatics
  • Materials Science
  • Orthopaedics
  • Mathematics
  • Medical Sciences
  • Nanotechnology
  • Neuroscience & Psychology
  • Nursing & Health Care
  • Pharmaceutical Sciences
  • Physics
  • Plant Sciences
  • Social & Political Sciences 
  • Veterinary Sciences 
  • Clinical & Medical 
  • Anesthesiology
  • Cardiology
  • Clinical Research 
  • Dentistry
  • Dermatology
  • Diabetes & Endocrinology
  • Gastroenterology
  • Genetics
  • Haematology
  • Healthcare
  • Immunology
  • Infectious Diseases
  • Medicine
  • Microbiology
  • Molecular Biology
  • Nephrology
  • Neurology
  • Nursing
  • Nutrition
  • Oncology
  • Ophthalmology
  • Pathology
  • Pediatrics
  • Physicaltherapy & Rehabilitation 
  • Psychiatry
  • Pulmonology
  • Radiology
  • Reproductive Medicine
  • Surgery
  • Toxicology

Such broad scope is a common characteristic of predatory journals.


2 The editor(s) of a research journal is normally a highly regarded academic in the field of the journal. I could not find the name of the editor of this journal although it has seven associate editors and dozens of people named as being on an 'editorial committee'. Whether any of these people actually carry out the functions of an academic editor or whether this work is delegated to non-academic office staff is a moot point.


3 The classes are given names. So, nursery classes include Lotus and Tulip and so forth. In the senior grades, the science classes are called:

  • Flora
  • Neon
  • Meson
  • Sigma
  • Platinam [sic]
  • Argon
  • Electron
  • Neutron
  • Proton
  • Redon [sic]

4 Production staff are not expected to be experts in the topic of the paper, but they do note any obvious omissions (such as missing references) or likely errors and list these as 'author queries' for authors to respond to when checking 'proofs', i.e., the article set in the journal format as it will be published.

Assessing Chemistry Laboratory Equipment Availability and Practice

Comparative education on a local scale?

Keith S. Taber

Image by Mostafa Elturkey from Pixabay 

I have just read a paper in a research journal which compares the level of chemistry laboratory equipment and 'practice' in two schools in the "west Gojjam Administrative zone" (which according to a quick web-search is in the Amhara Region in Ethiopia). According to Yesgat and Yibeltal (2021),

"From the analysis of Chemistry laboratory equipment availability and laboratory practice in both … secondary school and … secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice. From the data analysis average chemistry laboratory equipment availability and status of laboratory practice of … secondary school is better than that of Jiga secondary school."

Yesgat and Yibeltal, 2021: abstract [I was tempted to omit the school names in this posting as I was not convinced the schools had been treated reasonably, but the schools are named in the very title of the article]

Now that would seem to be something that could clearly be of interest to teachers, pupils, parents and education administrators in those two particular schools, but it raises the question that can be posed in relation to any research: 'so what?' The findings might be a useful outcome of enquiry in its own context, but what generalisable knowledge does this offer that justifies its place in the research literature? Why should anyone outside of West Gojjam care?

The authors tell us,

"There are two secondary schools (Damot and Jiga) with having different approach of teaching chemistry in practical approach"

Yesgat and Yibeltal, 2021: 96

So, this suggests a possible motivation.

  • If these two approaches reflect approaches that are common in schools more widely, and
  • if these two schools can be considered representative of schools that adopt these two approaches, and
  • if 'Chemistry Laboratory Equipment Availability and Practice' can be considered to be related to (a factor influencing? an effect of?) these different approaches, and
  • if the study validly and reliably measures 'Chemistry Laboratory Equipment Availability and Practice', and
  • if substantive differences are found between the schools

then the findings might well be of wider interest. As always in research, the importance we give to findings depends upon a whole logical chain of connections that collectively make an argument.

Spoiler alert!

At the end of the paper, I was none the wiser what these 'different approaches' actually were.

A predatory journal

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies. One of the articles that IS on an educational topic is called 'Assessment of Chemistry Laboratory Equipment Availability and Practice: A Comparative Study Between Damot and Jiga Secondary Schools' (Yesgat & Yibeltal, 2021).

Comparative education?

Yesgat and Yibeltal imply that their study falls in the field of comparative education. 1 They inform readers that 2,

"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. Most compartivest states [sic] that comparative education has four main purposes. These are:

To describe educational systems, processes or outcomes

To assist in development of educational institutions and practices

To highlight the relationship between education and society

To establish generalized statements about education that are valid in more than one country

Yesgat & Yibeltal, 2021: 95-96
Comparative education studies look to characterise (national) education systems in relation to their social/cultural contexts (Image by Gerd Altmann from Pixabay)

Of course, like any social construct, 'comparative education' is open to interpretation and debate: for example, "that comparative education brings together data about two or more national systems of education, and comparing and contrasting those data" has been characterised as an "a naive and obvious answer to the question of what constitutes comparative education" (Turner, 2019, p.100).

There is then some room for discussion over whether particular research outputs should count as 'comparative education' studies or not. Many comparative education studies do not actually compare two educational systems, but rather report in detail from a single system (making possible subsequent comparisons based across several such studies). These educational systems are usually understood as national systems, although there may be a good case to explore regional differences within a nation if regions have autonomous education systems and these can be understood in terms of broader regional differences.

Yet, studying one aspect of education within one curriculum subject at two schools in one educational administrative area of one region of one country cannot be understood as comparative education without doing excessive violence to the notion. This work does not characterise an educational system at national, regional or even local level.

My best assumption is that as the study is comparing something (in this case an aspect of chemistry education in two different schools) the authors feel that makes it 'comparative education', by which account of course any educational experiment (comparing some innovation with some kind of comparison condition) would automatically be a comparative education study. We all make errors sometimes, assuming terms have broader or different meanings than their actual conventional usage – and may indeed continue to misuse a term till someone points this out to us.

This article was published in what claims to be a peer reviewed research journal, so the paper was supposedly evaluated by expert reviewers who would have provided the editor with a report on strengths and weaknesses of the manuscript, and highlighted areas that would need to be addressed before possible publication. Such a reviewer would surely have reported that 'this work is not comparative education, so the paragraph on comparative education should either be removed, or authors should contextualise it to explain why it is relevant to their study'.

The weak links in the chain

A research report makes certain claims that derive from a chain of argument. To be convinced about the conclusions you have to be convinced about all the links in the chain, such as:

  • sampling (were the right people asked?)
  • methodology (is the right type of research design used to answer the research question?)
  • instrumentation (is the data collection instrument valid and reliable?)
  • analysis (have appropriate analytical techniques been carried out?)

These considerations cannot be averaged: if, for example, a data collection instrument does not measure what it is said to measure, then it does not matter how good the sample, or how careful the analysis, the study is undermined and no convincing logical claims can be built. No matter how skilled I am in using a tape measure, I will not be able to obtain accurate weights with it.

Sampling

The authors report the make up of their sample – all the chemistry teachers in each school (13 in one, 11 in the other), plus ten students from each of grades 9, 10 and 11 in each school. They report that "… 30 natural science students from Damot secondary school have been selected randomly. With the same technique … 30 natural sciences students from Jiga secondary school were selected".

Random selection is useful for avoiding bias in a sample, but it is helpful if the technique for randomisation is briefly reported, to assure readers that 'random' is not being used as a synonym for 'arbitrary' and that the technique applied was adequate (Taber, 2013b).

A random selection across a pooled sample is unlikely to lead to equal representation in each subgroup (From Taber, 2013a)

Actually, if 30 students had been chosen at random from the population of students taking natural sciences in one of the schools, it would be extremely unlikely they would be evenly spread, 10 from each year group. Presumably, the authors made random selections within these grade levels (which would be eminently sensible, but is not quite what they report).
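
To illustrate the difference (a hypothetical sketch with invented class sizes, since the number of natural science students in each grade is not reported), compare a single pooled random draw of 30 with random draws made within each grade:

```python
import random

random.seed(0)

# Invented school roll: three grade levels with 120 natural science students each.
population = [(grade, i) for grade in (9, 10, 11) for i in range(120)]

def counts_by_grade(sample):
    return {g: sum(1 for grade, _ in sample if grade == g) for g in (9, 10, 11)}

# A single pooled random sample of 30 will rarely split exactly 10/10/10 ...
pooled = random.sample(population, 30)
print(counts_by_grade(pooled))      # typically an uneven split across the grades

# ... whereas sampling at random within each grade (stratified sampling) guarantees it.
stratified = [s for g in (9, 10, 11)
              for s in random.sample([p for p in population if p[0] == g], 10)]
print(counts_by_grade(stratified))  # {9: 10, 10: 10, 11: 10}
```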

Read about the criterion for randomness in research

Data collection

To collect data the authors constructed a questionnaire with Likert-type items.

"…questionnaire was used as data collecting instruments. Closed ended questionnaires with 23 items from which 8 items for availability of laboratory equipment and 15 items for laboratory practice were set in the form of "Likert" rating scale with four options (4=strongly agree, 3=agree, 2=disagree and 1=strongly disagree)"

Yesgat & Yibeltal, 2021: 96

These categories were further broken down (Yesgat & Yibeltal, 2021: 96): "8 items of availability of equipment were again sub grouped in to

  • physical facility (4 items),
  • chemical availability (2 items), and
  • laboratory apparatus (2 items)

whereas 15 items of laboratory practice were further categorized as

  • before actual laboratory (4 items),
  • during actual laboratory practice (6 items) and
  • after actual laboratory (5 items)

Internal coherence

So, there were two basic constructs, each broken down into three sub-constructs. This instrument was piloted,

"And to assure the reliability of the questionnaire a pilot study on a [sic] non-sampled teachers and students were conducted and Cronbach's Alpha was applied to measure the coefficient of internal consistency. A reliability coefficient of 0.71 was obtained and considered high enough for the instruments to be used for this research"

Yesgat & Yibeltal, 2021: 96

Running a pilot study can be very useful as it can highlight problems with items. However, simply asking people to complete a questionnaire may reveal items they could make no sense of, but it is not as informative as interviewing them about how they understood the items, to check that respondents interpret them in the same way as the researchers.

The authors cite the value of Cronbach's alpha to demonstrate their instrument has internal consistency. However, they seem to be quoting the value obtained in the pilot study, where the statistic strictly applies to a particular administration of an instrument (so the value from the main study is more relevant to the results reported).

More problematic, the authors appear to cite a value of alpha from across all 23 items (n.b., the value of alpha tends to increase as the number of items increases, so what is considered an acceptable value needs to allow for the number of items included) when these are actually two distinct scales: 'availability of laboratory equipment' and 'laboratory practice'. Alpha should be quoted separately for each scale – values across distinct scales are not useful (Taber, 2018). 3
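
For what it is worth, calculating alpha separately for each scale is straightforward. Here is a minimal sketch, using invented ratings since the item-level data are not published:

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list of respondents' ratings per item."""
    k = len(item_scores)
    item_vars = [statistics.pvariance(item) for item in item_scores]
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.pvariance(totals))

# Invented ratings (1-4) from five respondents, three items per scale,
# purely for illustration.
availability_items = [[3, 2, 4, 1, 2], [3, 1, 4, 2, 2], [2, 2, 4, 1, 3]]
practice_items = [[4, 3, 1, 2, 3], [3, 3, 2, 2, 4], [4, 2, 1, 3, 3]]

# Report alpha for each construct separately, not one value across both scales.
print(round(cronbach_alpha(availability_items), 2))
print(round(cronbach_alpha(practice_items), 2))
```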

Do the items have face validity?

The items in the questionnaire are reported in appendices (pp.102-103), so I have tabulated them here so that readers can consider

  • (a) whether they feel these items reflect the constructs of 'availability of equipment' and 'laboratory practice';
  • (b) whether the items are phrased in a clear way for both teachers and students (the authors report "conceptually the same questionnaires with different forms were prepared" (p.101), but if this means different wording for teachers than for students this is not elaborated – teachers were also asked demographic questions about their educational level); and
  • (c) whether they are all reasonable things to expect both teachers and students to be able to rate.
'Availability of equipment' items                          | 'Laboratory practice' items
Structured and well-equipped laboratory room               | You test the experiments before your work with students
Availability of electric system in laboratory room         | You give laboratory manuals to student before practical work
Availability of water system in laboratory room            | You group and arrange students before they are coming to laboratory room
Availability of laboratory chemicals are available [sic]   | You set up apparatus and arrange chemicals for activities
No interruption due to lack of lab equipment               | You follow and supervise students when they perform activities
Isolated bench to each student during laboratory activities | You work with the lab technician during performing activity
Chemicals are arranged in a logical order.                 | You are interested to perform activities?
Laboratory apparatus are arranged in a logical order       | You check appropriate accomplishment of your students' work
                                                           | Check your students' interpretation, conclusion and recommendations
                                                           | Give feedbacks to all your students work
                                                           | Check whether the lab report is individual work or group
                                                           | There is a time table to teachers to conduct laboratory activities.
                                                           | Wear safety goggles, eye goggles, and other safety equipment in doing so
                                                           | Work again if your experiment is failed
                                                           | Active participant during laboratory activity

Items teachers and students were asked to rate on a four point scale (agree / strongly agree / disagree / strongly disagree)

Perceptions

One obvious limitation of this study is that it relies on reported perceptions.

One way to find out about the availability of laboratory equipment might be to visit teaching laboratories and survey them with an observation schedule – and perhaps even make a photographic record. The questionnaire assumes that teacher and student perceptions are accurate and that honest reports would be given (might teachers have had an interest in offering a particular impression of their work?)

Sometimes researchers are actually interested in impressions (e.g., for some purposes whether a students considers themselves a good chemistry student may be more relevant than an objective assessment), and sometimes researchers have no direct access to a focus of interest and must rely on other people's reports. Here it might be suggested that a survey by questionnaire is not really the best way to, for example, "evaluate laboratory equipment facilities for carrying out practical activities" (p.96).

Findings

The authors describe their main findings as,

"Chemistry laboratory equipment availability in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment. This finding supported by the analysis of one sample t-values and as it indicated the average availability of laboratory equipment are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual availability of equipment to the expected test value (2.5).

Chemistry laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average chemistry laboratory practice. This finding supported by the analysis of one sample t-values and as it indicated the average chemistry laboratory practice are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual chemistry laboratory practice to the expected test value."

Yesgat & Yibeltal, 2021: 101 (emphasis added)

This is the basis for the claim in the abstract that "From the analysis of Chemistry laboratory equipment availability and laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice."

'The average …': what is the standard?

But this raises a key question – how do the authors know what "the average availability of chemistry laboratory equipment and status of laboratory practice" is, if they have only used their questionnaire in two schools (which are both found to be below average)?

Yesgat & Yibeltal have run a comparison between the average ratings they get from the two schools on their two scales and the 'average test value' rating of 2.5. As far as I can see, this is not an empirical value at all. It seems the authors have just assumed that if people are asked to use a four point scale – 1, 2, 3, 4 – then the average rating will be…2.5. Of course, that is a completely arbitrary assumption. (Consider the question – 'how much would you like to be beaten and robbed today?': would the average response be likely to be the nominal mid-point of a rating scale?) Perhaps if a much wider survey had been undertaken the actual average rating would have been 1.9 or 2.7 or …

That is even assuming that 'average' is a meaningful concept here. A four point Likert scale is an ordinal scale ('agree' is always less agreement than 'strongly agree' and more than 'disagree') but not an interval scale (that is, it cannot be assumed that the perceived 'agreement' gap (i) from 'strongly disagree' to 'disagree' is the same for each respondent and the same as that (ii) from 'disagree' to 'agree' and (iii) from 'agree' to 'strongly agree'). Strictly, Likert scale ratings cannot be averaged (they are better presented as bar charts showing frequencies of response) – so although the authors carry out a great deal of analysis, much of this is, strictly, invalid.
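
By way of contrast, here is a minimal sketch of how responses to a single Likert-type item might be reported as frequencies per category rather than as a mean of the numerical codes (the responses are invented, as the raw data are not published):

```python
from collections import Counter

# Invented ratings for one item (1 = strongly disagree ... 4 = strongly agree).
responses = [1, 2, 2, 1, 3, 2, 4, 1, 2, 2, 3, 1, 2, 2, 1]

labels = {1: "strongly disagree", 2: "disagree", 3: "agree", 4: "strongly agree"}
counts = Counter(responses)

# Report how many respondents chose each category; the codes are ordinal
# labels, not measurements on an interval scale, so averaging them is dubious.
for code in (1, 2, 3, 4):
    print(f"{labels[code]:>17}: {counts[code]}")
```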

So what has been found out from this study?

I would very much like to know what peer reviewers made of this study. Expert reviewers would surely have identified some very serious weaknesses in the study and would have been expected to have recommended some quite major revisions even if they thought it might eventually be publishable in a research journal.

An editor is expected to take on board referee evaluations and ask authors to make such revisions as are needed to persuade the editor the submission is ready for publication. It is the job of the editor of a research journal, supported by the peer reviewers, to

a) ensure work of insufficient quality is not published

b) help authors strengthen their paper to correct errors and address weaknesses

Sometimes this process takes some time, with a number of cycles of revision and review. Here, however, the editor was able to move to a decision to publish in 5 days.

The study reflects a substantial amount of work by the authors. Yet, it is hard to see how this study, at least as reported in this journal, makes a substantive contribution to public knowledge. The study finds that one school has somewhat higher survey ratings than another on an instrument that has not been fully validated, based on a pooling of student and teacher perceptions, and guesses that both rate lower than a hypothetical 'average' school. The two schools were supposed to represent "different approach[es] of teaching chemistry in practical approach" – but even if that is the case, the authors have not shared with their readers what these different approaches are meant to be. So, there would be no possibility of generalising from the schools to 'approach[es] of teaching chemistry', even if that was logically justifiable. And comparative education it is not.

This study, at least as published, does not seem to offer useful new knowledge to the chemistry education community that could support teaching practice or further research. Even in the very specific context of the two specific schools it is not clear what can be done with the findings which simply reflect back to the informants what they have told the researchers, without exploring the reasons behind the ratings (how do different teachers and students understand what counts as 'Chemicals are arranged in a logical order') or the values the participants are bringing to the study (is 'Check whether the lab report is individual work or group' meant to imply that it is seen as important to ensure that students work cooperatively or to ensure they work independently or …?)

If there is a problem highlighted here by the "very low levels" (based on a completely arbitrary interpretation of the scales) there is no indication of whether this is due to resourcing of the schools, teacher preparation, levels of technician support, teacher attitudes or pedagogic commitments, timetabling problems, …

This seems to be a study which has highlighted two schools, invited teachers and students to complete a dubious questionnaire, and simply used this to arbitrarily characterise the practical chemistry education in the schools as very poor, without contextualising any challenges or offering any advice on how to address the issues.

Work cited:
Note:

1 'Imply' as Yesgat and Yibeltal do not actually state that they have carried out comparative education. However, if they do not think so, then the paragraph on comparative education in their introduction has no clear relationship with the rest of the study and is not more than a gratuitous reference, like suddenly mentioning Nottingham Forest's European Cup triumphs or noting a preferred flavour of tea.


2 This seemed an intriguing segment of the text as it was largely written in a more sophisticated form of English than the rest of the paper, apart from the odd reference to "Most compartivest [comparative education specialists?] states…" which seemed to stand out from the rest of the segment. Yesgat and Yibeltal do not present this as a quote, but cite a source informing their text (their reference [4]: Joubish, 2009). However, their text is very similar to that in another publication:

Quote from Mbozi, 2017, p.21 | Quote from Yesgat and Yibeltal, 2021, pp.95-96
"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses." | One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses.
This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. | This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action.
Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. | Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes.
The exposure facilitates our adoption of best practices. |
Some purposes of comparative education were not covered in your exercise above. |
Purposes of comparative education suggested by two authors Noah (1985) and Kidd (1975) are presented below to broaden your understanding of the purposes of comparative education. |
Noah, (1985) states that comparative education has four main purposes [4] and these are: | Most compartivest states that comparative education has four main purposes. These are:
1. To describe educational systems, processes or outcomes | • To describe educational systems, processes or outcomes
2. To assist in development of educational institutions and practices | • To assist in development of educational institutions and practices
3. To highlight the relationship between education and society | • To highlight the relationship between education and society
4. To establish generalized statements about education, that are valid in more than one country." | • To establish generalized statements about education that are valid in more than one country"

Comparing text (broken into sentences to aid comparison) from two sources

3 There are more sophisticated techniques which can be used to check whether items do 'cluster' as expected for a particular sample of respondents.


4 As suggested above, researchers can pilot instruments with interviews or 'think aloud' protocols to check if items are understood as intended. Asking assumed experts to read through and check 'face validity' is of itself quite a limited process, but can be a useful initial screen to identify items of dubious relevance.

What COVID really likes

Researching viral preferences

Keith S. Taber

When I was listening to the radio news I heard a clip of the Rt. Hon. Sajid Javid MP, the U.K. Secretary of State for Health and Social Care, talking about the ongoing response to the COVID pandemic:

Health Secretary Sajid Javid talking on 12th September

"Now that we are entering Autumn and Winter, something that COVID and other viruses, you know, usually like, the prime minister this week will be getting out our plans to manage COVID over the coming few months."

Sajid Javid

So, COVID and other viruses usually like Autumn and Winter (by implication, presumably, in comparison with Spring and Summer).

This got me wondering how we (or Sajid, at least) could know what the COVID virus (i.e., SARS-CoV-2 – severe acute respiratory syndrome coronavirus 2) prefers – what the virus 'likes'. I noticed that Mr Javid offered a modal qualification to his claim: usually. It seemed 'COVID and other viruses' did not always like Autumn and Winter, but usually did.

Yet there was a potential ambiguity here depending how one parsed the claim. Was he suggesting that

[COVID and other viruses] usually like Autumn and Winter

or

COVID [and other viruses usually] like Autumn and Winter

This might have been clearer in a written text as either

COVID and other viruses usually like Autumn and Winter

or

COVID, and other viruses usually, like Autumn and Winter

The second option may seem a little awkward in its phrasing, 1 but then not all viral diseases are more common in the Winter months, and some are considered to be due to 'Summer viruses':

"Adenovirus, human bocavirus (HBoV), parainfluenza virus (PIV), human metapneumovirus (hMPV), and rhinovirus can be detected throughout the year (all-year viruses). Seasonal patterns of PIV are type specific. Epidemics of PIV type 1 (PIV1) and PIV type 3 (PIV3) peak in the fall [Autumn] and spring-summer, respectively. The prevalence of some non-rhinovirus enteroviruses increases in summer (summer viruses)"


Moriyama, Hugentobler & Iwasaki, 2020: 86

Just a couple of days later Mr Javid was being interviewed on the radio, and he made a more limited claim:

Health Secretary Sajid Javid talking on BBC Radio 4's 'Today' programme, 15th September

"…because we know Autumn and Winter, your COVID is going to like that time of year"

Sajid Javid

So, this claim was just about the COVID virus, not viruses more generally, and that we know that COVID is going to like Autumn and Winter. No ambiguity there. But how do we know?

Coming to knowledge

Historically there have been various ways of obtaining knowledge.

  • Divine revelation: where God reveals the knowledge to someone, perhaps through appearing to the chosen one in a dream.
  • Consulting an oracle, or a prophet or some other kind of seer.
  • Intuiting the truth by reflecting on the nature of things using the rational power of the human intellect.
  • Empirical investigation of natural phenomena.

My focus in this blog is related to science, and given that we are talking about public health policy in modern Britain, I would like to think Mr Javid was basing his claim on the last of these options. Of course, even empirical methods depend upon some metaphysical assumptions. For example, if one assumes the cosmos has inbuilt connections one might look for evidence in terms of sympathies or correspondences. Perhaps, if the COVID virus was observed closely and looked like a snowflake, that could (in this mindset) be taken as a sign that it liked Winter.

A snowflake – or is it a virus particle?
(Image by Gerd Altmann from Pixabay)

Sympathetic magic

This kind of correspondence, a connection indicated by appearance, was once widely accepted, so that a plant which was thought to resemble some part of the anatomy might be assumed to be an appropriate medicine for diseases or disorders associated with that part of the body.

This is a kind of magic, and might seem a 'primitive' belief to many people today, but such an idea was sensible enough in the context of a common set of underlying beliefs about the nature and purposes of the world, and the place and role of people in that world. One might expect that specific beliefs would soon die out if, for example, the plant shaped like an ear turned out to do nothing for ear ache. Yet, at a time when medical practitioners could offer little effective treatment, and being sent to a hospital was likely to reduce life expectancy, herbal remedies at least often (if not always) did no harm.

Moreover, many herbs do have medicinal properties, and something with a general systemic effect might work as topical medicine (i.e., when applied to a specific site of disease). Add to that, the human susceptibility to confirmation bias (taking more notice of, and giving more weight to, instances that meet our expectations than those which do not) and the placebo effect (where believing we are taking effective medication can sometimes in itself have beneficial effects) and the psychological support offered by spending time with an attentive practitioner with a good 'bedside' manner – and we can easily see how beliefs about treatments may survive limited definitive evidence of effectiveness.

The gold standard of experimental method

Of course, today, we have the means to test such medicines by taking a large representative sample of a population (of ear ache sufferers, or whatever), randomly dividing them into two groups, and, using a double-blind (or should that be double-deaf) approach, treating them with either the possible medicine or a placebo, without either the patient or the practitioner knowing who was getting which treatment. (The researchers have a way to know, of course – or it would be difficult to deduce anything from the results.) That is, the randomised control trial (RCT).

Now, I have been very critical of the notion that these kinds of randomised experimental designs should be automatically be seen as the preferred way of testing educational innovations (Taber, 2019) – but in situations where control of variables and 'blinding' is possible, and where randomisation can be applied to samples of well-defined populations, this does deserve to be considered the gold standard. (It is when the assumptions behind a research methodology do not apply that we should have reservations about using it as a strategy for enquiry.)

So can the RCT approach be used to find out if COVID has a preference for certain times of year? I guess this depends on our conceptual framework for the research (e.g., how do we understand what a 'like' actually is) and the theoretical perspective we adopt.

So, for example, behaviourists would suggest that it is not useful to investigate what is going on in someone's mind (perhaps some behaviourists do not even think the mind concept corresponds to anything real), so we should observe behaviours that allow us to make inferences. This has to be done with care. Someone who buys and eats lots of chocolate presumably likes chocolate, and someone who buys and listens to a lot of reggae probably likes reggae, but a person who cries regularly, or someone who stumbles around and has frequent falls, does not necessarily like crying, or falling over, respectively.

A viral choice chamber

So, we might think that woodlice prefer damp conditions because we have put a large number of woodlice in choice chambers with different conditions (dry and light, dry and dark, damp and light, damp and dark) and found that there was a statistically significant excess of woodlice settling down in the damp sections of the chamber.

Of course, to infer preferences from behaviour – or even to use the term 'behaviour' – for some kinds of entity is questionable. (To think that woodlice make a choice based on what they 'like' might seem to assume a level of awareness that they perhaps lack?) In a cathode ray tube electrons subject to a magnetic field may be observed (indirectly!) to move to one side of the tube, just as woodlice might congregate in one chamber, but I am not sure I would describe this as electrons liking that part of the tube. I think it can be better explained with concepts such as electrical charge, fields, forces, and momentum.

It is difficult to see how we can do double blind trials to see which season a virus might like, as if the COVID virus really does like Winter, it must surely have a way of knowing when it is Winter (making blinding impossible). In any case, a choice chamber with different sections at different times of the year would require some kind of time portal installed between its sections.

Like electrons, but unlike woodlice, COVID viral particles do not have an active form of transport available to them. Rather, they tend to be sneezed and coughed around and then subject to the breeze, or deposited by contact with surfaces. So I am not sure that observing virus 'behaviour' helps here.

So perhaps a different methodology might be more sensible.

A viral opinion poll

A common approach to find out what people like would be a survey. Surveys can sometimes attract responses from large numbers of respondents, which may seem to give us confidence that they offer authentic accounts of widespread views. However, sample size is perhaps less important than sample representativeness. Imagine carrying out a survey of people's favourite football teams at a game at Stamford Bridge; or undertaking a survey of people's favourite bands as people queued to enter a King Crimson concert! The responses may [sic, almost certainly would] not fully reflect the wider population due to the likely bias in such samples. Would these surveys give reliable results which could be replicated if repeated at the Santiago Bernabeu or at a Marillion concert?

How do we know what 'COVID 'really likes?
(Original Images by OpenClipart-Vectors and Gordon Johnson from Pixabay)

A representative sample of variants?

This might cause problems with the COVID-19 virus (SARS-CoV-2). What counts as a member of the population – perhaps a viable virus particle? Can we even know how big the population actually is at the time of our survey? The virus is infecting new cells, leading to new virus particles being produced all the time, just as shed particles become non-viable all the time. So we have no reliable knowledge of population numbers.

Moreover, a survey needs a representative sample: do the numbers of people in a sample of a human population reflect the wider population in relevant terms (be that age, gender, level of educational qualifications, earnings, etc.)? There are viral variants leading to COVID-19 infection – and quite a few of them. That is, SARS-CoV-2 is a class with various subgroups. The variants replicate to different extents under particular conditions, and new variants appear from time to time.

So, the population profile is changing rapidly. In recent months in the UK nearly all infections where the variant has been determined are due to the variant VOC-21APR-02 (or B.1.617.2 or Delta), but many people will be infected asymptomatically or with mild symptoms and not be tested, and so this likely does not mean that VOC-21APR-02 dominates the SARS-CoV-2 population as a whole to the extent it currently dominates in investigated cases. Assuming otherwise would be like gauging public opinion from the views of those particular people who make themselves salient by attending a protest, e.g.:

"Shock finding – 98% of the population would like to abolish the nuclear arsenal,

according to a [hypothetical] survey taken at the recent Campaign for Nuclear Disarmament march"

In any case, surveys are often fairly blunt instruments as they need to present objectively the same questions to all respondents, and elicit responses in a format that can be readily classified into a discrete number of categories. This is why many questionnaires use Likert type items:

Would you say you like Autumn and Winter:

1      | 2             | 3       | 4         | 5
Always | Nearly always | Usually | Sometimes | Never

Such 'objective' measures are often considered to avoid the subjective nature of some other types of research. It may seem that responses do not need to be interpreted – but of course this assumes that the researchers and all the respondents understand language the same way (what exactly counts as Autumn and Winter? What does 'like' mean? How is 'usually' understood – 60-80% of the time, or 51-90% of the time or…). We can usually (sic) safely assume that those with strong language competence will have somewhat similar understandings of terms, but we cannot know precisely what survey participants meant by their responses or to what extent they share a meaning for 'usually'.

There are so-called 'qualitative surveys' which eschew this kind of objectivity to get more in-depth engagement with participants. They will usually use interviews where the researcher can establish rapport with respondents and ask them about their thoughts and feelings, observe non-verbal signals such as facial expressions and gestures, and use follow-up questions… However, the greater insight into individuals comes at a cost of smaller samples as these kinds of methods are more resource-intensive.

But perhaps Mr Javid does not actually mean that COVID likes Autumn and Winter?

So, how did the Department of Health & Social Care, or the Health Secretary's scientific advisors, find out that COVID (or the COVID virus) likes Autumn and Winter? The virus does not think, or feel, and it does not have preferences in the way we do. It does not perceive hot or cold, and it does not have a sense of time passing, or of the seasons.2 COVID does not like or dislike anything.

Mr Javid needs to make himself clear to a broad public audience, so he has to avoid too much technical jargon. It is not easy to pitch a presentation for such an audience and be pithy, accurate, and engaging, but it is easy for someone (such as me) to be critical when not having to face this challenge. Cabinet ministers, unlike science teachers, cannot be expected to have skills in communicating complex and abstract scientific ideas in simplified and accessible forms that remain authentic to the science.

It is easy and perhaps convenient to use anthropomorphic language to talk about the virus, and this will likely make the topic seem accessible to listeners, but it is less clear what is actually meant by a virus liking a certain time of year. In teaching the use of anthropomorphic language can be engaging, but it can also come to stand in place of scientific understanding when anthropomorphic statements are simply accepted uncritically at face value. For example, if the science teacher suggests "the atom wants a full shell of electrons" then we should not be surprised that students may think this is a scientific explanation, and that atoms do want to fill their shells. (They do not of course. 3)

Image by Gordon Johnson from Pixabay

Of course Mr Javid's statements cannot be taken as a literal claim about what the virus likes – my point in this posting is to provoke the question of what this might be intended to mean? This is surely intended metaphorically (at least if Mr Javid had thought about his claim critically): perhaps that there is higher incidence of infection or serious illness caused by the COVID virus in the Winter. But by that logic, I guess turkeys really would vote for Christmas (or Thanksgiving) after all.

Typically, some viruses cause more infection in the Winter when people are more likely to mix indoors and when buildings and transport are not well ventilated (both factors being addressed in public health measures and advice in regard to COVID-19). Perhaps 'likes' here simply means that the conditions associated with a higher frequency/population of virus particles occur in Autumn and Winter?

A snowflake.
The conditions suitable for a higher frequency of snowflakes are more common in Winter.
So do snowflakes also 'like' Winter?
(Image by Gerd Altmann from Pixabay)

However, this is some way from assigning 'likes' to the virus. After all, in evolutionary terms, a virus might 'prefer', so to speak, to only be transmitted asymptomatically, as it cannot be in the virus's 'interests', so to speak, to encourage a public health response that will lead to vaccines or measures to limit the mixing of people.

If COVID could like anything (and of course it cannot), I would suggest it would like to go 'under the radar' (another metaphor) and be endemic in a population that was not concerned about it (perhaps doing so little harm it is not even noticed, such that people do not change their behaviours). It would then only 'prefer' a Season to the extent that that time of year brings conditions which allow it to go about its life cycle without attracting attention – from Mr Javid or anyone else.

Keith S. Taber, September 2021

Addendum: 1st December 2021

Déjà vu?

The health secretary was interviewed on 1st December

"…we have always known that when it gets darker, it gets colder, the virus likes that, the flu virus likes that and we should not forget that's still lurking around as well…"

Rt. Hon. Sajid Javid MP, the U.K. Secretary of State for Health and Social Care, interviewed on BBC Radio 4 Today programme, 1st December, 2021
Footnotes:

1. It would also seem to be a generalisation based on the only two Winters that the COVID-19 virus had 'experienced'.

2. Strictly, I cannot know what it is like to be a virus particle. But a lot of well-established and strongly evidenced scientific principles would be challenged if a virus particle were sentient.

3. Yet this is a VERY common alternative conception among school children studying chemistry: The full outer shells explanatory principle

Related reading:

So who's not a clever little virus then?

COVID is like a fire because…

Anthropomorphism in public science discourse

Psychological skills, academic achievement and…swimming

Keith S. Taber

'Psychological Skills in Relation to Academic Achievement through Swimming Context'

 


Original image by Clker-Free-Vector-Images from Pixabay

I was intrigued by the title of an article I saw in a notification: "Psychological Skills in Relation to Academic Achievement through Swimming Context". In part, it was the 'swimming context' – despite never having been very athletic or sporty (which is not to say I did not enjoy sports, just that I was never particularly good at any), I have always been a regular and enthusiastic swimmer.  Not a good swimmer, mind (too splashy, too easily veering off-line) – but an enthusiastic one. But I was also intrigued by the triad of psychological skills, academic achievement, and swimming.

Perhaps I had visions of students' psychological skills being tested in relation to their academic achievement as they pounded up and down the pool. So, I was tempted to follow this up.

Investigating psychological skills and academic achievement

The abstract of the paper by Bayyat and colleagues reported three aims for their study:

"This study aimed to investigate:

  • (1) the level of psychological skills among students enrolled in swimming courses at the Physical Education faculties in the Jordanian Universities.
  • (2) the relation between their psychological skills and academic achievement.
  • (3) the differences in these psychological skills according to gender."

Bayyat et al., 2021: 4535

The article was published in a journal called 'Psychology and Education', which, its publishers* suggest, is "a quality journal devoted to basic research, theory, and techniques and arts of practice in the general field of psychology and education".

A peer reviewed journal

The peer review policy reports this is a double-blind peer-reviewed journal. This means other academics have critiqued and evaluated a submission prior to its being accepted for publication. Peer review is a necessary (but not sufficient) condition for high quality research journals.

Journals with high standards use expert peer reviewers, and the editors use their reports to both reject low-quality submissions, and to seek to improve high-quality submissions by providing feedback to authors about points that are not clear, any missing information, incomplete chains of argumentation, and so forth. In the best journals editors only accept submissions after reviewers' criticisms have been addressed to the satisfaction of reviewers (or authors have made persuasive arguments for why some criticism does not need addressing).

(Read about peer review)

The authors here report that

"The statistical analysis results revealed an average level of psychological skills, significant differences in psychological skills level in favor of female students, A students and JU[**], and significant positive relation between psychological skills and academic achievement."

Bayyat et al., 2021: 4535

Rewriting slightly it seems that, according to this study:

  • the students in the study had average levels of psychological skills;
  • the female students had higher levels of psychological skills than their male peers;
  • and there was some kind of positive correlation between psychological skills and academic achievement.

Anyone reading a research paper critically asks themselves questions such as

  • 'what do they mean by that?';
  • 'how did they measure that?';
  • 'how did they reach that conclusion?'; and
  • 'who does this apply to?'

Females are better – but can we generalise?

In this study it was reported that

"By comparing psychological skills between male and female participants, results revealed significant differences in favor [sic] of female participants"

"All psychological skills' dimensions of female participants were significant in favor [sic] of females compared to their male peers. They were more focused, concentrated, confident, motivated to achieve their goals, and sought to manage their stress."

Bayyat et al., 2021: 4541, 4545

"It's our superior psychological skills that make us this good!" (Image by CristianoCavina from Pixabay)

A pedant (such as the present author) might wonder if "psychological skills' dimensions of female participants" [cf. psychological skills' dimensions of male participants?] would not be inherently likely to be in favour of females, but it is clear from the paper that this is intended to refer to the finding that females (as a group) got significantly higher ratings than males (as a group) on the measures of 'psychological skills'.

If we for the moment (but please read on below…) accept these findings as valid, an obvious question is the extent to which these results might generalise beyond the study. That is, to what extent would these findings, if true for the participants of this study, imply that the same would be found more widely (e.g., among all students in Jordanian universities? among all university students? among all adult Jordanians? among all humans?).

Statistical generalisation (From Taber, 2019)

Two key concepts here are the population and the sample. The population is the group that we wish our study to be about (e.g., chemistry teachers in English schools, 11-year olds in New South Wales…), and the sample is the group who actually provide data. In order to generalise to the population from the sample it is important that the sample is large enough and representative of the population (which of course may be quite difficult to ascertain).

(Read about sampling in research)

(Read about generalisation)

In this study the reader is told that "The population of this study was undergraduate male and female students attending both intermediate and advanced swimming courses" (Bayyat et al., 2021: 4536). Taken at face value this might raise the question of why a sample was drawn exclusively from Jordan – unless of course this is the only national context where students attend intermediate or advanced swimming courses. *** However, it was immediately clarified that "They consisted of (n= 314) students enrolled at the schools of Sport Sciences at three state universities". That is, the population was actually undergraduate male and female students from schools of Sport Sciences at three Jordanian state universities attending both intermediate and advanced swimming courses.

"The Participants were an opportunity sample of 260 students" (Bayyat et al., 2021: 4536). So in terms of sample size, 260, the sample made up most of the population – almost 83%. This is in contrast to many educational studies where the samples may necessarily only reflect a small proportion of the population. In general, representatives of a sample is more important than size as skew in the sample undermines statistical generalisations (whereas size, for a representative sample, influences the magnitude of the likely error ****) – but a reader is likely to feel that when over four-fifths of the population were sampled it is less critical that a convenience sample was used.

This still does not assure us that the results can be generalised to the population (students from schools of Sport Sciences at three Jordanian state universities attending 'both' intermediate and advanced swimming courses), but psychologically it seems quite convincing.
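As an aside, the footnoted point about sampling error can be made a little more concrete. The sketch below is purely my own illustration and does not come from the paper: the population and sample sizes are those Bayyat and colleagues report, but the 95% confidence level and the assumed proportion of 0.5 (the most conservative choice) are assumptions made only for the sake of the example – and the calculation is only meaningful for a genuinely representative sample, which an opportunity sample cannot be assumed to be.

```python
# A minimal sketch (my own illustration, not from the paper) of how the
# sampling fraction constrains the likely error in a survey estimate,
# assuming a simple random (i.e. representative) sample.
import math

def margin_of_error(sample_size: int, population_size: int,
                    proportion: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for an estimated proportion,
    with the finite population correction applied."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

population = 314  # students on the intermediate and advanced courses
sample = 260      # the opportunity sample actually surveyed

print(f"Sampling fraction: {sample / population:.1%}")                    # about 83%
print(f"Margin of error:   ±{margin_of_error(sample, population):.1%}")   # about ±2.5%
```

On those (illustrative) assumptions, sampling 260 of 314 students would give a margin of error of roughly ±2.5 percentage points – the point being simply that error of this kind can be estimated for a representative sample, whereas the bias in an unrepresentative one cannot.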

Ontology: What are we dealing with?

The study is only useful if it is about something that readers think is important – and if it is clear what it is about. The authors tell us their study is about

  • Psychological Skills
  • Academic Achievement

which would seem to be things educators should be interested in. We do need to know, however, how the authors understand these constructs: what do they mean by 'a psychological skill' and 'academic achievement'? Most people would probably think they have a pretty good idea what these terms might mean, but that is no assurance at all that different people would agree on this.

So, in reading this paper it is important to know what the authors themselves mean by these terms – so a reader can check they understand these terms in a sufficiently similar way.

What is academic achievement?

The authors suggest that

"academic achievement reflects the learner's accomplishment of specific goals by the end of an educational experience in a determined amount of time"

Bayyat et al., 2021: 4535


This seems to be the extent of the general characterisation of this construct in the paper *****.

What are psychological skills?

The authors tell readers that

"Psychological skills (PS) are a group of skills and abilities that enhances peoples' performance and achievement…[It has been] suggested that PS includes a whole set of trainable skills including emotional control and self-confidence"

Bayyat et al., 2021: 4535


For the purposes of this particular study, they

"identified the psychological skills related to the swimming context such as; leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention"

Bayyat et al., 2021: 4536


So the relevant skills are considered to be:

  • leadership
  • emotional stability
  • sport achievement motivation
  • self-confidence
  • stress management
  • attention

I suspect that there would not be complete consensus among psychologists or people working in education over whether all of these constructs actually are 'skills'. Someone who did not consider these (or some of these) characteristics as skills would need to read the authors' claims arising from the study about 'psychological skills' accordingly (i.e., perhaps as being about something other than skills) but as the authors have been clear about their use of the term, this should not confuse or mislead readers.

Epistemology: How do we know?

Having established what is meant by 'psychological skills' and 'academic achievement' a reader would want to know how these were measured in the present study – do the authors use techniques that allow them to obtain valid and reliable measures of 'psychological skills' and 'academic achievement'?

How is academic achievement measured?

The authors inform readers that

"To calculate students' academic achievement, the instructors of the swimming courses conducted a valid and reliable assessment as a pre-midterm, midterm, and final exam throughout the semester…The assessment included performance tests and theoretical tests (paper and pencil tests) for each level"

Bayyat et al., 2021: 4538


Although the authors claim their assessments are valid and reliable, a careful reader will note that the methodology here does not match the definition of "accomplishment of specific goals by the end of an educational experience" (emphasis added) – as only the final examinations took place at the end of the programme. On that point, then, there is a lack of internal consistency in the study. This might not matter to a reader who did not think academic achievement needed to be measured at the end of a course of study.

Information on the "Academic achievement assessment tool", comprising six examinations (pre-midterm, midterm, and final examinations at each of the intermediate and advanced levels) is included as an appendix – good practice that allows a reader to interrogate the instrument.

Although this appendix is somewhat vague on precise details, it offers a surprise to someone (i.e., me) with a traditional notion of what is meant by 'academic achievement' – as both theory and practical aspects are included. Indeed, most of the marks seem to be given for practical swimming proficiency. So, the 'Intermediate swimming Pre-midterm exam' has a maximum of 20 marks available – with breast stroke leg technique and arm technique each scored out of ten marks.

The 'Advanced swimming midterm exam' is marked out of 30, with 10 marks each available for the 200m crawl (female), individual medley (female) and life guarding techniques. This seems to suggest that 20 of the 30 marks available can only be obtained by being female, but this point does not seem to be clarified. Presumably (?) male students had a different task that the authors considered equivalent.

How are psychological skills measured?

In order to measure psychological skills the authors proceeded "to develop and validate a questionnaire" (p.4536). Designing a new instrument is a complex and challenging affair. The authors report how they

"generated a 40 items-questionnaire reflecting the psychological skills previously mentioned [leadership, emotional stability, sport achievement motivation, self-confidence, stress management, and attention] by applying both deductive and inductive methods. The items were clear, understandable, reflect the real-life experience of the study population, and not too long in structure."

Bayyat et al., 2021: 4538

So, items were written which it was thought would reflect the focal skills of interest. (Unfortunately there are no details of what the authors mean by "applying both deductive and inductive methods" to generate the items.) Validity was assured by asking a panel of people considered to have expertise to critique the items:

"the scale was reviewed and assessed by eight qualified expert judges from different related fields (sport psychology, swimming, teaching methodology, scientific research methodology, and kinesiology). They were asked to give their opinion of content representation of the suggested PS [psychological skills], their relatedness, clarity, and structure of items. According to the judges' reviews, we omitted both leadership and emotional stability domains, in addition to several items throughout the questionnaire. Other items were rephrased, and some items were added. Again, the scale was reviewed by four judges, who agreed on 80% of the items."

So, construct validity was a kind of face validity, in that people considered to be experts thought the final set of items would elicit the constructs intended, but there was no attempt to see if responses correlated in any way with any actual measurements of the 'skills'.

Readers of the paper wondering if they should be convinced by the study would need to judge if the expert panel had the right specialisms to evaluate scale items for 'psychological skills', and might find some of the areas of expertise (i.e.,

  • sport psychology
  • swimming
  • teaching methodology
  • scientific research methodology
  • kinesiology)

more relevant than others.

Self-reports

If respondents responded honestly, their responses would have reflected their own estimates of their 'skills' – at least to the extent that their interpretation of the items matched that of the experts. (That is, there was no attempt to investigate how members of the population of interest would understand what was meant by the items.)

Here are some examples of the items in the instrument:

Construct ('psychological skill') and an example item:

  • self-confidence: "I manage my time effectively while in class"
  • sport achievement motivation: "I do my best to control everything related to swimming lessons."
  • attention: "I can pay attention and focus on different places in the pool while carrying out swimming tasks"
  • stress management: "I am not afraid to perform any difficult swimming skill, no matter what"

Examples of statements students were asked to rate in order to measure their 'psychological skills' (source: Bayyat et al., 2021: 4539-4541)

Analysis of data

The authors report various analyses of their data, which lead to the conclusions they reach. If a critical reader was convinced about matters so far, they would still need to believe that the analyses undertaken were

  • appropriate, and
  • completed competently, and
  • correctly interpreted.

Drawing conclusions

However, as a reader I personally would have too many quibbles with the conceptualisation and design of instrumentation to consider the analysis in much detail.

To my mind, at least, the measure of 'academic achievement' seems to be largely an evaluation of swimming skills. They are obviously important in a swimming course, but I do not consider this a valid measure of academic achievement. That is not a question of suggesting academic achievement is better or more important than practical or athletic achievements, but it is surely something different (akin to me claiming to have excellent sporting achievement on the basis of holding a PhD in education).

The measure of psychological skills does not convince me either. I am not sure some of the focal constructs can really be called 'skills' (self-confidence? motivation?), but even if they were, there is no attempt to directly measure skill. At best, the questionnaire offers self-reports of how students perceive (or perhaps wish to be seen as perceiving) their characteristics.

It is quite common in research to see the term 'questionnaire' used for an instrument that is intended to test knowledge or skill – but questionnaires are not the right kind of instrument for that job.

(Read about questionnaires)

Significant positive relation between psychological skills and academic achievement?

So, I do not think this methodology would allow anyone to find a "significant positive relation between psychological skills and academic achievement" – only a relationship between students' self-ratings on some psychological characteristics and their swimming achievement. (That may reflect an interesting research question, and could perhaps be a suitable basis for a study, but it is not what this study claims to be about.)

Significant differences in psychological skills level in favor of female students?

In a similar way, although it is interesting that females tended to score higher on the questionnaire scales, this shows they had higher self-ratings on average, and tells us nothing about their actual skills.

It may be that the students have great insight into these constructs and their own characteristics and so make very accurate ratings on these scales – but with absolutely no evidential basis for thinking this there are no grounds for making such a claim.

An alternative interpretation of the results is that on average the male students under-rate their 'skills' compared to their female peers. That is the 'skills' could be much the same across gender, but there might be a gender-based difference in perception. (I am not suggesting that is the case, but the evidence presented in the paper can be explained just as well by that possibility.)
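To illustrate why these two readings cannot be distinguished from the questionnaire data alone, here is a small simulation. It is entirely my own sketch – every number in it is invented, and nothing is drawn from Bayyat and colleagues' data – but it shows how a group difference in self-ratings can arise purely from a difference in how people rate themselves, even when the underlying characteristic has been set to be identical in both groups.

```python
# A minimal sketch (entirely invented numbers – nothing here comes from the
# paper): a difference in mean self-ratings between two groups is consistent
# with identical underlying characteristics plus a group rating bias.
import random
from statistics import mean

random.seed(1)

def simulate_group(n, true_level, rating_bias):
    """Self-ratings on a 1-5 scale: the true level plus a group rating bias."""
    ratings = []
    for _ in range(n):
        rating = random.gauss(true_level + rating_bias, 0.6)
        ratings.append(min(5.0, max(1.0, rating)))  # clip to the scale
    return ratings

# Both groups are given the SAME true level (3.5), but one group
# systematically under-rates itself by half a scale point.
females = simulate_group(130, true_level=3.5, rating_bias=0.0)
males = simulate_group(130, true_level=3.5, rating_bias=-0.5)

print(f"Mean self-rating, females: {mean(females):.2f}")
print(f"Mean self-rating, males:   {mean(males):.2f}")
# The self-ratings come out 'in favour of' the females even though the
# underlying level was set to be identical – the questionnaire alone
# cannot tell these two scenarios apart.
```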

An average level of psychological skills?

Finally, we might ask what is meant by

"The statistical analysis results revealed an average level of psychological skills…"

"Results of this study revealed that the participants acquired all four psychological skills at a moderate level."

Bayyat et al., 2021: 4535, 4545

Even leaving aside that what is being measured is something other than psychological skills, it is hard to see how these statements can be justified. This was the first administration of a new instrument being applied to a sample of a very specific population.


Image by Igor Drondin from Pixabay

The paper reports standard deviations for the ratings on the items in the questionnaire, so – as would be expected – there were distributions of results: spreads, with different students giving different ratings. Within the sample tested, some of the students will have given higher-than-median ratings on an item, and some lower-than-median ratings – although, on average, the ratings for that particular item would have been average for this sample (that is true by definition!).

So, assuming the claim of average/moderate levels of psychological skills was not meant as a tautology, the authors seem to be suggesting that the ratings given on this administration of the instrument align with what they expect would typically be obtained across other administrations. Of course, they have absolutely no way of knowing whether that is the case without collecting data from samples of other populations.

What the authors actually seem to be basing these claims (of average/moderate levels of psychological skills) on is that the average responses on these scales were neither very high nor very low in terms of the raw scale. Yet, with absolutely no reference data for how other groups of people might respond to the same instrument, that offers little useful information. At best, it suggests something welcome about the instrument itself (ideally one would wish items to elicit a spread of responses, rather than having most responses rated very high or very low) but nothing about the students sampled.

On this point the authors seem to be treating the scale as calibrated against some nominal standard (e.g. 'a rating of 3-4 would be the norm'), when there is no inherent interpretation of particular ratings of items on such a scale that can just be assumed – rather, this is a matter that would need to be explored empirically.
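To illustrate the distinction being drawn here, the short sketch below is again entirely my own (all the numbers are invented and none come from the paper): it contrasts reading a mean rating against a nominal band imposed on the raw scale with reading it against reference data from other groups – data which, for a newly developed instrument, simply do not exist.

```python
# A minimal sketch (invented numbers only) of why a mid-scale mean rating
# cannot, by itself, be read as a 'moderate' level of the characteristic.
from statistics import mean

# Hypothetical mean self-ratings (1-5 scale) from one administration
sample_ratings = [3.4, 3.1, 3.6, 3.2, 3.5]
sample_mean = mean(sample_ratings)

# Reading 1: a nominal band imposed on the raw scale (the kind of
# assumption the paper appears to rely on).
if sample_mean > 3.5:
    nominal_verdict = "high"
elif sample_mean >= 2.5:
    nominal_verdict = "moderate"
else:
    nominal_verdict = "low"

# Reading 2: comparison with reference (norm) data from other groups –
# data that would have to be collected empirically, and which do not
# exist for a newly developed instrument.
hypothetical_norm_mean = 4.1  # invented value for some wider reference group
if sample_mean < hypothetical_norm_mean:
    norm_verdict = "below the reference group"
else:
    norm_verdict = "at or above the reference group"

print(f"Sample mean rating:      {sample_mean:.2f}")
print(f"Nominal-band reading:    {nominal_verdict}")
print(f"Norm-referenced reading: {norm_verdict}")
# The label 'moderate' follows from the chosen band, not from anything
# the data themselves establish.
```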

The research paper as an argument

The research paper is a very specific genre of writing: it is an argument for new knowledge claims. The conclusions of a paper rest on a chain of argument that starts with the conceptualisation of the study and moves through research design, data collection, analysis, and interpretation. For a reader, any link in that chain which is not convincing potentially undermines the knowledge claim(s) being made. This is why the standards expected of research papers are so high.


Research writing

In sum, then, this was an intriguing study, but it did not convince me (even if it apparently convinced the peer reviewers and editor of Psychology and Education). I am not sure it was really about psychological skills, or academic achievement

…but at least it was clearly set in the context of swimming.


Work cited:

Bayyat, M. M., Orabi, S. M., Al-Tarawneh, A. D., Alleimon, S. M., & Abaza, S. N. (2021). Psychological Skills in Relation to Academic Achievement through Swimming Context. Psychology and Education, 58(5), 4535-4551.

Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download manuscript version]


* Despite searching the different sections of the journal site, I was unable to find who publishes the journal. However, searching outside the site I found a record of the publisher of this journal being 'Auricle Technologies, Pvt., Ltd'.

** It transpired later in the paper that 'JU' referred to students at the University of Jordan: one of three universities involved in the study.

*** I think literally this means those who participated in the study were students attending both an intermediate swimming course and an advanced swimming course – but I read this to mean those who participated in the study were students attending either an intermediate or advanced swimming course. This latter interpretation is consistent with information given elsewhere in the paper: "All schools of sports sciences at the universities of Jordan offer mandatory, reliable, and valid swimming programs. Students enroll in one of three swimming courses consequently: the basic, intermediate, and advanced levels". (Bayyat et al., 2021: 4535, emphasis added)

**** That is, if the sample is unrepresentative of the population, there is no way to know how biased the sample might be. However, if there is a representative sample, then although there will still likely be some error (the results for the sample will not be precisely what the results across the whole population would be) it is possible to calculate the likely size of this error (e.g., say ±3%) which will be smaller when a higher proportion of the population are sampled.

***** It is possible some text that was intended to be at this point has gone missing during production – as, oddly, the following sentence is

facilisi, veritus invidunt ei mea (Times New Roman, 10)

Bayyat et al., 2021: 4535

which seems to be an accidental retention of text from the journal's paper template.