Laboratory safety – not on the face of it

An invalid research instrument for testing 'safety sign awareness'

Keith S. Taber

I was recently invited to write for the 'Journal of Chemistry: Education Research and Practice' (not to be confused with the well-established R.S.C. journal 'Chemistry Education Research and Practice'), which describes itself as "a leading International Journal for the publication of high quality articles". It is not.

From the Journal Homepage

I already had reason to suspect this of being a predatory journal (one that entices authors to part with money to publish work without adhering to the usual academic standards and norms). But as I had already reached that judgement before the journal had started publishing, I decided to check out the quality of the published work.

The current issue, at the time of writing, has five articles, only one of which is educational in nature: 'Chemistry Laboratory Safety Signs Awareness Among Undergraduate Students in Rivers State'.

Below I describe key aspects of this study, including some points that I would have expected to be picked up in peer review, and therefore addressed before the paper was published.

Spoiler alert

My main observation is that the research instrument used is invalid – I do not think it actually measures what the authors claim it does. (As the article is published with an open-access license1, I am able to reproduce the instrument below so you can see whether you agree with me or not.)

'Chemistry laboratory safety signs awareness among undergraduate students in Rivers State'

A study about chemistry laboratory safety signs awareness?

Laboratory safety is very important in chemistry education, and is certainly a suitable topic for research. A range of signs and symbols is used to warn people of different types of potential chemical hazard, so learning about these signs is important for anyone working in laboratories, and investigating this aspect of learning is a legitimate focus for research.

Motivating a study

As part of a published research study, authors are expected to set out the rationale for the study – to demonstrate, usually based on existing literature, that there is something of interest to investigate. This can be described as the 'conceptual framework' for the study. This is one of the aspects of a study which is usually tested in peer review, where manuscripts submitted to a journal are sent to other researchers with relevant expertise for evaluation.

The authors of this study, Ikiroma, Chinda and Bankole, did begin by discussing aspects of laboratory safety, and reporting some previous work around this topic. They cite an earlier study surveying second-year science education students at Lagos State University, Nigeria, where:

"The result of the study revealed 100% of the respondents are not aware of the laboratory sign and symbols" 2

Ikiroma, Chinda & Bankole, 2021: 50

This would seem a good reason to do follow-up work elsewhere.

Research questions and hypotheses

A study should have one or more research questions. These will be quite general in more open-ended 'discovery research' (exploratory enquiry), but need to be more specific in 'confirmatory research' such as experiments and surveys. This study had both specific research questions and null hypotheses:

"Research Questions

1. What is the percentage awareness level of safety signs among undergraduate Chemistry students?

2. What is the difference in awareness level of safety signs between undergraduate Chemistry Education students and Chemistry Science students?

3. To what extent do the awareness levels of safety signs among undergraduate Chemistry students depended on Institutional types?"

Hypotheses

1. There is no significant difference in awareness level of safety signs between undergraduate Chemistry Education students and Chemistry Science students

2. The awareness levels of safety signs among undergraduate Chemistry students are not significantly dependent on Institutional types."

Ikiroma, Chinda & Bankole, 2021: 50

These specific questions and hypotheses do not seem to be motivated in the conceptual framework. That is, a reader has not been given any rationale for thinking there are reasons to test for differences between these particular groups. There may have been good reasons to explore these variables, but authors of research papers are usually expected to share their reasoning with readers. (This is something one would expect to be spotted in peer review, leading the editor to ask the authors to revise their submission to set out the background to these specific questions.)

It is not explained quite what 'institutional types' actually refers to. From the way results are discussed later in the paper (p.53), 'institutional types' seems to be used here simply to mean different universities.

Sampling – how random is random?

The sample is described as:

"A total of 60 year three undergraduate students studying Chemistry Education (B.Sc. Ed) and Pure Chemistry (B.Sc.) were randomly drawn from three universities namely; University of Port Harcourt (Uniport), Rivers State University (RSU) and Ignatius Ajuru University of Education (IAUE) with each university contributing 20 students."

Ikiroma, Chinda & Bankole, 2021: 50

This study was then effectively a survey, where data were collected from a sample of a defined population (third-year undergraduate students studying chemistry education or pure chemistry at any of three named universities in one geographical area) in order to draw inferences about the whole population.

Randomisation is an important process when it is not possible to collect data from the whole population of interest, as it allows statistics to be used to infer from the sample what is likely in the wider population. Ideally, authors should briefly explain how they randomised (Taber, 2013) so readers can judge whether the technique used really does give each member of the population (here, one assumes, third-year chemistry undergraduates in each of the universities) an equal chance of being sampled. (If the authors are reading this blog, please feel free to respond to this point in the comments below: how did you go about the randomisation?)
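
For readers less familiar with sampling, here is a minimal sketch (in Python, with an entirely hypothetical class list and cohort size) of what a defensible simple random sample looks like – the point being that an explicit chance mechanism, rather than convenient availability, determines who is invited:

```python
import random

# Hypothetical cohort list for one university's third-year chemistry students
# (the identifiers and the cohort size of 120 are invented for illustration only).
cohort = [f"student_{i:03d}" for i in range(1, 121)]

random.seed(42)                      # fixed seed only so the example is reproducible
sample = random.sample(cohort, 20)   # every student has an equal chance of selection

print(sample)
```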

Usually in survey research an indication would be given of the size of the population (as a random sample of 0.1% of a population gives results with larger inherent error than a random sample of 10%). That information does not seem to be provided here.
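
To make that point concrete, here is a rough worked sketch (my own illustration, not taken from the paper) of the standard error of a sample proportion with the finite population correction applied, showing how much the sampling fraction matters:

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion p, for a simple random sample of
    size n drawn without replacement from a population of size N (the final
    factor is the finite population correction)."""
    return math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

# Illustrative only: suppose the eligible population were 10,000 students and
# half of them (p = 0.5) could identify a given sign.
print(se_proportion(0.5, n=1000, N=10_000))  # 10% sample  -> about 0.015
print(se_proportion(0.5, n=10,   N=10_000))  # 0.1% sample -> about 0.158
```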

Even if the authors did use randomisation, presumably they did not randomise across the combined population of "year three undergraduate students studying Chemistry Education (B.Sc. Ed) and Pure Chemistry (B.Sc.)…from three universities", as they would have been very unlikely to end up with equal numbers from the three institutions. So, probably, they took (random?) samples from within each of the three sub-populations (which would be a sensible basis for comparing between them).

It later becomes clear that of the 60 sampled students, 30 were chemistry education students and 30 straight chemistry students (p.53) – so again it seems likely that sampling was done separately for the two types of course. There does not seem to be any information on the breakdown by university and course, so it is possible there were 10 students in each of six cells, if each university offered both courses:

                                         chemistry education   pure chemistry   total
University of Port Harcourt                        ?                  ?            20
Rivers State University                            ?                  ?            20
Ignatius Ajuru University of Education             ?                  ?            20
total                                             30                 30            60

Sample

Clearly this distribution potentially matters, as there could be interactions between these two variables. Suppose, for example, that students taking pure chemistry tended to have a higher 'awareness level of safety signs' than students taking chemistry education: then (see the hypothetical example in the table below, and the small simulation that follows it) if the sample from one university comprised mostly pure chemistry students, and that from another university mostly chemistry education students, this would likely produce differences between institutions in the samples even if there were no such differences between the combined student populations of the two universities. The uneven sampling from the two courses within the universities would bias the comparison between institutions.

                 course 1   course 2   total
University A        20          0        20
University B        10         10        20
University C         0         20        20
total               30         30        60

A problematic sample for disentangling factors
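
A small simulation (entirely hypothetical numbers, just to make the point concrete) shows how such an unbalanced sample would manufacture an apparent difference between institutions even when none exists:

```python
import random

random.seed(1)

# Hypothetical assumption: course, not institution, drives the score.
# Pure chemistry students average 70%, chemistry education students 50%,
# and this is the same in every university.
def score(course):
    mean = 70 if course == "pure" else 50
    return min(100, max(0, random.gauss(mean, 10)))

# Unbalanced samples, as in the hypothetical table above.
samples = {
    "University A": ["education"] * 20,
    "University B": ["education"] * 10 + ["pure"] * 10,
    "University C": ["pure"] * 20,
}

for university, courses in samples.items():
    mean_score = sum(score(c) for c in courses) / len(courses)
    print(university, round(mean_score, 1))
# Prints means of roughly 50, 60 and 70 – an apparent 'institutional'
# difference created entirely by the uneven course composition.
```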

My best guess is that the authors appreciated this, that all three universities taught both types of course, and that the authors sampled 10 students from each course in each of the universities. Perhaps they even did it randomly – but it would be good to know how, as I have found that authors who claim to have made random selections have sometimes not used a technique that would strictly support this claim. (And if a sample is not random, we can have much less confidence about how it reflects the population sampled.)

The point is that a reader of a research report should not have to guess. Often researchers (and research students) are so close to their own project that it becomes easy to assume others will know things about the work that have become taken for granted by the research team. This is where a good editor or peer reviewer can point out, and ask for, missing information that is not available to a reader.

Ethical research?

Sampling can also be impacted by ethics. It is one thing to select people randomly, but not all people will volunteer to help with research, and it is a general principle of educational research that participants should offer voluntary informed consent (Taber, 2014). Where some people agree to participate and others do not, this may bias results if people's reasons for accepting or declining an invitation are linked to the focus of the research.

Imagine inviting students to take part in research to test whether cheating (copying homework, taking reference material into examinations) can be detected by using a lie detector while questioning students about their behaviours. Are those who cheat and those who are scrupulously honest likely to volunteer for such research to the same extent, or might we expect most cheats to opt out?

It is normal practice in educational research to make a brief statement that the research was carried out ethically, e.g., that participants all volunteered freely having had the purpose and nature of the research clearly explained to them. I could not find any such statement in the article, nor any requirement for authors to include this in the journal's author guidelines.

Lack of face validity

In research, validity is about measuring what you think you are measuring. In the school laboratory, if we saw a student completing the 'potential difference/V' column of a results table while taking readings with an ammeter, we would consider the recorded results invalid.

I once gave a detention to a first year (Y7) student who had done something naughty that I forget now, and as we were working on a measurement topic I set her to measure the length of the corridor outside the lab. with a metre rule. Although this was an appropriate instrument, I found that she did not appreciate that in order to get a valid result she had to make sure she moved the metre stick on by the right amount (that is, one metre!) for each counted metre – instead she would move the metre stick by about half its length! Some pupils may resent being in detention and deliberately respond with sloppy work, but in this case it seemed the fault was with the teacher who had overestimated prior knowledge and consequently given an insufficiently detailed explanation of the task!

In research we have to be confident that an instrument is measuring what it is meant to. This may mean testing and calibrating – using the instrument somewhere where we already have a good measure and checking it gives the expected answers (like checking a clock against the Greenwich pips on the radio) before using it in research to measure an unknown.

In educational studies we can sometimes spot invalid instruments because they lack face validity – that is, 'on the face of it' an instrument does not seem suitable to do the job (at least, when 'we' are people with relevant expertise). Consider an instrument to test understanding of trigonometry which consisted of the item: "discuss five factors which contributed to the 'industrial revolution' in eighteenth century Britain". We might suspect this could be used to measure something, but probably not understanding of trigonometry. It would be an invalid test to use for that purpose.

Awareness level of safety signs?

The focus of Ikiroma, Chinda & Bankole's study was 'awareness level of safety signs'. Strictly this only seems to mean being aware of such signs3, but I read it to mean that the authors wanted to know whether students recognised the meaning of different signs commonly used: whether they were aware of what particular signs signified.

The 'Chemistry Laboratory Test on Safety Signs' instrument:

The authors report they used:

A well validated and researchers['] constructed test instrument, titled, Chemistry Laboratory Test on Safety Signs (CLTSS) which had an internal reliability index of 0.94 via Cronbach Alpha was used for data collection in the study. The questions in the test required the students to match a list of 20 chemicals in column A and of nine safety signs accompanied with a short description in column B. This aimed to reduce the wrong response because the students incorrectly considered only the symbol.

Ikiroma, Chinda & Bankole, 2021: 50

Validation

A key question that an editor would expect peer reviewers to consider is whether the instrumentation used in research can provide valid findings. Where this is not clear in a manuscript submitted to a journal, the editor should (if not rejecting the paper) ask for this to be addressed in a revision of the manuscript. Validity is clearly critical, and research should not be published if it makes claims based on invalid instrumentation.

Therefore, when reporting research instruments, it is usually expected that authors explain how they tested for validity – simply stating that something is well-validated does not count! Face validity might be tested by asking carefully identified experts whether they think the instrument tests what is claimed (so here, perhaps asking university chemistry lecturers – "do you think these questions are suitable for eliciting undergraduate students' awareness levels of safety signs?").

If an instrument passed this initial test, more detailed work would be undertaken. Here, perhaps a small sample of students from a population closely related to that being studied (perhaps second-year chemistry students in the same universities, or third-year chemistry students from another university) would be asked to complete the instrument using a 'think aloud' protocol, where they explain their thinking as they answer the questions – or would be interviewed about their awareness of safety signs as well as completing the instrument, to triangulate responses to the instrument against interview responses.

Cronbach's alpha measures the internal consistency of a scale (Taber, 2018), but offers no assurance of validity. (If a good set of items meant to test enjoyment of school science were used instead to measure belief in ghosts, the set of items would still show the same high level of internal consistency despite being used for a totally invalid purpose.)
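
For readers unfamiliar with the statistic, a minimal sketch of how Cronbach's alpha is calculated (illustrative Python, not taken from the paper) makes the point: alpha only reflects how consistently the items vary together, and says nothing about what construct the items actually measure.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented data: six items that co-vary strongly give a high alpha,
# whatever construct (enjoyment of science, belief in ghosts...) they probe.
rng = np.random.default_rng(0)
trait = rng.normal(size=100)                                   # 100 respondents
items = trait[:, None] + rng.normal(scale=0.5, size=(100, 6))  # six 'parallel' items
print(round(cronbach_alpha(items), 2))                         # typically about 0.95
```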

Chemistry Laboratory Test on Safety Signs (Ikiroma, Chinda & Bankole, 2021: 51)

So, what the students had to do was match a chemical (name) with the appropriate hazard sign. What was being tested was knowledge of the hazards associated with laboratory chemicals (an important enough topic, but not what was promised).

Had the signs not been labelled, then the items would have required BOTH knowing about the hazards of specific chemicals AND knowing which sign was used for the associated hazards. However, Ikiroma, Chinda & Bankole had actually looked "to reduce the wrong response because the students incorrectly considered only the symbol" (emphasis added). That is, they had built into the test instrument a means to ensure it did not test awareness of 'safety signs' (what they were supposedly interested in) and only measured awareness of the hazards associated with particular substances.

What Ikiroma, Chinda & Bankole had tested was potentially useful and interesting – but it was not what they claimed. The paper title, the research questions, and the hypotheses (and consequently their statements of findings) were all misleading in that regard. One would have expected that the editor and peer reviewers would have noticed this and required corrections before publication was considered.

Quality assurance?

The journal's website claims that "Journal of Chemistry: Education Research and Practice is an international peer reviewed journal…" Peer review involves the editor rejecting poor submissions, and assuring the quality of what is published by arranging for experts in the field to scrutinise submissions against quality standards. Peer reviewers are chosen for expertise related to the specific topic of the specific submission. In particular, reviewers will ask for changes where these seem to be needed, and the editor of a journal decides to publish only when she is satisfied that sufficient changes have been made in response to the review reports.

Publishing poor quality work, especially work with glaring issues, reflects badly on the authors, the journal, and the editor.4

The journal accepted the paper about nine days after submission.

In this case the editor – Professor Nour Shafik Emam El-Gendy, of Environmental Sciences & Nanobiotechnology at the Egyptian Petroleum Research Institute, Egypt – appears to have taken just over a week to

  • arrange peer review,
  • receive and consider the referee reports, and
  • report back to the authors asking them for any changes she felt were needed, and (once she received any revisions that may have been requested) then
  • decide the paper was ready for publication in this supposed 'leading international journal'.

That could be seen as impressive, but actually seems incredible.

Peer review is not just about sorting good work from bad; it is also about supporting authors by showing them where their work needs to be improved before it is put on public display. Peer review is as much about improving work as about selecting it.

I do not know if Ikiroma, Chinda & Bankole were expected to pay the standard charge for publishing – that is $999 for this journal – but, if so, I do not think they got value for money. Given the level of support they seem to have received from the peer review process, I think they should be entitled to a refund.

Work cited:
  • Ikiroma, B., Chinda, W., & Bankole, I. S. (2021). Chemistry Laboratory Safety Signs Awareness Among Undergraduate Students in Rivers State. Journal of Chemistry: Education Research and Practice, 5(1), 47-54.
  • Oludipe, O. S., & Etobro, B. A. (2018). Science Education Undergraduate Students' Level of Laboratory Safety Awareness. Journal of Education, Society and Behavioural Science, 23(4), 1-7. https://doi.org/10.9734/JESBS/2017/37461
  • Taber, K. S. (2013). Non-random thoughts about research. Chemistry Education Research and Practice, 14(4), 359-362. doi: 10.1039/c3rp90009f. [Free access]
  • Taber, K. S. (2014). Ethical considerations of chemistry education research involving "human subjects". Chemistry Education Research and Practice, 15(2), 109-113. [Free access]
  • Taber, K. S. (2018). The Use of Cronbach's Alpha When Developing and Reporting Research Instruments in Science Education. Research in Science Education, 48, 1273-1296. doi:10.1007/s11165-016-9602-2

Notes:

1: "All works published by 'Journal of Chemistry: Education Research and Practice' is [sic, are] under the terms of Creative Commons Attribution License. This permits anyone to copy, distribute, transmit and adapt the work provided the original work and source is appropriately cited." (https://opastonline.com/journal/journal-of-chemistry-education-research-and-practice/author-guidelines)

2: This is indeed what these authors claim – by which they seem to mean that none of the students tested reached a score of half marks. (I infer that from the way they report other results in the same study.) They report that, of 50 respondents,

"21 (42%) could not identify correctly all [sic, could not identify correctly any of] the eight symbols presented in the survey. 24 (48%) was only able to identify one out of eight symbols presented, and 5 (10%) could identify just two. Thus, it is alarming to discover that *100% of the respondents are not aware of the laboratory signs and symbols"

Oludipe & Etobro, 2018: 5

(The asterisk seems to indicate which rows from a result table are being summed to give 100%.)

3: If we simply wanted to test for awareness of safety signs we might think of displaying some jars of reagents and asking something like "is there any way you would know which of these chemicals present particular risks?" or "how might we find out about special precautions we should take when working with these reagents?", and seeing whether the respondents pointed out the safety signs printed on the labels.

4: Journals that attract high volumes of submissions may have a team of editors to share the work. Some journals with several editors acknowledge the specific editor who handles each published study.

I suspect that some predatory journals appoint editors who do not actually see the submissions (as it is difficult to see how qualified editors would approve some of the nonsense published in some journals), which are instead handled by administrators who may not be experts in the field (and so may not be in a position to judge the expertise of peer reviewers). If this is so, the editor should be described as an 'honorary editor' as misrepresenting a journal as edited by a subject expert is dishonest.

Author: Keith

Former school and college science teacher, teacher educator, research supervisor, and research methods lecturer. Emeritus Professor of Science Education at the University of Cambridge.
