Assessing Chemistry Laboratory Equipment Availability and Practice

Comparative education on a local scale?

Keith S. Taber

Image by Mostafa Elturkey from Pixabay 

I have just read a paper in a research journal which compares the level of chemistry laboratory equipment and 'practice' in two schools in the "west Gojjam Administrative zone" (which according to a quick web-search is in the Amhara Region in Ethiopia). According to Yesgat and Yibeltal (2021),

"From the analysis of Chemistry laboratory equipment availability and laboratory practice in both … secondary school and … secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice. From the data analysis average chemistry laboratory equipment availability and status of laboratory practice of … secondary school is better than that of Jiga secondary school."

Yesgat and Yibeltal, 2021: abstract [I was tempted to omit the school names in this posting as I was not convinced the schools had been treated reasonably, but the schools are named in the very title of the article]

Now that would seem to be something that could clearly be of interest to teachers, pupils, parents and education administrators in those two particular schools, but it raises the question that can be posed in relation to any research: 'so what?' The findings might be a useful outcome of enquiry in its own context, but what generalisable knowledge does this offer that justifies its place in the research literature? Why should anyone outside of West Gojjam care?

The authors tell us,

"There are two secondary schools (Damot and Jiga) with having different approach of teaching chemistry in practical approach"

Yesgat and Yibeltal, 2021: 96

So, this suggests a possible motivation.

  • If these two approaches reflect approaches that are common in schools more widely, and
  • if these two schools can be considered representative of schools that adopt these two approaches, and
  • if 'Chemistry Laboratory Equipment Availability and Practice' can be considered to be related to (a factor influencing? an effect of?) these different approaches, and
  • if the study validly and reliably measures 'Chemistry Laboratory Equipment Availability and Practice', and
  • if substantive differences are found between the schools

then the findings might well be of wider interest. As always in research, the importance we give to findings depends upon a whole logical chain of connections that collectively make an argument.

Spoiler alert!

At the end of the paper, I was none the wiser what these 'different approaches' actually were.

A predatory journal

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies. One of the articles that IS on an educational topic is called 'Assessment of Chemistry Laboratory Equipment Availability and Practice: A Comparative Study Between Damot and Jiga Secondary Schools' (Yesgat & Yibeltal, 2021).

Comparative education?

Yesgat and Yibeltal imply that their study falls in the field of comparative education. 1 They inform readers that 2,

"One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. Most compartivest states [sic] that comparative education has four main purposes. These are:

  • To describe educational systems, processes or outcomes
  • To assist in development of educational institutions and practices
  • To highlight the relationship between education and society
  • To establish generalized statements about education that are valid in more than one country"

Yesgat & Yibeltal, 2021: 95-96
Comparative education studies look to characterise (national) education systems in relation to their social/cultural contexts (Image by Gerd Altmann from Pixabay)

Of course, like any social construct, 'comparative education' is open to interpretation and debate: for example, "that comparative education brings together data about two or more national systems of education, and comparing and contrasting those data" has been characterised as "a naive and obvious answer to the question of what constitutes comparative education" (Turner, 2019, p.100).

There is then some room for discussion over whether particular research outputs should count as 'comparative education' studies or not. Many comparative education studies do not actually compare two educational systems, but rather report in detail from a single system (making possible subsequent comparisons based across several such studies). These educational systems are usually understood as national systems, although there may be a good case to explore regional differences within a nation if regions have autonomous education systems and these can be understood in terms of broader regional differences.

Yet, studying one aspect of education within one curriculum subject at two schools in one educational administrative area of one region of one country cannot be understood as comparative education without doing excessive violence to the notion. This work does not characterise an educational system at national, regional or even local level.

My best assumption is that as the study is comparing something (in this case an aspect of chemistry education in two different schools) the authors feel that makes it 'comparative education', by which account of course any educational experiment (comparing some innovation with some kind of comparison condition) would automatically be a comparative education study. We all make errors sometimes, assuming terms have broader or different meanings than their actual conventional usage – and may indeed continue to misuse a term till someone points this out to us.

This article was published in what claims to be a peer reviewed research journal, so the paper was supposedly evaluated by expert reviewers who would have provided the editor with a report on strengths and weaknesses of the manuscript, and highlighted areas that would need to be addressed before possible publication. Such a reviewer would surely have reported that 'this work is not comparative education, so the paragraph on comparative education should either be removed, or authors should contextualise it to explain why it is relevant to their study'.

The weak links in the chain

A research report makes certain claims that derive from a chain of argument. To be convinced about the conclusions you have to be convinced about all the links in the chain, such as:

  • sampling (were the right people asked?)
  • methodology (is the right type of research design used to answer the research question?)
  • instrumentation (is the data collection instrument valid and reliable?)
  • analysis (have appropriate analytical techniques been carried out?)

These considerations cannot be averaged: if, for example, a data collection instrument does not measure what it is said to measure, then it does not matter how good the sample, or how careful the analysis, the study is undermined and no convincing logical claims can be built. No matter how skilled I am in using a tape measure, I will not be able to obtain accurate weights with it.

Sampling

The authors report the make up of their sample – all the chemistry teachers in each school (13 in one, 11 in the other), plus ten students from each of grades 9, 10 and 11 in each school. They report that "… 30 natural science students from Damot secondary school have been selected randomly. With the same technique … 30 natural sciences students from Jiga secondary school were selected".

Random selection is useful to know there is no bias in a sample, but it is helpful if the technique for randomisation is briefly reported to assure readers that 'random' is not being used as a synonym for 'arbitrary' and that the technique applied was adequate (Taber, 2013b).

A random selection across a pooled sample is unlikely to lead to equal representation in each subgroup (From Taber, 2013a)

Actually, if 30 students had been chosen at random from the population of students taking natural sciences in one of the schools, it would be extremely unlikely they would be evenly spread, 10 from each year group. Presumably, the authors made random selections within these grade levels (which would be eminently sensible, but is not quite what they report).
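The point about pooled selection can be checked with a little arithmetic. The sketch below is illustrative only: the grade sizes (100 natural science students in each of grades 9, 10 and 11) are my assumption, as the paper does not report them, and the function names are hypothetical. It computes the exact chance that a simple random draw of 30 students from the pooled population happens to give exactly 10 per grade, and then shows the stratified alternative (random selection within each grade), which guarantees the even split.

```python
import random
from math import comb

# Hypothetical grade sizes: the paper does not report how many
# natural science students were in each grade.
GRADE_SIZES = {9: 100, 10: 100, 11: 100}


def prob_even_split(grade_sizes, per_grade):
    """Exact probability that a simple random sample drawn from the pooled
    population contains exactly `per_grade` students from every grade
    (multivariate hypergeometric distribution)."""
    total = sum(grade_sizes.values())
    sample_size = per_grade * len(grade_sizes)
    ways = 1
    for size in grade_sizes.values():
        ways *= comb(size, per_grade)
    return ways / comb(total, sample_size)


def stratified_sample(grade_sizes, per_grade, seed=None):
    """Draw `per_grade` students at random within each grade separately."""
    rng = random.Random(seed)
    return {
        grade: rng.sample(range(size), per_grade)
        for grade, size in grade_sizes.items()
    }


p_even = prob_even_split(GRADE_SIZES, 10)
sample = stratified_sample(GRADE_SIZES, 10, seed=1)
print(f"P(exactly 10 per grade from a pooled draw) = {p_even:.3f}")
print({grade: len(chosen) for grade, chosen in sample.items()})
```

Under these assumed grade sizes the probability of an exact 10/10/10 split from a pooled draw is only a few percent, whereas the stratified draw returns ten students per grade by construction.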


Data collection

To collect data the authors constructed a questionnaire with Likert-type items.

"…questionnaire was used as data collecting instruments. Closed ended questionnaires with 23 items from which 8 items for availability of laboratory equipment and 15 items for laboratory practice were set in the form of "Likert" rating scale with four options (4=strongly agree, 3=agree, 2=disagree and 1=strongly disagree)"

Yesgat & Yibeltal, 2021: 96

These categories were further broken down (Yesgat & Yibeltal, 2021: 96): "8 items of availability of equipment were again sub grouped in to

  • physical facility (4 items),
  • chemical availability (2 items), and
  • laboratory apparatus (2 items)

whereas 15 items of laboratory practice were further categorized as

  • before actual laboratory (4 items),
  • during actual laboratory practice (6 items) and
  • after actual laboratory (5 items)"

Internal coherence

So, there were two basic constructs, each broken down into three sub-constructs. This instrument was piloted,

"And to assure the reliability of the questionnaire a pilot study on a [sic] non-sampled teachers and students were conducted and Cronbach's Alpha was applied to measure the coefficient of internal consistency. A reliability coefficient of 0.71 was obtained and considered high enough for the instruments to be used for this research"

Yesgat & Yibeltal, 2021: 96

Running a pilot study can be very useful as it can highlight issues about items. However, although simply asking people to complete a questionnaire might highlight items people could not make any sense of, it may not be as useful as interviewing them about how they understood items to check that respondents understand items in the same way as researchers.

The authors cite the value of Cronbach's alpha to demonstrate their instrument has internal consistency. However, they seem to be quoting the value obtained in the pilot study, whereas the statistic strictly applies to a particular administration of an instrument (so the value from the main study is more relevant to the results reported).

More problematic, the authors appear to cite a value of alpha from across all 23 items (n.b., the value of alpha tends to increase as the number of items increases, so what is considered an acceptable value needs to allow for the number of items included) when these are actually two distinct scales: 'availability of laboratory equipment' and 'laboratory practice'. Alpha should be quoted separately for each scale – values across distinct scales are not useful (Taber, 2018). 3
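Both points (alpha belongs to a single scale, and alpha creeps up as items are added) can be illustrated with a short sketch. Everything here is an assumption for illustration: the ratings are invented, not taken from the paper, the two three-item 'scales' merely stand in for the authors' 8-item and 15-item scales, and the function names are mine. The first function is the usual formula for Cronbach's alpha; the second is the standardized form, which expresses alpha in terms of the number of items and the mean inter-item correlation.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    `rows` holds one list of item ratings per respondent; every
    respondent must have rated the same k items (k >= 2).
    """
    k = len(rows[0])
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical ratings (1-4), NOT data from the paper: each row is one
# respondent; columns 0-2 stand in for an 'availability' scale and
# columns 3-5 for a 'practice' scale.
responses = [
    [1, 2, 1, 3, 4, 3],
    [2, 2, 2, 2, 2, 3],
    [3, 3, 4, 4, 3, 4],
    [1, 1, 2, 1, 2, 1],
    [4, 3, 4, 2, 1, 2],
]
availability = [row[:3] for row in responses]
practice = [row[3:] for row in responses]

# Alpha is computed separately for each scale, never across both.
print("alpha, availability items:", round(cronbach_alpha(availability), 2))
print("alpha, practice items:", round(cronbach_alpha(practice), 2))

# The same weak average inter-item correlation looks much more
# respectable once more items are pooled.
for k in (8, 23):
    print(k, "items:", round(standardized_alpha(k, 0.1), 2))
```

On the standardized formula, a mean inter-item correlation of just 0.1 gives an alpha of about 0.47 for 8 items but about 0.72 for 23 items. So a value near 0.7 across 23 pooled items is compatible with items that are only very weakly related to each other.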

Do the items have face validity?

The items in the questionnaire are reported in appendices (pp.102-103), so I have tabulated them here, so readers can consider

  • (a) whether they feel these items reflect the constructs of 'availability of equipment' and 'laboratory practice';
  • (b) whether the items are phrased in a clear way for both teachers and students (the authors report "conceptually the same questionnaires with different forms were prepared" (p.101) but if this means different wording for teachers than students this is not elaborated – teachers were also asked demographic questions about their educational level); and
  • (c) whether they are all reasonable things to expect both teachers and students to be able to rate.

| 'Availability of equipment' items | 'Laboratory practice' items |
| --- | --- |
| Structured and well-equipped laboratory room | You test the experiments before your work with students |
| Availability of electric system in laboratory room | You give laboratory manuals to student before practical work |
| Availability of water system in laboratory room | You group and arrange students before they are coming to laboratory room |
| Availability of laboratory chemicals are available [sic] | You set up apparatus and arrange chemicals for activities |
| No interruption due to lack of lab equipment | You follow and supervise students when they perform activities |
| Isolated bench to each student during laboratory activities | You work with the lab technician during performing activity |
| Chemicals are arranged in a logical order. | You are interested to perform activities? |
| Laboratory apparatus are arranged in a logical order | You check appropriate accomplishment of your students' work |
| | Check your students' interpretation, conclusion and recommendations |
| | Give feedbacks to all your students work |
| | Check whether the lab report is individual work or group |
| | There is a time table to teachers to conduct laboratory activities. |
| | Wear safety goggles, eye goggles, and other safety equipment in doing so |
| | Work again if your experiment is failed |
| | Active participant during laboratory activity |

Items teachers and students were asked to rate on a four point scale (agree / strongly agree / disagree / strongly disagree)

Perceptions

One obvious limitation of this study is that it relies on reported perceptions.

One way to find out about the availability of laboratory equipment might be to visit teaching laboratories and survey them with an observation schedule – and perhaps even make a photographic record. The questionnaire assumes that teacher and student perceptions are accurate and that honest reports would be given (might teachers have had an interest in offering a particular impression of their work?)

Sometimes researchers are actually interested in impressions (e.g., for some purposes whether a student considers themselves a good chemistry student may be more relevant than an objective assessment), and sometimes researchers have no direct access to a focus of interest and must rely on other people's reports. Here it might be suggested that a survey by questionnaire is not really the best way to, for example, "evaluate laboratory equipment facilities for carrying out practical activities" (p.96).

Findings

The authors describe their main findings as,

"Chemistry laboratory equipment availability in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment. This finding supported by the analysis of one sample t-values and as it indicated the average availability of laboratory equipment are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual availability of equipment to the expected test value (2.5).

Chemistry laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average chemistry laboratory practice. This finding supported by the analysis of one sample t-values and as it indicated the average chemistry laboratory practice are very much less than the test value and the p-value which is less than 0.05 indicating the presence of significant difference between the actual chemistry laboratory practice to the expected test value."

Yesgat & Yibeltal, 2021: 101 (emphasis added)

This is the basis for the claim in the abstract that "From the analysis of Chemistry laboratory equipment availability and laboratory practice in both Damot secondary school and Jiga secondary school were found in very low level and much far less than the average availability of chemistry laboratory equipment and status of laboratory practice."

'The average …': what is the standard?

But this raises a key question – how do the authors know what "the average availability of chemistry laboratory equipment and status of laboratory practice" is, if they have only used their questionnaire in two schools (which are both found to be below average)?

Yesgat & Yibeltal have run a comparison between the average ratings they get from the two schools on their two scales and the 'average test value' rating of 2.5. As far as I can see, this is not an empirical value at all. It seems the authors have just assumed that if people are asked to use a four point scale – 1, 2, 3, 4 – then the average rating will be…2.5. Of course, that is a completely arbitrary assumption. (Consider the question – 'how much would you like to be beaten and robbed today?': would the average response be likely to be the nominal mid-point of a ratings scale?) Perhaps if a much wider survey had been undertaken the actual average rating would have been 1.9 or 2.7 or …
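How much the verdict depends on that assumed test value can be seen in a small sketch of a one-sample t statistic. The ratings below are invented for illustration (they are not data from the paper), the function name is mine, and the calculation treats ratings as if they could legitimately be averaged, which is itself questionable.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(ratings, test_value):
    """t statistic for H0: the population mean equals `test_value`."""
    n = len(ratings)
    return (mean(ratings) - test_value) / (stdev(ratings) / sqrt(n))


# Hypothetical mean ratings for ten respondents (not data from the paper).
ratings = [1.8, 2.1, 1.9, 2.4, 2.0, 2.2, 1.7, 2.3, 2.1, 2.0]

# 2.5 is the nominal mid-point the authors used; 1.9 and 2.7 are the
# equally hypothetical 'actual averages' floated in the text.
for test_value in (2.5, 1.9, 2.7):
    t = one_sample_t(ratings, test_value)
    print(f"test value {test_value}: t = {t:+.2f}")
```

With these made-up numbers the same ratings fall below the reference when it is set at 2.5 but above it when it is set at 1.9. The sign of the result is driven by the choice of test value as much as by the data.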

That is even assuming that 'average' is a meaningful concept here. A four point Likert scale is an ordinal scale ('agree' is always less agreement than 'strongly agree' and more than 'disagree') but not an interval scale (that is, it cannot be assumed that the perceived 'agreement' gap (i) from 'strongly disagree' to 'disagree' is the same for each respondent and the same as that (ii) from 'disagree' to 'agree' and (iii) from 'agree' to 'strongly agree'). Strictly, Likert scale ratings cannot be averaged (better being presented as bar charts showing frequencies of response) – so although the authors carry out a great deal of analysis, much of this is, strictly, invalid.
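A frequency-based presentation of the kind suggested here is easy to produce. The following is a minimal sketch using invented responses to a single item (not data from the paper); it tabulates how often each response category was chosen, draws a crude text bar chart, and reports the median category, which only relies on the ordering of the categories.

```python
from collections import Counter
from statistics import median_low

LABELS = {1: "strongly disagree", 2: "disagree", 3: "agree", 4: "strongly agree"}

# Hypothetical responses to a single item (not data from the paper).
responses = [1, 2, 2, 1, 3, 2, 4, 1, 2, 2, 3, 1]

counts = Counter(responses)
for code in sorted(LABELS):
    bar = "#" * counts[code]
    print(f"{LABELS[code]:<18} {counts[code]:>2} {bar}")

# The median category is an ordinal-safe summary; median_low always
# returns a value that actually occurs in the data.
print("median response:", LABELS[median_low(responses)])
```

A display like this shows readers the spread of responses directly, without assuming that the gaps between categories are equal.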

So what has been found out from this study?

I would very much like to know what peer reviewers made of this study. Expert reviewers would surely have identified some very serious weaknesses in the study and would have been expected to have recommended some quite major revisions even if they thought it might eventually be publishable in a research journal.

An editor is expected to take on board referee evaluations and ask authors to make such revisions as are needed to persuade the editor the submission is ready for publication. It is the job of the editor of a research journal, supported by the peer reviewers, to

a) ensure work of insufficient quality is not published

b) help authors strengthen their paper to correct errors and address weaknesses

Sometimes this process takes some time, with a number of cycles of revision and review. Here, however, the editor was able to move to a decision to publish in 5 days.

The study reflects a substantive amount of work by the authors. Yet, it is hard to see how this study, at least as reported in this journal, makes a substantive contribution to public knowledge. The study finds that one school has somewhat higher survey ratings than another school on an instrument that has not been fully validated and that pools student and teacher perceptions, and it guesses that both schools rate lower than a hypothetical 'average' school. The two schools were supposed to represent a "different approach[es] of teaching chemistry in practical approach" – but even if that is the case, the authors have not shared with their readers what these different approaches are meant to be. So, there would be no possibility of generalising from the schools to 'approach[es] of teaching chemistry', even if that was logically justifiable. And comparative education it is not.

This study, at least as published, does not seem to offer useful new knowledge to the chemistry education community that could support teaching practice or further research. Even in the very specific context of the two specific schools it is not clear what can be done with the findings which simply reflect back to the informants what they have told the researchers, without exploring the reasons behind the ratings (how do different teachers and students understand what counts as 'Chemicals are arranged in a logical order') or the values the participants are bringing to the study (is 'Check whether the lab report is individual work or group' meant to imply that it is seen as important to ensure that students work cooperatively or to ensure they work independently or …?)

If there is a problem highlighted here by the "very low levels" (based on a completely arbitrary interpretation of the scales) there is no indication of whether this is due to resourcing of the schools, teacher preparation, levels of technician support, teacher attitudes or pedagogic commitments, timetabling problems, …

This seems to be a study which has highlighted two schools, invited teachers and students to complete a dubious questionnaire, and simply used this to arbitrarily characterise the practical chemistry education in the schools as very poor, without contextualising any challenges or offering any advice on how to address the issues.

Work cited:
Notes:

1 'Imply' as Yesgat and Yibeltal do not actually state that they have carried out comparative education. However, if they do not think so, then the paragraph on comparative education in their introduction has no clear relationship with the rest of the study and is not more than a gratuitous reference, like suddenly mentioning Nottingham Forest's European Cup triumphs or noting a preferred flavour of tea.


2 This seemed an intriguing segment of the text as it was largely written in a more sophisticated form of English than the rest of the paper, apart from the odd reference to "Most compartivest [comparative education specialists?] states…" which seemed to stand out from the rest of the segment. Yesgat and Yibeltal do not present this as a quote, but cite a source informing their text (their reference [4]: Joubish, 2009). However, their text is very similar to that in another publication:

| Quote from Mbozi, 2017, p.21 | Quote from Yesgat and Yibeltal, 2021, pp.95-96 |
| --- | --- |
| "One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. | "One purpose of comparative education is to stimulate critical reflection about our educational system, its success and failures, strengths and weaknesses. |
| This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. | This critical reflection facilitates self-evaluation of our work and is the basis for determining appropriate courses of action. |
| Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. | Another purpose of comparative education is to expose us to educational innovations and systems that have positive outcomes. |
| The exposure facilitates our adoption of best practices. | |
| Some purposes of comparative education were not covered in your exercise above. | |
| Purposes of comparative education suggested by two authors Noah (1985) and Kidd (1975) are presented below to broaden your understanding of the purposes of comparative education. | |
| Noah, (1985) states that comparative education has four main purposes [4] and these are: | Most compartivest states that comparative education has four main purposes. These are: |
| 1. To describe educational systems, processes or outcomes | • To describe educational systems, processes or outcomes |
| 2. To assist in development of educational institutions and practices | • To assist in development of educational institutions and practices |
| 3. To highlight the relationship between education and society | • To highlight the relationship between education and society |
| 4. To establish generalized statements about education, that are valid in more than one country." | • To establish generalized statements about education that are valid in more than one country" |

Comparing text (broken into sentences to aid comparison) from two sources

3 There are more sophisticated techniques which can be used to check whether items do 'cluster' as expected for a particular sample of respondents.


4 As suggested above, researchers can pilot instruments with interviews or 'think aloud' protocols to check if items are understood as intended. Asking assumed experts to read through and check 'face validity' is of itself quite a limited process, but can be a useful initial screen to identify items of dubious relevance.