Not motivating a research hypothesis

A 100% survey return that represents 73% (or 70%, or perhaps 48%) of the population

Keith S. Taber

…the study seems to have looked for a lack of significant difference regarding a variable which was not thought to have any relevance…

This is like hypothesising…that the amount of alkali needed to neutralise a certain amount of acid will not depend on the eye colour of the researcher; experimentally confirming this is the case; and then seeking to publish the results as a new contribution to knowledge.

…as if a newspaper headline was 'Earthquake latest' and then the related news story was simply that, as usual, no earthquakes had been reported.

Structuring a research report

A research report tends to have a particular kind of structure. The first section sets out the background to the study to be described. Authors offer an account of the current state of the relevant field – what can be called a conceptual framework.

In the natural sciences it may be that in some specialised fields there is a common, accepted way of understanding that field (e.g., the nature of important entities, the relevant variables to focus on). This has been described as working within an established scientific 'paradigm'. 1 However, social phenomena (such as classroom teaching) may be of such complexity that a full account requires exploration at multiple levels, with a range of analytical foci (Taber, 2008). 2 Therefore the report may indicate which particular theoretical perspective (e.g., personal constructivism, activity theory, Gestalt psychology, etc.) has informed the study.

This usually leads to one or more research questions, or even specific hypotheses, that are seen to be motivated by the state of the field as reflected in the authors' conceptual framework.

Next, the research design is explained: the choice of methodology (overall research strategy), the population being studied and how it was sampled, the methods of data collection and development of instruments, and choice of analytical techniques.

All of this is usually expected before any discussion (leaving aside a short statement as part of the abstract) of the data collected, results of analysis, conclusions and implications of the study for further research or practice.

There is a logic to designing research. (Image after Taber, 2014).

A predatory journal

I have been reading some papers in a journal that I believed, on the basis of its misleading title and website details, was an example of a poor-quality 'predatory journal'. That is, a journal which encourages submissions simply to be able to charge a publication fee (currently $1519, according to the website), without doing the proper job of editorial scrutiny. I wanted to test this initial evaluation by looking at the quality of some of the work published.

Although the journal is called the Journal of Chemistry: Education Research and Practice (not to be confused, even if the publishers would like it to be, with the well-established journal Chemistry Education Research and Practice) only a few of the papers published are actually education studies. One of the articles that IS on an educational topic is called 'Students' Perception of Chemistry Teachers' Characteristics of Interest, Attitude and Subject Mastery in the Teaching of Chemistry in Senior Secondary Schools' (Igwe, 2017).

A research article

The work of a genuine academic journal

A key problem with predatory journals is that because their focus is on generating income they do not provide the service to the community expected of genuine research journals (which inevitably involves rejecting submissions, and delaying publication till work is up to standard). In particular, the research journal acts as a gatekeeper to ensure nonsense or seriously flawed work is not published as science. It does this in two ways.

Discriminating between high quality and poor quality studies

Work that is clearly not up to standard (as judged by experts in the field) is rejected. One might think that in an ideal world no one is going to send work that has no merit to a research journal. In reality we cannot expect authors to always be able to take a balanced and critical view of their own work, even if we would like to think that research training should help them develop this capacity.

This assumes researchers are trained, of course. Many people carrying out educational research in science teaching contexts are only trained as natural scientists – and those trained as researchers in natural science often approach the social sciences with significant biases and blind-spots when carrying out research with people. (Watch or read 'Why do natural scientists tend to make poor social scientists?')

Also, anyone can submit work to a research journal – be they genius, expert, amateur, or 'crank'. Work is meant to be judged on its merits, not by the reputation or qualifications of the author.

De-bugging research reports – helping authors improve their work

The other important function of journal review is to identify weaknesses, errors, and gaps in reports of work that may have merit, but where these limitations make the report unsuitable for publication as submitted. Expert reviewers will highlight these issues, and editors will ensure authors respond to the issues raised before possible publication. This process relies on fallible humans – in the case of reviewers, usually unpaid volunteers – but is seen as important for quality control, even if it is not a perfect system. 3

This improvement process is a 'win' all round:

  • the quality of what is published is assured so that (at least most) published studies make a meaningful contribution to knowledge;
  • the journal is seen in a good light because of the quality of the research it publishes; and
  • the authors can be genuinely proud of their publications which can bring them prestige and potentially have impact.

If a predatory journal which claims (i) to have academic editors making decisions and (ii) to use peer review does not rigorously follow proper processes, and so publishes (a) nonsense as scholarship, and (b) work with major problems, then it lets down the community and the authors – if not those making money from the deceit.

The editor took just over a fortnight to arrange any peer review, and come to a decision that the research report was ready for publication

Students' perceptions of chemistry teachers' characteristics

There is much of merit in this particular research study. Dr Iheanyi O. Igwe explains why there might be a concern about the quality of chemistry teaching in the research context, and draws upon a range of prior literature. Information about the population (the senior secondary II chemistry students in the public schools of the Abakaliki Education Zone of Ebonyi State) and the sample is provided – including how the sample, of 300 students at 10 schools, was selected.
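The paper reports that the sample was drawn by stratified random sampling. As a minimal sketch of what such a selection involves – assuming, purely for illustration, that the ten schools served as the strata and that thirty students were drawn from each (the rosters below are invented, as the real class lists are not available) – one might write:

```python
import random

# Hypothetical rosters: ten schools of 60 students each stand in for
# the real class lists, which the paper does not provide.
rosters = {f"school_{i}": [f"student_{i}_{j}" for j in range(60)]
           for i in range(1, 11)}

random.seed(42)  # fixed seed so the illustration is reproducible

# Stratified random sampling: draw 30 students at random within each
# stratum (school), giving the reported total of 300.
sample = {school: random.sample(students, k=30)
          for school, students in rosters.items()}

print(sum(len(chosen) for chosen in sample.values()))  # 300
```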

There is, however, an unfortunate error in characterising the population:

"the chemistry students' population in the zone was four hundred and ten (431)"

Igwe, 2017, p.8

This seems to be a simple typographic error, but the reader cannot be sure if this should read

  • "…four hundred and ten (410)" or
  • "…four hundred and thirty one (431)".

Or perhaps neither, as the abstract tells readers

"From a total population of six hundred and thirty (630) senior secondary II students, a sample of three hundred (300) students was used for the study selected by stratified random sampling technique."

Igwe, 2017, abstract

Whether the sample is 300/410 or 300/431 or even 300/630 does not fundamentally change the study, but one does wonder how these inconsistencies were not spotted by the editor, or a peer reviewer, or someone in the production department. (At least, one might wonder about this if one had not seen much more serious failures to spot errors in this journal.) A reader could wonder whether the presence of such obvious errors may indicate a lack of care that might suggest the possibility of other errors that a reader is not in a position to spot. (For example, if questionnaire responses had not been tallied correctly in compiling results, then this would not be apparent to anyone who did not have access to the raw data to repeat the analysis.) The author seems to have been let down here.
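Whichever figure is right matters only for knowing what share of the population the sample of 300 represents – the arithmetic behind this post's subtitle:

```python
# The sample of 300 set against each population figure the paper implies:
for population in (410, 431, 630):
    print(f"300/{population} = {300 / population:.0%}")
# 300/410 = 73%
# 300/431 = 70%
# 300/630 = 48%
```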

A multi-scale instrument

The final questionnaire contained 5 items on each of three scales:

  • students' perception of teachers' interest in the teaching of chemistry;
  • students' perception of teachers' attitude towards the teaching of chemistry;
  • students' perception of teachers' mastery of the subject in the teaching of chemistry.

Igwe informs readers that,

"the final instrument was tested for reliability for internal consistency through the Cronbach Alpha statistic. The reliability index for the questionnaire was obtained as 0.88 which showed that the instrument was of high internal consistency and therefore reliable and could be used for the study"

Igwe, 2017, p.4

This statistic is actually not very useful information, as one would want to know about the internal consistency within each scale – an overall value across scales is not informative (conceptually, it is not clear how it should be interpreted – perhaps as suggesting that the three scales are largely eliciting much the same underlying factor?) (Taber, 2018). 4

There are times when aggregate information is not very informative (Image by Syaibatul Hamdi from Pixabay)

Again, one might have hoped that expert reviewers would have asked the author to quote the separate alpha values for the three scales as it is these which are actually informative.
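For readers who want to see what is involved, here is a minimal sketch of computing alpha scale by scale – assuming a 300 × 15 matrix of item scores, with columns 0–4, 5–9 and 10–14 standing in for the three five-item scales. The data here are randomly generated, so the printed values are meaningless in themselves; the point is only that each scale gets its own coefficient:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Invented data: random scores, so expect alphas near zero here.
rng = np.random.default_rng(1)
responses = rng.integers(1, 5, size=(300, 15), endpoint=True)

scales = {"interest": slice(0, 5),
          "attitude": slice(5, 10),
          "mastery": slice(10, 15)}
for name, columns in scales.items():
    print(name, round(cronbach_alpha(responses[:, columns]), 2))
```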

The paper also offers a detailed account of the analysis of the data, and an in-depth discussion of the findings and potential implications. This is a serious study that clearly reflects a lot of work by the researcher. (We might hope that could be taken for granted when discussing work published in a 'research journal', but sadly that is not so in some predatory journals.) There are limitations of course. All research has to stop somewhere, and resources and, in particular, access opportunities are often very limited. One of these limitations is the wider relevance of the population sampled.

But do the results apply in Belo Horizonte?

This is the generalisation issue. The study concerns the situation in one administrative zone within a relatively small state in South East Nigeria. How do we know it has anything useful to tell us about elsewhere in Nigeria, let alone about the situation in Mexico or Vietnam or Estonia? Even within Ebonyi State, the Abakaliki Education Zone (that is, the area of the state capital) may well be atypical – perhaps the best qualified and most enthusiastic teachers tend to work in the capital? Perhaps there would have been different findings in a more rural area?

Yet this is a limitation that applies to a good deal of educational research. This goes back to the complexity of educational phenomena. What you find out about an electron or an oxidising agent studied in Abakaliki should apply equally in Cambridge, Cambridgeshire, or in Cambridge, Massachusetts. That cannot be claimed for what you may find out about a teacher in Abakaliki – or a student, a class, a school, a university.

Misleading study titles?

Educational research studies often have titles that are, strictly, misleading – or that at least promise a lot more than they deliver. This may in part be authors making unwarranted assumptions, or it may be journal editors wanting to avoid unwieldy titles.

"This situation has inadvertently led to production of half backed graduate Chemistry educators."

Igwe, 2017, p.2

The title of this study does suggest that the study concerns perceptions of Chemistry Teachers' Characteristics …in Senior Secondary Schools, when we cannot assume that chemistry teachers in the Abakaliki Education Zone of Ebonyi State can stand for chemistry teachers more widely. Indeed, some of the issues raised as motivating the need for the study are clearly not issues that would apply in all other educational contexts – that is, the 'situation' which is said to be responsible for the "production of half backed [half-baked?] graduate Chemistry educators" in Nigeria will not apply everywhere. Whilst the title could be read as promising more general findings than were possible in the study, Igwe's abstract is quite explicit about the specific population sampled.

A limited focus?

Another obvious limitation is that whilst pupils' perceptions of their teachers are very important, they do not offer a full picture. Pupils may feel the need to give positive reviews, or may have idealistic conceptions. Indeed, assuming that voluntary, informed consent was given (which would mean that students knew they could decline to take part in the research without fear of sanctions), it is of note that every one of the 30 students targeted in each of the ten schools agreed to complete the survey:

"The 300 copies of the instrument were distributed to the respondents who completed them for retrieval on the spot to avoid loss and may be some element of bias from the respondents. The administration and collection were done by the researcher and five trained research assistants. Maximum return was made of the instrument."

Igwe, 2017, p.4

To get a 100% return on a survey is pretty rare, and if normal ethical procedures were followed (with the voluntary nature of the activity made clear) then this suggests these students were highly motivated to appease adults working in the education system.

But we might ask how student perceptions of teacher characteristics actually relate to teacher characteristics?

For example, observations of the chemistry classes taught by these teachers could possibly give a very different impression of those teachers than that offered by the student ratings in the survey. (Another chemistry teacher may well be able to distinguish teacher confidence or bravado from subject mastery when a learner is not well placed to do so.) Teacher self-reports could also offer a different account of their 'Interest, Attitude and Subject Mastery', as could evaluations by their school managers. Arguably, a study that collected data from multiple sources would offer the possibility of 'triangulating' between sources.

However, Igwe is explicit about the limited focus of the study, and other complementary strands of research could be carried out to follow up on it. So, although the specific choice of focus is a limitation, this does not negate the potential value of the study.

Research questions

Although I recognise a serious and well-motivated study, there is one aspect of Igwe's work which seemed rather bizarre. The study has three research questions (which are well reflected in the title) and a hypothesis which, I suspect, will surprise some readers.

That is not a good thing. At least, I always taught research students that, unlike in a thriller or 'who done it?' story, where a surprise may engage and amuse a reader, a research report or thesis is best written to avoid such surprises. The research report is an argument that needs to flow through the account – if a reader is surprised at something the researcher reports doing, then the author has probably forgotten to properly introduce or explain something earlier in the report.

Here are the research questions and hypotheses:

"Research Questions

The following research questions guided the study, thus:

How do students perceive teachers' interest in the teaching of chemistry?

How do students perceive teachers' attitude towards the teaching of chemistry?

How do students perceive teachers' mastery of the subjects in the teaching of chemistry?

Hypotheses
The following null hypothesis was tested at 0.05 alpha levels, thus:
HO1 There is no significant difference in the mean ratings of male and female students on their perception of chemistry teachers' characteristics in the teaching of chemistry."

Igwe, 2017, p.3
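The paper does not reproduce the computation behind HO1, but a null hypothesis of this form is typically tested with an independent-samples t-test on the two groups' ratings. A minimal sketch, with invented scores (the group sizes, means, and spread are all assumptions made for the illustration):

```python
import numpy as np
from scipy import stats

# Invented ratings for two groups of students, drawn from the same
# distribution - so there is genuinely no underlying difference.
rng = np.random.default_rng(7)
male_ratings = rng.normal(loc=3.4, scale=0.6, size=150)
female_ratings = rng.normal(loc=3.4, scale=0.6, size=150)

t_stat, p_value = stats.ttest_ind(male_ratings, female_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# With identical distributions, p will usually (though, as discussed
# below, not always) exceed 0.05, and the null hypothesis is retained.
```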

A surprising hypothesis?

A hypothesis – now where did that come from?

Now, I am certainly not criticising a researcher for looking for gender differences in research. (That would be hypocritical, as I looked for such differences in my own M.Sc. thesis, and published on gender differences in teacher-student interactions in physics classes, gender differences in students' interests in different science topics on starting secondary school, and links between pupil perceptions of (i) science-relatedness and (ii) gender-appropriateness of careers.)

There might often be good reasons in studies to look for gender differences. But these reasons should be stated up-front. As part of the conceptual framework motivating the study, researchers should explain that – based on their informal observations, or on anecdotal evidence, or (better) drawing upon explicit theoretical considerations, or informed by the findings of other related studies, or whatever reason there might be – there are good reasons to check for gender differences.

The flow of research (underlying image from Taber, 2013). The arrows can be read as 'inform(s)'.

Perhaps Igwe had such reasons, but there seems to be no mention of 'gender' as a relevant variable prior to the presentation of the hypothesis: not even a concerning dream, or signs in the patterns of tea leaves. 5 To some extent, this is reinforced by the choice of the null hypothesis – that no such difference will be found. Even if it makes no substantive difference to a study whether a hypothesis is framed in terms of there being a difference or not, psychologically the study seems to have looked for a lack of significant difference regarding a variable which was not thought to have any relevance.

Misuse of statistics

It is important for researchers not to test for effects that are not motivated in their studies. Statistical significance tells a researcher that a result is unlikely to have arisen just by chance – but it still might have. Just as someone buying a lottery ticket is unlikely to win the lottery – but they might. Logically, a small proportion of all the positive statistical results in the literature are 'false positives', because unlikely things do happen by chance – just not that often. 6 The researcher should not (metaphorically!) go round buying up lots of lottery tickets, and then see an occasional win as something more than chance.
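The lottery analogy is easy to quantify. Assuming independent tests of true null hypotheses, each at alpha = 0.05, the chance of at least one spurious 'significant' result grows quickly with the number of tests:

```python
# Probability of at least one false positive across m independent
# tests of true null hypotheses, each at alpha = 0.05:
for m in (1, 10, 20, 50):
    print(f"{m:>2} test(s): {1 - 0.95 ** m:.0%}")
# 1 test: 5%; 10 tests: 40%; 20 tests: 64%; 50 tests: 92%
```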

No alarms and no surprises

And what was found?

"From the result of analysis … the null hypothesis is accepted which means that there is no significant difference in the mean ratings of male and female students in their perception of chemistry teachers' characteristics (interest, attitude and subject mastery) in the teaching of chemistry."

Igwe, 2017, p.6

This is like hypothesising, without any motivation, that the amount of alkali needed to neutralise a certain amount of acid will not depend on the eye colour of the researcher; experimentally confirming this is the case; and then seeking to publish the results as a new contribution to knowledge.

Why did Igwe look for a gender difference (or, more strictly, look for no gender difference)?

  • A genuine relevant motivation missing from the paper?
  • An imperative to test for something (anything)?
  • Advice that journals are more likely to publish studies using statistical testing?
  • Noticing that a lot of studies do test for gender differences (whether there seems a good reason to do so or not)?

This seems to be an obvious point for peer reviewers and the editor to raise: asking the author to either (a) explain why it makes sense to test for gender differences in this study – or (b) to drop the hypothesis from the paper. It seems they did not notice this, and readers are simply left to wonder – just as you would if a newspaper headline was 'Earthquake latest' and then the related news story was simply that, as usual, no earthquakes had been reported.

Work cited:


Footnotes:

1 The term paradigm became widely used in this sense after Kuhn's (1970) work, although he later acknowledged criticisms of the ambiguous way he had used the term – to refer both to the standard examples (paradigms in the narrow sense) through which one learns to work in a field, and to the wider set of shared norms and values that develops in an established field, which he later termed the 'disciplinary matrix'. In psychology research, 'paradigm' may be used in the more specific sense of an established research design/protocol.


2 There are at least three ways of explaining why a lot of research in the social sciences seems more chaotic and less structured to outsiders than most research in the natural sciences.

  • a) Ontology. Perhaps the things studied in the natural sciences really exist, and some of those in the social sciences are epiphenomena that do not reflect fundamental, 'real', things. There may be some of that sometimes, but if so I think it is a matter of degree (after all, natural scientists have not been beyond studying the ether or phlogiston), not least because of the third option (c).
  • b) The social sciences are not as mature as many areas of the natural sciences and so are still 'pre-paradigmatic'. I am sure there is sometimes an element of this: any new field will take time to focus in on reliable and productive ways of making sense of its domain.
  • c) The complexity of the phenomena. Social phenomena are inherently more complex, often involving feedback loops between participants' behaviours and feelings and beliefs (including about the research, the researcher, etc.)

Whilst (a) and (b) may sometimes be pertinent, I think (c) is often especially relevant to this question.


3 An alternative approach that has gained some credence is to allow authors to publish, but then invite reader reviews which will also be published – allowing a public conversation to develop, so readers can see the original work, criticisms, responses to those criticisms, and so forth, and make their own judgements. To date this has only become common practice in a few fields.

Another approach for empirical work is for authors to submit research designs to journals for peer review – once a design has been accepted by the journal, the journal agrees to publish the resulting study as long as the agreed protocol has been followed. (This is seen as helping to avoid the distorting bias in the literature towards 'positive' results as studies with 'negative' results may seem less interesting and so less likely to be accepted in prestige journals.) Again, this is not the norm (yet) in most fields.


4 The statistic has a maximum value of 1, which would indicate that the items were all equivalent, so 0.88 seems a high value, till we note that a high value of alpha is a common artefact of including a large number of items.
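One way to see the artefact is through the standardised form of alpha, which depends only on the number of items k and the average inter-item correlation r. The value r = 0.33 below is an assumption chosen for illustration – it happens to reproduce the reported 0.88 with the instrument's fifteen items, which shows how unremarkable such a figure can be:

```python
# Standardised Cronbach's alpha: alpha = k*r / (1 + (k - 1)*r),
# for k items with average inter-item correlation r (assumed 0.33).
r = 0.33
for k in (5, 15, 30):
    print(f"k = {k:>2}: alpha = {k * r / (1 + (k - 1) * r):.2f}")
# k =  5: alpha = 0.71
# k = 15: alpha = 0.88
# k = 30: alpha = 0.94
```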

However, playing Devil's advocate, I might suggest that the high overall value of alpha could suggest that the three scales

  • students' perception of teachers' interest in the teaching of chemistry;
  • students' perception of teachers' attitude towards the teaching of chemistry;
  • students' perception of teachers' mastery of the subject in the teaching of chemistry

are all tapping into a single underlying factor that might be something like

  • my view of whether my chemistry teacher is a good teacher

or even

  • how much I like my chemistry teacher

5 Actually the discrimination made is between male and female students – it is not clear what question students were asked to determine 'gender', and whether other response options were available, or whether students could decline to respond to this item.


6 Our intuition might be that only a small proportion of reported positive results are false positives, because, of course, positive results reflect things unlikely to happen by chance. However if, as is widely believed in many fields, there is a bias to reporting positive results, this can distort the picture.

Imagine someone looking for factors that influence classroom learning. Consider that 50 variables are identified to test, such as teacher eye colour, classroom wall colour, type of classroom window frames, what the teacher has for breakfast, the day of the week that the teacher was born, the number of letters in the teacher's forename, the gender of the student who sits nearest the fire extinguisher, and various other variables which are not theoretically motivated to be considered likely to have an effect. With a confidence level of p[robability] ≤ 0.05 it is likely that there will be a very small number of positive findings JUST BY CHANCE. That is, if you look across enough unlikely events, it is likely some of them will happen. There is unlikely to be a thunderstorm on any particular day. Yet there will likely be a thunderstorm some day in the next year. If a report is written and published which ONLY discusses a positive finding then the true statistical context is missing, and a likely situation is presented as unlikely to be due to chance.
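This is straightforward to simulate. A sketch, assuming 200 students, 50 binary 'factors' with no real effect, and a simple two-group t-test for each factor:

```python
import numpy as np
from scipy import stats

# 50 candidate 'factors' (teacher eye colour, wall colour, ...), none
# of which truly affects the outcome, each tested at p <= 0.05.
rng = np.random.default_rng(0)
n_students, n_factors = 200, 50

scores = rng.normal(size=n_students)                        # learning scores
factors = rng.integers(0, 2, size=(n_students, n_factors))  # yes/no variables

significant = 0
for j in range(n_factors):
    _, p = stats.ttest_ind(scores[factors[:, j] == 0],
                           scores[factors[:, j] == 1])
    if p <= 0.05:
        significant += 1

print(f"{significant} of {n_factors} null factors came out 'significant'")
# Typically two or three - roughly the 5% that chance alone predicts.
```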