Internal consistency

A topic in research methodology

Sometimes research instruments include scales made up of a range of items that are intended to collectively measure some construct (e.g., attitude to research methods classes).

Read about 'Measurement scales'

The extent to which the different items appear to be measuring the same thing, judged from the pattern of responses across the scale items from a sample of respondents, is usually referred to as the 'internal consistency' of the scale.

Sometimes this feature is called the 'reliability' of the scale, but reliability usually means the extent to which repeated measurements made with the same instrument give the same results, so it is preferable to use the term 'internal consistency' to avoid confusion with other uses of 'reliability'.

Read about 'Reliability'

Cronbach's alpha

Internal consistency of a scale is often measured using a statistic called Cronbach's alpha. This is a very widely used statistic, but it is often misused (Taber, 2018).

Read about why I wrote a paper about Cronbach's alpha

Alpha usually takes values from 0 to 1 (in theory it can even be negative, if the items tend to be negatively correlated with each other). The larger the value, the greater the internal consistency. However, very high internal consistency is not necessarily a good thing (see below).
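For readers who want to see where the number comes from: alpha for a scale of k items is usually computed as α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜ), where σ²ᵢ is the variance of item i and σ²ₜ is the variance of respondents' total scores on the scale. The following is a minimal Python sketch, assuming complete numeric responses with no missing data; the function name cronbach_alpha and the toy responses are hypothetical illustrations, not taken from any particular study or package.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a single scale.

    responses: one list of item scores per respondent (same items in the
    same order, no missing values).
    """
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (population variance is used
    # throughout; the n vs n-1 divisor cancels out in the ratio below)
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    # Variance of respondents' total scores on the scale
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: four respondents, three items scored 1-5
responses = [[1, 2, 2], [2, 2, 3], [3, 3, 3], [4, 4, 5]]
print(round(cronbach_alpha(responses), 3))  # 0.964
```

If the items pull against each other (one rising as another falls), the summed item variances can exceed the variance of the totals, and the same formula then returns a negative value.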

Misconceptions of alpha

A common error is to calculate alpha for a multi-scale instrument: it makes sense to calculate the internal consistency of a single scale, but not of a set of scales meant to be measuring different constructs. (This is like trying to get a single value for the budget of a school, plus the number of students on roll, plus the duration of the lunch-break, plus the mean age of the teaching staff! Such a number can be calculated, but it does not relate to anything meaningful.)
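In practice, avoiding this error means calculating alpha separately for each scale within the instrument. A minimal Python sketch, assuming the researcher already knows which items belong to which scale; the scale names ("enjoyment", "anxiety"), the item groupings and the responses are all hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for one scale (rows = respondents, columns = items)."""
    k = len(responses[0])
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def alpha_by_scale(responses, scales):
    """Return one alpha per scale; scales maps a scale name to its item column indices."""
    results = {}
    for name, items in scales.items():
        # Keep only the columns belonging to this scale
        subset = [[r[i] for i in items] for r in responses]
        results[name] = cronbach_alpha(subset)
    return results

# Hypothetical six-item instrument made up of two three-item scales
responses = [
    [1, 2, 2, 5, 4, 5],
    [2, 2, 3, 3, 3, 4],
    [3, 3, 3, 4, 5, 4],
    [4, 4, 5, 2, 1, 2],
]
scales = {"enjoyment": [0, 1, 2], "anxiety": [3, 4, 5]}
for name, value in alpha_by_scale(responses, scales).items():
    print(name, round(value, 3))
```

Each scale gets its own value; the sketch deliberately never pools all six columns into one alpha.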

Cronbach's alpha is sometimes quoted for an instrument, but it strictly applies only to a particular administration of the instrument (Taber, 2018). If the same instrument is used in a different population, with a different sample from the same population, or even with the same sample on a later occasion, alpha needs to be calculated anew from the new set of responses.

It is very common for authors to quote an alpha of 0.7 as a criterion for acceptable internal consistency*. There is little basis for this. Alpha is sensitive to scale length: a value of 0.7 for a scale of four items indicates something rather different from a value of 0.7 for a scale of 20 items! So alpha can usually be pushed up to 0.7 simply by adding more items to a scale, and this can work even if those items are not strictly valid.

* Indeed, I have seen a number of papers stating that an alpha of 0.7 is acceptable and citing, as justification, my own paper (Taber, 2018), which points out that this is nonsense.
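One way to see why scale length matters is the standardized form of alpha (the Spearman-Brown form), which for k items with average inter-item correlation r̄ is α = k·r̄ / (1 + (k − 1)·r̄). This form matches alpha calculated from raw scores only when the items have equal variances, so treat the following Python sketch as an approximation under that assumption; the function names are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def mean_r_needed(k, target_alpha):
    """Average inter-item correlation at which standardized alpha equals target_alpha."""
    # Rearranged from target = k*r / (1 + (k - 1)*r)
    return target_alpha / (k - (k - 1) * target_alpha)

# How strongly must items correlate, on average, to reach alpha = 0.7?
for k in (4, 20):
    print(k, round(mean_r_needed(k, 0.7), 3))
# 4 0.368
# 20 0.104

# Holding the average inter-item correlation at a weak 0.1, add items
for k in (4, 10, 21):
    print(k, round(standardized_alpha(k, 0.1), 2))
# 4 0.31
# 10 0.53
# 21 0.7
```

Under these assumptions, a four-item scale needs an average inter-item correlation of about 0.37 to reach 0.7, whereas a 20-item scale gets there with about 0.10, and simply lengthening a weakly related set of items to 21 is enough to pass the supposed threshold.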

A related misconception is that the higher the value of alpha, the better (Taber, 2018). Yet the reason scales are used rather than single items is that the different items are each believed to elicit, imperfectly, partial information related to the latent variable. Very high values of alpha (say, over 0.9) may suggest that some items are much too similar to each other, and indicate that some items should be removed to give a shorter scale that takes less time for participants to complete and for the researcher to analyse. (If alpha reached 1, this would suggest that any one item would do the job just as well as the whole scale.)
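The limiting case can be checked directly: if every respondent gives the same score on every item, alpha comes out at 1. A simple way to look for items that may be near-duplicates is to list pairs whose scores correlate very strongly. This is an assumed screening step for illustration, not a procedure from Taber (2018); the function names, the 0.9 correlation cut-off and the data below are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for one scale (rows = respondents, columns = items)."""
    k = len(responses[0])
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def near_duplicate_items(responses, threshold=0.9):
    """Pairs of item indices whose scores correlate above threshold (an arbitrary cut-off)."""
    k = len(responses[0])
    columns = [[r[i] for r in responses] for i in range(k)]
    return [(i, j) for i, j in combinations(range(k), 2)
            if pearson_r(columns[i], columns[j]) > threshold]

# If every item gets the same score from each respondent, alpha reaches 1
identical = [[s, s, s] for s in (1, 2, 4, 5)]
print(round(cronbach_alpha(identical), 6))  # 1.0

# Hypothetical data in which items 0 and 1 behave almost identically
responses = [[1, 1, 3], [2, 2, 1], [3, 3, 4], [4, 5, 2], [5, 5, 5]]
print(near_duplicate_items(responses))  # [(0, 1)]
```

A flagged pair is only a prompt to look again at the wording of those items; whether one should be dropped remains a judgement about what each item is meant to capture.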

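Source cited:

Taber, K. S. (2018). The use of Cronbach's alpha when developing and reporting research instruments in science education. Research in Science Education, 48, 1273–1296.
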
My introduction to educational research:

Taber, K. S. (2013). Classroom-based Research and Evidence-based Practice: An introduction (2nd ed.). London: Sage.