Sampling bias

A topic in research methodology

Populations and samples

In research we are often interested in 'populations' such as:

  • 16-year-old students taking national school examinations
  • chemistry teachers
  • final year engineering undergraduates
  • physics textbooks recommended by curriculum authorities
  • analogies used in teaching high school biology
  • lectures on thermodynamics

Read about populations of interest in research

As well as defining populations carefully (e.g., is our research about all physics textbooks that have ever been recommended by curriculum authorities, or those currently recommended – and, everywhere, or in some particular national context?), we may find that the size or accessibility of the population requires collecting data from, or about, a sample of the population.

Read about sampling in research

Representativeness of samples

To be informative in research, a sample has to be representative of the wider population. That is, we have to be confident that what we learn about the sample is likely to be true of the population in general – that we can generalise.

Read about generalisation in research

Sampling techniques are intended to ensure our sample is as informative as possible. In some forms of research this likely means looking for statistical representativeness (for example, by randomly selecting the sample from the full population so that every member of the population has the same chance of being included).

Read about randomisation
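
As a concrete illustration, here is a minimal sketch of simple random sampling in Python. The sampling frame of student identifiers is made up for the example; the point is simply that drawing the sample at random gives every member of the population the same chance of selection.

```python
import random

# Made-up sampling frame: an identifier for every member of the population.
population = [f"student_{i:04d}" for i in range(1, 1001)]

random.seed(42)  # fixed seed only so the example is reproducible

# Simple random sampling: every member has the same chance of being included.
sample = random.sample(population, k=100)

print(len(sample), sample[:3])
```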

Other approaches may be sensible in certain circumstances, such as building a sample which includes representatives of different subgroups from the population – or adopting 'purposive sampling' of particular individuals we think are especially well-placed to inform us.

Read about purposive sampling
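
One of these 'other approaches' – building a sample around subgroups – can be sketched as proportional stratified sampling. The subgroups and figures below are invented purely for illustration:

```python
import random
from collections import defaultdict

random.seed(42)

# Made-up population records: (identifier, subgroup) pairs.
population = [(f"teacher_{i:03d}", random.choice(["biology", "chemistry", "physics"]))
              for i in range(1, 301)]

# Group the population by subgroup (stratum).
strata = defaultdict(list)
for identifier, subject in population:
    strata[subject].append(identifier)

# Draw the same fraction from each stratum, so the sample mirrors
# the population's subgroup structure.
fraction = 0.1
sample = []
for subject, members in strata.items():
    sample.extend(random.sample(members, round(fraction * len(members))))

print({subject: len(members) for subject, members in strata.items()})
print(len(sample))
```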

Sampling bias

Sampling bias occurs when the sample does not represent the population in some systematic way. Consider a survey to find out the extent to which old-age pensioners engage with modern digital technology. It might well find that a very high proportion of respondents comfortably use modern technologies – if potential participants were recruited by contacting pensioners with email addresses and asking them to complete an internet survey.

Bias in self-selecting samples

A particular issue in research that relies on self-selecting samples is that there may be systematic differences between those who choose to participate in the study and those who do not. This is not just an unlikely 'theoretical' consideration.

Consider a study exploring whether students cheat in examinations. Imagine a thousand students were surveyed, and a hundred responded (and a 10% response rate is not especially low when we just approach people we do not know well and ask them politely to complete a survey) – and only one of the hundred admitted to ever cheating in examinations.

Read about response rate

If we can assume the responses received are honest (a big assumption – and an epistemological question!) then we can reasonably conclude that the incidence of examination cheating among our sample is very low. Statistical methods can tell us about the likely error in the 1% figure when generalising to all 1000 surveyed – as long as the 100 who replied are fully representative of the 1000 asked.
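
To illustrate the kind of calculation meant here: a 95% confidence interval for a proportion of 1 in 100 can be computed with the Wilson score method. (The scenario does not prescribe a method; the Wilson interval is chosen for this sketch because it behaves sensibly for small counts.)

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% for z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

# One admitted cheater among 100 respondents:
low, high = wilson_interval(1, 100)
print(f"95% CI: {low:.1%} to {high:.1%}")  # roughly 0.2% to 5.4%
```

Of course, even this interval only tells us about the wider group if the hundred who replied are representative of the thousand asked.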

So, then the question we would need to ask is:

Is there any reason to believe that people who have cheated in examinations will be any more or less likely to respond to a survey about cheating in examinations than those who have never cheated in examinations?

We cannot be sure of the answer here, but there must surely be a reasonable suspicion that people who have cheated may be less likely to be prepared to report this than people who have never cheated. We would be foolhardy to assume the 1% (± statistical error based on an assumption of sample representativeness) figure can be generalised.
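
A simple sensitivity analysis makes the point vivid. All the figures below are invented: they merely show that differential willingness to respond can make a 10% true cheating rate look like 1%, while still producing the 10% overall response rate in the scenario.

```python
def observed_rate(true_rate, respond_if_cheated, respond_if_not):
    """Proportion of respondents admitting cheating, under differential response."""
    cheaters_responding = true_rate * respond_if_cheated
    others_responding = (1 - true_rate) * respond_if_not
    return cheaters_responding / (cheaters_responding + others_responding)

def response_rate(true_rate, respond_if_cheated, respond_if_not):
    """Overall proportion of those surveyed who respond at all."""
    return true_rate * respond_if_cheated + (1 - true_rate) * respond_if_not

# Invented figures: 10% of students have cheated; cheaters respond at 1%,
# non-cheaters at 11%.
print(f"observed cheating rate: {observed_rate(0.10, 0.01, 0.11):.1%}")   # 1.0%
print(f"overall response rate:  {response_rate(0.10, 0.01, 0.11):.1%}")   # 10.0%
```

With these (invented) figures the survey looks exactly like the scenario above, yet the true rate is ten times the observed one.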

An example

In one study, researchers sought to survey students taking an organic chemistry course about their experience of the on-line homework set as part of the programme (Richards-Babb, Curtis, Georgieva & Penn, 2015). The survey was completed by 159 of 226 eligible students – a response rate of 70%.

However, the researchers wanted to check whether there might be some systematic difference between those responding and those not responding. (As this was work undertaken within a university department, the researchers had access to relevant information about those not responding – something that is often not the case in surveys.)

They reported on three criteria:

characteristic                             responders                 non-responders            significance?
[average] final course numeric grade       76.89 ± 10.60 (N = 159)    69.99 ± 17.55 (N = 65)    *
average online homework score              92.00 ± 13.59 (N = 159)    74.03 ± 26.43 (N = 67)    *
[average] previous chemistry achievement   2.57 ± 0.83 (N = 157)      2.48 ± 0.84 (N = 66)      n.s.

(* = statistically significant difference; n.s. = not significant)

Collated information from Richards-Babb, Curtis, Georgieva & Penn, 2015

The figures quoted represent the means and standard deviations calculated, along with the number of data points available (information was missing for a small number of potential data points). The precision quoted here is as in the original research report. (It is questionable whether it is meaningful to cite results to three or four significant figures here given the precision of the original data and the size of the sample.)

Richards-Babb and colleagues report that the differences in final course numeric grade and average online homework score were statistically significant (shown in the table above by *). Examining such discrepancies between responders and non-responders is useful when considering the responses received about student perceptions of the on-line resource: those who responded had, on average, engaged more successfully with the homework and achieved higher grades, so their perceptions may not be typical of the whole cohort.
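
The reported summary statistics are enough to reconstruct a check of this kind. The original paper's exact test is not reproduced here; as one plausible sketch, a Welch's t-test can be run straight from the published means, standard deviations and group sizes using scipy:

```python
from scipy import stats

# Summary statistics from Richards-Babb et al. (2015): final course numeric
# grade for responders vs. non-responders.
result = stats.ttest_ind_from_stats(
    mean1=76.89, std1=10.60, nobs1=159,  # responders
    mean2=69.99, std2=17.55, nobs2=65,   # non-responders
    equal_var=False,                     # Welch's test: no equal-variance assumption
)
print(result)  # a small p-value, consistent with the '*' in the table
```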

Work cited:

Richards-Babb, M., Curtis, R., Georgieva, Z., & Penn, J. H. (2015). Student Perceptions of Online Homework Use for Formative Assessment of Learning in Organic Chemistry. Journal of Chemical Education, 92(11), 1813-1819. doi:10.1021/acs.jchemed.5b00294
