Educational experiments – making the best of an unsuitable tool?

Can small-scale experimental investigations of teaching, carried out in a couple of arbitrary classrooms, really tell us anything about how to teach well?


Keith S. Taber


Undertaking valid educational experiments involves (often insurmountable) challenges, but perhaps this grid (shown larger below) might be useful for researchers who do want to carry out genuinely informative experimental studies into teaching?




In recent years I seem to have developed something of a religious fervour about educational research studies of the kind that claim to be experimental evaluations of pedagogies, classroom practices, teaching resources, and the like. I think this all started when, having previously largely undertaken interpretive studies (for example, interviewing learners to find out what they knew and understood about science topics), I became part of a team looking to develop, and experimentally evaluate, classroom pedagogy (i.e., the epiSTEMe project).

As a former school science teacher, I had taught learners about the basis of experimental method (e.g., control of variables) and I had read quite a number of educational research studies based on 'experiments', so I was pretty familiar with the challenges of doing experiments in education. But being part of a project which looked to actually carry out such a study made a real impact on me in this regard. Well, that should not be surprising: there is a difference between watching the European Cup Final on the TV, and actually playing in the match, just as reading a review of a concert in the music press is not going to impact you as much as being on stage performing.

Let me be quite clear: the experimental method is of supreme value in the natural sciences; and, even if not all natural science proceeds that way, it deserves to be an important focus of the science curriculum. Even in science, the experimental strategy has its limitations. 1 But experiment is without doubt a precious and powerful tool in physics and chemistry that has helped us learn a great deal about the natural world. (In biology, too, but even here there are additional complications due to the variations within populations of individuals of a single 'kind'.)

But transferring experimental method from the laboratory to the classroom to test hypotheses about teaching is far from straightforward. Most published experimental studies drawing conclusions about matters such as effective pedagogy need to be read with substantive, and sometimes extensive, provisos and caveats; and many of them are simply invalid – they are bad experiments (Taber, 2019). 2

The experiment is a tool that has been designed, and refined, to help us answer questions when:

  • we are dealing with non-sentient entities that are indifferent to outcomes;
  • we are investigating samples or specimens of natural kinds;
  • we can identify all the relevant variables;
  • we can measure the variables of interest;
  • we can control all other variables which could have an effect.

These points simply do not usually apply to classrooms and other learning contexts. 3 (This is clearly so, even if educational researchers often either do not appreciate these differences, or simply pretend they can ignore them.)

Applying experimental method to educational questions is a bit like trying to use a precision jeweller's screwdriver to open a tin of paint: you may get the tin open eventually, but you will probably have deformed the tool in the process whilst making something of a mess of the job.

The reason why experiments are to be preferred to interpretive ('qualitative') studies is that, supposedly, experiments can lead to definite conclusions (by testing hypotheses), whereas studies that rely on the interpretation of data (such as classroom observations, interviews, analysis of classroom talk, etc.) are at best suggestive. This would be a fair point when an experimental study genuinely met the control-of-variables requirements for being a true experiment – although often, even then, to draw generalisable conclusions that apply to a wide population one has to be confident one is working with a random or representative sample, and use inferential statistics, which can only offer a probabilistic conclusion.

My creed…researchers should prefer to undertake competent work

My proselytising about this issue is based on having come to think that:

  • most educational experiments do not fully control relevant variables, so are invalid;
  • educational experiments are usually subject to expectancy effects that can influence outcomes;
  • many (perhaps most) educational experiments have too few independent units of analysis to allow the valid use of inferential statistics;
  • most large-scale educational experiments cannot assure that samples are fully representative of populations, so strictly cannot be generalised;
  • many experiments are rhetorical studies that deliberately compare a condition (supposedly being tested but actually) assumed to be effective with a teaching condition known to fall short of good teaching practice;
  • an invalid experiment tells us nothing that we can rely upon;
  • a detailed case study of a learning context which offers rich description of teaching and learning potentially offers useful insights;
  • given a choice between undertaking a competent study of a kind that can offer useful insights, and undertaking a bad experiment which cannot provide valid conclusions, researchers should prefer to undertake competent work;
  • what makes work scientific is not the choice of methodology per se, but the adoption of a design that fits the research constraints and offers a genuine opportunity for useful learning.

However, experiments seem very popular in education, and often seem to be the methodology of choice for researchers into pedagogy in science education.

Read: Why do natural scientists tend to make poor social scientists?

This fondness for experiments will no doubt continue, so here are some thoughts on how best to draw useful implications from them.

A guide to using experiments to inform education

It seems there are two very important dimensions that can be used to characterise experimental research into teaching – relating to the scale and focus of the research.


Two dimensions used to characterise experimental studies of teaching


Scale of studies

A large-scale study has a large number of 'units of analysis'. So, for example, if the research was testing out the value of using, say, augmented reality in teaching about predator-prey relationships, then in such a study there would need to be a large number of teaching-learning 'units' in the augmented reality condition and a similarly large number of teaching-learning 'units' in the comparison condition. What a unit actually is would vary from study to study. Here a unit might be a sequence of three lessons where a teacher teaches the topic to a class of 15-16 year-old learners (either with, or without, the use of augmented reality).

For units of analysis to be analysed statistically they need to be independent from each other – so different students learning together from the same teacher in the same classroom at the same time are clearly not learning independently of each other. (This seems obvious – but in many published studies this inconvenient fact is ignored as it is 'unhelpful' if researchers wish to use inferential statistics but are only working with a small number of classes. 4)

Read about units of analysis in research

So, a study which compared teaching and learning in two intact classes can usually only be considered to have one unit of analysis in each condition (making statistical tests completely irrelevant 5, though this does not stop them often being applied anyway). There are a great many small-scale studies in the literature where there are only one or a few units in each condition. To see why this matters, consider the simulation sketched below.
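To make the independence problem concrete, here is a minimal simulation sketch in Python (assuming numpy and scipy are available; all the numbers are invented for illustration, not drawn from any real study). It mimics repeated 'experiments' comparing two intact classes where there is no treatment effect at all, only ordinary class-to-class variation, and counts how often a naive student-level t-test nonetheless declares a 'significant' difference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

trials = 2000        # number of simulated two-class 'experiments'
false_positives = 0

for _ in range(trials):
    # Each intact class has its own class-level effect (teacher, timetable,
    # peer mix, ...), here made as large as the student-to-student spread.
    class_effects = rng.normal(0.0, 1.0, size=2)
    # 25 students per class; there is NO treatment effect in either condition.
    scores_a = class_effects[0] + rng.normal(0.0, 1.0, size=25)
    scores_b = class_effects[1] + rng.normal(0.0, 1.0, size=25)
    # Naive analysis: treating the 50 students as 50 independent units.
    _, p_value = stats.ttest_ind(scores_a, scores_b)
    if p_value < 0.05:
        false_positives += 1

# With genuinely independent units this rate would be about 0.05;
# with students clustered in two intact classes it comes out far higher.
print(false_positives / trials)
```

The particular numbers do not matter; the point is that the p-value machinery assumes independent units, so with only one class per condition any class-level difference masquerades as a treatment effect.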

Focus of study

The other dimension shown in the figure concerns the focus of a study. By the focus, I mean whether the researchers are interested in teaching and learning in some specific local context, or want to find out about some general population.

Read about what is meant by population in research

Studies may be carried out in a very specific context (e.g., one school; one university programme) or across a wide range of contexts. That seems to simply relate to the scale of the study, just discussed. But by focus I mean whether the research question of interest concerns just a particular teaching and learning context (which may be quite appropriate when practitioner-researchers explore their own professional contexts, for example), or is meant to help us learn about a more general situation.


local focus | general focus
Why does school X get such outstanding science examination scores? | Is there a relationship between teaching pedagogy employed and science examination results in English schools?
Will jig-saw learning be a productive way to teach my A level class about the properties of the transition elements? | Is jig-saw learning an effective pedagogy for use in A level chemistry classes?
Some hypothetical research questions relating either to a specific teaching context, or a wider population. (n.b. The research literature includes a great many studies that claim to explore general research questions by collecting data in a single specific context.)

If that seems a subtle distinction between two quite similar dimensions, then it is worth noting that the research literature contains a great many studies that take place in one context (small-scale studies) but which claim (implicitly or explicitly) to be of general relevance. So, many authors, peer reviewers, and editors clearly seem to think one can generalise from such small-scale studies.

Generalisation

Generalisation is the ability to draw general conclusions from specific instances. Natural science does this all the time. If this sample of table salt has the formula NaCl, then all samples of table salt do; if the resistance of this copper wire goes up when the wire is heated, the same will be found with other specimens as well. This usually works well when dealing with things we think are 'natural kinds' – that is, where all the examples (all samples of NaCl, all pure copper wires) have the same essence.

Read about generalisation in research

Education deals with teachers, classes, lessons, schools…social kinds that lack that kind of equivalence across examples. You can swap any two electrons in a structure and it will make absolutely no difference. Does anyone think you can swap the teachers between two classes and safely assume it will not have an effect?

So, by focus I mean whether the point of the research is to find out about the research context in its own right (context-directed research) or to learn something that applies to a general category of phenomena (theory-directed research).

These two dimensions, then, lead to a model with four quadrants.

Large-scale research to learn about the general case

In the top-right quadrant is research which focuses on the general situation and is larger-scale. In principle 6 this type of research can address a question such as 'is this pedagogy (teaching resource, etc.) generally effective in this population?', as long as:

  • the samples are representative of the wider population of interest, and
  • those sampled are randomly assigned to conditions, and
  • the number of units supports statistical analysis (as sketched below).
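As a concrete (and deliberately idealised) sketch of the second and third conditions, suppose a researcher already had a representative sampling frame of 40 classes (the class names below are invented); each class, as the unit of analysis, would then be randomly assigned to a condition:

```python
import random

# Hypothetical sampling frame: 40 classes assumed (optimistically) to be
# representative of the population of interest.
sampled_classes = [f"class_{i:02d}" for i in range(40)]

random.seed(2024)  # seeded only to make the illustration reproducible
random.shuffle(sampled_classes)

# 20 units of analysis per condition - enough, in principle, for
# inferential statistics at the class level.
experimental_condition = sampled_classes[:20]
control_condition = sampled_classes[20:]
```

In practice, of course, the hard part is everything this sketch assumes away: obtaining a genuinely representative sampling frame, and securing consent from every class assigned to either condition.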

The sleight of hand employed in many studies is to select a convenience sample (two classes of thirteen-year-old students at my local school) yet to claim the research is about, and so offers conclusions about, a wider population (thirteen-year-old learners).

Read about some examples of samples used to investigate populations


When an experiment tests a sample drawn at random from a wider population, then the findings of the experiment can be assumed to (probably) apply (on average) to the population. (Taber, 2019)

Even when a population is properly sampled, it is important not to assume that something which has been found to be generally effective in a population will be effective throughout that population. Schools, classes, courses, learners, topics, etc., vary. If it has been found that, say, teaching the reactivity series through enquiry generally works in the population of English classes of 13-14 year-old students, then a teacher of an English class of 13-14 year-old students might sensibly think this is an approach to adopt, but cannot assume it will be effective in her classroom, with a particular group of students.

To implement something that has been shown to generally work might be considered research-based teaching, as long as the approach is dropped or modified if indications are it is not proving effective in this particular context. That is, there is nothing (please note, UK Department for Education, and Ofsted) 'research-based' about continuing with a recommended approach in the face of direct empirical evidence that it is not working in your classroom.

Large-scale research to learn about the range of effectiveness

However, even large-scale studies where there are genuinely sufficient units of analysis for statistical analysis may not logically support the kinds of generalisation in the top-right quadrant. For that, researchers need either a random sample of the full population (seldom viable, given that people and institutions must have a choice about whether to participate 7), or a sample which is known to be representative of the population in terms of the relevant characteristics – which means knowing a lot about:

  • (i) the population,
  • (ii) the sample, and
  • (iii) which variables might be relevant!

Imagine you wanted to undertake a survey of physics teachers in some national context, and you knew you could not reach all of that population, so you needed to survey a sample. How could you possibly know that the teachers in your sample were representative of the wider population on whatever variables might potentially be pertinent to the survey (level of qualification? years of experience? degree subject? type of school/college taught in? gender? …)?

But perhaps a large-scale study that attracts a diverse enough sample may still be very useful if it collects sufficient data about the individual units of analysis, and so can begin to look at patterns in how specific local conditions relate to teaching effectiveness. That is, even if the sample cannot be considered representative enough for statistical generalisation to the population, such a study might be able to offer some insights into whether an approach seems to work well in mixed-ability classes, or top sets, or girls' schools, or in areas of high social deprivation, or…

In practice, there are very few experimental research studies which are large-scale, in the sense of having enough different teachers/classes as units of analysis to sit in either of these quadrants of the chart. Educational research is rarely funded at a level that makes this possible. Most researchers are constrained by the available resources to only work with a small number of accessible classes or schools.

So, what use are such studies for producing generalisable results?

Small-scale research to incrementally extend the range of effectiveness

A single small-scale study can contribute to a research programme to explore the range of application of an innovation as if it were part of a large-scale study with a diverse sample. But this means such studies need to be explicitly conceptualised and planned as part of such a programme.

At the moment it is common for research papers to say something like

"…lots of research studies, from all over the place, report that asking students to

(i) first copy science texts omitting all the vowels, and then

(ii) then re-constitute them in full by working from the reduced text, writing it out adding vowels that produce viable words and sentences,

is an effective way of supporting the learning of science concepts; but no one has yet reported testing this pedagogic method when twelve-year-old students are studying the topic of acids in South Cambridgeshire in a teaching laboratory with moveable stools and West-facing windows.

In this ground-breaking study, we report an experiment to see if this constructivist, active-learning teaching approach leads to greater science learning among twelve-year-old students studying the topic of acids in South Cambridgeshire in a teaching laboratory with moveable stools and West-facing windows…"

Over time, the research literature becomes populated with studies of enquiry-based science education, jig-saw learning, use of virtual reality, etc.; and although these refer to a range of national contexts, variously aged students, and diverse science topics, it all tends to be piecemeal. A coordinated programme of research could lead to researchers both (a) giving rich descriptions of the contexts used, and (b) selecting contexts strategically to build up a picture across ranges of contexts:

"When there is a series of studies testing the same innovation, it is most useful if collectively they sample in a way that offers maximum information about the potential range of effectiveness of the innovation.There are clearly many factors that may be relevant. It may be useful for replication studies of effective innovations to take place with groups of different socio-economic status, or in different countries with different curriculum contexts, or indeed in countries with different cultural norms (and perhaps very different class sizes; different access to laboratory facilities) and languages of instruction …. It may be useful to test the range of effectiveness of some innovations in terms of the ages of students, or across a range of quite different science topics. Such decisions should be based on theoretical considerations.

Given the large number of potentially relevant variables, there will be a great many combinations of possible sets of replication conditions. A large number of replications giving similar results within a small region of this 'phase space' means each new study adds little to the field. If all existing studies report positive outcomes, then it is most useful to select new samples that are as different as possible from those already tested. …

When existing studies suggest the innovation is effective in some contexts but not others, then the characteristics of samples/context of published studies can be used to guide the selection of new samples/contexts (perhaps those judged as offering intermediate cases) that can help illuminate the boundaries of the range of effectiveness of the innovation."

Taber, 2019

Not that the research programme would be co-ordinated by a central agency or authority; rather, each contributing researcher/research team would (i) take into account the 'state of play' at the start of their research; (ii) make strategic decisions accordingly when selecting contexts for their own work; and (iii) report the context in enough detail to allow later researchers to see how that study fits into the ongoing programme.

This has to be a more scientific approach than simply picking a convenient context where researchers expect something to work well; undertaking a small-scale local experiment (perhaps setting up a substandard control condition to be sure of a positive outcome); and then reporting along the lines of "this widely demonstrated effective pedagogy works here too", or, if it does not, perhaps putting the study aside without publication. As the philosopher of science Karl Popper reminded us, science proceeds through the testing of bold conjectures: an 'experiment' where you already know the outcome is actually a demonstration. Demonstrations are useful in teaching, but do not contribute to research. What can contribute is an experiment in a context where there is reason to be unsure whether an innovation will be an improvement or not, and where the comparison condition reflects good teaching practice, so offering a meaningful test.

Small-scale research to inform local practice

Now, I would be the first to admit that I am not optimistic that such an approach will be developed by researchers; and even if it is, it will take time for useful patterns to arise that offer genuine insights into the range of convenience of different pedagogies.

Does this mean that small-scale studies in a single context are really a waste of research resources and an unmerited inconvenience for those working in such contexts?

Well, I have time for studies in my final (bottom-left) quadrant. Given that schools and classrooms and teachers and classes all vary considerably, and that what works well in a highly selective boys-only fee-paying school with a class size of 16 may not be as effective in a co-educational class of 32 mixed-ability students in an under-resourced school in an area of social deprivation (and vice versa, of course!), there is often value in testing out ideas (even recommended 'research-based' ones) in specific contexts to inform practice in that context. These are likely to be genuine experiments, as the investigators are really motivated to find out what can improve practice in that context.

Often such experiments will not get published,

  • perhaps because the researchers are teachers with higher priorities than writing for publication;
  • perhaps because it is assumed such local studies are not generalisable (but they could sometimes be moved into the previous category if suitably conceptualised and reported);
  • perhaps because the investigators have not sought permissions for publication (part of the ethics of research, though usually not necessary for teachers seeking innovations to improve practice as part of their professional work);
  • perhaps because it has been decided inappropriate to set up control conditions which are not expected to be of benefit to those being asked to participate;
  • but also because when trying out something new in a classroom, one needs to be open to make ad hoc modifications to, or even abandon, an innovation if it seems to be having a deleterious effect.

Evaluation of effectiveness here usually comes down to professional judgement (rather than statistical testing – which assumes a large random sample of a population – being used to invalidly generalise small, non-random, local results to that population), and might, in part, rely on the researcher's close (and partially tacit) familiarity with the research context.

I am here describing 'action research', which is highly useful for informing local practice, but which is not ideally suited for formal reporting in academic journals.

Read about action research

So, I suspect there may be an irony here.

There may be a great many small-scale experiments undertaken in schools and colleges which inform good teaching practice in their contexts without ever being widely reported; whilst there are a great many similar-scale, often 'forced', experiments, carried out by visiting researchers with little personal stake in the research context, reporting on the general effectiveness of teaching approaches based on a misuse of statistics. I wonder which approach best reflects the true spirit of science?

Source cited:

  • Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058


Notes:

1 For example:

Even in the natural sciences, we can never be absolutely sure that we have controlled all relevant variables (after all, if we already knew for sure which variables were relevant, we would not need to do the research). But usually existing theory gives us a pretty good idea what we need to control.

Experiments are never a simple test of the specified hypothesis alone, as the experiment is likely to depend upon the theory of instrumentation and the quality of the instruments. Consider an extreme case, such as the discovery of the Higgs boson at CERN: the conclusions relied on complex theory that informed the design of the apparatus, and very challenging precision engineering, as well as complex mathematical models for interpreting data, and corresponding computer software specifically programmed to carry out that analysis.

The experimental results are a test of a hypothesis (e.g., that a certain particle would be found at events below some calculated energy level) subject to the provisos that

  • the theory of the instrument and its design is correct; and
  • the materials of the apparatus (an apparatus as complex and extensive as a small city) have no serious flaws; and
  • the construction of the instrumentation precisely matches the specifications; and
  • the modelling of how the detectors will function (including their decay in performance over time) is accurate; and
  • the analytical techniques designed to interpret the signals are valid; and
  • the programming of the computers carries out the analysis as intended.

It almost requires an act of faith to have confidence in all this (and I am confident there is no one scientist anywhere in the world who has a good enough understanding of, and familiarity with, all these aspects of the experiment to be able to give assurances on all these areas!)


CREST {Critical Reading of Empirical Studies} evaluation form: when you read a research study, do you consider the cumulative effects of doubts you may have about different aspects of the work?

I would hope, at least, that professional scientists and engineers might be a little more aware of this complex chain of argumentation needed to support robust conclusions than many students are – for students often seem to be overconfident about the overall value of research conclusions, given any doubts they may have about aspects of the work reported.

Read about the Critical Reading of Empirical Studies Tool



Galileo Galilei was one of the first people to apply the telescope to study the night sky (image by Dorothe from Pixabay)


A historical example is Galileo's observations of astronomical phenomena such as the Jovian moons (he spotted the four largest: Io, Europa, Ganymede and Callisto) and the irregular surface of the moon. Some of his contemporaries rejected these findings on the basis that they were made using an apparatus, the new-fangled telescope, that they did not trust. Whilst this is now widely seen as having been arrogant and/or ignorant, arguably if you did not understand how a telescope could magnify, and you did not trust the quality of the lenses not to produce distortions, then it was quite reasonable to be sceptical of findings which ran counter to a theory of the 'heavens' that had been generally accepted for many centuries.


2 I have discussed a number of examples on this site. For example:

Falsifying research conclusions: You do not need to falsify your results if you are happy to draw conclusions contrary to the outcome of your data analysis.

Why ask teachers to 'transmit' knowledge…if you believe that "knowledge is constructed in the minds of students"?

Shock result: more study time leads to higher test scores (But 'all other things' are seldom equal)

Experimental pot calls the research kettle black: Do not enquire as I do, enquire as I tell you

Lack of control in educational research: Getting that sinking feeling on reading published studies


3 For a detailed discussion of these and other challenges of doing educational experiments, see Taber, 2019.


4 Consider these two situations.

A researcher wants to find out if a new textbook, 'Science for the modern age', leads to more learning among the Grade 10 students she teaches than the traditional book, 'Principles of the natural world'. Imagine there are fifty Grade 10 students, already divided into two classes. The teacher flips a coin and randomly assigns one of the classes to the innovative book, the other being assigned the traditional book by default. We will assume she has a suitable test to assess each student's learning at the end of the experiment.

The teacher teaches the two classes the same curriculum by the same scheme of work. She presents a mini-lecture to a class, then sets them some questions to discuss using the textbook. At the end of the (three-part!) lesson, she leads a class discussion drawing on students' suggested answers.

Being a science teacher, who believes in replication, she decides to repeat the exercise the following year. Unfortunately there is a pandemic, and all the students are sent into lock-down at home. So, the teacher assigns the fifty students by lot into two groups, and emails one group the traditional book, and the other the innovative text. She teaches all the students online as one cohort: each lesson giving them a mini-lecture, then setting them some reading from their (assigned) book, and a set of questions to work through using the text, asking them to upload their individual answers for her to see.

With regard to experimental method, in the first cohort she has only two independent units of analysis – so she may note that the average outcome scores are higher in one group, but cannot read too much into that. However, in the second year, the fifty students can be considered to be learning independently, and as they have been randomly assigned to conditions, she can treat the assessment scores as coming from 25 units of analysis in each condition (and so may sensibly apply statistics to see if there is a statistically significant difference in outcomes, as sketched below).
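For that second cohort, here is a minimal sketch of the analysis the teacher might then run (Python with numpy and scipy assumed; the scores below are invented placeholders, since the point is the logic of the design rather than any actual data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assignment by lot: shuffle the 50 students and split them into two groups
# (the group memberships would determine which book each student is emailed).
students = np.arange(50)
rng.shuffle(students)
modern_text_group, traditional_text_group = students[:25], students[25:]

# End-of-topic test scores would be collected per student; these values
# are invented purely to make the example runnable.
scores_modern = rng.normal(5.6, 1.2, size=25)
scores_traditional = rng.normal(5.1, 1.2, size=25)

# With 25 randomly assigned, independently working students per condition,
# an independent-samples t-test is a defensible analysis.
t_stat, p_value = stats.ttest_ind(scores_modern, scores_traditional)
print(t_stat, p_value)
```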


5 Inferential statistical tests are usually used to see if the difference in outcomes across conditions is 'significant'. Perhaps the average score in a class with an innovation is 5.6, compared with an average score in the control class of 5.1. The average score is higher in the experimental condition, but is the difference enough to matter?

Well, actually, if the question is whether the difference is big enough to be likely to make a difference in practice, then researchers should calculate the 'effect size', which will suggest whether the difference found should be considered small, moderate or large. This should ideally be calculated regardless of whether inferential statistics are being used or not (see the sketch below).
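One common effect size measure is Cohen's d: the difference between the group means divided by the pooled standard deviation. Here is a minimal sketch using the illustrative means above (5.6 and 5.1); the standard deviations and group sizes are invented, as the text only specifies the means:

```python
import math

def cohens_d(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)

# 5.6 vs 5.1 with (invented) SDs of 1.2 in groups of 25:
print(cohens_d(5.6, 5.1, 1.2, 1.2, 25, 25))  # about 0.42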

Inferential statistical tests are often used to see if the result is generalisable to the wider population – but, as suggested above, this is strictly only valid if the population of interest has been randomly sampled – which virtually never happens in educational studies, as it is usually not feasible.

Often researchers will still do the calculation, based on the sets of outcome scores in the two conditions, to see if they can claim a statistically significant difference – but the test can only suggest how likely or unlikely a difference of that size would be to arise by chance, given that the units of analysis have been randomly assigned to the conditions. So, if there are 50 learners each randomly assigned to the experimental or control condition, this makes sense. That is sometimes the case, but nearly always the researchers work with existing classes and do not have the option of randomly mixing the students up. [See the example in the previous note 4.] In such a situation, the statistics are not informative. (That does not stop them often being reported in published accounts as if they were useful.)


6 That is, if it is possible to address such complications as participant expectations, and equitable teacher familiarity with the different conditions they are assigned to (Taber, 2019).

Read about expectancy effects


7 A usual ethical expectation is that participants voluntarily (without duress) offer informed consent to participate.

Read about voluntary informed consent


Is your heart in the research?

Someone else's research, that is


Keith S. Taber


Imagine you have a painful and debilitating illness. Your specialist tells you there is no conventional treatment known to help. However, there is a new – experimental – procedure: a surgery that may offer relief. But it has not yet been fully tested. If you are prepared to sign up for a study to evaluate this new procedure, then you can undergo surgery.

You are put under and wheeled into the operating theatre. Whilst you experience – rather, do not experience – the deep, sleepless rest of anaesthesia, the surgeon saws through your breastbone, prises open your ribcage with a retractor (hopefully avoiding breaking any ribs), reaches in, and gently lifts up your heart.

The surgeon pauses, perhaps counts to five, then carefully replaces your heart between the lungs. The ribcage is closed, and you are sewn up without any actual medical intervention. You had been randomly assigned to the control group.


How can we test whether surgical interventions are really effective without blind controls?

Is it right to carry out sham operations on sick people just for the sake of research?

Where is the balance of interests?

(Image from Pixabay)


Research ethics

A key aspect of planning, executing and reviewing research is ethical scrutiny. Planning, obviously, needs to take into account ethical considerations and guidelines. But even the best laid plans 'of mice and men' (or of, say, people investigating mice) may not allow for all eventualities (after all, if we knew for sure what was going to happen in a study, it would not be research – and it would be unethical to spend precious public resources on it), so the ethical imperative does not stop once we have got approval and permissions. And even then, we may find that we cannot fully allow for unexpected eventualities – which is something to be reported and discussed to help inform future research.

Read about research ethics

When preparing students setting out on research, instruction about research ethics is vital. It is possible to teach about rules, and policies, and guidelines and procedures – but real research contexts are often complex, and ethical thinking cannot be algorithmic or a matter of adopting slogans and following heuristics. In my teaching I would include discussion of past cases of research studies that raised ethical questions for students to discuss and consider.

One might think that, as research ethics is so important, it would be difficult to find many published studies which were not exemplars of good practice – but attitudes to, and guidance on, ethics have developed over time, and there are many past studies which, if not clearly unethical in today's terms, at least present problematic cases. (That is without the 'doublethink' that allows some contemporary researchers, in a single paper, both to claim that active learning methods should be studied because it is known that passive learning activities are not effective, and then to report how they required teachers to instruct classes through passive learning to act as control groups.)

Indeed, ethical decision-making may not always be straightforward – as it often means balancing different considerations, and at a point where any hoped-for potential benefits of the research must remain uncertain.

Pretending to operate on ill patients

I recently came across an example of a medical study which I thought raised some serious questions, and which I might well have included in my teaching of research ethics as a case for discussion, had I known about it before I retired.

The research apparently involved surgeons opening up a patient's ribcage (not a trivial procedure), and lifting out the person's heart in order to carry out a surgical intervention…or not,

"In the late 1950s and early 60s two different surgical teams, one in Kansas City and one in Seattle, did double-blind trials of a ligation procedure – the closing of a duct or tube using a clip – for very ill patients suffering from severe angina, a condition in which pain radiates from the chest to the outer extremities as a result of poor blood supply to the heart. The surgeons were not told until they arrived in the operating theatre which patients were to receive a real ligation and which were not. All the patients, whether or not they were getting the procedure, had their chest cracked open and their heart lifted out. But only half the patients actually had their arteries rerouted so that their blood could more efficiently bathe its pump …"

Slater, 2018

The quote is taken from a book by Lauren Slater which sets out a history of drug use in psychiatry. Slater is a psychotherapist who has written a number of books about aspects of mental health conditions and treatments.

Fair testing

In order to make a fair experiment, the double-blind procedure sought to treat the treatment and control groups the same in all respects, apart from the actual procedure of ligation of selected blood vessels that comprised the mooted intervention. The patients did not know (at least, in one of the studies) that they might not have the real operation. Their physicians were not told who was getting the treatment. Even the surgeons only found out who was in each group when the patient arrived in theatre.

It was necessary for those in the control group to think they were having an intervention, and to undergo the sham surgery, so that they formed a fair comparison with those who got the ligation.

Read about control of variables

It was necessary to have double-blind study (neither the patients themselves, nor the physicians looking after them, were told which patients were, and which were not, getting the treatment), because there is a great deal of research which shows that people's beliefs and expectations make substantial differences to outcomes. This is a real problem in educational research when researchers want to test classroom practices such as new teaching schemes or resources or innovative pedagogies (Taber, 2019). The teacher almost certainly knows whether she is teaching the experimental or control group, and usually the students have a pretty good idea. (If every previous lesson has been based on teacher presentations and note-taking, and suddenly they are doing group discussion work and making videos, they are likely to notice.)

Read about expectancy effects

It was important to undertake a study because there was no clear objective evidence to show whether the new procedure actually improved patient outcomes (or possibly even made matters worse). Doctors reported seeing treated patients do better – but could only guess how they might have done without surgery. Without proper studies, many thousands of people might ultimately undergo an ineffective surgery, with all the associated risks and costs, without getting any benefit.

Simply comparing treated patients with matched untreated patients would not do the job, as there can be a strong placebo effect of believing one is getting a treatment. (It is likely that at least some alternative therapies largely work because a practitioner with good social skills spends time engaging with the patient and their concerns, and the client expects a positive outcome.)

If any positive effects of heart surgery were due to the placebo effect, then perhaps a highly coloured sugar pill prescribed with confidence by a physician could have the same effect without operating theatres, surgical teams, hospital stays… (For that matter, a faith healer who pretended to operate without actually breaking the skin, and revealed a piece of material {perhaps concealed in a pocket or sleeve} presented as an extracted mass of diseased tissue or a foreign body, would be just as effective if the patient believed in the procedure.)

So, I understood the logic here.

Do no harm

All the same – this seemed an extreme intervention. Even today, anaesthesia is not very well understood in detail: it involves giving a patient drugs that could kill them in carefully controlled sub-lethal doses – when how much would actually be lethal (and what would be insufficient to fully sedate) varies from person to person. There are always risks involved.


"All the patients, whether or not they were getting the procedure had their chest cracked open and their heart lifted out."

(Image by Starllyte from Pixabay)


Open heart surgery exposes someone to infection risks. Cracking open the chest is a big deal. It can take two months for the disrupted tissues to heal. Did the research really require opening up the chest and lifting the heart for the control group?

Could this really ever have been considered ethical?

I might have been much more cynical had I not known of other, hm, questionable medical studies. I recall hearing a BBC radio documentary in the 1990s about American physicians who deliberately gave patients radioactive materials without their knowledge, just to explore the effects. Perhaps most infamously, there was the Tuskegee syphilis study, where United States medical authorities followed the development of the disease over decades without revealing the full nature of the study, or trying to treat any of those infected. Compared with these violations, the angina surgery research seemed tame.

But do not believe everything you read…

According to the notes at the back of Slater's book, her reference was another secondary source (Moerman, 2002) – that is, someone writing about what the research reports said, rather than the actual 'primary' accounts in the research journals.

So, I looked on-line for the original accounts. I found a 1959 study, by a team from the University of Washington School of Medicine. They explained that:

"Considerable relief of symptoms has been reported for patient with angina pectoris subjected to bilateral ligation of the internal mammary arteries. The physiologic basis for the relief of angina afforded by this rather simple operation is not clear."

Cobb, Thomas, Dillard, Merendino & Bruce, 1959

It was not clear why clamping these blood vessels in the chest should make a substantial difference to blood flow to the heart muscles – despite various studies which had subjected a range of dogs (who were not complaining of the symptoms of angina, and did not need any surgery) to surgical interventions followed by invasive procedures in order to measure any modifications in blood flow (Blair, Roth & Zintel, 1960).


That raises another ethical issue – the extent of the pain, suffering and morbidity it is fair to inflict on non-human animals (which are never perfect models for human anatomy and physiology) to progress human medicine. Some studies explored the details of blood circulation in dogs. Would you like your aorta clamped, and the blood drained from the left side of your heart, for the sake of a research study? Moreover, in order to test the effectiveness of the ligation procedure, in some studies healthy dogs had to have the blood supply to their heart muscles disrupted to give them similarly compromised heart function to the human angina sufferers. 1

But, hang on a moment. I think I passed over something rather important in that last quote: "this rather simple operation"?

"Considerable relief of symptoms has been reported for patient with angina pectoris subjected to bilateral ligation of the internal mammary arteries. The physiologic basis for the relief of angina afforded by this rather simple operation is not clear."

Cobb and colleagues' account of the procedure contradicted one of my assumptions,

 At the time of operation, which was performed under local anesthesia [anaesthesia], the surgeon was handed a randomly selected envelope, which contained a card instructing him whether or not to ligate the internal mammary arteries after they had been isolated.

Cobb et al, 1959

It seems my inference that the procedure was carried out under general anaesthetic was wrong. Never assume! Surgery under local anaesthetic is not a trivial enterprise, but carries much less risk than general anaesthetic.

Yet, surely, even back then, no surgeon was going to open up the chest and handle the heart under a local anaesthetic? Cobb and colleagues wrote:

"The surgical procedures commonly used in the therapy of coronary-artery disease have previously been "major" operations utilizing thoracotomy and accompanied by some morbidity and a definite mortality. … With the advent of internal-mammary-artery ligation and its alleged benefit, a unique opportunity for applying the principles of a double-blind evaluation to a surgical procedure has been afforded

Cobb, Thomas, Dillard, Merendino & Bruce, 1959

So, the researchers were arguing that, previously, surgical interventions for this condition were major operations that did involve opening up the chest (thorax) – thoracotomy – where sham surgery would not have been ethical; but the new procedure they were testing – "this rather simple operation" was different.

Effects of internal-mammary-artery ligation on 17 patients with angina pectoris were evaluated by a double-blind technic. Eight patients had their internal mammary arteries ligated; 9 had skin incisions only. 

Cobb et al, 1959

They describe "a 'placebo' procedure consisting of parasternal skin incisions" – that is, some cuts were made into the skin next to the breastbone. Skin incisions are somewhat short of open-heart surgery.

The description given by the Kansas team (from the Departments of Medicine and Surgery, University of Kansas Medical Center, Kansas City) also differs from Slater's third-hand account in this important way:

"The patients were operated on under local anesthesia. The surgeon, by random sampling, selected those in whom bilateral internal mammary artery and vein ligation (second interspace) was to be carried out and those in whom a sham procedure was to be performed. The sham procedure consisted of a similar skin incision with exposure of the internal mammary vessels, but without ligation."

Dimond, Kittle & Crocket, 1960

This description of the surgery seemed quite different from that offered by Slater.

These teams seemed to be reporting a procedure that could be carried out without exposing the lungs or the heart and opening their protective covers ("in this technique…the pericardium and pleura are not entered or disturbed", Glover, et al, 1957), and which could be superficially forged by making a few cuts into the skin.


"The performance of bilateral division of the internal mammary arteries as compared to other surgical procedures for cardiac disease is safe, simple and innocuous in capable hands."

Glover, Kitchell, Kyle, Davila & Trout, 1958

The surgery involved making cuts into the skin of the chest to access, and close off, arteries taking blood to (more superficial) chest areas in the hope it would allow more to flow to the heart muscles; the sham surgery, the placebo, involved making similar incisions, but without proceeding to change the pattern of arterial blood flow.

The sham surgery did not require general anaesthesia and involved relatively superficial wounds – and offered a research technique that did not need to cause suffering to, and the sacrifice of, perfectly healthy dogs. So, that's all ethical then?

The first-hand research reports at least give a different impression of the balance of costs and potential benefits to stakeholders than the one I had originally drawn from Lauren Slater's account.

Getting consent for sham surgery

A key requirement for ethical research with human participants is that they offer voluntary informed consent. Unlike dogs, humans can assent to research procedures, and it is generally considered that research should not be undertaken without such consent.

Read about voluntary informed consent

Of course, there is nuance and complication. The kind of research where investigators drop large-denomination notes to test the honesty of passers-by – where the 'participants' are in a public place and will not be identified or identifiable – is not usually seen as needing such consent (which would clearly undermine any possibility of getting authentic results). But is it acceptable to observe people using public toilets without their knowledge and consent (as was described in one published study I used as a teaching example)?

The extent to which a lay person can fully understand the logic and procedures explained to them when consent is being sought can vary. The extent to which most participants would need, or even want, to know the full details of a study can vary too. And when children of various ages are involved, the extent to which consent can be given on their behalf by a parent or teacher raises interesting questions.


"I'm looking for volunteers to have a procedure designed to make it look like you've had surgery"

Image by mohamed_hassan from Pixabay


There is much nuance and there are many complications – and this is an area to which researchers need to give very careful consideration.

  • How many ill patients would volunteer for sham surgery to help someone else's research?
  • Would that answer change, if the procedure being tested would later be offered to them?
  • What about volunteering for a study where you have a 50-50 chance of getting the real surgery or the placebo treatment?

In Cobb's study, the participants had all volunteered – but we might wonder if the extent of the information they were given amounted to what was required for informed consent,

The subjects were informed of the fact that this procedure had not been proved to be of value, and yet many were aware of the enthusiastic report published in the Reader's Digest. The patients were told only that they were participating in an evaluation of this operation; they were not informed of the double-blind nature of the study.

Cobb et al, 1959

So, it seems the patients thought they were having an operation that had been mooted to help angina sufferers – and indeed some of them were, but others just got taken into surgery to get a few wounds that suggested something more substantive had been done.

Was that ethical? (I doubt it would be allowed anywhere today.)

The outcome of these studies was that, although the patients getting the ligation surgery did appear to get relief from their angina, so did those just getting the skin incisions. The placebo seemed just as good as the re-plumbing.

In hindsight, does this make the studies more worthwhile and seem more ethical? This research has probably prevented a great many people having an operation to have some of their vascular system blocked when that does not seem to make any difference to angina. Does that advance in medical knowledge justify the deceit involved in leading people to think they would get an experimental surgical treatment when they might just get an experimental control treatment?


Ethical principles and guidelines can help us judge the merits of a study

Coda – what did the middle man have to say?

I wondered how a relatively minor sham procedure under local anaesthetic became characterised as "the patients, whether or not they were getting the procedure had their chest cracked open and their heart lifted out" – a description which gave a vivid impression of a major intervention.


The heart is pretty well integrated into the body – how easy is it to lift an intact, fully connected, working heart out of position?

Image by HANSUAN FABREGAS from Pixabay


I wondered to what extent it would even be possible to lift the heart out from the chest whilst it remained connected with the major vessels passing the blood it was pumping, and the nerves supplying it, and the vessels supplying blood to its own muscles (the ones that were considered compromised enough to make the treatment being tested worth considering). Some sources I found on-line referred to the heart being 'lifted' during open-heart procedures to give the surgeon access to specific sites: but that did not mean taking the heart out of the body. Having the heart 'lifted out' seemed more akin to Aztec sacrificial rites than medical treatment.

Although all surgery involves some risk, the actual procedure being investigated seemed of a relatively routine nature. I actually attended a 'minor' operation which involved cutting into the chest when my late wife was being prepared for kidney dialysis. Usually a site for venous access is prepared in the arm well in advance, but it was decided my wife needed to be put on dialysis urgently. A temporary hole was cut into her neck to allow the surgeon to connect a tube (a central venous catheter) to a vein, and another hole into her chest so that the catheter would exit in her chest, where the tap could be kept sterile, bandaged to the chest. This was clearly not considered a high-risk operation (which is not to say I think I could have coped with having this done to me!) as I was asked by the doctors to stay in the room with my wife during the procedure, and I did not need to 'scrub' or 'gown up'.

Bilateral internal mammary artery ligation seemed a procedure on that kind of level, accessing blood vessels through incisions made in the skin. However, if Lauren Slater had read up on some of the earlier procedures that did require opening the chest, or if she had read the papers describing how the dogs were investigated to trace blood flow through connected vessels, measure changes in flow, and prepare them for induced heart conditions, I could appreciate the potential for confusion. Yet she did not cite the primary research, but rather Daniel Moerman, an Emeritus Professor of Anthropology at the University of Michigan-Dearborn, who has written a book about placebo treatments in medicine.

Moerman does write about bilateral internal mammary artery ligation, and about the two sham-surgery studies I found in my search. He describes the operation:

"It was quite simple, and since the arteries were not deep in the body, could be performed under local anaesthetic."

Moerman, 2002

He also refers to the subjective reports on one of the patients assigned to the placebo condition in one of the studies, who claimed to feel much better immediately after the procedure:

"This patient's arteries were not ligated…But he did have two scars on his chest…"

Moerman, 2002

But nobody cracked open his chest, and no one handled his heart.

There are still ethical issues here, but understanding the true (almost superficial) nature of the sham surgery clearly changes the balance of concerns. If there is a moral to this article, it is perhaps the importance of being fully informed before reaching judgement about the ethics of a research study.


Work cited:
  • Blair, C. R., Roth, R. F., & Zintel, H. A. (1960). Measurement of coronary artery blood-flow following experimental ligation of the internal mammary artery. Annals of Surgery, 152(2), 325.
  • Cobb, L. A., Thomas, G. I., Dillard, D. H., Merendino, K. A., & Bruce, R. A. (1959). An evaluation of internal-mammary-artery ligation by a double-blind technic. New England Journal of Medicine, 260(22), 1115-1118.
  • Dimond, E. G., Kittle, C. F., & Crockett, J. E. (1960). Comparison of internal mammary artery ligation and sham operation for angina pectoris. The American Journal of Cardiology, 5(4), 483-486.
  • Glover, R. P., Davila, J. C., Kyle, R. H., Beard, J. C., Trout, R. G., & Kitchell, J. R. (1957). Ligation of the internal mammary arteries as a means of increasing blood supply to the myocardium. Journal of Thoracic Surgery, 34(5), 661-678. https://doi.org/https://doi.org/10.1016/S0096-5588(20)30315-9
  • Glover, R. P., Kitchell, J. R., Kyle, R. H., Davila, J. C., & Trout, R. G. (1958). Experiences with Myocardial Revascularization By Division of the Internal Mammary Arteries. Diseases of the Chest, 33(6), 637-657. https://doi.org/https://doi.org/10.1378/chest.33.6.637
  • Moerman, D. E. (2002). Meaning, Medicine, and the "Placebo Effect". Cambridge: Cambridge University Press.
  • Slater, L. (2018). The Drugs that Changed our Minds: The history of psychiatry in ten treatments. London: Simon & Schuster.
  • Taber, K. S. (2019). Experimental research into teaching innovations: responding to methodological and ethical challenges. Studies in Science Education, 55(1), 69-119. doi:10.1080/03057267.2019.1658058 [Download this paper.]


Note:

1 To find out if the ligation procedure protected a dog required stressing the blood supply to the heart itself,

"An attempt has been made to evaluate the degree of protection preliminary ligation of the internal mammary artery may afford the experimental animal when subjected to the production of sudden, acute myocardial infarction by ligation of the anterior descending coronary artery at its origin. …

It was hoped that survival in the control group would approximate 30 per cent so that infarct size could be compared with that of the "protected" group of animals. The "protected" group of dogs were treated in the same manner but in these the internal mammary arteries were ligated immediately before, at 24 hours, and at 48 hours before ligation of the anterior descending coronary.

In 14 control dogs, the anterior descending coronary artery with the aforementioned branch to the anterolateral aspect of the left ventricle was ligated. Nine of these animals went into ventricular fibrillation and died within 5 to 20 minutes. Attempts to resuscitate them by defibrillation and massage were to no avail. Four others died within 24 hours. One dog lived 2 weeks and died in pulmonary edema."

Glover, Davila, Kyle, Beard, Trout & Kitchell, 1957

Pulmonary oedema involves a build-up of fluid in the lungs that restricts gaseous exchange and prevents effective breathing. The dog that survived longest (if it was kept conscious) would have experienced death as if by slow suffocation or drowning.

Out of the womb of darkness

Medical ethics in 20th Century movies


Keith S. Taber


The hero of the film, Dr Holden, is presented as a scientist. Here he is trying to collect some data.
(still from 'The Night of the Demon')

"The Night of the Demon" is a 1957 British film about an American professor who visits England to investigate a supposed satanic cult. It was just shown on English television. It was considered as a horror film at the time of its release, although the short scenes that actually feature a (supposedly real? merely imagined? *) monster are laughable today (think Star Trek's Gorn in the original series, and consider if it is believable as anything other than an actor wearing a lizard suit – and you get the level of horror involved). [*Apparently the director, Jacques Tourneur, never intended a demon to be shown, but the film's producer decided to add footage showing the monster in the opening scenes, potentially undermining the whole point of the film: but giving the publicity department something they could work with. 6]


A real scary demon (in 1957) and a convincing alien (in 1967)?
(stills from 'The Night of the Demon' and 'Star Trek' episode 'Arena')

The film's protagonist is a psychologist, Dr John Holden, who dismisses stories of demons and witchcraft and the like, and has made a career studying people's beliefs about such superstitions. Dr Holden's visit to Britain was deliberately timed to coincide with a conference at which he was to present, and coincidentally with the death of one of his colleagues (who had been subject to a hex for investigating the cult).


'Night of the Demon' (Dir. Jacques Tourneur) movie poster: Sabre Film Production.
[As was common at the time, although the film was in monochrome, the publicity was coloured. Whether the colour painting of the monster looks even less scary than the version in the film itself is a moot point.]

The film works much better as a kind of psychological thriller examining the power of beliefs than as horror. (Director: 1 – Producer: 0.) That, if we believe something strongly enough, it can have real effects is well acknowledged – but this does not need a supernatural explanation. People can be 'scared to death' by what they imagine, and by how they respond to their fears. Similarly, researchers expecting a positive outcome from their research are likely to inadvertently behave in ways that lead to that very result: hence the use of double-blind studies in medical trials, so that the researchers do not know which patients are receiving which treatment (see the sketch below).

Read about expectancy effects in research
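
To make the logic of blinding concrete, here is a minimal sketch (in Python, using entirely invented numbers rather than data from any real trial) of how an assessor's expectations alone can manufacture an apparent treatment effect, and how blinding removes it. The 'expectancy_bias' term is a hypothetical stand-in for an assessor's unconscious nudging of scores:

    import random
    from statistics import mean

    random.seed(1)

    def apparent_effect(blinded, n=10000, true_effect=0.0, expectancy_bias=0.5):
        # Each simulated patient has the same underlying response whichever
        # group they land in (true_effect = 0.0), so any difference 'found'
        # between groups must come from the assessor, not the treatment.
        treated, control = [], []
        for _ in range(n):
            outcome = random.gauss(0, 1)
            if random.random() < 0.5:  # randomly assigned to treatment
                outcome += true_effect
                if not blinded:
                    # an unblinded assessor unconsciously rates treated patients higher
                    outcome += expectancy_bias
                treated.append(outcome)
            else:
                control.append(outcome)
        return mean(treated) - mean(control)

    print("unblinded:", round(apparent_effect(blinded=False), 2))  # ~0.5: a spurious 'benefit'
    print("blinded:  ", round(apparent_effect(blinded=True), 2))   # ~0.0: correctly, no effect

By construction the treatment here does nothing, yet the unblinded run still reports a benefit; the double-blind run does not.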

While the modern viewer will find little suspense in the film, I did, metaphorically at least, 'recoil with shock' at one moment of 'horror'. At the conference a patient (Rand Hobart) is wheeled in on a trolley – someone suspected of having committed a murder associated with the cult, whom the authorities had allowed to be questioned by the researchers…at the conference.


"The authorities have lent me this suspected murderer for the benefit of dramatic effect and for plot development purposes"
(still from 'The Night of the Demon').

A variety of movie posters were produced for the film 6 – arguably this one reflects the genuinely horrific aspect of the story. To a modern viewer this might also appear the most honest representation of the film, as the demon given prominence in some versions of the poster barely features in the film itself.

Holden's British colleague, Professor O'Brien, explains to the delegates,

"For a period of time this man has been as you see him here. He fails to respond to any normal stimulation. His experience, whatever it was, which we hope here to discover, has left him in a state of absolute catatonic immobility. When I first investigated this case, the problem of how to hypnotise an unresponsive person was the major one. Now the proceedings may be somewhat dramatic, but they are necessary. The only way of bringing his mind out of the womb of darkness into which it has retreated to protect itself, is by therapeutic shock, electrical or chemical. For our purposes we are today using pentothal [? 1] and later methylamphetamine."

Introducing a demonstration of non-consensual use of drugs on a prisoner/patient

"Okay, we'll give him a barbiturate, then we'll hypnotise him, then a stimulant, and if that does not kill him, surely he will simply, calmly and rationally, tell us what so traumatised him that he has completely withdrawn into his subconscious."
(Still from 'The Night of the Demon')


After an injection, Hobart comes out of his catatonic state, becomes aware of his surroundings, and panics.

The dignity of the accused: Hobart is forced out of his 'state of absolute catatonic immobility' to discover he is an exhibit at a scientific conference.
(Still from 'The Night of the Demon'.)

He is physically restrained and examined by Holden (supposedly the 'hero' of the piece), who then hypnotises him.



He is then given an injection of methylamphetamine before being questioned by O'Brien and Holden. He becomes agitated (what, after being forcibly given 'speed'?), breaks free, and leaps out of a conveniently placed window to his death.

Now, of course, this is all just fiction – a story. No one is really drugged, and Hobart is played by an actor who is unharmed. (I can be fairly sure of that, as the part was played by Brian Wilde, who much later turned up alive and well as prison officer 'Mr Barrowclough' in the BBC's Ronnie Barker vehicle 'Porridge'.)


The magic of the movies – people do not stay dead, and there are no professional misconduct charges brought against our hero.
(Stills from 'The Night of the Demon' and from the BBC series 'Porridge'. 3)

Yet this is not some fantastical film (the Gorn's distant cousin aside) but one played for realism. Would a psychiatric patient and murder suspect have been released to be paraded and demonstrated at a conference on the paranormal in 1957? I expect not. Would the presenters have been allowed to drug Hobart without his consent?

Read about voluntary, informed consent

An adult cannot normally be medicated without their consent unless they are considered to lack the capacity to make responsible decisions for themselves. Today, it might be possible to give a patient drugs without consent if they have been sectioned under the Mental Health Act (1983) and the action is considered necessary for their own safety or for the safety of others. Hobart was certainly not an immediate threat to anyone before he was brought out of his trance.

However, even if this enforced use of drugs had been sanctioned, it would not have been done in a public place with dozens of onlookers. 4 And it would not have been done (in the U.K. at least!) simply to question someone about a crime. 5 Presumably, the makers of the film either thought that this scene reflected something quite reasonable, or, at least, that the cinema-going public would find it sufficiently feasible to suspend disbelief. If this fictitious episode did not reflect acceptable ethical standards at the time, it would seem to tell us something about public perceptions of the attitude of those in authority (whether the actual authorities who were meant to have a duty of care to a person under arrest, or those designated with professional roles and academic titles) towards human rights.

Today, however, professionals such as researchers, doctors, and even teachers are prepared for their work with a strong emphasis on professional ethics. In medical care, the interests of the patient come first. In research, informants are voluntary participants in our studies who offer us the gift of data; they are not subjects of our enquiries to be treated simply as available material for our work.

Yet, actually, this is largely a modern perspective that has developed in recent decades. Sadly, there are many real stories, even in living memory, of professionals deciding that people (and this usually meant people with less standing or power in their society) should be drugged, or shocked, or operated on, without their consent and even against their explicit wishes, for what was seen as their own good, or even for some supposed greater good, in circumstances that would be totally unacceptable in most countries these days.

So, although this is not really a horror film by today's standards, I hope any other researchers (or medical practitioners) who were watching the film shared my own reaction to this scene: 'no, they cannot do that!'

At least, they could not do that today.

Read about research ethics


Notes

1 This sounds to me like 'pentatyl', but I could not find any reference to a therapeutic drug of that name. Fentanyl is a powerful pain-relieving drug which, like amphetamines, is abused for recreational use – but it was only introduced into practice after the film was made. The line most likely refers to sodium thiopental, known as pentothal, and much used (in movies and television, at least) as a truth serum. 5 As it is a barbiturate, and so is used in anaesthesia, it does not seem an obvious drug of choice for waking someone from a catatonic state.


2 The script is loosely based on a 1911 M. R. James short story, 'Casting the Runes', which does not include the episode discussed here.


3 I have flipped this image (as can be seen from the newspaper) to put Wilde (playing alongside Ronnie Barker, standing, and Richard Beckinsale) on the right-hand side of the picture.


4 Which is not to claim that such a public demonstration would have been unlikely at another time and place. Execution was still used in the U.K. until 1964 (during my lifetime), although by that time being found guilty of vagrancy (being unemployed and hanging around {unfortunate pun unintended}) for the second time was no longer a capital offence. However, after 1868 executions were no longer carried out in public.

It was not unknown for the corpses of executed criminals to be subject to public dissection in Renaissance [sic, ironically] Europe.


5 Fiction, of course, has myriad scenes where 'truth drugs' are used to obtain secrets from prisoners – but usually those carrying out the torture are the 'bad guys', either criminals or agents of what is represented in the story as an enemy or dystopian state.


6 Some variations on a theme. (For some reason, for its slightly cut U.S. release 'The Night of the Demon' was called 'The Curse of the Demon'.) The various representations of the demon and the prominence given to it seem odd to a modern viewer given how little the demon actually features in the film.

The references to actually seeing demons and monsters from hell on the screen, to "the most terrifying story ever told", and to "scenes of terror never before imagined" raise the question of whether the copywriters were expected to watch a film before producing their copy.