Can we be sure that fun in the sun alters water chemistry?

Minimalist sampling and experimental variables


Keith S. Taber


Dirty water

I was reading the latest edition of Education in Chemistry and came across an article entitled "Fun in the sun alters water chemistry. How swimming and tubing are linked to concerning rises in water contaminants" (Notman, 2023). This was not an article about teaching, but a report of some recent chemistry research summarised for teachers. [Teaching materials relating to this article can be downloaded from the RSC website.]

I have to admit to not having understood what 'tubing' was (I plead 'age') apart from its everyday sense of referring collectively to tubes, such as those that connect Bunsen burners to gas supplies, and was intrigued by what kinds of tubes were contaminating the water.

The research essentially reported the presence of higher levels of contaminants in the same body of water, Clear Creek, Colorado, on a public holiday when many people used the water for recreational pursuits (perhaps even for 'tubing'?) than on a more typical day.

This seems logical enough: more people in the water; more opportunities for various substances to enter the water from them. I have my own special chemical sensor which supports this finding. I go swimming in the local hotel pool, and even though people are supposed to shower before entering the pool, not everyone does (or at least, not effectively). Sometimes one can 'taste' 1 the change when someone gets in the water without washing off perfume or scented soap residue. Indeed, occasionally the water 'tastes' 1 different after people enter the pool area wearing strong perfume, even if they do not use the pool or come into direct contact with the water!

The scientists reported finding various substances they assumed were being excreted 2 by the people using the water – substances such as antihistamines and cocaine – as well as indicators of various sunscreens and cosmetics. (They also found higher levels of "microbes associated with humans", although this was not reported in Education in Chemistry.)


I'm not sure why I bother having a shower BEFORE I go for a swim in there… (Image by sandid from Pixabay)


It makes sense – but is there a convincing case?

Now this all seems very reasonable, as the results fit into a narrative that seems theoretically feasible: a large number of people entering the fresh water of Clear Creek are likely to pollute it sufficiently (if not to rename it Turbid Creek) for detection with the advanced analytical tools available to the modern chemist (including "an inductively coupled plasma mass spectrometer and a liquid chromatography high resolution mass spectrometer").

However, reading on, I was surprised to learn that the sampling in this study was decidedly dodgy.

"The scientists collected water samples during a busy US public holiday in September 2022 and on a quiet weekday afterwards."

I am not sure how this (natural) experiment would rate as a design for a school science investigation. I would certainly have been very critical if any educational research study I had been asked to evaluate had relied on sampling like this. Even if large numbers of samples were taken from various places in the water over an extended period during these two days, the procedure has a major flaw: the level of control of other possibly relevant factors is minimal.

Read about control in experimental research

The independent variable is whether the samples were collected on a public holiday when there was much use of the water for leisure, or on a day with much less leisure use. The dependent variables measured were levels of substances in the water that would not be considered part of the pristine natural composition of river water. A reasonable hypothesis is that there would be more contamination when more people were using the water, and that was exactly what was found. But is this enough to draw any strong conclusions?

Considering the counterfactual

A useful test is to ask whether we would have been convinced that people do not contaminate the water had the analysis shown no significant difference between the water samples from the two days. That is, to examine a 'counterfactual' situation (one that is not the case, but might have been).

In this counterfactual scenario, would similar levels of detected contaminants be enough to convince us the hypothesis was misguided – or might we look to see if there was some other factor which might explain this unexpected (given how reasonable the hypothesis seems) result and rescue our hypothesis?

Had pollutant levels been equally high on both days, might we have sought ('ad hoc') to explain that through other factors:

  • Maybe it was sunnier on the second day with high U.V. levels which led to more breakdown of organic debris in the river?
  • Perhaps there was a spill of material upriver 3, which masked any effect of the swimmers (and, er, tubers?)
  • Perhaps rainfall between the two sampling dates had increased the flow of the river and raised its level, washing more material into the water?
  • Perhaps the wind direction was different and material was being blown in from nearby agricultural land on the second day.
  • Perhaps the water temperature was different?
  • Perhaps a local industry owner tends to illegally discharge waste into the river when the plant is operating on normal working days?
  • Perhaps spawning season had just started for some species, or some species was emerging from a larval state on the river bed and disturbing the debris on the bottom?
  • Perhaps passing migratory birds were taking the opportunity to land in the water for some respite, and washing off parasites as well as dust.
  • Perhaps a beaver's dam had burst upstream 3?
  • Perhaps (for any panspermia fans among readers) an asteroid covered with organic residues had landed in the river?
  • Or…

But: if we might consider some of those factors to potentially explain a lack of effect we were expecting, then we should equally consider them as possible alternative causes for an effect we predicted.

  • Maybe it was sunnier on the first day with high U.V. levels which led to more breakdown of organic debris in the river?
  • Perhaps a local industry owner tends to illegally discharge waste into the river on public holidays because the work force are off site and there will be no one to report this?
  • … etc.

Lack of control of confounding variables

Now, in environmental research, as in research into teaching, we cannot control conditions in the way we can in a laboratory. We cannot ensure the temperature and wind direction and biota activity in a river are the same. Indeed, one thing about any natural environment that we can be fairly sure of is that biological activity (and so the substances released by such activity) varies seasonally, and according to changing weather conditions, and in different ways for different species.

So, as in educational research, there are often potentially confounding variables which can undermine our experiments:

In quasi-experiments or natural experiments, a more complex design than simply comparing outcome measures is needed. …this means identifying and measuring any relevant variables. …Often…there are other variables which it is recognised could have an effect, other than the dependent variable: 'confounding' variables.

Taber, 2019, p.85 [Download this article]

independent variable: class of day (busy holiday versus quiet working day)
dependent variables: concentrations of substances and organisms considered to indicate contamination
confounding variables: anything that might feasibly influence the concentrations of substances and organisms considered to indicate contamination – other than the class of day

In a controlled experiment any potential confounding variables are held at fixed levels, but in 'natural experiments' this is not possible.

Read about confounding variables in research

Sufficient sampling?

The best we can do to mitigate the lack of control is rigorous sampling. If water samples from a range of days with a high level of leisure activity were compared with samples from a range of days with a low level of leisure activity, this would be more convincing than just one day from each category – especially so if these were randomly selected days. It is still possible that factors such as wind direction and water temperature could bias findings, but it becomes less likely – and with random sampling of days it is possible to estimate how likely such chance factors are to have an effect. Then we can at least apply statistical models that suggest whether observed differences in outcomes exceed the level likely due to chance effects.
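As a rough illustration of the kind of analysis that random sampling of days would allow – a minimal sketch using invented contaminant values, not the researchers' data or their actual analysis – one could compare mean contaminant levels on high-use and low-use days and use a simple permutation test to estimate how often a difference at least as large would arise by chance:

```python
import random

# Hypothetical mean contaminant concentrations (arbitrary units),
# one value per randomly selected sampling day.
high_use_days = [4.2, 3.8, 5.1, 4.6, 4.9]   # busy holiday/weekend days
low_use_days = [3.1, 2.9, 3.5, 3.3, 2.7]    # quiet weekdays

observed_diff = (sum(high_use_days) / len(high_use_days)
                 - sum(low_use_days) / len(low_use_days))

# Permutation test: repeatedly shuffle the day labels and see how often
# a difference in means at least as large arises purely by chance.
pooled = high_use_days + low_use_days
n_high = len(high_use_days)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = (sum(pooled[:n_high]) / n_high
            - sum(pooled[n_high:]) / (len(pooled) - n_high))
    if diff >= observed_diff:
        count += 1

print(f"Estimated p-value: {count / trials:.3f}")
```

A small estimated p-value would suggest the difference between the two categories of day is unlikely to be down to chance alone – though it still cannot rule out a confounding variable that happens to co-vary with busy days.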

Read about sampling in research

I would like to think that any educational study that had this limitation would be questioned in peer review. The Education in Chemistry article cited the original research, although I could not immediately find this. The work does not seem to have been published in a research journal (at least, not yet) but was presented at a conference, and is discussed in a video published by the American Chemical Society on YouTube.

"With Labor Day approaching, many people are preparing to go tubing and swimming at local streams and rivers. These delightful summertime activities seem innocuous, but do they have an impact on these waterways? Today, scientists report preliminary [sic] results from the first holistic study of this question 4, which shows that recreation can alter the chemical and microbial fingerprint of streams, but the environmental and health ramifications are not yet known."

American Chemical Society Meeting Newsroom, 2023

In the video, Noor Hamdan, of Johns Hopkins University, reports that "we are thinking of collecting more samples and doing some more statistical analysis to really, really make sure that humans are significantly impacting a stream".

This seems very wise, as it is only too easy to be satisfied with very limited data when it seems to fit with your expectations. Indeed that is one of the everyday ways of thinking that science challenges by requiring more rigorous levels of argument and evidence. In the meantime, Noor Hamdan suggests people using the water should use mineral-based rather than organic-based sunscreens, and she "recommend[s] not peeing in rivers". No, I am fairly sure 'tubing' is not meant as a euphemism for that. 5


Work cited:

Notes:


1 Perhaps more correctly, smell, though it is perceived as tasting – most of the flavour we taste in food is due to volatile substances evaporating in the mouth cavity and diffusing to be detected in the nose lining.


2 The largest organ of excretion for humans is the skin. The main mechanism for excreting the detected contaminating substances into the water (if perhaps not the only pertinent one, according to the researchers) was sweating. Physical exertion (such as swimming) tends to be associated with higher levels of sweating. We do not notice ourselves sweating when the sweat evaporates as fast as it is released – nor, of course, when we are immersed in water.


One of those irregular verbs?

I perspire.

You sweat.

She excretes through her skin

(Image by Sugar from Pixabay)


3 The video suggests that sampling took place both upriver and downriver of the Creek, which would offer some level of control for the effect of completely independent influxes into the water – unless they occurred between the sampling points.


4 There seem to be plenty of studies of the effects of water quality on leisure use of waterways: but not on the effects of the recreational use of waterways on their quality.


5 Just in case any readers were also ignorant about this, it apparently refers to using tyre inner tubes (or similar) as flotation devices. This suggests a new line of research. People who float around in inner tubes will tend to sweat less than those actively swimming – but are potentially harmful substances leached from the inner tubes themselves?




Creeping bronzes

Evidence of journalistic creep in 'surprising' Benin bronzes claim


Keith S. Taber


How certain can we be about the origin of metals used in historic artefacts? (Image by Monika from Pixabay)


Science offers reliable knowledge of the natural world – but not absolutely certain knowledge. Conclusions from scientific studies follow from the results, but no research can offer absolutely certain conclusions as there are always provisos.

Read about critical reading of research

Scientists tend to know this, something emphasised for example by Albert Einstein (1940), who described scientific theories (used to interpret research results) as "hypothetical, never completely final, always subject to question and doubt".

When scientists talk to one another within some research programme they may use a shared linguistic code where they can omit the various conditionals ('likely', 'it seems', 'according to our best estimates', 'assuming the underlying theory', 'within experimental error', and the rest) as these are understood, and so may be left unspoken, thus increasing economy of language.

When scientists explain their work to a wider public such conditionals may also be left out to keep the account simple, but really they should be mentioned. A particular trope that annoyed me when I was younger was the high frequency of links in science documentaries that told me "this could only mean…" (Taber, 2007), when honest science is always framed more along the lines of "this would seem to mean…", "this could possibly mean…", "this suggested the possibility"…

Read about scientific certainty in the media

Journalistic creep

By journalistic creep I mean the tendency for some journalists who act as intermediaries between research scientists and the public to keep the story simple by omitting important provisos. Science teachers will appreciate this, as they often have to decide which details can be included in a presentation without losing or confusing the audience. A useful mantra may be:

Simplification may be necessary – but oversimplification can be misleading

A slightly different type of journalistic creep occurs within stories themselves. Sometimes the banner headline and the introduction to a piece report definitive, certain scientific results – but reading on (for those that do!) reveals nuances not acknowledged at the start. Teachers will again appreciate this tactic: offer the overview with the main point, before going back to fill in the more subtle aspects. But then, teachers have (somewhat) more control over whether the audience engages with the full account.

I am not intending to criticise journalists in general here, as scientists themselves have a tendency to do something similar when it comes to finding titles for papers that will attract attention by perhaps suggesting something more certain (or, sometimes, poetic or even controversial) than can be supported by the full report.


An example of a Benin Bronze (a brass artefact from what is now Nigeria) in the British [sic] Museum

(British Museum, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons)


Where did the Benin bronzes metal come from?

The title of a recent article in the RSC's magazine for teachers, Education in Chemistry, proclaimed a "Surprise origin for Benin bronzes".1 The article started with the claim:

"Geochemists have confirmed that most of the Benin bronzes – sculptured heads, plaques and figurines made by the Edo people in West Africa between the 16th and 19th centuries – are made from brass that originated thousands of miles away in the German Rhineland."

So, this was something that scientists had apparently confirmed as being the case.

Reading on, one finds that

  • it has been "long suspected that metal used for the artworks was melted-down manillas that the Portuguese brought to West Africa"
  • scientists "analysed 67 manillas known to have been used in early Portuguese trade. The manillas were recovered from five shipwrecks in the Atlantic and three land sites in Europe and Africa"
  • they "found strong similarities between the manillas studied and the metal used in more than 700 Benin bronzes with previously published chemical compositions"
  • and "the chemical composition of the copper in the manillas matched copper ores mined in northern Europe"
  • and "suggests that modern-day Germany, specifically the German Rhineland, was the main source of the metal".

So, there is a chain of argument here which seems quite persuasive, but to move from this to it being "confirmed that most of the Benin bronzes…are made from brass that originated …in the German Rhineland" seems an example of journalistic creep.

The reference to "the chemical composition of the copper [sic] in the manillas" is unclear, as, according to the original research paper, the manillas analysed were:

"chemically different from each other. Although most manillas analysed here …are brasses or leaded brasses, sometimes with small amounts of tin, a few specimens are leaded copper with little or no zinc."

Skowronek, et al., 2023

The key data presented in the paper concerned the ratios of different lead isotopes (205Pb:204Pb; 206Pb:204Pb; 207Pb:204Pb; 208Pb:204Pb {see the reproduced figure below}) in

  • ore from different European locations (according to published sources)
  • sampled Benin bronze (as reported from earlier research), and
  • sampled recovered manillas

and the ratios of different elements (Ni:As; Sb:As; Bi:As) in previously sampled Benin bronzes and sampled manillas.

The tendency to treat a chain of argument, where each link seems reasonably persuasive, as supporting fairly certain conclusions is logically flawed (it is like concluding, from knowing that one's chance of dying on any particular day is very low, that one must be immortal). Yet it seems reflected in something I have noticed with some research students: often their overall confidence in the conclusions of a research paper they have scrutinised is higher than their confidence in some of the distinct component parts of that study.


An example of a student's evaluation of a research study


This is like being told by a mechanic that your cycle brakes have a 20% chance of failing in the next year; the tyres 30%; the chain 20%; and the frame 10% – and concluding from this that there is only about a 20% chance of having any kind of failure in that time!
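To make the arithmetic explicit – a minimal sketch, assuming purely for illustration that the four risks are independent – the chance of at least one failure is one minus the chance that every component survives the year, which here comes to roughly 60%, not 20%:

```python
# Hypothetical, illustrative figures from the bicycle analogy above.
# Assuming the four risks are independent, the chance of at least one
# failure is 1 minus the probability that every component survives.
p_fail = {"brakes": 0.20, "tyres": 0.30, "chain": 0.20, "frame": 0.10}

p_all_survive = 1.0
for p in p_fail.values():
    p_all_survive *= (1 - p)   # probability this component does NOT fail

p_any_failure = 1 - p_all_survive
print(f"Chance of at least one failure: {p_any_failure:.0%}")   # about 60%
```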

A definite identification?

The peer-reviewed research paper which reports the study discussed in the Education in Chemistry article informs readers that

"In the current study, documentary sources and geochemical analyses are used to demonstrate that the source of the early Portuguese "tacoais" manillas and, ultimately, the Benin Bronzes was the German Rhineland."

"…this study definitively identifies the Rhineland as the principal source of manillas at the opening of the Portuguese trade…"

Skowronek, et al., 2023

which sounds pretty definitive, but interestingly the study did not rely on chemical analysis alone, but also on 'documentary' evidence. In effect, historical evidence provided another link in the argument, by suggesting the range of possible sources of the alloy that should be considered in any chemical comparisons. This assumes there were no mining and smelting operations providing metal for the trade with Africa which have not been well documented by historians. That seems a reasonable assumption, but it adds another proviso to the conclusions.

The researchers reported that

Pre-18th century manillas share strong isotopic similarities with Benin's famous artworks. Trace elements such as antimony, arsenic, nickel and bismuth are not as similar as the lead isotope data…. The greater data derivation suggests that manillas were added to older brass or bronze scrap pieces to produce the Benin works, an idea proposed earlier.

and acknowledges that

Millions of these artifacts were sent to West Africa where they likely provided the major, virtually the only, source of brass for West African casters between the 15th and the 18th centuries, including serving as the principal metal source of the Benin Bronzes. However, the difference in trace elemental patterns between manillas and Benin Bronzes does not allow postulating that they have been the only source.

The figure below is taken from the research report.


Part of Figure 2 from the open access paper (© 2023 Skowronek et al. – distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)

The chart shows results from sampled examples of Benin bronzes (blue circles), compared with the values of the same isotope ratios from different copper ore sites (squares) and manillas sampled from different archaeological sites (triangles).


The researchers feel that the pattern of clustering of results (in this, and other similar comparisons between lead isotope ratios) from the Benin bronzes, compared with those from the sampled manillas, and the ore sites, allows them to identify the source of metal re-purposed by the Edo craftspeople to make the bronzes.

It is certainly the case that the blue circles (which refer to the artworks) and the green squares (which refer to copper ore samples from Rhineland) do seem to generally cluster in a similar region of the graph – and that some of the samples taken from the manillas also seem to fit this pattern.

I can see why this might strongly suggest the Rhineland (certainly more so than Wales) as the source of the copper believed to be used in manillas which were traded in Africa and are thought to have been later melted down as part of the composition of alloy used to make the Benin bronzes.
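To make the kind of comparison concrete – a minimal sketch using invented ratio values, not the researchers' actual data or statistical method – one could ask which candidate ore field's typical lead isotope ratios lie closest to those measured for a given artefact:

```python
import math

# Hypothetical lead isotope ratios (206Pb/204Pb, 207Pb/204Pb, 208Pb/204Pb)
# for candidate ore fields – all values invented for illustration only.
ore_fields = {
    "Rhineland": (18.40, 15.62, 38.50),
    "Wales": (18.10, 15.58, 38.10),
}

def nearest_field(sample, fields):
    """Return the ore field whose ratio triple is closest (Euclidean distance) to the sample."""
    return min(fields, key=lambda name: math.dist(sample, fields[name]))

bronze_sample = (18.38, 15.61, 38.47)   # hypothetical artefact measurement
print(nearest_field(bronze_sample, ore_fields))   # -> 'Rhineland' for these made-up values
```

Real provenance work, of course, compares whole fields of published values, with their measurement uncertainties, rather than single representative points – which is one reason the conclusions carry provisos.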

Whether that makes for either

  • definitive identification of the Rhineland as the principal source of manillas (Skowronek paper), or
  • confirmation that most of the Benin bronzes are made from brass that originated thousands of miles away in the German Rhineland (EiC)

seems somewhat less certain. Just as scientific claims should be.


A conclusion for science education

It is both human nature, and often good journalistic or pedagogic practice, to begin with a clear, uncomplicated statement of what is to be communicated. But we also know that what is heard or read first may be better retained in memory than what follows. It also seems that people in general tend to apply the wrong kind of calculus when there are multiple sources of doubt – being more likely to estimate overall doubt as the mean or modal level of the several discrete sources, rather than as something that accumulates step by step.

It seems there is a major issue here for science education in training young people in critically questioning claims, looking for the relevant provisos, and understanding how to integrate levels of doubt (or, similarly, risk) that are distributed over a sequence of phases in a process.


All research conclusions (in any empirical study in any discipline) rely on a network of assumptions and interpretations, any one of which could be a weak link in the chain of logic. This is my take on some of the most critical links and assumptions in the Benin bronzes study. One could easily further complicate this scheme (for example, I have ignored the assumptions about the validity of the techniques and calibration of the instrumentation used to find the isotopic composition of metal samples).


Work cited:

Note:

1 It is not clear to me what the surprise was – but perhaps this is meant to suggest the claim may be surprising to readers of the article. The study discussed was premised on the assumption that the Benin Bronzes were made from metal largely re-purposed from manillas traded from Europe, which had originally been cast in one of the known areas in Europe with metal working traditions. The researchers included the Rhineland as one of the potential regional sites they were considering. So, it was surely a surprise only in the sense that rolling a die and having it land on 4, rather than, say, 2 or 5, would be a surprise.

But then, would you be just as likely to read an article entitled "Benin bronzes found to have anticipated origin"?