Lies, damned lies, and COVID-19 statistics?

A few days ago WHO reported that the UK had had over 300 000 confirmed cases of COVID-19, but now WHO is reporting the cumulative total is many fewer. How come?

Keith S. Taber

I have been keeping an eye on the way the current pandemic has been developing around the world by looking at the World Health Organisation website (at https://covid19.who.int) which offers regularly updated statistics, globally, regionally, and in those countries with the most cases.

An example of the stats. reported by the WHO (June 23rd 2020)
Note: on this day the UK Prime Minister reported: "In total, 306,210 people have now tested positive for coronavirus" which almost matches the figure shown by WHO (306 214) the next day.

Whilst the information is very interesting (and in view of what it represents, very saddening) there are some strange patterns in the graphs presented – reminding one that measurements can never just be assumed to be precise, accurate and reliable. Some of the data looks unlikely to be accurate, and in at least one case what is presented is downright impossible.

Questionable stats.

One type of anomaly that stands out is how some countries where the pandemic is active suddenly have a day with no new cases – before the level returns to trend.

This appeared to be the case in both Spain and Italy on 22nd March, and then two months later the same thing happened in Iran. One assumes this has more to do with reporting procedures than blessed days when no one was found to have the infection – although if that were the case, should there not be some compensation in the following days (perhaps so in Spain above, but apparently not in Italy, and certainly not in Iran)?

Less easy to explain away is a peak found in the graph for Chile.

Suddenly for one day, 18th June, a much larger number of cases is reported: but then there is an immediate return to the baseline:

How is it possible that suddenly on one day there are seven times as many cases reported – as a blip superimposed on an otherwise fairly flat trend-line? Perhaps there is a rational explanation – but unfortunately the WHO site is rich in stats, but does not seem to offer interpretation or explanations *. Without a rationale, one wonders just how trustworthy the stats actually are.
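The kind of check being made by eye here (a single day sitting far above or below its neighbours) can be sketched in a few lines of Python. This is only an illustration: the function name, the window and the threshold are assumptions, and the daily counts are invented numbers chosen to mimic a zero day and a roughly sevenfold spike, not figures taken from the WHO site.

```python
from statistics import median


def flag_one_day_anomalies(daily, window=3, factor=5.0):
    """Return indices of days whose count is far from the median of nearby days.

    A day is flagged if it is at least `factor` times the local median,
    or at most 1/`factor` of it (which also catches zero and negative days).
    The window and factor are arbitrary illustrative choices.
    """
    flagged = []
    for i, value in enumerate(daily):
        lo = max(0, i - window)
        hi = min(len(daily), i + window + 1)
        # Neighbouring days on either side, excluding the day itself
        neighbours = daily[lo:i] + daily[i + 1:hi]
        if not neighbours:
            continue
        typical = median(neighbours)
        if typical <= 0:
            continue  # no sensible baseline to compare against
        if value >= factor * typical or value <= typical / factor:
            flagged.append(i)
    return flagged


# Invented counts for illustration only (NOT real WHO data)
daily_new_cases = [5000, 5200, 4900, 0, 5100, 5300, 36000, 5000, 5200]
print(flag_one_day_anomalies(daily_new_cases))
```

A check like this only points at the days that need explaining; it cannot say whether the cause is a reporting delay, a backlog being cleared, or a genuine error.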

Obviously false information

Even if there are explanations for some of these odd patterns due to the practicalities of reporting, and the ongoing development of systems of testing and reporting, in different jurisdictions, there is one anomaly that cannot be feasibly explained – where the data is surely, and clearly, wrong.

An example of the stats reported by the WHO (July 6th 2020)

So the graph above shows the nations with the most reported cases as of the last few days. This is a more recent update than the similar image at the top of this page. Yet, the cumulative total of confirmed cases for the United Kingdom in this figure is something like 20 000 cases LESS than the figure quoted in the EARLIER set of graphs. (Note that this has allowed the UK to have lower cumulative totals than either Chile or Peru – which would not have been the case without this reduction in cumulative total.)

The total number of confirmed cases in the UK is now (7th July) LESS than it was a week ago (see above). How come? Well, a close look at the graph below explains this. The drop in cumulative numbers is due to the number of new cases that WHO gives as reported on 3rd July, when there were -29 726 new cases. Yes, that's right: minus 29 thousand odd cases.

The WHO data show negative cases (-525 new cases) for the UK on May 21st as well, but on 3rd July the magnitude of the negative number of confirmed cases is over three times the highest number of new cases reported on any single day (April 12th, i.e., 8719 new cases).
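The arithmetic behind this can be checked with a short Python sketch. The two figures for 3rd July and April 12th are the ones quoted above; the starting total and the other daily counts are placeholders invented purely to show how a single negative entry drags a running total down, and the function names are my own.

```python
def cumulative_totals(daily_counts, start=0):
    """Running total of daily counts, beginning from `start`."""
    totals = []
    running = start
    for count in daily_counts:
        running += count
        totals.append(running)
    return totals


def days_where_total_falls(totals):
    """Indices where a cumulative total is lower than the day before."""
    return [i for i in range(1, len(totals)) if totals[i] < totals[i - 1]]


# Figures quoted in the text
negative_adjustment = -29726   # WHO 'new cases' for the UK, 3rd July
highest_daily_count = 8719     # highest single-day figure, April 12th

ratio = abs(negative_adjustment) / highest_daily_count
print(round(ratio, 2))  # 3.41, i.e. "over three times"

# Placeholder series (invented start and neighbouring days, for illustration)
daily = [500, 600, negative_adjustment, 550]
totals = cumulative_totals(daily, start=310000)
print(totals)
print(days_where_total_falls(totals))
```

A count of people ever found to be infected should never fall, so any index returned by the second function marks a revision of earlier data rather than a real day's worth of cases.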

I can imagine that if it was identified that a previous miscalculation had occurred it might be necessary to revise previous data. But surely an adjustment would be made to the earlier data, rather than the cumulative total being corrected by inserting a large negative number of cases on some arbitrary date in order to put the total right. [Note: the most recent data I can find on the UK government site cites 309,360 confirmed cases as of 26th June (2020-06-26 COVID-19 Press Conference Slides) so as of yet the UK data does not show the reduction in cumulative total being published by WHO.**]

Yet surely someone at WHO must have spotted that the anomaly is bizarre and brings their reports into question. The negative cases claimed for the UK on that one day are so great that the UK line has since burrowed into the graphics for completely different countries. (See below. On the day the UK graph was located above the graphic for Mexico, the UK line went down so far that it crossed below the line for Mexico.)

Of course, each unit in these figures represents someone, a fellow human somewhere in the world, who has been found to be infected with a very serious, and sometimes fatal, virus. Fixating on the stats can distract from the real human drama that many of these cases represent. Yet, when the data reflect something so important, and when data are so valuable in understanding and responding to the global pandemic, such an obvious flaw in the data is disappointing and worrying.

*I could not find a link to send an email; a tweet did not get a response from @WHO; and an invitation to type my question on the website was met by the site bot with a suggestion to return to the data I was asking about.

** If I subsequently learn of the reason for the report of negative numbers of cases in these statistics, I will post an update here.

Update at 2020-07-12: duplicate testing

As of Saturday 11 July 2020 at 6:20pm, the UK government reports:
Total number of lab-confirmed UK cases (total number of people who have had a positive test result): 288,953

So this is less than they were reporting a week earlier, despite their graph (for England, where most cases are because it is the most populous country of the UK) not showing any dip:

However, I did find this explanation:

"The data published on this website are constantly being reviewed and corrected. Cumulative counts can occasionally go down from one day to the next, and on some occasions there have been major revisions that have a significant effect on local, regional, National or UK totals. Data are provided daily from several different electronic data collection systems and these can experience technical issues which can affect daily figures, usually resulting in lower daily counts. The missing data are normally included in the data published the following day.

From 2 July 2020, Pillar 2 data [from "swab testing for the wider population" i.e., than just "for those with a clinical need, and health and care workers"] has been reported separately by all 4 Nations. Pillar 2 data for England has had duplicate tests for the same person removed by PHE [Public Health England] from 2 July 2020. This means that the cumulative total number of UK lab-confirmed cases is now around 30,000 lower than reported on 1 July 2020."

https://coronavirus.data.gov.uk/about

So that explains the mystery – but duplicate reporting at that level seems extraordinary! It does not support confidence in official statistics. An error of c.10% suggests a systemic flaw in the methodology being used. It also makes one wonder about the accuracy of some of the figures being quoted for elsewhere in the world.
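As a rough check on that 'c.10%', here is a small Python sketch. It assumes the 309,360 figure from the 26th June slides can stand in for the total before the correction (the exact 1 July total is not quoted here), and takes the 'around 30,000' from the government explanation at face value; the function name is my own.

```python
def share_removed(removed, total_before):
    """Fraction of a previously reported total that was later removed."""
    return removed / total_before


# Figures quoted above; the 26th June total is used as a proxy for 1 July
duplicates_removed = 30000   # "around 30,000 lower"
reported_total = 309360      # UK government slides, 26th June

percentage = 100 * share_removed(duplicates_removed, reported_total)
print(round(percentage, 1))  # 9.7
```

On these assumptions roughly one in ten of the previously reported confirmed cases was a duplicate.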

Natural rates of infection and the optimum level of simplification

How much dumbing down is good for our health?

Keith S. Taber

Image by Pete Linforth from Pixabay 

I just heard the UK Prime Minister introduce a public information message about what would be considered when deciding to ease current measures to tackle the COVID-19 pandemic.

Two particular statements in the clip played gained my attention:

  • "All viruses, like normal 'flu, have a rate of infection. Scientists call this R. R is the average number of people one infected person passes the virus onto."
  • "In March, at its peak, R was around 3, which seems to be the natural rate for this virus."

Neither of these statements seemed strictly correct to me.

To make things clearer, let's call a spade, a fork

Surely the rate of infection is the number of people who are infected in a unit time period – say per day. R is something else – the reproduction number. Now, those working in the public understanding of science, just as in science education, have to seek an optimum level of simplification when communicating with non-experts. There is no point using complex language that will be unclear to people, and so likely lead them to disengage with the message. So, simplification may indeed be needed. But not such a degree of simplification that what we say no longer adequately represents the ideas we are trying to convey.

But the term 'reproduction number' is not some really obscure and inaccessible jargon – it uses words that most people are quite familiar and comfortable with. It does not seem any more technical and frightening than the term 'rate of infection'. Now I accept that perhaps the compound phrase 'reproduction number' is itself unfamiliar, whereas 'rate of infection' is more commonly heard. BUT rate of infection already has a meaning, a different one – so is it sensible to confuse matters by defining rate of infection with a new meaning inconsistent with the existing common usage?

This seems an odd way to promote public understanding of science to me. This is a bit like deciding that the term 'electrical field' may seem a bit too technical for an audience, so it will be a good idea to instead start calling it 'gravity' from now on, because people are used to that term. Or thinking that 'water of crystallisation' sounds obscure, so deciding to refer to the copper sulphate crystal incorporating 'ice cubes' when talking to non-experts because they know what ice cubes are (i.e., something other than molecules of water of crystallisation!).

So what is natural about rates of viral infection?

I was not sure precisely what a normal 'flu was (in relation to an abnormal 'flu, presumably?), but was more surprised by the reference to a virus having a natural rate of infection – even if this actually meant a natural reproduction number.

Will this not depend on the conditions in which the virus exists?* R will surely be very different in a population that is sparsely spread with small social group sizes than in a population that is largely living in extended family groups in overcrowded slums – so what is the natural environment for that virus?

We have reduced R by social distancing and increased hygiene measures. Are we to assume what is natural is the work and social (and hygiene) habits of the UK population as it was in February 2020, rather than now? If so, were the social conditions in the UK in 1920 or 1820 'unnatural'? So, I think the value of 3 quoted for the rate (actually R) is not a natural rate, but the R value contingent on the specific conditions of UK social and economic activity at a particular historical moment. To believe that the way WE live NOW (or, actually, two months ago) reflects what is natural seems a very anthropocentric notion of 'natural'.

The natural state of things (Image by Samuele Schirò from Pixabay )

I guess I am being pedantic (one of the few things I tend to be good at – and we all need to work to our strengths) but it seems to me that if you are going to commission a public health message at a time when the public understanding of science is actually a matter of life and death, then it is worth trying to get the science right.

* This seemed intuitively obvious, but I thought I ought to check. A quick web-search led to lots of different estimates of R (or R0, that is R whilst a population is all susceptible) presented as if there was a single right value (even if we do not know it precisely) that applied across different contexts globally. Hm. So, I was reassured to come across: "Firstly, R0 is not an intrinsic variable of the infectious agent, but it is calculated through at least three parameters: the duration of contagiousness; the likelihood of infection per contact between; and the contact rate, along with economical, social and environmental factors, that may vary among studies aimed to estimate the R0", Viceconte and Petrosillo, 2020, COVID-19 R0: Magic number or conundrum?, Infectious Disease Reports, 12(1).
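To see why those three parameters make R0 depend on context, here is a minimal Python sketch. It assumes the common textbook simplification in which R0 is simply the product of the contact rate, the likelihood of infection per contact, and the duration of contagiousness; that product form is my assumption for illustration, not something spelt out in the quotation. All the numbers are invented and are not estimates for COVID-19 or for any real population.

```python
def basic_reproduction_number(contacts_per_day, infection_prob_per_contact,
                              days_contagious):
    """R0 under a simple product model (a textbook simplification).

    R0 = contact rate x likelihood of infection per contact
         x duration of contagiousness
    """
    return contacts_per_day * infection_prob_per_contact * days_contagious


# Hypothetical values: same virus (same probability, same duration),
# different living conditions (different contact rates)
sparse_population = basic_reproduction_number(
    contacts_per_day=4, infection_prob_per_contact=0.05, days_contagious=7)
crowded_population = basic_reproduction_number(
    contacts_per_day=12, infection_prob_per_contact=0.05, days_contagious=7)

print(round(sparse_population, 2))
print(round(crowded_population, 2))
```

With the properties of the virus held fixed, tripling the contact rate triples R0 in this model, which is the sense in which no single value can be called 'natural'.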