There are many ways of presenting analytical summaries of new series data for which the underlying mechanisms are incompletely understood. With respect to series describing the COVID-19 pandemic, Tamino has used piecewise linear models. I have mentioned that I prefer penalized (regression) splines. I intended to illustrate something comparable to what Tamino does (see also), but then I realized I could do that and also expand the discussion to include a kind of presentation used in functional data analysis, namely phase plane plots. Be sure to visit that last link for an illustration of what the curves below mean.
Here I’ll look at various series describing regional characteristics of the COVID-19 pandemic. These include counts of deaths, of cases, and of recovered people. My data source is exclusively the Johns Hopkins Coronavirus Resource Center and, in particular, their Github repository. I have also examined a similar repository maintained by The New York Times, but I find it less well curated, particularly in how it admits the most recent data, which sometimes exhibit wild swings.
That said, and while curation is important, as is data validation, no organization can do much when the reporting agencies do not provide counts uniformly or do not backfill counts to their proper dates. Thus, if a large number of previously uncounted deaths from COVID-19 is identified, the proper procedure is to tag each death with an estimate of its day of death, and to correct the figures for those days. Instead, some government agencies appear to dump all the discovered counts on the present day. There is evidence as well that some agencies hold on to counts until they reach some number, and then release them all at once. These practices matter because such a reporting policy will show up as changes in the dynamics of the disease when, in fact, it is nothing of the kind.
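To make that concrete, here is a small sketch with entirely made-up numbers: a steady 100 deaths per day, plus a backlog of 300 deaths that actually occurred over ten earlier days. It compares dumping the backlog onto the day it is discovered with backfilling it to the estimated days of death, and looks at the day-to-day change in the daily rate (a crude stand-in for the second derivative discussed below).

```python
import numpy as np

# Hypothetical example: a steady 100 deaths/day for 30 days.
days = 30
true_daily = np.full(days, 100.0)

# Suppose a backlog of 300 deaths is found on day 20; they actually
# occurred on days 10-19, 30 per day (all numbers are made up).
dumped = true_daily.copy()
dumped[20] += 300          # whole backlog reported on the discovery day

backfilled = true_daily.copy()
backfilled[10:20] += 30    # each death tagged to its estimated day of death

# Day-to-day change in the daily rate, i.e. a crude "acceleration".
accel_dumped = np.diff(dumped)
accel_backfilled = np.diff(backfilled)

print("totals:", dumped.sum(), backfilled.sum())
print("max |acceleration| dumped:", np.abs(accel_dumped).max())
print("max |acceleration| backfilled:", np.abs(accel_backfilled).max())
```

Both versions end with the same total, but the dumped version shows a spurious burst of acceleration followed by an equally spurious deceleration, ten times larger than anything in the backfilled version.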
I’m not saying this kind of distortion is deliberate, although, in some cases, it could be. (We cannot tell whether it’s deliberate evasion, bureaucratic rigidity, or simply they-don’t-care-as-long-as-they-report.) More likely it reflects overworked, fatigued demographers, epidemiologists, and public health professionals making a best effort to provide accurate counts. Johns Hopkins tries to resolve some of these problems, but it cannot do everything.
Real-time reporting is an essential part of mounting a proper pandemic response and is, at least in the United States, another thing for which we were woefully underprepared. Without it, even the highest government officials are flying low to the ground with a fogged-up window, so to speak.
Nevertheless, I have accepted the Johns Hopkins data as they are and analyzed them in the manner described below. I’ll comment on some features the plots show, some of which hint at how the data are collected and reported.
I’m very much in favor of avoiding absolute counts. We don’t know how comprehensive they are, so I prefer to look at rates of change. Professor David Spiegelhalter discusses why rates are a good idea in connection with COVID-19 reporting.
Note that the data used go only through the 29th of April 2020.
The trouble with tracking counts of deaths is that, even if they are complete, which in the middle of a crisis they seldom are, they don’t tell you much. What do you want to know? Ultimately, you want to know how effective a set of policies is at curtailing deaths due to COVID-19. So, for instance, consider the cumulative counts of deaths for Germany:
This doesn’t really say much about what’s driving the shape of the curve. A phase plane plot is constructed by calculating good estimates of the series’ first and second time derivatives at each point, and then plotting the second derivative against the first. The first derivative is the rate at which people are dying, and the second derivative is the change per unit time in that rate, just like the acceleration or deceleration of a car: a positive second derivative means acceleration, and a negative one means deceleration.
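Here is a minimal sketch of that construction. My own fits use penalized regression splines; as a stand-in, this swaps in SciPy’s `UnivariateSpline` smoothing spline, fitted to a cumulative series and then differentiated. A quintic spline (`k=5`) is used so the second derivative is itself smooth. The logistic "cumulative deaths" curve and its parameters are invented for illustration, not taken from any country’s data.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def phase_plane(t, cumulative, smoothing=None):
    """Estimate (rate, acceleration) from a cumulative count series.

    Fits a quintic smoothing spline to the cumulative counts and
    differentiates it once (rate, deaths/day) and twice
    (acceleration, deaths/day/day). `smoothing` is UnivariateSpline's `s`.
    """
    spline = UnivariateSpline(t, cumulative, k=5, s=smoothing)
    rate = spline.derivative(n=1)(t)
    acceleration = spline.derivative(n=2)(t)
    return rate, acceleration


# Hypothetical logistic epidemic curve: K total deaths, growth r, midpoint t0.
t = np.arange(80, dtype=float)
K, r, t0 = 6000.0, 0.15, 40.0
cum = K / (1.0 + np.exp(-r * (t - t0)))

rate, acc = phase_plane(t, cum, smoothing=0)  # s=0: interpolate noise-free data

# With noisy data, choose s roughly as n * noise_variance (assumed sd of 20 here).
rng = np.random.default_rng(0)
noisy = cum + rng.normal(0.0, 20.0, size=t.size)
rate_n, acc_n = phase_plane(t, noisy, smoothing=t.size * 20.0**2)
```

Plotting `acc` against `rate` (for example with matplotlib) traces the phase plane trajectory: it rises while the epidemic accelerates, crosses zero acceleration at the peak rate, and curls back toward the origin as deaths decelerate.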
From a policy perspective, if the rate is appreciably positive, you want policy measures that decelerate it, driving the rate down. A win is when both first and second derivatives are near zero. So, for Germany,
An interpretation is that Germany began taking the matter seriously around 25th-28th March, activated a bunch of measures on the 31st of March, and has since struggled to decelerate the rate of death. The loops denote oscillations where a policy is implemented and then, for some reason, there’s a reaction, whether in government or in the public, that accelerates the rate again. This tug of war has, on average, kept the rate of death at about 200 deaths a day, but it’s not going down.
A caution here, however: these dates of action should be interpreted with care. A death occurs about 10-14 days after infection, so if a policy action is taken, its effects won’t be seen in the death counts for 2-3 weeks. Rather than saying Germany took it seriously on the 25th of March, I should have said the 3rd of March, with actions implemented about the 10th of March.
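The lag adjustment is simple date arithmetic. This sketch assumes the 2-3 week lag stated above and returns the window of dates on which an action first visible in the death data might actually have been taken; the function name is just illustrative.

```python
from datetime import date, timedelta


def plausible_action_window(observed, min_lag_days=14, max_lag_days=21):
    """Earliest and latest dates an action seen in death data on `observed`
    may have been taken, assuming a 2-3 week lag between action and effect."""
    earliest = observed - timedelta(days=max_lag_days)
    latest = observed - timedelta(days=min_lag_days)
    return earliest, latest


for seen in (date(2020, 3, 25), date(2020, 3, 31)):
    earliest, latest = plausible_action_window(seen)
    print(seen, "->", earliest, "to", latest)
```

The three-week end of each window falls within a day of the early-March dates given in the text, which is about as precise as this kind of reasoning can be.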
Can it be managed as I say, driving both acceleration and rates to zero? Sure, consider Taiwan:
The disease got a little away from them in early March, and again, to a smaller degree, in early April, but now it’s controlled. Note, however, that the number of cases per day was never permitted to rise above twenty.
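The "win" criterion, both the rate and its acceleration near zero, can be written as a small labeling function applied to each point of a trajectory. This is only a sketch: the thresholds `rate_tol` and `acc_tol` are hypothetical and would need to be chosen relative to each region’s scale.

```python
def describe_state(rate, acceleration, rate_tol=1.0, acc_tol=0.5):
    """Label one point of a phase plane trajectory.

    rate_tol and acc_tol are hypothetical thresholds for "near zero".
    """
    if abs(rate) <= rate_tol and abs(acceleration) <= acc_tol:
        return "controlled"
    if acceleration > acc_tol:
        return "accelerating"
    if acceleration < -acc_tol:
        return "decelerating"
    return "steady"


print(describe_state(0.5, 0.1))   # near the origin of the phase plane
print(describe_state(200.0, -5.0))
```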
A Review of Some Countries
So, what about the United States? It’s actually not too bad, except that control measures clearly have not been anywhere near as stringent as in the two countries already mentioned. There are no loops:
Sure, it’s good that the rate is decelerating, but not by much: fewer than 50 deaths per day, per day, when the rate is just under 2000 deaths per day.
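To see why that deceleration is small, borrow the car analogy: if the deceleration simply stayed constant, the time for the rate to reach zero is the rate divided by the deceleration. This is an idealization, not a forecast; real trajectories loop and wander.

```python
def days_to_zero(rate, deceleration):
    """Days for `rate` (deaths/day) to reach zero if `deceleration`
    (deaths/day/day) stays constant. An idealization, not a forecast."""
    return rate / deceleration


print(days_to_zero(2000, 50))
```

With a rate of about 2000 deaths per day and a deceleration of under 50 deaths per day, per day, that is at least 40 days before the rate would reach zero, even under this optimistic assumption.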
What about the biggest contributor, New York State?
New York is clearly struggling to get control of the pandemic, holding the rate to about 700 deaths a day, but the rate was accelerating again as of reporting on the 29th of April. Those cycles do, however, indicate that action is being taken.
What about Sweden? It has been touted as not having any lockdown. I’d say Sweden is in trouble:
But that judgment rests on basically just a couple of data points. Perhaps they are askew, and the sharp upturn isn’t real. After all, Sweden seems to have managed to keep the rate of death to about 70 per day on average, with a wide scatter.
The Scene from (Some of) the States
Finally, let’s look at some states in the United States.
Consider Michigan, for instance, scene of much conflict over the lockdown measures its governor has taken:
Michigan has struggled as well, but whatever was done around the 10th of April really slammed the brakes on the number of deaths.
Florida is interesting because, although deaths are (reported as) 50 per day, there is little evidence of acceleration or deceleration.
Florida has 1200 deaths. One interpretation is that it’s still early in Florida. Another is that not all the deaths have been reported yet. Yet another is that, for whatever reason, people dying of COVID-19 are being given some other cause of death. There are a lot of elderly people in Florida, so a comparison of overall mortality rates there with historical rates would be advisable. Note also that the overall shape of acceleration versus rate of death in Florida is similar to that of the United States at large, although the magnitudes differ.
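The comparison with historical mortality can be sketched as an excess-deaths calculation: all-cause deaths in a period minus a baseline from the same periods in earlier years. The baseline here is deliberately simple (a plain mean of prior years), and the numbers are hypothetical, not Florida data.

```python
import numpy as np


def excess_deaths(observed, prior_years):
    """All-cause deaths above a baseline equal to the mean of the same
    weeks in earlier years (one row of `prior_years` per year)."""
    baseline = np.mean(np.asarray(prior_years, dtype=float), axis=0)
    return np.asarray(observed, dtype=float) - baseline


# Hypothetical weekly all-cause deaths for three matching weeks.
prior = [[3900, 4000, 4100],
         [4100, 4000, 3900]]
this_year = [4000, 4300, 4600]
print(excess_deaths(this_year, prior))
```

Excess deaths well above the reported COVID-19 deaths would support the under-attribution interpretation; excess close to the reported count would not.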
Finally, consider Georgia, another state where policy on managing COVID-19 has been contentious:
Despite the contention, Georgia has clearly been doing something effective to contain the disease. The trouble is that the latest data suggest it is beginning to get away from them, although it’s too early to say whether that will continue or whether they will find a way to bring it back under control.
Update, 3rd May 2020
Thanks for this really compelling take. Have you given any consideration to reporting-interval artifacts, such that an apparent cycling in the second derivative is at least partially owing to an artifact of the reporting cycle (“burst of new cases on Monday” being some kind of example)? I work with climate observations and we have to correct for “took the weekend off” precipitation observers in a subset of our data universe.
Thanks again; this was really enlightening.
Trying to filter out reporting artefacts might be done by using a dynamic linear model to estimate the underlying counts, effectively smoothing it better, and then calculating the derivatives on the output of that. The other alternative is to use a better data source.
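A minimal sketch of that idea, using the simplest dynamic linear model, a local level model, with a Kalman filter followed by a Rauch-Tung-Striebel backward smoothing pass. The noise variances `q` and `r` are assumed known here; in practice they would be estimated (for instance by maximum likelihood), and a model with a weekly seasonal component would address reporting cycles more directly.

```python
import numpy as np


def local_level_smoother(y, q, r, p0=1e6):
    """Kalman filter plus Rauch-Tung-Striebel smoother for a local level model:

        x[t] = x[t-1] + w[t],  w ~ N(0, q)   (underlying daily count)
        y[t] = x[t] + v[t],    v ~ N(0, r)   (reported daily count)

    q and r are assumed known. Returns the smoothed estimates of x.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    m_pred, p_pred = np.empty(n), np.empty(n)
    m_filt, p_filt = np.empty(n), np.empty(n)
    for t in range(n):
        if t == 0:
            # Vague prior centred on the first observation.
            m_pred[t], p_pred[t] = y[0], p0
        else:
            m_pred[t], p_pred[t] = m_filt[t - 1], p_filt[t - 1] + q
        gain = p_pred[t] / (p_pred[t] + r)
        m_filt[t] = m_pred[t] + gain * (y[t] - m_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]
    # Backward (RTS) pass: refine each estimate using later observations.
    m_smooth = m_filt.copy()
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + c * (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth
```

Derivatives would then be taken on the smoothed series, for example `np.gradient(local_level_smoother(daily, q, r))` for the day-to-day change in the underlying rate.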
I may do that after coming up with an uncertainty estimate for the present curves and plotting them.
I’m not sure covidtracking is a “better” data source. It just gets information in a different way. It uses updates from state health departments; Johns Hopkins gets information from local health departments. As an example of differences, New York State gets its information from the Hospital Emergency Data System and from daily calls to hospitals. New York City gets information from positive laboratory tests and the Medical Examiner and the Bureau of Vital Statistics (registered death certificates). As a result, New York State has ~5,000 fewer deaths on record for New York City than the city has. People who didn’t die in the hospital system aren’t getting into the state records very quickly (more than a month, it seems).
Thanks for the tips on sources!
One thing I can try once I have my uncertainty drapes working for the phase plane trajectories is to calculate them using different data sources for the same region and see how sensitive the results are to data sources.
There seems to be a ~weekly cycle in the phase diagrams. A similar cycle is visible in the raw Johns Hopkins data, which I’ve taken to be some death-counting artefact (people dying less during weekends seems weird). This would lead to unreasonably wiggly curves and potential overinterpretation; could the loops stem from these sorts of artefacts?
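One quick check for a weekly reporting artefact is to compare each day with a centered 7-day moving average and average those ratios by position in the week. This is a sketch; the weekly pattern below is invented, and "position" means the day index modulo 7, so mapping positions to actual weekdays depends on which weekday the series starts on.

```python
import numpy as np


def weekly_moving_average(daily):
    """Centered 7-day moving average; drops 3 days at each end."""
    return np.convolve(daily, np.ones(7) / 7, mode="valid")


def weekday_profile(daily):
    """Mean of daily / (centered 7-day average) for each position in the week."""
    daily = np.asarray(daily, dtype=float)
    ratio = daily[3:-3] / weekly_moving_average(daily)
    positions = np.arange(3, len(daily) - 3) % 7  # day index modulo 7
    return np.array([ratio[positions == d].mean() for d in range(7)])


# Hypothetical: 100 deaths/day with an invented weekly reporting pattern.
pattern = np.array([10, 5, 0, 0, 0, -5, -10])
daily = 100 + np.tile(pattern, 8)
print(weekly_moving_average(daily)[:3])
print(weekday_profile(daily))
```

A moving average that is flat while the profile departs from 1 on particular days points to reporting rhythm rather than epidemic dynamics.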
That said, this is still a pretty nice way of looking at the evolution of the pandemic!
Yes, as I noted in the text:
Still, not all the cycles have week-long periods. Consider Taiwan, for instance.
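Whether a given set of loops is weekly can be checked by estimating the dominant period of, say, the acceleration series with a periodogram. This sketch picks the strongest non-zero frequency from NumPy’s real FFT; with short series the frequency resolution is coarse, so treat the result as indicative only.

```python
import numpy as np


def dominant_period(x):
    """Period (in samples) of the strongest non-zero frequency in x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the zero-frequency level
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    k = np.argmax(power[1:]) + 1          # skip the zero-frequency bin
    return 1.0 / freqs[k]


t = np.arange(70)
print(dominant_period(np.sin(2 * np.pi * t / 7)))
print(dominant_period(np.sin(2 * np.pi * t / 10)))
```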
Nice visualization. Over at Tamino’s you mentioned wanting to overlay uncertainty – I’m sure you understand the uncertainty better than I, but I wouldn’t be surprised if some of those loops are entirely within the uncertainty, and thus may not have occurred at all. Finally, I think you’re very right to be suspicious of under-reporting in FL and GA.
Hi Greg. Thanks.
We’ll see. One thing that gives credence to loop existence is that since first and second derivatives are linearly independent, hokiness has to happen to both of them in a coordinated way in order for them to be artificial. So, yes, could be, but the bar for rejection is higher.
Great read! Also, good to have started with a solid reference to actual phase plane plots.
This is really great and elucidating!