There are now several sources of information available regarding COVID-19 incidence. This post uses United States data from a single source: the COVID Tracking Project. In particular I restrict attention to cumulative daily case counts for the United States, the UK, and the states of New York, Massachusetts, and Connecticut. The United States data are available here. The data for the United Kingdom are available here. I’m only considering reports through the 26th of March.
Please note that few of these models can be used to properly inform projections, and nothing here should be interpreted as medical advice for your personal care. Consult your physician for that. This is a scholarly investigation.
To begin, the table of counts of positive tests for coronavirus in the United States is:
| date | positive | positive as fraction of positive and negative tests |
|---|---|---|
| 20200326 | 80735 | 0.1555 |
| 20200325 | 63928 | 0.1517 |
| 20200324 | 51954 | 0.1507 |
| 20200323 | 42152 | 0.1508 |
| 20200322 | 31879 | 0.1415 |
| 20200321 | 23197 | 0.1295 |
| 20200320 | 17033 | 0.1260 |
| 20200319 | 11719 | 0.1162 |
| 20200318 | 7730 | 0.1045 |
| 20200317 | 5723 | 0.1073 |
| 20200316 | 4019 | 0.1002 |
| 20200315 | 3173 | 0.1233 |
| 20200314 | 2450 | 0.1253 |
| 20200313 | 1922 | 0.1237 |
| 20200312 | 1315 | 0.1406 |
| 20200311 | 1053 | 0.1478 |
| 20200310 | 778 | 0.1697 |
| 20200309 | 584 | 0.1478 |
| 20200308 | 417 | 0.1515 |
| 20200307 | 341 | 0.1586 |
| 20200306 | 223 | 0.1243 |
| 20200305 | 176 | 0.1559 |
| 20200304 | 118 | 0.1363 |
Now, the increase in positive tests is driven by the number of infections, but it is also heavily influenced by the amount of testing. The same source, however, offers the cumulative number of negative tests, so I have expressed the positive test count as a fraction of the number of positive tests plus the number of negative tests in the rightmost column.
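The negative counts themselves are not reproduced in the table, but the rightmost column can be inverted to recover an approximate total. Here is a minimal sketch using the three most recent rows of the table. It assumes the printed fraction is positive / (positive + negative) up to rounding, and the names `total_implied` and `positive_fraction` are illustrative rather than taken from the original analysis:

```r
# Three most recent rows of the table above (newest first).
positive <- c(80735, 63928, 51954)
fraction <- c(0.1555, 0.1517, 0.1507)

# Fraction of all reported tests that were positive.
positive_fraction <- function(pos, neg) pos / (pos + neg)

# Invert the fraction to get approximate cumulative totals. These are
# back-computed from rounded fractions, so they are NOT the source's counts.
total_implied    <- positive / fraction
negative_implied <- total_implied - positive

# Round trip: recomputing the fraction recovers the table column.
recovered <- positive_fraction(positive, negative_implied)
print(round(total_implied))
print(recovered)
```

Because the fractions are rounded to four decimal places, the implied totals are only good to a few hundred tests; the source's own negative counts should be used for any real fit.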
If the increase in the number of positive tests is related to the increase in the prevalence of the virus, and since the infection diffuses through the population, then the increase ought to be related to the number of positive tests. As noted, the increase in the number of positive tests could also be related to the administration of additional tests, so, with only positive tests data, these are confounded. However, since the cumulative number of tests administered is available, we can see how strongly the increase in the number of tests determines the number of positives, rather than an expansion in the number of cases.
Letting $Q_t$ denote the cumulative count of positive cases on day $t$, and $PN_t$ the cumulative count of positive plus negative tests, I’m interested in

$$Q_t - Q_{t-1} = \alpha\, Q_{t-1} + \beta\, PN_{t-1} + \eta_t,$$

the difference relationship. In another equivalent expression,

$$Q_t = (1 + \alpha)\, Q_{t-1} + \beta\, PN_{t-1} + \eta_t,$$

where $\eta_t$ denotes integral count noise.
This amounts to a linear regression on two covariates, with the resulting $\alpha$ and $\beta$ indicating how strongly the increase in positive test counts is determined by the corresponding covariate. The left term is an AR(1) model. (See also.) Using R's lm function results in:
> fit.usa.pn <- lm(D.usa ~ Q.usa[2:23] + PN.usa[2:23])
> summary(fit.usa.pn)
Call:
lm(formula = D.usa ~ Q.usa[2:23] + PN.usa[2:23])
Residuals:
Min 1Q Median 3Q Max
-1411.26552 -491.51999 172.32025 341.17292 1652.94005
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 717.8679340137 301.0981976893 2.38417 0.027702 *
Q.usa[2:23] 0.1981967029 0.0087058298 22.76597 2.986e-15 ***
PN.usa[2:23] 0.0025874532 0.0016414650 1.57631 0.131460
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 821.74248 on 19 degrees of freedom
Multiple R-squared: 0.97426944, Adjusted R-squared: 0.97156096
F-statistic: 359.71083 on 2 and 19 DF, p-value: 7.9298893e-16
Fitting a model without an intercept makes little difference:
> fit.usa.noint <- lm(D.usa ~ Q.usa[1:22] + PN.usa[1:22] + 0)
> summary(fit.usa.noint)
Call:
lm(formula = D.usa ~ Q.usa[1:22] + PN.usa[1:22] + 0)
Residuals:
Min 1Q Median 3Q Max
-1950.894882 -61.091628 8.041637 584.900171 2464.061157
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Q.usa[1:22] 0.26801699520 0.01121341634 23.90146 3.5088e-16 ***
PN.usa[1:22] 0.00018947235 0.00132300092 0.14321 0.88755
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1158.4248 on 20 degrees of freedom
Multiple R-squared: 0.96619952, Adjusted R-squared: 0.96281947
F-statistic: 285.8538 on 2 and 20 DF, p-value: 1.946384e-15
Note that the dependence upon the total number of tests is weak. The figure here plots the daily increases in positive tests against the cumulative number of positive tests on the previous day, with the intercept-free line superimposed.
[Figure: daily increase in positive tests versus the previous day's cumulative positive tests, United States, with the intercept-free fit superimposed.]
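The vectors `D.usa` and `Q.usa` are not defined in the post. As a rough sketch of how such a fit and plot might be reproduced from the table of positives alone, the block below drops the total-tests covariate (which the fit above found to be weak) and assumes the increases are simple first differences of the cumulative counts. The names `Q`, `D`, and `Q_prev` are illustrative.

```r
# Cumulative positive counts for the United States, copied from the table
# above (listed there newest first; reversed here into chronological order).
positive_newest_first <- c(
  80735, 63928, 51954, 42152, 31879, 23197, 17033, 11719, 7730, 5723,
  4019, 3173, 2450, 1922, 1315, 1053, 778, 584, 417, 341, 223, 176, 118
)
Q      <- rev(positive_newest_first)  # 23 days, 20200304 through 20200326
D      <- diff(Q)                     # 22 daily increases
Q_prev <- Q[-length(Q)]               # previous day's cumulative count

# Regression through the origin on the single covariate.
fit      <- lm(D ~ Q_prev + 0)
slope_lm <- unname(coef(fit)[1])

# Closed form for a one-covariate least-squares fit through the origin.
slope_closed <- sum(D * Q_prev) / sum(Q_prev^2)
print(c(lm = slope_lm, closed_form = slope_closed))

# Scatter plot with the fitted line (only drawn in an interactive session).
if (interactive()) {
  plot(Q_prev, D,
       xlab = "cumulative positives, previous day",
       ylab = "increase in positives")
  abline(a = 0, b = slope_lm)
}
```

With the total-tests term omitted, the slope should land close to the coefficient on `Q.usa[1:22]` reported above, which is consistent with that term contributing little.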
Below is the same analysis applied to New York State:
> fit.ny.noint <- lm(D.ny ~ Q.ny[1:22] + PN.ny[1:22] + 0)
> summary(fit.ny.noint)
Call:
lm(formula = D.ny ~ Q.ny[1:22] + PN.ny[1:22] + 0)
Residuals:
Min 1Q Median 3Q Max
-1208.249960 -17.977446 3.489550 450.488558 2247.967816
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Q.ny[1:22] 0.24758294065 0.01993668614 12.41846 7.4014e-11 ***
PN.ny[1:22] 0.00027030184 0.00452203747 0.05977 0.95293
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 988.67859 on 20 degrees of freedom
Multiple R-squared: 0.88521556, Adjusted R-squared: 0.87373712
F-statistic: 77.119823 on 2 and 20 DF, p-value: 3.9703623e-10
Below is the same analysis applied to Massachusetts:
> fit.ma.noint <- lm(D.ma ~ Q.ma[1:22] + PN.ma[1:22] + 0)
> summary(fit.ma.noint)
Call:
lm(formula = D.ma ~ Q.ma[1:22] + PN.ma[1:22] + 0)
Residuals:
Min 1Q Median 3Q Max
-24.4349493 -7.8075546 0.8168569 8.3116535 42.0791554
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Q.ma[1:22] 0.17585256284 0.04856188611 3.62121 0.0035068 **
PN.ma[1:22] 0.00032421632 0.00052756343 0.61455 0.5503242
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 18.978648 on 12 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.54502431, Adjusted R-squared: 0.46919503
F-statistic: 7.1875178 on 2 and 12 DF, p-value: 0.0088701128
Below is the same analysis applied to Connecticut:
> fit.ct.noint <- lm(D.ct ~ Q.ct[1:19] + PN.ct[1:19] + 0)
> summary(fit.ct.noint)
Call:
lm(formula = D.ct ~ Q.ct[1:19] + PN.ct[1:19] + 0)
Residuals:
Min 1Q Median 3Q Max
-117.0797164 -1.3840086 1.2688491 11.6339214 127.2275604
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Q.ct[1:19] 0.29036730663 0.04573094922 6.34947 7.2613e-06 ***
PN.ct[1:19] 0.00027743517 0.00461747631 0.06008 0.95279
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 55.280014 on 17 degrees of freedom
Multiple R-squared: 0.70352934, Adjusted R-squared: 0.66865044
F-statistic: 20.170628 on 2 and 17 DF, p-value: 3.2497097e-05
And, finally, below is the same analysis applied to the United Kingdom:
> fit.uk.noint <- lm(D.uk ~ Q.uk[1:29] + 0)
> summary(fit.uk.noint)
Call:
lm(formula = D.uk ~ Q.uk[1:29] + 0)
Residuals:
Min 1Q Median 3Q Max
-423.525900 -8.490398 5.891371 37.652952 352.355823
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Q.uk[1:29] 0.2174876924 0.0074761256 29.09096 < 2.22e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 156.16753 on 28 degrees of freedom
Multiple R-squared: 0.9679738, Adjusted R-squared: 0.96683
F-statistic: 846.28412 on 1 and 28 DF, p-value: < 2.22045e-16
Recall that the UK data come from a different source, which does not provide the total number of tests performed. Consequently, I could not check whether the dependence of increases in positive tests upon the number of tests was weak in their case. Also, the UK’s testing procedure and its biochemistry probably differ from those in the United States, although there is no assurance that tests performed from state to state are strictly exchangeable either.
The plot for the UK is:
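Summarizing the no-intercept results:

| Country/State | Coefficient on previous day's cumulative positives | Standard error of coefficient | Adjusted R-squared |
|---|---|---|---|
| United States | 0.268 | 0.011 | 0.963 |
| New York | 0.247 | 0.020 | 0.874 |
| Massachusetts | 0.176 | 0.048 | 0.469 |
| Connecticut | 0.290 | 0.045 | 0.669 |
| United Kingdom | 0.217 | 0.007 | 0.967 |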
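One caution when reading the adjusted R-squared values from the no-intercept fits: for a model without an intercept, R's `summary.lm` measures R-squared against a baseline of predicting zero rather than predicting the mean, so the figure is not directly comparable with that of a model that has an intercept. The sketch below illustrates this on the United States positives from the first table, using a single covariate. The names are illustrative and the total-tests covariate is omitted, so this is not the exact fit reported above.

```r
# United States cumulative positives from the first table, newest first.
positive_newest_first <- c(
  80735, 63928, 51954, 42152, 31879, 23197, 17033, 11719, 7730, 5723,
  4019, 3173, 2450, 1922, 1315, 1053, 778, 584, 417, 341, 223, 176, 118
)
Q      <- rev(positive_newest_first)  # chronological order
D      <- diff(Q)                     # daily increases
Q_prev <- Q[-length(Q)]               # previous day's cumulative count

fit_noint <- lm(D ~ Q_prev + 0)
rss       <- sum(resid(fit_noint)^2)

r2_reported   <- summary(fit_noint)$r.squared      # what summary() prints
r2_uncentered <- 1 - rss / sum(D^2)                # baseline: predict zero
r2_centered   <- 1 - rss / sum((D - mean(D))^2)    # baseline: predict the mean

print(c(reported = r2_reported,
        uncentered = r2_uncentered,
        centered = r2_centered))
```

The residual standard error does not depend on that baseline, so it is the safer quantity for comparing the fits with and without an intercept.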
As noted above, the United Kingdom results are not strictly comparable, for the reasons given. The conclusion is that the AR(1) term dominates and, at least at the country level, appears predictive. The low adjusted R-squared values for Massachusetts and Connecticut may be due to low counts, or to the relative youth of the epidemic there. Thus, my interpretation is that the increase in positive case count is driven by the disease, not by accelerated testing. This disagrees with an implication I made in an earlier blog post.
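To give the fitted coefficients a more tangible reading: if each day's increase is a fixed fraction of the previous day's cumulative count, the cumulative count is multiplied by one plus that fraction each day. The sketch below converts the rounded coefficients from the summary table into the implied doubling time. It ignores the total-tests term and the noise, describes only the fitted window, and is not a projection. The names `growth` and `doubling_days` are illustrative.

```r
# Rounded no-intercept coefficients from the summary table above.
growth <- c(
  "United States"  = 0.268,
  "New York"       = 0.247,
  "Massachusetts"  = 0.176,
  "Connecticut"    = 0.290,
  "United Kingdom" = 0.217
)

# If Q[t] = (1 + g) * Q[t - 1], then Q doubles after log(2) / log(1 + g) days.
doubling_days <- log(2) / log(1 + growth)
print(round(doubling_days, 2))
```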
This work was inspired in part by the article,
D. Benvenuto, M. Giovanetti, L. Vassallo, S. Angeletti, M. Ciccozzi, “Application of the ARIMA model on the COVID-2019 epidemic dataset”, Data in Brief, 29 (2020), 105340.
Update, 20200329, 00:24 EDT
Other recent work with R regarding the COVID-19 pandemic: