## “Social Distancing Works”

Nice work by Tamino, including showing when a log plot is appropriate and when it is not. His post, reblogged:

The death toll from Coronavirus in the U.S.A. stands at 4,059, and more alarming is the fact that yesterday brought nearly a thousand deaths in a single day. The numbers keep rising.

America has confirmed 188,639 cases (many more unconfirmed), more than any other country in the world (although Italy leads in fatalities with 12,428). The total number of cases in the U.S. shows a very unfortunate and frankly, scary trend: exponential growth.


## New COVID-19 incidence in the United States as AR(1) processes

There are several sources of information regarding COVID-19 incidence now available. This post uses data from a single source: the COVID Tracking Project. In particular, I restrict attention to cumulative daily case counts for the United States, the UK, and the states of New York, Massachusetts, and Connecticut. The United States data are available here. The data for the United Kingdom are available here. I’m only considering reports through the 26th of March.

Please note that these models can do little to properly inform projections, and nothing here should be interpreted as medical advice for your personal care. Consult your physician for that. This is a scholarly investigation.

To begin, the table of counts of positive tests for coronavirus in the United States is:

| date | positive | positive as fraction of positive and negative tests |
|----------|-------:|-------:|
| 20200326 | 80735 | 0.1555 |
| 20200325 | 63928 | 0.1517 |
| 20200324 | 51954 | 0.1507 |
| 20200323 | 42152 | 0.1508 |
| 20200322 | 31879 | 0.1415 |
| 20200321 | 23197 | 0.1295 |
| 20200320 | 17033 | 0.1260 |
| 20200319 | 11719 | 0.1162 |
| 20200318 | 7730 | 0.1045 |
| 20200317 | 5723 | 0.1073 |
| 20200316 | 4019 | 0.1002 |
| 20200315 | 3173 | 0.1233 |
| 20200314 | 2450 | 0.1253 |
| 20200313 | 1922 | 0.1237 |
| 20200312 | 1315 | 0.1406 |
| 20200311 | 1053 | 0.1478 |
| 20200310 | 778 | 0.1697 |
| 20200309 | 584 | 0.1478 |
| 20200308 | 417 | 0.1515 |
| 20200307 | 341 | 0.1586 |
| 20200306 | 223 | 0.1243 |
| 20200305 | 176 | 0.1559 |
| 20200304 | 118 | 0.1363 |

Now, the increase in positive tests is driven by the number of infections, but it is also heavily influenced by the amount of testing. The same source, however, offers the cumulative number of negative tests, and so, in the rightmost column, I have expressed the positive test count as a fraction of the number of positive tests plus the number of negative tests.

If the increase in the number of positive tests reflects an increase in the prevalence of the virus, and since the infection diffuses through the population, then that increase ought to be related to the cumulative number of positive tests. As noted, the increase could also be driven by the administration of additional tests, so, with positive test data alone, these effects are confounded. However, since the cumulative number of tests administered is also available, we can check how strongly the increase in positives is determined by the expansion of testing, rather than by an expansion in the number of cases.

Letting $y_{t}$ denote the cumulative count of positive cases on day $t$, and $x_{t}$ the cumulative count of positive and negative tests, I’m interested in

$\Delta y_{t} \sim a_{y} y_{t-1} + a_{x} x_{t-1} + \eta$

the difference relationship. In another equivalent expression,

$y_{t} - y_{t-1} = a_{y} y_{t-1} + a_{x} x_{t-1} + \eta$

where $\eta$ denotes integral count noise.

This amounts to a linear regression on two covariates, with the resulting $a_{y}$ and $a_{x}$ indicating how strongly the increase in positive test counts is determined by each covariate. The $a_{y} y_{t-1}$ term makes this an AR(1) model. (See also.) Using R’s `lm` function results in:

```
> fit.usa.pn <- lm(D.usa ~ Q.usa[2:23] + PN.usa[2:23])
> summary(fit.usa.pn)

Call:
lm(formula = D.usa ~ Q.usa[2:23] + PN.usa[2:23])

Residuals:
         Min           1Q       Median           3Q          Max
 -1411.26552   -491.51999   -172.32025    341.17292   1652.94005

Coefficients:
                    Estimate     Std. Error  t value  Pr(>|t|)
(Intercept)   717.8679340137 301.0981976893  2.38417  0.027702 *
Q.usa[2:23]     0.1981967029   0.0087058298 22.76597 2.986e-15 ***
PN.usa[2:23]   -0.0025874532   0.0016414650 -1.57631  0.131460
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 821.74248 on 19 degrees of freedom
Multiple R-squared:  0.97426944, Adjusted R-squared:  0.97156096
F-statistic: 359.71083 on 2 and 19 DF,  p-value: 7.9298893e-16
```

Fitting the model without an intercept changes matters only negligibly:

```
> fit.usa.noint <- lm(D.usa ~ Q.usa[1:22] + PN.usa[1:22] + 0)
> summary(fit.usa.noint)

Call:
lm(formula = D.usa ~ Q.usa[1:22] + PN.usa[1:22] + 0)

Residuals:
         Min           1Q       Median           3Q          Max
-1950.894882   -61.091628     8.041637   584.900171  2464.061157

Coefficients:
                   Estimate    Std. Error  t value   Pr(>|t|)
Q.usa[1:22]  0.26801699520 0.01121341634 23.90146 3.5088e-16 ***
PN.usa[1:22] 0.00018947235 0.00132300092  0.14321    0.88755
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1158.4248 on 20 degrees of freedom
Multiple R-squared:  0.96619952, Adjusted R-squared:  0.96281947
F-statistic: 285.8538 on 2 and 20 DF,  p-value: 1.946384e-15
```
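As a check on the mechanics (not the data), here is a minimal sketch of the intercept-free fit on simulated counts; `D`, `Q`, and `PN` are stand-ins for the post's `D.usa`, `Q.usa`, and `PN.usa` series, and the growth rates are invented:

```r
# Minimal sketch of the intercept-free difference regression, on simulated
# cumulative series. All numbers are invented for illustration.
n <- 23
positives <- 100 * cumsum(1.25^(1:n))   # cumulative positives, ~25%/day growth
tests <- 500 * cumsum(1.20^(1:n))       # cumulative tests, growing separately
D <- diff(positives)                    # daily increase in positives
Q <- positives[1:(n - 1)]               # lagged cumulative positives
PN <- tests[1:(n - 1)]                  # lagged cumulative tests
fit <- lm(D ~ Q + PN + 0)               # intercept-free regression, as above
round(coef(fit), 4)                     # coefficient on Q recovers ~0.25
```

Because the simulated positives grow geometrically at 25% per day, the coefficient on `Q` comes out near 0.25, just as the post's fitted $a_{y}$ values sit near the epidemic's daily growth rate.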

Note that the dependence upon the total number of tests is weak. Below, the increases in positive tests are plotted against the cumulative number of positive tests on the previous day, with the intercept-free line superimposed:

###### (Click on figure to see larger image.)

Below is the same analysis applied to New York State:

```
> fit.ny.noint <- lm(D.ny ~ Q.ny[1:22] + PN.ny[1:22] + 0)
> summary(fit.ny.noint)

Call:
lm(formula = D.ny ~ Q.ny[1:22] + PN.ny[1:22] + 0)

Residuals:
         Min           1Q       Median           3Q          Max
-1208.249960   -17.977446     3.489550   450.488558  2247.967816

Coefficients:
                  Estimate    Std. Error  t value   Pr(>|t|)
Q.ny[1:22]  0.24758294065 0.01993668614 12.41846 7.4014e-11 ***
PN.ny[1:22] 0.00027030184 0.00452203747  0.05977    0.95293
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 988.67859 on 20 degrees of freedom
Multiple R-squared:  0.88521556, Adjusted R-squared:  0.87373712
F-statistic: 77.119823 on 2 and 20 DF,  p-value: 3.9703623e-10
```

Below is the same analysis applied to Massachusetts:

```
> fit.ma.noint <- lm(D.ma ~ Q.ma[1:22] + PN.ma[1:22] + 0)
> summary(fit.ma.noint)

Call:
lm(formula = D.ma ~ Q.ma[1:22] + PN.ma[1:22] + 0)

Residuals:
        Min          1Q      Median          3Q         Max
-24.4349493  -7.8075546  -0.8168569   8.3116535  42.0791554

Coefficients:
                  Estimate    Std. Error t value  Pr(>|t|)
Q.ma[1:22]  0.17585256284 0.04856188611 3.62121 0.0035068 **
PN.ma[1:22] 0.00032421632 0.00052756343 0.61455 0.5503242
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.978648 on 12 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.54502431, Adjusted R-squared:  0.46919503
F-statistic: 7.1875178 on 2 and 12 DF,  p-value: 0.0088701128
```

Below is the same analysis applied to Connecticut:

```
> fit.ct.noint <- lm(D.ct ~ Q.ct[1:19] + PN.ct[1:19] + 0)
> summary(fit.ct.noint)

Call:
lm(formula = D.ct ~ Q.ct[1:19] + PN.ct[1:19] + 0)

Residuals:
         Min           1Q       Median           3Q          Max
-117.0797164   -1.3840086    1.2688491   11.6339214  127.2275604

Coefficients:
                  Estimate    Std. Error t value   Pr(>|t|)
Q.ct[1:19]  0.29036730663 0.04573094922 6.34947 7.2613e-06 ***
PN.ct[1:19] 0.00027743517 0.00461747631 0.06008    0.95279
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 55.280014 on 17 degrees of freedom
Multiple R-squared:  0.70352934, Adjusted R-squared:  0.66865044
F-statistic: 20.170628 on 2 and 17 DF,  p-value: 3.2497097e-05
```

And, finally, below is the same analysis applied to the United Kingdom:

```
> fit.uk.noint <- lm(D.uk ~ Q.uk[1:29] + 0)
> summary(fit.uk.noint)

Call:
lm(formula = D.uk ~ Q.uk[1:29] + 0)

Residuals:
        Min          1Q      Median          3Q         Max
-423.525900   -8.490398    5.891371   37.652952  352.355823

Coefficients:
                Estimate   Std. Error  t value  Pr(>|t|)
Q.uk[1:29] 0.2174876924 0.0074761256 29.09096 < 2.22e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 156.16753 on 28 degrees of freedom
Multiple R-squared:  0.9679738, Adjusted R-squared:  0.96683
F-statistic: 846.28412 on 1 and 28 DF,  p-value: < 2.22045e-16
```

Recall that the UK data come from a different source, one which does not make available the total number of tests performed. Consequently, I could not check whether the dependency of increases in positive tests upon testing volume was weak in their case. Also, the UK’s testing procedure and its biochemistry probably differ from those in the United States, although there is no assurance that tests performed from state to state are strictly exchangeable either.

The plot for the UK is:

Summarizing the no-intercept results:

| Country/State | $a_{y}$ | standard error in $a_{y}$ | adjusted $R^{2}$ |
|---|---:|---:|---:|
| United States | 0.268 | 0.011 | 0.963 |
| New York | 0.247 | 0.020 | 0.874 |
| Massachusetts | 0.176 | 0.048 | 0.469 |
| Connecticut | 0.290 | 0.045 | 0.669 |
| United Kingdom | 0.217 | 0.007 | 0.967 |
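Since $\Delta y_{t} \approx a_{y} y_{t-1}$ implies $y_{t} \approx (1 + a_{y})\, y_{t-1}$, each $a_{y}$ carries an implied doubling time of $\log 2 / \log(1 + a_{y})$ days. A quick computation in R, using the coefficients tabulated above:

```r
# Implied doubling times (days) from the no-intercept AR(1) coefficients above.
a.y <- c(USA = 0.268, NY = 0.247, MA = 0.176, CT = 0.290, UK = 0.217)
doubling.days <- log(2) / log(1 + a.y)
round(doubling.days, 1)
#  USA   NY   MA   CT   UK
#  2.9  3.1  4.3  2.7  3.5
```

That is, the fitted growth corresponds to case counts doubling roughly every three days at the country level.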

As noted above, the United Kingdom results are not strictly comparable, for the reasons given. The conclusion is that the AR(1) term dominates and, at least at the country level, appears predictive. The low $R^{2}$ values for Massachusetts and Connecticut may be due to low counts, or to the relative youth of the epidemic there. Thus, my interpretation is that the increase in positive case counts is driven by the disease, not by accelerated testing. This disagrees with an implication I made in an earlier blog post.

This work was inspired in part by the article,

D. Benvenuto, M. Giovanetti, L. Vassallo, S. Angeletti, M. Ciccozzi, “Application of the ARIMA model on the COVID-2019 epidemic dataset”, *Data in Brief*, 29 (2020), 105340.

#### Update, 2020-03-29, 00:24 EDT

Other recent work with R regarding the COVID-19 pandemic:

## What happens when time sampling density of a series matches its growth

This is the newly updated map of COVID-19 cases in the United States, updated, presumably, because of the new emphasis upon testing:

How do we know this is the result of recent testing? Look at the map of active cases:

To the degree that the numbers of active cases fall on top of the cumulative cases, these are recent detections.

In other words, while importing COVID-19 cases from Europe is of some concern, the virus is here, the disease is here, and a typical person in the United States is much more likely to contract the disease from an infected fellow American who has not travelled than from someone arriving from Europe (note there are no prohibitions against Americans coming home).

The lesson is that if a process has a certain rate of growth, and the sampling density in time isn’t keeping up with that growth, it is inevitable there will be extreme underreporting.

I would like to understand whether that suppression of testing was deliberate. Between the present administration’s classification of COVID-19-related information under national security acts, and the documented suppression of information on federal Web sites relating to climate change, I would be highly suspicious that such suppression, which would put 45 in an unpopular light, was accidental.

##### (update, 2020-03-15, 2113 EDT)

In the above, note that once the sampling density in time is increased to match the growth of the counts, it will appear as if the rate of growth of cases is extraordinary. That is false, of course, but it is a consequence of the failure to have an adequate sampling density (in time) in the first place.
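The effect can be illustrated with a toy simulation (all numbers invented): a process growing at a constant rate, observed through a testing capacity that is capped early on and ample later. The detected counts show a spurious late explosion even though the underlying rate never changed.

```r
# Toy illustration: constant exponential growth, observed through limited then
# ample testing capacity. The apparent late surge is a sampling artifact.
true.cases <- 10 * 1.3^(0:29)             # constant 30%/day growth, 30 days
capacity <- c(rep(50, 15), rep(1e9, 15))  # daily detection ceiling, low then ample
detected <- pmin(true.cases, capacity)    # what sparse sampling actually sees
mean(diff(log(detected[8:15])))           # 0: growth hidden by the cap
mean(diff(log(detected[16:30])))          # log(1.3): true rate reappears
```

While capacity binds, detected counts are flat; the day capacity catches up, detections leap to the true curve, which is easily misread as an explosion in the disease itself.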

MSRI talk:

## Primary producers

These are from NASA’s Aqua-MODIS, meaning, Aqua satellite, MODIS instrument:

##### (h/t Earth Observatory at NASA)

See my related blog post. And, note, it’s all about the phenology.

## R ecosystem package coronavirus

Dr Rami Krispin of the Johns Hopkins University Center for Systems Science and Engineering (JHU CCSE) has just released the R package coronavirus, which “provides a daily summary of the Coronavirus (COVID-19) cases by state/province“, caused by 2019-nCoV.

###### (update 2020-03-12 1337 EDT)

I have noticed that the data in the R package cited above seriously lags the data directly available at the Github site. So if you want current data, go to the latter source.
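For orientation, a minimal session with the package might look as follows. The column names (`type`, `cases`) are assumptions based on the package's documentation at the time, so check `?coronavirus` against whatever version you install:

```r
# Minimal sketch of using the coronavirus package. Column names are assumed
# from its documentation; verify with ?coronavirus for your installed version.
# install.packages("coronavirus")   # CRAN copy; the Github copy is more current
library(coronavirus)
data("coronavirus")
str(coronavirus)    # long format: one row per day, location, and case type
# e.g., total confirmed cases to date across all locations:
sum(coronavirus$cases[coronavirus$type == "confirmed"])
```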

There is also an open source article in Lancet about the capability.

### Estimating mortality rate

##### (update 2020-03-08, 15:57 EDT)

There’s a lot of good quantitative epidemiological work being done in and around the 2019-nCoV outbreak. I was drawn to this report by Professor Andrew Gelman, one of the co-authors of Stan, but the primary work was done by Dr Julien Riou and colleagues in a pre-print paper at medRxiv.

There’ll probably be a good deal more once people have some time to get into case data. I’ll keep this post updated with interesting ones I find.

##### (update 2020-03-12, 11:37 EDT)

New compilation site by Avi Schiffmann, a budding data scientist.

##### (update 2020-03-12, 13:26 EDT)

The Johns Hopkins data are also accessible at their Github site. These are updated every five minutes. And they are cloned in many places.

There is an alternative source available online, with data last updated 11 March 2020 (yesterday), from many contributors (list available online). This source’s map has much finer spatial resolution.

##### (update 2020-03-13, 11:49 EDT)

The New England Journal of Medicine has a full page with links about and concerning the Coronavirus (SARS-CoV-2) causing Covid-19, and it is fully open access.

Science magazine this week features four SARS-CoV-2-related articles:

##### (update 2020-03-13, 12:41 EDT)


## Curiosity’s recent view of Mars

“NASA Curiosity Project Scientist Ashwin Vasavada guides this tour of the rover’s view of the Martian surface.”

With a little imagination, feels like a de-vegetated version of the Northern Coastal Ranges of California, looking inland.

## via Code for causal inference: Interested in astronomical applications

## Reanalysis of business visits from deployments of a mobile phone app

This reports a reanalysis of data from the deployment of a mobile phone app, as reported in:

M. Yauck, L.-P. Rivest, G. Rothman, “Capture-recapture methods for data on the activation of applications on mobile phones“, Journal of the American Statistical Association, 2019, 114:525, 105-114, DOI: 10.1080/01621459.2018.1469991.

The article is as linked. There is supplementary information and most datasets are freely available.

The data set analyzed in the paper was provided by Ninth Decimals, 625 Ellis St., Ste. 301, Mountain View, CA 94043, a marketing platform using location data, as indicated in the documentation of the original paper.

This work is concerned with the analysis of marketing data on the activation of applications (apps) on mobile devices. Each application has a hashed identification number that is specific to the device on which it has been installed. This number can be registered by a platform at each activation of the application. Activations on the same device are linked together using the identification number. By focusing on activations that took place at a business location, one can create a capture-recapture dataset about devices, that is, users, that “visited” the business: the units are owners of mobile devices and the capture occasions are time intervals such as days. A unit is captured when she activates an application, provided that this activation is recorded by the platform providing the data. Statistical capture-recapture techniques can be applied to the app data to estimate the total number of users that visited the business over a time period, thereby providing an indirect estimate of foot traffic. This article argues that the robust design, a method for dealing with a nested mark-recapture experiment, can be used in this context. A new algorithm for estimating the parameters of a robust design with a fairly large number of capture occasions and a simple parametric bootstrap variance estimator are proposed. Moreover, new estimation methods and new theoretical results are introduced for a wider application of the robust design. This is used to analyze a dataset about the mobile devices that visited the auto-dealerships of a major auto brand in a U.S. metropolitan area over a period of 1 year and a half. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

The paper applies mark-recapture methods in a digital context, estimating the market size for a particular mobile phone application week by week, and offering a new means for doing so based upon estimating equations. The application to mobile phones was to estimate the total number of users visiting a business over a time period, an indirect measure of foot traffic.

My interest was to use the freely available data to check their overall estimates, but also to use it as the keystone of a semi-mathematical tutorial introducing mark-recapture methods, and illustrating modern non-parametric varieties of mark-recapture inference. In particular, the tutorial highlights little-known, or at least infrequently cited, work by R. Tanaka:

• R. Tanaka, “Estimation of Vole and Mouse Populations on Mt Ishizuchi and on the Uplands of Southern Shikoku“, Journal of Mammalogy, November 1951, 32(4), 450-458.
• R. Tanaka, “Safe approach to natural home ranges, as applied to the solution of edge effect subjects, using capture-recapture data in vole populations”, Proceedings of the 6th Vertebrate Pest Conference, 1974, Vertebrate Pest Conference Proceedings Collection, University of Nebraska, Lincoln.
• R. Tanaka, “Controversial problems in advanced research on estimating population densities of small rodents”, Researches on Population Ecology, 1980, 22, 1-67.

Seber cites Tanaka in:

G. A. F. Seber, The Estimation of Animal Abundance and Related Parameters, 2nd edition, 1982, Griffin, London, page 145.

The number of first encounters of phone apps, of which there are a total of 9316 distinct ones, looks like:

Only 1654 are observed two or more times during the 77-week experiment. For the total population seen, the minimum number seen per week was 50 and the maximum was 509. The profile of the number of first-observed apps, that first observation constituting a “marking”, looks like:

The Tanaka technique, reprised in the companion paper and extended there to populations with dramatically varying size and, separately, to populations with dramatically varying capture probabilities, determined that there is no basis for appreciable variation in the population size or capture probability during the experiment. Accordingly, its estimate of the total population is done as follows:

Because Yauck, Rivest, and Rothman were principally interested in the marketing-oriented determination of visits per week, they do not offer an estimate of the overall population, so their results are difficult to compare with the result from the Tanaka extension. However, the authors do offer an estimate for the first 20 weeks, assuming the degree of overlap among visits to the business per week is smaller for a smaller interval, and smaller still because, by all accounts, the number of deployed apps is small. This subset was also estimated by the Tanaka method, producing:

A tally of the 20-week estimates from Yauck, Rivest, and Rothman gives a total of 1191 for the closed-population model, 1202 for the “robust” model, and 1132 for the Jolly-Seber model. These compare with the Tanaka extension, which gives a population estimate of 2141, with a low estimate of 1896 and a high estimate of 2459.
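For readers new to mark-recapture, the flavor of such estimates can be seen in the classical two-occasion Lincoln-Petersen estimator (the numbers below are invented for illustration, not the paper's data):

```r
# Classical Lincoln-Petersen estimate for a closed population, two occasions.
# Toy numbers, not the paper's data.
n1 <- 400     # units captured ("marked") on occasion 1
n2 <- 380     # units captured on occasion 2
m2 <- 60      # units captured on both occasions
N.lp <- n1 * n2 / m2                              # Lincoln-Petersen estimate
N.chapman <- (n1 + 1) * (n2 + 1) / (m2 + 1) - 1   # Chapman's bias-corrected form
round(c(LP = N.lp, Chapman = N.chapman))
```

The robust design and the Tanaka extension elaborate this basic idea across many nested occasions, with allowance for varying capture probabilities.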

Reasons for discrepancies may vary. For one, Yauck, Rivest, and Rothman are not trying to estimate overall population, or at least they do not report these. Their overall profile of population per week, taken from their paper, is shown below:

But they also do report a seasonal variation in capture probability, as shown in the following chart from their work:

Such a variation could explain a shortfall. It remains a puzzle, though, why the segmentation of the Tanaka fit doesn’t acknowledge such variations in capture probability.

The implementation of the Tanaka extension is done using R code, with related files being available online. The segmentation is overseen using the facilities of the segmented package for R developed by Professor V. M. R. Muggeo.
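A minimal sketch of what a broken-line fit with that package looks like, on synthetic data with a known breakpoint (everything here is invented for illustration, not the mark-recapture fit itself):

```r
# Sketch of a segmented (broken-line) fit with the segmented package, on
# synthetic data having a single breakpoint at x = 30.
library(segmented)
set.seed(1)
x <- 1:60
y <- ifelse(x <= 30, 0.5 * x, 15 + 2 * (x - 30)) + rnorm(60, 0, 1)
base.fit <- lm(y ~ x)                                 # ordinary linear fit
seg.fit <- segmented(base.fit, seg.Z = ~x, psi = 20)  # psi: initial breakpoint guess
seg.fit$psi                                           # estimated breakpoint, near 30
```

The `segmented` call takes an ordinary `lm` object and estimates both the slopes and the breakpoint location jointly.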

If you think you have one or more problems which might benefit from this kind of insight and technique, be sure to reach out. You can find my contact information under the “Above” tab at the top of the blog. Or just leave a comment to this post below. All Comments are moderated and kept private until I review them, so if you’d prefer to keep the reach-out non-public, just say so, and I will.

And stay tuned for other blog posts. After mark-recapture, in the beginning of March May*, I’ll be showing how causal inference and techniques like propensity scoring can be used in scientific research as well as in policy and marketing assessments.

## Another kind of latent data: That encoded in journal and report figures

Many scholars today expect to find data as datasets. When I took some courses in Geology at Binghamton University, specifically in Tectonics and Paleomagnetism, I learned that libraries serve, in many cases, as Geologists’ repositories of data. No, the libraries hadn’t any servers or big RAIDed disks; they had books and journals. Geologists published maps and charts with contour lines, and both synthetic and actual images. But they most often preferred greyscale for the images. At first I did not appreciate why.

I later learned, as a graduate student, that this was because the way to get data out of a published paper was to take the graphics in the paper and digitize them. This is done even today, such as when I took relative visit histograms from Google Maps to inform a study of plastic versus paper bag usage. I have used Engauge Digitizer as my main tool. I once estimated the volume of Fenway Park in Boston using a plan and profile of it, and using Engauge Digitizer. But I’ve also just used Inkscape, pulling in an image and measuring positions of features on a grid of pixels.
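At bottom, manual digitization is just a linear calibration from pixel coordinates to data coordinates, fixed by two known points on each axis. A sketch, with all pixel readings invented:

```r
# Linear pixel-to-data calibration, as used when digitizing a published figure.
# Two known axis ticks per axis fix the map; pixel readings here are invented.
px.ref <- c(100, 900); x.ref <- c(0, 80)   # pixel x and data x of two known ticks
py.ref <- c(700, 100); y.ref <- c(0, 60)   # note image y typically runs downward
to.x <- function(px) x.ref[1] + (px - px.ref[1]) * diff(x.ref) / diff(px.ref)
to.y <- function(py) y.ref[1] + (py - py.ref[1]) * diff(y.ref) / diff(py.ref)
to.x(500)   # pixel x = 500 maps to data x = 40
to.y(400)   # pixel y = 400 maps to data y = 30
```

Log-scaled axes need the same map applied to the logarithms of the known tick values, with an exponentiation at the end.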

Often the data need to be checked for smudges and corrections. This is an experiment-within-an-experiment. But it differs little from when laboratory equipment is used, although there people tend to calibrate and recalibrate.

Accordingly it was refreshing to read Dr El-Ad David Amir’s piece on recovering heat map values from a figure, all the more because he took on the challenge of decoding a false color image, something the geologists eschewed.

Dr Amir also put together a YouTube video of his experience, linked below:

## “Because we need to make the science stick.”

H. Holden Thorp, writing in Science, an excerpt:

The scientific community needs to step out of its labs and support evidence-based decision-making in a much more public way. The good news is that over the past few years, scientists have increasingly engaged with the public and policy-makers on all levels, from participating in local science cafes, to contacting local representatives and protesting in the international March for Science in 2017 and 2018. Science and the American Association for the Advancement of Science (AAAS, the publisher of Science) will continue to advocate for science and its objective application to policy in the United States and around the world, but we too must do more.

Scientists must speak up. In June 2019, Patrick Gonzalez, the principal climate change scientist of the U.S. National Park Service, testified to Congress on the risks of climate change even after he was sent a cease-and-desist letter by the administration (which later agreed that he was free to testify as a private citizen). That’s the kind of gumption that deserves the attention of the greater scientific community. There are many more examples of folks leading federal agencies and working on science throughout the government. When their roles in promoting science to support decision-making are diminished, the scientific community needs to raise its voice in loud objection.

I would add that, from what I have seen, efforts to “remain objective and detached” from the public discourse, even when, objectively, an individual only has the public’s interest at heart, are nearly always met by derision and dismissal by people whose interests are challenged, and, increasingly, in at least the United States, by a public which detests scholarship and expertise. Accordingly, the only path left is speaking out.

And lest readers think this is only directed towards conservatives and Republicans, there are many instances where, say, environmental progressives have departed from evidence-based, scientific considerations and knowledge. Surely not regarding climate change (although the characterization of a cliff edge in 12 years or something is obviously just wrong), but on many aspects regarding plastics, the potential for afforestation, and how to implement large-scale climate change mitigation and what it will cost.

## Australia: Too little, too late, and what about the future?

Or, in other words, borrowing from a bookstore in Cobargo, New South Wales: “Post-Apocalyptic Fiction has been moved to Current Affairs.”

## Greta Thunberg on the BBC : Her guest edit

Also featuring:

• Svante Thunberg
• Sir David Attenborough
• Mark Carney
• Robert Del Naja
• Maarten Wetselaar

### Synopsis

Svante Thunberg and Greta speak to Sir David Attenborough for the first time. Also, outgoing Bank of England chief Mark Carney on how the financial sector can tackle climate change, Massive Attack’s Robert Del Naja on reducing the music industry’s carbon footprint, and Shell’s Maarten Wetselaar on big energy’s environmental impact.

Quoting Ms Thunberg, prompted by interviewer Mishal Husain:

MH: What would you say we should do as individuals? … What should other people do?

GT: Of course, I’m not telling anyone else that they need to stop flying or become vegan. Um, but if I were to give one advice, it would be to read up, to inform yourself about the actual science, about the situation, about what is being done and what is not being done. Because if you understand you will know what you can do yourself, and also, of course, to be an active democratic citizen, because democracy is not only on election day, it’s happening all the time. If people in general decided that this is enough, that would have to make the politicians and the people in power change their policies.

## Why Massachusetts needs the Transportation and Climate Initiative

The Massachusetts Transportation and Climate Initiative (TCI) or something very much like it, perhaps stronger, is needed because of one simple reason.

The false color heatmap below shows the Carbon Dioxide (CO2) emissions from roadways in Southern New England in 2017, based upon data from the NASA ORNL DAAC.

Period. We cannot get to the targets of the Massachusetts Global Warming Solutions Act (GWSA) by decarbonizing electricity production alone, and, with NIMBYism over putting up things like solar farms and land-based wind turbines, even that’s a stretch. Moreover, next will come decarbonizing heating and cooling. And fugitive emissions from natural gas pipelines still have not been addressed.

Vehicle emissions are part of the fossil fuel infrastructure.

## We shouldn’t forget where we are on the course towards climate disruption

We shouldn’t forget where we are on the course towards climate disruption. We shouldn’t forget we’ve already disrupted. Emissions are still increasing. This means it’s getting worse every year. It is not something which is in the future. It’s here now, and it will develop.

Professor Eric Rignot from 2014:

We have yet to apply the brakes.

## More reasons why centralized grids and ISOs/RTOs cannot be trusted, with an afterthought

From Inside Climate News and I’m sure it’ll eventually show up at Legal Planet, where they touched the matter over a year ago:

The new rules, approved by the Federal Energy Regulatory Commission, are designed to counteract state subsidies that support the growth of renewable energy and use of nuclear power. The rules involve what are known as “capacity markets,” where power plants bid to provide electricity to the grid. The change would require higher minimum bids for power plants that receive such subsidies, giving fossil fuel plants an advantage.

The FERC order, passed 2-1, is a response to complaints from operators of coal and natural gas power plants who say that state subsidies have led to unfair competition in the grid region managed by PJM Interconnection.

Richard Glick, the panel’s lone Democrat, cast the dissenting vote and said during the commission meeting that his Republican colleagues were trying to “stunt transition to a clean energy future that states are pursuing and consumers are pursuing.”

In his written dissent, he called the order “illegal, illogical and truly bad public policy.”

Continuing, the ICN report notes:

The Trump administration has taken other high-profile steps to try to boost the coal industry, but many of them are tied up in legal challenges. The new FERC order accomplishes many of the same goals.

But FERC’s action also is likely heading to court, where opponents will argue that the regulator has overstepped its authority and is now dictating state policy.

One issue going forward is that the order has a broad definition of “subsidy,” saying this includes direct or indirect payments, concessions and rebates, among other things. Glick said the definition is so broad that it may end up affecting many more power plants than the other commissioners intended.

In the meantime, PJM has 90 days to say how it will implement the rules, and power plant operators will need to figure out what this means for them.

Such authority would not exist if the grid were not centralized. In particular, if it were instead a loose aggregation of power islands or microgrids which had substantial authority to trade among themselves, political power would not be concentrated in organizations like PJM or, for that matter, ISO-NE or the FERC.

The economic consequences of artificial propping up of coal and natural gas are pretty straightforward: They make utility-scale zero Carbon generation more expensive, disincentivizing utilities from pursuing these options. There are other disincentives being mounted in the form of public pressure against, for example, Warren Buffett’s PacifiCorp electric utility owned by Berkshire Hathaway. There, in Wyoming, PacifiCorp has filed plans to move to wind and solar and shut down coal-fired electricity generators, raising the ire of Wyoming’s pro-coal governor. (Note I originally read this at The Financial Times and would love to link and credit them, but they have a restrictive paywall.) Specifically,

PacifiCorp this year accelerated plans to install wind turbines, solar panels and battery storage, while retiring coal-fired generators in the US west. The announcement was not received well in Wyoming, which mines 40 per cent of US coal.

It is interesting, too, that businessmen as astute as Mr Buffett and PacifiCorp’s CEO, Greg Abel, seem not in the least bit worried about intermittency, of which some diehard Carbon worshippers who defend utilities claim:

Wind and PV in large amounts are inherently unfit for purpose; they cannot supply energy as needed, nor can they decarbonize even an electric grid by themselves.

completely ignoring the reality of utility scale battery storage.

The effect, of course, will be to raise prices to consumers of electricity, something which, no doubt, as they have in Massachusetts, utilities will claim is the fault of zero Carbon upstarts. Indirectly that’s true, but only because, as with FERC, the fossil fuel worshippers cannot compete and, so, need their price floor increased to make renewables artificially expensive.

Getting generation on your own, if you own a residence and have the means, or building your own microgrid, if you are a major consumer of electricity, such as a manufacturing facility or a university campus, has a marginally higher return as a consequence of being a customer of PJM. I don’t doubt that, as a consequence, other capacity markets will be tempted to set higher floors.

This drives the dance of electricity generation and consumption in the direction of balkanization, which I’ve written about previously, and which seems to be the fate of the United States energy grid. In retrospect, how else could it be, with its collective over-optimization of measures of economic growth at the expense of other risks, risks accepted through its embrace of the costs of anarchy? (The price of anarchy has been studied extensively.)

And this is why, in part, Claire and I have configured our home as we have. We are presently participants in the local grid’s marketplace, following a rubric nearly shouted by a roundtable speaker and environmental advocate at a conference I once attended: “You should not hoard electrons”. But that is truly a value only when said grid respects you in turn. If, for economic or environmental reasons, it turns out we’re not respected so much, then to the degree that happens we have lots of options to minimize our participation and increase our hoarding. Yes, we are somewhat limited by the silly bylaws of the Town of Westwood where we live. (See Section 4.3.2.) But technology is flowing ever onwards, and there will be increasingly more options down the river, ranging from ever cheaper battery storage, to dynamic in-home digital management of electricity flows (fans don’t need high quality voltage and power), to the ability to draw power from our Tesla Model 3 back into the home, to ever more efficient solar PV panels.

This is a contest which PJM, carbon worshippers, social capital anarchists, and even FERC will lose, for economic and environmental reasons. PJM may have more coal plants. But to keep electricity inexpensive enough to support their agriculture and manufacturing, those players will either need to move, or they’ll need to microgrid, and the PJM network will have fewer customers over time.

#### Afterthought

There is proper concern regarding the relative disadvantage which people of color and people with low incomes have with respect to climate impacts and environmental harms. Setting aside scientific exaggerations such as quoted in the Vimeo link there, namely,

In a recent United Nations report, experts predict only 12 years remain to prevent unimaginable global devastation.

I’m no lukewarmer, but that’s just **scientifically wrong**.

But, as I said, setting that aside, much more needs to be done to provide greater equality and opportunity to reap the benefits of zero Carbon energy sources. Some of this can be had by subsidizing such energy for communities of color and low-income communities, as our Commonwealth has done. More can be had by insisting that communities which consume much electricity generated in dirty, centralized facilities, such as the generating facilities on the Mystic River, MA, reallocate some of their own public and other lands to the purpose of doing that generation in a clean manner. Otherwise they put the burden of dirty impacts upon these disadvantaged communities.

But, in my opinion, the role of relatively wealthier members of our community and region should not be minimized. As noted above, there are economic forces trying to reset the competitive landscape, and, being entrenched, vested, and engaging in regulatory capture, these are formidable. So while no one can expect low income people and many communities of color to fight back, people with means and purpose can do so, and it continues to be important to encourage them. That may or may not mean retaining subsidies. As implied above, abandonment of the grid would be accelerated if subsidies were withdrawn or if electricity prices directed at these adopters were increased. (In some utilities, such price increases have even been punitively targeted at solar adopters.) But I think that role should be appreciated and, in particular, it is not constructive to dismiss their and, frankly, our participation as unimportant merely because we can afford it.

## On odds of storms, and extreme precipitation

People talk about “thousand year storms”. Rather than storms having a recurrence time of once in a thousand years, these are storms which have a 0.001 chance per year of occurring. Storms aren’t the only weather events of significance which have probabilities of occurrence like these. Consider current precipitation risks for the Town of Westwood, Massachusetts, where I live:

I have highlighted events which have a 0.01 chance per year of occurring, including things like a rainfall of almost an inch in 5 minutes, or 8 inches in a day. Again, the recurrence time is not once in a 100 years. And, note, these are not based upon expected climate change, although there already is some change in the climate baked into these. These are current risks.

So what does 0.01 per year mean? Well, as Radley Horton explains in part below, think of it as rolling a dice (*) having 100 faces, and looking for the event “It rolled the number 10”.

Assuming the rolls are independent, calculating the likelihood of at least one of these events in N years is calculating the upper tail probability of a Binomial Distribution (Lesson 8, page 76).

##### (Figure is from Statistics How To.)

What’s that mean?

It means that, for each successive number of years, the chances of the event happening at least once is as in the following table:

| number of years | chance of event happening 1 or more times |
|:---:|:---:|
| 1 | 0.010 |
| 2 | 0.020 |
| 4 | 0.039 |
| 5 | 0.049 |
| 8 | 0.077 |
| 10 | 0.096 |
| 15 | 0.140 |
| 20 | 0.182 |
| 25 | 0.222 |
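The table above can be reproduced in a couple of lines. A minimal sketch in Python (not from the original post), using the complement rule for independent years:

```python
# Chance of an event with annual probability p happening at least once in n years.
# With independent years this is the upper tail of a Binomial(n, p) at zero:
#   P(at least one) = 1 - P(none) = 1 - (1 - p)^n
def chance_at_least_once(p, n):
    return 1.0 - (1.0 - p) ** n

p = 0.01  # the "1-in-100" event
for n in (1, 2, 4, 5, 8, 10, 15, 20, 25):
    print(n, round(chance_at_least_once(p, n), 3))
```

The rounded outputs match the table, e.g. 0.096 at 10 years and 0.222 at 25.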

So by the time 10 years roll by, the 0.01 event has an almost ten times greater chance of happening. If a stormwater management system in the Town of Westwood is effectively destroyed by an 8 inch rain, there’s a 1-in-10 chance of that happening in a 10 year stretch.

As climate changes, extreme precipitation events become more likely. If the 8 inch rain has a 0.01 chance per year now, it will soon have a 1-in-50 chance per year, or 0.02 per year. How does the risk table change for that?

| number of years | chance of event happening 1 or more times |
|:---:|:---:|
| 1 | 0.020 |
| 2 | 0.040 |
| 4 | 0.078 |
| 5 | 0.096 |
| 8 | 0.149 |
| 10 | 0.183 |
| 15 | 0.261 |
| 20 | 0.332 |
| 25 | 0.396 |

Unsurprisingly, that 1-in-10 chance now takes only 5 years to realize, and in 10 years there’s a slightly less than 1-in-5 chance of it happening. If the stormwater management exceedance costs $10 million to repair, that means, in the first case, an expected cost of $100,000 per year, or a million dollars over 10 years. When climate changes to the 1-in-50 event, these expected losses double.

In the case of weather events, occurrences may not be entirely independent. Events might “bunch up” due to ENSO or other influences. Similarly, a big volcanic explosion can affect global weather for a year or two, and depress probabilities of some weather events.
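The expected-loss arithmetic above is just linearity of expectation: the expected number of events in n independent years is n times p, multiplied by the repair cost. A quick check, using the hypothetical $10 million figure:

```python
# Expected loss over n years = p * cost * n, since the expected number of
# events in n independent years is n * p (linearity of expectation).
def expected_loss(p, cost, n_years):
    return p * cost * n_years

cost = 10_000_000  # the $10 million repair figure used above

print(expected_loss(0.01, cost, 1))   # 100000.0 per year at the 1-in-100 rate
print(expected_loss(0.01, cost, 10))  # 1000000.0 over a decade
print(expected_loss(0.02, cost, 10))  # 2000000.0: doubled under the 1-in-50 climate
```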

When estimating risks of events like these directly from data on occurrences, it’s important to note that Gaussian, or even Poisson, approximations will underestimate risk. What’s needed is a Generalized Extreme Value (GEV) distribution. Lee Fawcett explains in greater detail in his article, “A severe forecast”, in the December 2019 issue of Significance Magazine. A good book explaining use of the GEV distribution is:

• E. Castillo, A. S. Hadi, N. Balakrishnan, J. M. Sarabia, Extreme Value and Related Models with Applications in Engineering and Science, Wiley, 2005.

The R statistical programming language offers a number of packages for doing inference with this distribution.
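For illustration only (this is not from Fawcett’s article or the Castillo et al. book), here is how an N-year return level falls out of the GEV quantile function. The parameter values μ = 50, σ = 10, ξ = 0.1 are hypothetical, not fitted to any real precipitation record:

```python
import math

# GEV quantile (return level): for shape xi != 0, the level exceeded with
# annual probability p is
#     z_p = mu + (sigma / xi) * ((-log(1 - p))**(-xi) - 1)
# With xi > 0 the upper tail is heavy, which is why Gaussian or Poisson
# fits understate the risk of the most extreme events.
def gev_return_level(p, mu, sigma, xi):
    return mu + (sigma / xi) * ((-math.log(1.0 - p)) ** (-xi) - 1.0)

mu, sigma, xi = 50.0, 10.0, 0.1  # hypothetical location, scale, shape

for p in (0.1, 0.01, 0.001):  # the 1-in-10, 1-in-100, 1-in-1000 events
    print(p, round(gev_return_level(p, mu, sigma, xi), 2))
```

Note how the 1-in-1000 level sits disproportionately far above the 1-in-100 level; a thin-tailed fit to the same data would place the two much closer together.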

## There’s Big Data, Tiny Data, and now Dead Data

You’ve heard of Big Data. You may have heard of Tiny Data. But now, in the Harvard Data Science Review, Professor Steve Stigler presents Dead Data.

See:

• S. M. Stigler, “Data have a limited shelf life”, Harvard Data Science Review, November 2019.

Abstract

Data, unlike some wines, do not improve with age. The contrary view, that data are immortal, a view that may underlie the often-observed tendency to recycle old examples in texts and presentations, is illustrated with three classical examples and rebutted by further examination. Some general lessons for data science are noted, as well as some history of statistical worries about the effect of data selection on induction and related themes in recent histories of science.

Keywords: dead data, zombie data, post-selection inference, history

Of particular historical interest is whether or not modern scholars can ever properly interpret classic experiments, with their defects, like the Millikan oil drop experiment, or Eddington’s measurement of light deflection to confirm General Relativity.

Also of interest is whether enough metadata about old datasets in business, such as insurance or operations, or even scientific observation, is kept to be able to properly reconstruct the provenance.

Hat tip to Professor Christian Robert for pointing out this article at his blog.

## JIGSAW-GEO v1.0

See:

• D. Engwirda, 2017: JIGSAW-GEO (1.0): Locally orthogonal staggered unstructured grid generation for general circulation modelling on the sphere, Geosci. Model Dev., 10, 2117-2140, doi:10.5194/gmd-10-2117-2017

and a general description at NASA. The figure below is copied from there.

## Applause!

Let it be said, apart from his so-called base, 45 is not a popular guy. Even his bud, Boris Johnson, is making moves to avoid his endorsement.

Yeah, that’s a popular, well respected guy.

Isn’t he?

## What you need to do

Yes, I know, this is from Orsted, a public company which, primarily, builds offshore wind farms. And, as a result, you out there (which is, frankly, an infinitesimal fraction of the world, because, basically, no one follows me) will critique me for promoting a specific company.

Bollocks.

Think of it.

Someone has a good idea. They pursue it. They promote it. They find a way of moving it into people’s lives. Great. What do they do? They found a company which has that as its purpose.

But, oh no, say the Environmental Purists, this is now “corporate greed” and we can’t have anything to do with that. It’s not us!

So, given a group of folks, the Environmental Purists, who want to advance a cause but then deny the means of achieving it: they are either masochists, or they eternally want to be guaranteed an opposition to fight, one they can never dominate and win.

I’m sick of this nonsense, whether it be Sierra Club or Extinction Rebellion. I want answers and programs, not sham policies which hijack the hugely important issue of climate disruption to achieve long sought social objectives. I do not say the latter aren’t important. I say holding the rest of society to ransom for those objectives is cruel, heartless, uncharitable, and downright stupid.

## A glimpse of Solar Domination

Hat tip to PV Magazine:

### Highlights of Frew, Cole, Denholm, Frazier, Vincent, Margolis

• Load and operating reserves can be met in US grid with up to 55% PV with storage
• Power system must rapidly transition between synchronous and inverter-based generation
• Significant curtailment is seen, with hours of >40% economic curtailment
• Hours with very low energy prices become more frequent, up to 36% of hours

#### Summary

With rapid declines in solar photovoltaic (PV) and energy storage costs, futures with PV penetrations approaching or exceeding 50% of total annual US generation are becoming conceivable. The operational merits of such a national-scale system have not been evaluated sufficiently. Here, we analyze in detail the operational impacts of a future US power system with very high annual levels of PV (>50%) with storage. We show that load and operating reserve requirements can be met for all hours while considering key generator operational constraints. Storage plays an active role in maintaining the balance of supply and demand during sunset hours. Under the highest PV penetration scenario, hours with >90% PV penetration are relatively common, which require rapid transitions between predominately conventional synchronous generation and mostly inverter-based generation. We observe hours with almost 400 GW (over 40%) of economic curtailment and frequent (up to 36%) hours with very low energy prices.

## On why I write this blog

I mused a bit about why I write this blog here.