A few days ago, I wrote a post about political and demographic associations with changes in COVID-19 rates across all U.S. counties. Today, I’m augmenting that. Here, rather than considering all counties, I limited the study to counties with demonstrably large increases or decreases in COVID-19 prevalence.
Rather than working with all 3083 counties, this analysis uses 742 of them. Essentially, a county was kept only if its calculated response exceeded 7.02 in magnitude, that is, |response| > 7.02.
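The selection rule can be sketched as a simple filter. This is a minimal illustration, not the author's code: it assumes the per-county response has already been computed (as in the earlier post) and stored in a dictionary keyed by a county identifier. The name `large_change_counties` and the dictionary layout are hypothetical.

```python
from typing import Dict

# Magnitude threshold quoted in the post.
CUTOFF = 7.02


def large_change_counties(responses: Dict[str, float],
                          cutoff: float = CUTOFF) -> Dict[str, float]:
    """Keep only counties whose response exceeds the cutoff in absolute value.

    `responses` is a hypothetical mapping from county identifier to the
    already-computed response. "Exceed" is read as strictly greater, so a
    response of exactly +/-7.02 is dropped.
    """
    return {county: r for county, r in responses.items() if abs(r) > cutoff}
```

Using the absolute value keeps both large increases and large decreases, which is the point of the restriction.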
Two regressions were repeated. The first was a standard linear regression of the response against all the remaining predictors. The improvement in R² compared with the linear regression on the complete dataset was remarkable: adjusted R² was 0.47 rather than 0.16. The second was the random forest regression. Here, too, the improvement in R² was substantial: 0.71 versus 0.54 previously.
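For reference, adjusted R² corrects ordinary R² for the number of predictors relative to the number of observations. The sketch below uses the standard textbook definitions; it is an illustration of the formula, not a reproduction of the post's fits. Here `n` would be the number of counties and `p` the number of predictors (excluding the intercept); the helper names are hypothetical.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Ordinary R^2 = 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    # Penalizes R^2 in proportion to how many predictors are used per observation.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

For a fixed `p`, the penalty grows as `n` shrinks, so a higher adjusted R² on the smaller subset is not an artifact of the adjustment itself.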
Here’s the result of the linear regression:
And while the number of variables selected in the importance screening did not change, the set of those having a consistent (monotonic) effect did. Those contributing to an increase in COVID-19 became:
Those contributing to a decrease in COVID-19 became:
For contrast, note that the random forest regression based on all the counties had these variables contributing to an increase:
The random forest regression based on all the counties had these variables contributing to a decrease in COVID-19 prevalence:
The current choices make more sense, in addition to being more statistically notable because of the increased R². But an increase in Republican Senate and House support associated with a decrease in COVID-19? Hmmmm.
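The post does not say exactly how a "consistent (monotonic) effect" was judged. One common approach, assumed here, is to compute a variable's partial dependence curve from the fitted model and check whether it only rises or only falls across a grid of values. In this sketch `predict` stands in for any fitted model's prediction function, and `partial_dependence` and `monotonic_direction` are hypothetical helpers, not functions from a particular library.

```python
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(predict: Callable[[Row], float], rows: Sequence[Row],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Average prediction as `feature` is swept over `grid`.

    All other features keep their observed values in each row.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def monotonic_direction(curve: Sequence[float], tol: float = 0.0) -> str:
    """Classify a curve as 'increase', 'decrease', or 'non-monotonic'."""
    diffs = [b - a for a, b in zip(curve, curve[1:])]
    # Require at least one real change so a flat curve is not called monotonic.
    if all(d >= -tol for d in diffs) and any(d > tol for d in diffs):
        return "increase"
    if all(d <= tol for d in diffs) and any(d < -tol for d in diffs):
        return "decrease"
    return "non-monotonic"
```

A variable classified as "increase" would land in the list of those contributing to an increase in COVID-19, and likewise for "decrease"; non-monotonic variables would be set aside even if they ranked high in importance.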