A few days ago, I wrote a post about political and demographic associations with changes in COVID-19 rates over all U.S. counties. Today, I’m augmenting that. Here, rather than considering all counties, I limited the study to counties with demonstrably large increases or decreases in COVID-19 prevalence.
Rather than working with 3083 counties, this works with 742 of them. Essentially, the cutoff was that the calculated response had to exceed 7.02 in magnitude.
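As a rough illustration of that cutoff, here is a minimal Python sketch of keeping only the counties whose response exceeds 7.02 in magnitude. The county names and response values are invented for the example, and `select_large_changes` is a hypothetical helper, not the code used for the study.

```python
# Hypothetical county records: (county name, response value).
# Names and numbers are invented for illustration only.
counties = [
    ("County A", 12.4),
    ("County B", -9.1),
    ("County C", 3.3),
    ("County D", -7.02),
    ("County E", 7.5),
]

CUTOFF = 7.02  # magnitude threshold quoted in the post


def select_large_changes(records, cutoff=CUTOFF):
    """Keep records whose response strictly exceeds the cutoff in magnitude."""
    return [(name, resp) for name, resp in records if abs(resp) > cutoff]


selected = select_large_changes(counties)
for name, resp in selected:
    print(name, resp)
```

Both large increases and large decreases survive the filter, since the test is on the absolute value of the response.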
Two regressions were repeated. The first was a standard linear regression of response versus all the remaining predictors. The improvement in R² compared with the linear regression done with the complete dataset was remarkable: adjusted R² was 0.47 rather than 0.16. The second was the random forest regression. Here, too, the improvement in R² was substantial: 0.71 versus 0.54 previously.
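For readers unfamiliar with the adjusted figure quoted for the linear regression, here is a small sketch of the usual adjustment, which penalizes R² for the number of predictors. The observation count of 742 comes from the post. The predictor count of 20 and the raw R² of 0.50 are placeholders assumed for the example, not values from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# n_obs = 742 is from the post; the other two arguments are assumed placeholders.
example = adjusted_r2(0.50, 742, 20)
print(round(example, 3))
```

With several hundred counties and a modest number of predictors the adjustment is small, so the gap between 0.16 and 0.47 cannot be explained by the change in sample size alone.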
Here’s the result of the linear regression:
And, while the number of variables selected in the importance screening did not change, the ones having a consistent (monotonic) effect did. Those contributing to an increase in COVID-19 became:
- otherpres16
- otherhouse16
- hispanic_pct
- PerCapitaDollars
- PctChgFrom2017
Those contributing to a decrease in COVID-19 became:
- repsen16
- rephouse16
- black_pct
- clf_unemploy_pct
- lesshs_pct
- lesshs_whites_pct
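The post does not spell out how a "consistent (monotonic) effect" was decided. One plausible reading is that the predicted response, traced along a grid of values for a single predictor, moves in only one direction. The sketch below checks that property. The function name `monotonic_direction` and the sample curves are assumptions made for illustration.

```python
def monotonic_direction(values, tol=0.0):
    """Classify a sequence of predicted responses along a predictor grid.

    Returns "increase" if the sequence never falls, "decrease" if it never
    rises, "flat" if it does neither, and "mixed" if it does both.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    never_falls = all(d >= -tol for d in diffs)
    never_rises = all(d <= tol for d in diffs)
    if never_falls and never_rises:
        return "flat"
    if never_falls:
        return "increase"
    if never_rises:
        return "decrease"
    return "mixed"


# Invented curves, one per hypothetical predictor.
print(monotonic_direction([1.0, 1.5, 1.5, 2.2]))
print(monotonic_direction([3.0, 2.0, 1.0]))
print(monotonic_direction([1.0, 2.0, 1.5]))
```

Under this reading, a predictor would land in the "increase" or "decrease" list only when its curve gets the first or second label, and predictors with "mixed" curves would be left out of both lists.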
Note, for contrast, that the random forests regression based upon all the counties had these variables contributing to an increase:
- otherpres16
- otherhouse16
- hispanic_pct
- PerCapitaDollars
- PctChgFrom2017
The random forests regression based upon all the counties had these variables contributing to a decrease in COVID-19 prevalence:
- demhouse16
- black_pct
- clf_unemploy_pct
- lesshs_pct
- lesshs_whites_pct
- trump.obama.ratio
The current choices make more sense, in addition to being more statistically notable because of the increased R². But an increase in Republican Senate and House support associated with a decrease in COVID-19? Hmmmm.