Updated, 2020-08-31, 13:30 EDT
A bit of an apology … I was working to produce a couple more blog posts and had hoped to have one out, reporting on the uncertainty clouds for the phase plane plots of COVID-19 deaths I had done earlier, but the Moirai intervened.
My 10-year-old HP Pavilion p6740f Windows 7 Home Premium system has gone dark. I won’t recount the travails of the system in its death throes. I worked essentially all the business hours of two weeks trying to bring it back. It was a software death, but it did not directly have anything to do with a virus (hint: it began because the software update of an anti-virus and firewall product went badly awry), and the recovery looked promising for a time. But then a side effect of one repair caused driver complications, and fixing those left the system unbootable.
There is a replacement on order, but it won’t be here until 23rd September 2020. It’s a custom-built Dell Precision 3630 Tower with an Intel Core i7-9700K, 32 GB of RAM, a 1 TB SSD, and an NVIDIA Quadro RTX 4000 with 8 GB.
I will be running Windows 10 64-bit, and mostly R.
So, apologies for the delay.
Meanwhile, I’m devoting the time to long overdue tooling through textbooks I’ve been wanting to get to. I may post a write-up about an idea I got during the Data Science for COVID-19 Conference I attended yesterday. I need to develop it a bit first, though.
Update, 2020-08-31, 13:30 EDT
Since the above is a discussion of Which Machine/Which OS, I thought it appropriate to include a response I gave to an R mailing list when someone asked:
I need a new computer. I have a friend who is convinced that I have an aura about me that just kills electronic devices. Does anyone out there have an opinion about Windows vs. Linux?
Here’s what I said. It records my experience.
This ends up being a pretty personal decision, but here's my advice.
I have used Windows of various flavors, and Linux in a couple of versions. I have also used four or five Unixen, in addition to Linux. I've never spent a lot of time using a Mac, although in many instances most of my colleagues at companies have. It's invariably a cubicle-like environment, so when they have problems, you know. I also have a Chromebook, which is what I am using to write this while awaiting the arrival of the new Windows 10 system.
I have used R heavily on both Windows and Linux. On Linux I used it on my desktop, and I still use it on various large servers, now via RStudio, previously from the shell. In the case of the servers, I don't have to maintain them, although I sometimes need to put up with the peculiarities of their being maintained by others. (I rarely have sudo access, and sometimes someone has to install something for me, or help me install an R package, because the configuration of libraries on the server isn't quite what R expects.)
My experience with Linux desktops is that they seem fine initially, but then, inevitably, one day you need to upgrade to the next version of Ubuntu or whatever, and, for me, that's when the hell begins. The last two times I did it, even with the help of co-workers, it was so problematic that I turned the desktop in and stopped using Linux.
Prior to my last Linux version, I also seemed to need to spend an increasingly large amount of time doing maintenance and moving things around ... I ran out of R library space once and had to move the entire installation elsewhere. I did, but it took literally two days to figure it out.
Yes, if Linux runs out of physical store -- a moment which isn't always predictable -- R freezes. Memory is of course an issue with Windows too, but it simply does what, in my opinion, any modern system does and pages out to virtual memory, up to some limit of course. (I always begin my Windows R workspaces with 16 GB of RAM, and have expanded to 40 GB at times.) I have just purchased a new Windows 10 system; I was going to get 64 GB of RAM but, for economy, settled on 32 GB. (I'm semi-retired as well.) My practice on the old Windows 7 system (with 16 GB RAM) was to buy a 256 GB SSD and put the paging file there. That's not quite as good as RAM, but it's much better than a mechanical magnetic drive. The new Windows 10 system has a 1 TB SSD. I may move the old 256 GB SSD over to the new system as a side store, but will need to observe system cooling limits. The new system has an 8-core Intel i7.
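For what it's worth, here is roughly how I keep an eye on that memory ceiling from inside R on Windows. This is only a sketch: memory.limit() and memory.size() are Windows-only, and the numbers are illustrative rather than recommendations.

    # Windows-only helpers in the R versions current as I write this.
    memory.size()              # MB currently used by this R session
    memory.limit()             # current ceiling in MB, e.g. roughly 16000 on a 16 GB box

    # Raise the ceiling so a large workspace can spill over to the paging file,
    # which on my setup lives on an SSD.  The size argument is in MB.
    memory.limit(size = 40000)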
Windows updates are a pain, mostly because they almost always involve a reboot. I *loved* using my Windows 7 past end of support because there were no updates. I always found Windows Office programs to be incredibly annoying, tolerating them because if you exchange documents with the rest of the world, some appreciable fraction will be Word and Excel spreadsheets. That said, I got rid of all my official Microsoft Office and moved to Open Office, which is fine. I also primarily use LaTeX and MikTeX for my own documents authored, and often use R to generate tables and other things for including in the LaTeX.
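For example, this is the sort of thing I mean by generating tables from R for LaTeX. It is just a sketch: the data frame, caption, and output file name are made up, and xtable is only one of several packages that will do this.

    library(xtable)   # one of several packages that emit LaTeX tables

    # A toy summary; in practice this would come out of a real analysis.
    tab <- data.frame(method = c("A", "B"),
                      rmse   = c(0.42, 0.37))

    # Write a LaTeX tabular environment to a file the document can \input{}.
    print(xtable(tab, caption = "Example summary", digits = 3),
          include.rownames = FALSE, file = "summary_table.tex")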
On the other hand, when using Linux, ultimately YOU are responsible for keeping your libraries and everything else updated. When R updates, and packages need to be updated too, the update mechanism on Linux is recompiling from source. You sometimes need to do that on Windows, and Rtools gives you the means, but generally packages come in binary form. This means they are independent of the particular configuration of libraries you have on your system. That's great in my opinion. And easy. Occasionally you'll find an R package which is source-only and for some reason doesn't build with Rtools. Then you are sometimes out of luck, or need to run the source version of the package, if it's supported, which can be slow. Sometimes, but rarely, source versions aren't supported. I have also found in server environments that administrators are sometimes sloppy about keeping their gcc and other tools updated. So at times I couldn't compile R packages because the admin on the server had an out-of-date gcc which produced a buggy build.
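To make the binary-versus-source distinction concrete, here is a small sketch of what it looks like from the R prompt on Windows. The package names are only examples, and pkgbuild::has_rtools() is just one convenient way to check whether a usable toolchain is present.

    # On Windows, CRAN packages normally arrive as pre-built binaries:
    install.packages("data.table")

    # Building from source requires the Rtools toolchain to be installed;
    # pkgbuild::has_rtools() reports whether a usable one was found.
    install.packages("pkgbuild")
    pkgbuild::has_rtools()

    # A source-only package has to be compiled locally:
    install.packages("somePkg", type = "source")   # "somePkg" is a placeholder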
Whether Linux or Windows, I often use multi-core for the Monte Carlo calculations I run, whether bootstraps, random forests, or MCMC. I have used JAGS quite a lot but I don't believe it supports multi-core (unless something has changed recently). I use MCMCpack and others.
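As an illustration of the multi-core Monte Carlo I mean, here is a minimal bootstrap using the boot package. It is a sketch only: the data, the statistic, and the replicate count are placeholders, and on Windows boot() uses a "snow"-style cluster rather than forking.

    library(boot)
    library(parallel)

    # Bootstrap the mean of a toy sample across several cores.
    x <- rnorm(1e4)
    meanStat <- function(d, idx) mean(d[idx])

    nCores <- max(1, detectCores() - 1)
    b <- boot(x, statistic = meanStat, R = 10000,
              parallel = "snow", ncpus = nCores)
    boot.ci(b, type = "perc")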
Media support on Windows is much better than on Linux. (At least Ubuntu now *has* some.) And it is work to keep Linux media support properly updated. Still, I don't use Windows Media Player, preferring VLC.
And there are a wealth of programs and software available for Windows.
No doubt, you need a good anti-virus and a good firewall. (Heck, I have that on my Google Pixel 2, too.) I'm moving to the McAfee subscription my wife has for other systems in the house.
Note, while R is my primary computational world, by far, I do run Anaconda Python 3 from time to time. It can be useful for preparing data for consumption by R, given raw files, many with glitches and mistakes. But with the data.table package and others in R, I'm finding that's less and less true. The biggest headache with Python is that you need to keep its libraries updated. I have also used Python at times just to access Matplotlib. I prefer R, though, because, like MATLAB, its numerics are better than Python's NumPy and SciPy.
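Here is a small, hypothetical example of the kind of cleanup I mean with data.table. The file name, the missing-value codes, and the column name are made up.

    library(data.table)

    # "raw_export.csv" stands in for a messy raw extract: odd missing-value
    # codes, numbers stored as text, stray whitespace, the occasional short row.
    dt <- fread("raw_export.csv",
                na.strings = c("", "NA", "N/A", "-999"),  # normalize missing codes
                strip.white = TRUE,
                fill = TRUE)                              # tolerate short rows

    # Coerce a column that arrived as character into numeric, keeping NAs.
    dt[, reading := as.numeric(reading)]

    # Drop exact duplicate records.
    dt <- unique(dt)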
As I said, I don't know the Mac at all well. But I do know that, when a new Mac OS version was released, somehow the colleagues around me would often descend into a couple of days of grumbling and meeting with each other about how they got past or around some stumbling point when updating their systems. Otherwise people seem to like them a lot.
I think all operating systems are deals with the Devil. It's what you put up with and deal with.
As you can see, I opted to go the Windows route again, for probably the next 10 years.
YMMV.
@anoilman,
It would be helpful to know what precisely you find worth criticizing. But, as a guess, I suspect you don't find the computational stack impressive.
If so, I might opine that, as I share with my younger data science colleagues who often want to throw hardware and algorithms at mounds of data, the longer I work with statistical problems, the more I find this “rage for the large” to be a fad, and unhelpful. When faced with a problem and a large data set, I find it far more important to decide which key data are important to sample from the large set, and which key models and inferences need to be calibrated using them. It is also true that the more data there is, the greater the burden of cleaning and normalizing it.
So, for instance, when faced with a 30 GB dataset, I picked out a key 20 MB subset and based inferences upon that. To realize this in a semi-production environment (what ML things are truly “production”?), the sampling and cleaning need to be deployed, followed by the inference.
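One way to realize the sampling step, sketched below, is reservoir sampling over a text connection, so the full file never has to fit in memory. The file name, subset size, and chunk size are placeholders.

    # Keep a fixed-size random sample of lines from a file too large to load whole.
    sampleLines <- function(path, k = 1e5, chunk = 1e5) {
      con <- file(path, open = "r")
      on.exit(close(con))
      header <- readLines(con, n = 1)          # keep the CSV header separately
      reservoir <- character(0)
      seen <- 0
      repeat {
        lines <- readLines(con, n = chunk)
        if (length(lines) == 0) break
        for (ln in lines) {
          seen <- seen + 1
          if (length(reservoir) < k) {
            reservoir[seen] <- ln              # fill the reservoir first
          } else {
            j <- sample.int(seen, 1)           # classic reservoir replacement
            if (j <= k) reservoir[j] <- ln
          }
        }
      }
      c(header, reservoir)
    }

    # subset <- sampleLines("big_data.csv", k = 1e5)
    # writeLines(subset, "key_subset.csv")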
Until that can be done, I don’t think the statistician, data scientist, or their team understands the problem well enough to be helpful to their group or employer or project.
This also means, of course, that the computational requirements are that much less.
If the criticism is of R, its capability in the parallel package now allows nearly effortless allocation of pieces of a computation over cores. That is in addition to the many pre-compiled packages which allocate work to multiple cores upon request, whether data.table, ranger, boot, or the several MCMC schemes. Working at the high-level, reproducible, open-source end prevents a lot of mistakes, in contrast with running a C++ maven's code on big iron.
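By "nearly effortless" I mean something like the sketch below. The simulated task is a stand-in for one real replicate, and the core count and seed are arbitrary.

    library(parallel)

    # Split a batch of Monte Carlo replicates across a local PSOCK cluster,
    # which works the same way on Windows and Linux.
    simOnce <- function(i) mean(rnorm(1e5))    # placeholder for one real replicate

    cl <- makeCluster(max(1, detectCores() - 1))
    clusterSetRNGStream(cl, iseed = 20200831)  # reproducible parallel RNG
    out <- parLapply(cl, seq_len(1000), simOnce)
    stopCluster(cl)

    summary(unlist(out))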
Dell… Custom… to run R… Don’t make me laugh! 🙂