A clean environment is rather “hard work” to maintain
I always worry that installing stuff on my laptop will clutter up my machine. I don’t like re-installing (not if that means using up my own spare time). So I try to install as little as possible (it’s also simply a matter of “reducing the attack surface”, my reasoning being: the less software you run, the fewer vulnerabilities your machine presents).
Anyhow, some years ago, I would run Oracle VirtualBox on top of my Linux box. Not bad at all back then, and I’d still recommend it (or VMware, whichever) to run the stuff you need and still keep your laptop as clean as possible.
Now I know, yes: It’s less “optimal” in terms of resources. Of course. Running a virtualization platform adds to the pile. That’s definitely a trade-off.
But the good news is this: If you wanted to update your VM, you could, and that wouldn’t break your machine. (Once again, a trade-off: you still had to upgrade the virtualization platform of choice. No arguing about it…).
To the point
So, “along came Docker”. And I only got around to using it in 2016 or 2017. For people in my industry, yes, a bit late, I know.
Whatever the excuse might be, I am now a believer.
And a few months ago, I thought: Let’s move my R environments to Docker. I mean, why not? R is interpreted, “mono-core” (single-threaded by default), and certainly not the fastest option anyway. RAM is arguably the more important resource, and a VM would eat up more of it than Docker. Plus, I wanted to play some more with an actual “Dockerfile”.
On the bad side, the desktop environment would be an issue. But I had some experience with that already:
So I installed Docker Desktop and then worked my way to setting up a “working” RStudio Server from a Dockerfile.
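For reference, the Dockerfile for such a setup can stay very small. Here is a minimal sketch of the idea, not my exact file: it assumes the community rocker/rstudio base image and a hypothetical packages.txt listing the CRAN packages to install, one per line.

```dockerfile
# Minimal sketch: RStudio Server plus my usual packages.
# rocker/rstudio already ships R and RStudio Server.
FROM rocker/rstudio:latest

# Hypothetical text file listing the CRAN packages I want, one per line.
COPY packages.txt /tmp/packages.txt

# Install everything listed in the file from CRAN.
RUN R -e "install.packages(readLines('/tmp/packages.txt'), repos = 'https://cloud.r-project.org')"
```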
The result (which I shared here) is quite slow to get to a working setup (and it still has some quirks and warnings; I haven’t gone all the way yet), but it is very EASY and REPEATABLE.
In that instance, the first run takes about two hours (on my laptop). That is slow, but it is NOT relevant! See, all you need to do to use it is run one command line. Then you go have lunch, go to the gym, whatever. Or you launch the script at night and wake up to a clean environment.
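For the curious, the “one command line” (or rather, the one short script) I have in mind looks something like this. It is a sketch with made-up names, password and paths, not my exact script:

```sh
#!/bin/sh
# Hypothetical rebuild script: names, password and paths are placeholders.
docker rm -f rstudio 2>/dev/null || true   # get rid of the earlier copy of the container, if any
docker build -t my-rstudio .               # rebuild the image from the Dockerfile (the slow part)
docker run -d --name rstudio \
  -p 8787:8787 \
  -e PASSWORD=changeme \
  -v "$HOME/projects":/home/rstudio/projects \
  my-rstudio
# RStudio Server is then reachable at http://localhost:8787,
# with the code living on the host under ~/projects.
```

The -v bit is also what keeps my code on the host rather than inside the container (more on that below).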
Why even bother
Well, for a few reasons:
- Security: How do you keep your environment up to date? You either patch (and then patch some more), or you reinstall from scratch. I like the latter approach better: nothing left behind in obscure places and libraries; just reinstall and you’re good to go.
- Ease: Running one command line (well, one script, to get rid of the earlier copy of the container first), albeit slow in machine time, is very efficient in terms of “my personal time”.
- Because if I come across a new library for my R setup that I know I’m going to use again, I can just throw it in by adding the package and its dependencies to a couple of text files (a minimal sketch of that idea follows this list). That goes a bit against the next point, but it actually makes day-to-day work faster.
- I force myself to separate my code (on my host machine) from the running environment (in the container). That forces me to be organized.
- Last, but not least: I can back up the whole setup by copying only the code, which is very inexpensive in terms of space (a few KB, maybe) compared to backing up my whole laptop or a complete VM (yes, you can do incremental backups, but if you lose your laptop, you’ll have to set things up from scratch anyway).
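About those package text files: the idea, roughly, is to only install what isn’t already there, so adding one line to the file stays cheap. A minimal sketch, with a hypothetical file name rather than my actual setup:

```r
# Hypothetical helper run inside the container: install only the packages
# listed in packages.txt that are not already installed.
pkgs    <- readLines("packages.txt")
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) {
  install.packages(missing, repos = "https://cloud.r-project.org")
}
```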
Whatever the rationalisation, I went through it mostly for self-education. 🙂
Finally
The whole “Docker Desktop for RStudio” exercise was interesting, and rather easy (I had some basic previous experience). But most importantly, I can now reset my whole programming environment by running a very short script, and I am quite happy to know that at any point in time I can have an up-to-date OS, R version, libraries, etc. to work with. I can eventually switch laptops with limited pain. And my backups are small. The downside (speed) really isn’t that relevant (I can make much more of a difference by programming better :S).
But a new question arises: What if a script that used to work stops working when I run it again with newer libraries?
That’s something I know I’ll have to work on (among so many other ideas). I will prepare something and share it at some point. For now, everything seems to point toward the R package testthat (which, coincidentally, H. Wickham mentioned recently on Twitter; I knew nothing about it two weeks ago :S).
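To give an idea, here is a toy example of the kind of check testthat makes possible (hypothetical function and values, purely illustrative): once a script is covered by tests like this, I can re-run them after each rebuild and see immediately whether newer libraries changed any behaviour I rely on.

```r
library(testthat)

# Hypothetical function from one of my scripts.
monthly_mean <- function(x) mean(x, na.rm = TRUE)

# Re-running this after a rebuild tells me whether newer libraries
# changed the behaviour I rely on.
test_that("monthly_mean still behaves as expected", {
  expect_equal(monthly_mean(c(1, 2, 3, NA)), 2)
  expect_equal(monthly_mean(c(10, 20)), 15)
})
```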
But that will make for another entry.