Project Log: Day 25 – Back to work


Alright, so a few days back, I ran some numbers and concluded that… this thing, as it stands, is somewhat CPU-intensive.

I could (maybe I should) look into making the core component of the simulation run on a GPU. But this is a personal project, so instead I want to scale OUT, to… MORE CPUs. It might not be the best choice, but hey…

So I’m right at the limit of my laptop’s available CPUs (and I would love to buy myself a new MacBook Pro M3 Max… it would even make this easier, but it’s a bit above budget, unless I could justify the ROI… which I can’t, really). But I have a good ol’ mini PC “home server” with 4 more CPU cores that sees little use, 3 of which I can probably spare to run my simulations. That’s that. (The home server, like my personal laptop, is passively cooled, so it’s NOT meant for long, heavy processing, but for tests it will do.)

And so an idea would be to “productize” the simulator: if I can get my code to run, say, on Docker containers (it should be feasible), I can then probably “expose” another computing space (i.e. my small home server) and run a few of my simulations in PARALLEL (say, for a subset of the parameters), with PlumbeR.

That would be: C++ for the speed of a single iteration, parallel computing on the available local CPU cores, AND distribution across more (even remote) CPUs… (and the overhead of network use in this new setup should be next to irrelevant). PLUS: in the short term, I can test the setup on my laptop (I’m “remote” for a few days), so that I can do the actual deployment across different machines later.

The good news is: this falls squarely within the concept of High Performance Computing, which is among the topics covered (and evaluated) in my Master’s, so this is valuable work for my Dissertation… IF I want to make it part of the project…

New Docker Image

Up until a few months back, I was using an RStudio Docker image that was ARM64v8-compatible (i.e. the Apple M family of chips). But because of some limitations (some issues with Apache Arrow) I fell back to running RStudio directly on the MacBook, and since then I haven’t really used my RStudio container at all.

But what’s more, I shouldn’t… need it anymore. Not for the Project, that is. My code should run “from script”, no UI needed; in theory, I should be fine with just the Linux shell… And visualizations cost almost nothing (in the grand scheme of processing efforts), so I can do those from, say, my “control center”, the RStudio on my Mac, basically with no impact.

Now this Docker image here gets apparently regular updates: https://hub.docker.com/_/r-base

So I’ll try that.

(… 10′ pass) –> It works: I can run R and install packages on it (tested ggplot2, curiously the first package that came to mind…). It works just fine on Linux (x86) too (the beauty of Docker containers…). It’s a far cry from all I need, but it’s a promising start… (Running Docker containers – with or without R – on the M1 was not that easy three years back or so, so this is progress, thanks to the Community!!)
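Looking ahead, a minimal worker image on top of r-base might look something like the Dockerfile sketch below (the plumber.R file name, the port, and the install line are assumptions, not something I’ve built yet):

```dockerfile
# Sketch only: a simulation-worker image based on the official r-base image.
FROM r-base:latest

# Install plumber for the API layer (plus whatever the simulator needs).
RUN R -e "install.packages('plumber', repos = 'https://cloud.r-project.org')"

# Copy the (hypothetical) API definition into the image.
COPY plumber.R /app/plumber.R
WORKDIR /app

# Serve the API on container start.
EXPOSE 8000
CMD ["R", "-e", "plumber::plumb('plumber.R')$run(host = '0.0.0.0', port = 8000)"]
```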

Exposing Code with PlumbeR: The Plan

So my code runs many iterations, but some of them need NOT happen sequentially (and they don’t, actually); that is, for any given parameter set (say: a chosen Beta, a chosen Budget), I can run it wherever I want, in theory.
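In code, that decomposition might look something like this (the Beta and Budget values are made-up placeholders, and splitting round-robin into 3 chunks just mirrors the 3 spare home-server cores):

```r
# Sketch: enumerate every parameter combination; each row is an
# independent unit of work that could run on any machine.
param_grid <- expand.grid(
  beta   = c(0.1, 0.2, 0.5),    # placeholder Beta values
  budget = c(100, 500, 1000)    # placeholder Budget values
)

# Split the grid round-robin into one chunk per available worker core.
n_workers <- 3
chunks <- split(param_grid,
                rep(seq_len(n_workers), length.out = nrow(param_grid)))
```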

So what I need is to wrap that inside a PlumbeR call. That means I will need to pass all relevant parameters to an API, and it should return the results of running that particular subset of simulations (e.g. all generations for all sim-runs for a given set of parameters).
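A first sketch of such an endpoint could look like the plumber file below (the endpoint name, parameter names, and run_simulation() are hypothetical placeholders for the real simulator interface):

```r
# plumber.R – sketch of the worker-side API (run_simulation() is a
# hypothetical stand-in for the real C++-backed simulator entry point).
library(plumber)

#* Run all sim-runs for one parameter set and return the results.
#* @param beta The chosen Beta (placeholder parameter name)
#* @param budget The chosen Budget (placeholder parameter name)
#* @post /simulate
function(beta, budget) {
  # Query/body parameters arrive as strings; coerce them explicitly.
  beta   <- as.numeric(beta)
  budget <- as.numeric(budget)
  # Would return e.g. all generations for all sim-runs for this set.
  run_simulation(beta = beta, budget = budget)
}
```

A container would then serve it with something like `plumber::plumb("plumber.R")$run(host = "0.0.0.0", port = 8000)` (the port being an assumption).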

If/when I get there, I can launch Docker containers (provided I make them visible on my home network) and send some workload to them. RStudio on my laptop would probably be where I run a central “control” node? Where I would configure the calls to the other container(s), as well as use a bit of the local computing power (i.e. the Mac CPU cores).
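On the control side, the fan-out could be sketched roughly like this (worker hostnames, port, endpoint, and parameter values are all assumptions; error handling omitted):

```r
# Sketch of the “control node”: send parameter sets to worker
# containers over HTTP and collect the results.
library(httr)      # HTTP client
library(parallel)  # mcmapply for concurrent dispatch (Unix/macOS)

# Hypothetical worker endpoints on the home network.
workers <- c("http://laptop.local:8000",
             "http://homeserver.local:8000")

# Placeholder parameter sets; each one is an independent job.
param_sets <- list(
  list(beta = 0.1, budget = 100),
  list(beta = 0.2, budget = 500)
)

# Round-robin each job to a worker and POST it to /simulate.
results <- mcmapply(function(params, i) {
  url  <- paste0(workers[(i - 1) %% length(workers) + 1], "/simulate")
  resp <- POST(url, body = params, encode = "json")
  content(resp)  # parsed response (the simulation results)
}, param_sets, seq_along(param_sets), SIMPLIFY = FALSE)
```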

With that, in theory, the more machines I connect to my home network, the more simulations I can run in parallel (that is, the more CPU cores I can use), and obviously, the faster I can get to results. That’s LINEAR growth in computing capacity (in essence), of course, against EXPONENTIAL growth in complexity (i.e. related to the order of the underlying graph(s)), so this is not magic either…

But it’s great progress, say, to halve the runtimes (with twice the CPUs, I mean)… And with that, for a fraction of the price of a new MacBook Pro M3 Max – as much as I would obviously love to acquire such a device… – I could in theory buy myself another 8–16 CPU cores, maybe in a ~500€ mini PC (or two) format, that I could connect to my home network, and “Voilà!”.

AND YES, I know: I should really just move to the Cloud when I get there, scale up & down on demand, pay for potentially hundreds of CPUs for only minutes at a time… which is what this whole post is really about: the fact that the whole thing can scale nicely…

Maybe I’ll do that someday, but really, I’m only interested in showing myself I can do it. Even if I make it “work” just on my laptop + home server (so with little gain, really)… I will KNOW the whole thing could potentially scale to thousands of CPU cores with little additional effort, which would mean it would be practical to use on reasonably larger networks.

Conclusions for the day

The new container works just fine (tested on Mac and on a Linux box). Now, the whole simulator might require a bit of rework, to refactor it and adapt it to present computing capacity over APIs (…), but I have tested all the above concepts at some point in the past, so it shouldn’t be insurmountable… In theory, that is 😀

I don’t yet know whether I’ll do any of the above in the end (it’s not NECESSARY, it’s just a fun challenge).

Meanwhile, I’ll take another slow-progress week…