A quick look at memory usage when “copying variables”


This one is not exactly about changing code in practice…

This entry differs a bit in that it is purely “educational”, not an opinion or recommendation or anything to change in the way I program at all, but rather about how the R interpreter works. As always, there are much better references out there (see for instance: http://adv-r.had.co.nz/memory.html).

As I was “studying” some more (I was actually reading up to write something about assignments, “parent scope”, and the `<<-` operator, but then I thought: Let’s check memory and variables and…), I came across the reference above and concluded: why not actually test it, instead of just reading about it (as I would usually have done in the past)? After all, memory usage is an important concept in R, as (to keep it simple) everything runs in memory.

So the other day I created a quick script to generate a sample dataset (uploaded recently), and since at one point I “copied a variable” to use it at a later point in the same script, I had an ideal candidate for the test. Then I implemented a quick test/demo.

Testing the “Copy On Modify” concept

So I edited the script (modified version here: https://github.com/kaizen-R/R/blob/master/Sample/CMDB_Exercises/cmdb_generator.R), making it a bit less readable, but of better educational value (hopefully).

Essentially, between code lines 26 and 57, I added some lines to demo how “copying a variable into another variable” does NOT actually create a copy per se.

I didn’t go for complex stuff: I created a variable by assigning some values to a name. Then I assigned that variable to another name, thinking of it as a copy to be used later on. But if you dig a little, as the exercise shows, the total memory usage does not double; it is left almost unchanged, because both names still point to the same object in memory.
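
A minimal sketch of that step, not the code from the linked script: the vector `x` and its size are made up for illustration. It uses only base R. `object.size()` measures each name on its own, so it cannot reveal sharing; `tracemem()` returns the address of the object behind a name, but only if R was built with memory profiling, hence the `capabilities("profmem")` guard.

```r
# Hypothetical data, standing in for the generated dataset.
x <- runif(1e6)
y <- x  # the "copy": a second name bound to the same vector

# Each name reports the full size when measured separately...
print(object.size(x), units = "MB")
print(object.size(y), units = "MB")

# ...but both names refer to one object in memory.
same_object <- NA
if (capabilities("profmem")) {
  # tracemem() marks an object and returns its address as a string.
  same_object <- identical(tracemem(x), tracemem(y))
  untracemem(x)  # remove the mark again
}
print(same_object)
```

Adding up the two `object.size()` results would therefore overstate the real memory footprint.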

Then, for the heck of it, I edited the contents of the first variable. The total memory of both at that point still isn’t double the size.
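
One way this can happen, sketched under the assumption that the variable is a data frame (the columns `a` and `b` and the helper `addr()` are hypothetical, not taken from the script): modifying one column through the first name rewrites only that column, while untouched columns stay shared between the two names.

```r
# Hypothetical helper: address of an object as a string,
# or NA if this R build lacks memory profiling.
addr <- function(obj) {
  if (!capabilities("profmem")) return(NA_character_)
  a <- tracemem(obj)  # marks the object and returns its address
  untracemem(obj)     # clear the mark so later copies print nothing
  a
}

df1 <- data.frame(a = runif(1e5), b = runif(1e5))
df2 <- df1           # second name, same data
df1$a <- df1$a * 2   # modify one column through the first name

# Column a was rewritten, so the two names no longer share it.
a_shared <- addr(df1$a) == addr(df2$a)
# Column b was not touched and is still shared
# (data frames are copied shallowly since R 3.1.0).
b_shared <- addr(df1$b) == addr(df2$b)

print(c(a_shared = a_shared, b_shared = b_shared))
```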

IF – trying to force things – I had set the first variable to NULL, say, and then loaded it again from scratch, then yes, I would end up using twice the memory space: one for the copy of the original, one for the “new” variable.
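
A rough sketch of that scenario, again with made-up data rather than the script's dataset. The helper `used_mb()` is hypothetical; it sums the "(Mb)" column that base R's `gc()` reports after a collection, so the figures are approximate and include a little noise from other allocations.

```r
# Hypothetical helper: memory used by R objects, in MB, after a collection.
used_mb <- function() sum(gc()[, 2])

n <- 2e6                 # about 15 MB of doubles (arbitrary size)
base_mb <- used_mb()

x <- runif(n)
after_create <- used_mb() - base_mb   # one vector in memory

y <- x
after_copy <- used_mb() - base_mb     # still one vector, two names

x <- NULL                # y keeps the original alive
x <- runif(n)            # "reload from scratch": a brand new vector
after_reload <- used_mb() - base_mb   # now two vectors

print(c(create = after_create, copy = after_copy, reload = after_reload))
```

Only the last step should show roughly twice the memory of the first one, since `y` now holds the old data and `x` holds fresh data.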

We can verify that behaviour by checking the memory usage of each variable at each step.

The results are shown here:

Conclusions

There are, as always, so many things to go through (invisible returns of functions, for instance, or the Garbage Collection in R, among others) and so this is NOT all you might want to learn about memory usage. But I hope it does demo how “copying” is not as simple as it might seem, and how it is indeed optimized in R 🙂 

Also, while at it, a quick reference for “invisible returns”: https://www.r-bloggers.com/an-r-function-return-and-assignment-puzzle/, in case someone was wondering about the mention above.

References

http://adv-r.had.co.nz/memory.html

https://stackoverflow.com/questions/15759117/what-exactly-is-copy-on-modify-semantics-in-r-and-where-is-the-canonical-source
