Loading libraries in R


It should be fairly easy

Where do I load my libraries in my script?

In R, you CAN load a library right at the point where you need it. But after a while, it becomes an issue. So here is my personal recommendation: Don’t.

Why does it matter?

For one, libraries sometimes have overlapping functions. This gets quite annoying when the function you call does not do what you want but something else.
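To check where a given name actually comes from, base R has a couple of helpers. The sketch below only uses packages that ship with R. The name filter is just a convenient example: it lives in stats, and dplyr exports a function with the same name.

```r
# Which attached packages provide something called "filter"?
# In a fresh session this includes package:stats; with dplyr attached,
# package:dplyr would be listed as well.
providers <- utils::find("filter")
print(providers)

# Which package does the function you would actually call belong to?
print(environmentName(environment(filter)))

# Every name that is defined in more than one place on the search path
print(conflicts(detail = TRUE))
```

Running this before and after a library() call is a cheap way to see what that call has just masked.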

As an example, two very common libraries are “plyr” and “dplyr”. Say you’re already using “dplyr” in your script. Then, while looking for a faster way to read a bunch of files into one data frame, you come across “rbind.fill”, which belongs to plyr. You’re in a hurry, so you just do a:

library(plyr)

in the middle of your code, right before you need it. When you run your modified script, R actually alerts you about possible issues right there…

Well… If you load them in the incorrect order, you’re in for some surprises: the two packages share some function names (summarise, for one), and the package loaded last masks the other’s versions. Then you look up the issue and find the workaround: you can call the function with the corresponding library clearly identified. You can do something like this, for instance:

dplyr::summarise()

If you had loaded plyr first and then dplyr, both at the top of the script, you could have used functions from both packages and saved some time.
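A minimal sketch of that ordering, assuming both plyr and dplyr are installed. The two toy data frames are made up for illustration.

```r
# Load everything once, at the top: plyr first, then dplyr,
# so that dplyr's versions of the shared names (e.g. summarise) win.
library(plyr)
library(dplyr)

# Toy data frames with different columns (made up for this example)
df1 <- data.frame(id = 1:2, x = c(10, 20))
df2 <- data.frame(id = 3:4, y = c("a", "b"))

# rbind.fill only exists in plyr; the prefix just documents its origin.
# Columns missing from one data frame are filled with NA.
stacked <- plyr::rbind.fill(df1, df2)

# summarise exists in both packages; the prefix makes the choice explicit
# whatever the load order happens to be.
totals <- dplyr::summarise(
  stacked,
  n_rows = dplyr::n(),
  total_x = sum(x, na.rm = TRUE)
)
print(totals)
```

The “::” prefixes are not strictly needed once the load order is right, but they keep working even if someone reorders the library() calls later.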

Besides the conflicting functions in different packages, there is another issue. When you code a big enough script, you’ll end up creating functions to avoid repeating stuff. Then your functions might prove useful somewhere else and so you move them to another file, so that you can use that in future scripts by simply “sourcing” your own code (there you go: You created your own simple library right there). But what about loading libraries then?

See, your functions will need other packages (probably). You need to have those packages loaded for your functions to work. But you might also need those libraries in your main script.

Do you load them twice, first in the main script, then in your auxiliary script of functions? What if you source a file after loading libraries in your main script, only to overwrite a function unintentionally and break your main script a few lines later?
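Here is a small sketch of that last scenario using only base R. The function name normalise and the auxiliary file are hypothetical; the file is written to a temporary location so the example runs on its own.

```r
# Main script: a helper defined near the top
normalise <- function(x) x / max(x)

# A hypothetical auxiliary file that happens to define the same name
aux_file <- tempfile(fileext = ".R")
writeLines("normalise <- function(x) (x - mean(x)) / sd(x)", aux_file)

before <- normalise(c(1, 2, 4))  # uses the main-script definition
source(aux_file)                 # silently replaces normalise()
after <- normalise(c(1, 2, 4))   # now uses the sourced definition

# Same call, different result, and no warning in between
print(before)
print(after)
```

A library() call hidden inside a sourced file can have the same kind of effect: whatever it attaches masks same-named functions that were attached earlier.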

Which brings me to my recommendations.

How I go about it

I’m not special in that, by the way; it’s a common recommendation:

Load your libraries at the beginning (and be done with it).

You can load them in a certain order, and you don’t have to worry about it anymore afterwards. Plus, it is definitely cleaner. And it is a good practice: the tidyverse style guide (see the references) recommends loading all add-on packages at once at the very beginning of the file.

But because you can source other files into your script (which you would do to keep your code a bit organised), you still need to think about where the libraries those files depend on get loaded.

What I do is list the needed libraries in a comment at the beginning of my auxiliary file. It’s not a perfect system, but it goes as follows:

Whenever I source one of my files of functions, I can check which libraries it needs and make sure I load them at the beginning of the main script. That way libraries only get loaded in one place, and should I be aware of some conflict (e.g. plyr vs dplyr), I can check what I’m doing.
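One possible layout for this, as a sketch rather than a fixed recipe. The file name aux_functions.R and the function count_extensions are hypothetical, and tools (which ships with R) stands in for whatever packages your own functions need. The auxiliary file is written to a temporary directory only so that the example runs on its own.

```r
# Write a small auxiliary file so the sketch is self-contained.
# In practice this would be a file you keep next to your scripts.
aux_file <- file.path(tempdir(), "aux_functions.R")
writeLines(c(
  "# aux_functions.R",
  "# Needs (load these at the top of the main script):",
  "#   library(tools)",
  "",
  "count_extensions <- function(paths) {",
  "  table(file_ext(paths))",
  "}"
), aux_file)

# ---- main script ----
# 1. All libraries, once, at the top (including those the aux file lists)
library(tools)

# 2. Then the auxiliary files
source(aux_file)

# 3. Then the actual work
counts <- count_extensions(c("a.csv", "b.csv", "notes.txt"))
print(counts)
```

The auxiliary file itself never calls library(); its header comment is the only place where its dependencies are recorded.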

To be noted: I don’t run into function conflicts too often. And you can always use the “::” trick to make sure things work as you want.

One other instance I found of this type of conflict is between RStudio and base R for package installation. After detecting that issue, I now always install a package like this:

utils::install.packages()
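If you want to go one step further, the explicit call can be wrapped so that it only runs for packages that are not installed yet. This is a sketch; the helper name install_if_missing is made up for this post.

```r
# Install only the packages that are not available yet,
# always going through the utils version of install.packages().
install_if_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    utils::install.packages(to_install)
  }
  # Return (invisibly) the names that had to be installed
  invisible(to_install)
}

# "stats" ships with R, so nothing gets installed here
installed_now <- install_if_missing("stats")
print(length(installed_now))
```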

Conclusion

Yes, yet another entry about “obvious stuff”, something that is not discussed too often when you are learning how to program. After all, things usually work without you bothering in the slightest about the order of your code. Which is great for a “quick & dirty” script… but easily makes things worse for bigger programs.

Anyhow, as basic as it may seem, I found it to be important. It won’t necessarily change the functionality of your script. But I for one will find it easier to read.

Also: You will not end up doing something like this (definitely not great, although… it would work):

for (i in 1:100) {
   # load required functions
   library(plyr)
   # rest of the code (...)
}

Yes, I have seen it in real code (not quite like this, but with the same effect…).
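For contrast, here is the same shape with the library call moved out of the loop. It is only a sketch: tools stands in for plyr so that it runs on a plain R installation, and the file names are made up.

```r
# Load once, at the top of the script
library(tools)

# Made-up file names for the example
paths <- sprintf("file_%03d.csv", 1:100)
stems <- character(length(paths))

for (i in seq_along(paths)) {
  # only the per-iteration work stays inside the loop
  stems[i] <- file_path_sans_ext(paths[i])
}

print(head(stems, 3))
```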

 

References

https://github.com/tidyverse/dplyr/issues/29

https://www.r-bloggers.com/r-code-best-practices/

https://style.tidyverse.org/index.html