This is a very short entry for now, but I felt I couldn’t skip it.
Bad Libraries
At a conference this year, someone demonstrated how much harm you can do in Python if you manage to publish libraries and get people to use them. Essentially, “all there is to it” (I’m being simplistic on purpose) is to copy a library and publish it under a very similar name, with code that does bad things (e.g. sends your /etc/passwd file to a public IP address) on top of the original functionality. This year’s demo used package names built from characters that LOOK very similar but have different encodings, to trick programmers.
As R is not as commonly discussed at security conferences, I made a mental note: the same applies to R.
So when I had the chance, I went and looked it up. I won’t rewrite what I found, as I don’t have much to add, but I couldn’t responsibly go on without mentioning the concept of “bad packages”.
So instead, let me put forth a few references:
https://ropensci.org/blog/2017/07/25/notary/
https://github.com/hrbrmstr/rpwnd
The second resource is actually quite fun, and definitely “to the point” for this post.
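As a minimal sketch of a pre-install sanity check, the helper below flags two of the warning signs described above: non-ASCII lookalike characters in a package name, and names that are one edit away from a package you already know. `check_pkg_name()` is a hypothetical function written for illustration, not part of any package, and the list of known names is an assumption you would supply yourself. It only uses base R (`utf8ToInt()`, `grepl()`, `utils::adist()`), and it will not catch a malicious package with an entirely new name.

```r
# check_pkg_name() is a hypothetical helper, not an existing function.
check_pkg_name <- function(name, known = character(), max_dist = 1) {
  stopifnot(is.character(name), length(name) == 1L, !is.na(name))

  # Any code point >= 128 is a non-ASCII character (e.g. a Cyrillic "о"
  # that looks just like a Latin "o"). utf8ToInt() gives NA on invalid UTF-8.
  codes <- utf8ToInt(enc2utf8(name))
  ascii <- !anyNA(codes) && all(codes < 128L)

  # R package names may only contain ASCII letters, digits and dots, have at
  # least two characters, start with a letter and not end with a dot.
  valid <- ascii && grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)

  # Known names within `max_dist` edits (Levenshtein distance), excluding an
  # exact match: likely typosquatting candidates.
  near <- character()
  if (length(known) > 0L) {
    d <- utils::adist(name, known)[1, ]
    near <- known[d > 0 & d <= max_dist]
  }

  list(name = name, ascii = ascii, valid = valid, near_misses = near)
}

# Usage: the first name hides a Cyrillic "о" (U+043E) in place of "o".
known <- c("ggplot2", "dplyr", "data.table")
check_pkg_name("ggpl\u043et2", known)
check_pkg_name("dplyr", known)
```

Anything with `ascii` or `valid` set to `FALSE`, or with a non-empty `near_misses`, deserves a second look before it goes into `install.packages()`.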
Do not go completely paranoid
Using packages is only natural. As I argued in a past entry, writing all of one’s code alone is probably not a good idea anyway (the more code I write alone, the more vulnerabilities I can introduce…).
And although all of us R “enthusiasts” have direct access to the source code of most of the functions and packages we use (just type the function’s name without the “()”), I am ashamed to admit that I rarely check it. Even more rarely do I actively look for vulnerabilities in all that code (when I do read source code, it is usually just to understand what it really does).
Checking every function of every package I use on any given day for vulnerabilities would be a full-time job. I won’t say I will get to that any time soon (or, maybe, ever).
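For anyone newer to R, this is what “calling a function as an object” looks like in a plain session. Everything used here comes from base R and the default `stats` package; `stats:::print.lm` is just one arbitrary example of a function that a package does not export.

```r
# Typing a function's name without "()" prints its R source code.
sd

# body() and formals() return the code and the argument list separately.
body(sd)
formals(sd)

# Functions a package does not export can still be reached with ":::"
# (here, the print method for lm objects inside the stats namespace).
stats:::print.lm

# Primitives such as sum() are implemented in C: printing them only shows
# .Primitive("sum"), and body() returns NULL.
sum
is.null(body(sum))
```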
But maybe there is a balance between sourcing packages from CRAN, sourcing from GitHub directly, and developing your own functions. As always, it’s a trade-off.
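One cheap way to see where that balance currently sits is to list the installed packages that did not come from CRAN. The sketch below assumes two conventions: CRAN adds a `Repository: CRAN` field to the DESCRIPTION of the packages it publishes, and GitHub installers such as `remotes::install_github()` record a `RemoteType` field. `non_cran_packages()` is a hypothetical helper, and its output is a prompt for a closer look rather than a verdict.

```r
# non_cran_packages() is a hypothetical helper written for this post.
non_cran_packages <- function(lib.loc = NULL) {
  # Ask for two extra DESCRIPTION fields on top of the default columns.
  ip <- utils::installed.packages(lib.loc = lib.loc,
                                  fields = c("Repository", "RemoteType"))
  df <- as.data.frame(
    ip[, c("Package", "Priority", "Repository", "RemoteType"), drop = FALSE],
    stringsAsFactors = FALSE
  )
  # Base packages ship with R itself and carry no Repository field.
  is_base <- !is.na(df$Priority) & df$Priority == "base"
  # Assumption: packages published on CRAN record "Repository: CRAN".
  from_cran <- !is.na(df$Repository) & df$Repository == "CRAN"
  out <- df[!is_base & !from_cran, c("Package", "Repository", "RemoteType")]
  rownames(out) <- NULL
  out
}

# GitHub installs typically show RemoteType "github"; NA in both columns
# means the origin was not recorded (e.g. a local source install).
non_cran_packages()
```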
Validate user-provided data
One last thing. When writing your own code, functions, etc., one thing is paramount from a security standpoint: validate inputs.
That’s not all there is to it, of course, but if there is ONE takeaway from any secure coding course out there, I’d say this is it. I’ll probably write about it in a future post (I need to get better acquainted with fuzzr first, and do more research on the subject before I can have any real opinion…), but for now:
If there is any chance someone else might use your code for real world production or on uncontrolled/unknown datasets, then you NEED to validate input.
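As a minimal sketch of what “validate input” can mean in practice, the function below checks the structure, type and value range of its input before computing anything, and fails loudly with an informative message instead of returning a silently wrong result. `summarise_ages()` and its `age` column are made up for illustration. Base R’s `stopifnot()` is a terser alternative; explicit `stop()` calls are used here for clearer error messages.

```r
# summarise_ages() is a made-up example function, not from any package.
summarise_ages <- function(df, max_age = 120) {
  # Validate the parameter itself: a single, finite, positive number.
  if (!is.numeric(max_age) || length(max_age) != 1L ||
      !is.finite(max_age) || max_age <= 0) {
    stop("`max_age` must be a single positive number", call. = FALSE)
  }
  # Structure: a data frame containing the column we rely on.
  if (!is.data.frame(df)) {
    stop("`df` must be a data frame", call. = FALSE)
  }
  if (!"age" %in% names(df)) {
    stop("`df` must contain an `age` column", call. = FALSE)
  }
  age <- df$age
  # Type: numeric only, so character or factor columns are rejected
  # rather than silently coerced.
  if (!is.numeric(age)) {
    stop("`age` must be numeric", call. = FALSE)
  }
  # Values: at least one row, no missing values, everything in range.
  if (length(age) == 0L) {
    stop("`df` has no rows", call. = FALSE)
  }
  if (anyNA(age)) {
    stop("`age` contains missing values", call. = FALSE)
  }
  if (any(age < 0 | age > max_age)) {
    stop(sprintf("`age` values must be in the range [0, %g]", max_age),
         call. = FALSE)
  }
  mean(age)
}

summarise_ages(data.frame(age = c(30, 40)))   # 35
try(summarise_ages(data.frame(age = c(30, -1))))
```

The same order of checks (structure, then type, then values) carries over to most functions that accept data from somewhere you don’t control.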
And yes, I’ll admit, as it is TRUE: I do NOT ALWAYS check input values in my own code. But please understand that this is mostly because:
- I only use my code in controlled/closed environments.
- I (mostly) only use my code for myself, on datasets I have manually checked before.
- Or I use the code for demo purposes, where it sometimes needs to be shorter precisely for that reason.
- I am aware of the risks and will ponder whether I need to add (several) checks to my own code if I ever suspect I will use it in a production setup.
Conclusions
So all in all:
- Be careful how you type package names when you call “install.packages()”.
- Prefer CRAN packages over GitHub ones (not my words, btw :))
- If you write a function and suspect someone may make a mistake feeding it data (or feed it bad data on purpose), check the input for validity (correct format, acceptable values, etc.).
Other references: