Functional Programming tests on domain names


Intro

This week’s post is a bit short, but it tries to point towards some great coding concepts in R.

In short, in R everything is an object, and functions are no exception. So a function can return an object, including… a function.
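A minimal illustration of a function returning a function. make_power() is a hypothetical example in the spirit of the “Advanced R” chapter, not part of the DNS code:

```r
# A function factory: make_power() returns a new function.
# (make_power is a hypothetical name, used only for illustration.)
make_power <- function(exp) {
  force(exp)          # evaluate 'exp' now rather than lazily later
  function(x) x^exp   # the returned function "remembers" exp via its enclosing environment
}

square <- make_power(2)
cube   <- make_power(3)

square(4)            # 16
cube(2)              # 8
is.function(square)  # TRUE: the factory's return value is itself a function
```

Each call to make_power() creates a fresh environment, which is why square and cube keep different exponents.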

Why is that cool? Because it might just help clean up the code and reduce errors.

Context data for the example

OK, so as you know, in my “Home server” lab I have been gathering DNS logs from dnsmasq, the DNS server used by all the devices connecting through my Wi-Fi AP.

For the time being, I’ll focus on DNS queries only (not the responses), and maybe even on only a subset of them.

All the code is available on my GitHub account.

To the interesting part

But today is not about that. Let’s look at the “functional programming” approach as a possibly better way of doing things than, for example, explicit loops. Our data frame holds FQDNs in the column “dns_logs$domain”, and we can split each one on “.” into a vector of its components (its levels, or labels).

We can then rebuild each domain up to a given level, counting from the right (level 1 being the TLD), which lets us focus on the subdomains we might want to look into (who knows?).
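To make “rebuild up to a given level” concrete, here is a small sketch on a single hypothetical domain. It uses strsplit() with fixed = TRUE, which treats “.” literally and is equivalent to the escaped regular expression “\\.” used in the code below, and tail() to keep the last n labels:

```r
# Split one FQDN into its labels.
fqdn   <- "www.mail.example.com"   # hypothetical domain, for illustration only
labels <- strsplit(fqdn, ".", fixed = TRUE)[[1]]
labels                             # c("www", "mail", "example", "com")

# Rebuild the domain from its last n labels (level 1 = TLD, level 2 = second level, ...)
paste(tail(labels, 1), collapse = ".")   # "com"
paste(tail(labels, 2), collapse = ".")   # "example.com"
paste(tail(labels, 3), collapse = ".")   # "mail.example.com"
```

Note that tail() silently returns all the labels when n exceeds their count, so a length check (returning NA instead) is useful when the level asked for doesn’t exist.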

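# Next, let's look at the domains.
# The first level (the TLD) is not too interesting, so let's focus on the second and third levels.
# First, split each FQDN into its levels, separated by "."
# (strsplit() uses regular expressions by default, hence the escaped "\\.")
dns_logs$domain_levels <- strsplit(dns_logs$domain, "\\.")

# Now let's get "functional":
# For more on this, see: http://adv-r.had.co.nz/Functional-programming.html
# sublevel_n() is a function factory: it returns a function that rebuilds a
# domain from its last n levels, or NA when the domain has fewer than n levels.
sublevel_n <- function(n) {
  force(n)  # evaluate n now, so each returned function keeps its own value
  function(x) {
    x_l <- length(x)
    if (x_l >= n) {
      paste(x[(x_l - n + 1):x_l], collapse = ".")
    } else {
      NA_character_
    }
  }
}

first_level <- sublevel_n(1)
sapply(dns_logs$domain_levels, first_level)

second_level <- sublevel_n(2)
sapply(dns_logs$domain_levels, second_level)

third_level <- sublevel_n(3)
sapply(dns_logs$domain_levels, third_level)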

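So what happened there? We’ve done just what we said in the Intro: sublevel_n() is a function that returns a function. first_level and second_level are two such returned functions, each remembering its own value of n, and sapply() applies them to every element of dns_logs$domain_levels.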
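Since the real dns_logs data isn’t reproduced here, this is a minimal, self-contained sketch of the same steps on a small hypothetical data frame (the domains are made up). The factory is repeated so the snippet runs on its own, written with if/else and NA_character_ so that every call returns a single string:

```r
# Hypothetical toy data standing in for dns_logs (the real data comes from dnsmasq logs).
dns_logs <- data.frame(
  domain = c("www.google.com", "mail.google.com", "localhost", "a.b.example.co.uk"),
  stringsAsFactors = FALSE
)
# Each element of this list column is a character vector of labels.
dns_logs$domain_levels <- strsplit(dns_logs$domain, "\\.")

# Function factory: returns a function keeping the last n labels, or NA if there are fewer.
sublevel_n <- function(n) {
  force(n)
  function(x) {
    x_l <- length(x)
    if (x_l >= n) paste(x[(x_l - n + 1):x_l], collapse = ".") else NA_character_
  }
}

second_level <- sublevel_n(2)
third_level  <- sublevel_n(3)

dns_logs$second_level <- sapply(dns_logs$domain_levels, second_level)
dns_logs$third_level  <- sapply(dns_logs$domain_levels, third_level)

dns_logs[, c("domain", "second_level", "third_level")]
```

Note how “a.b.example.co.uk” yields “co.uk” at level 2: counting labels from the right knows nothing about public suffixes such as co.uk, so for such domains the third level is the more meaningful one.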

So yes, there are other ways of doing this: one could use loops, or a single function with two parameters, for instance. I’m not saying this is necessarily the best approach every time…
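For comparison, a sketch of the two-parameter alternative (sublevel() is a hypothetical name), called both through sapply(), which forwards extra arguments such as n = 2 to the function, and through an explicit for loop. The toy domains are made up:

```r
# Hypothetical two-parameter alternative to the sublevel_n() factory.
sublevel <- function(x, n) {
  x_l <- length(x)
  if (x_l >= n) paste(x[(x_l - n + 1):x_l], collapse = ".") else NA_character_
}

domain_levels <- strsplit(c("www.google.com", "localhost"), "\\.")  # toy input

# Extra arguments to sapply() are passed on to the function: here n = 2.
sapply(domain_levels, sublevel, n = 2)

# The same thing with an explicit loop:
out <- character(length(domain_levels))
for (i in seq_along(domain_levels)) {
  out[i] <- sublevel(domain_levels[[i]], 2)
}
out
```

Both give the same result; the factory version mainly buys readability, since second_level(x) names the intent while sublevel(x, 2) repeats a bare number at every call site.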

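But now we have “named” functions that say what they do and are similar to one another, all built from the same construct. This also works nicely with the apply() family of functions: “domain_levels” is actually a list in which each element is a vector of strings, and sapply() applies our one-argument functions to each element without an explicit loop (sapply() still loops internally, so the gain is in readability rather than speed).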
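As a sketch of how far the “same construct” idea can go (an illustration, not code from the repository), lapply() can build the whole family of level functions at once, and vapply() can then apply them while checking that each call returns exactly one string. The toy domains are hypothetical:

```r
# Same factory as in the post, repeated so this snippet runs on its own.
sublevel_n <- function(n) {
  force(n)
  function(x) {
    x_l <- length(x)
    if (x_l >= n) paste(x[(x_l - n + 1):x_l], collapse = ".") else NA_character_
  }
}

# Build several related functions at once, all from the same construct.
level_fns <- lapply(1:3, sublevel_n)
names(level_fns) <- c("first_level", "second_level", "third_level")

domain_levels <- strsplit(c("www.google.com", "localhost"), "\\.")  # toy input

# vapply() is like sapply() but checks that every result is a single string.
vapply(domain_levels, level_fns$second_level, character(1))

# Apply every level function to every domain: one column per level.
sapply(level_fns, function(f) vapply(domain_levels, f, character(1)))
```

The last call returns a 2 × 3 character matrix whose columns are named after the functions.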

Conclusions

There is MUCH more to this, and I’m still playing with it, but I thought it warranted an entry, as there seems to be a lot of potential in this approach to coding, if nothing else in terms of code “clarity”.

References

As I often mention, H. Wickham’s “Advanced R” book is probably one of the best references out there for anyone trying to improve their “R Kung-Fu” (or that’s what I’ve learned from experience over time).

The pertinent chapter for today’s Post can be found here: http://adv-r.had.co.nz/Functional-programming.html