Intro
I didn’t know about this one. Then again, I’m not “actually” a programmer, so I guess I have an excuse… Or do I? I do spend lots of time programming in R 🙂
It’s been a while since I last wrote here. Holidays + Master + full time job (and recently, the Cybersecurity world was busy…), all in all, I simply had little spare time. Apologies.
The Kata
So reading LinkedIn posts today, I came across someone mentioning the “FizzBuzz” exercise, with which I wasn’t familiar.
The example there was about code “being easily transferrable” from R to Python and vice-versa.
But then I thought: Yes, it’s good that one can program in different languages. But isn’t it also good to be reasonably good at one of them?
And so I decided I should see whether I could implement a better version of the exercise…
The code
The basic version is something like so:
for(each in 1:100) {
  if(each %% 3 == 0 & each %% 5 == 0) print("FIZZBUZZ")
  else if(each %% 3 == 0) print("fizz")
  else if(each %% 5 == 0) print("buzz")
  else print(each)
}
Right there, the for loop tells me: Go “vectorized”. And so:
# faster option: Base, vectorized:
sapply(1:100, function(x) {
  if(x %% 3 + x %% 5 == 0) return("FIZZBUZZ") # Sum of the modulus == 0
  if(x %% 3 == 0) return("fizz")
  ifelse(x %% 5 == 0, "buzz", x)
})
Two advantages here: fewer “else” branches, which I feel is cleaner somehow, and of course it’s much faster. One detail (for fun): instead of an AND, I used a SUM. Same result, and little difference really.
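The SUM-for-AND swap works because, for positive numbers, x %% 3 and x %% 5 are never negative, so their sum can only be zero when both are zero. A quick check of that, as a sketch (the variable names are mine):

```r
x <- 1:100

# Both remainders are >= 0, so the sum is 0 only when each remainder is 0
with_and <- x %% 3 == 0 & x %% 5 == 0
with_sum <- x %% 3 + x %% 5 == 0

identical(with_and, with_sum)  # TRUE
which(with_sum)                # 15 30 45 60 75 90
```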
Then I thought further about it: we have three options that match certain numbers… Couldn’t we go the other way around? Say, find where to put each string, instead of deciding which string to put on each number… It’s easier to explain with the code:
unlist(lapply(1:100, function(x) {
  t <- c("FIZZBUZZ", "fizz", "buzz")[x %% c(15, 3, 5) == 0][1]
  if(!is.na(t)) return(t)
  x
}))
I thought this was an original option. (Plus, it turns out it’s faster than the former one!) That is despite the unlist() + lapply() combination being slower in itself (not that it is relevant in this case).
Also, a mathematical detail: Using modulus 15 seems about right, mathematically 🙂 I would say: LCM(3, 5)… Anyway, so one operation less, I guess.
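To make the lookup above less magic, here is a small sketch of what the logical subsetting returns for a few inputs. The helper first_label() and the names fb_labels and fb_divisors are mine, only for illustration:

```r
fb_labels <- c("FIZZBUZZ", "fizz", "buzz")
fb_divisors <- c(15, 3, 5)

# Hypothetical helper: first label whose divisor divides x, or NA if none does
first_label <- function(x) fb_labels[x %% fb_divisors == 0][1]

first_label(30)  # all three divisors match, "FIZZBUZZ" comes first
first_label(9)   # only 3 matches: "fizz"
first_label(10)  # only 5 matches: "buzz"
first_label(7)   # nothing matches: character(0)[1] is NA
```

Putting 15 first in the vector is what makes the [1] pick the combined label whenever it applies.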
Then this next option was the fastest so far (but not “elegant”). Please note that my R interpreter is on v4.0 in this particular ARM-compatible RStudio container… R 4.1 would have given me the native |> pipe and the \(x) shorthand for anonymous functions, which would look cleaner (even for the sapply() calls). But it is what it is. So I fall back to the magrittr library, and since ifelse() is itself a vectorized operation, I can do this:
library(magrittr) # provides the %>% pipe

1:100 %>% {
  ifelse(. %% 15 == 0, "FIZZBUZZ",
         ifelse(. %% 3 == 0, "fizz",
                ifelse(. %% 5 == 0, "buzz", .)))
}
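For comparison, this is roughly what the same nested ifelse() could look like with the native pipe. It is only a sketch, it needs R 4.1 or later (so I could not run it in the container mentioned above), and the variable name fb is mine:

```r
# Requires R >= 4.1: native pipe |> plus the \(x) lambda shorthand
fb <- 1:100 |>
  (\(x) ifelse(x %% 15 == 0, "FIZZBUZZ",
               ifelse(x %% 3 == 0, "fizz",
                      ifelse(x %% 5 == 0, "buzz", x))))()

fb[c(1, 3, 5, 15)]
```

The extra parentheses around the lambda, followed by (), are needed because the right-hand side of |> must be a function call.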
Now let’s see if we can be even more… Creative.
What if we played with two dimensions and used row numbers, maybe?
{
  t1 <- matrix(1:100, ncol=5)
  t1[which(row(t1) %% 5 == 0)] <- "buzz"
  t1 <- as.vector(t1)[1:100]
  t1 <- matrix(t1, nrow=3)
  t1[which(row(t1) %% 3 == 0)] <- "fizz"
  t1 <- as.vector(t1)[1:100]
  t1[which(1:100 %% 15 == 0)] <- "FIZZBUZZ"
  t1
}
This last one was (mostly) the fastest option in my tests (although not by much…). It does however throw a warning, as 100 is not a multiple of 3, and forces us to cut the results to the first 100 entries because of it…
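The warning comes from matrix(t1, nrow=3) having to recycle 100 values into 3 rows. One way to keep the “place the strings by position” idea without a matrix, and so without the warning, is to assign by index directly. This is a sketch of a variant, not one of the options I timed, and the names n and out are mine:

```r
n <- 100
out <- as.character(seq_len(n))

# Write each label straight into its positions; later assignments overwrite
out[seq(3, n, by = 3)] <- "fizz"
out[seq(5, n, by = 5)] <- "buzz"
out[seq(15, n, by = 15)] <- "FIZZBUZZ"

out[1:15]
```

The order matters: the multiples of 15 go last so that they overwrite the "fizz" and "buzz" already placed there.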
The timing of each option, in order of appearance:
option                    min         lq        mean     median         uq       max  neval
1. for loop          4676.668  4833.0840  5214.04485  4994.6670  5213.5635  9304.584    100
2. sapply()           137.459   142.7510   172.60865   149.7925   157.4385  2348.293    100
3. unlist(lapply())   113.501   118.1460   137.29438   121.6670   127.1255  1571.167    100
4. ifelse()            51.334    54.0215    60.39899    57.6255    64.8760    89.084    100
5. matrix              42.043    46.1465    58.91734    59.2505    70.0840    88.584    100
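For anyone wanting to reproduce this kind of comparison: the columns above look like the summary printed by the microbenchmark package, so the sketch below assumes that package is installed. The wrapper names fb_sapply(), fb_ifelse() and time_options() are hypothetical, only two of the options are wrapped, and the numbers will differ from machine to machine:

```r
# Hypothetical wrappers around two of the options above
fb_sapply <- function(n = 100) {
  sapply(seq_len(n), function(x) {
    if (x %% 15 == 0) return("FIZZBUZZ")
    if (x %% 3 == 0) return("fizz")
    ifelse(x %% 5 == 0, "buzz", x)
  })
}

fb_ifelse <- function(n = 100) {
  x <- seq_len(n)
  ifelse(x %% 15 == 0, "FIZZBUZZ",
         ifelse(x %% 3 == 0, "fizz",
                ifelse(x %% 5 == 0, "buzz", x)))
}

# Assumes the microbenchmark package is installed; times = 100 mirrors neval
time_options <- function(times = 100) {
  microbenchmark::microbenchmark(fb_sapply(), fb_ifelse(), times = times)
}
```

Calling time_options() would then print one row per wrapped option, with the same min, lq, mean, median, uq, max and neval columns.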
Conclusion
Even for a very basic program or script, there are probably many alternatives out there to implement it. Depending on how much you’re willing to think about it (considering the effort vs value, I guess)…
EDIT: After the fact…
It only makes sense…
So I just found out, after doing it myself… The exercise is supposed to print Fizz, Buzz or FizzBuzz… Not what I was doing (fizz, buzz, FIZZBUZZ, and with some error somewhere in my code, as the cherry on the cake).
This of course opens new possibilities: a multiple of 15 is no longer a special case of its own, but simply a multiple of 3 (Fizz) and a multiple of 5 (Buzz), in that order. A “paste()” solution then comes to mind.
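A minimal sketch of that paste() idea, using paste0() so that no separator gets in the way (the variable names are mine):

```r
x <- 1:100

# Build each piece separately, then glue them together
fizz <- ifelse(x %% 3 == 0, "Fizz", "")
buzz <- ifelse(x %% 5 == 0, "Buzz", "")
label <- paste0(fizz, buzz)

# Where no label applies, fall back to the number itself
result <- ifelse(label == "", x, label)
result[1:15]
```

No test against 15 is needed here: "FizzBuzz" simply appears wherever both pieces are non-empty.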
And then, while looking into this exercise, I found (of course) many, MANY references. One solution I didn’t even consider is the dplyr function “case_when()”. My bad, indeed: it looks cleaner, I guess. As is often the case with dplyr.
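For the record, a case_when() version could look something like the sketch below. It assumes the dplyr package is installed, and the function name fizzbuzz_case_when() is hypothetical:

```r
# Assumes dplyr is installed; conditions are checked top to bottom
fizzbuzz_case_when <- function(n) {
  dplyr::case_when(
    n %% 15 == 0 ~ "FizzBuzz",
    n %% 3 == 0  ~ "Fizz",
    n %% 5 == 0  ~ "Buzz",
    TRUE         ~ as.character(n)  # all branches must return character
  )
}
```

Calling fizzbuzz_case_when(1:100) would then return the whole sequence as a character vector. Because the first matching condition wins, the test against 15 has to come first.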
I’ll put one nice reference with other alternative implementations in the references. If nothing else, it’s clear there are probably even more options out there. To my point, one can think of many ways of doing the same thing.
That, and read the requirements carefully :S
References
A link to my code, on my GitHub account
Looking for alternatives after the fact, I found that Post with nice options