Intro
We tested (very quickly) a MongoDB “NoSQL” backend in a former post. But quite often, good ol’ SQL is just good enough for a backend.
In the “Data Science” world, SQL sits near the top of the list of key skills. So let’s make a point of covering it on this blog. (Also, so-called traditional RDBMSs have been adding support for JSON, which seems relevant… We’ll get to that.)
Why Postgres?
Now one might see a pattern here: both Shodan and the plumber R package can (and do) use JSON as a data format, and so do most micro-services nowadays. So an RDBMS that supports JSON natively is probably a better fit than one that doesn’t.
As it turns out, both MySQL and Postgres now support JSON natively. This is interesting, and this reference helped me get a better grasp of it:
https://medium.com/praemineo/json-when-sql-met-nosql-8912d5400857
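Just to make the “native JSON” point concrete, here is a minimal sketch of what that looks like in Postgres. This assumes a server is already reachable and the psql client is installed (we will only set one up further down), and the events table and its payload fields are made up for illustration:

psql -h localhost -U postgres <<'SQL'
CREATE TABLE IF NOT EXISTS events (id SERIAL PRIMARY KEY, payload JSONB);
INSERT INTO events (payload) VALUES ('{"source": "homelab", "cpu_pct": 42}');
-- ->> extracts a field as text, @> checks JSON containment
SELECT payload->>'source' AS source FROM events WHERE payload @> '{"cpu_pct": 42}';
SQL

A JSONB column is stored in a binary, indexable form, which is what makes it more than just a text blob with curly braces in it.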
Why did I even think about Postgres? I have NO previous experience with it. A few years back I programmed a bit with MySQL (in a few “LAMP” setups). But among the newsletters that find their way into my inbox, one mentioned a new Coursera specialization by “Dr. Chuck”, called “PostgreSQL for Everybody”. I didn’t know the apparently-famous Dr. Chuck, but I decided to audit his courses on the topic (auditing individual Coursera courses is free, and since I don’t really care for the “certificate of completion” most of the time, I do audit Coursera courses now and then).
Plus I needed a refresher on SQL, as it had been a while (back when MySQL wasn’t yet owned by Oracle, for instance).
Here is the link to the course: https://www.coursera.org/specializations/postgresql-for-everybody
Let’s just say I enjoyed those courses: I watched the whole first “week” in one go (not including the exercises) before even starting the work day that Friday morning. (Yes, I do get up unusually early compared to the average citizen, I guess.)
Anyhow, Postgres seemed like a valid RDBMS option, and I wanted to try it. So forward I went.
Using past-gathered data
I recently presented how to go about gathering data from external and internal sources, for instance my Home Lab server and my website.
The R code lets me manipulate the data as data.frames, which makes it easy to save in CSV format, my default option.
But there comes a time when that much data becomes a bit messy. Organizing files is usually good enough, but when one needs to read multi-megabyte files each time BEFORE looking up information in them, it slowly becomes a burden. Databases are meant to help with just that, I’d say: storing data in an organized and, maybe more importantly, indexed way.
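To make the “indexed” part concrete, here is a rough sketch of the kind of table I have in mind (again assuming a reachable server and psql; the homelab_metrics table and its columns are invented for illustration):

psql -h localhost -U postgres <<'SQL'
CREATE TABLE IF NOT EXISTS homelab_metrics (
    collected_at TIMESTAMPTZ NOT NULL,
    metric       TEXT NOT NULL,
    value        DOUBLE PRECISION
);
-- an index on the timestamp lets time-range queries skip most of the data
CREATE INDEX IF NOT EXISTS homelab_metrics_time_idx ON homelab_metrics (collected_at);
SQL

A query filtering on, say, collected_at > now() - interval '1 day' can then use that index instead of scanning everything, which is exactly what re-reading a multi-megabyte CSV cannot do.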
First: Setting up the DBMS
As most of the time, I’ll go the Docker way.
Reminder: installing stuff directly on my laptop/server is viable, but keeping things simple, with Dockerfiles for example, so I can rebuild from scratch with updated software, seems like a better way to keep things clean. The only “clutter” installed on the machine is then the Docker install itself, along with a few text files. At least, you know, that’s my opinion. (All I know is that if I lose my laptop, having a few Dockerfiles and some code files (a few KB) will get me back on my feet within a few hours of processing, and about 10 minutes of my actual attention…)
So let’s get a Postgres server running, shall we? From Docker Hub:
https://hub.docker.com/_/postgres
sudo docker pull postgres:alpine
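If you want to double-check that the image landed locally (the exact output will vary on your machine):

sudo docker image ls postgres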
A quick reminder, as we will want to be able to connect to that container from another one (our R IDE):
sudo docker network create --driver=bridge --subnet=172.28.0.0/16 --ip-range=172.28.5.0/24 --gateway=172.28.5.254 r_analysis_br0
That will allow us to use the --network option so that our containers can communicate using names (otherwise, we’d need to look up the IP of each container, each time, which is not ideal).
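If you want to convince yourself that name resolution works on that bridge network, a quick throw-away test could look like this (the net_test_a container name and the alpine image are just examples, not something we keep around):

sudo docker run -d --rm --name net_test_a --network=r_analysis_br0 alpine sleep 300
sudo docker run --rm --network=r_analysis_br0 alpine ping -c 2 net_test_a

The second container should resolve net_test_a by name and get replies; the first one cleans itself up after five minutes thanks to --rm.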
Finally we should be ready to bring up our new database server in a container:
sudo docker run --name postgres_test --network=r_analysis_br0 -p 5432:5432 -v <your local path here>:/var/lib/postgresql/data -e POSTGRES_PASSWORD=<your password here> postgres:alpine
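Note that, as written, the container stays attached to the terminal (adding -d would run it detached). From another terminal, a quick container-level sanity check could be the following, although it only tells us the container is up, not yet that the database itself is answering:

sudo docker ps --filter "name=postgres_test"
sudo docker logs --tail 20 postgres_test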
Conclusions
So now we SHOULD have a database, listening on the local Docker network on port 5432. But wait… how do I check that?
That’s for an upcoming entry.