Kaizen-R

Tag: Running environment

From local Spark to Azure Synapse

Intro In the last entry we laid out an Apache Spark foundation locally on our laptop, using some Docker containers. This time around, let’s look at what a Cloud-based alternative can look like. We’re going to have a look at Azure Synapse. Many things in one service. That’s what is advertised. And being cloud based,…

August 27, 2022
Getting into Apache Spark

Intro Most RDBMSs are just fine. Hadoop does work for Big Data (I used it some years back), although HQL proved a bit slow… And I haven’t “needed” anything to make things faster for now… But for whatever reason, one can’t be “into data science” (or data analysis, or whatever you name it…), without knowing…

August 20, 2022
About efficiency

Intro A few weeks back I needed to test working with the “parquet” file type. It turned out I couldn’t (try as I might) get the RStudio in my Docker container to install the “arrow” package (it would always fail, but that’s beside the point for this particular post). A side note about…

June 18, 2022