Tag: Running environment

  • From local Spark to Azure Synapse

    Intro: In the last entry we set up an Apache Spark foundation locally on our laptop, using some Docker containers. This time around, let’s see what a cloud-based alternative can look like. We’re going to have a look at Azure Synapse. Many things in one service: That’s what is advertised. And being cloud based,…

  • Getting into Apache Spark

    Intro: Most RDBMSs are just fine. Hadoop does work for Big Data (I used it some years back), although HQL proved a bit slow… And I haven’t “needed” anything to make things faster for now… But for whatever reason, one can’t be “into data science” (or data analysis, or whatever you call it…) without knowing…

  • About efficiency

    Intro: A few weeks back I needed to test working with the “parquet” file type. It turned out I couldn’t, as much as I tried, get the RStudio in my Docker container to install the “arrow” package (it would always fail, but that’s beside the point for this particular post). A side note about…