Category: Basics

  • NLP (2/n): Tokens of text

    A first (basic) look at the collected text: Tokens. It’s all great: we’re able to collect all our blog entries, or rather, their text. We’re even able to crawl a bit faster, provided we have a multi-core setup. (Worst case, our crawling is mostly limited by the speed of our website.)…

  • NLP (1/n): Scraping all the Blog articles (the hard way)

    Intro: To use Natural Language Processing algorithms, we first need data. Last time, we saw how to scrape ONE article and how to get to different pages of the Blog. But for this to be usable in the future, we should be able to download all articles UNTIL there are no more (e.g. we probably…

  • Going SQL: Postgres (3/3) – Now with R

    For the past two blog entries, we spent time preparing for today: we want to use a PostgreSQL database backend to work with our JSON data, from R. Before we continue… We’re still missing one piece: the table(s) in the database. Let’s tackle that very quickly today, as the focus will be on…