GeoIP for Maps


To start off the year 2021, let’s go on with a visualization, using data generated by my “Home Lab”…

I might have mentioned the idea of creating “maps” in the past. Bringing that down to earth, a typical use-case for us (in the IT Security field) is “geo-IP”.

In other words, we might find it useful to map an IP address (public IP address, that is) to a physical location (i.e. country, city…).

We will get to the following result today:

Note: All the code below is offered on my GitHub account if you want to look at it in one go…

Filtering out private ranges

We’ve already seen that we can identify private IP ranges. In summary, we don’t want to locate those on a map (or we can set their location manually). So we want to filter those out before we go any further. Let’s test that, using the iptools package.

library(iptools)

# This one will be useful almost anytime:
# (the last entry is the link-local range, which is 169.254.0.0/16 in full)
private_ip_ranges <- c("10.0.0.0/8", "172.16.0.0/12",
                       "192.168.0.0/16", "169.254.0.0/16")

test_ips <- c("10.0.0.1", "41.242.140.10")

test_subnet_masks <- c("10.0.0.0/8", "41.242.140.0/31")
is_ipv4("10.0.0.1000")
ip_in_range(test_ips[1], test_subnet_masks[1])
ip_in_any(test_ips[1], private_ip_ranges)
ip_in_any(test_ips[2], private_ip_ranges)
ip_in_any(test_ips, private_ip_ranges)
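
To make it clearer what such a range check does under the hood, here is a minimal base-R sketch (no packages). The helper names ipv4_to_num, cidr_bounds and in_any_cidr are made up for this illustration; this is not how iptools implements it, just the same idea: turn the address into a number and compare it with the numeric bounds of each CIDR block.

```r
# Convert dotted-quad IPv4 strings to numbers (base R only).
ipv4_to_num <- function(ip) {
  vapply(strsplit(ip, ".", fixed = TRUE), function(p) {
    sum(as.numeric(p) * 256^(3:0))
  }, numeric(1))
}

# Lower and upper numeric bounds of one CIDR block such as "10.0.0.0/8".
cidr_bounds <- function(cidr) {
  parts <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  size  <- 2^(32 - as.integer(parts[2]))   # number of addresses in the block
  low   <- floor(ipv4_to_num(parts[1]) / size) * size
  c(low = low, high = low + size - 1)
}

# TRUE for each IP that falls inside at least one of the CIDR blocks.
in_any_cidr <- function(ips, cidrs) {
  nums <- ipv4_to_num(ips)
  hits <- vapply(cidrs, function(cidr) {
    b <- cidr_bounds(cidr)
    nums >= b[["low"]] & nums <= b[["high"]]
  }, logical(length(ips)))
  # One row per IP, one column per CIDR block
  rowSums(matrix(hits, nrow = length(ips))) > 0
}

in_any_cidr(c("10.0.0.1", "41.242.140.10"),
            c("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"))
```

The first address falls inside 10.0.0.0/8 and the second one matches none of the blocks, so only the second would be kept for geolocation.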

Tying an IP to a geographical location

We’ll focus on gathering the country tied to an IP address (we COULD do cities, but I don’t really care for that so I’ll skip it).

So what we need is a mapping from IP addresses to countries. That kind of data can evolve over time (public addresses are “bought” and “sold”). So in order to have a reasonable idea, we probably need a third-party service that provides an up-to-date copy of IP-to-Country data.

One of the most common such services is the MaxMind GeoIP database. As I do this as a hobby, I probably don’t care for the highest possible accuracy, and I’ll want to keep it cheap. So I’ll choose the “GeoLite2” option here.

For more information, see: https://dev.maxmind.com/geoip/geoip2/geolite2/

First, you need to register for an account. This has changed (you didn’t need it in the past, but this is now a requirement).

I’ll just go on and register. Once done (it takes 5 minutes), I can go and download the GeoLite2 Country database. I’ll choose the CSV format (for ease of use, initially that is). The GeoLite2 database is updated weekly, on Tuesdays (if I’m not mistaken). But we’ll get to that later.

So I’ve got the IP to Country data I was looking for. Let’s have a look at the data format (the biggest file is the most interesting one): 

Well, that’s good to know: As you might understand, MaxMind won’t be providing the data per-IP but rather per IP range. We’ve seen how to deal with that in the past, so we should be able to match an IP address to its corresponding IP range from those available.

But then we’ll also need to map the geoname_id of each range to a country code (more on that later, as we could use the country code to generate the visual maps). The Blocks file also carries a registered_country_geoname_id column, but the code below joins on geoname_id. I don’t really care for other languages (I believe mixing languages, when you most probably need English to work in IT anyway, is a waste of time and focus), so I can use the “en” version.

Right, so we now have the country_iso_code we could use later too (not for today). Let’s write some basic code here to make use of all that:

# Next, we will want geo data for IP ranges, as exported from MaxMind GeoLite2:
geo_ip_data <- read.csv("/mnt/R/Demos/maxmind_extract/GeoLite2-Country-Blocks-IPv4.csv",
                        stringsAsFactors = FALSE)
geo_ip_context <- read.csv("/mnt/R/Demos/maxmind_extract/GeoLite2-Country-Locations-en.csv",
                           stringsAsFactors = FALSE)
# Left join on geoname_id: every IP range is kept, even without a country
geo_ip_data <- merge(geo_ip_data[, c("network", "geoname_id")], 
                     geo_ip_context[, c("geoname_id", "country_iso_code", "country_name")], 
                     by = "geoname_id", all.x = TRUE)
# We'll come back to this later
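
To see what that merge() with all.x = TRUE does, here is a small sketch on toy data. The networks are documentation ranges, and the ids, codes and names are invented for the example (they are not real MaxMind values); only the column names mirror the ones used above.

```r
# Toy stand-in for the Blocks file: one range has no geoname_id at all.
toy_blocks <- data.frame(
  network    = c("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"),
  geoname_id = c(1L, 2L, NA),
  stringsAsFactors = FALSE
)

# Toy stand-in for the Locations file (hypothetical ids and names).
toy_locations <- data.frame(
  geoname_id       = c(1L, 2L),
  country_iso_code = c("XA", "XB"),
  country_name     = c("Country A", "Country B"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of the first table (a left join);
# ranges without a matching location simply get NA for the country columns.
toy_merged <- merge(toy_blocks, toy_locations,
                    by = "geoname_id", all.x = TRUE)
toy_merged
```

All three ranges survive the join, and the one without a geoname_id ends up with NA as its country, which is why NA countries need to be filtered out at some later point.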

Testing it all

I’m going to just get a list of IP addresses for testing… But WAIT! That’s PRECISELY why I have my infamous “Home Lab” server… I showed in a past entry that I run an fprobe daemon to capture traffic (on my server’s Wi-Fi interface). With nfdump, I can export that to CSV:

nfdump -R /var/cache/nfdump -o csv

And so I did just that (I’m taking a very small extract for the sake of the example, though).

# This one will be useful almost anytime:
# (the last entry is the link-local range, which is 169.254.0.0/16 in full)
private_ip_ranges <- c("10.0.0.0/8", "172.16.0.0/12",
                       "192.168.0.0/16", "169.254.0.0/16")

demo_nf <- read.csv("/mnt/R/Demos/server_data/nfdump_data/nfdump_example_20201205_morning.csv")
demo_nf <- demo_nf[,c(1,3,4,5,6,7,8)]

all_ips_seen <- c(demo_nf$sa, demo_nf$da)
# Keep only valid IP v4: (package iptools::is_ipv4)
# %in% TRUE drops anything that is not flagged as IPv4 (FALSE or NA)
all_ips_seen <- all_ips_seen[is_ipv4(all_ips_seen) %in% TRUE]
# For geo_location, we will focus on public IPs...:
all_ips_seen <- all_ips_seen[!ip_in_any(all_ips_seen, 
                                        private_ip_ranges)]

# For later "merge()":
short_ip_df <- as.data.frame(table(all_ips_seen))
short_ip_df[] <- lapply(short_ip_df, as.character)
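
As a quick illustration of the counting step, here is what table() plus as.data.frame() produce on a toy vector (the addresses are documentation IPs, made up for the example). Passing stringsAsFactors = FALSE is one possible alternative to converting every column to character afterwards, and it leaves Freq as a number.

```r
# Toy vector of addresses, with one of them seen three times.
toy_ips <- c("192.0.2.1", "198.51.100.7", "192.0.2.1", "192.0.2.1")

# table() counts occurrences; as.data.frame() gives one row per distinct IP,
# with the count in a column named "Freq".
toy_counts <- as.data.frame(table(toy_ips), stringsAsFactors = FALSE)
toy_counts
```

The resulting data.frame has one column named after the input vector (toy_ips here, all_ips_seen in the real code) and the Freq column that is summed per country later on.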

Working with geo data

There are several packages for map creation in R. ggplot2 can already pull in the relevant map data (through the maps package), so let’s go for that:

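library(ggplot2) # map_data(), borders(); both need the "maps" package installed
library(dplyr)   # %>%, group_by(), summarise(), ...
map_data_world <- map_data("world")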
# maps::iso3166 contains the relevant data for countries:
# Not used, but could be useful to bypass mismatching countries names by using
# iso country names instead.
#context_countries <- iso3166

# Now map_data_world has many points marking the limits of each country...
# For later use, let's calculate the Countries "centers", i.e. the means of
# latitude and longitude (albeit a simplified approach):
countries_centers <- map_data_world %>%
  group_by(region) %>%
  summarise(mean_long = mean(long), mean_lat = mean(lat)) %>%
  select(region, mean_long, mean_lat) %>%
  distinct() %>%
  mutate(mean_long = ifelse(region == "USA", -98.5, mean_long),
         mean_lat = ifelse(region == "USA", 39.5, mean_lat)
  ) # Because of Alaska mainly, USA shows mis-aligned, hence manual fix

Mixing it all

geo_ip_data <- cbind(geo_ip_data,
                     range_boundaries(geo_ip_data$network)[,c("min_numeric", 
                                                              "max_numeric")])
short_ip_df$numeric_ip <- ip_to_numeric(short_ip_df$all_ips_seen)
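country_code_for_IP <- function(t_ip_num) {
  # Despite the name, this returns the country *name* of the matching range
  country <- geo_ip_data[(t_ip_num >= geo_ip_data$min_numeric) & 
                           (t_ip_num <= geo_ip_data$max_numeric),
              "country_name"]
  # Subsetting never returns NULL: no match gives a zero-length vector,
  # so test the length and use NA (filtered out further down)
  data.frame(numeric_ip = t_ip_num, 
    country = if (length(country) > 0) country[1] else NA_character_)
}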

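library(reshape2) # assumption: melt() is reshape2::melt
ip_country <- melt(lapply(short_ip_df$numeric_ip, country_code_for_IP)) %>%
  # lapply returns a list of one-row data.frames; melt stacks them into a
  # single data.frame, with the numeric IP ending up in the "value" column
  filter(!is.na(country)) %>%
  select(value, country) # to remove unnecessary columns added by melt.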
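
Scanning every range for every IP can get slow on a bigger extract. One possible alternative, sketched here on toy data, is base R's findInterval(), which locates each numeric IP in the sorted lower bounds in one vectorized call. The toy ranges, the country names and the helper lookup_country are invented for the illustration, and the sketch assumes the ranges do not overlap.

```r
# Hypothetical, non-overlapping ranges (numeric bounds and made-up names).
toy_ranges <- data.frame(
  min_numeric  = c(100, 200, 400),
  max_numeric  = c(149, 299, 499),
  country_name = c("Country A", "Country B", "Country C"),
  stringsAsFactors = FALSE
)

lookup_country <- function(ip_num, ranges) {
  ranges <- ranges[order(ranges$min_numeric), ]
  # Index of the last range whose lower bound is <= ip_num (0 if none)
  idx <- findInterval(ip_num, ranges$min_numeric)
  idx[idx == 0] <- NA
  # Keep the match only if the IP is also within that range's upper bound
  inside <- !is.na(idx) & ip_num <= ranges$max_numeric[idx]
  ifelse(inside, ranges$country_name[idx], NA_character_)
}

lookup_country(c(120, 150, 250, 50, 499), toy_ranges)
```

Values that fall in a gap between two ranges (150 here) or below the first range (50) come back as NA, which is the same convention the filter on NA countries relies on.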

Preparing for Map creation

At this stage, we’re almost ready. We can just put together the necessary data for easier use with ggplot later on:

# Let's prepare for the graph now:
for_graph <- merge(short_ip_df, ip_country,
                   by.x = "numeric_ip", by.y = "value",
                   all.x = TRUE)
for_graph <- for_graph[!is.na(for_graph$country),]
for_graph$Freq <- as.numeric(for_graph$Freq)

for_graph_short <- for_graph %>% 
  group_by(country) %>%
  summarise(n = sum(Freq)) %>%
  mutate(
    country = ifelse(country == "United States", "USA", country),
    country = ifelse(country == "Antigua and Barbuda", "Antigua", country)
  ) # Those two at least wouldn't match with the map_data_world otherwise

for_graph_short <- merge(for_graph_short,
                  countries_centers, # Created earlier
                  by.x = "country",
                  by.y = "region",
                  all.x = TRUE)
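
Mismatching country names are easy to miss, because the unmatched rows just end up without coordinates. A small sketch of how one could list them, and fix them with a lookup vector instead of chained ifelse() calls, follows. The two vectors are toy excerpts typed in by hand, and name_fixes is a hypothetical helper, not something from the packages used above.

```r
# Toy excerpts: names as found in the geo-IP data vs. regions in the map data.
geo_names   <- c("United States", "France", "Antigua and Barbuda")
map_regions <- c("USA", "France", "Antigua")

# Names that would not find a match in the map data:
unmatched <- setdiff(geo_names, map_regions)
unmatched

# Hypothetical lookup vector: geo-IP name -> map region name.
name_fixes <- c("United States" = "USA",
                "Antigua and Barbuda" = "Antigua")

# Replace only the names present in the lookup, keep the others as they are.
fixed_names <- ifelse(geo_names %in% names(name_fixes),
                      unname(name_fixes[geo_names]),
                      geo_names)
fixed_names
```

Running setdiff() on the real country column against the region column of the map data would show which other names still need an entry in such a lookup.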

Creating the maps

The easiest part (maybe, although I had to brush up on this again, as there are different approaches: raster, etc.). This approach suits my needs and uses little more than ggplot2, which I prefer over the alternatives.

library(ggthemes) # assumption: theme_map() comes from ggthemes
# That's the basis for our "painting".
world <- ggplot() +
  borders("world", colour = "gray85", fill = "gray80") +
  theme_map()

# Let's now add some "dots" showing the number of connections to/from IPs seen
# in our Netflow data:
map <- world +
  geom_point(aes(x = mean_long, y = mean_lat, size = n),
             data = for_graph_short, 
             colour = 'red', alpha = .5) +
  ggtitle("Netflow's observed Geo IPs by number of connections")

map

That gets us to the map shown at the beginning of the post.

Conclusions

That’s it for today. But there is plenty more to come (on the very same subject). Stay tuned!

Some References

The source code on my GitHub account:

https://github.com/kaizen-R/R/blob/master/Sample/Visualizations/maps/geoIP_demo_v001.R

MaxMind: https://dev.maxmind.com/geoip/geoip2/geolite2/

Not all of the following were actually “used”, but they show a few interesting approaches to maps:

https://www.r-bloggers.com/2017/02/how-to-make-a-global-map-in-r-step-by-step/

https://www.datanovia.com/en/blog/how-to-create-a-map-using-ggplot2/