This repository contains the code needed to reproduce and replicate our results in our IMC 2023 paper.
Our study replicates the methodology of two papers that obtained outstanding results, in terms of coverage and accuracy, on geolocating IP addresses in today's Internet using the largest publicly available measurement platform, RIPE Atlas. These two papers are:
They are referred to as the million scale and street level papers throughout this README, as in our paper.
Our code offers the possibility to:
- reproduce our results using our measurement datasets.
- replicate our methodology with different targets and vantage points. For now, only RIPE Atlas vantage points are supported, but it should not be difficult to adapt the code to handle other vantage points and targets.
Our code performs measurements on RIPE Atlas, so be sure to have an account if you want to replicate our methodology with your own RIPE Atlas measurements.
You can fetch our data from our FTP server, ftp.iris.dioptra.io, which provides the ClickHouse tables dumped in CSV format.
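As a rough illustration of fetching the dumps from Python, the sketch below uses the standard ftplib module. The helper names are hypothetical, and both anonymous login and the remote directory layout are assumptions, so list the server contents first and adapt the paths to what you find.

```python
from ftplib import FTP
from pathlib import Path

FTP_HOST = "ftp.iris.dioptra.io"  # host named in this README


def list_remote_files(ftp, remote_dir="."):
    """Return the names the server reports for remote_dir."""
    return ftp.nlst(remote_dir)


def download_file(ftp, remote_name, local_dir="."):
    """Download one remote file in binary mode and return the local path."""
    local_path = Path(local_dir) / Path(remote_name).name
    with open(local_path, "wb") as fh:
        ftp.retrbinary(f"RETR {remote_name}", fh.write)
    return local_path


# Example usage (needs network access; anonymous login is an assumption):
#
#     with FTP(FTP_HOST) as ftp:
#         ftp.login()
#         for name in list_remote_files(ftp):
#             print(name)
```

Both helpers take an already connected FTP object, so the connection and login details stay in one place and can be changed without touching the download logic.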
git clone https://github.com/dioptra-io/geoloc-imc-2023.git
cd geoloc-imc-2023
You can use the script install.sh to:
- Pull the clickhouse docker image.
- Start the clickhouse server.
- Download clickhouse-client binary.
- Install python project using poetry.
- Create all tables and populate the database with our measurements.
source install.sh
If the installation fails, all necessary steps to use the project are described below.
GeoScale uses poetry as its dependency manager. Install the project using:
poetry shell
poetry lock
poetry install
We use docker to run the clickhouse server. By default, the server listens on localhost on port 8123 (HTTP) and port 9000 (TCP). If you prefer to use your own docker configuration, please also modify default.py accordingly.
# pull the docker image
docker pull clickhouse/clickhouse-server:22.6
# start the server
docker run --rm -d \
-v ./clickhouse_files/data:/var/lib/clickhouse/ \
-v ./clickhouse_files/logs:/var/log/clickhouse-server/ \
-v ./clickhouse_files/users.d:/etc/clickhouse-server/users.d:ro \
-v ./clickhouse_files/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh \
-p 8123:8123 \
-p 9000:9000 \
--ulimit nofile=262144:262144 \
clickhouse/clickhouse-server:22.6
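Once the container is running, you may want to check that the server answers before populating the database. The sketch below is not part of the repository: it queries the /ping endpoint of the ClickHouse HTTP interface, which replies with "Ok." when the server is healthy. The function names are hypothetical, and the host and port are the defaults mentioned above.

```python
import http.client
import urllib.request


def ping_url(host="localhost", port=8123):
    """Build the URL of the ClickHouse HTTP health-check endpoint."""
    return f"http://{host}:{port}/ping"


def clickhouse_is_up(host="localhost", port=8123, timeout=2.0):
    """Return True if the server answers /ping with 'Ok.', False otherwise."""
    try:
        with urllib.request.urlopen(ping_url(host, port), timeout=timeout) as resp:
            return resp.read().decode().strip() == "Ok."
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-HTTP reply.
        return False


# Example usage:
#
#     if not clickhouse_is_up():
#         raise SystemExit("ClickHouse is not reachable on localhost:8123")
```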
You can either install clickhouse-client or download the clickhouse client binary (by default, install.sh downloads the binary file).
curl https://clickhouse.com/ | sh
mv clickhouse ./clickhouse_files/
Finally, create all necessary tables and populate them with our measurements with:
python scripts/utils/clickhouse_installer.py
Our tool relies on ENV variables to configure clickhouse and to interact with the RIPE Atlas API. An example of the necessary ENV variables is given in .env.example. Create your own env file with the following values:
RIPE_USERNAME=
RIPE_SECRET_KEY=
# clickhouse settings
CLICKHOUSE_CLIENT=
CLICKHOUSE_HOST=
CLICKHOUSE_DB=
CLICKHOUSE_USER=
CLICKHOUSE_PASSWORD=
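To catch a missing variable early, a small check like the one below can be run before starting any script. This is only a sketch: load_settings is a hypothetical helper, not a function of this project, and it assumes the variables have already been exported into the process environment (how the project itself loads the env file is not shown here). It tests for presence only, since a value such as the ClickHouse password may legitimately be empty.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "RIPE_USERNAME",
    "RIPE_SECRET_KEY",
    "CLICKHOUSE_CLIENT",
    "CLICKHOUSE_HOST",
    "CLICKHOUSE_DB",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_PASSWORD",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example usage:
#
#     settings = load_settings()
#     print(settings["CLICKHOUSE_HOST"])
```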
The project has been run on:
- CentOS 7.5
- Python 3.9
- Server with 64GB RAM, 32 cores.
We provide python scripts and jupyter notebooks to reproduce the results and the graphs that we got in replicating the million scale and the street level papers.
You can reproduce Million scale results using a jupyter notebook: million_scale.ipynb
Alternatively, you can run the python script in the background, as some steps are very long to execute (several hours):
nohup python analysis/million_scale.py > output.log &
All analysis results can be found in ./analysis/results
No additional steps are necessary to reproduce the street-level experiment.
You can directly use notebooks plot.ipynb and tables.ipynb to produce the figures and tables of our paper.
You can also run your own measurements on custom datasets of targets (anchors) and vantage points (probes).
The jupyter notebook create_dataset will generate:
- the set of probes (used as vantage points)
- the set of anchors (used as targets)
- a filtered version of both sets, with problematic probes removed (wrongly geolocated ones, for example)
All generated files will be placed in /datasets/user_datasets.
With million_scale_measurements.ipynb, you can select a subset of vantage points and targets and run measurements on RIPE Atlas.
It will start measurements from all vantage points towards:
- all targets
- 3 responsive addresses for each target
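For readers unfamiliar with RIPE Atlas, the sketch below shows what a one-off ping measurement request can look like when sent to the RIPE Atlas v2 measurement API. It is an illustration only and not the code used by the notebook: build_ping_request is a hypothetical helper, the packet count and description are arbitrary, and the target is a documentation address. It only builds the JSON body; actually submitting it requires a RIPE Atlas API key and consumes credits.

```python
import json

# Measurement creation endpoint of the RIPE Atlas v2 API.
ATLAS_MEASUREMENTS_URL = "https://atlas.ripe.net/api/v2/measurements/"


def build_ping_request(target, probe_ids, packets=3, description="geoloc ping"):
    """Build the JSON body of a one-off IPv4 ping from specific probes."""
    probe_ids = list(probe_ids)
    return {
        "definitions": [
            {
                "type": "ping",
                "af": 4,  # IPv4
                "target": target,
                "packets": packets,
                "description": description,
            }
        ],
        "probes": [
            {
                "type": "probes",  # select probes by explicit ID
                "value": ",".join(str(p) for p in probe_ids),
                "requested": len(probe_ids),
            }
        ],
        "is_oneoff": True,
    }


# Print the body that would be POSTed to ATLAS_MEASUREMENTS_URL.
print(json.dumps(build_ping_request("192.0.2.1", [1, 2, 3]), indent=2))
```

One request per target keeps failures isolated, at the cost of more API calls when the target set is large.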
Perform the analysis on your own measurement results and datasets by following the same steps described previously and setting the boolean variable repro = True at the beginning of million_scale.ipynb (or million_scale.py if you are using the script).
TODO: Street level
@inproceedings{darwich2023replication,
title={Replication: Towards a Publicly Available Internet scale IP Geolocation Dataset},
author={Darwich, Omar and Rimlinger, Hugo and Dreyfus, Milo and Gouel, Matthieu and Vermeulen, Kevin},
booktitle={Proceedings of the 2023 ACM on Internet Measurement Conference},
pages={1--15},
year={2023}
}
This project is the result of a collaboration between the LAAS-CNRS and Sorbonne Université.