An incomplete listing of .mil domains and the code for the scraper used to build the list
There currently isn't a publicly available directory of all the domain names registered under the US military's .mil
top-level domain. Such a directory would be useful for people looking to get an aggregate view of military websites and how they are hosted. For example, Ben Balter has been doing some great work analyzing the official set of .gov domains.
This is by no means an official or a complete list. It is intended to be a first step toward a better understanding of how the military is managing its domain name space and official sites.
You can download this list as a .csv file or view it with GitHub's pretty formatting.
This list is populated by a scraper script, written in Python, that crawls some official site listing pages.
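The actual script lives in the repository; as a minimal sketch of the approach, scraping a single listing page could look something like this (the URL and function name here are hypothetical, and the real script also tracks organization names and crawls multiple pages):

```python
import csv
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Hypothetical listing page; the real script crawls several official pages.
LISTING_URL = "https://www.defense.gov/Resources/Military-Departments/"

def scrape_mil_domains(url):
    """Return the set of .mil domains linked from one listing page."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    domains = set()
    for anchor in soup.find_all("a", href=True):
        host = urlparse(anchor["href"]).netloc.lower()
        if host.endswith(".mil"):
            domains.add(host)
    return domains

if __name__ == "__main__":
    with open("dotmil-domains.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for domain in sorted(scrape_mil_domains(LISTING_URL)):
            writer.writerow([domain])
```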
To run the script yourself, open a terminal and type:
```
git clone https://github.com/esonderegger/dotmil-domains.git
cd dotmil-domains
```
I strongly suggest using virtualenv and virtualenvwrapper for managing Python environments; their respective pages explain installation better than I could here. Once you have created and activated your virtualenv, type:
```
python setup.py develop
dotmil-domains
```
This will install the two dependencies (Requests and Beautiful Soup) into your virtualenv, link the `dotmil-domains` command to the `__main__.py` script in the `dotmil_domains` directory, and then run the command to create a new `dotmil-domains.csv` file. The `develop` after `python setup.py` means that you can make changes to the `__main__.py` script, run `dotmil-domains` again, and see the new file.
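The command linking works through a setuptools entry point. I won't reproduce the project's actual setup.py here, but a sketch of the relevant mechanism, with the version number and the entry function name `main` as assumptions, would look like:

```python
# Sketch of a setup.py using a console_scripts entry point; the version
# and the name of the entry function ("main") are assumptions.
from setuptools import setup

setup(
    name="dotmil-domains",
    version="0.1",
    packages=["dotmil_domains"],
    install_requires=["requests", "beautifulsoup4"],
    entry_points={
        "console_scripts": [
            # Makes the `dotmil-domains` command call
            # the main() function in dotmil_domains/__main__.py.
            "dotmil-domains = dotmil_domains.__main__:main",
        ],
    },
)
```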
I'd love to have some help with this! Please feel free to create an issue or submit a pull request if you notice something that can be better. Specifically, suggesting additional pages we can scrape, or pointing out domains that are either not found or have incorrect organization names associated with them, would be very helpful. Some remaining to-dos:
- Manually add the remaining "Unknown" domains to the `mil.csv` file
- Find some more pages to scrape
- Write some comments explaining how the Python code is structured
- Add some columns to the csv with whether or not the domain is still active, redirects somewhere else, etc. (a sketch of how that check might work follows this list)
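As a starting point for that last item, here is a rough sketch of a liveness/redirect check; the function name, timeout, and use of plain HTTP are all assumptions rather than anything in the repository today:

```python
# Rough sketch of a per-domain check for the proposed csv columns;
# nothing here exists in the project yet.
import requests

def check_domain(domain):
    """Return (is_active, final_url) for a domain, following redirects."""
    try:
        response = requests.get(
            "http://" + domain, timeout=10, allow_redirects=True
        )
        return response.ok, response.url
    except requests.RequestException:
        return False, ""
```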