Geoscience computing workflows in Python, using datasets from online and Google Drive databases.
👇 This is a quick catalogue of the open databases! Click the link for each database to read further details, including a step-by-step explanation of the Python workflow to access the data.
Let's get started!
Open Databases | Owner | Topics | Accessible Contents | How-to-access tutorial |
---|---|---|---|---|
Google Drive Public geoscience Data | Peter Amstrand | | | Notebook |
SEG Wiki | SEG | | | Notebook |
GDR OpenEI | United States Department of Energy | Geothermal | | Notebook |
KGS Repository | Kansas Geological Survey | Oil and Gas | Archive of well logs from 1990 up to date | |
MSEEL Repository | Marcellus Shale Energy and Environmental Laboratory | Unconventional Oil and Gas | Well datasets in Marcellus Shale | Notebook |
14 Wildcat Wells in the National Petroleum Reserve in Alaska | Department of the Interior U.S. Geological Survey | Oil and Gas wildcat survey | | |
Oil and Gas Data Files | Oklahoma Corporation Commission | Oil and Gas | Production Data | |
SPE Data Repository | SPE | Multi-fracture well | | |
A ⚠️ Copyright note appears at the end of each open database section below. It means that when you use the data for any purpose, you should cite the dataset and credit the institution that owns it. Please review the associated wiki-page link.
Python users are already familiar with running Jupyter notebooks on a local computer. However, geoscience datasets are often very large (megabytes to gigabytes), and we don't want to fill our computer with them. The better choice is to access the datasets and compute in the cloud.
Google Colaboratory is a Jupyter notebook environment hosted in Google's cloud (note: it is not Google Cloud Platform). Why Google Colab? First and foremost, Google Colab is free. Second, it can speed up the download of a large dataset.
At the top of every how-to-access tutorial notebook there is a Colab badge. Click the badge and we will be redirected to Google Colab to run the code.
This Google Drive database comprises public geoscience data compiled by @peter Ba. Its contents are as follows:
- Canning 3D TDQ seismic
- Dutch F3 seismic data
- GEOLINK North Sea wells
- Poseidon seismic
- Core images
- 48 well composite logs
Follow these steps:
- Open this link 👉
- Once you open the drive, the folder `Public geoscience Data` will appear in your `Shared with Me`.
- Move the folder `Public geoscience Data` into your `My Drive`. Just drag and drop. Follow the GIF tutorial below if you are unsure how to do this 🙂
- Once `Public geoscience Data` has been moved to your `My Drive`, you can access it. To access the data, we will use Google Colaboratory! Google Colab is used for all of the geoscience computation in this repo.
- A tutorial notebook is provided that walks through the workflow to access the data (from extracting ZIP or TAR files to viewing simple data, e.g. seismic data and well log data). Visit this notebook
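The last two steps can be sketched roughly as follows. This is a minimal sketch and not the tutorial notebook's own code: the helper `extract_archive`, the archive name `example.zip`, and the output folder are hypothetical placeholders, and the commented-out Drive mount assumes the code runs inside Google Colab with Drive mounted at `/content/drive`.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, out_dir):
    """Extract a ZIP or TAR archive into out_dir and return the extracted paths.

    Hypothetical helper for illustration; not part of the tutorial notebook.
    """
    archive_path = Path(archive_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(out_dir)
            names = zf.namelist()
    elif tarfile.is_tarfile(archive_path):
        # tarfile.open detects compression itself (.tar, .tar.gz, .tgz, ...)
        with tarfile.open(archive_path) as tf:
            tf.extractall(out_dir)
            names = tf.getnames()
    else:
        raise ValueError(f"Not a ZIP or TAR archive: {archive_path}")

    return [out_dir / name for name in names]


# In Colab only: mount Drive, then point at the folder moved into My Drive.
# The mount point and file name below are assumptions, adjust them as needed.
# from google.colab import drive
# drive.mount("/content/drive")
# data_dir = Path("/content/drive/MyDrive/Public geoscience Data")
# extract_archive(data_dir / "example.zip", "/content/extracted")
```

Extracting into Colab's local disk (for example `/content/extracted`) rather than back into Drive keeps the large unpacked files out of your Drive quota.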
⚠️ Copyright: Every dataset that we use must be cited, crediting the institution that owns it. Please refer to this citation wiki-page
Please go through the SEG Wiki website to see the list of open data. There are more than 30 major datasets overall (well logs, 2D and 3D seismic, both original data from major fields and synthetic data, synthetic earth models, gravity and magnetic data, etc.). However, some data come in formats that cannot be processed simply in Python. Here are some of the contents that are accessible so far:
- Poland 2D Vibroseis Line
- Stratton 3D
- Mobil AVO Viking Graben Line 12
Note: This list will be updated whenever I find that another dataset can be accessed using Python!
First, it helps to know the details of the dataset. So, please visit the SEG Wiki page first and look up the dataset by name.
Next, go directly to this tutorial notebook link and click the badge at the top; you will be redirected to Google Colab. We will learn how to use `wget` to get the data directly from the open database website. Just follow the instructions!
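For readers who prefer to stay in Python rather than call `wget` from a Colab cell, the same download step can be sketched with the standard library. This is a swapped-in alternative and not the notebook's method: the helper `download` and the example URL are hypothetical, and the real link should be copied from the dataset's own page.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def download(url, dest_dir="."):
    """Stream the file at url into dest_dir and return the local path.

    Hypothetical helper; it plays the same role as `wget <url>`.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last component of the URL path.
    dest = dest_dir / Path(unquote(urlparse(url).path)).name

    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so a large file is never held in memory
        shutil.copyfileobj(response, out)
    return dest


# Placeholder URL: replace it with a real link from the dataset's page.
# segy_path = download("https://example.com/path/to/line.sgy", "/content/data")
```

The returned path can then be handed to whichever reader fits the file type, such as segyio for SEG-Y files or lasio for LAS well logs.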
⚠️ Copyright: Every dataset that we use must be cited, crediting the institution that owns it. Please refer to this citation wiki-page
GDR OpenEI is an open geothermal energy data portal provided by the United States Department of Energy (DOE) and developed by the National Renewable Energy Laboratory (NREL). This webpage hosts datasets from the U.S. FORGE geothermal projects, Enhanced Geothermal Systems (EGS), and hydrothermal projects. Its contents are as follows:
- Project Utah FORGE (Roosevelt Hot Springs)
- Project HOTSPOT (The Snake River Plain, SRP)
First, it helps to know the details of the dataset. So, please visit the GDR OpenEI page first and look up the dataset by name.
Next, go directly to this tutorial notebook link and click the badge at the top; you will be redirected to Google Colab. We will learn how to use `wget` to get the data directly from the open database website. Just follow the instructions!
⚠️ Copyright: Every dataset that we use must be cited, crediting the institution that owns it. Please refer to this citation wiki-page
- segyio: the easy-to-understand tutorial from Awesome Open Source and official docs from Equinor's segyio
- lasio: docs