Commit c55fdb2

Update scripts and README for downloading datasets from Google Drive

Author: Jiong Zhu
Parent: 08011c5

6 files changed, +46 -18 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ As a general note, TensorFlow 1.15 can be used for all code requiring TensorFlow

 ### Download Datasets

-The datasets can be downloaded using the bash scripts provided in `/experiments/h2gcn/scripts`, which also prepare the datasets for use in our experimental framework based on `signac`.
+The datasets can be downloaded using the bash scripts provided in `/experiments/h2gcn/scripts` (requires the latest version of [`gdown`](https://github.com/wkentaro/gdown) to be installed), which also prepare the datasets for use in our experimental framework based on `signac`.

 We make use of `signac` to index and manage the datasets: the datasets and experiments are stored in hierarchically organized signac jobs, with the **1st level** storing different graphs, **2nd level** storing different sets of features, and **3rd level** storing different training-validation-test splits. Each level contains its own state points and job documents to differentiate with other jobs.
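
Note on the gdown prerequisite: the README change above states that `gdown` must be installed before the download scripts are run. A minimal sketch of checking this up front, without an interactive prompt, might look as follows. `require_cmd` is a hypothetical helper used only for illustration; it is not part of this commit.

```shell
# Hypothetical helper, not part of the commit: succeed only if a command is on PATH.
# $1 = command name, $2 = install hint shown when the command is missing.
require_cmd() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "Missing prerequisite: $1 (install with: $2)" >&2
        return 1
    fi
    return 0
}

# Example call (commented out so that sourcing this sketch has no side effects):
# require_cmd gdown "pip install --upgrade gdown" || exit 1
```

Failing fast with a message, rather than installing on the user's behalf, is one possible design choice; the scripts in this commit instead prompt and then run `pip install --upgrade gdown`.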

experiments/h2gcn/scripts/get-real-cora_full.sh

Lines changed: 11 additions & 4 deletions
@@ -9,10 +9,17 @@ SHA256SUM=b04a3db58aee34ddec4e24970665a3ef094125f39e2051c6e5024f124caa5053
 cd "$(dirname ${BASH_SOURCE[0]})/.."
 mkdir -p archives

-# Automatic downloading script adopted from https://stackoverflow.com/a/38937732
-filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
-getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
-curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+# # Automatic downloading script adopted from https://stackoverflow.com/a/38937732
+# filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
+# getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
+# curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+
+if ! command -v gdown &> /dev/null
+then
+read -p "Prerequisite package gdown is not installed. Press any key to install (pip install --upgrade gdown)"
+pip install --upgrade gdown
+fi
+gdown -O "${TARGET}" ${ggID}

 echo "$SHA256SUM $TARGET" | sha256sum -c
 test $? -eq 0 || read -p "Failed to verify SHA256 checksum. Press any key to continue anyway." -n 1 -r
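
Note on the checksum step: each script ends by piping the expected digest and the target path into `sha256sum -c`. As a rough sketch, the same check can be wrapped in a function whose exit status the caller can act on. `verify_sha256` is a hypothetical name, not part of this commit, and the sketch assumes GNU coreutils `sha256sum` is available.

```shell
# Hypothetical helper, not part of the commit: check a file against an expected
# SHA-256 digest. Returns 0 on a match and non-zero otherwise.
# $1 = expected hex digest, $2 = path to the downloaded file.
verify_sha256() {
    # sha256sum -c reads lines of the form "<digest>  <path>" (two spaces);
    # the per-file "OK" line on stdout is discarded, errors still go to stderr.
    printf '%s  %s\n' "$1" "$2" | sha256sum -c - > /dev/null
}
```

With such a helper, a script could stop on a mismatch with `verify_sha256 "$SHA256SUM" "$TARGET" || exit 1` instead of prompting to continue; whether that is preferable depends on how the scripts are meant to be used.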

experiments/h2gcn/scripts/get-real-geomgcn.sh

Lines changed: 11 additions & 4 deletions
@@ -9,10 +9,17 @@ SHA256SUM=06bf9a52cb272b3b25227530eafc2a40681fa7c548641ec00ca2427812fbe39f
 cd "$(dirname ${BASH_SOURCE[0]})/.."
 mkdir -p archives

-# Automatic downloading script adopted from https://stackoverflow.com/a/38937732
-filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
-getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
-curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+# # Automatic downloading script adopted from https://stackoverflow.com/a/38937732
+# filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
+# getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
+# curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+
+if ! command -v gdown &> /dev/null
+then
+read -p "Prerequisite package gdown is not installed. Press any key to install (pip install --upgrade gdown)"
+pip install --upgrade gdown
+fi
+gdown -O "${TARGET}" ${ggID}

 echo "$SHA256SUM $TARGET" | sha256sum -c
 test $? -eq 0 || read -p "Failed to verify SHA256 checksum. Press any key to continue anyway." -n 1 -r

experiments/h2gcn/scripts/get-syn-cora.sh

Lines changed: 11 additions & 4 deletions
@@ -9,10 +9,17 @@ SHA256SUM=93a5329054bc36d742f394589a4de9b6239f8a19ccb6b7b894228841887a413b
 cd "$(dirname ${BASH_SOURCE[0]})/.."
 mkdir -p archives

-# Automatic downloading script adopted from https://stackoverflow.com/a/38937732
-filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
-getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
-curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+# # Automatic downloading script adopted from https://stackoverflow.com/a/38937732
+# filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
+# getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
+# curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+
+if ! command -v gdown &> /dev/null
+then
+read -p "Prerequisite package gdown is not installed. Press any key to install (pip install --upgrade gdown)"
+pip install --upgrade gdown
+fi
+gdown -O "${TARGET}" ${ggID}

 echo "$SHA256SUM $TARGET" | sha256sum -c
 test $? -eq 0 || read -p "Failed to verify SHA256 checksum. Press any key to continue anyway." -n 1 -r

experiments/h2gcn/scripts/get-syn-products.sh

Lines changed: 11 additions & 4 deletions
@@ -9,10 +9,17 @@ SHA256SUM=ee92199881159dbb259c9b1c580984e3a9a7681b0b5da35ac2d2e36c9e240f26
 cd "$(dirname ${BASH_SOURCE[0]})/.."
 mkdir -p archives

-# Automatic downloading script adopted from https://stackoverflow.com/a/38937732
-filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
-getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
-curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+# # Automatic downloading script adopted from https://stackoverflow.com/a/38937732
+# filename="$(curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" | grep -o '="uc-name.*</span>' | sed 's/.*">//;s/<.a> .*//')"
+# getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
+# curl -Lb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}" -o "${TARGET}"
+
+if ! command -v gdown &> /dev/null
+then
+read -p "Prerequisite package gdown is not installed. Press any key to install (pip install --upgrade gdown)"
+pip install --upgrade gdown
+fi
+gdown -O "${TARGET}" ${ggID}

 echo "$SHA256SUM $TARGET" | sha256sum -c
 test $? -eq 0 || read -p "Failed to verify SHA256 checksum. Press any key to continue anyway." -n 1 -r

npz-datasets/README.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Note that the new `npz` format does NOT keep the same training, validation and t

 ### Download Datasets

-The datasets can be downloaded using the bash scripts provided in the `scripts` folder under this folder.
+The datasets can be downloaded using the bash scripts provided in the `scripts` folder under this folder. Installation of the latest version of [`gdown`](https://github.com/wkentaro/gdown) is required.

 ### Example Usage
