Commit 3d8f890

Added support for MinIO and B2 buckets
- Refactored SilNlpEnv in silnlp/common/environment.py to support connecting to either MinIO or B2
- Kept AWS support temporarily
- Updated the README and other documentation with instructions for MinIO and B2 bucket setup
1 parent 25e3729 commit 3d8f890

13 files changed, +238 -154 lines changed

.devcontainer/Dockerfile

Lines changed: 0 additions & 1 deletion
```diff
@@ -43,6 +43,5 @@ ENV SIL_NLP_CACHE_EXPERIMENT_DIR=/root/.cache/silnlp/experiments
 ENV SIL_NLP_CACHE_PROJECT_DIR=/root/.cache/silnlp/projects
 # Set environment variables
 ENV CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"
-ENV SIL_NLP_DATA_PATH=/silnlp
 ENV EFLOMAL_PATH=/workspaces/silnlp/.venv/lib/python3.10/site-packages/eflomal/bin
 CMD ["bash"]
```

.devcontainer/devcontainer.json

Lines changed: 6 additions & 2 deletions
```diff
@@ -12,14 +12,18 @@
 "--gpus",
 "all",
 "-v",
-"${env:HOME}/.aws:/root/.aws", // Mount user's AWS credentials into the container
-"-v",
 "${env:HOME}/clearml/.clearml/hf-cache:/root/.cache/huggingface"
 ],
 "containerEnv": {
 "AWS_REGION": "${localEnv:AWS_REGION}",
 "AWS_ACCESS_KEY_ID": "${localEnv:AWS_ACCESS_KEY_ID}",
 "AWS_SECRET_ACCESS_KEY": "${localEnv:AWS_SECRET_ACCESS_KEY}",
+"MINIO_ENDPOINT_URL": "${localEnv:MINIO_ENDPOINT_URL}",
+"MINIO_ACCESS_KEY": "${localEnv:MINIO_ACCESS_KEY}",
+"MINIO_SECRET_KEY": "${localEnv:MINIO_SECRET_KEY}",
+"B2_ENDPOINT_URL": "${localEnv:B2_ENDPOINT_URL}",
+"B2_KEY_ID": "${localEnv:B2_KEY_ID}",
+"B2_APPLICATION_KEY": "${localEnv:B2_APPLICATION_KEY}",
 "CLEARML_API_ACCESS_KEY": "${localEnv:CLEARML_API_ACCESS_KEY}",
 "CLEARML_API_SECRET_KEY": "${localEnv:CLEARML_API_SECRET_KEY}"
 },
```
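Inside the container, each of these variables is filled in from the host through `${localEnv:...}`. When a host variable is unset, the container typically sees an empty string rather than a missing variable. The following is a minimal sketch of how a script might report which storage backends have a complete set of credentials. It is not code from this commit or from `SilNlpEnv`. The variable names come from the `containerEnv` block above, while `BACKEND_VARS` and `configured_backends` are hypothetical names.

```python
import os
from typing import Dict, List, Mapping, Optional

# Credential variables per backend, as forwarded by devcontainer.json in this commit.
BACKEND_VARS: Dict[str, List[str]] = {
    "B2": ["B2_ENDPOINT_URL", "B2_KEY_ID", "B2_APPLICATION_KEY"],
    "MinIO": ["MINIO_ENDPOINT_URL", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY"],
}


def configured_backends(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the backends whose variables are all set to non-empty values.

    Hypothetical helper: an unset ${localEnv:...} reference usually arrives as
    an empty string, so empty or whitespace-only values count as missing.
    """
    env = os.environ if env is None else env
    return [
        name
        for name, keys in BACKEND_VARS.items()
        if all(env.get(key, "").strip() for key in keys)
    ]


if __name__ == "__main__":
    found = configured_backends()
    print("Configured backends:", ", ".join(found) if found else "none")
```

If you pass an explicit mapping instead of relying on `os.environ`, you can exercise the check without touching the real environment.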

README.md

Lines changed: 25 additions & 21 deletions
````diff
@@ -62,15 +62,17 @@ These are the main requirements for the SILNLP code to run on a local machine. S
 Create a text file with the following content and edit as necessary:
 ```
 CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"
-CLEARML_API_ACCESS_KEY=xxxxx
-CLEARML_API_SECRET_KEY=xxxxx
-AWS_REGION="us-east-1"
-AWS_ACCESS_KEY_ID=xxxxx
-AWS_SECRET_ACCESS_KEY=xxxxx
-SIL_NLP_DATA_PATH="/silnlp"
-```
-* If you do not intend to use SILNLP with ClearML and/or AWS, you can leave out the respective variables. If you need to generate ClearML credentials, see [ClearML setup](clear_ml_setup.md).
-* Note that this does not give you direct access to an AWS S3 bucket from within the Docker container, it only allows you to run scripts referencing files in the bucket.
+CLEARML_API_ACCESS_KEY=xxxxxxx
+CLEARML_API_SECRET_KEY=xxxxxxx
+B2_ENDPOINT_URL=https://s3.us-east-005.backblazeb2.com
+B2_KEY_ID=xxxxxxxx
+B2_APPLICATION_KEY=xxxxxxxx
+MINIO_ENDPOINT_URL=https://truenas.psonet.languagetechnology.org:9000
+MINIO_ACCESS_KEY=xxxxxxxxx
+MINIO_SECRET_KEY=xxxxxxx
+```
+* If you do not intend to use SILNLP with ClearML and/or B2/MinIO, you can leave out the respective variables. If you need to generate ClearML credentials, see [ClearML setup](clear_ml_setup.md).
+* Note that this does not give you direct access to a B2 or MinIO bucket from within the Docker container, it only allows you to run scripts referencing files in the bucket.
 
 6. Start container
 
@@ -129,22 +131,24 @@ These are the main requirements for the SILNLP code to run on a local machine. S
 poetry install
 ```
 
-10. If using ClearML and/or AWS, set the following environment variables:
+10. If using ClearML and/or B2/MinIO, set the following environment variables:
 ```
 CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"
-CLEARML_API_ACCESS_KEY=xxxxx
-CLEARML_API_SECRET_KEY=xxxxx
-AWS_REGION="us-east-1"
-AWS_ACCESS_KEY_ID=xxxxx
-AWS_SECRET_ACCESS_KEY=xxxxx
-SIL_NLP_DATA_PATH="/silnlp"
+CLEARML_API_ACCESS_KEY=xxxxxxx
+CLEARML_API_SECRET_KEY=xxxxxxx
+B2_ENDPOINT_URL=https://s3.us-east-005.backblazeb2.com
+B2_KEY_ID=xxxxxxxx
+B2_APPLICATION_KEY=xxxxxxxx
+MINIO_ENDPOINT_URL=https://truenas.psonet.languagetechnology.org:9000
+MINIO_ACCESS_KEY=xxxxxxxxx
+MINIO_SECRET_KEY=xxxxxxx
 ```
 * If you need to generate ClearML credentials, see [ClearML setup](clear_ml_setup.md).
-* Note that this does not give you direct access to an AWS S3 bucket from within the Docker container, it only allows you to run scripts referencing files in the bucket.
+* Note that this does not give you direct access to a B2 or MinIO bucket from within the Docker container, it only allows you to run scripts referencing files in the bucket.
 * For instructions on how to permanently set up environment variables for your operating system, see the corresponding section under the Development Environment Setup header below.
 
-11. If using AWS, there are two options:
-* Option 1: Mount the bucket to your filesystem following the instructions under [Install and Configure Rclone](https://github.com/sillsdev/silnlp/blob/master/s3_bucket_setup.md#install-and-configure-rclone).
+11. If using B2/MinIO, there are two options:
+* Option 1: Mount the bucket to your filesystem following the instructions under [Install and Configure Rclone](https://github.com/sillsdev/silnlp/blob/master/bucket_setup.md#install-and-configure-rclone).
 * Option 2: Create a local cache for the bucket following the instructions under [Create SILNLP cache](https://github.com/sillsdev/silnlp/blob/master/manual_setup.md#create-silnlp-cache).
 
 ## Development Environment Setup
@@ -177,7 +181,7 @@ Follow the instructions below to set up a Dev Container in VS Code. This is the
 
 4. Define environment variables.
 
-Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. Additionally, set AWS_REGION. The typical value is "us-east-1".
+Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, B2_KEY_ID, B2_APPLICATION_KEY, MINIO_ACCESS_KEY, MINIO_SECRET_KEY. Also set B2_ENDPOINT_URL to https://s3.us-east-005.backblazeb2.com and set MINIO_ENDPOINT_URL to https://truenas.psonet.languagetechnology.org:9000 with no quotations.
 * Linux / macOS users: To set environment variables permanently, add each variable as a new line to the `.bashrc` file (Linux) or `.profile` file (macOS) in your home directory with the format
 ```
 export VAR="VAL"
@@ -210,7 +214,7 @@ Follow the instructions below to set up a Dev Container in VS Code. This is the
 10. Install and activate Poetry environment.
 * In the VS Code terminal, run `poetry install` to install the necessary Python libraries, and then run `poetry shell` to enter the environment in the terminal.
 
-11. (Optional) Locally mount the S3 bucket. This will allow you to interact directly with the S3 bucket from your local terminal (outside of the dev container). See instructions [here](s3_bucket_setup.md).
+11. (Optional) Locally mount the B2 and/or MinIO bucket(s). This will allow you to interact directly with the bucket(s) from your local terminal (outside of the dev container). See instructions [here](bucket_setup.md).
 
 To get back into the dev container and poetry environment each subsequent time, open the silnlp folder in VS Code, select the "Reopen in Container" option from the Remote Connection menu (bottom left corner), and use the `poetry shell` command in the terminal.
 
````
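The README tells Linux and macOS users to make these variables permanent by adding `export VAR="VAL"` lines to `.bashrc` or `.profile`. The Dev Container steps also say to set the endpoint URLs with no quotations. The hypothetical helper below is a sketch that renders such lines under those assumptions, using the standard-library `shlex.quote`. That function leaves URL-like values unquoted and single-quotes anything the shell would otherwise split or interpret, so the quoting is consumed by the shell instead of becoming part of the value. Before pasting the output into the appropriate file, replace the placeholder credentials with your own.

```python
import re
import shlex
from typing import Mapping

# POSIX shell variable names: a letter or underscore, then letters, digits, underscores.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def export_lines(values: Mapping[str, str]) -> str:
    """Render one `export NAME=value` line per variable (hypothetical helper)."""
    lines = []
    for name, value in values.items():
        if not _NAME.fullmatch(name):
            raise ValueError(f"not a valid shell variable name: {name!r}")
        # shlex.quote only adds single quotes when the value needs them.
        lines.append(f"export {name}={shlex.quote(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Endpoint values as given in the README; add your own credential variables.
    print(export_lines({
        "CLEARML_API_HOST": "https://api.sil.hosted.allegro.ai",
        "B2_ENDPOINT_URL": "https://s3.us-east-005.backblazeb2.com",
        "MINIO_ENDPOINT_URL": "https://truenas.psonet.languagetechnology.org:9000",
    }), end="")
```

This output differs in style from the README's `export VAR="VAL"` form, but the shell assigns the same value either way.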
bucket_setup.md

Lines changed: 59 additions & 0 deletions
````diff
@@ -0,0 +1,59 @@
+# B2/MinIO bucket setup
+
+We use Backblaze B2 and MinIO storage for storing our experiment data. Here is some workspace setup to enable a decent workflow.
+
+### Note For MinIO setup
+
+In order to access the MinIO bucket locally, you must have a VPN connected to its network.
+
+### Install and configure rclone
+
+**Windows**
+
+The following will mount /silnlp on your B drive or /nlp-research on your M drive and allow you to explore, read and write.
+* Install WinFsp: http://www.secfs.net/winfsp/rel/ (Click the button to "Download WinFsp Installer" not the "SSHFS-Win (x64)" installer)
+* Download rclone from: https://rclone.org/downloads/
+* Unzip to your desktop (or some convient location).
+* Add the folder that contains rclone.exe to your PATH environment variable.
+* Take the `scripts/rclone/rclone.conf` file from this SILNLP repo and copy it to `~\AppData\Roaming\rclone` (creating folders if necessary)
+* Add your credentials in the appropriate fields in `~\AppData\Roaming\rclone`
+* Take the `scripts/rclone/mount_b2_to_b.bat` and `scripts/rclone/mount_minio_to_m.bat` file from this SILNLP repo and copy it to the folder that contains the unzipped rclone.
+* Double-click either bat file. A command window should open and remain open. You should see something like, if running mount_b2_to_b.bat:
+```
+C:\Users\David\Software\rclone>call rclone mount --vfs-cache-mode full --use-server-modtime b2silnlp:silnlp B:
+The service rclone has been started.
+```
+
+**Linux / macOS**
+
+The following will mount /silnlp to a B folder or /nlp-research to a M folder in your home directory and allow you to explore, read and write.
+* For macOS, first download and install macFUSE: https://osxfuse.github.io/
+* Download rclone from: https://rclone.org/install/
+* Take the `scripts/rclone/rclone.conf` file from this SILNLP repo and copy it to `~/.config/rclone/rclone.conf` (creating folders if necessary)
+* Add your credentials in the appropriate fields in `~/.config/rclone/rclone.conf`
+* Create a folder called "B" or "M" in your user directory
+* Run the following command for B2:
+```
+rclone mount --vfs-cache-mode full --use-server-modtime b2silnlp:silnlp ~/B
+```
+* OR run the following command for MinIO:
+```
+rclone mount --vfs-cache-mode full --use-server-modtime miniosilnlp:nlp-research ~/M
+```
+### To start B: and/or M: drive on start up
+
+**Windows**
+
+Put a shortcut to the mount_b2_to_b.bat and/or mount_minio_to_m.bat file in the Startup folder.
+* In Windows Explorer put `shell:startup` in the address bar or open `C:\Users\<Username>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup`
+* Right click to add a new shortcut. Choose `mount_b2_to_b.bat` and/or `mount_minio_to_m.bat` as the target, you can leave the name as the default.
+
+Now your B2 and/or MinIO bucket should be mounted as B: or M: drive, respectively, when you start Windows.
+
+**Linux / macOS**
+* Run `crontab -e`
+* For B2, paste `@reboot rclone mount --vfs-cache-mode full --use-server-modtime b2silnlp:silnlp ~/B` into the file, save and exit
+* For MinIO, paste `@reboot rclone mount --vfs-cache-mode full --use-server-modtime miniosilnlp:nlp-research ~/M` into the file, save and exit
+* Reboot Linux / macOS
+
+Now your B2 and/or MinIO bucket should be mounted as ~/B or ~/M respectively when you start Linux / macOS.
````
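The Linux/macOS steps use the same `rclone mount` invocation for both remotes. The following sketch assembles that argument list in Python and is not part of SILNLP. It uses only the flags shown above (`--vfs-cache-mode full`, `--use-server-modtime`) and the remote names from `rclone.conf`. The Windows MinIO script also passes `--no-check-certificate`, which this sketch leaves out. `MOUNTS`, `mount_command` and `crontab_entry` are hypothetical names. The home directory is expanded explicitly for two reasons: an argument list passed to `subprocess` bypasses the shell, and `shlex.join` would quote a literal `~`, which stops the shell from expanding it.

```python
import shlex
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# rclone remote and home-folder name for each backend, as used in bucket_setup.md.
MOUNTS: Dict[str, Tuple[str, str]] = {
    "b2": ("b2silnlp:silnlp", "B"),
    "minio": ("miniosilnlp:nlp-research", "M"),
}


def mount_command(backend: str, home: Optional[Path] = None) -> List[str]:
    """Build the `rclone mount` argument list for one backend (hypothetical helper)."""
    remote, folder = MOUNTS[backend]
    # Expand the home directory here rather than relying on "~".
    mount_point = (home if home is not None else Path.home()) / folder
    return [
        "rclone", "mount",
        "--vfs-cache-mode", "full",
        "--use-server-modtime",
        remote,
        str(mount_point),
    ]


def crontab_entry(backend: str, home: Optional[Path] = None) -> str:
    """Render the matching `@reboot` line for `crontab -e`."""
    return "@reboot " + shlex.join(mount_command(backend, home))


if __name__ == "__main__":
    for name in MOUNTS:
        print(crontab_entry(name))
```

To mount from Python instead, create the folder first and pass `mount_command("b2")` to `subprocess.run`. Like the shell command, the process keeps running in the foreground until the mount is stopped.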

manual_setup.md

Lines changed: 6 additions & 4 deletions
````diff
@@ -73,9 +73,9 @@ __Download and install__ the following before creating any projects or starting
 "editor.formatOnSave": true,
 ```
 
-### S3 bucket setup
+### B2 and/or MinIO bucket(s) setup
 
-See [S3 bucket setup](s3_bucket_setup.md).
+See [Bucket setup](bucket_setup.md).
 
 ### ClearML setup
 
@@ -88,8 +88,10 @@ See [ClearML setup](clear_ml_setup.md).
 * Create the directory "$HOME/.cache/silnlp/projects" and set the environment variable SIL_NLP_CACHE_PROJECT_DIR to that path.
 
 ### Additional Environment Variables
-* Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY.
-* Set SIL_NLP_DATA_PATH to "/silnlp" and CLEARML_API_HOST to "https://api.sil.hosted.allegro.ai".
+* Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, B2_KEY_ID, B2_APPLICATION_KEY, MINIO_ACCESS_KEY, MINIO_SECRET_KEY.
+* Set CLEARML_API_HOST to "https://api.sil.hosted.allegro.ai".
+* Set B2_ENDPOINT_URL to https://s3.us-east-005.backblazeb2.com
+* Set MINIO_ENDPOINT_URL to https://truenas.psonet.languagetechnology.org:9000
 
 ### Setting Up and Running Experiments
 
````
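Both endpoint values above are `https` URLs, and the Dev Container instructions say to set them with no quotations. A quick check can catch a value that was pasted with its quotes or without its scheme. This is a minimal sketch using only the standard-library `urllib.parse`. `endpoint_problems` is a hypothetical helper, and requiring `https` is an assumption based on the two documented endpoints.

```python
import os
from typing import List
from urllib.parse import urlsplit


def endpoint_problems(value: str) -> List[str]:
    """List problems with an endpoint URL value; an empty list means it looks usable."""
    problems: List[str] = []
    if any(q in value for q in "\"'"):
        problems.append("contains quote characters")
    # Strip surrounding quotes so the remaining checks look at the URL itself.
    parts = urlsplit(value.strip().strip("\"'"))
    if parts.scheme != "https":
        problems.append(f"scheme is {parts.scheme!r}, expected 'https'")
    if not parts.hostname:
        problems.append("missing host name")
    return problems


if __name__ == "__main__":
    for var in ("B2_ENDPOINT_URL", "MINIO_ENDPOINT_URL"):
        print(var, endpoint_problems(os.environ.get(var, "")) or "ok")
```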
s3_bucket_setup.md

Lines changed: 0 additions & 56 deletions
This file was deleted.

scripts/rclone/mount_to_s.bat renamed to scripts/rclone/mount_b2_to_b.bat

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,4 +10,4 @@ rem copy your key and secret to rclone.conf
 
 rem run rclone - execute this file in the rclone folder
 
-call rclone mount --vfs-cache-mode full --use-server-modtime s3silnlp:silnlp S:
+call rclone mount --vfs-cache-mode full --use-server-modtime b2silnlp:silnlp B:
```
Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+rem Install rclone
+rem get rclone from https://rclone.org/downloads/
+rem extract the files to a folder
+rem then move this bat file to the folder where you run this bat file to start the service
+rem --use-server-modtime flag speeds up displaying large numbers of files. Not exactly mod time, but close enough.
+
+rem configure rclone
+rem copy the adjacent file "rclone.conf" to: C:\Users\<username>\AppData\Roaming\rclone\rclone.conf
+rem copy your key and secret to rclone.conf
+
+rem run rclone - execute this file in the rclone folder
+
+call rclone mount --vfs-cache-mode full --use-server-modtime --no-check-certificate miniosilnlp:nlp-research M:
```

scripts/rclone/rclone.conf

Lines changed: 11 additions & 6 deletions
```diff
@@ -1,7 +1,12 @@
-[s3silnlp]
-type = s3
-provider = AWS
-access_key_id = xxxxxxxxxxxxxxxxxx
-secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-region = us-east-1
+[b2silnlp]
+type = b2
+account = xxxxxxxxx
+key = xxxxxxxxxxxx
+
+[miniosilnlp]
+type= s3
+provider = Other
+access_key_id = xxxxxxxx
+secret_access_key = xxxxxxxxxx
+endpoint = https://truenas.psonet.languagetechnology.org:9000
 
```
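The committed `rclone.conf` ships with `x` placeholders that each user replaces with their own keys. This minimal sketch uses Python's standard `configparser` to report remotes whose credential fields are missing or still look like placeholders. It assumes the Linux/macOS location `~/.config/rclone/rclone.conf` given in `bucket_setup.md`. The fields checked per remote come from the file above. `REQUIRED` and `unfilled_fields` are hypothetical names. The placeholder heuristic, which flags a value that is empty or made only of `x` characters, is also an assumption.

```python
import configparser
from pathlib import Path
from typing import Dict, List

# Credential fields for each remote defined in scripts/rclone/rclone.conf.
REQUIRED: Dict[str, List[str]] = {
    "b2silnlp": ["account", "key"],
    "miniosilnlp": ["access_key_id", "secret_access_key", "endpoint"],
}


def unfilled_fields(conf_text: str) -> Dict[str, List[str]]:
    """Map each remote to the fields that are missing, empty, or still all 'x'."""
    parser = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    parser.read_string(conf_text)
    report: Dict[str, List[str]] = {}
    for remote, fields in REQUIRED.items():
        if not parser.has_section(remote):
            report[remote] = list(fields)
            continue
        # set(value) <= {"x"} is True for "" and for placeholders like "xxxx".
        bad = [
            field for field in fields
            if set(parser.get(remote, field, fallback="").strip()) <= {"x"}
        ]
        if bad:
            report[remote] = bad
    return report


if __name__ == "__main__":
    conf = Path.home() / ".config" / "rclone" / "rclone.conf"
    if conf.exists():
        print(unfilled_fields(conf.read_text()) or "all credentials filled in")
    else:
        print(f"{conf} not found")
```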