|
1 | 1 | # datasets
|
2 | 2 |
|
| 3 | +This helps with the use of standard SQL datasets. |
| 4 | + |
| 5 | +It comes with 4 datasets: |
| 6 | +- 'extract': an extract from 2 simple datasets: 'census' (from the US census) and 'beacon' (with Japanese names and labels). |
| 7 | +- 'financial': from https://relational.fit.cvut.cz/dataset/Financial |
| 8 | +- 'imdb': from https://relational.fit.cvut.cz/dataset/IMDb |
| 9 | +- 'hepatitis': from https://relational.fit.cvut.cz/dataset/Hepatitis |
| 10 | + |
3 | 11 | ## Installation
|
4 | 12 |
|
5 |
| -To build the datasets, install the requirements with: |
| 13 | +The package can be installed with: |
6 | 14 | ```bash
|
7 |
| -poetry shell |
| 15 | +pip install qrlew-datasets |
8 | 16 | ```
|
| 17 | +The library assumes: |
| 18 | +- either that postgresql is installed, |
| 19 | +- or that docker is installed and can spawn postgresql containers. |
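
As a rough illustration of the two assumptions above, the following sketch checks which of the two tools is on the `PATH`. The `detect_backend` function is a hypothetical helper written for this note, not part of the library, and the library's own detection logic may differ. Finding `psql` on the `PATH` only shows that the client is installed, not that a server is running.

```bash
#!/bin/sh
# Hypothetical helper (not part of qrlew-datasets): report which of the
# two assumed backends appears to be available on this machine.
detect_backend() {
    if command -v psql >/dev/null 2>&1; then
        # The postgresql client is on the PATH
        echo "postgresql"
    elif command -v docker >/dev/null 2>&1; then
        # No psql, but docker could spawn a postgresql container
        echo "docker"
    else
        echo "none"
    fi
}

detect_backend
```

If this prints `none`, neither prerequisite is visible from the current shell.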
9 | 20 |
|
10 |
| -You can then build the datasets with: |
11 |
| -```bash |
12 |
| -python -m datasets.build |
13 |
| -``` |
| 21 | +### Postgresql in a container |
14 | 22 |
|
15 |
| -You may need to install the requirements of some drivers such as: https://pypi.org/project/mysqlclient/ |
| 23 | +The library automatically spawns containers. There is nothing to do. |
16 | 24 |
|
17 |
| -## Test in an environment without docker |
| 25 | +### Without docker installed |
18 | 26 |
|
19 |
| -You can simulate this by trying to run this code inside a container: |
20 |
| -`docker run --name test -d -i -t -v .:/datasets ubuntu:22.04` |
21 |
| -Then run: |
22 |
| -`docker exec -it test bash` |
| 27 | +Set up PostgreSQL (and its `psql` client) as in https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb |
23 | 28 |
|
24 |
| -Then setup a `psql` as in https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb |
| 29 | +You can choose the port the server listens on; the example below uses 5433. |
25 | 30 |
|
26 | 31 | ```bash
|
27 | 32 | # Inspired by https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb#scrollTo=YUj0878jPyz7
|
28 | 33 | sudo apt-get -y -qq update
|
29 |
| -sudo apt-get -y -qq install postgresql |
| 34 | +sudo apt-get -y -qq install postgresql-14 |
30 | 35 | # Change the port to 5433, then start postgresql server
|
31 |
| -sudo sed -i "s/#port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf |
| 36 | +# sudo sed -i "s/#port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf |
| 37 | +sudo sed -i "s/port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf |
32 | 38 | sudo service postgresql start
|
33 | 39 | # Set password
|
34 | 40 | sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'pyqrlew-db'"
|
35 | 41 | # Install python packages
|
| 42 | +``` |
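
The change of `sed` pattern in the block above (from `#port = 5432` to `port = 5432`) can be checked on a throwaway file. This is a minimal sketch, assuming the packaged `postgresql.conf` contains an active (uncommented) `port = 5432` line, which is presumably why the pattern was changed. The temporary file stands in for the real configuration and is purely illustrative.

```bash
# Throwaway stand-in for /etc/postgresql/14/main/postgresql.conf,
# assumed here to contain an active (uncommented) port line.
conf=$(mktemp)
printf '%s\n' "port = 5432" > "$conf"

# Old pattern: only matches a commented-out line, so nothing changes.
old_result=$(sed "s/#port = 5432/port = 5433/g" "$conf")

# New pattern: matches the active line and rewrites the port.
new_result=$(sed "s/port = 5432/port = 5433/g" "$conf")

echo "old pattern: $old_result"   # old pattern: port = 5432
echo "new pattern: $new_result"   # new pattern: port = 5433

rm -f "$conf"
```

If the line were commented out instead (`#port = 5432`), the new pattern would produce `#port = 5433`, which is still a comment, so it is worth checking the file after running the command.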
| 43 | + |
| 44 | +#### Simulating the absence of docker when docker is installed |
| 45 | + |
| 46 | +You can simulate the absence of docker by running this code inside a container. |
36 | 47 |
|
| 48 | +First run: |
| 49 | +`docker run --name test -d -i -t -v .:/datasets ubuntu:22.04` |
| 50 | +Then run: |
| 51 | +`docker exec -it test bash` |
| 52 | + |
| 53 | +## Building the `.sql` dumps |
| 54 | + |
| 55 | +To build the datasets, install the requirements with: |
| 56 | +```bash |
| 57 | +poetry shell |
37 | 58 | ```
|
38 |
| -<!-- TODO port not working --> |
| 59 | + |
| 60 | +You can then build the datasets with: |
| 61 | +```bash |
| 62 | +python -m datasets.build |
| 63 | +``` |
| 64 | + |
| 65 | +You may need to install the requirements of some drivers such as: https://pypi.org/project/mysqlclient/ |