
Commit 72c308a

0.3
1 parent ecae682 commit 72c308a


3 files changed, +45 -18 lines


README.md

+43-16
@@ -1,38 +1,65 @@
 # datasets
 
+This package makes it easier to work with standard SQL datasets.
+
+It comes with 4 datasets:
+- 'extract': an extract from 2 simple datasets, 'census' (from the US census) and 'beacon' (with Japanese names and labels).
+- 'financial': from https://relational.fit.cvut.cz/dataset/Financial
+- 'imdb': from https://relational.fit.cvut.cz/dataset/IMDb
+- 'hepatitis': from https://relational.fit.cvut.cz/dataset/Hepatitis
+
 ## Installation
 
-To build the datasets, install the requirements with:
+The package can be installed with:
 ```bash
-poetry shell
+pip install qrlew-datasets
 ```
+The library assumes that either:
+- PostgreSQL is installed, or
+- Docker is installed and can spawn PostgreSQL containers.
 
-You can then build the datasets with:
-```bash
-python -m datasets.build
-```
+### PostgreSQL in a container
 
-You may need to install the requirements of some drivers such as: https://pypi.org/project/mysqlclient/
+The library spawns PostgreSQL containers automatically; no setup is needed.
 
-## Test in an environment without docker
+### Without Docker installed
 
-You can simulate this by trying to run this code inside a container:
-`docker run --name test -d -i -t -v .:/datasets ubuntu:22.04`
-Then run:
-`docker exec -it test bash`
+Set up PostgreSQL as in https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb
 
-Then setup a `psql` as in https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb
+You can choose the port to use; the commands below use 5433.
 
 ```bash
 # Inspired by https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb#scrollTo=YUj0878jPyz7
 sudo apt-get -y -qq update
-sudo apt-get -y -qq install postgresql
+sudo apt-get -y -qq install postgresql-14
 # Start postgresql server
-sudo sed -i "s/#port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf
+# sudo sed -i "s/#port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf
+sudo sed -i "s/port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf
 sudo service postgresql start
 # Set password
 sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'pyqrlew-db'"
 # Install python packages
+```
+
+#### Testing the absence of Docker when Docker is installed
+
+You can simulate the absence of Docker by running the following commands inside a container.
 
+First run:
+`docker run --name test -d -i -t -v .:/datasets ubuntu:22.04`
+Then run:
+`docker exec -it test bash`
+
+## Building the `.sql` dumps
+
+To build the datasets, install the requirements with:
+```bash
+poetry shell
 ```
-<!-- TODO port not working -->
+
+You can then build the datasets with:
+```bash
+python -m datasets.build
+```
+
+You may need to install the requirements of some drivers such as: https://pypi.org/project/mysqlclient/
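The commit swaps the old `sed` pattern `s/#port = 5432/port = 5433/g` for `s/port = 5432/port = 5433/g` and drops the `<!-- TODO port not working -->` note. One plausible explanation, assuming the Ubuntu `postgresql-14` package writes the port as an uncommented `port = 5432` line (an assumption, not something the commit states), is that the old pattern never matched. The sketch below replays both substitutions with Python's `re.sub` on two hypothetical sample lines; it is illustrative only and does not touch any real config file.

```python
import re

# Hypothetical sample lines (assumptions): Debian/Ubuntu clusters usually set the
# port explicitly, while an upstream-style default config leaves it commented out.
DEBIAN_LINE = "port = 5432\t\t\t\t# (change requires restart)"
UPSTREAM_LINE = "#port = 5432\t\t\t\t# (change requires restart)"

# Equivalent of: sed "s/#port = 5432/port = 5433/g"  (old README)
OLD_PATTERN = r"#port = 5432"
# Equivalent of: sed "s/port = 5432/port = 5433/g"   (new README)
NEW_PATTERN = r"port = 5432"


def apply_sed(pattern: str, line: str) -> str:
    """Mimic sed's s/<pattern>/port = 5433/g on a single line."""
    return re.sub(pattern, "port = 5433", line)


if __name__ == "__main__":
    for name, line in [("debian", DEBIAN_LINE), ("upstream", UPSTREAM_LINE)]:
        for label, pattern in [("old", OLD_PATTERN), ("new", NEW_PATTERN)]:
            print(name, label, repr(apply_sed(pattern, line)))
```

The last case is the catch: on a config where the line is still commented out, the new pattern yields `#port = 5433`, which stays commented, so the server would keep listening on its default port.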

examples/simple.py

+1-1
@@ -17,7 +17,7 @@
 # '--publish', '5432:5432',
 # 'postgres'])
 
-db = PostgreSQL(name=name, user='postgres', password=name, port=5432)
+db = PostgreSQL(name=name, user='postgres', password=name, port=5433)
 db.load('/datasets/datasets/sources/extract/extract.sql')
 db.set_schema('extract')
 db.dump('/tmp/test.dump')
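With `examples/simple.py` now connecting on port 5433 (the port the README's `sed` command configures), a quick pre-flight check can tell which of the two setups described in the README seems usable. The helper below is a hypothetical sketch, not part of `qrlew-datasets`: it only looks for a `docker` executable on `PATH` and for a TCP listener on `localhost:5433`, and says nothing about how the library itself picks a backend.

```python
import shutil
import socket


def docker_available() -> bool:
    """True if a `docker` executable is on PATH (does not check that the daemon runs)."""
    return shutil.which("docker") is not None


def port_open(host: str = "localhost", port: int = 5433, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def describe_backend(host: str = "localhost", port: int = 5433) -> str:
    """Hypothetical summary of which README setup looks available."""
    if port_open(host, port):
        # Something accepts connections; it is not verified to be PostgreSQL.
        return f"listening on {host}:{port} (possibly PostgreSQL)"
    if docker_available():
        return "docker found: containers could be spawned"
    return "neither a server on the port nor docker was found"


if __name__ == "__main__":
    print(describe_backend())
```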

pyproject.toml

+1-1
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "qrlew-datasets"
-version = "0.2.6"
+version = "0.3.0"
 description = "Sample SQL datasets"
 authors = ["Nicolas Grislain <[email protected]>"]
 license = "Apache 2.0"
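The README now installs the package with `pip install qrlew-datasets`, and this commit bumps the version to 0.3.0. To check which version is installed, a minimal sketch using the standard library's `importlib.metadata` could look like the following. It assumes the installed distribution is named `qrlew-datasets` as in `pyproject.toml`; the helper names are hypothetical, and the comparison is a simple numeric one on `MAJOR.MINOR.PATCH`, not full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist: str = "qrlew-datasets") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def release_tuple(v: str) -> Optional[Tuple[int, int, int]]:
    """Parse the leading MAJOR.MINOR.PATCH of a version string (suffixes ignored)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", v)
    return (int(m.group(1)), int(m.group(2)), int(m.group(3))) if m else None


def at_least(v: str, minimum: str = "0.3.0") -> bool:
    """True if version `v` is at least `minimum`, comparing numeric release parts."""
    a, b = release_tuple(v), release_tuple(minimum)
    return a is not None and b is not None and a >= b


if __name__ == "__main__":
    v = installed_version()
    print("qrlew-datasets:", v, "(>= 0.3.0)" if v and at_least(v) else "")
```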
