BDM P2: Formatted and Exploitation Zones

Aniol Bisquert & Daniel Cantabella

The project is structured so that the data folder contains the files needed to execute the code. It holds the files from the different sources (which are assumed to reside in a Landing Zone storage) and the PostgreSQL JDBC driver (postgresql-42.6.0.jar) used to connect to the Exploitation Zone and send the processed data to it.
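As a rough sketch of how that driver is typically used, assuming the project writes its processed data from PySpark, the snippet below assembles the JDBC URL and connection properties that PySpark's DataFrame.write.jdbc accepts for PostgreSQL. The host, port, database name, credentials and table name are hypothetical placeholders, not the project's real configuration, and the jar path assumes the script is launched from the repository root.

```python
# Minimal sketch, assuming the processed data is written from PySpark.
# Host, port, database, user, password and table name are hypothetical placeholders.

POSTGRES_DRIVER_JAR = "data/postgresql-42.6.0.jar"  # assumed path, relative to the repo root
POSTGRES_DRIVER_CLASS = "org.postgresql.Driver"      # JDBC driver class provided by that jar


def jdbc_settings(host, port, database, user, password):
    """Return the (url, properties) pair accepted by DataFrame.write.jdbc."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        "driver": POSTGRES_DRIVER_CLASS,
    }
    return url, properties


if __name__ == "__main__":
    url, props = jdbc_settings("localhost", 5432, "exploitation", "bdm_user", "secret")
    print(url)
    # With a SparkSession built using .config("spark.jars", POSTGRES_DRIVER_JAR),
    # a formatted DataFrame `df` could then be sent to the Exploitation Zone with:
    # df.write.jdbc(url=url, table="my_table", mode="overwrite", properties=props)
```

Keeping the connection details in one helper means every formatter can reuse the same URL and properties instead of repeating them.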

The src folder contains 3 different folders:

DataFormatters contains the formatter used for each source. It also contains the reconciliation.py script, which performs the reconciliation of our IRIS dataset and creates the IRIS lookup file: lookupIRIS.csv

NOTE: For some neighborhoods, the reconciliation performed by the script produced repeated instances with slight differences between them (e.g., the same neighborhood with and without double spaces around hyphens), so lookupIRIS.csv was modified manually to ensure correct reconciliation with the other lookup files. We assume this step would have been carried out in earlier stages of the project, as was the case for the other lookup files.
DataExploters contains the files for our Exploitation Zone: datasetGenerator.py, which generates the file our machine learning model works with, and mapReduce.py, which computes and prepares the data the user needs to calculate the KPIs.
DistributedML contains the files to train our machine learning model (ml_trainer.py) and to predict the rental price of a new apartment (ml_predict.py), based on the data generated with datasetGenerator.py.
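The duplicates described in the NOTE (names that differ only in the spacing around hyphens) could in principle be caught before writing the lookup file by comparing names through a normalized key. The sketch below is not what reconciliation.py does; the normalization rules (accent stripping, unified " - " separator, collapsed whitespace, lowercase) and the sample names are hypothetical.

```python
import re
import unicodedata


def normalize_neighborhood(name):
    """Canonical comparison key for a neighborhood name (hypothetical rules)."""
    # Decompose accented characters and drop the combining marks (e.g. "ï" -> "i").
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Treat any amount of whitespace around a hyphen as a single " - " separator.
    text = re.sub(r"\s*-\s*", " - ", text)
    # Collapse remaining whitespace runs and compare case-insensitively.
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(names):
    """Keep the first spelling seen for each normalized key."""
    seen = {}
    for name in names:
        seen.setdefault(normalize_neighborhood(name), name)
    return list(seen.values())


if __name__ == "__main__":
    # Hypothetical names: the first two differ only in spacing around the hyphen.
    print(deduplicate(["Barri A - Zona 1", "Barri A  -  Zona 1", "Barri B"]))
    # ['Barri A - Zona 1', 'Barri B']
```

The key is only used for matching; the first original spelling is what would be kept in the lookup file.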

Instructions:

Before running the project, install the requirements with the following command:

pip install -r requirements.txt

Then run executeProgram.py with the following command:

python src/executeProgram.py

This displays a menu with the different options for running the project:

 ============= MENU OPTIONS =============

0. Create LookUp Data (IRIS)
1. Execute Data Formatters
2. Prepare Data for KPIs
3. Generate datasets for ML
4. Train and deploy ML model
5. Deploy ML model and predict
6. Exit
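Option 2 presumably runs the KPI preparation done by mapReduce.py. Purely to illustrate the map-reduce pattern, the sketch below computes a hypothetical KPI (average listing price per neighborhood) in plain Python: the map step emits (key, (price, 1)) pairs and the reduce step sums them per key. The record fields and the KPI are assumptions; the project's actual KPIs and implementation may differ.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (neighborhood, (price, 1)) pairs; field names are hypothetical."""
    for record in records:
        yield record["neighborhood"], (record["price"], 1)


def reduce_phase(pairs):
    """Sum prices and counts per key, as a reduce-by-key step would."""
    totals = defaultdict(lambda: (0.0, 0))
    for key, (price, count) in pairs:
        total_price, total_count = totals[key]
        totals[key] = (total_price + price, total_count + count)
    return dict(totals)


def average_price_per_neighborhood(records):
    """Hypothetical KPI: mean listing price for each neighborhood."""
    reduced = reduce_phase(map_phase(records))
    return {key: total / count for key, (total, count) in reduced.items()}


if __name__ == "__main__":
    listings = [  # hypothetical sample records
        {"neighborhood": "A", "price": 900.0},
        {"neighborhood": "A", "price": 1100.0},
        {"neighborhood": "B", "price": 750.0},
    ]
    print(average_price_per_neighborhood(listings))  # {'A': 1000.0, 'B': 750.0}
```

Carrying (sum, count) pairs instead of averaging inside the reducer keeps the reduce step associative, so partial results can be combined in any order. On a Spark RDD the same logic would read rdd.map(lambda r: (r["neighborhood"], (r["price"], 1))).reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1])).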

The results can be inspected by accessing the server on which the PostgreSQL database was also created.
