
anbipa/BDM_P2


BDM P2: Formatted and Exploitation Zones

Aniol Bisquert & Daniel Cantabella

The project is structured so that the data folder contains the files needed to execute the code. Inside this folder you can find the files from the different sources (which are assumed to be stored in a Landing Zone), as well as the driver (postgresql-42.6.0.jar) needed to connect to PostgreSQL and send the processed data to the Exploitation Zone.
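The README does not show how the bundled driver is wired in. The following is a minimal sketch that assembles the connection settings the PostgreSQL JDBC driver expects. The host, port, database name and credentials are hypothetical placeholders, and the use of Spark is an assumption: the DistributedML and mapReduce names suggest it but do not confirm it.

```python
from __future__ import annotations

from pathlib import Path

# Driver jar shipped in the data folder, as described above.
DRIVER_JAR = Path("data") / "postgresql-42.6.0.jar"


def jdbc_settings(host: str, port: int, database: str,
                  user: str, password: str) -> tuple[str, dict[str, str]]:
    """Build a PostgreSQL JDBC URL and connection properties.

    All arguments are hypothetical placeholders; the real server
    details are not given in this README.
    """
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        # Driver class provided by the PostgreSQL JDBC jar.
        "driver": "org.postgresql.Driver",
    }
    return url, properties
```

If the project does use PySpark, the jar would be registered with `.config("spark.jars", str(DRIVER_JAR))` on the `SparkSession` builder, and a DataFrame written with `df.write.jdbc(url, "some_table", mode="overwrite", properties=properties)`, where `some_table` is a hypothetical table name.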

The src folder contains three subfolders:

  • DataFormatters contains the formatter used for each source. It also contains the reconciliation.py script, which reconciles our IRIS dataset and creates the IRIS lookup file, lookupIRIS.csv.

    NOTE: For some neighborhoods, the reconciliation performed by the script produced repeated instances with slight differences between them (e.g., the same neighborhood with and without double spaces around hyphens), so lookupIRIS.csv was manually modified to ensure correct reconciliation with the other lookup files. We assume that, like the other lookup files, this file should have been produced in earlier steps of the project.

  • DataExploters contains the files for our Exploitation Zone: datasetGenerator.py, which generates the dataset our machine learning model works with, and mapReduce.py, which computes and prepares the data the user needs to calculate the KPIs.

  • DistributedML contains ml_trainer.py, which trains our machine learning model, and ml_predict.py, which predicts the rental price of a new apartment; both work on the data generated by datasetGenerator.py.
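The duplicates mentioned in the NOTE (names that differ only in spacing around hyphens) could also be detected automatically. The sketch below is not what reconciliation.py does; it is an illustration under the assumption that collapsing whitespace and ignoring case is enough to identify such repeats. `normalize_name` and `deduplicate` are hypothetical helpers.

```python
from __future__ import annotations

import re


def normalize_name(name: str) -> str:
    """Hypothetical helper: build a comparison key for a neighborhood name.

    Collapses every run of whitespace (e.g. double spaces around hyphens)
    into a single space, trims the ends, and casefolds the result.
    """
    return re.sub(r"\s+", " ", name).strip().casefold()


def deduplicate(names: list[str]) -> list[str]:
    """Keep the first spelling seen for each normalized key, preserving order."""
    first_seen: dict[str, str] = {}
    for name in names:
        first_seen.setdefault(normalize_name(name), name)
    return list(first_seen.values())
```

For example, `deduplicate(["Sant Gervasi - Galvany", "Sant Gervasi  -  Galvany", "el Raval"])` keeps only the first spelling of the repeated neighborhood. Names whose keys collide are the candidates that would otherwise need manual merging in lookupIRIS.csv.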

Instructions:

To run the project, first install the requirements with the following command:

pip install -r requirements.txt 

Then run executeProgram.py with the following command:

python src/executeProgram.py

This will show a menu with the different options to run the project:

 ============= MENU OPTIONS =============

0. Create LookUp Data (IRIS)
1. Execute Data Formatters
2. Prepare Data for KPIs
3. Generate datasets for ML
4. Train and deploy ML model
5. Deploy ML model and predict
6. Exit
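Judging by its name, option 2 (Prepare Data for KPIs) corresponds to the mapReduce.py step described above; its internals are not documented here. The following is a minimal pure-Python sketch of the map-reduce pattern such a step typically follows, computing a hypothetical KPI (average rental price per neighborhood) from hypothetical `(neighborhood, price)` records.

```python
from __future__ import annotations

from collections import defaultdict
from functools import reduce


def map_phase(record: tuple[str, float]) -> tuple[str, tuple[float, int]]:
    """Emit (key, (value, 1)) so sums and counts can be reduced together."""
    neighborhood, price = record
    return neighborhood, (price, 1)


def reduce_pair(a: tuple[float, int], b: tuple[float, int]) -> tuple[float, int]:
    """Combine two (sum, count) partials; associative, so order does not matter."""
    return a[0] + b[0], a[1] + b[1]


def average_by_key(records: list[tuple[str, float]]) -> dict[str, float]:
    # "Shuffle": group the mapped values by key.
    grouped: dict[str, list[tuple[float, int]]] = defaultdict(list)
    for key, value in map(map_phase, records):
        grouped[key].append(value)
    # Reduce each group to a single (sum, count), then divide.
    totals = {key: reduce(reduce_pair, values) for key, values in grouped.items()}
    return {key: total / count for key, (total, count) in totals.items()}
```

Carrying `(sum, count)` pairs instead of averaging directly keeps the reduce step associative, which is what allows a distributed engine to combine partial results in any order. In Spark the same shape would be `rdd.map(map_phase).reduceByKey(reduce_pair)`.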

The results can be inspected by accessing the server on which the PostgreSQL database was created.
