Univr-Big-Data

This project was made for the Big Data course taught by Prof. Carra Damiano at the University of Verona, as part of the master's degree in Computer Science Engineering.

The aim of the project is to provide a guide to building, preparing, and running a Virtual Machine on a Microsoft Azure cluster. The main objective is then a step-by-step guide on how to use the VM to run Spark code that builds an inverted index.
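For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it. The sketch below illustrates the idea in plain Python rather than Spark, so it is not the code from this repository. The function names and the sample pages are hypothetical. The two commented steps correspond to the "emit (word, document) pairs" and "group by word" stages that a Spark job would typically perform.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each word to the sorted list of document ids that contain it.

    `documents` is a dict of the form {doc_id: text}.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):   # "map" step: emit (word, doc_id)
            index[word].add(doc_id)   # "group" step: collect ids per word
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample pages, not taken from the real data set.
sample_pages = {
    "Roma.html": "Roma è la capitale d'Italia",
    "Verona.html": "Verona è una città d'Italia",
}
inverted_index = build_inverted_index(sample_pages)
```

Looking up `inverted_index["italia"]` returns both page names, while `inverted_index["roma"]` returns only `Roma.html`.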

The data set is a static HTML dump of the Italian Wikipedia pages. The processing scripts work on any `.html` page, but they are tailored to Wikipedia pages.

The processing scripts are not meant to be the best or most efficient possible. They simply filter and process each page to remove useless content.
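As a rough illustration of this kind of filtering, the sketch below strips the markup from an HTML page and keeps only the visible text, using Python's standard `html.parser` module. It is an assumption about what such a step could look like, not the repository's actual processing script. The class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page, not taken from the real data set.
sample_html = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Verona</h1><script>var x = 1;</script>"
    "<p>Città del <b>Veneto</b></p></body></html>"
)
plain_text = html_to_text(sample_html)
```

A Wikipedia-specific version would also need to drop navigation menus, footers, and similar page furniture, which this generic sketch does not attempt.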

TODO

Generalize and upgrade the processing scripts.
Make it work with a shared system across multiple VMs in Azure.

