Developed by Simone Torrisi, a Computer Science student at the University of Catania
The goal of this project is to analyze real-time questions from Stack Overflow and cluster them based on the title, body, and tags associated with each question. The results are then displayed on dashboards.
You can get more information by visiting the docs, Kafka, and Spark directories.
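As a rough illustration of the clustering step, the following is a minimal Java sketch of how titles, bodies, and tags could be merged into one text column and grouped with K-Means from Spark MLlib. The input path, column names, number of hash features, and k = 5 are assumptions for the example only and are not taken from this repository, which may use a different feature pipeline or algorithm.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat_ws;

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class QuestionClusteringSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("question-clustering-sketch")
                .getOrCreate();

        // Hypothetical input: questions with "title", "body" and "tags" columns,
        // merged into a single "text" column for feature extraction.
        Dataset<Row> questions = spark.read().json("questions.json")
                .withColumn("text", concat_ws(" ", col("title"), col("body"), col("tags")));

        // Tokenize the text and hash the tokens into term-frequency vectors.
        Tokenizer tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words");
        HashingTF hashingTF = new HashingTF()
                .setInputCol("words").setOutputCol("features").setNumFeatures(4096);

        // Cluster the vectors with K-Means; k = 5 is an arbitrary example value.
        KMeans kmeans = new KMeans()
                .setK(5).setSeed(42L)
                .setFeaturesCol("features").setPredictionCol("cluster");

        Pipeline pipeline = new Pipeline()
                .setStages(new PipelineStage[]{tokenizer, hashingTF, kmeans});
        PipelineModel model = pipeline.fit(questions);

        // Attach a cluster id to every question.
        model.transform(questions).select("title", "cluster").show(20, false);

        spark.stop();
    }
}
```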
- Centralized service: Zookeeper
- Data Ingestion: Kafka Connect 2.4.1 with Java 11
- Data Streaming: Apache Kafka and Spark Streaming
- Data Processing: Apache Spark 3.0.0 and Spark MLlib with Java 11
- Data Indexing: Elasticsearch 7.8.0
- Data Visualization: Kibana 7.8.0
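As an example of how the streaming components connect, the sketch below reads the question stream from Kafka with Spark Structured Streaming in Java and prints the incoming records to the console. The broker address `kafkaserver:9092` and the topic name `questions` are assumptions, and the project may instead rely on the DStream-based Spark Streaming API or different settings.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaToSparkSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-spark-sketch")
                .getOrCreate();

        // Subscribe to the (assumed) "questions" topic on the (assumed) broker.
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "kafkaserver:9092")
                .option("subscribe", "questions")
                .load();

        // Kafka delivers key and value as binary; cast the value to a string
        // before any further parsing or feature extraction.
        Dataset<Row> questions = stream.selectExpr("CAST(value AS STRING) AS json");

        // For illustration only: write the raw records to the console.
        StreamingQuery query = questions.writeStream()
                .format("console")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```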
- Apache Kafka: download it from here and put the tgz file into the Kafka/Setup directory.
- Apache Spark: download it from here and put the tgz file into the Spark/Setup directory.
To perform the initial setup, the script `initial-setup.sh` has to be executed in the main directory.
There are two options:
- Using the bash command: `bash initial-setup.sh`
- Making the script executable with `chmod +x initial-setup.sh` and then running `./initial-setup.sh`
After the previous step is completed, the project can be started by running `docker-compose up`.