In this project, I build a reporting tool that will use information from news database to discover what kind of articles the readers like.The database contains newspaper's authors, articles and web server log. Using these tables the tool will analyze the site's user activity.
News database has three tables:
- authors
- name -> text
- bio -> text
- id -> integer (Primary Key)
- articles
- author -> integer (Forigen Key)
- title -> text
- slug -> text
- lead -> text
- body -> text
- time -> timestamp
- id -> integer (Primary key)
- log
- path -> text
- ip -> inet
- method -> text
- status -> text\
- time -> timestamp
- id -> integer (Primary Key)
The tool currently can perfrom three types of analysis:
- Get the most popular three articles of all time
- Get the most popular article authors of all time
- Get which days did more than 1% of requests lead to errors
- Python 3
- PostgreSQL
- Psycopg2
- News Database
-
Download/Clone the repository
-
Install Psycopg2 module by runing the following command
pip install psycopg2 -
Create the news database in PostgreSQL
-
Download and unzip News Database
-
Run the following command to load the data:
psql -d news -f newsdata.sql -
Create the following view that count all requests valid and invalid:
CREATE VIEW total_req_view AS SELECT time::timestamp::date AS date, COUNT(*) AS total_requests FROM log GROUP BY date ORDER BY date;
-
Create the following view that count invalid requests with status code 404:
CREATE VIEW invalid_req_view AS SELECT time::timestamp::date AS date, COUNT(*) AS invalid_requests FROM log WHERE status LIKE '%404%' GROUP BY date ORDER BY date;
-
Run main.py
python main.py
See the attached file
Mohammed Mahdi Ibrahim
For any related questions about the tool you can contact me at wmm@hotmail.it