Note that EDGE_LIST is a list of lists with the format: [[timestamp1, ['hashtagX1', 'hashtagY1']], [timestamp2, ['hashtagX2', 'hashtagY2']], ...]
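As a small illustration of that format, here is what EDGE_LIST might look like with two edges in it. The hashtag values are made up, and the timestamps are assumed to be whole seconds since the epoch, which this README does not confirm.

```python
# Hypothetical sample data; timestamps are assumed to be epoch seconds.
EDGE_LIST = [
    [1446141061, ['apache', 'spark']],
    [1446141072, ['hadoop', 'storm']],
]

# Each entry unpacks into a timestamp and an alphabetically sorted pair.
for timestamp, (tag_a, tag_b) in EDGE_LIST:
    print(timestamp, tag_a, tag_b)
```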
- Get tweets:
  - If it's a rate-limiting message, ignore it
- Check timestamp:
  - If the timestamp is more than 60 seconds older than newest_timestamp, discard the tweet and jump to calc_average_degree() to finish
  - If the timestamp is newer than newest_timestamp, update newest_timestamp
  - Delete edges that are more than 60 seconds older than newest_timestamp
- Find hashtags:
  - If the tweet has 2 or more hashtags, check for and remove all duplicates
  - If only 0 or 1 hashtag remains, discard the tweet and jump to calc_average_degree()
- Create edge entries (only if the tweet has 2 or more valid hashtags):
  - Use combinations from itertools, which was imported. E.g.: list(combinations(['hashtag1','hashtag2','hashtag3'],2)). This outputs a list of tuples.
  - Sort each edge entry alphabetically so that we don't have to check the reverse. Do this by converting each tuple into a list and sorting it
- Insert each new edge entry into EDGE_LIST:
  - Check whether the edge already exists; if it does, update the timestamp of that edge (no need to check for reverse order, because each edge entry is already sorted); otherwise append it
- Call calc_average_degree():
  - Concatenate the 2 columns of nodes in EDGE_LIST and count the entries (this will be the sum of degrees)
  - Concatenate the 2 columns of nodes in EDGE_LIST, remove duplicates, and count the entries (this will be the total number of nodes)
  - Divide the sum of degrees by the total number of nodes to get the average degree
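The steps above could be sketched roughly as follows. This is a minimal Python 3 sketch under stated assumptions, not the repo's actual code: process_tweet and WINDOW_SECONDS are hypothetical names, timestamps are assumed to be plain numbers in seconds, an edge exactly 60 seconds old is assumed to stay in the window, and an existing edge is assumed to keep the later of its two timestamps.

```python
from itertools import combinations

WINDOW_SECONDS = 60     # hypothetical constant for the 60-second window

EDGE_LIST = []          # [[timestamp, [tag_a, tag_b]], ...]
newest_timestamp = 0    # newest tweet timestamp seen so far


def calc_average_degree():
    """Return the average degree of the graph held in EDGE_LIST."""
    # Concatenate both node columns: each appearance adds one degree.
    endpoints = [tag for _, pair in EDGE_LIST for tag in pair]
    if not endpoints:
        return 0.0
    # Distinct hashtags are the nodes of the graph.
    return len(endpoints) / len(set(endpoints))


def process_tweet(timestamp, hashtags):
    """Hypothetical driver: apply one tweet, return the new average."""
    global newest_timestamp
    cutoff = newest_timestamp - WINDOW_SECONDS
    if timestamp < cutoff:
        # Tweet falls outside the window: discard it.
        return calc_average_degree()
    newest_timestamp = max(newest_timestamp, timestamp)

    # Delete edges that have dropped out of the window (boundary assumed
    # inclusive: an edge exactly 60 seconds old is kept).
    cutoff = newest_timestamp - WINDOW_SECONDS
    EDGE_LIST[:] = [entry for entry in EDGE_LIST if entry[0] >= cutoff]

    # Remove duplicate hashtags; fewer than 2 means no edges to add.
    unique_tags = set(hashtags)
    if len(unique_tags) >= 2:
        for pair in combinations(unique_tags, 2):
            edge = sorted(pair)  # sorted, so no reverse-order check
            for entry in EDGE_LIST:
                if entry[1] == edge:
                    # Assumption: keep the later timestamp on a repeat.
                    entry[0] = max(entry[0], timestamp)
                    break
            else:
                EDGE_LIST.append([timestamp, edge])
    return calc_average_degree()
```

For instance, starting from an empty EDGE_LIST, process_tweet(100, ['spark', 'apache']) creates a single edge between two nodes, so the returned average degree is 1.0.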
The following packages are used/imported:
- import time - needed to deal with timestamps
- import sys - for reading the arguments of the run.sh command
- import json - for processing json
- import os - for checking if output.txt already exists, and deleting it if it does
- from itertools import combinations - used to generate combinations (order doesn't matter). Usage taken from: https://rosettacode.org/wiki/Combinations#Python
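To show how json and time might work together when reading one line of input, here is a hedged sketch. It assumes the standard Twitter streaming layout (a created_at string such as "Thu Oct 29 17:51:01 +0000 2015", hashtags under entities.hashtags[].text, and rate-limit notices that carry no created_at field), none of which this README spells out. parse_line is a hypothetical helper, and calendar.timegm is an addition that is not in the import list above.

```python
import calendar
import json
import time

# Assumed "created_at" layout, e.g. "Thu Oct 29 17:51:01 +0000 2015".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def parse_line(line):
    """Return (timestamp, hashtags), or None for non-tweet messages."""
    message = json.loads(line)
    # Assumption: rate-limit notices have no "created_at" field.
    if "created_at" not in message:
        return None
    parsed = time.strptime(message["created_at"], CREATED_AT_FORMAT)
    timestamp = calendar.timegm(parsed)  # UTC seconds since the epoch
    hashtags = [
        tag["text"]
        for tag in message.get("entities", {}).get("hashtags", [])
    ]
    return timestamp, hashtags
```

Returning None for anything without a created_at field keeps the caller's loop simple: it can skip the line and move on to the next one.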
Note that this repo started off as a clone of https://github.com/InsightDataScience/coding-challenge
For testing, call "./run_tests.sh" from within the insight_testsuite directory