Insight_coding_challenge

The solution file to this coding challenge is src/average_degree.py

The solution is written in Python 3, so Python 3 must be installed in the environment.

Overview of what src/average_degree.py does:

Note that steps 2 through 7 are performed for each tweet

Note that EDGE_LIST is a list of lists with the format: [[timestamp1,['hashtagX1','hashtagY1']], [timestamp2,['hashtagX2','hashtagY2']], ...etc]

```
Get tweets:
```

If a line is a rate-limiting message rather than a tweet, ignore it
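As a minimal sketch of this step, the snippet below skips rate-limit messages while reading JSON lines. It assumes that a rate-limit message is a JSON object with a top-level `limit` key and no `created_at` field; the helper name `parse_tweet` and the sample lines are hypothetical and are not taken from src/average_degree.py.

```python
import json


def parse_tweet(line):
    """Return the decoded tweet dict, or None for a rate-limit message."""
    record = json.loads(line)
    # Assumption: rate-limit notices have a "limit" key and no "created_at".
    if "limit" in record and "created_at" not in record:
        return None
    return record


# Hypothetical sample input: one rate-limit notice and one tweet.
lines = [
    '{"limit": {"track": 5, "timestamp_ms": "1446218985743"}}',
    '{"created_at": "Thu Oct 29 17:51:01 +0000 2015", "text": "hello #Spark"}',
]
tweets = [t for t in (parse_tweet(line) for line in lines) if t is not None]
```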

```
Check timestamp:
```

If the tweet's timestamp is more than 60 seconds older than newest_timestamp, discard the tweet and jump straight to calc_average_degree()

If the tweet's timestamp is newer than newest_timestamp, update newest_timestamp

Delete edges that are more than 60 seconds older than newest_timestamp
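The timestamp check and edge eviction could look roughly like the sketch below. It assumes timestamps have already been converted to seconds since the epoch, and it treats "older than 60 seconds" as strictly more than 60 seconds behind the newest timestamp; that boundary choice, and the names `WINDOW_SECONDS`, `is_too_old` and `advance_window`, are assumptions made for illustration.

```python
WINDOW_SECONDS = 60


def is_too_old(timestamp, newest_timestamp):
    # Assumption: exactly 60 s behind the newest timestamp is still in the window.
    return newest_timestamp - timestamp > WINDOW_SECONDS


def advance_window(edge_list, newest_timestamp, tweet_timestamp):
    """Return (accepted, newest_timestamp, edge_list) after seeing one tweet."""
    if is_too_old(tweet_timestamp, newest_timestamp):
        # Tweet falls outside the window: nothing changes.
        return False, newest_timestamp, edge_list
    newest_timestamp = max(newest_timestamp, tweet_timestamp)
    # Keep only the edges that are still inside the window.
    kept = [e for e in edge_list if not is_too_old(e[0], newest_timestamp)]
    return True, newest_timestamp, kept


edges = [[100, ["a", "b"]], [150, ["b", "c"]]]
accepted, newest, edges_after = advance_window(edges, 150, 165)
```

With these sample values the edge stamped 100 is evicted, because it is 65 seconds behind the new newest timestamp of 165.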

```
Find hashtags:
```

If the tweet has 2 or more hashtags, remove all duplicates

If fewer than 2 hashtags remain, discard the tweet and jump to calc_average_degree()
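A small sketch of the hashtag step follows. It assumes the hashtags sit under `entities.hashtags[].text` in the tweet JSON and that duplicates are compared as exact strings (no case folding); the helper name `unique_hashtags` is hypothetical.

```python
def unique_hashtags(tweet):
    """Return the tweet's distinct hashtag texts in sorted order."""
    # Assumption: hashtags are listed under tweet["entities"]["hashtags"].
    entities = tweet.get("entities", {})
    tags = [h["text"] for h in entities.get("hashtags", [])]
    return sorted(set(tags))


# Hypothetical tweet with a repeated hashtag.
tweet = {"entities": {"hashtags": [{"text": "Spark"}, {"text": "Apache"}, {"text": "Spark"}]}}
tags = unique_hashtags(tweet)
has_edges = len(tags) >= 2  # tweets with fewer than 2 hashtags create no edges
```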

Create edge entries (only if the tweet has 2 or more valid hashtags):

Use the combinations function imported from itertools, e.g. list(combinations(['hashtag1','hashtag2','hashtag3'],2)). This outputs a list of tuples, one per pair of hashtags.

Sort each edge entry alphabetically so that we don't have to check for the reverse order. Do this by converting each tuple into a list and sorting it.
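The pair generation and sorting described above can be sketched as follows; the function name `make_edges` is hypothetical.

```python
from itertools import combinations


def make_edges(hashtags):
    """Return every pair of hashtags as an alphabetically sorted list."""
    # sorted() turns each tuple into a list, so ('spark', 'apache')
    # and ('apache', 'spark') both become ['apache', 'spark'].
    return [sorted(pair) for pair in combinations(hashtags, 2)]


new_edges = make_edges(["spark", "apache", "hadoop"])
```

Three hashtags yield three edges; in general n hashtags yield n*(n-1)/2 edges.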

Insert each new edge entry into EDGE_LIST:

Check whether the edge already exists. If it does, update the timestamp of the existing entry instead of adding a duplicate (there is no need to check for the reverse order, because each edge entry is already sorted)
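A possible sketch of the insert-or-update step is shown below, using the EDGE_LIST format of [timestamp, [hashtagX, hashtagY]]. The helper name `upsert_edge` is hypothetical, and keeping the later of the two timestamps when the edge already exists is an assumption; the description above only says the timestamp is updated.

```python
def upsert_edge(edge_list, timestamp, edge):
    """Add the edge, or refresh its timestamp if it is already present."""
    for entry in edge_list:
        if entry[1] == edge:
            # Assumption: keep the later timestamp so an out-of-order
            # tweet never makes an existing edge look older.
            entry[0] = max(entry[0], timestamp)
            return
    edge_list.append([timestamp, edge])


edge_list = [[100, ["apache", "spark"]]]
upsert_edge(edge_list, 120, ["apache", "spark"])   # existing edge: timestamp refreshed
upsert_edge(edge_list, 110, ["hadoop", "spark"])   # new edge: appended
```

A linear scan keeps the sketch close to the list-of-lists layout; a dict keyed by the edge would make the lookup constant time.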

```
Call calc_average_degree():
```

Concatenate the 2 columns of nodes in the EDGE_LIST and count the entries (this is the sum of degrees, since every edge adds one degree to each of its two nodes)

Concatenate the 2 columns of nodes in the EDGE_LIST, remove duplicates, and count the entries (this is the total number of nodes)

Divide the sum of degrees by the total number of nodes to get the average degree
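The calculation above can be sketched as follows. The function name `average_degree` is hypothetical, and returning 0.0 for an empty graph is an assumption made to avoid dividing by zero.

```python
def average_degree(edge_list):
    """Average degree of the graph described by [timestamp, [a, b]] entries."""
    # Flatten both node columns: each appearance of a node is one degree.
    endpoints = [node for _, edge in edge_list for node in edge]
    if not endpoints:
        return 0.0  # assumption: an empty graph has average degree 0
    return len(endpoints) / len(set(endpoints))


edge_list = [[100, ["apache", "spark"]], [110, ["hadoop", "spark"]]]
avg = average_degree(edge_list)  # 4 endpoint entries over 3 distinct nodes
```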

src/average_degree.py already imports all the packages it needs.

The following packages are used/imported:

import time - needed to deal with timestamps
import sys - for reading the arguments of the run.sh command
import json - for processing JSON
import os - for checking whether output.txt already exists, and deleting it if it does
from itertools import combinations - used to generate combinations (order doesn't matter). Taken from: https://rosettacode.org/wiki/Combinations#Python

Note that this repo started off as a clone of https://github.com/InsightDataScience/coding-challenge

For testing, call "./run_tests.sh" from within the insight_testsuite directory
