Changes from all commits · 104 commits
5930340
the boys are back in town. with their old pandas code
hcaouette Mar 8, 2021
888b2c8
added basic html file for visuals
clay-ol Mar 11, 2021
4eb31e1
setting up to scrape by weeks instead of days (1500 data points to 200)
hcaouette Mar 11, 2021
338d9af
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 11, 2021
6885a41
country folders in dir chart_scrapes
hcaouette Mar 11, 2021
7caa440
reduced weeks added to reduce scale of data collection
hcaouette Mar 11, 2021
64caac4
split country codes for processing times
clay-ol Mar 11, 2021
75d09cd
split datasets further
clay-ol Mar 11, 2021
338b0b5
uploading first 3 countries of data. changed scrape.py so it doesn't r…
hcaouette Mar 11, 2021
ee52410
pushing python changes
hcaouette Mar 11, 2021
9fc8f4e
added country datascrapes
clay-ol Mar 11, 2021
c059622
pulled data for reduced_countries3
alescion Mar 11, 2021
968a197
Merge branch 'main' of https://github.com/hcaouette/final into main
alescion Mar 11, 2021
7cf0dbb
reduced countries 1 data
hcaouette Mar 11, 2021
97ce0a1
merging reduced_weeks1 data
hcaouette Mar 11, 2021
176b288
updated marching through directories
clay-ol Mar 11, 2021
e241d27
added unique song parsing
clay-ol Mar 11, 2021
6ebad18
Update index.html
alescion Mar 11, 2021
f4065e3
updating metascrape to work with our unique songs
clay-ol Mar 11, 2021
16c4c3b
added start of flexbox layout css
hcaouette Mar 11, 2021
894fd18
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 11, 2021
3f34ae1
song meta collected
clay-ol Mar 11, 2021
ada7dc2
Added radio buttons for modality
alescion Mar 12, 2021
7dcbc41
added meta flattening functionality
clay-ol Mar 14, 2021
b3a8475
initial work for adding metadata to unique songs
clay-ol Mar 14, 2021
c4d5818
unique songs with metadata functionality added
clay-ol Mar 14, 2021
487f85c
updated main.py to represent most recent versions of meta scraping
clay-ol Mar 14, 2021
ceb7452
countries have averages
clay-ol Mar 14, 2021
7a3f3eb
added emojis to buttons
alescion Mar 15, 2021
7aa6707
downloaded make-a-map from prof harrison
hcaouette Mar 15, 2021
b0d3ebf
refactored script tag out of body, added example choropleth
clay-ol Mar 15, 2021
c29420e
added professor harrison's html file from the zoom session
hcaouette Mar 15, 2021
963ce9d
master JSON created
clay-ol Mar 15, 2021
6aa67d1
flexboxes. beginning of timeline bar. loads master_JSON
hcaouette Mar 15, 2021
cb2f1de
changed structure of JSON to add country tag
clay-ol Mar 15, 2021
589a3d0
scrolling date bar. might have broken choropleth interactivity with cs…
hcaouette Mar 15, 2021
7b21e83
starting implementation of choropleth
clay-ol Mar 15, 2021
051d97d
Merge branch 'main' of https://github.com/hcaouette/final into main
clay-ol Mar 15, 2021
275f54a
added new font styling to remove default
clay-ol Mar 15, 2021
339abf5
added ISO3166 Alpha-3 to JSON for ease of indexing
clay-ol Mar 16, 2021
f2bd680
starting work on coloring based on country value
clay-ol Mar 16, 2021
27ecc02
updated naming for Alpha-3
clay-ol Mar 16, 2021
59131fd
scaling works, will need to refactor domain setup
clay-ol Mar 16, 2021
3b621b7
fixed map zoom/panning
hcaouette Mar 16, 2021
37500d2
created process book outline, added pictures
alescion Mar 16, 2021
a202d46
refactored drawing to pull week, mode from external source
clay-ol Mar 16, 2021
7491a03
code cleanup
clay-ol Mar 16, 2021
af8c426
removed references to 'cities.csv'
clay-ol Mar 16, 2021
763bc16
choropleth now updates when clicking on different modes - need to adj…
clay-ol Mar 16, 2021
dc33122
fixed key errors in data selection
clay-ol Mar 16, 2021
e5debca
added separate color domain for tempo
clay-ol Mar 16, 2021
37af87f
added massive timeline/week-picking functionality, dynamic formatting…
hcaouette Mar 17, 2021
4b8ffa8
initial version of tooltip added
clay-ol Mar 17, 2021
3b96840
added fixed precision of tooltip
clay-ol Mar 17, 2021
fb6046c
reformatted and colored tooltip
clay-ol Mar 17, 2021
3287779
adjusted scaling and translation of choropleth
clay-ol Mar 17, 2021
0988a65
fixed bug where a country without a week entry would result in data l…
clay-ol Mar 17, 2021
d728c01
added change that renders tooltip val as percent
hcaouette Mar 17, 2021
dcadad7
added headers with project name
clay-ol Mar 17, 2021
2ff1d62
changed tooltip to only use percents for values other than tempo, co…
clay-ol Mar 17, 2021
60136b2
made map_svg bigger, adjusted default zoom and offset
hcaouette Mar 17, 2021
9b57ccd
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 17, 2021
51000ac
added section for results + conclusions
clay-ol Mar 17, 2021
7fb7261
updated readme
clay-ol Mar 17, 2021
9d677b7
refactored code and restructured
clay-ol Mar 17, 2021
5ae42e2
added example function call
clay-ol Mar 17, 2021
d4d435b
updated formatting
clay-ol Mar 17, 2021
529e111
Added some sections to process book
clay-ol Mar 17, 2021
e03afdd
dynamic color ranges by modality
hcaouette Mar 17, 2021
6e4c297
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 17, 2021
d7c9a58
changed color for choropleth to green
clay-ol Mar 17, 2021
bb713d7
basic legend
clay-ol Mar 17, 2021
f70b72f
updated Implementation section
clay-ol Mar 17, 2021
020880e
moved modality legend to #modality
hcaouette Mar 17, 2021
e82d27f
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 17, 2021
c34e9d0
added more pictures to implementation
clay-ol Mar 17, 2021
32f5e40
Merge branch 'main' of https://github.com/hcaouette/final into main
clay-ol Mar 17, 2021
66084e1
added audio feature definitions
clay-ol Mar 17, 2021
169d318
updated data sources, related work, and design evolution
alescion Mar 17, 2021
4894c6e
added project link
clay-ol Mar 17, 2021
978f28c
added scraping section
clay-ol Mar 17, 2021
760ec07
added section on data
clay-ol Mar 18, 2021
ff21f18
added Questions section
clay-ol Mar 18, 2021
d25af78
updated Related Work section
clay-ol Mar 18, 2021
c6f35d9
updated Related Work section
alescion Mar 18, 2021
4abe50c
disabled play_viz button due to incomplete animation feature
hcaouette Mar 18, 2021
ec835fe
'very important'
hcaouette Mar 18, 2021
2c008dd
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 18, 2021
54b06ff
updated related work
clay-ol Mar 18, 2021
601d1b8
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 18, 2021
6e02239
this is also 'very important'
hcaouette Mar 18, 2021
2e667fe
still 'very important'
hcaouette Mar 18, 2021
7172911
Update ProcessBook.md
alescion Mar 18, 2021
a777e18
added results to index.html
clay-ol Mar 18, 2021
1c76216
refactoring python folder structure
hcaouette Mar 18, 2021
fc1bb38
Merge branch 'main' of https://github.com/hcaouette/final into main
hcaouette Mar 18, 2021
7247cd2
cleaned up README
alescion Mar 18, 2021
a380e4e
Merge branch 'main' of https://github.com/hcaouette/final into main
alescion Mar 18, 2021
5d686e6
exploratory
clay-ol Mar 18, 2021
0473599
broke spacing
clay-ol Mar 18, 2021
7f66e51
added valence dashboard to process book
hcaouette Mar 18, 2021
d88b17b
merging processbook
hcaouette Mar 18, 2021
c388fae
Create demo.mp4
alescion Mar 18, 2021
ddfc949
linked to video
hcaouette Mar 18, 2021
The diff you're trying to view is too large. We only load the first 3000 changed files.
353 changes: 353 additions & 0 deletions Data_Processing/main.py
@@ -0,0 +1,353 @@
import argparse
import requests
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import os
import csv
import json
import pandas as pd
import pycountry_convert as pc
basepath = os.path.abspath(os.path.dirname(__file__))
path = basepath + "/chart_scrapes"


def main(function, overwrite_old):
    # note: overwrite_old is accepted but not yet wired into scrape()
    weeks = csv_to_list(os.path.join('Datasets', 'reduced_weeks.csv'))
    countries = csv_to_list(os.path.join('Datasets', 'countries.csv'))

    if function == "scrape":
        scrape(weeks, countries)
    elif function == "stitch":
        stitch(weeks, countries)
    elif function == "flatten":
        flatten()
    elif function == "meta_scrape":
        meta_scrape()
    elif function == "meta_flatten":
        meta_flatten()
    elif function == "unique_songs":
        unique_songs()
    elif function == "meta_add":
        meta_add()
    elif function == "weekly_meta_calc":
        weekly_meta_calc(weeks, countries)
    elif function == "find_minmaxes":
        find_minmaxes()
    else:
        assert False, f"failed to run function {function}"

def csv_to_list(path):
    # read the first column of a CSV into a plain list
    out_list = []
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            out_list.append(row[0])
    return out_list
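# e.g. csv_to_list(os.path.join('Datasets', 'countries.csv')) might return
# lowercase alpha-2 codes such as ['us', 'gb', 'ad', ...] -- the format the
# spotifycharts URLs below expect (illustrative values, not the actual file)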


def scrape(weeks, countries):
    print("running scrape")
    for country in countries:
        # create dir for country if non-existent
        country_path = os.path.join("chart_scrapes", country)
        if not os.path.exists(country_path):
            os.makedirs(country_path)

        # iterate through each week for the current country
        for week in weeks:
            # format the request url and make the request
            url = "https://spotifycharts.com/regional/%s/weekly/%s/download" % (country, week)
            r = requests.get(url, allow_redirects=True)

            # format filename and write request's return to file
            # (weeks with no chart come back as an HTML page; stitch() and
            # weekly_meta_calc() skip those files by checking for '<!doctype html>')
            filename = week + country + '.csv'
            writeFile = os.path.join("chart_scrapes", country, filename)
            with open(writeFile, 'wb') as f:
                f.write(r.content)
        # done with country
        print("just finished getting data for", country, "!")



def stitch(weeks, countries):
    print("running stitch")
    data = {}
    with open("Datasets/songs.csv", 'w', newline='', encoding="utf8") as outfile:
        songwriter = csv.writer(outfile)
        params = ['chart_position', 'track', 'artist', 'streams', 'url']
        # header row: unique_songs() and meta_scrape() read these column names back
        songwriter.writerow(params)
        for country in countries:
            print(country)
            for week in weeks:
                day_file = os.path.join(path, country, week + country + ".csv")
                titles = {}
                with open(day_file, newline='', encoding="utf8") as csvfile:
                    reader = csv.reader(csvfile, delimiter=',')
                    # failed downloads saved an HTML page instead of a CSV; skip those
                    line1 = next(reader)
                    if str(line1[0]) == '<!doctype html>':
                        continue
                    next(reader)  # skip the column-header row
                    for row2 in reader:
                        song = {}
                        song['chart_position'] = row2[0]
                        song['track'] = row2[1]
                        song['artist'] = row2[2]
                        song['streams'] = row2[3]
                        song['url'] = row2[4]
                        titles[row2[0]] = song
                        songwriter.writerow([row2[0], row2[1], row2[2], row2[3], row2[4]])

                data[week] = titles
            # note: `data` is assembled here but never persisted; only songs.csv is written
            data[country] = country


def flatten():
    # placeholder -- the earlier day-level flatten step lives in old/flatten.py
    print("running flatten")

def meta_scrape():
    print("running meta_scrape")
    # Spotify app credentials for the client-credentials auth flow
    c_id = 'bf02bcb1126f4363b3a4a057c623d182'
    c_secret = '6c563e24a4ff4700a2e25adc0a662d80'

    spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(client_id=c_id, client_secret=c_secret))

    # Audio features harvested for each song: danceability, energy, key, loudness,
    # mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo,
    # duration_ms, time_signature
    metaList = {}
    urlList = []
    with open("Datasets/unique_songs.csv") as f:
        reader = csv.DictReader(f, delimiter=',')
        for row in reader:
            urlList.append(row['url'])

    # the audio-features endpoint takes at most 100 tracks per call,
    # so request the URLs in batches of 100
    for x in range(0, len(urlList), 100):
        print(x)
        trackMetas = spotify.audio_features(urlList[x:x + 100])
        for i in range(len(trackMetas)):
            metaList[urlList[x + i]] = trackMetas[i]

    # write song metadata to a json for further use
    with open('Datasets/songMeta.json', 'w') as outfile:
        json.dump(metaList, outfile)
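# songMeta.json maps each track URL to its audio-features record, e.g.
# (a sketch with illustrative values; the keys are the fields read back
# in meta_flatten below):
#   {"https://open.spotify.com/track/...": {"danceability": 0.8, "energy": 0.65,
#     "key": 5, "loudness": -4.9, "mode": 1, "tempo": 150.0, ...}, ...}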

def meta_flatten():
    print("running meta_flatten")
    with open('Datasets/songMeta.json', encoding="utf8") as f:
        meta = json.load(f)

    columns = ["track_ref", "danceability", "energy", "key", "loudness", "mode",
               "speechiness", "acousticness", "instrumentalness", "liveness", "valence",
               "tempo", "type", "id", "uri", "analysis_url", "duration_ms", "time_signature"]
    with open('Datasets/flat_meta.csv', 'w', newline='') as outfile:
        flatWriter = csv.writer(outfile, delimiter=',')
        flatWriter.writerow(columns)
        for record in meta:
            # audio_features() returns None for tracks with no analysis; skip them
            if meta[record] is None:
                continue
            # the record's key (the track URL) becomes track_ref; the remaining
            # columns are pulled straight from the audio-features dict
            flatWriter.writerow([record] + [meta[record][col] for col in columns[1:]])
def unique_songs():
    print("running unique_songs")
    allSongs = pd.read_csv("Datasets/songs.csv")
    print("Length: " + str(allSongs.size))
    # keep only the latest charting of each (track, artist) pair
    uniqueSongs = allSongs.drop_duplicates(['track', 'artist'], keep='last')
    print("Length: " + str(uniqueSongs.size))
    uniqueSongs.to_csv("Datasets/unique_songs.csv")

def meta_add():
    print("Adding metadata to unique songs")
    uniqueSongs = pd.read_csv("Datasets/unique_songs.csv")
    metaData = pd.read_csv("Datasets/flat_meta.csv")

    # join audio features onto each unique song, matching the song's url
    # against the track_ref column written by meta_flatten()
    songsWithMeta = uniqueSongs.join(metaData.set_index('track_ref'), on='url')
    songsWithMeta.to_csv("Datasets/uniqueSongsWithMeta.csv")

def weekly_meta_calc(weeks, countries):
    songsMeta = pd.read_csv("Datasets/flat_meta.csv")
    print(songsMeta.head())
    country_data = []
    country_key = {}
    feature_names = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
                     "acousticness", "instrumentalness", "liveness", "valence", "tempo",
                     "duration_ms", "time_signature"]
    for country in countries:
        print(country)
        # create the per-country output dir before opening files inside it
        country_path = os.path.join("Datasets/Countries", country)
        if not os.path.exists(country_path):
            os.makedirs(country_path)
        with open(os.path.join(country_path, 'average.csv'), 'w', newline='') as avg_file:
            flatWriter = csv.writer(avg_file, delimiter=',')
            flatWriter.writerow(["week"] + feature_names)
            data = []
            dataJSON = {}
            for week in weeks:
                day_file = os.path.join(path, country, week + country + ".csv")
                with open(day_file) as f:
                    first_line = f.readline()
                # failed downloads saved an HTML page instead of a CSV; skip those
                if '<!doctype html>' in first_line:
                    continue
                weekData = pd.read_csv(day_file, header=1)
                # join each charted song against its audio features by URL
                weekWithMeta = weekData.join(songsMeta.set_index('track_ref'), on='URL')
                weekWithMeta.reset_index(drop=True, inplace=True)
                outputPath = os.path.join(basepath, "Datasets/Countries", country, week + country)
                weekWithMeta.to_csv(outputPath + "_meta.csv")

                # average every audio feature across the week's chart
                weekAverages = weekWithMeta.mean()
                outputPath = os.path.join(basepath, "Datasets/Countries", country, "average_" + country)
                flatWriter.writerow([week] + [weekAverages[name] for name in feature_names])
                weekRecord = {name: weekAverages[name] for name in feature_names}
                data.append({"week": week, **weekRecord})
                dataJSON[week] = weekRecord

                weekAverages.to_csv(outputPath + "_average_meta.csv")

        # index the per-week averages by ISO 3166-1 alpha-3 code for the choropleth
        alpha3 = pc.country_name_to_country_alpha3(pc.country_alpha2_to_country_name(country.upper()))
        country_data.append({"country": country, "Alpha_3": alpha3, "data": data})
        country_key[alpha3] = dataJSON

    with open('Datasets/Master_JSON.json', 'w') as json_file:
        json.dump(country_data, json_file)
    with open('Datasets/country_key.json', 'w') as json_file2:
        json.dump(country_key, json_file2)
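# country_key.json is keyed by alpha-3 code, then by week range -- the shape
# find_minmaxes() below relies on, e.g. (a sketch):
#   {"AND": {"2020-09-04--2020-09-11": {"danceability": ..., "tempo": ...}, ...}, ...}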


def find_minmaxes():
    print("running find_minmaxes")
    with open('Datasets/country_key.json', encoding="utf8") as f:
        country_key = json.load(f)

    # take the modality names from any one (country, week) entry
    modalities = country_key["AND"]["2020-09-04--2020-09-11"].keys()
    mm_dict = {}  # per-modality {min, max}, used as the choropleth color domains
    for m in modalities:
        if m == "tempo":
            # tempo is in BPM rather than on a 0-1 scale, so it needs a wider
            # starting range (the original checked "temp" and then always
            # overwrote the result, which left this branch dead)
            mm_dict[m] = {"min": 200.0, "max": 0.0}
        else:
            mm_dict[m] = {"min": 1.0, "max": 0.0}

    print(mm_dict)

    for country in country_key:
        for week in country_key[country]:
            for modality in country_key[country][week]:
                value = country_key[country][week][modality]
                if value < mm_dict[modality]["min"]:
                    mm_dict[modality]["min"] = value
                if value > mm_dict[modality]["max"]:
                    mm_dict[modality]["max"] = value

    # loudness is negative, so min and max come out reversed; swap them
    mm_dict['loudness']["min"], mm_dict['loudness']["max"] = mm_dict['loudness']["max"], mm_dict['loudness']["min"]
    print(mm_dict)
    with open('Datasets/mode_domains.json', 'w') as json_file:
        json.dump(mm_dict, json_file)
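# mode_domains.json then maps each modality to its color-scale domain, e.g.
#   {"danceability": {"min": ..., "max": ...}, "tempo": {"min": ..., "max": ...}, ...}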



def parser():
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument(
        "--function",
        type=str,
        required=True,
        choices=["scrape", "stitch", "flatten", "meta_scrape", "meta_flatten",
                 "unique_songs", "meta_add", "weekly_meta_calc", "find_minmaxes"],
        help="which processing step to run"
    )
    arg_parser.add_argument(
        "--overwrite_old",
        # argparse's type=bool treats any non-empty string as True,
        # so a store_true flag is the reliable way to take a boolean
        action="store_true",
        help="whether or not to overwrite old spotifycharts scrape data"
    )
    return arg_parser


if __name__ == "__main__":
    print("\nrunning spotify feature processing utility")
    args = parser().parse_args()
    main(args.function, args.overwrite_old)
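# Example invocation (a sketch; the Datasets/ paths above are relative, so run
# from inside Data_Processing/):
#   python main.py --function scrape
#   python main.py --function unique_songs
#   python main.py --function weekly_meta_calc --overwrite_old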
23 changes: 23 additions & 0 deletions Data_Processing/old/flatten.py
@@ -0,0 +1,23 @@
import json
import os
import csv


with open('Datasets/dict.json', encoding="utf8") as f:
    chart_dict = json.load(f)

flatWriter = csv.writer(open('Datasets/flat_dict.csv', 'w', newline=''), delimiter=',')
for date in chart_dict:
    for song in chart_dict[date]:
        pos = chart_dict[date][song]['position']
        track = chart_dict[date][song]['track']
        artist = chart_dict[date][song]['artist']
        streams = chart_dict[date][song]['streams']
        url = chart_dict[date][song]['url']

        try:
            flatWriter.writerow([date, pos, track, artist, streams, url])
        except UnicodeEncodeError:
            # drop rows whose names the default encoding can't represent
            # rather than crashing the run
            print('contained characters not compatible with csv file')
            continue
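# dict.json (presumably from the earlier day-level scrape) maps each date to
# its chart, per the reads above:
#   {"<date>": {"<song>": {"position": ..., "track": ..., "artist": ...,
#     "streams": ..., "url": ...}, ...}, ...}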