My name is Devansh Trivedi and this is the log of my progress in the 100 Days of ML code challenge
I'm following the guidelines from Siraj Raval's video.
The idea is simple:
- Pick an industry
- Find a Problem
- Locate a Dataset
- Apply AI to Data
- Create a Solution
Find my everyday progress at this Twitter Moment
is this things on?
*microphone screeches*
*Looks at @SirajRaval @ @SchoolOfai*
*clears throat*
I'm publicly pledging to the #100DaysOfMLCode challenge starting today!
Learn more @ DVNSH.com — Devansh Trivedi (@devanshRtrivedi)
September 17, 2018
✅ Step 1: Picked Music Industry
✅ Step 2: Found a Problem: ProActive Music Selection
✅ Made Word-Pair Frequency Graph in Neo4j using Cypher language
Used Starboy lyrics csv
This idea comes from a movie as old as me, Flubber (1997).
Robin Williams plays 'The Professor' with this cute flying robot 'Weebo'. It uses memes to communicate
It always has the perfect cartoon/clip for the occasion. I felt such need for sound effects in my life.
Some songs are perfect for some moods and contexts.
Hopefully, I will be able to code an AI so everyone can train it to their taste, and experience music effortlessly.
The idea is simple: The song should match the environment. Initially, I will try Lyrics Text Analysis to match the lyrics with the words spoken.
Then Linguistic Analysis of song lyrics will help me detect and interpret the emotions, social tendencies, and language style that musical artists express in their songs.
Link to work: StarBoy Repository
✅ Coded together a Python script with officially supported 'idiomatic' Neo4j Python driver v1 to use GraphDatabase
NLP can be implemented natively in Neo4j using Cypher queries.
However, the ability to access it from Python will open doors to many new possibilities.
Link to work: StarBoy.py
✅ Fetch word frequencies into a Python dictionary
To achieve this, I'm using Auto-commit transactions in the driver's session.
I concluded that a dictionary would be the right data structure to store the frequencies of each word
Link to work: Commit with changes
✅ Word-Pair Frequency into Python Dictionary with tuples as keys and count as values
I made the variable name easier to read.
Now that I have the Text Adjacency Graph, I can go ahead with Mining Word Associations
Link to work: Commit with changes
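A minimal sketch of the word-pair count described above, done in plain Python rather than through the Neo4j driver. The function name `count_word_pairs` and the sample lines are illustrative assumptions, not the code from the linked commit.

```python
from collections import Counter

def count_word_pairs(lines):
    """Count adjacent word pairs, keyed by (left_word, right_word) tuples."""
    pair_counts = Counter()
    for line in lines:
        words = line.lower().split()
        # zip pairs each word with its right-hand neighbour on the same line
        pair_counts.update(zip(words, words[1:]))
    return dict(pair_counts)

# Made-up sample lines, not the actual lyrics
sample_lines = ["look what you have done", "look what you did"]
pair_frequency = count_word_pairs(sample_lines)
print(pair_frequency[("look", "what")])
```

Tuples work as dictionary keys because they are hashable, which is what makes the (left, right) pair usable as a lookup key here.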
✅ Implemented Left1, Right1, Count Left1 and Right1, Find highest Left1 and Right1
✅ Uploaded starboy.csv
✅ Created a list of queries on Github repository for everyone else to try the same
I was amazed by the fact that the grammar articles "the" and "a" came out to be the most frequent Adjacent nodes in the graph. I'm not sure how implementing Left2 and Right2 will help me, and how to implement those the right way.
✅ Mining Paradigmatic Word Associations using Jaccard Index to compute similarity
Very successful, many interesting discoveries
Yet I'm still surprised and sad that the AI could not find similarity of the word "starboy"
Link to work: Commit with changes
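To illustrate the idea behind Left1, Right1 and the Jaccard Index, here is a small plain-Python sketch. It assumes that two words are paradigmatically similar when they share the same immediate left and right neighbours; the combined left-plus-right context used here is one possible variant and may differ from the Cypher queries in the repository. All names and the sample lines are hypothetical.

```python
from collections import defaultdict

def build_contexts(lines):
    """Collect Left1 and Right1 neighbour sets for every word."""
    left1, right1 = defaultdict(set), defaultdict(set)
    for line in lines:
        words = line.lower().split()
        for left, right in zip(words, words[1:]):
            right1[left].add(right)  # `right` appears immediately after `left`
            left1[right].add(left)   # `left` appears immediately before `right`
    return left1, right1

def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def paradigmatic_similarity(word_a, word_b, left1, right1):
    def context(word):
        # Tag each neighbour with its side so left and right stay distinct
        lefts = {("L", w) for w in left1.get(word, set())}
        rights = {("R", w) for w in right1.get(word, set())}
        return lefts | rights
    return jaccard(context(word_a), context(word_b))

# Made-up sample lines, not the actual lyrics
sample_lines = ["i drink tea today", "i drink coffee today"]
left1, right1 = build_contexts(sample_lines)
print(paradigmatic_similarity("tea", "coffee", left1, right1))
```

A word that appears only once, in a unique context, shares no neighbours with anything else and scores 0 against every other word.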
✅ Graph based Summarization and Keyword Extraction
I'm grateful for, yet disappointed by TextRank
I will now work on comparing my results with the environment
Commit: Queries in starboyGraphQueries.cql
TextRank Python code
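For readers unfamiliar with TextRank, the following is a compact sketch of the keyword-ranking idea: build a co-occurrence graph of neighbouring words and run a PageRank-style iteration over it. It is not the linked TextRank code; the window size, damping factor, iteration count and sample words are assumptions chosen for illustration.

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        # Link each word to the words that follow it inside the window
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbours:
            # Each neighbour shares its score equally among its own neighbours
            incoming = sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            new_scores[word] = (1 - damping) + damping * incoming
        scores = new_scores
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up word sequence, not the actual lyrics
ranked = textrank_keywords("star boy star light star car".split())
print(ranked[0][0])
```

In this toy input every other word is connected only to "star", so it collects the highest score.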

✅ Working on a Jupyter notebook on Content Recommendation
I now need datasets more than ever.
MusixMatch has 43 million tracks and 14 million lyrics.
Link: Jupyter Notebook
✅ Designing 'The Song Attribute Graph' data model and schema for Content Based filtering and Recommendation
I'm advocating content based recommendations at this stage of the project
Today I had to guess what parameters can be taken into account to find patterns that match the environment
To keep my project simple, I have chosen Release Date, Featuring Actor, Genre AND Keywords
Link to work: Commit with Queries
✅ Created python script to generate User Preference and Song Attributes Graph for Content-based Recommendations
✅ Signed up for MusixMatch API, got API key and Studied its documentation
I wanted to create separate functions for queries to add user, add song, add metadata and link. But because of the limitation of query variables, their scope only exists with the query. So I had to create a common function. If I still wanted to create separate ones, I'd need to MATCH them before making associations to a new variable, which would further complicate the code. I will have a workaround in Python to remedy this by creating query templates and loops.
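One way the "query templates and loops" workaround could look is sketched below: a single Cypher statement is assembled in Python so that the `u` and `s` variables stay in scope for every attribute. The labels, relationship types and parameter names are hypothetical and are not taken from the repository.

```python
def build_song_query(attribute_names):
    """Build one Cypher statement that merges a user, a song and its attributes."""
    parts = [
        "MERGE (u:User {name: $user})",
        "MERGE (s:Song {title: $title})",
        "MERGE (u)-[:LIKES]->(s)",
    ]
    for index, attribute in enumerate(attribute_names):
        label = attribute.capitalize()  # hypothetical node label, e.g. Genre
        # Values are passed as query parameters ($genre, $keyword, ...)
        parts.append(f"MERGE (a{index}:{label} {{value: ${attribute}}})")
        parts.append(f"MERGE (s)-[:HAS_{attribute.upper()}]->(a{index})")
    return "\n".join(parts)

query = build_song_query(["genre", "keyword"])
print(query)
```

The resulting string would be sent together with a parameter dictionary such as `{"user": ..., "title": ..., "genre": ..., "keyword": ...}`, which keeps values out of the query text itself.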
✅ Fetched Top Artists using chart.artists.get
Following APIs are paid: track.search track.subtitle.get matcher.subtitle.get
✅ Updated Python script to iterate over names of Top Artists in #India from JSON response
Now I will combine both Python files: the one that fetches data and the one that fires queries to generate nodes in the graph database to associate data.
Link to work: Commit to python code
✅ Another API call to get songs of Top Artists in India
✅ Show these associations in the neo4j Graph Database
I need more metadata, as I couldn't get some info about tracks from MusixMatch API
Link to work: That new python file I thought about making yesterday
✅ Fixed Top Artists for India (country parameter was for US) API call
✅ Added Top Tracks in India list
track.snippet.get didn't work for starboy track_id 144134659, not even in the playground
I'm planning to use the following to compute relevance:
chart.tracks.get
chart.artists.get
matcher.track.get
track.get
artist.albums.get
matcher.lyrics.get
track.lyrics.get
Link to work: Commit to the Fix
✅ Made a function to handle APIs better
✅ Combined approach: Finding Best songs OF Best artists
I tested lyrics by watching the Starboy video on Youtube with MusixMatch Chrome Extension and it worked fine.
This proves that they DO have the lyrics with them.
But I'm not able to find the lyrics programmatically.
Now I will try to get metadata, eventually reflecting those in the Graph.
Link to work: Commit with changes
✅ Convert listOfTracks to a dictionary with track_name, track_id as values
✅ Fetch and Parse the Release Date
Now I have the Release Date Metadata, but that's not it. A Date Object is convenient in Python. I'm not sure it's a good move to store that information as Node in Graph. I might have to come up with a Hybrid approach if I can't sort it out tomorrow.
Link to work: Commit with changes
✅ Dropped the getReleaseDateFromTrackId function in favour of iterating from data in Top Tracks which already had the dates
The function I defined was taking too long to execute, as it was fetching track details for every top artist's every single top track. That's too many API calls (about 100, depending on number of items).
After inspecting data from previous API calls, I noticed that I could instead iterate into the nested dictionary inside a tuple inside a list inside the callback (phew)!
Link to work: Commit with changes
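The sketch below shows the general idea of reading release dates out of an already-fetched Top Tracks response instead of making one API call per track. The response shape, the `first_release_date` field and its date format are assumptions made for illustration (the real response described above also involved a tuple); the sample data is made up.

```python
from datetime import datetime

# Hypothetical response, trimmed to the fields used below
sample_response = {
    "message": {
        "body": {
            "track_list": [
                {"track": {"track_id": 1, "track_name": "Song A",
                           "first_release_date": "2018-09-21T00:00:00Z"}},
                {"track": {"track_id": 2, "track_name": "Song B",
                           "first_release_date": ""}},
            ]
        }
    }
}

def release_dates_by_track(response):
    """Map track names to date objects, skipping tracks without a date."""
    dates = {}
    for item in response["message"]["body"]["track_list"]:
        track = item["track"]
        raw_date = track.get("first_release_date")
        if raw_date:
            parsed = datetime.strptime(raw_date, "%Y-%m-%dT%H:%M:%SZ")
            dates[track["track_name"]] = parsed.date()
    return dates

release_dates = release_dates_by_track(sample_response)
print(release_dates)
```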
✅ Take Top Tracks in the geographical place as Training Data for AI
At this point my project seems to be leaning towards Collaborative filtering based on other users' activity
Link to work: Commit with changes
✅ Finally managed to fetch lyrics of Top Songs in #India songs to train the AI
✅ Made API method much better
I'm not able to fetch the whole lyrics. I'll work with what I have now.
I'm planning to take Top Tracks of India as Training Data and use Top Artists' Songs to find recommendations.
Link to work: Commit with changes
✅ Cleared other parts in the lyrics, so it's much more readable now (Compared to yesterday)
✅ Refactored the code
This is where my previous work about generating weighted keywords can pitch in to UNDERSTAND the lyrics.
Link to work: Commit with changes
✅ New Python file to parse the #StarBoy lyrics
✅ Removed background vocals in brackets
Genius.com resources helped understand parts of the lyrics
The data in my file is analogous to the csv file that I manually generated at the beginning of the project. Getting lyrics is now officially automated !!
Link to work: Commit to New File
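Removing background vocals in brackets can be done with a regular expression. This is a minimal sketch under the assumption that the brackets are never nested; the function name and the sample lines are made up.

```python
import re

# Matches "(...)" or "[...]" together with any whitespace in front of it
BRACKETED = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")

def strip_bracketed(line):
    """Remove bracketed segments and collapse leftover whitespace."""
    cleaned = BRACKETED.sub("", line)
    return " ".join(cleaned.split())

print(strip_bracketed("I'm a star (ah-ah) tonight"))
```

A line that consists only of a bracketed segment becomes an empty string, so such lines can be filtered out afterwards.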
😭 Failed to import line by line lyrics into @neo4j (without using CSV)
Tried doing this for 4 Hours. It's too complicated.
✅ Michael Hunger helps me import the data to the graph
✅ Mining Paradigmatic Word Associations in lyrics from API calls
Today's date is 10 on 10 (10/10), and so was my progress.
As I was wondering why the word frequency from the API fetched lyrics didn't match that of the CSV lyrics, it was comforting to recollect that I don't get FULL lyrics from the free account. Hence the difference.
Despite that, my mining was very accurate.
I was tempted to use an iterator for the word-by-word query, but it didn't work, so I stuck with coding that part from scratch.
Link to work: Commit with changes
✅ Create nodes from words of ALL top tracks in india
I tried to create nodes with separate labels for each song, but it seems that's not possible. So now I will have to mine each of the tracks one-by-one inside the same loop instead of batch mining!
Link to work: Commit Creating 'Mining Top Tracks in india.py'
✅ Adding names of songs as Labels using APOC
Now I can batch-mine all songs in the graph, and know which song a particular cluster of nodes belongs to :D
Link to work: Commit with changes
✅ Mining from Lyrics of Songs
Some words are repeated as nodes. I will need to debug this, as most weighted pairs are the same words in the results
Link to work: Commit with changes
👀 Deep inspection of the mined data
I found multiple nodes (at least two) with the same words
Something is wrong. I'm expecting unique nodes.
Link to image: pic on Twitter
✅ Refactor the Mining process
After refactoring the mining process, another instance of the mined graph indicates that the issue persists.
Link to work: Commit with changes
✅ Underscored Variable names for readability
🕵🏻‍♂️ Found issues in parsing of lyrics
I have no idea of the weird unexpected codes I'm looking at. This has never happened before.
Link to work: Commit with changes
✅ Implemented query and calls with a different approach to prevent duplicate nodes before Mining
Finally it works as intended, can't wait to fetch and display keywords mined from each song now !
Link to work: Commit with changes
⚙️ Importing weighted keywords from mined graph
The script is taking significant time to execute
Link to work: Commit with changes
⚙️ Trying different approaches to read Response code from the BoltStatementResult object
I really wish I could just iterate on it!
Link to work: Commit with changes
✅ Learned Pypher
✅ Finally imported from Graph (though strangely all weights are 1.0)
I couldn't find how Pypher could help me access records from the graph, so I implemented the code without it
Link to work: Commit with changes
✅ Recreated the same problem with Starboy Lyrics
This is really interesting, and I suspect this is due to the way I'm passing words to the graph
Link to work: Commit with changes
✅ Create CSV file to store lyrics and FIRE the original Cypher Query for proper mining to get expected weight range
I hope to get a Comma-Separated-LINES instead of Comma-Separated-WORDS tomorrow
Link to work: Commit with changes
✅ Exported lyrics into a CSV
Interestingly, some lines were blank
Link to work: Commit with changes
✅ Improved Lyrics cleaning using loops
✅ CSV export is tip top, no blank lines now
The number of nodes surprisingly increased after removing blank lines
Link to work: Commit with changes
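A small sketch of exporting cleaned lyrics to CSV, one lyric line per row, while skipping blank lines. The column name `line` and the sample input are assumptions; an in-memory buffer stands in for the real file.

```python
import csv
import io

def write_lyrics_csv(lines, handle):
    """Write one lyric line per CSV row, skipping blank lines."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["line"])  # hypothetical header name
    written = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank or whitespace-only lines are dropped
        writer.writerow([text])
        written += 1
    return written

buffer = io.StringIO()
count = write_lyrics_csv(["first line", "", "   ", "second, with a comma"], buffer)
print(buffer.getvalue())
```

The `csv` module quotes the last sample line because it contains a comma, so it stays a single field instead of being split into words.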
✅ Export Starboy lyrics csv directly to the path of graph import directory
I wish there was a way to FIND OUT where the current graph is located using the driver
Link to work: Commit with changes
✅ Mine the lyrics imported from CSV
✅ Store the result in a Python dictionary
Found '0' values for words...
I checked the CSV file, and it didn't have any '0' in it !
Link to work: Commit with changes
✅ Replicate the successful mining in Starboy in the loop of 'Mining Top Tracks in india' script
My code is right, but the Database throws Illegal Character error for songs that have SPACE in their name
Link to work: Commit with changes
😿 Encoding the File Name didn't help
Feels like I'm hitting a wall here
Link to work: Commit with changes
✅ Fix file encoding issue
"Cannot merge node using null property value for word"
I removed the last empty line in the CSV manually and checked for other NULL values with Python.
Starboy lyrics had null and still worked fine.
Link to work: Commit with changes
✅ Sentiment Analysis using TextBlob
Graph based Mining was fancy, but on hiatus for now, until I figure out how to import from CSV without that NULL error
Link to work: Commit with changes
✅ Sentiment polarity to Percentage
✅ Fixed script breaking when @MusixMatch doesn't have lyrics
✅ Only showing lines with non-zero sentiment
Sometimes, MusixMatch doesn't have lyrics, in which case it sends an empty list instead of dictionary. So that was breaking my script today (even though it worked fine yesterday, as it had lyrics for all top songs in india of yesterday's charts). So I had to put a condition to make sure the script doesn't fail
I'm not showing lines and songs that don't yield a sentiment (i.e. polarity is 0)
Link to work: Commit with changes
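The two fixes above can be sketched roughly as follows. The polarity is assumed to be a float between -1.0 and 1.0; the "happy"/"sad" labels, the `lyrics` and `lyrics_body` key names and the function names are assumptions made for this example, not confirmed details of the script.

```python
def polarity_to_percentage(polarity):
    """Turn a polarity in [-1.0, 1.0] into a (label, percentage) pair."""
    if polarity == 0:
        return None  # no sentiment, so nothing to show
    label = "happy" if polarity > 0 else "sad"
    return label, round(abs(polarity) * 100, 1)

def lyrics_from_body(body):
    """Return the lyrics text, or None when the body is not a dictionary."""
    if not isinstance(body, dict):
        return None  # e.g. an empty list when no lyrics are available
    return body.get("lyrics", {}).get("lyrics_body")

print(polarity_to_percentage(0.5))
print(lyrics_from_body([]))
```

Checking the type of the body before indexing into it is what keeps the script from failing on days when a charting song has no lyrics.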
✅ Test Language Detection
✅ Test NLTK Part Of Speech (POS) tagging
Kamariya is controversial. It's based on a Gujarati song, but the song is in Hindi. So detecting it as either is good enough; I'm happy it detected it as Gujarati.
However, Language Detection was not accurate for all songs. Proper Patola is in Punjabi, yet it was detected as Gujarati. I will see if I can verify it with data from the API.
- Basic:
  - N Noun
  - V Verb
  - ADJ Adjective
  - ADV Adverb
  - P Preposition
  - CON Conjunction
  - PRO Pronoun
  - INT Interjection
- Advanced:
  - CC coordinating conjunction
  - RB adverbs
  - IN preposition
  - NN noun
  - JJ adjective
  - VBP present tense verb
Link to work: Commit with changes
✅ Compare Language Detection result with data from API
Initially, I should mine tracks that are in English according to both: API and Language Detection result
Link to work: Commit with changes
✅ Restrict mining to english by detecting the language and verifying the detection with labeled data from API
I will now work on finding all the ways to compare environmental factors with lyrics in English language
Link to work: Commit with changes
✅ Sad percentage value should be absolute, not negative
✅ Naive Bayes Tokenization and comparison with Pattern Analyzer
I would tolerate a percentage difference, but in some cases the polarity is totally opposite. I will need to listen to these songs, read the lyrics and manually verify if the sentiments are consistent with the algorithms (which can be subjective).
Link to work: Commit with changes
✅ Use @IBM Watson from BlueMix Console for Natural Language Understanding of 'Entities and Keywords'
I will need more time to go through the documentation for taking full advantage of Watson's services
Link to work: Commit with changes
✅ Line-By-Line comparison of Sentiment Analysis Result from PatternAnalyzer and NaiveBayesAnalyzer (an NLTK classifier)
I will use IBM Watson's Natural Language Understanding only when there is conflict between these two (One thinks sad, another happy)
Link to work: IBM NLU.py
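A rough sketch of how conflicting lines could be picked out for a third opinion. It assumes the pattern-based analyzer yields a polarity float per line and the Naive Bayes analyzer yields a "pos" or "neg" label per line; the function name and the sample values are made up, and the analyzers themselves are not called here.

```python
def find_conflicts(lines, pattern_polarities, bayes_labels):
    """Return the lines on which the two analyzers disagree."""
    conflicts = []
    for line, polarity, label in zip(lines, pattern_polarities, bayes_labels):
        if polarity == 0:
            continue  # the pattern analyzer has no opinion on this line
        pattern_label = "pos" if polarity > 0 else "neg"
        if pattern_label != label:
            conflicts.append(line)
    return conflicts

# Made-up lines and scores for illustration
lines = ["line one", "line two", "line three"]
conflicts = find_conflicts(lines, [0.4, -0.3, 0.0], ["pos", "pos", "neg"])
print(conflicts)
```

Only the returned lines would then need to be sent to a third service, which keeps the number of external API calls small.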
✅ Emotional Understanding by IBM Watson of The Weeknd song 'Starboy'
I wish I could find out why Watson thinks Jamie Foxx has something to do with this song.
Link to work: Commit with changes
✅ Sentiment Analysis head to head showdown "Pattern Analyzer" vs. "Naive Bayes" judged by our chief guest IBM Watson
This is hard to judge as I'm not getting full lyrics from the MusixMatch API
Link to work: Commit with changes
✅ Deeper insights from IBM Watson from full lyrics of The Weeknd 'Starboy'
This is very useful. I think I should stick with IBM for understanding both the environment and the song
Link to work: Commit with changes
✅ Search for Song ID in @Genius.com database
✅ Use song ID to fetch description of the song
✅ Sentiment Analysis of : Lyrics + Description
Implementing OAuth2 Access Token based Authorization was really challenging
Link to work: Commit with changes
✅ Scrape Full lyrics from @Genius.com
✅ Perform Sentiment Analysis on it
✅ Change Country from india to USA as I only want to mine English songs
Didn't get any output from the script as I had configured it to only mine English songs, and all top songs happened to be non-English today.
I'm not sure if scraping is allowed, I didn't find anything against it in the official documentation
Will keep MusixMatch as a fallback in case I can't find lyrics from Genius.com
Link to work: Commit with changes
✅ Get lyrics from #PoweredByMusixMatch when not available in @Genius
✅ Extract Noun phrases from the blob
✅ Change Country back to India for testing
I also did some refactoring that made the script faster
I will need to decide how to deal with Language labels coming from different APIs
Maybe I can match noun phrases with data from environment
Link to work: Commit with changes
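The fallback between lyrics sources can be expressed as a loop over providers that returns the first non-empty result. This is a sketch with stand-in functions: `fake_genius` and `fake_musixmatch` are hypothetical stubs, not real API clients.

```python
def first_available_lyrics(title, providers):
    """Try each (name, fetch) provider in order; return the first hit."""
    for name, fetch in providers:
        try:
            lyrics = fetch(title)
        except LookupError:
            lyrics = None  # treat "not found" as a miss and move on
        if lyrics:
            return name, lyrics
    return None, None

def fake_genius(title):
    raise LookupError(title)  # pretend the first source has nothing

def fake_musixmatch(title):
    return "la la la"

providers = [("Genius", fake_genius), ("MusixMatch", fake_musixmatch)]
source, lyrics = first_available_lyrics("Some Song", providers)
print(source)
```

Returning the provider name alongside the text makes it possible to tell later which source (and which language label) a given lyric came from.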
✅ LIVE Watson Speech-To-Text from Microphone
✅ Submit Pull Request to update their example script from docs
Tried Google Speech to Text API but enabling the service requires connecting a Payment method that has automatic recurring payments support. My card doesn't have that
Link to work: Pull Request
✅ IBM Watson continuous Speech-To-Text transcription from Microphone using WebSocket
"Now the output is much more neat"
We're not shaking hands yet
Link to work: sttibm.py
✅ Set up a Configuration file for my API keys and authentication passwords so that they don't appear on GitHub
I will need to edit all the scripts to use the config file
I will also release a template for others to edit and be able to use the scripts using their credentials
Link to work: config.py
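As an alternative illustration of keeping credentials out of the repository, the sketch below reads them from environment variables instead of a `config.py` module. This is a swapped-in technique, not the approach in the linked file, and the variable names are hypothetical.

```python
import os

# Hypothetical credential names; adjust to the services actually used
REQUIRED_KEYS = ("MUSIXMATCH_API_KEY", "NEO4J_PASSWORD")

def load_config(environ=None):
    """Return the required credentials, failing loudly if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}

# A plain dictionary stands in for the real environment here
config = load_config({"MUSIXMATCH_API_KEY": "demo-key", "NEO4J_PASSWORD": "demo-pass"})
print(sorted(config))
```

Failing early with the list of missing names gives anyone reusing the scripts a clear hint about which credentials they still need to supply.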
✅ Update deprecated Neo4j driver
✅ Modify each script to use the config file for credentials
✅ Analyze Taki Taki and summarize using TextRank
I was surprised to find some phrases when I was coding AI to understand Lyrics of Songs
- I'm learning Spanish
- I have respect for their culture
- I just watched TakiTaki Song's Video
Now I do CARA PALMA (FacePalm: 🤦🏻‍♂️)
Disclaimer
I'm disappointed by the message, not music
Links to work: Commits Today
✅ Dynamically add method to Graph class using Decorator
✅ Added Modularity by making Modules for Graph and MusixMatch
I did major refactoring today.
I'm thinking about putting IBM code in a separate module too
Link to work: Commits Today
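Dynamically adding a method to a class with a decorator can look like the sketch below. The `Graph` class here is a bare stand-in, and `graph_method` and `add_song` are hypothetical names; the real class in the repository wraps the Neo4j driver.

```python
class Graph:
    """Bare stand-in for the project's Graph class."""
    def __init__(self):
        self.songs = []

def graph_method(func):
    """Attach func to Graph as a method under its own name."""
    setattr(Graph, func.__name__, func)
    return func

@graph_method
def add_song(self, title):
    # Becomes Graph.add_song; `self` is the Graph instance
    self.songs.append(title)
    return self

graph = Graph().add_song("Some Song")
print(graph.songs)
```

This lets each module register its own operations on the shared class without editing the class definition itself.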
✅ Detect words in lyrics from the environment using Speech-To-Text from @IBMWatson
It detected "week"
but not "weeks"
Tomorrow: Lemmatization
Link to work: MatchSong.py
✅ Clean up lyrics from Genius.com by removing section names
✅ LIVE Lemmatization of Speech To Text as well as the lyrics
Now detects "weeks" !
Link to work: Commit with changes
✅ Federated search for song's lyrics and metadata
✅ Standard module to repeat the process for Top Tracks in other countries
I'm finding it very hard to resist refactoring.
Link to work: TopLyrics.py
✅ Lower case strings in word list for comparison
✅ Detect which song is the word from
Sometimes (like in Taki Taki), Genius.com scraping gives something unexpected instead of Lyrics... Like a list of artists and songs. I'll need to make sure that I'm always getting a lyrical text corpus in the Python dictionary for lemmatization to make accurate matches.
Link to work: Commit with changes
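Detecting which song a spoken word comes from can be sketched as a lowercase word index over the lyrics. The sample lyrics and function names are made up, and this version does no lemmatization, so "week" and "weeks" remain different words.

```python
import string
from collections import defaultdict

def normalise(word):
    """Lowercase a word and strip punctuation from its edges."""
    return word.lower().strip(string.punctuation)

def build_word_index(lyrics_by_song):
    """Map every normalised word to the set of songs it appears in."""
    index = defaultdict(set)
    for song, lyrics in lyrics_by_song.items():
        for word in lyrics.split():
            key = normalise(word)
            if key:
                index[key].add(song)
    return index

def songs_mentioning(spoken_text, index):
    """Return {word: songs} for each spoken word found in the index."""
    matches = {}
    for word in spoken_text.split():
        key = normalise(word)
        if key in index:
            matches[key] = set(index[key])
    return matches

# Made-up lyrics for illustration
index = build_word_index({"Song A": "Seven weeks of rain", "Song B": "One week, one wish"})
found = songs_mentioning("It has been a WEEK", index)
print(found)
```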
✅ Fixed Lyrics fetching from Genius.com
✅ Gestalt Pattern Matching to find closest matches from search results
Just read the Terms of Service and I think it's not 'OKAY' to scrape Genius.com lyrics, even for NLP research purpose. I'm going to find a legit way to get lyrics.
However, today's hard work paid off to find closest match from OFFICIAL Genius API search query.
I tried a few cut-off values and referred to this Stack Overflow question for code reference
Link to work: Genius.py
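Python's standard `difflib` module offers this kind of matching, with an algorithm its documentation describes as similar to Ratcliff/Obershelp "gestalt pattern matching". The sketch below picks the closest title from a list of search results; the candidate titles, the cut-off of 0.6 and the function name are assumptions and may differ from the linked file.

```python
from difflib import get_close_matches

def closest_title(query, candidate_titles, cutoff=0.6):
    """Return the candidate most similar to the query, or None below the cut-off."""
    lowered = {title.lower(): title for title in candidate_titles}
    matches = get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

# Made-up search results for illustration
results = ["Starboy", "Stargazing", "Boyfriend"]
print(closest_title("star boy", results))
```

Raising the cut-off makes the match stricter, and lowering it accepts looser matches at the risk of picking the wrong song.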
✅ Request Musixmatch and Genius.com for Lyrics
✅ Reach out to other Lyrics providers and read their TOS
✅ Ask for support from Google for STT
I'm empty and thirsty. Without lyrics to work with, I'm afraid I can't recommend songs.
I'm considering other dimensions of 'Music' like analysing timbre (Psychoacoustics).
✅ Last.FM API
I now have a worldwide, royalty-free, non-exclusive licence from Last.FM
Link to work: LastFM.py
✅ Using PIL (Python Imaging Library) to show image from Terminal
This will be useful to show the Cover Art for the Albums or Tracks
Link to work: imageCode.py
✅ Find user's Geo Location using Google Maps API
I'm only getting Geographical Coordinates. I'll need to convert it into country names and codes to fetch lyrics accordingly
Link to work: locateUser.py
✅ Import Flubber Google Cloud App to DialogFlow for Google Assistant surfaces
✅ Custom intent with Training phrases for sentiment analysis
I tried to fix the Google Maps API but it is still very unreliable.
I hope to set up continuous listening in DialogFlow


✅ Build fulfilment directly in DialogFlow via Cloud Functions for Firebase
Now, I can execute a function for the Default Fallback Intent


✅ Setup Firebase
firebase-client didn't work, I had to use firebase-tools
I might need to export and import the data as the Fallback function will be coded in JavaScript



✅ Deploy a DialogFlow fulfilment cloud function using the Firebase client



✅ Using TKinter 'Pack' Geometry Management Method to display text on Graphical User Interface
I should make a GUI for NLTK too.
That would help people learn NLP faster!
Link to work: interface.py
✅ Display image on GUI using Python 3 PhotoImage object
Link to work: Commit with changes
✅ Display Top 10 Tracks by country in GUI
✅ Use MacOS Native theme
✅ Object Oriented code for easy import
Link to work: Commit with changes
✅ Tree View implementation for Display of top tracks
Link to work: Commit with changes
✅ 'Track List' is now the heading inside the box
Link to work: Commit with changes
✅ Fixed Background
Link to work: Commit with changes
✅ Window has title 'Flubber'
Link to work: Commit with changes
✅ Fetch button shows "Fetching..." intermediate state
✅ Show Lyrics for each song
Link to work: Commit with changes
✅ Display properly formatted lyrics
Link to work: Commit with changes
✅ Make Sentiment Polarity to Percentage conversion code modular for importing into the interface with Functional Programming paradigm
I'll prefer to use the Functional Programming approach primarily in Python too, just like I do in JavaScript
Link to work: Commit with changes
✅ GUI for Sentiment Analysis by IBM Watson Emotion options
Link to work: Commit with changes
✅ Interactive Sentiment Analysis by IBM Watson Emotion for Lyrics of Top songs by country