Have you ever made a playlist or mixtape and gotten stuck on the order to put the songs in? Maybe we can learn what makes a good playlist from Spotify users. The best playlists have a good flow; that's what separates a good DJ from a bad one, given the same tracks and technical aptitude. Build-ups and break-downs make for an interesting experience, and there's more to it than just picking the song most similar to the last one.
Deep Sequential Content Optimization or "DISCO"
- Ordered recommendations using recurrent neural networks.
- The main focus of this project is a content-based algorithm that would sit on top of a layer of collaborative filtering.
- Recommendation Systems
- Sequence Learning
- Recurrent Neural Networks
- Computational Music Theory
- Spotify API
- Keras
- Plotly
- pipeline.ipynb - This is the algorithm in action with a full pipeline of transformations and predictions to build playlists.
- /cloud/model.ipynb - RNN trained on Amazon SageMaker
- /data-wrangling/preprocessing.ipynb - the majority of data preprocessing and EDA is here. This is also where PCA and scalers are trained.
- A subset of songs is selected using collaborative filtering or a simple query based on subgenre. I'm using Spotify's API to select roughly 200-400 songs.
- A recurrent neural network determines the ideal feature vector for the next song based on the previous sequence of songs.
- The next song is selected by minimizing loss over the subset from step 1. The loss combines the distance from a song to the ideal feature vector, the consonance of the key transition, and the similarity of tempo. This is a greedy algorithm: it does not consider whether a song might fulfill the objective function better later in the sequence.
- The chosen song is fed back into the RNN and the process repeats from step 2 until the playlist reaches a satisfactory length (see the sketch after this list).
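A minimal sketch of steps 2-4, under some assumptions: `model` is the trained RNN, `song_loss` is a hypothetical helper combining feature distance, key consonance, and tempo similarity (each term is sketched in later sections), and `candidates` maps track IDs to 9-feature vectors from step 1.

```python
import numpy as np

def build_playlist(seed_features, candidates, model, length=13):
    """Greedy playlist construction (hypothetical helper names)."""
    sequence = list(seed_features)     # starter songs' feature vectors
    chosen = []
    while len(sequence) < length:
        # Step 2: the RNN predicts the ideal "abstract" feature vector for the
        # next song. (The real pipeline pads/truncates to the model's window.)
        ideal = model.predict(np.array(sequence)[np.newaxis, ...])[0, -1]
        # Step 3: greedy argmin of the loss over the candidate pool.
        best = min(candidates,
                   key=lambda tid: song_loss(candidates[tid], ideal, sequence[-1]))
        chosen.append(best)
        # Step 4: feed the pick back into the sequence and repeat.
        sequence.append(candidates.pop(best))
    return chosen
```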
- Initial Data...
- 15,918 users
- 157,504 playlists
- 2,032,044 songs
- The data used
- Very large and very small playlists removed
- Things like “liked from radio” dropped
- Used that to build search strings and queried Spotify's API continuously for about a week
- Training data for the RNN is a 72,051 x 50 x 9 tensor
User playlists are used in training as a proxy for listening history or more intentionally curated playlists. Improved data quality would do a lot to improve the RNN model.
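The tensor shape suggests fixed-length sequences of 50 songs with 9 features each. One plausible way such a tensor could be assembled (the exact slicing isn't described here) is a sliding window over each cleaned playlist:

```python
import numpy as np

# Hypothetical sliding-window assembly of the (num_sequences, 50, 9) tensor.
# `playlists` is assumed to be a list of (n_songs, 9) feature arrays.
def make_training_tensor(playlists, window=50):
    windows = []
    for songs in playlists:
        for start in range(len(songs) - window + 1):
            windows.append(songs[start:start + window])
    return np.stack(windows)   # shape: (num_windows, 50, 9)
```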
Metadata from Spotify "Features" API
Concrete Features
- Key
- Mode
- Tempo
“Abstract” Features
- Acousticness
- Danceability
- Energy
- Instrumentalness
- Liveness
- Loudness
- Speechiness
- Valence
A recurrent neural network differs from other deep learning architectures in that it learns sequences rather than a single set of values. While RNN applications in recommendation systems typically involve one-hot encoding the next item in a sequence, I've employed RNNs for multivariate time series forecasting of the "abstract features" which describe the character of songs in a playlist. The RNN architecture has 9 inputs and 8 outputs, with two 16-node hidden layers. Eight of the input/output nodes correspond to the 8 "abstract features," and a ninth input node carries mode. (More on this later.) The model's mean absolute error is 0.5848, against a mean absolute deviation of 0.8535 in the training data.
The model is trained in a many-to-many sequence learning format but implemented as many-to-one, where the output is not fed back into the input (without some modification... more on that in the next section). At each step of the RNN, the whole computation graph (above) is used.
A Standard Scaler and a Yeo-Johnson Power Transformation are applied to the training set (with duplicates removed) to give the data better distributions, both for training and for distance metrics. Some features, especially "Loudness," particularly benefit from having their extreme long tails reduced.
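A rough scikit-learn sketch of that preprocessing (the notebook may fit the transformers differently; `train_features` is a placeholder for the deduplicated feature matrix):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Yeo-Johnson power transform followed by standardization,
# fit on the deduplicated training features.
feature_pipeline = Pipeline([
    ("yeo_johnson", PowerTransformer(method="yeo-johnson", standardize=False)),
    ("standard_scaler", StandardScaler()),
])
X_train = feature_pipeline.fit_transform(train_features)
```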
Although Euclidean distance is ideal at implementation time, MSE often leads to under-estimation of weights and biases: because outliers are heavily penalized, gradients settle into local minima near zero. This is why MAE is used as the objective function instead.
Linear activations are used in all layers as they are less likely to under-estimate features and produce a higher-variance model. Weights are initialized randomly, and the Adam optimizer is used instead of RMSProp, though the latter is more common for RNNs. The gating of GRU and LSTM cells is not necessary since long-term dependency is not a major concern.
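Putting the pieces above together, a minimal Keras sketch of the described setup (assuming plain SimpleRNN cells and a TimeDistributed output head, which the text does not pin down):

```python
from tensorflow import keras
from tensorflow.keras import layers

# 50-step sequences of 9 features in, the 8 "abstract" features out at each step.
model = keras.Sequential([
    keras.Input(shape=(50, 9)),                                         # 50 songs x 9 features
    layers.SimpleRNN(16, activation="linear", return_sequences=True),   # first 16-node hidden layer
    layers.SimpleRNN(16, activation="linear", return_sequences=True),   # second 16-node hidden layer
    layers.TimeDistributed(layers.Dense(8, activation="linear")),       # many-to-many output
])

# MAE objective (rather than MSE) with the Adam optimizer, as discussed above.
model.compile(optimizer="adam", loss="mae")
```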
Three parameters are used to pick the best next song
As mentioned above, mode is not part of the output vector: first, it's used instead with key to determine key-transition consonance, and second, I didn't want its errors to backpropagate. Two tuning parameters are associated with this distance metric:
- Flow: how much this distance counts in the overall argmin that determines the next song to pick
- Spiciness: a scaling factor for the RNN output, since its predictions are often underestimated (see the sketch after this list)
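One plausible reading of how these two parameters interact, as a hypothetical helper (the notebooks define the exact form):

```python
import numpy as np

def flow_distance(candidate_abstract, rnn_output, flow=1.0, spiciness=1.0):
    """Distance term of the loss: "spiciness" rescales the RNN's (often
    under-estimated) prediction, "flow" weights the distance in the argmin."""
    ideal = spiciness * np.asarray(rnn_output)
    return flow * np.linalg.norm(np.asarray(candidate_abstract) - ideal)
```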
The circle of fifths is the backbone of this part of the algorithm. Distance in the circle of fifths determines how close two keys are in both a sonic and simple mathematical sense, so the number of steps is the basis for this part of the loss function for a song. Research in computational music theory has more complex and elegant solutions to this problem, but the circle of fifths will do for now.
Minor keys are assigned to their relative majors and distances are calculated from there. Fifths and fourths are assigned the same distance as the same octave, so the function sees no difference between those three options. The tuning parameter "sweetness" adjusts how much the argmin function counts key similarity in making its decisions.
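A sketch of that key-transition term, assuming Spotify's pitch-class (0-11) and mode (0 = minor, 1 = major) encoding; collapsing fifths and fourths to zero distance follows the description above:

```python
def key_penalty(key_a, mode_a, key_b, mode_b, sweetness=1.0):
    """Circle-of-fifths distance between two songs' keys (sketch).

    Minor keys are mapped to their relative major (3 semitones up), then steps
    around the circle of fifths are counted; a fifth or fourth (one step) is
    treated the same as an identical key.
    """
    if mode_a == 0:                          # minor -> relative major
        key_a = (key_a + 3) % 12
    if mode_b == 0:
        key_b = (key_b + 3) % 12
    # Successive fifths are 7 semitones apart, so this maps pitch classes
    # onto positions around the circle of fifths.
    pos_a, pos_b = (key_a * 7) % 12, (key_b * 7) % 12
    steps = abs(pos_a - pos_b)
    steps = min(steps, 12 - steps)           # shortest way around the circle
    return sweetness * (0 if steps <= 1 else steps)
```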
One of the hardest feature engineering questions in this project was how to use tempo. It's surely an important feature, but how to treat it mathematically was not immediately apparent. I took an approach which expands tempo to two dimensions so that a similarity metric can be calculated as the distance between points. A circle is used to capture the cyclical nature of tempo similarity, and the function was then transformed monotonically to give a simpler version:
A plot of similarity against tempo ratio is shown below:
The tuning parameter "smoothness" determines how important tempo similarity is in the song selection process.
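The exact formula isn't reproduced here, but one hedged reading of the idea (tempo ratios treated cyclically so that half- and double-time feel equivalent, weighted by "smoothness") might look like:

```python
import math

def tempo_penalty(tempo_a, tempo_b, smoothness=1.0):
    """Hypothetical tempo term: distance of the tempo ratio from a power of two."""
    frac = math.log2(max(tempo_a, tempo_b) / min(tempo_a, tempo_b)) % 1.0
    return smoothness * min(frac, 1.0 - frac)   # 0 when tempos are "equivalent"
```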
Use the notebook pipeline.ipynb to pick 3 songs. This starter sequence generates 200-400 candidate songs using Spotify recommendations through their API. The RNN predicts the next feature vector and the algorithm picks ten more songs. A visualization of the playlist's flow is generated with Plotly, as shown below. The 3 dimensions are a projection of the 8 "abstract" feature dimensions using a PCA transformation trained on the original training data. Lines connect songs sequentially.
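A rough sketch of that visualization step, assuming `pca` is the 3-component PCA fit on the training data and `playlist_abstract_features` / `playlist_titles` come from the pipeline notebook:

```python
import plotly.graph_objects as go

# Project each song's 8 "abstract" features down to 3 dimensions and
# connect the songs in playlist order to show the flow.
coords = pca.transform(playlist_abstract_features)   # shape: (n_songs, 3)
fig = go.Figure(go.Scatter3d(
    x=coords[:, 0], y=coords[:, 1], z=coords[:, 2],
    mode="lines+markers",        # lines connect songs sequentially
    text=playlist_titles,        # hover labels with track names
))
fig.show()
```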
- Incorporate collaborative filtering
- Investigate possible bug in Spotify API Client
- Better data quality
- Continued RNN tuning
- Limiting algorithm “greediness”
- More research into computational music theory
- Sequence-Aware Recommender Systems: https://arxiv.org/pdf/1802.08452.pdf
- RNNs: http://karpathy.github.io/2015/05/21/rnn-effectiveness/