This blog post was inspired by this blog post. #blogception

I'm addicted to Spotify. What gets me somewhat excited for Monday morning is a little nugget called Discover Weekly. It's a playlist of recommended songs based on a user's preferences, which I'm guessing are inferred from play history. It's a machine learning-powered playlist generated by those magicians at Spotify. A software engineer explains how these song recommendations are made here.

My Discover Weekly playlist is hit or miss. Sometimes I find a few really good songs in there, but other times the majority of the songs are just "meh". So I decided to create a playlist of songs that I KNOW I like, and a playlist of songs that I KNOW I do not like. I'll combine these tracks into one data set and use them as training data to feed into a machine learning algorithm. Once the algorithm is sufficiently trained, the hope is that it will be able to create a filtered Discover Weekly playlist for me.

I use the Spotipy Python library to access the Spotify Web API and obtain data on song features.

Spotipy configuration

In [1]:

import spotipy
import spotipy.util as util
from config import client_id, client_secret, redirect_uri, username, good_playlist_id, bad_playlist_id
from dw_id import dw_playlist_id
import numpy as np
import pandas as pd

scope = 'playlist-modify-private playlist-modify-public playlist-read-private user-library-read'
token = util.prompt_for_user_token(username, scope, client_id=client_id, client_secret=client_secret, redirect_uri=redirect_uri)
if token:
    sp = spotipy.Spotify(auth=token)
else:
    print("Can't get token for", username)

Pull data for good and bad playlists

I created these methods to clean up the code that pulls the tracks from a playlist. A full explanation is in my previous post.

In [2]:

def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

def tracks_to_df(username, playlist_id, data_array):
    tracks = get_playlist_tracks(username, playlist_id)
    for item in tracks:
        # Each playlist item wraps the actual track object under 'track'
        track = item['track']
        row = [track['id'],
               track['name'],
               track['artists'][0]['name'],
               track['popularity']]
        data_array.append(row)

    data_df = pd.DataFrame(data=data_array, columns=['id','name','artist','popularity'])
    return data_df

Collect tracks into a dataframe with columns for id, name, artist and popularity.

In [3]:

data_good = []
df_good = tracks_to_df(username, good_playlist_id, data_good)

data_bad = []
df_bad = tracks_to_df(username, bad_playlist_id, data_bad)

Pull features

Below are two methods to help pull features from each track. Again, an explanation of the code clean up is in my previous post.

In [4]:

def chunks(mylist, chunk_size):
    # Step through mylist in strides of chunk_size,
    for i in range(0, len(mylist), chunk_size):
        # yielding a slice of at most chunk_size items each time
        yield mylist[i:i+chunk_size]
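
To make the batching concrete, here is a small standalone sketch of the same generator run on a made-up list of 7 ids with a chunk size of 3. The ids are placeholders, not real Spotify track ids.

```python
def chunks(mylist, chunk_size):
    # Yield successive slices of mylist, each at most chunk_size items long
    for i in range(0, len(mylist), chunk_size):
        yield mylist[i:i + chunk_size]

# Placeholder ids, purely for illustration
example_ids = ["id0", "id1", "id2", "id3", "id4", "id5", "id6"]
batches = list(chunks(example_ids, 3))
print(batches)
# [['id0', 'id1', 'id2'], ['id3', 'id4', 'id5'], ['id6']]
```

The last batch is simply shorter when the list length is not a multiple of the chunk size, which is why the function works for playlists of any length.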

In [5]:

def features_to_df(ids, data_array):
    # Get audio features for each batch of 50 ids and append the batch to data_array
    for ids_batch in chunks(ids, 50):
        features_temp = sp.audio_features(tracks=ids_batch)
        data_array.append(features_temp)

    # Use the sorted feature names as the column order
    columns = sorted(data_array[0][0].keys())

    # Convert each batch to a dataframe and stack them into one
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    frames = [pd.DataFrame(batch, columns=columns) for batch in data_array]
    df_features = pd.concat(frames, ignore_index=True)

    return df_features

Save the track IDs into a list.

In [6]:

good_ids=df_good['id'].tolist()
bad_ids=df_bad['id'].tolist()

Append all of the track features (for good and bad tracks) into one dataframe called data.

In [7]:

good_features = []
df_features_good = features_to_df(good_ids, good_features)

bad_features = []
df_features_bad = features_to_df(bad_ids, bad_features)

df_features_good['target'] = 1
df_features_bad['target'] = 0

# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
data = pd.concat([df_features_good, df_features_bad], ignore_index=True)

Create test and training data

In a previous post I plotted the distribution of features of songs I like and dislike. The features that seemed to have the most variation from "like" to "dislike" were danceability, energy, tempo and valence. As such, I have created a subset of the full 11 features called features_variation for the training data set. It contains those four features plus key and duration_ms, six in total.

I chose a test size of 25%. I played around with this parameter a bit, trying 0.3 and 0.4, but the model prediction accuracy seemed to be best at 0.25. This results in a training sample of 378 songs and a test sample of 127 songs.

In [8]:

#Define the set of features that we want to look at
features_full = ["acousticness", "danceability", "energy", "instrumentalness", "liveness", "loudness", "speechiness", "tempo", "valence", "key", "duration_ms"]
features_variation = ["danceability", "energy", "tempo", "valence", "key", "duration_ms"]
features = features_variation

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data[features], data['target'], test_size = 0.25)

In [9]:

x_train.shape

Out[9]:

(378, 6)

In [10]:

x_test.shape

Out[10]:

(127, 6)
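
As a sanity check on those shapes: 378 + 127 means there are 505 songs in total. As far as I understand it, train_test_split rounds the test count up and gives the remainder to the training set. The helper below (split_sizes is a hypothetical name, not part of scikit-learn) is a minimal sketch of that arithmetic under that assumption.

```python
import math

def split_sizes(n_samples, test_size):
    # Assumed rounding rule: test count is rounded up, training gets the rest
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

# 505 songs in total (378 + 127 from the shapes above)
n_train, n_test = split_sizes(505, 0.25)
print(n_train, n_test)  # 378 127
```

Note that 25% of 505 is 126.25, so the test set cannot be exactly a quarter of the data. Rounding up gives the 127 seen above.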

Models

In this section I feed the training data into various classifiers, i.e. I train them to predict which songs I will like and dislike. I'm not yet familiar with all of these algorithms, but the blog post I was following used all of them. 😂😂😂

1. Decision Tree Classifier

In [11]:

from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(min_samples_split=100)
tree.fit(x_train, y_train)
tree_pred = tree.predict(x_test)
score = accuracy_score(y_test, tree_pred) * 100
print("Accuracy using Decision Tree: ", round(score, 1), "%")

Accuracy using Decision Tree:  72.4 %
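
The accuracy reported here is just the fraction of test songs whose predicted label matches the true one. A minimal sketch with made-up labels (not my real data, and accuracy is a hand-rolled stand-in for accuracy_score):

```python
def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction equals the true label
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Made-up like (1) / dislike (0) labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = accuracy(y_true, y_pred) * 100
print(round(score, 1), "%")  # 75.0 %
```

On my 127-song test set, 72.4% works out to 92 correct predictions (92 / 127 ≈ 0.724).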

2. K Neighbours Classifier

In [12]:

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(3)
knn.fit(x_train, y_train)
knn_pred = knn.predict(x_test)
score = accuracy_score(y_test, knn_pred) * 100
print("Accuracy using K Neighbours: ", round(score, 1), "%")

Accuracy using K Neighbours:  51.2 %

3. Multi-layer Perceptron

In [13]:

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier()
mlp.fit(x_train, y_train)
mlp_pred = mlp.predict(x_test)
score = accuracy_score(y_test, mlp_pred) * 100
print("Accuracy using Multi-layer Perceptron: ", round(score, 1), "%")

Accuracy using Multi-layer Perceptron:  52.8 %

4. Random Forest Classifier

In [14]:

from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1)
forest.fit(x_train, y_train)
forest_pred = forest.predict(x_test)
score = accuracy_score(y_test, forest_pred) * 100
print("Accuracy using Random Forest: ", round(score, 1), "%")

Accuracy using Random Forest:  70.9 %

5. AdaBoost Classifier

In [15]:

from sklearn.ensemble import AdaBoostClassifier
ada = AdaBoostClassifier(n_estimators=100)
ada.fit(x_train, y_train)
ada_pred = ada.predict(x_test)
score = accuracy_score(y_test, ada_pred) * 100
print("Accuracy using AdaBoost: ", round(score, 1), "%")

Accuracy using AdaBoost:  77.2 %

6. Naive Bayes

In [16]:

from sklearn.naive_bayes import GaussianNB
gauss = GaussianNB()
gauss.fit(x_train, y_train)
gauss_pred = gauss.predict(x_test)
score = accuracy_score(y_test, gauss_pred)*100
print("Accuracy using Gaussian Naive Bayes: ", round(score, 1), "%")

Accuracy using Gaussian Naive Bayes:  67.7 %

7. K Means Clustering

In [17]:

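from sklearn.cluster import KMeans
# K Means is unsupervised: fit() ignores y_train, and the cluster ids (0, 1, 2)
# are arbitrary, so this "accuracy" is not comparable to the classifiers above.
k_means = KMeans(n_clusters=3, random_state=0)
k_means.fit(x_train, y_train)
predicted = k_means.predict(x_test)
score = accuracy_score(y_test, predicted)*100
print("Accuracy using K Means: ", round(score, 1), "%")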

Accuracy using K Means:  52.0 %
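
Since cluster ids are not like/dislike labels, one possible way to score a clustering against my targets is to relabel each cluster with the majority target among the training songs that landed in it. The sketch below uses made-up cluster assignments and targets, and majority_label_per_cluster is a hypothetical helper, not something from scikit-learn.

```python
from collections import Counter

def majority_label_per_cluster(cluster_ids, targets):
    # Group the training targets by cluster id and keep the most common one
    by_cluster = {}
    for c, t in zip(cluster_ids, targets):
        by_cluster.setdefault(c, []).append(t)
    return {c: Counter(ts).most_common(1)[0][0] for c, ts in by_cluster.items()}

# Made-up cluster assignments and like/dislike targets for training songs
train_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
train_targets = [1, 1, 0, 0, 0, 1, 0, 1]
mapping = majority_label_per_cluster(train_clusters, train_targets)
print(mapping)  # {0: 1, 1: 0, 2: 1}

# Translate cluster ids predicted for test songs into like/dislike labels
test_clusters = [2, 1, 0, 1]
test_labels = [mapping[c] for c in test_clusters]
print(test_labels)  # [1, 0, 1, 0]
```

The relabelled predictions can then be compared to y_test in the same way as for the other models.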

8. Gradient Boosting Classifier

In [18]:

from sklearn.ensemble import GradientBoostingClassifier
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=.1, max_depth=1, random_state=0)
gbc.fit(x_train, y_train)
predicted = gbc.predict(x_test)
score = accuracy_score(y_test, predicted)*100
print("Accuracy using Gradient Boosting: ", round(score, 1), "%")

Accuracy using Gradient Boosting:  77.2 %

Now apply predictions to my Discover Weekly playlist

In [19]:

data_dw = []
df_dw = tracks_to_df(username, dw_playlist_id, data_dw)

df_dw

Out[19]:

	id	name	artist	popularity
0	5GukxVkcnm6wyuw17nYevK	Done - R3hab Remix	Nikki Vianna	46
1	0ap4E0W70EcUjXqItoM74l	Walk Away - 3LAU Deep Mix	3LAU	39
2	3ZuLTogqYwaL7DLqAP43t3	Growing Pains - Justin Caruso Remix	Alessia Cara	29
3	2bcTdyGjBUR8fknw2GeH0z	Gone (feat. Marvin Brooks) - Flyboy Remix	Maan On The Moon	43
4	1MqBckcnN45W32KSSHnylW	Sometimes	DallasK	59
5	7mAYdYyUrkUSArOdSrC7rR	Drew Barrymore	LU2VYK	47
6	0xARbGHzPT1o5t1sFlmyO2	Grip - Jay Pryor Remix	Seeb	54
7	4MuYNxE0Dgw0PFXz9Aquw6	Trampoline - BKAYE Remix	SHAED	53
8	1UBDqRniw09drFPk7hgzOF	All That She Wants	Jordan Jay	44
9	3SoHRFBuaJ11rD7uxxG5Uq	Off My Back	Thoreau	30
10	5XNltFLO0aM3PHfKYRkuH2	Bad Habits	dzill	30
11	6xUy203RnyyOfbqf96Nven	Selfish	Dimitri Vegas & Like Mike	69
12	0McMlTPzi5QjtrQUOCffaZ	What About Us	WizG	29
13	4ABdTWafMCXfATpILRuZFW	IDWK	DVBBS	63
14	5ek8fux89OY3S3E8DVNJ4i	All The Way Up	Glazy	38
15	4apGmexRZUxpTL6f8z42Qt	Congratulations	Carda	29
16	69HVwrOSZdcFPwJUnuTN1n	Always on My Mind	Nick Martin	35
17	328QDttJ2uYhtFyFmsiuI6	Lay Me Down	Timeflies	50
18	16uC4HSJUTNcqaAVbHgWwk	No One Has To Know (Adam Kahati X Deerock Remix)	GoldFish	6
19	2xFSwFeA7UNM4Tlu2nX9Vz	Wish You Well (feat. Trove) - Club Mix	Famba	33
20	5DHp41RoSuq0Lv8x9AnQRg	Into My Bed	Harpoon	36
21	2ZrMXdHe6RfVWv1dlN52as	All U Need	Dizaro	39
22	2lmyHaEaM1ATZyiFXjI3jg	Stay Here	Zaxx	33
23	3NSjJE5P1RNWOkDAaUSgra	White Flag	Noah Neiman	35
24	5pYVOAWWO774uGCouME1wU	Love Thang (feat. Ookay)	YDG	35
25	0mT29GxaF6xs61GuAd6End	Wild Like The Wind	Deorro	47
26	7MLZc2C7guizPRLIq6DspS	Turn It Up (COE Remix)	Mike Parr	42
27	6abIrYu5OWE3z3F4p8MlyO	Creep On Me (feat. French Montana & DJ Snake) ...	GASHI	38
28	78j7afPUzFV0kAn2qNd1jZ	Treat Me Like A Lady (feat. Jeanne Naylor)	Francis Mercier	30
29	1AJG3n8tWJut45jn1o2cEH	Getting Closer - Watson Remix	NEW CITY	31

In [20]:

dw_ids=df_dw['id'].tolist()

In [21]:

dw_features = []
data_discover_weekly = features_to_df(dw_ids, dw_features)

In [22]:

pred_gbc = gbc.predict(data_discover_weekly[features])
pred_tree = tree.predict(data_discover_weekly[features])

In [23]:

likedSongs = 0
for i, prediction in enumerate(pred_tree):
    if prediction == 1:
        print("Song " + str(likedSongs+1) + ": " + df_dw["name"][i] + ", By: " + df_dw["artist"][i])
        # add each song to a new playlist
        sp.user_playlist_add_tracks(username, '2RARDnZLQGVPo0sXScDA8g', [df_dw['id'][i]])
        likedSongs = likedSongs + 1

Song 1: Done - R3hab Remix, By: Nikki Vianna
Song 2: Walk Away - 3LAU Deep Mix, By: 3LAU
Song 3: Gone (feat. Marvin Brooks) - Flyboy Remix, By: Maan On The Moon
Song 4: Sometimes, By: DallasK
Song 5: Grip - Jay Pryor Remix, By: Seeb
Song 6: All That She Wants, By: Jordan Jay
Song 7: What About Us, By: WizG
Song 8: IDWK, By: DVBBS
Song 9: Congratulations, By: Carda
Song 10: Always on My Mind, By: Nick Martin
Song 11: Wish You Well (feat. Trove) - Club Mix, By: Famba
Song 12: Into My Bed, By: Harpoon
Song 13: All U Need, By: Dizaro
Song 14: Stay Here, By: Zaxx
Song 15: Wild Like The Wind, By: Deorro
Song 16: Turn It Up (COE Remix), By: Mike Parr
Song 17: Treat Me Like A Lady (feat. Jeanne Naylor), By: Francis Mercier
Song 18: Getting Closer - Watson Remix, By: NEW CITY
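
The loop above makes one API call per liked song. A possible tidier variant, sketched here with placeholder ids, first collects the liked ids and then sends them in one go. As far as I know sp.user_playlist_add_tracks accepts a list of track ids, so the final call is left as a comment rather than run. liked_track_ids is a hypothetical helper.

```python
def liked_track_ids(track_ids, predictions):
    # Keep only the ids whose prediction is 1 ("like")
    return [tid for tid, pred in zip(track_ids, predictions) if pred == 1]

# Placeholder ids and predictions, purely for illustration
ids = ["a", "b", "c", "d"]
preds = [1, 0, 0, 1]
to_add = liked_track_ids(ids, preds)
print(to_add)  # ['a', 'd']

# With a real client this could become a single request, e.g.:
# sp.user_playlist_add_tracks(username, playlist_id, to_add)
```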

In [24]:

from IPython.display import IFrame
IFrame("https://open.spotify.com/embed/playlist/2RARDnZLQGVPo0sXScDA8g", width=600, height=380)

Out[24]:

SO WHAT'S THE VERDICT?!

I think this process was reasonably successful. The Decision Tree algorithm was able to cut the playlist down from 30 songs to 18, a 40% reduction! I think the playlist wasn't cut down further because my Discover Weekly playlist is actually fairly customized for me already. As you can see, it's all dance/pop-type music, which is the majority of what I listen to and what my other playlists are made of. Spotify just knows me too well.

Other classification algorithms

To be honest, I don't know what these do. I haven't familiarized myself with all of the different types of classification algorithms...

In [25]:

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda = QuadraticDiscriminantAnalysis()
qda.fit(x_train, y_train)
qda_pred = qda.predict(x_test)
score = accuracy_score(y_test, qda_pred)*100
print("Accuracy using Quadratic Discriminant Analysis: ", round(score, 1), "%")

Accuracy using Quadratic Discriminant Analysis:  65.4 %

In [26]:

from sklearn.svm import SVC
svc_lin = SVC(kernel="linear", C=0.025)
svc_lin.fit(x_train, y_train)
svc_pred = svc_lin.predict(x_test)
score = accuracy_score(y_test, svc_pred) * 100
print("Accuracy using Support Vector Machine: ", round(score, 1), "%")

Accuracy using Support Vector Machine:  62.2 %

In [27]:

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
gpc = GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True)
gpc.fit(x_train, y_train)
gpc_pred = gpc.predict(x_test)
score = accuracy_score(y_test, gpc_pred) * 100
print("Accuracy using Gaussian Process: ", round(score, 1), "%")

Accuracy using Gaussian Process:  47.2 %

The Least Square
Trying to automate my job away.

Custom Spotify Discover Weekly Playlist

Monica
