Today I will look at the differences in features of songs I like and dislike. Along the way to a nice, pretty graph of song features, this post shows how to extract features from songs contained in spotify playlists. It makes use of the Spotipy Python library to access the Spotify Web API. What's nice about this library is that it contains easy helper methods to access different data from the API. For example, I use the user_playlist_tracks and audio_features methods.

Step 1: access the Spotify API¶

Because I am trying to get data from my specific user account the methods I want to use from Spotipy require authentication. That means I need to generate an authorization token which indicates the user has granted me permission to perform specific tasks from my application (my code). Basically, I need to send a request to myself to authenticate myself. This is all handled simply with Spotipy's util.prompt_for_user_token, which will coordinate the authorization with the web browser (pass the client credentials, ask the user for access, return the granted access via a token). Spotipy's documentation explains it a lot better than me.

In [1]:

import spotipy
import spotipy.util as util
from config import client_id, client_secret, redirect_uri, username, good_playlist_id, bad_playlist_id
import numpy as np
import pandas as pd

scope = 'user-library-read playlist-read-private'
token = util.prompt_for_user_token(username, scope, client_id=client_id, client_secret=client_secret, redirect_uri=redirect_uri)
if token:
    sp = spotipy.Spotify(auth=token)
else:
    print("Can't get token for", username)

Step 2: get list of track IDs from a playlist¶

In this case, I am grabbing track IDs from my "good songs" playlist and my "bad songs" playlist. The Spotify API only lets you grab 100 tracks at a time so you have to use a while loop where results['next'] is true/false.

In [2]:

def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

Here I convert the raw JSON data from the API request into a dataframe so I can make sure I have the good and bad playlists labelled correctly. There really was no other reason to put it into a dataframe other than to see the data nicely in a table! I could have just collected the IDs into an array because that is all I need to pull the features for each song.

In [3]:

def tracks_to_df(username, playlist_id, data_array):
    tracks = get_playlist_tracks(username, playlist_id)
    for i in range(len(tracks)):
        row = [tracks[i]['track']['id'],
              tracks[i]['track']['name'],
              tracks[i]['track']['popularity']]
        data_array.append(row)
        
    data_df = pd.DataFrame(data=data_array,columns=['id','name','popularity'])
    return data_df

Pull good tracks¶

In [4]:

data_good = []
df_good = tracks_to_df(username, good_playlist_id, data_good)

df_good.head()

Out[4]:

	id	name	popularity
0	6wQYKyGePaqstYov2C1S5b	C'est La Vie	40
1	0byab45cmZntz74RxcASrf	Close To Me (with Diplo) (feat. Swae Lee) - Fe...	28
2	40xLrjniJVkxuWbka7KavC	Decisions (feat. Maia Wright)	65
3	3FClUxQc36bQc6kNlXzadI	Don't You Know	40
4	2yuoF9C4QF9kCsoDf3yUHP	Feel the Same - EDX Dubai Skyline Remix Edit	49

Pull bad tracks¶

In [5]:

data_bad = []
df_bad = tracks_to_df(username, bad_playlist_id, data_bad)

df_bad.head()

Out[5]:

	id	name	popularity
0	2Fxmhks0bxGSBdJ92vM42m	bad guy	95
1	6MWtB6iiXyIwun0YzU6DFP	Wow.	88
2	4y3OI86AEP6PQoDE6olYhO	Sucker	88
3	5p7ujcrUXASCNwRaWNHR1C	Without Me	87
4	6u7jPi22kF8CTQ3rb9DHE7	Old Town Road (feat. Billy Ray Cyrus) - Remix	94

Save the track IDs¶

Because the audio_features method only lets you pull track features for 50 tracks at a time, I have had to write a method to split the array of IDs into chunks of 50.

In [6]:

def chunks(mylist, chunk_size):
    # For item i in a range that is a length of l,
    for i in range(0, len(mylist), chunk_size):
        # Create an index range for l of n items:
        yield mylist[i:i+chunk_size]

In [7]:

good_ids=df_good['id'].tolist()
bad_ids=df_bad['id'].tolist()

Step 3: get track features¶

In [8]:

def features_to_df(ids, data_array):
    # Create a list from the results of the function chunks, get features for batch of ids, append to array
    for i in range(0, len(list(chunks(ids, 50)))):
        ids_batch = list(chunks(ids, 50))[i]
        features_temp = sp.audio_features(tracks=ids_batch)
        data_array.append(features_temp)

    columns = list(data_array[0][0].keys())
    columns.sort()

    # convert to df
    # instantiate empty dataframe
    df_features = pd.DataFrame(columns = columns)

    for i in range(0, len(data_array)):
        df_temp = pd.DataFrame(data_array[i], columns = columns)
        df_features = df_features.append(df_temp, ignore_index=True)
    
    return df_features

Pull good features¶

In [9]:

good_features = []
df_features_good = features_to_df(good_ids, good_features)

df_features_good.head()

Out[9]:

	acousticness	analysis_url	danceability	duration_ms	energy	id	instrumentalness	key	liveness	loudness	mode	speechiness	tempo	time_signature	track_href	type	uri	valence
0	0.0383	https://api.spotify.com/v1/audio-analysis/6wQY...	0.839	171375	0.613	6wQYKyGePaqstYov2C1S5b	0.000000	2	0.323	-6.457	1	0.0463	125.011	4	https://api.spotify.com/v1/tracks/6wQYKyGePaqs...	audio_features	spotify:track:6wQYKyGePaqstYov2C1S5b	0.667
1	0.0145	https://api.spotify.com/v1/audio-analysis/0bya...	0.733	223533	0.839	0byab45cmZntz74RxcASrf	0.000187	1	0.162	-2.840	0	0.0335	126.009	4	https://api.spotify.com/v1/tracks/0byab45cmZnt...	audio_features	spotify:track:0byab45cmZntz74RxcASrf	0.471
2	0.0534	https://api.spotify.com/v1/audio-analysis/40xL...	0.732	212788	0.785	40xLrjniJVkxuWbka7KavC	0.000048	0	0.388	-5.081	1	0.0470	121.987	4	https://api.spotify.com/v1/tracks/40xLrjniJVkx...	audio_features	spotify:track:40xLrjniJVkxuWbka7KavC	0.256
3	0.0933	https://api.spotify.com/v1/audio-analysis/3FCl...	0.761	203976	0.696	3FClUxQc36bQc6kNlXzadI	0.168000	6	0.158	-6.837	1	0.1270	124.002	4	https://api.spotify.com/v1/tracks/3FClUxQc36bQ...	audio_features	spotify:track:3FClUxQc36bQc6kNlXzadI	0.664
4	0.3100	https://api.spotify.com/v1/audio-analysis/2yuo...	0.876	209634	0.741	2yuoF9C4QF9kCsoDf3yUHP	0.018500	11	0.110	-4.836	0	0.0435	122.967	4	https://api.spotify.com/v1/tracks/2yuoF9C4QF9k...	audio_features	spotify:track:2yuoF9C4QF9kCsoDf3yUHP	0.695

Pull bad features¶

In [10]:

bad_features = []
df_features_bad = features_to_df(bad_ids, bad_features)

df_features_bad.head()

Out[10]:

	acousticness	analysis_url	danceability	duration_ms	energy	id	instrumentalness	key	liveness	loudness	mode	speechiness	tempo	time_signature	track_href	type	uri	valence
0	0.3280	https://api.spotify.com/v1/audio-analysis/2Fxm...	0.701	194088	0.425	2Fxmhks0bxGSBdJ92vM42m	0.130000	7	0.1000	-10.965	1	0.3750	135.128	4	https://api.spotify.com/v1/tracks/2Fxmhks0bxGS...	audio_features	spotify:track:2Fxmhks0bxGSBdJ92vM42m	0.562
1	0.1630	https://api.spotify.com/v1/audio-analysis/6MWt...	0.833	149520	0.539	6MWtB6iiXyIwun0YzU6DFP	0.000002	11	0.1010	-7.399	0	0.1780	99.947	4	https://api.spotify.com/v1/tracks/6MWtB6iiXyIw...	audio_features	spotify:track:6MWtB6iiXyIwun0YzU6DFP	0.385
2	0.0427	https://api.spotify.com/v1/audio-analysis/4y3O...	0.842	181040	0.734	4y3OI86AEP6PQoDE6olYhO	0.000000	1	0.1060	-5.065	0	0.0588	137.958	4	https://api.spotify.com/v1/tracks/4y3OI86AEP6P...	audio_features	spotify:track:4y3OI86AEP6PQoDE6olYhO	0.952
3	0.2970	https://api.spotify.com/v1/audio-analysis/5p7u...	0.752	201661	0.488	5p7ujcrUXASCNwRaWNHR1C	0.000009	6	0.0936	-7.050	1	0.0705	136.041	4	https://api.spotify.com/v1/tracks/5p7ujcrUXASC...	audio_features	spotify:track:5p7ujcrUXASCNwRaWNHR1C	0.533
4	0.0533	https://api.spotify.com/v1/audio-analysis/6u7j...	0.878	157067	0.619	6u7jPi22kF8CTQ3rb9DHE7	0.000000	6	0.1130	-5.560	1	0.1020	136.041	4	https://api.spotify.com/v1/tracks/6u7jPi22kF8C...	audio_features	spotify:track:6u7jPi22kF8CTQ3rb9DHE7	0.639

Step 4: plot the distribution of features for good and bad tracks¶

In [11]:

%matplotlib inline
import matplotlib.pyplot as plt
import datetime
import numpy as np
import pandas as pd
import seaborn as sns; sns.set()

In [12]:

import warnings
warnings.filterwarnings('ignore')

sns.set_palette("tab10")
sns.set_style("ticks")

f, axes = plt.subplots(3, 3, figsize=(10,10))

sns.despine(fig=f, ax=axes)

# this could be refactored majorly lol
sns.distplot(df_features_good['acousticness'], hist=False, ax=axes[0,0]);
sns.distplot(df_features_bad['acousticness'], hist=False, ax=axes[0,0], color='red');
sns.distplot(df_features_good['danceability'], hist=False, ax=axes[0,1]);
sns.distplot(df_features_bad['danceability'], hist=False, ax=axes[0,1], color='red');
sns.distplot(df_features_good['energy'], hist=False, ax=axes[0,2], label='Like');
sns.distplot(df_features_bad['energy'], hist=False, ax=axes[0,2], color='red', label='Dislike');

sns.distplot(df_features_good['instrumentalness'], hist=False, ax=axes[1,0]);
sns.distplot(df_features_bad['instrumentalness'], hist=False, ax=axes[1,0], color='red');
sns.distplot(df_features_good['liveness'], hist=False, ax=axes[1,1]);
sns.distplot(df_features_bad['liveness'], hist=False, ax=axes[1,1], color='red');
sns.distplot(df_features_good['loudness'], hist=False, ax=axes[1,2]);
sns.distplot(df_features_bad['loudness'], hist=False, ax=axes[1,2], color='red');

sns.distplot(df_features_good['speechiness'], hist=False, ax=axes[2,0]);
sns.distplot(df_features_bad['speechiness'], hist=False, ax=axes[2,0], color='red');
sns.distplot(df_features_good['tempo'], hist=False, ax=axes[2,1]);
sns.distplot(df_features_bad['tempo'], hist=False, ax=axes[2,1], color='red');
sns.distplot(df_features_good['valence'], hist=False, ax=axes[2,2]);
sns.distplot(df_features_bad['valence'], hist=False, ax=axes[2,2], color='red');

plt.subplots_adjust(hspace = 0.3)

# put legend on one subplot
axes[0,2].legend(['Like', 'Dislike'], bbox_to_anchor=(0.9, 1), loc=2, borderaxespad=0.)

%config InlineBackend.figure_format = 'svg'

Interpretation¶

I've plotted the features of the good and bad songs in overlayed distribution charts to see if there are any clear differences. Immediately I can see the most variation in danceability, energy, tempo and valence. That there is variation across my good and bad playlists is positive for when I input these features into a classification algorithm in order to create my own Spotify Weekly playlist. The songs I like tend to have higher danceability, energy and tempo, but less valence. Spotify's API gives some nice definitions of these features:

Danceability: "...how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity."
Energy: "...a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy."
Tempo: pace of the track in beats per minute (BPM)
Valence: "...musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric)..."

The other features do not seem as meaninful. There is virtually no variation in liveness or loudness across good and bad songs. This kind of makes sense since most of the songs in both of these playlists are pop/electronic songs. There are small differences for the acousticness and speechiness features. This also makes sense because none of these songs have any "spoken word" attributes like podcasts would have (speechiness) and they don't have acoustic guitar (except for maybe some of the P!nk songs I threw into the "bad" category).

The F u t u r e¶

My next post will use the song features from my good and bad playlists to create a customized Discover Weekly playlist (or just filter the already-created Discover Weekly playlist that Spotify generates). Similar to this post I will get different classification algorithms such as Decision Trees, K-Nearest Neighbors, Adaptive Boosting or Gradient Boosting to predict which songs I will like.

The Least Square
Trying to automate my job away.

Songs I like and dislike

Monica

Step 1: access the Spotify API¶

Step 2: get list of track IDs from a playlist¶

Pull good tracks¶

Pull bad tracks¶

Save the track IDs¶

Step 3: get track features¶

Pull good features¶

Pull bad features¶

Step 4: plot the distribution of features for good and bad tracks¶

Interpretation¶

The F u t u r e¶

Comments

The Least Square Trying to automate my job away.

Songs I like and dislike

Monica

Step 1: access the Spotify API¶

Step 2: get list of track IDs from a playlist¶

Pull good tracks¶

Pull bad tracks¶

Save the track IDs¶

Step 3: get track features¶

Pull good features¶

Pull bad features¶

Step 4: plot the distribution of features for good and bad tracks¶

Interpretation¶

The F u t u r e¶

Comments

The Least Square
Trying to automate my job away.