[This post is courtesy of Ryan Smith, dot net user @ryansmith534, a data scientist formerly at Spotify. Thank you, Ryan! -Ed.]
Every Phish fan undoubtedly has their own answer to this question – but is there a universal truth across all fans? Using setlist data and user ratings from Phish.net, we can attempt to answer this question empirically.
To do this, we can borrow methodology from basketball and hockey analytics, specifically the concept of RAPM (regularized adjusted plus-minus). This metric attempts to quantify an answer to the question: how much does the presence of a given player on the court contribute to a team’s point differential? In our case, the question becomes: how much does the presence of a given song in a setlist contribute to a show’s rating on Phish.net?
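For readers who want the idea in symbols: RAPM in sports analytics is usually fit as a ridge regression, so a song-level version might look like the sketch below. The notation is an assumption made for illustration, and the exact model and penalty used for this analysis are not specified here.

$$
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

Here $y_i$ is the Phish.net rating of show $i$, $x_{ij}$ is 1 if song $j$ was played at show $i$ and 0 otherwise, $\beta_j$ is the estimated contribution of song $j$ to a show's rating, and $\lambda \ge 0$ controls how strongly the estimates are shrunk toward zero. The shrinkage is the "regularized" part: it keeps songs that were played only a handful of times from receiving extreme estimates.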
We first need to gather the necessary data, a process made significantly easier by the Phish.net API. After some cleaning and manipulation, we get a dataset that looks like this:
We have one row for every show, a column with the show’s rating, and a column for every song in Phish’s repertoire – with a 0 or 1 value representing whether the song was played at a given show.
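As a rough illustration of that layout, here is a minimal Python sketch that turns a list of shows into one row per show with a 0/1 column per song. The field names (`show`, `rating`, `songs`), the helper `build_song_matrix`, and the sample values are all hypothetical placeholders, not the Phish.net API's actual response format or real ratings.

```python
def build_song_matrix(shows):
    """Return (columns, rows): one row per show, one 0/1 column per song."""
    # Every distinct song across all shows becomes a column, in sorted order.
    songs = sorted({song for show in shows for song in show["songs"]})
    rows = []
    for show in shows:
        played = set(show["songs"])  # a song repeated in one show still counts once
        row = {"show": show["show"], "rating": show["rating"]}
        for song in songs:
            row[song] = 1 if song in played else 0
        rows.append(row)
    return ["show", "rating"] + songs, rows


# Made-up placeholder data, for illustration only.
shows = [
    {"show": "show_1", "rating": 4.5, "songs": ["Song A", "Song B", "Song A"]},
    {"show": "show_2", "rating": 3.8, "songs": ["Song B", "Song C"]},
    {"show": "show_3", "rating": 4.1, "songs": ["Song C"]},
]

columns, rows = build_song_matrix(shows)
print(columns)
for row in rows:
    print([row[c] for c in columns])
```

The song columns form the indicator matrix and the rating column is the target, which is the shape a plus-minus style regression expects.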