NBA Salaries

Rating Players and Understanding Compensation in 2008-2018

UPDATED JUN 7, 2019   |    Franco Betteo & Francisco Valentini

Are NBA players paid according to their observable contribution to winning games? Who are the most underpaid players? Who are the most overpaid? Which variables are useful to explain differences in salaries?

To answer these questions we need:

  1. Data. We retrieved salary data from data.world. Personal stats were obtained from Kaggle. We scraped relevant events (injuries, fines, sanctions and personal issues) from Pro Sports Transactions. We kept only records from the 2008 to 2018 seasons for players with an annual salary over 1 million dollars and more than 400 minutes played each season.

  2. A statistical model. To understand how personal stats and events determine salaries we fitted a Generalised Additive Model (GAM). Unlike standard linear regression, a GAM can estimate a non-linear effect of each feature on the target while preserving additivity and thus interpretability. The target variable is annual salary; to make salaries comparable across years we adjust them with the Federal Reserve CPI so that they are expressed in constant 2018 dollars. Salaries are explained by a set of 20 predictor variables, made up of by-season game stats and historical events.
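
The data preparation described above can be sketched in a few lines of plain Python. This is only an illustration, not the code from our repository: the record fields, the helper names `keep_record` and `to_constant_dollars`, and the CPI index values are all hypothetical, and we assume here that the salary threshold is applied to the nominal salary.

```python
# Hypothetical CPI index values, for illustration only (not real Federal Reserve data).
CPI = {2008: 100.0, 2013: 110.0, 2018: 120.0}
BASE_YEAR = 2018


def to_constant_dollars(nominal, season, cpi=CPI, base_year=BASE_YEAR):
    """Rescale a nominal salary to base-year dollars using a CPI ratio."""
    return nominal * cpi[base_year] / cpi[season]


def keep_record(record):
    """Apply the sample filters described in the text (assumed on nominal salary)."""
    return (
        2008 <= record["season"] <= 2018
        and record["salary"] > 1_000_000
        and record["minutes"] > 400
    )


# Made-up player-season records.
records = [
    {"player": "A", "season": 2008, "salary": 5_000_000, "minutes": 1500},
    {"player": "B", "season": 2013, "salary": 900_000, "minutes": 2000},    # salary too low
    {"player": "C", "season": 2018, "salary": 12_000_000, "minutes": 350},  # too few minutes
    {"player": "D", "season": 2013, "salary": 11_000_000, "minutes": 2400},
]

# Keep the qualifying records and attach the inflation-adjusted salary.
sample = [
    {**r, "real_salary": to_constant_dollars(r["salary"], r["season"])}
    for r in records
    if keep_record(r)
]
```

With these made-up index values, a 2008 salary is scaled up by 120/100 and a 2013 salary by 120/110, so salaries from different seasons end up on the same 2018-dollar scale before modelling.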

All code is available at our GitHub repository.

The most important variables behind salaries

This kind of modelling lets us discover why some players make more money than others or, in other words, how players' observable features determine their salaries. Below we can see the shape of the effects of the most statistically important features, each drawn as a green line. Behind each line we plot the confidence band: the wider the band, the less confident we can be that the fitted effect is correct.

For example, in the plot for Age we find that, holding all other features fixed, salary tends to be higher at ages around 28-30 and lower for the very young and the very old, which is quite reasonable.
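
Reading an effect "holding all other features fixed" is possible because of the additive structure of a GAM. As a sketch, with notation introduced here only for illustration ($y_{it}$ is the real salary of player $i$ in season $t$, $x_{ijt}$ is the value of feature $j$, and $f_j$ is its smooth effect), and assuming an identity link since the link function and any transformation of the target are not stated in this post, the model has the form:

```latex
\[
  \mathbb{E}\left[ y_{it} \right] = \beta_0 + \sum_{j=1}^{20} f_j\left( x_{ijt} \right)
\]
```

Because the terms simply add up, moving Age from $a$ to $a'$ while the other features stay put shifts the expected salary by $f_{\mathrm{Age}}(a') - f_{\mathrm{Age}}(a)$, whatever values the other features take. Each plotted line is one such $f_j$.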

Overpayment and underpayment

Given the fitted effects of the predictor variables on salaries, we can estimate the expected salary for each player in each season. This lets us compute residuals, that is, the difference between the expected salary and the actual salary. By inspecting residuals we can identify players whose pay differs sharply from what the model suggests. Some have been overpaid, earning much more than expected given what they did on the court and their personal features. Others contributed a lot more to their teams than what they got paid for.
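
The ranking step can be illustrated with a minimal Python sketch. It is not the code from our repository: the function name `rank_by_residual`, the field names and the numbers are made up, and the sign convention (residual = actual minus expected, so negative means underpaid) is an assumption chosen for this example.

```python
def rank_by_residual(rows):
    """Return rows sorted from most underpaid to most overpaid.

    Assumed convention: residual = actual - expected, so a negative
    residual means the player was paid less than the model expects.
    """
    scored = [{**r, "residual": r["actual"] - r["expected"]} for r in rows]
    return sorted(scored, key=lambda r: r["residual"])


# Made-up player-season rows; "expected" stands in for the model's fitted salary.
rows = [
    {"player": "A", "season": 2016, "actual": 4_000_000, "expected": 15_000_000},
    {"player": "B", "season": 2016, "actual": 20_000_000, "expected": 9_000_000},
    {"player": "C", "season": 2017, "actual": 8_000_000, "expected": 8_500_000},
]

ranked = rank_by_residual(rows)
top_n = 1
underpaid = ranked[:top_n]       # most negative residuals first
overpaid = ranked[::-1][:top_n]  # most positive residuals first
```

Taking the first and last entries of the sorted list is all a "top 10" table needs; with `top_n = 10` the same two slices would give the underpaid and overpaid rankings.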

Top 10 Underpaid Players

Top 10 Overpaid Players

Search by team and season!