Kelly Babstock shooting a puck in a game against the Beauts. Photo By: Matt Raney
As the NWHL approaches its third season, it’s important to look toward the future. Sometimes, though, to look ahead you have to revisit the past. The league’s third year matters from a statistical standpoint because we now have two full seasons of metrics to dive into.
By finding the relationship between metrics from the 15-16 season and the 16-17 season, we can compute expected values for the 17-18 season, including expected points per game.
As with any good analysis, though, it’s important to point out the flaws. The sample sizes we are working with are quite small, and the smaller the sample, the more error we can expect. Another flaw is that we don’t yet have enough data to fit the regression on one sample and then validate it against a separate test sample. That is something we’ll be able to do before the 18-19 season, though!
The first step was to determine the outcome for the model. In this case, we wanted to calculate an expected points-per-game total for each player in the upcoming 17-18 season. To do this, we looked back at the previous seasons’ data and isolated the important variables for a regression analysis.
If you’re not familiar with regression analysis, it lets us estimate the relationship between a dependent variable and one or more independent variables. From there, we can calculate the expected value of the dependent variable given the independent variables. Essentially, given inputs, we can calculate an expected output.
For this analysis, our regression was set up as follows:
Dependent variable: 2016-2017 points per game played
Independent variables: 2015-2016 goals per game, assists per game, and shots on goal per game.
Our regression analysis allows us to calculate an expected dependent variable given our independent inputs.
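The article doesn’t say which software or fitting method produced its coefficients. As a minimal sketch, assuming ordinary least squares (the usual setup behind an adjusted R-squared), the fit can be written in plain Python by solving the normal equations. The helper name fit_ols is hypothetical, and the input rows are invented numbers, not NWHL data; the targets are generated from the published formula so the fit should recover its coefficients.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination. A column of ones is prepended, so b[0] is the intercept.
    """
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular system: inputs are collinear or too few rows")
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Invented (goals/gm, assists/gm, shots/gm) rows, NOT real NWHL stats.
X = [
    (0.1, 0.2, 1.5),
    (0.4, 0.3, 2.8),
    (0.0, 0.5, 1.0),
    (0.6, 0.6, 3.5),
    (0.2, 0.1, 2.0),
    (0.3, 0.7, 2.2),
]
# Targets generated from the article's published formula, so the fit
# should recover (0.08, 0.29, 0.53, 0.08) up to floating-point error.
y = [0.08 + 0.29 * g + 0.53 * a + 0.08 * s for g, a, s in X]
coefs = fit_ols(X, y)
print([round(b, 2) for b in coefs])
```

With real data, the inputs would be each player’s 15-16 per-game rates and the targets her actual 16-17 points per game. A statistics library such as statsmodels would fit the same model and report the adjusted R-squared directly.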
The regression returned a formula of:
Expected points-per-game = 0.08 + goals-per-game * 0.29 + assists-per-game * 0.53 + shots-per-game * 0.08
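The formula translates directly into a small function. A sketch in Python; the function and constant names are our own, and the coefficients are the rounded values shown above, so results may differ slightly from the unrounded model.

```python
# Coefficients as published in the article (rounded to two decimals).
INTERCEPT = 0.08
COEF_GOALS = 0.29
COEF_ASSISTS = 0.53
COEF_SHOTS = 0.08


def expected_ppg(goals_pg: float, assists_pg: float, shots_pg: float) -> float:
    """Expected points per game from per-game goals, assists and shots on goal."""
    return (
        INTERCEPT
        + COEF_GOALS * goals_pg
        + COEF_ASSISTS * assists_pg
        + COEF_SHOTS * shots_pg
    )


# Hypothetical player: 0.5 goals, 0.5 assists and 3.0 shots per game.
# 0.08 + 0.145 + 0.265 + 0.24 = 0.73
print(round(expected_ppg(0.5, 0.5, 3.0), 2))  # prints 0.73
```

Note that, per game, an assist in the previous season carries almost twice the weight of a goal in this formula.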
The regression returned an adjusted R-squared of 0.63. Roughly speaking, this tells us that about 63% of the variation in our output (points-per-game) is explained by the input variables (goals per game, assists per game, and shots per game), after adjusting for the number of inputs. Again, since this is all for fun, we can just keep going!
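For readers who want the arithmetic: R-squared compares the squared residuals to the total variation around the mean, and the adjusted version shrinks it according to the number of inputs p (three here) and the number of players n, which the article doesn’t report. A minimal Python sketch; the function names are our own.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of the variance in `actual` explained by `predicted` (1 - SS_res / SS_tot)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance")
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalize R-squared for using p predictors on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(adjusted_r_squared(0.5, n=11, p=2))  # prints 0.375
```

For example, an R-squared of 0.5 from 11 observations and 2 predictors adjusts down to 1 - 0.5 * 10 / 8 = 0.375, so with samples as small as these the penalty is noticeable.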
To review the regression, we plotted the residual for each player in the sample. A residual is the difference between the actual and the expected value. In this case, the plot shows each player’s actual 16-17 points-per-game minus the points-per-game the regression expected.
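Computing residuals and finding the model’s biggest misses takes only a few lines. A sketch in Python; the function names and the players and values below are hypothetical, for illustration only. A positive residual means the model underestimated the player.

```python
from typing import Dict, List, Tuple


def residuals(actual: Dict[str, float], expected: Dict[str, float]) -> Dict[str, float]:
    """Actual minus expected points-per-game for each player present in both dicts."""
    return {name: actual[name] - expected[name] for name in actual if name in expected}


def biggest_misses(res: Dict[str, float], top: int = 3) -> List[Tuple[str, float]]:
    """Players the model missed by the most, in either direction."""
    return sorted(res.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Hypothetical players and values, not real NWHL stats.
actual = {"Player A": 1.0, "Player B": 0.5, "Player C": 0.3}
expected = {"Player A": 0.8, "Player B": 0.9, "Player C": 0.3}
print(biggest_misses(residuals(actual, expected), top=2))
```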
As we can see, the model misses a few players by quite a bit, including Janine Weber by nearly a full point-per-game! The likely reason is that Weber recorded 10 goals on just 41 shots on goal last year, a shooting percentage of roughly 24%. Because the model uses shots on goal per game as an input, it likely underestimates an efficient shooter like Weber and expects a lower points-per-game from her as a result.
Using the formula above, we can plug in each player’s per-game metrics from last season and get an expected points-per-game for the 17-18 season.
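A sketch of that projection step, assuming each player’s previous-season totals are converted to per-game rates before going into the formula (the article doesn’t spell this step out). The function names, the tuple layout, and the players are hypothetical; the formula is repeated so the snippet runs on its own. Players with zero games are skipped, since the model only covers players who played last year.

```python
from typing import Dict, List, Tuple


def expected_ppg(goals_pg: float, assists_pg: float, shots_pg: float) -> float:
    """The article's formula, with its rounded coefficients."""
    return 0.08 + 0.29 * goals_pg + 0.53 * assists_pg + 0.08 * shots_pg


def project_season(totals: Dict[str, Tuple[int, int, int, int]]) -> List[Tuple[str, float]]:
    """Rank players by expected points per game.

    `totals` maps player -> (games played, goals, assists, shots on goal)
    for the previous season; players with zero games are skipped.
    """
    projections = []
    for name, (gp, goals, assists, shots) in totals.items():
        if gp == 0:
            continue
        projections.append((name, expected_ppg(goals / gp, assists / gp, shots / gp)))
    return sorted(projections, key=lambda t: t[1], reverse=True)


# Hypothetical season totals, not real players or stats.
totals = {
    "Player A": (10, 5, 5, 30),
    "Player B": (10, 1, 2, 10),
    "Player C": (0, 0, 0, 0),
}
for name, ppg in project_season(totals):
    print(f"{name}: {ppg:.2f}")
```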
Here’s how the model expects the 17-18 NWHL season to shape up. (The estimates only include players who played in the league last year.)
I don’t think any Riveters fan can be mad at this, with four of the seven expected top-scoring positions this year going to the Riveters. Janine Weber is the model’s top expected scorer in the league, carrying her success with the Riveters last year over to the Pride this year, a definite coup for the Boston-based team in free agency this summer. It also looks like Weber will be in a heated race for the league scoring lead this year with Connecticut Whale star Kelly Babstock.
As with any regression, this certainly isn’t fact! It’s just what we can expect based on the data we have to work with. Surely there will be some breakout stars in the league this year with more than one point per game, which the model does not seem to expect.
Personally, I’m looking forward to a few players in the league busting this model wide open and proving it completely wrong. That will help us strengthen our projections for the 18-19 season when the time comes.