Scraping Player Data Using R from NHL.com for the 2011-2012 Season

Now that the NHL is finally starting and the lockout is over, I’ve been involved in 3 different fantasy hockey drafts. I thought it would be fun to scrape data from NHL.com for the 2011-2012 season and pre-rank my players before one of my drafts.

Before I get into my player ranking strategy, I want to show off how powerful ggplot2, the R graphing library, can be. I wanted to compare a player’s plus/minus rating, points, average shifts per game and position – all in one visual. Here’s how it came out in R using ggplot2.

[Figure: 2011-2012 Season, Bruins]

You can very quickly see that the upper right corner is where the superstars typically end up. However, take a look at Krejci, whose +/- is negative even though he has almost as many points as Seguin and Bergeron. Could this discrepancy be attributed to who his linemates are? The Lucic-Krejci-Horton line can be streaky, but for the most part they scored on a fairly regular basis. Plus/minus is one of those hockey statistics that oversimplifies how much a player contributes to the team. The Houston Rockets were the first NBA team to adopt a modified version of the statistic, which revealed that light-scoring Shane Battier made his team much better when he was on the court, and their opponents much worse.

In order to present a more interactive version of this visualization, I imported the text data into Tableau Public.

If you want to see the code I used for the above, it’s located here. R provides an easy way to parse HTML and bring table data directly into R for pre-processing, analysis and visualization.
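To give a feel for the HTML-table step, here is a minimal sketch. It assumes the XML package and its `readHTMLTable()` function, and it is not necessarily the same as the linked code. The HTML string is a made-up placeholder standing in for a stats page, and the player names, column names and numbers in it are hypothetical.

```r
library(XML)

# Placeholder HTML standing in for a stats page.
# Player names, column names and numbers are made up for illustration.
html <- '
<table>
  <tr><th>Player</th><th>Pos</th><th>GP</th><th>P</th><th>PlusMinus</th></tr>
  <tr><td>Player A</td><td>C</td><td>80</td><td>60</td><td>10</td></tr>
  <tr><td>Player B</td><td>D</td><td>75</td><td>30</td><td>-5</td></tr>
</table>'

# Parse the markup, then pull the first table out as a data frame.
doc <- htmlParse(html, asText = TRUE)
stats <- readHTMLTable(doc, which = 1, header = TRUE, stringsAsFactors = FALSE)

# Every column arrives as character, so convert the numeric ones first.
num_cols <- c("GP", "P", "PlusMinus")
stats[num_cols] <- lapply(stats[num_cols], as.numeric)

# A simple derived column: points per game played.
stats$PointsPerGame <- stats$P / stats$GP
```

With a real page you would hand `htmlParse()` the downloaded page content instead of an inline string, and the `which` index and column names would depend on that page's layout.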

As far as pre-ranking my players for the draft, I used a combined statistic which was average points scored per game (based on the scoring settings in my league) and average projected points per game. Since players averaged 75 games played in 2011-2012, or 91% of the 82-game regular season, I assumed everyone would play on average 91% of the 48-game shortened regular season. I then used ESPN’s projections, which are customized for my league’s scoring, and combined those with last year’s average points per game, weighting last year’s stats more. I’ll post a follow-up that shows how I combined all of that data together, but admittedly it was rather messy and included importing Excel sheets into R for the sake of time.
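The arithmetic behind that ranking can be sketched in a few lines of base R. This is only one possible reading of the approach. The weight of 0.6 on last season is a hypothetical value (all that is fixed is that last year counts for more than half). Treating the ESPN projection as a season total that gets divided by expected games is also an assumption, and the player rows are placeholders rather than real numbers.

```r
# Share of the 82-game 2011-2012 season that players averaged (75 games).
games_share <- 75 / 82

# Expected games played in the 48-game shortened season at that same share.
expected_games <- games_share * 48   # roughly 43.9 games

# Hypothetical weight on last season's points per game.
# Any value above 0.5 weights last year's stats more.
w_last <- 0.6

# Weighted average of last season's and projected points per game.
combined_ppg <- function(last_ppg, projected_ppg, w = w_last) {
  w * last_ppg + (1 - w) * projected_ppg
}

# Placeholder inputs, not real player numbers.
players <- data.frame(
  player          = c("Player A", "Player B", "Player C"),
  last_ppg        = c(2.0, 3.0, 2.5),   # fantasy points per game last season
  projected_total = c(130, 90, 110),    # assumed projected season totals
  stringsAsFactors = FALSE
)

# Assumption: projections are season totals, converted to a per-game rate.
players$projected_ppg <- players$projected_total / expected_games
players$combined <- combined_ppg(players$last_ppg, players$projected_ppg)

# Rank from highest to lowest combined points per game.
ranked <- players[order(-players$combined), ]
```

Raising `w_last` pulls the ranking toward last season's results, and lowering it toward the projections.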

Comments (1)

  1. Love the visual.
    I do and publish analytic research on sports stats. I have a small project to create a program that will scrape play-by-play data from NHL.com. The site is very difficult to work with and I only have limited programming skills. Would you know anyone available? Remuneration available.
    Thanks Dan
    danceraldi@gmail.com

    Do you know anyone with experience who would be able to help/consult with me?
