programming opiethehokie: February 2017

Thursday, February 23, 2017

Vectorization and Eigenvectors: Sports Rating Examples

While calculating team ratings for a machine learning-based March Madness prediction, I ran into a couple of situations where my code got slow as I expanded it to include all the teams over all the seasons. By slow I mean several hours, which was longer than I was willing to wait. I needed to be able to recalculate ratings on demand, in a few minutes at most.

For my offensive-defensive rating, I started with an implementation similar to the one in Offensive and defensive team ratings for the Premier League 2014-2015:
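As a rough illustration of what a loop-based version of this kind of rating can look like, here is a minimal sketch. It is not the code from the linked post. The function name `od_ratings_loops`, the fixed iteration count, and the layout of `scores` (entry `[i, j]` holds the points team `j` scored against team `i`) are all assumptions made for the example.

```python
import numpy as np

def od_ratings_loops(scores, iterations=100):
    """Offense-defense ratings with explicit Python loops (illustrative sketch).

    Assumed layout: scores[i, j] = points team j scored against team i.
    Every team is assumed to have scored and conceded at least some points,
    otherwise the divisions below would hit zero.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    offense = np.ones(n)
    defense = np.ones(n)
    for _ in range(iterations):
        # Offense: points scored, weighted up when scored on a strong (low) defense.
        for j in range(n):
            offense[j] = sum(scores[i, j] / defense[i] for i in range(n))
        # Defense: points allowed, weighted down when allowed to a strong offense.
        for i in range(n):
            defense[i] = sum(scores[i, j] / offense[j] for j in range(n))
    return offense, defense
```

Each pass touches every cell of the matrix twice in interpreted Python, which is where the time goes once the matrix has hundreds of rows and columns.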

It loops over every row and column of the matrix many times. With college basketball having over 300 teams, this wasn't going to work for me. I figured out that the calculation could be vectorized and that numpy could handle it far more efficiently than my hand-written loops:
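A vectorized version can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact code I ended up with: the name `od_ratings_vectorized`, the fixed iteration count, and the convention that `scores[i, j]` holds the points team `j` scored against team `i` are all choices made for the example. The two inner loops over rows and columns become two matrix-vector products.

```python
import numpy as np

def od_ratings_vectorized(scores, iterations=100):
    """Offense-defense ratings using matrix-vector products (illustrative sketch).

    Assumed layout: scores[i, j] = points team j scored against team i.
    Assumes every team has scored and conceded some points, so no rating is zero.
    """
    scores = np.asarray(scores, dtype=float)
    defense = np.ones(scores.shape[0])
    for _ in range(iterations):
        # offense[j] = sum over i of scores[i, j] / defense[i]
        offense = scores.T @ (1.0 / defense)
        # defense[i] = sum over j of scores[i, j] / offense[j]
        defense = scores @ (1.0 / offense)
    return offense, defense
```

Only the outer iteration loop remains; the per-cell work happens inside numpy.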

Not only is it faster, but I would argue the more concise code is easier to understand as well.

For my Markov Chain ranking, I started with an implementation similar to the one in A Markov Chain ranking of Premier League teams (14/15 season):
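For context, a random-walk approach to this kind of ranking might look roughly like the sketch below. It is not the code from the linked post. The function name `random_walk_ranking`, the `seed` parameter, the starting state, and the assumption that `transition` is a square matrix whose rows each sum to 1 are all made up for the example.

```python
import numpy as np

def random_walk_ranking(transition, steps=100_000, seed=0):
    """Estimate long-run visit frequencies by simulating the chain (illustrative sketch).

    Assumed input: transition[i, j] = probability of moving from team i to team j,
    with every row summing to 1.
    """
    transition = np.asarray(transition, dtype=float)
    n = transition.shape[0]
    rng = np.random.default_rng(seed)
    visits = np.zeros(n)
    state = 0  # arbitrary starting team
    for _ in range(steps):
        # Pick the next team according to the current team's row of probabilities.
        state = rng.choice(n, p=transition[state])
        visits[state] += 1
    # Fraction of time spent on each team approximates its rating.
    return visits / steps
```

The result is only an estimate, and every step costs a Python-level loop iteration.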

I don't mean to disparage these two posts in any way. They are awesome and really helped me. I just had a different situation and had to worry about performance, and here I figured out that I could get rid of both loops:

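Eigenvectors to the rescue. For a transition matrix whose rows sum to 1, the stationary distribution we are trying to find is the left eigenvector for the largest eigenvalue (which is 1), scaled so that its entries sum to 1. That means we don't need the 100,000-step random walk at all.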
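A minimal sketch of the eigenvector approach, assuming `transition` is a square row-stochastic matrix for a chain in which every team can eventually be reached from every other team (the function name `stationary_distribution` is hypothetical):

```python
import numpy as np

def stationary_distribution(transition):
    """Stationary distribution of a row-stochastic matrix via eigendecomposition.

    Assumed input: transition[i, j] = probability of moving from team i to team j,
    with every row summing to 1.
    """
    transition = np.asarray(transition, dtype=float)
    # Left eigenvectors of P are the ordinary (right) eigenvectors of P transposed.
    eigenvalues, eigenvectors = np.linalg.eig(transition.T)
    # For a stochastic matrix the largest eigenvalue is 1.
    idx = np.argmax(eigenvalues.real)
    vector = eigenvectors[:, idx].real
    # Rescale so the entries sum to 1; this also fixes an arbitrary sign flip.
    return vector / vector.sum()
```

For example, `stationary_distribution([[0.9, 0.1], [0.5, 0.5]])` gives approximately `[0.8333, 0.1667]`, and multiplying that vector by the matrix returns the same vector.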
