While calculating team ratings for a machine-learning-based March Madness prediction, I ran into a couple of situations where my code got slow as I expanded it to include every team across every season. By slow I mean several hours, which was longer than I was willing to wait: I needed to be able to recalculate ratings on demand, in a few minutes at most.
For my offensive-defensive rating, I started with an implementation similar to the one in "Offensive and defensive team ratings for the Premier League 2014-2015".
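That post's code isn't reproduced here. As a rough, hypothetical sketch of the loop-heavy style, here is a pure-Python version of the offense-defense iteration. It assumes `A[i][j]` holds the points team `j` scored against team `i` (zero if they never met), that every team has both scored and allowed points (so no rating is zero), and a fixed `iterations` count instead of a convergence test. The function name `od_ratings_loops` is illustrative.

```python
def od_ratings_loops(A, iterations=100):
    """Offense-defense ratings via explicit loops (illustrative sketch).

    Assumes A[i][j] is the points team j scored against team i.
    Returns (offense, defense); higher offense and lower defense are better.
    """
    n = len(A)
    defense = [1.0] * n  # start every defensive rating at 1
    offense = [0.0] * n
    for _ in range(iterations):
        # o_j = sum_i A[i][j] / d_i: points scored, weighted by opponents' defense
        offense = [0.0] * n
        for j in range(n):
            for i in range(n):
                offense[j] += A[i][j] / defense[i]
        # d_i = sum_j A[i][j] / o_j: points allowed, discounted by opponents' offense
        defense = [0.0] * n
        for i in range(n):
            for j in range(n):
                defense[i] += A[i][j] / offense[j]
    return offense, defense
```

Each iteration touches every cell of the matrix twice, so for n teams that is 2n² Python-level divisions per pass.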
It loops over every row and column of the matrix many times, and with college basketball having over three hundred teams, that wasn't going to work for me. I realized the computation could be vectorized, letting numpy do the work far more efficiently than my Python loops could.
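A minimal vectorized sketch of the same offense-defense iteration, assuming `A[i, j]` holds the points team `j` scored against team `i` and a fixed number of iterations (at least 1). The function name `od_ratings` is hypothetical, not the author's code. Each nested loop collapses into a single matrix-vector product.

```python
import numpy as np

def od_ratings(A, iterations=100):
    """Vectorized offense-defense ratings (sketch).

    Assumes A[i, j] is the points team j scored against team i and
    iterations >= 1. Returns (offense, defense) as numpy arrays.
    """
    A = np.asarray(A, dtype=float)
    defense = np.ones(A.shape[0])  # start every defensive rating at 1
    for _ in range(iterations):
        offense = A.T @ (1.0 / defense)  # o_j = sum_i A[i, j] / d_i
        defense = A @ (1.0 / offense)    # d_i = sum_j A[i, j] / o_j
    return offense, defense
```

For 300 teams, each pass becomes two 300×300 matrix-vector products executed in numpy's compiled code instead of roughly 180,000 divisions in interpreted Python.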
Not only is it faster, but I would argue the more concise code is easier to understand as well.
For my Markov chain ranking, I started with an implementation similar to the one in "A Markov Chain ranking of Premier League teams (14/15 season)".
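Again, the referenced post's code isn't shown here. A hedged sketch of the random-walk approach, assuming a row-stochastic transition matrix `P` where `P[i][j]` is the probability of stepping from team `i` to team `j`, simulates the walk and uses visit frequencies as the ranking. The function name and the `seed` parameter are hypothetical.

```python
import random

def markov_rank_random_walk(P, steps=100_000, seed=0):
    """Estimate the stationary distribution by simulating a random walk (sketch).

    Assumes P[i][j] is the probability of moving from team i to team j,
    with each row summing to 1.
    """
    rng = random.Random(seed)
    n = len(P)
    teams = range(n)
    visits = [0] * n
    state = 0  # arbitrary starting team
    for _ in range(steps):
        # pick the next team according to the current row of P
        state = rng.choices(teams, weights=P[state])[0]
        visits[state] += 1
    # the fraction of time spent at each team approximates its stationary probability
    return [v / steps for v in visits]
```

The estimate is only as good as the walk is long: Monte Carlo error shrinks roughly with the square root of the number of steps, so a long walk is needed for stable rankings.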
I don't mean to disparage these two posts in any way; they are awesome and really helped me. My situation was just different, since I had to worry about performance, and here I found that I could get rid of both loops.
Eigenvectors to the rescue. The eigenvector associated with the largest eigenvalue (which is 1 for a stochastic matrix), once normalized to sum to 1, is the stationary distribution we are trying to find, so we don't need the 100,000-step random walk at all.
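A minimal sketch using `numpy.linalg.eig`, assuming a row-stochastic matrix `P` (row `i` holds the probabilities of moving from team `i` to each team) and an irreducible chain so the stationary distribution is unique. The stationary distribution π satisfies π P = π, which makes it an eigenvector of `P.T` for eigenvalue 1. The function name is hypothetical; if your matrix is column-stochastic instead, drop the transpose.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of a row-stochastic matrix via eigenvectors (sketch)."""
    P = np.asarray(P, dtype=float)
    # pi P = pi  <=>  P.T pi = pi, so look at eigenvectors of the transpose
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    k = np.argmax(eigenvalues.real)  # the largest eigenvalue is 1 for a stochastic matrix
    pi = eigenvectors[:, k].real
    # eig returns an arbitrarily scaled (possibly negative) vector; normalize to sum to 1
    return pi / pi.sum()
```

For example, with `P = [[0.5, 0.5], [0.2, 0.8]]` this returns approximately `[2/7, 5/7]`, the same distribution a long random walk would only approximate.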