Saturday, September 3, 2011

Predicting Pitcher Injuries with Neural Networks

My first blog entry documented the Oakland A’s quantitative strategy as described in the book Moneyball. Although I was seeking more novel applications of QM, I recently came across The Extra 2%: How Wall Street Strategies Took a Major League Baseball Team from Worst to First, which includes a chapter on the quantitative strategy deployed by the Tampa Bay Rays. That strategy is examined below.

Pitchers Are Expensive

By far the most interesting application of QM by the Rays is predicting pitcher injuries. Because pitchers are expensive and significantly influence the team's defense, Tampa Bay sought a way to predict and thereby prevent pitcher injuries. The effort targeted pattern detection in the final ten throws preceding a disabling injury, and those patterns were fed into an algorithm along with many other data points. The Rays ultimately gave one employee absolute authority to pull a pitcher if the algorithm predicted an injury in the immediate future. The approach rests largely on the idea that fatigue and over-exertion contribute to injuries, an idea supported by other pitching analyses such as those based on Pitch F/X.
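The book does not describe the Rays' actual features, so the following is only a minimal sketch of what "pattern detection in the final ten throws" could look like. It compares the last ten pitches with the pitcher's earlier baseline in the same outing. The function names, the choice of velocity and release height as signals, and both thresholds are my own assumptions for illustration.

```python
from statistics import mean

WINDOW = 10  # the "final ten throws" mentioned above


def window_features(velocities, release_heights, window=WINDOW):
    """Compare the most recent `window` pitches with the earlier baseline.

    Returns None until there are at least 2 * window pitches to compare.
    """
    if len(velocities) != len(release_heights):
        raise ValueError("inputs must have the same length")
    if len(velocities) < 2 * window:
        return None
    return {
        # Positive values mean the recent pitches are slower / lower.
        "velocity_drop_mph": mean(velocities[:-window]) - mean(velocities[-window:]),
        "release_drop_ft": mean(release_heights[:-window]) - mean(release_heights[-window:]),
    }


def should_flag(features, max_velocity_drop=2.0, max_release_drop=0.25):
    """Hypothetical rule of thumb; the thresholds are invented, not the Rays'."""
    if features is None:
        return False
    return (features["velocity_drop_mph"] > max_velocity_drop
            or features["release_drop_ft"] > max_release_drop)
```

A real system would presumably feed features like these into a trained model rather than fixed thresholds, but the windowing step would look broadly similar.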

Pitch F/X

The path to pitcher injury prediction was opened when MLB.com implemented the Pitch F/X system in 2007 to precisely track and record data points on every pitch. 

“Now the computer generates all of this [data] automatically—how high the pitcher's throwing hand was off the ground when he released the ball, how fast the ball was traveling both when it left his hand and when it crossed the plate, to what degree and in what direction the ball diverted from a straight path on its way to the plate, and finally, if the pitch really was four inches inside and a couple of inches above the knees.”

This wealth of data points, along with arm slot, velocity, and injury data, feeds a neural network (a non-linear algorithm that loosely imitates brain function) trained to identify tell-tale signs of pitcher injury. While injury prevention is the most interesting application of QM (and a very profitable one), there were other analyses with profound implications for the team.
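To make the idea of a neural network concrete, here is a toy sketch in plain Python: one hidden layer, trained by gradient descent on synthetic data. Nothing here comes from the Rays. The two input features, the labeling rule, the network size, and the learning rate are all assumptions chosen so the example is small and runnable.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One tanh hidden layer and a sigmoid output, trained by SGD."""

    def __init__(self, n_inputs, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        """Return the estimated probability of the positive ("injury") class."""
        return self._forward(x)[1]

    def train_step(self, x, y, lr):
        hidden, out = self._forward(x)
        d_out = out - y  # cross-entropy gradient at the output pre-activation
        for j, h in enumerate(hidden):
            # Backpropagate through tanh using the pre-update output weight.
            d_hidden = d_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out


def make_synthetic(n, rng):
    """Invented data: scaled velocity drop and release-point drop, both in [0, 1]."""
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        label = 1.0 if x[0] + x[1] > 1.0 else 0.0  # made-up "injury" rule
        data.append((x, label))
    return data


def train(data, n_hidden=4, epochs=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    net = TinyNet(len(data[0][0]), n_hidden, rng)
    order = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            net.train_step(x, y, lr)
    return net


def accuracy(net, data):
    hits = sum((net.predict(x) >= 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)


if __name__ == "__main__":
    samples = make_synthetic(200, random.Random(0))
    model = train(samples)
    print("training accuracy:", accuracy(model, samples))
```

The real problem is far harder than this toy: injuries are rare events, the inputs are noisy, and the labels arrive long after the pitches that preceded them.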

Extra Bases Run

Tampa Bay also applied QM to their base running strategy.  This involved training players to keep running if fielders didn’t have the ball. A simple heuristic to be sure, but a non-standard one in Major League Baseball and one which propelled them from 25th to 1st in extra bases run. Given the potency of this rule, I’m confident that a statistician tested it using league data before Tampa Bay implemented it.

Statistical Justification of Onfield Strategy

Moneyball detailed the first Major League efforts to debunk baseball tactics such as sacrifice bunts and stealing bases, and The Extra 2% offers several examples of its own. These include using extra infielders against batters who frequently hit ground balls, extra outfielders against batters with a tendency for fly outs, shifting the infield left or right depending on a batter's ground-out tendencies, and an extra fielder in center right against Derek Jeter.

“Maddon, whose more adventurous against-the-Book decisions included issuing the infamous bases-loaded intentional walk to Josh Hamilton, sending a runner while down 9-0, using unusually aggressive defensive shifts, and starting same-hand hitters against a handful of quirky pitchers.”  (Page 172)

Quantification of Benefits

  • Won the American League pennant
  • Went from 25th to 1st in the league for extra bases run
  • Greater pitcher ROI and utilization, which reduces rotation of non-injured pitchers
  • Pitcher Arbitrage
  • Like the Oakland A's, they can compete against the Yankees with a third of the player salaries

Personal Commentary

Confidentiality of Methodology: The management of this team knows what they’re doing. For example, when they hired the pitcher injury prediction specialist, they kept it hidden from the world: his name was not added to the company directory, he was not allowed to announce why he was closing his blog, etc.

Injured Pitcher Arbitrage: I’m especially interested in the implications of pitcher injury prediction. Specifically, a team with this capability can profitably arbitrage injury-prone pitchers. Familiarity with league-wide injuries, combined with the additional information available from an injury-prone pitcher’s history, would produce a robust injury prevention capability. After all, to what extent is an injured pitcher the result of poor management vs. poor self-knowledge?

Data Cleansing as Competitive Advantage: Several articles mention calibration problems with the Pitch F/X cameras, leading to different baselines in different stadiums. Could an analytically oriented team gain an advantage over other analytical teams by deliberately skewing the cameras in their own stadium, and by excelling in their normalization or data cleansing methods?
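As a rough illustration of what normalization might involve, the sketch below estimates each stadium's velocity bias as the gap between its average reading and the league average, then subtracts that bias. This is my own simplification and not a documented Pitch F/X correction. It assumes every stadium sees a similar mix of pitches, which is unlikely to hold in practice; a more careful method would compare the same pitchers across parks. The stadium names are placeholders.

```python
from collections import defaultdict
from statistics import mean


def stadium_offsets(readings):
    """Estimate each stadium's bias as its mean deviation from the league mean.

    `readings` is a list of (stadium, velocity_mph) pairs.
    """
    league_mean = mean(v for _, v in readings)
    by_stadium = defaultdict(list)
    for stadium, velocity in readings:
        by_stadium[stadium].append(velocity)
    return {s: mean(vs) - league_mean for s, vs in by_stadium.items()}


def normalize(readings):
    """Subtract each stadium's estimated bias from its readings."""
    offsets = stadium_offsets(readings)
    return [(s, v - offsets[s]) for s, v in readings]


# Placeholder data: "park_a" reads about 1 mph hot, "park_b" about 1 mph cold.
sample = [("park_a", 91.0), ("park_a", 93.0), ("park_b", 89.0), ("park_b", 91.0)]
cleaned = normalize(sample)
```

A team that understood its own park's bias better than visiting analysts did would, in effect, be working from cleaner data than its competitors.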

The automated collection and public distribution of data creates an inflection point.