Monday, 29 June 2015

'Big Data Baseball' and the use of statistics in football

I recently finished reading Travis Sawchik’s Big Data Baseball. As a fan of the Pittsburgh Pirates with an interest in data in sport, I was always going to enjoy it – and I would label it a must-read for anyone with an interest in analytics in any sport, not just baseball.

The central theme is how the Pirates improved from a 79-83 team in 2012, following a second successive late-season collapse, to a 94-68 team in 2013, ending the 20-year streak of losing seasons that had plagued the franchise – and the role that embracing data played in that turnaround. After re-reading Dan Fox’s series on base running for Baseball Prospectus, I had suspected that research by Fox, the Pirates’ Director of Player Systems Development, was not making its way onto the field at the end of that 2012 season. The book confirms that this was the case, with Fox and assistant Mike Fitzgerald then taking a central role as the Bucs embraced pitch framing and defensive shifts, among other things. The Pirates' analysts found inefficiencies and exploited them to the max.

If Moneyball marked the beginning of analytics playing a role in player acquisition and roster building, Big Data Baseball details the application of statistical analysis to team strategy. Of course, when a team has success with something, it doesn’t take long for others to catch up and do the same, yet the smartest clubs will stay ahead of the curve by continuing to search for every marginal gain that they can. This, along with Arsenal’s addition of Petr Cech, led me to think about similar possibilities for football – and whether something that causes one club to excel could then be uniformly implemented by all clubs.

There are obviously some things that could be done by all clubs; on a tactical level, however, there is a strong possibility of sides cancelling each other out, eroding the very inefficiency they set out to exploit. In my article for STATS on the Cech move, I discussed the value of goal prevention in football, and this is currently an area of inefficiency. Goalkeepers cost so little in comparison to other positions, with defenders not that far ahead of them. Cech, for example, is moving to Arsenal for a similar price to what Sunderland, Hull, Southampton and West Ham paid for Jack Rodwell, Abel Hernandez, Shane Long and Enner Valencia respectively, to name a few – even though Cech is regarded as one of the best players in the world in his position, and the others are pretty unexceptional in comparison.

This is despite the value of goal prevention. As Chris Anderson and David Sally discuss in The Numbers Game, a clean sheet was worth almost 2.5 points per match between the 2001/02 and 2010/11 seasons. Clean sheets have therefore been incredibly valuable, with clubs having to score more than two goals in a match to surpass that value, and the 2014/15 season continued along the same lines.
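To make the clean-sheet figure concrete, here is a rough sketch of the arithmetic behind it. It relies on one piece of logic rather than on anything from The Numbers Game itself: a side that keeps a clean sheet cannot lose, so it either wins (3 points) or draws (1 point). The function names are my own, purely for illustration.

```python
WIN_POINTS = 3
DRAW_POINTS = 1


def clean_sheet_points_per_match(win_share: float) -> float:
    """Average points from clean-sheet games, given the share of them that were won.

    Assumes the rest were drawn, since a clean sheet rules out a defeat.
    """
    if not 0.0 <= win_share <= 1.0:
        raise ValueError("win_share must be between 0 and 1")
    return WIN_POINTS * win_share + DRAW_POINTS * (1.0 - win_share)


def implied_win_share(points_per_match: float) -> float:
    """Invert the formula above: which win share produces this points average?"""
    return (points_per_match - DRAW_POINTS) / (WIN_POINTS - DRAW_POINTS)


if __name__ == "__main__":
    # A value of roughly 2.5 points per clean sheet implies that about
    # three in every four clean sheets ended in a win.
    print(implied_win_share(2.5))
```

Under that assumption, the value of a clean sheet falls towards one point as the share of goalless draws rises, which is worth keeping in mind for the cancelling-out argument.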

A club like Arsenal can afford to spend money on a keeper like Cech and have funds left over for other positions, yet not all teams can do this. Investing in attack is expensive, and few sides actually come close to averaging two goals per match in any case. Given the value of goal prevention combined with the cost of attacking players, clubs could focus on this and invest in improving defensively; it would certainly be understandable for sides at the bottom of the division that are aiming to avoid relegation. A total of 11 clubs averaged fewer than 1.5 goals per match last term and only Manchester City averaged more than 2.0, although their 2.18 goals per match drops to 1.97 once penalties and own goals are excluded. It’s a simple example, as it’s obvious that a club cannot lose with a clean sheet, but the numbers currently back up the overall value of prioritising defensive strength over attacking improvement when funds are limited.

The limitation of this possible strategy is that if many clubs elected to focus on defensive strength, beginning a new era of catenaccio, they could cancel each other out, so this is where clubs need to be ahead of the rest to maintain an advantage. There is always going to be some value in a clean sheet, but this value would decrease if it resulted in a higher number of draws, as scoring two or more goals could then become more valuable. Sunderland did their best to provide an example of this last term, showing that there has to be some balance when building a side.

Sunderland won just four of the 13 games in which they shut out their opposition, resulting in an average of 1.62 points per match when keeping a clean sheet, whereas they averaged 1.83 points per match when scoring twice (although they only scored two or more goals in eight games last season). The club’s attack simply could not take advantage when the defence provided opportunities to win games: Sunderland conceded no more than one goal on 25 occasions and drew 15 (60%) of those games. The small sample size of one club’s results is the obvious caveat here, yet clubs who focus on a defensive style at the expense of attacking options could cancel each other out.
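The Sunderland figures can be checked with a few lines of arithmetic. This is only a sketch: the nine draws are not quoted directly but follow from the 13 clean sheets and four wins, since a side cannot lose without conceding. The helper name is my own.

```python
def points_per_match(wins: int, draws: int, losses: int) -> float:
    """Average league points per game under three points for a win, one for a draw."""
    matches = wins + draws + losses
    if matches == 0:
        raise ValueError("at least one match is required")
    return (3 * wins + draws) / matches


# 13 clean sheets, four of them wins; the other nine must have been draws.
clean_sheet_ppm = points_per_match(wins=4, draws=9, losses=0)

# 15 draws from the 25 games in which Sunderland conceded no more than one goal.
draw_share = 15 / 25

print(round(clean_sheet_ppm, 2))  # 1.62
print(draw_share)  # 0.6
```

That works out to 21 points from 13 clean sheets, well short of the near-2.5 league-wide value of a clean sheet quoted from The Numbers Game.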

This is just one example of a data-led strategic change; there are other possibilities, including such things as substitutions and set pieces. However, clubs in all sports do attempt to replicate the successes of other sides (as has happened with shifts in baseball), and this can be seen tactically throughout the history of football. Analytical advances in strategy that have a positive influence on the field would likely see other clubs follow suit, although at this point it will be tough to decipher whether tactical changes were data-driven or not, with credit likely to go to the coaches rather than the analysts involved.
