Tuesday, August 26, 2014

Spain at FIFA 2014: a fruitful collapse

The World Cup is over (did you know that? And Germany also managed to win the Women’s Under-20 World Cup in Montreal), and football is coming back to the continental and national leagues; in fact, we have already seen Real Madrid crush Sevilla’s frail hopes in the UEFA Super Cup. So what are the mortal remains of FIFA 2014? Odd as it may be, FIFA 2014 in Brazil was barely able to increase the average market value of the players who played there.

To single out the effect of FIFA 2014, we restricted the analysis to the players who were in the starting eleven of the teams that reached the quarterfinals (Ronaldo, Iniesta, and Balotelli may disagree, but their appearance at FIFA 2014 did not significantly change their value). We took the market values published on transfermarkt.com both before (as of June 1) and after (as of August 15) the tournament. We expected that, for these 88 players, the average market value would increase significantly over this time span; after all, reaching the quarterfinals can be considered a success (except for Brazil, where it is taken for granted…). This turned out to be true, but only to a certain extent: on average, the increase was around 15% of the value before the tournament. But who benefitted the most at the league (and at the team) level?

Despite the poor performance of Spain, the LFP saw the highest value increase among leagues. The clubs that benefitted the most in the Liga, either because they sold players in the summer transfer window or because their players increased in value, were Real Madrid and Real Sociedad (the latter basically as a consequence of Griezmann’s performance), with Barcelona following. Monaco managed to monetise James Rodriguez’s fine production in Brazil.

Interestingly, the Premier League did not see its players’ average value change dramatically: the changes for Hazard and for Chelsea’s Oscar and David Luiz were offset by an evident decrease in the values of Van Persie and Ozil.
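For readers who want to repeat the league-level comparison, here is a minimal sketch of how the average relative change per league could be computed. The player records are made-up placeholders, not the transfermarkt.com figures, and averaging per-player percentage changes is an assumption about how the 15% figure was obtained.

```python
from collections import defaultdict

# Hypothetical records: (player, league, value before, value after), in million euros.
# These numbers are placeholders, NOT the data used in the post.
players = [
    ("Player A", "Liga", 20.0, 26.0),
    ("Player B", "Liga", 10.0, 11.0),
    ("Player C", "Premier League", 30.0, 27.0),
    ("Player D", "Premier League", 15.0, 18.0),
]


def relative_change(before, after):
    """Change in value as a fraction of the value before the tournament."""
    return (after - before) / before


def mean_change_by_league(records):
    """Average the per-player relative changes within each league."""
    changes = defaultdict(list)
    for _name, league, before, after in records:
        changes[league].append(relative_change(before, after))
    return {league: sum(vals) / len(vals) for league, vals in changes.items()}


for league, change in mean_change_by_league(players).items():
    print(f"{league}: {change:+.1%}")
```

Averaging per-player percentages weights every player equally; dividing the summed values after by the summed values before would instead weight expensive players more, so the two approaches can give different answers.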

Monday, July 28, 2014

How do football leagues go social…

We had a look at how the four most popular football leagues in Europe are doing with their social media audiences. We were not really interested in ranking the most popular teams (and, yes, for those of you who are interested, it is still a clash between the giants Barcelona and Real Madrid, with Manchester United as the odd one out). We wanted to look at the leagues, to check whether there is a strategy behind each of them.
To do this, we captured the number of Facebook fans and Twitter followers for the first 32 teams of each of the following four countries: Spain, England, Italy, Germany. Why 32? Well, because we used the NFL teams as a benchmark. Basically, 32 teams for each of the European leagues meant having the whole first tier, a selection of the second tier, and some teams from the third tier. Twitter and Facebook data were summed (yup, you are right: there is a big difference between following your cross-town rival and actually liking it on Facebook, and yes, there are duplicates both across teams and across social platforms, but we hypothesized this wouldn’t bias the differences between leagues). Only the “home” Twitter account was counted, leaving out the international replicas, on the grounds that people from outside the team’s own country may follow both the original Twitter account and its translated version.
First, the totals: Spain and England are able to draw around 190 million people worldwide, while Italy lags behind at roughly the size of its own population, and Germany stops at approximately 43 million. For comparison, NFL teams are followed by fewer than 90 million.
The striking difference between the NFL and the European football leagues lay, instead, in the way these figures were distributed. If we rank the 32 teams by number of fans and plot the cumulative share of the overall audience as a function of the cumulative share of teams (nothing new, it’s the Lorenz curve, much used in economics), the behavior changes dramatically between the NFL and the European football leagues, as the figure below shows. From the Lorenz curve, a measure of skewness (Gini would have said inequality) can also be calculated.

Lorenz Curve for Facebook + Twitter followers: cumulative share of teams from lowest to highest number of followers (x-axis), and cumulative share of number of followers (y-axis)
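The construction just described can be sketched in a few lines of Python. This is only an illustration: the function names are ours, the follower counts at the bottom are invented, and the Gini coefficient is computed as one minus twice the area under the Lorenz curve (trapezoidal rule), which is one common way of doing it and not necessarily the one behind the figure.

```python
def lorenz_curve(followers):
    """Return (cumulative share of teams, cumulative share of followers) points,
    with teams ordered from the fewest to the most followers."""
    values = sorted(followers)
    total = sum(values)
    n = len(values)
    points = [(0.0, 0.0)]
    running = 0
    for i, value in enumerate(values, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(followers):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve)."""
    points = lorenz_curve(followers)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between consecutive points
    return 1 - 2 * area


# Invented follower counts (in millions) for two toy leagues of four teams each.
balanced_league = [5, 5, 5, 5]
top_heavy_league = [1, 1, 2, 60]
print(round(gini(balanced_league), 2), round(gini(top_heavy_league), 2))
```

A league where every team has the same audience gives 0, while a league where a single team holds the whole audience approaches 1 as the number of teams grows.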

While most of the European leagues stood on the "unequal" side of the coefficient (with Spain leading with a Gini coefficient of 0.82, and all above 0.75), the NFL displays a mere 0.30 (in economics, this would have meant socialism!), which basically highlights the NFL’s ability to have each team in the league followed by a sizeable share of the overall audience. This is also reflected at the social level. Even if the number of fans is not necessarily related to the ability of each team to win a league, we predict that these numbers may end up having an effect on the overall equality of a league (and, as a result, on the interest it draws). After all, Vitrue said that one fan is worth around 3 €/year...

Saturday, July 12, 2014

You better run!

Most stats in this FIFA 2014 World Cup focus on shots, attempts, saves, and fouls, while metrics associated with distance have been somewhat overlooked. While there may be a significant change in the style of play between the former World Cup champions and the next ones in these terms (remember tiqui-taca?), let’s have a look at what happens to the distances covered by the eight teams that reached the quarterfinals of FIFA 2014.
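Among all the parameters that can be calculated from match analysis, we chose a metric associated with the amount of “useful” distance covered by a team during a match, and we called it Run dominance:

Run dominance = [(distance covered by the team in possession + distance covered by the opponent when defending) / 2] / [(total distance covered by the team + total distance covered by the opponent) / 2]

As can be seen from the equation, Run dominance is calculated as the ratio of two components: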
- in the numerator, you average the distance covered by the team when in ball possession with the distance covered by the opposing team when the latter is defending. This summarizes the ability of each team to move when in possession, and to make the defending team move to oppose the attacks.
- in the denominator, you calculate the average total distance covered by the two teams: it can be considered an average measure of the intensity of the match, to be used as a normalizing factor between low-intensity matches (i.e. with low distances covered) and high-intensity ones.
Thus, Run dominance can be seen as a measure of the amount of physical power associated with movement that a team displays during a match. The upper limit of 1 would be reached if the team were always in possession (in that case, the opponent’s value would be 0). Usually, the distance covered in possession, the distance covered out of possession, and the distance covered during dead time each lie in the range of 20-40% of the overall distance, which usually adds up to around 200-250 km per match. With these figures, and under the hypothesis that the total distance is equally split among the three situations, Run dominance would be 0.33 for both teams. We collected the data needed to compute this parameter for the Round of 16 matches at FIFA 2014, and this is what we got for the teams that advanced to the round of 8:
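As a sanity check on the definition, the two components described above can be written as a small Python function. This is a sketch based on our reading of the verbal description; the function and argument names are ours, and the distances in the example are illustrative, not measured values.

```python
def run_dominance(own_in_possession, opp_out_of_possession, own_total, opp_total):
    """Run dominance of a team in one match (all distances in the same unit).

    Numerator: average of the team's distance in possession and the
    opponent's distance while defending.
    Denominator: average of the two teams' total distances.
    """
    numerator = (own_in_possession + opp_out_of_possession) / 2
    denominator = (own_total + opp_total) / 2
    return numerator / denominator


# Illustrative case: both teams cover 110 km, split equally between
# possession, non-possession and dead time.
third = 110 / 3
print(round(run_dominance(third, third, 110, 110), 2))

# Limiting case: the team is always in possession, so all of its distance is
# "in possession" and all of the opponent's distance is spent defending.
print(run_dominance(110, 100, 110, 100))
```

The first call reproduces the even-split value quoted above, and the second reaches the upper limit of 1.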

Looking at the quarterfinal pairings, this metric seemed to predict that both Germany–France and Netherlands–Costa Rica would be lopsided, with the Teutons and the Dutch being the move-makers. Argentina might have proven similar against Belgium, while Colombia and Brazil were very close in this regard.
Make no mistake, though: dominance does not necessarily equal a win in the football stadia (…but it might help ;)). After extensively analysing all the matches played up to now, it looks like Run dominance is anything but a strong predictor of the level of success of the national teams: despite Germany and Argentina ranking at the top for Run dominance, the regression power stops at less than 0.1, which basically means that it does not predict goals scored, or even goals conceded.

However, Run dominance does predict fairly well the number of attempts that a team is able to produce in a game, as can be seen from the figure below. This predictor also correlates strongly with possession percentage, with which it shares its regression power on the number of attempts per game.

By plotting Run dominance against possession percentage, teams above the regression line are those more prone to make the players run more than the ball when in possession, while teams below that line tend to move the ball instead. By looking at this latter graph, teams can be ranked according to their overall ability to be in control of the game. Most UEFA countries seem to lie above the regression line, almost regardless of their ability to direct a match, while CONMEBOL countries lie below that line (if we exclude Argentina, which crossed the line only after the final match). The other confederations do not show a common behaviour, even if Australia and the United States seem to behave in a similar manner to UEFA countries, as expected, given the presence of European managers.

Thus, by representing these two variables - possession percentage and Run dominance - one obtains two different measures: the amount of control over the match each team is able to produce (or concede to the other team), based on its position along the regression line, and the quality of this control (i.e., making the players move when in possession, or making the ball move), captured by its distance from that line. This is even clearer in the figure below, where regression residuals are plotted against possession percentage (CONMEBOL teams labelled in red, UEFA in green).
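For those who want to reproduce this kind of plot, here is a minimal sketch of an ordinary least-squares line, its residuals, and the R² value, which we assume is what "regression power" refers to. The function names are ours and the possession and Run dominance values are invented for illustration; they are not the tournament data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line (positive = above)."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def r_squared(ys, res):
    """Share of the variance of y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(r ** 2 for r in res)
    return 1 - ss_res / ss_tot


# Invented values: possession percentage and Run dominance for five teams.
possession = [42.0, 47.0, 50.0, 55.0, 61.0]
run_dom = [0.29, 0.33, 0.32, 0.37, 0.38]

slope, intercept = fit_line(possession, run_dom)
res = residuals(possession, run_dom, slope, intercept)
for p, r in zip(possession, res):
    side = "above" if r > 0 else "below"
    print(f"possession {p:.0f}%: residual {r:+.3f} ({side} the line)")
print(f"R^2 = {r_squared(run_dom, res):.2f}")
```

The sign of each residual tells on which side of the regression line a team falls, and its size tells how far the team departs from what its possession percentage alone would suggest.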