Let’s add data to soccer’s debates!

For decades, the soccer world has been populated by pundits offering their opinions on what happened before, during, and after a match. Around the turn of the 21st century, the sport discovered a world of statistics beyond counting goals and fouls. These advanced statistics have added a new dimension to how the beautiful game is played and interpreted.

However, for the sake of simplicity, most opinions remain unverified. I’m sure you’ve heard most of these opinions before. For instance:

  • Player X is a good player “because he has a nose for goal” or “because he sees the game well” or “because he knows when to take on defenders”
  • Team Y is a good team “because they don’t sit on the ball” or “because they hold their shape well” or “because they overload the wide spaces”

Some pundits may be good at using advanced statistics to support their opinions, but even at this level, the anecdotal evidence lacks academic rigor and context.

Still, these opinions are hypotheses, and hypotheses must be tested! Thanks to the generosity of StatsBomb, we can test them with rigorous data analysis. Enter this blog!

The Dataset: World Cup 2018 and Women’s World Cup 2019


The StatsBomb dataset contains data from 112 matches:

  • 64 matches from World Cup 2018, and
  • 48 matches from Women’s World Cup 2019.

Match event data represents every touch of the ball during these games, as well as situational characteristics, such as:

  • Is the player under pressure?
  • Is the player using a body part other than his or her feet?
  • Is the ball on the ground or in the air?
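To make those situational characteristics concrete, here is a minimal Python sketch of how the three questions above could be asked of event records. Everything in it is an assumption for illustration: the sample events are hand-written, and the flattened field names ("under_pressure", "body_part", "height") and their values are hypothetical stand-ins that may not match the real StatsBomb schema.

```python
# Hypothetical, hand-written events. The field names and values below are
# assumptions for illustration and may differ from the real dataset schema.
sample_events = [
    {"type": "Pass", "player": "Player A", "under_pressure": True,
     "body_part": "Right Foot", "height": "Ground Pass"},
    {"type": "Pass", "player": "Player B",
     "body_part": "Head", "height": "High Pass"},
    {"type": "Shot", "player": "Player A", "under_pressure": True,
     "body_part": "Left Foot"},
    {"type": "Pass", "player": "Player C",
     "body_part": "Left Foot", "height": "Low Pass"},
]


def is_under_pressure(event):
    """Treat a missing flag as 'not under pressure' (an assumption)."""
    return bool(event.get("under_pressure", False))


def uses_non_foot(event):
    """True when a body part is recorded and it is not a foot."""
    body_part = event.get("body_part")
    return body_part is not None and "Foot" not in body_part


def is_on_ground(event):
    """True when the event is recorded as played along the ground."""
    return event.get("height") == "Ground Pass"


def summarize(events):
    """Count how many events satisfy each situational question."""
    summary = {"total": 0, "under_pressure": 0, "non_foot": 0, "on_ground": 0}
    for event in events:
        summary["total"] += 1
        summary["under_pressure"] += is_under_pressure(event)
        summary["non_foot"] += uses_non_foot(event)
        summary["on_ground"] += is_on_ground(event)
    return summary


if __name__ == "__main__":
    print(summarize(sample_events))
```

Keeping each question as its own small function means the same checks can later be combined, for example to count headers played under pressure, without rewriting the loop.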

Ideas to be explored


After four months of drowning in new data, I realized I needed to organize my thoughts if I was ever going to perform any proper analysis. So, here is what is currently on the idea board (consider this a working document):

  1. How important is the expected goals metric to a team’s offensive and defensive success?
  2. How important are individual players in a team’s defensive system? What does defensive shape look like? How can we evaluate its effectiveness?
  3. Can we distinguish between classical positional players (e.g., right full-back, left forward) and modern role-based players (e.g., ball-winner, play-maker) in a team’s play?
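As a first step toward idea 1, one simple starting point is to total each team's expected goals and compare that with the goals it actually scored. The sketch below is only that, a sketch: the shot records are made up, and the field names ("team", "xg", "is_goal") are hypothetical placeholders rather than the dataset's real column names.

```python
from collections import defaultdict

# Made-up shots for illustration only. "xg" stands in for a per-shot
# expected-goals value; all field names here are assumptions.
sample_shots = [
    {"team": "Team A", "xg": 0.10, "is_goal": False},
    {"team": "Team A", "xg": 0.45, "is_goal": True},
    {"team": "Team B", "xg": 0.05, "is_goal": True},
    {"team": "Team B", "xg": 0.30, "is_goal": False},
    {"team": "Team A", "xg": 0.25, "is_goal": False},
]


def xg_versus_goals(shots):
    """Return total xG, goals, and goals-minus-xG for each team."""
    totals = defaultdict(lambda: {"xg": 0.0, "goals": 0})
    for shot in shots:
        team_totals = totals[shot["team"]]
        team_totals["xg"] += shot["xg"]
        team_totals["goals"] += int(shot["is_goal"])

    report = {}
    for team, team_totals in totals.items():
        report[team] = {
            "xg": round(team_totals["xg"], 2),
            "goals": team_totals["goals"],
            # Positive means the team scored more than its chances suggested.
            "diff": round(team_totals["goals"] - team_totals["xg"], 2),
        }
    return report


if __name__ == "__main__":
    for team, row in xg_versus_goals(sample_shots).items():
        print(team, row)
```

A per-team difference like this says nothing on its own about how important the metric is; it is just the raw comparison that a more careful analysis across many matches would build on.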