Data Methodology & Known Limitations
Transparency is a cornerstone of any data-driven project. This page outlines the process of data collection, verification, and the methodological choices made during the creation of this visualization.
Data Collection: An Evolving Process
The initial datasets for England and the Netherlands were gathered through an extensive trial-and-error process using various prompts with Google's Gemini, resulting in CSV files. This early phase revealed how difficult it is to source consistent historical data. To improve data quality, the workflow was then standardized: all data was migrated to a Firestore database, and subsequent data collection for France, Germany, Italy, Portugal, and Spain was performed using a single, optimized prompt that generates structured JSON files.
The standardized prompt used for this process is shown below.
Data Verification and Enrichment
After the initial generation, all data underwent a rigorous verification process. While initial plans to build automated API checks into the dashboard proved overly complex, a focused, prompt-based approach with Gemini, using specific online sources, turned out to be the most reliable method.
Trophies
All trophy data was meticulously checked against dedicated Wikipedia pages listing the winners of national titles, national cups, and European competitions. The following definitions were used:
- European Trophy (europese_prijs): A 'Y' was assigned for winning any of the following: European Cup (Europacup I), Champions League, UEFA Cup, Europa League, UEFA Cup Winners' Cup, Inter-Cities Fairs Cup, or the Conference League.
- National Cup (nationale_beker): A 'Y' was assigned for winning a major domestic cup competition. In countries with more than one such cup, like England (FA Cup and EFL Cup), winning either was sufficient. Winning more than one cup in a season still results in a single 'Y' and one icon in the visualization.
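The trophy definitions can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: only the field names europese_prijs and nationale_beker come from the dataset, while the function name, the input lists, and the 'N' value for "no trophy" are assumptions made for the example.

```python
# Competitions counted as a European trophy, per the definitions above.
EUROPEAN_COMPETITIONS = {
    "European Cup",
    "Champions League",
    "UEFA Cup",
    "Europa League",
    "UEFA Cup Winners' Cup",
    "Inter-Cities Fairs Cup",
    "Conference League",
}


def trophy_flags(trophies_won, domestic_cups):
    """Return the two trophy flags for one club season.

    trophies_won:  competitions the club won that season (hypothetical input).
    domestic_cups: cups that count as a national cup in that country.
    'N' as the negative value is an assumption; the source only defines 'Y'.
    """
    won = set(trophies_won)
    return {
        # Any overlap with the European list is enough for a 'Y'.
        "europese_prijs": "Y" if won & EUROPEAN_COMPETITIONS else "N",
        # One or several domestic cups both collapse into a single 'Y'.
        "nationale_beker": "Y" if won & set(domestic_cups) else "N",
    }


# A season with both English cups still yields a single 'Y'.
english_double = trophy_flags(["FA Cup", "EFL Cup"], ["FA Cup", "EFL Cup"])
```

Because the flags are set membership checks, the number of cups won in a season never changes the result, which matches the one-icon rule in the visualization.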
Coaches
The coach listed for each season was manually verified using a combination of Wikipedia manager lists and historical data from Transfermarkt.com. This was a crucial step to ensure accuracy according to the project's core principle: identifying the single, non-interim manager who was in charge for the majority of matches in a given season.
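The selection principle can be expressed as a small function. The sketch below is an illustration under stated assumptions: the Spell record, its field names, and the tie-breaking behaviour are hypothetical and do not describe how the verification was actually carried out, which was done manually.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Spell:
    """One managerial spell within a season (hypothetical record layout)."""
    coach: str
    matches: int
    interim: bool


def season_coach(spells):
    """Return the non-interim coach with the most matches, or None.

    Matches are summed per coach, so a coach with two separate spells in
    the same season is counted once. On a tie, the coach who appears first
    in the input wins (an arbitrary choice made for this sketch).
    """
    totals = defaultdict(int)
    for spell in spells:
        if not spell.interim:
            totals[spell.coach] += spell.matches
    if not totals:
        return None
    return max(totals, key=totals.get)


# Placeholder data: the interim coach has the most matches but is skipped.
example = [
    Spell("Coach A", 11, interim=False),
    Spell("Coach B", 20, interim=True),
    Spell("Coach C", 3, interim=False),
]
selected = season_coach(example)
```

Filtering out interim spells before counting is what lets a permanent coach be listed even when an interim coach oversaw more games.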
Images
Finding correct images for each coach was a significant challenge. A custom tool was built into the dashboard that queries the Wikipedia/Wikimedia API to find a corresponding image for each coach's name. However, this automated process was not flawless and required manual oversight to correct errors, such as initially displaying an image of the darts player Phil Taylor instead of the former Liverpool manager of the same name.
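A lookup of this kind can be sketched against the MediaWiki Action API using its pageimages property. The code below is an assumption about how such a tool might work, not the dashboard's actual implementation. The function names and the overrides mapping are hypothetical, and the override title in the example is illustrative only and would need to be checked against the real article title.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_image_query(coach_name, overrides=None, thumb_size=300):
    """Build a MediaWiki API URL requesting a page thumbnail.

    overrides maps an ambiguous coach name to an explicit article title,
    which is one way to record manual corrections (hypothetical design).
    """
    title = (overrides or {}).get(coach_name, coach_name)
    params = {
        "action": "query",
        "format": "json",
        "redirects": 1,           # follow redirects to the target article
        "prop": "pageimages",     # provided by the PageImages extension
        "piprop": "thumbnail",
        "pithumbsize": thumb_size,
        "titles": title,
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_thumbnail(response):
    """Return the first thumbnail URL in a parsed JSON response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        thumbnail = page.get("thumbnail")
        if thumbnail:
            return thumbnail.get("source")
    return None


# Illustrative override for an ambiguous name; the title is an assumption.
manual_overrides = {"Phil Taylor": "Phil Taylor (footballer)"}
query_url = build_image_query("Phil Taylor", manual_overrides)
```

Keeping the corrections in a separate mapping means the automated lookup stays unchanged, while each manual fix is recorded explicitly and can be reviewed later.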
Methodological Choices & Known Limitations
This dataset is built on a specific set of rules and, like any dataset, has its limitations. The following choices were made to prioritize clarity and consistency in the visualization:
- One Coach Per Season: The primary principle is to display only one coach per season: the non-interim head coach who managed the most matches. This was a difficult choice made to keep the visualization clean and readable. A significant consequence is that it omits the "chaos" of mid-season sackings and the impact of interim managers. For example, in the 2023/24 season for Ajax, Maurice Steijn is listed even though interim coach John van 't Schip managed more games, because Steijn was the official head coach at the start of the season and interim managers are not shown.
- Edge Cases: Determining the longest-serving manager can be complex, especially in cases of mid-season dismissals around the winter break. The changing length of football seasons over the decades adds another layer of complexity. While every effort has been made to be accurate, this remains a potential weak point in the dataset.
- Naming Conventions: For readability, the name under which a coach is most commonly known is used. This may be a widely used nickname (e.g., 'Petit' for Armando Teixeira) instead of their full legal name.
- Historical Nationalities: Nationalities are recorded as they were relevant in the given time period. This explains the presence of designations for since-dissolved countries, such as Yugoslavia.
- Missing Data (Portugal): During verification, the historical data for Boavista and S.C. Braga from the 1950s to the early 1970s was found to be highly unreliable, with many entries likely fabricated by the AI. These seasons have been marked as "[Data Unavailable]" in the visualization to reflect this.
- Merged Clubs: Paris Saint-Germain, FC Twente, and AZ are all merger clubs. For the early seasons in the dataset, the data shown corresponds to their predecessor clubs (Stade Saint-Germain, S.C. Enschede, and AZ'67, respectively). A clearer visual distinction for these periods is on the to-do list.
Future Development
This project is a continuous work in progress. A major future ambition is to expand the dataset to include multiple coaches per season. This would allow for a more nuanced visual representation of managerial instability and provide deeper insights into the core research question.