Data Methodology & Known Limitations

Transparency is a cornerstone of any data-driven project. This page outlines the process of data collection, verification, and the methodological choices made during the creation of this visualization.

Data Collection: An Evolving Process

The initial datasets for England and the Netherlands were gathered through extensive trial and error, using various prompts with Google's Gemini, and were saved as CSV files. This early phase revealed how difficult it is to source consistent historical data. To improve data quality, the workflow was then standardized: all data was migrated to a Firestore database, and subsequent data collection for France, Germany, Italy, Portugal, and Spain used a single, optimized prompt that generates structured JSON files.

The standardized prompt used for this process is shown below.

(Translated from the original Dutch; the JSON field names are kept exactly as they appear in the prompt.)

"Generate a JSON array with the manager history for the following 5 Portuguese football clubs: Sporting CP, Benfica, FC Porto, Boavista, and S.C. Braga.

Instructions:
Period: The data must contain every season from 1955/56 through 2024/25.
Completeness: Make sure that every individual season is present in the output for every club. Do not skip any years. Each club must have exactly 70 seasons in the data.
Manager: For each season, use the manager who was in charge for the largest part of that season. Do not use interim managers.
JSON Structure: The output must be one single, large JSON array. Each object in the array represents one season for one club and must have the following structure:
{ "club": "Name of the Club", "land": "Portugal", "seizoen": "YYYY/YY", "coach": "Full Name of the Coach", "nationaliteit": "Nationality in English", "nat_code": "ISO 3166-1 alpha-2 country code (in lowercase)", "landstitel": "Y/N", "nationale_beker": "Y/N", "europese_prijs": "Y/N" }
Example of an object:
{ "club": "Sporting CP", "land": "Portugal", "seizoen": "2023/24", "coach": "Ruben Amorim", "nationaliteit": "Portuguese", "nat_code": "pt", "landstitel": "Y", "nationale_beker": "N", "europese_prijs": "N" }
Now start generating the complete JSON array."
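The completeness rules in the prompt (70 seasons per club, no skipped years, Y/N flags, lowercase country code) lend themselves to an automated check on the generated JSON. The following is a minimal sketch of such a check, not the project's actual tooling: the field names come from the prompt, while the function names `expected_seasons` and `validate` are hypothetical and introduced here only for illustration.

```python
import re
from collections import defaultdict

# Field names are taken from the prompt; everything else is an assumption.
REQUIRED_KEYS = {
    "club", "land", "seizoen", "coach", "nationaliteit",
    "nat_code", "landstitel", "nationale_beker", "europese_prijs",
}
FLAG_KEYS = ("landstitel", "nationale_beker", "europese_prijs")
SEASON_RE = re.compile(r"^\d{4}/\d{2}$")


def expected_seasons(first_start=1955, last_start=2024):
    """Return season labels such as '1955/56' for every starting year."""
    return [f"{y}/{(y + 1) % 100:02d}" for y in range(first_start, last_start + 1)]


def validate(records):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seasons_per_club = defaultdict(set)

    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
            continue
        if not SEASON_RE.match(rec["seizoen"]):
            problems.append(f"record {i}: bad season format {rec['seizoen']!r}")
        for key in FLAG_KEYS:
            if rec[key] not in ("Y", "N"):
                problems.append(f"record {i}: {key} must be 'Y' or 'N'")
        code = rec["nat_code"]
        if len(code) != 2 or code != code.lower():
            problems.append(f"record {i}: nat_code {code!r} is not a lowercase 2-letter code")
        seasons_per_club[rec["club"]].add(rec["seizoen"])

    # Every club must cover the full range of seasons without gaps.
    expected = set(expected_seasons())
    for club, seasons in seasons_per_club.items():
        gaps = sorted(expected - seasons)
        if gaps:
            problems.append(f"{club}: {len(gaps)} missing season(s), first is {gaps[0]}")
    return problems
```

Running `validate` on the parsed output of each country file before importing it would catch skipped seasons and malformed fields early. It does not check whether the listed coach or trophies are factually correct.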

Data Verification and Enrichment

After the initial generation, all data went through a verification pass. Initial plans to build automated API checks into the dashboard proved overly complex. A focused, prompt-based approach with Gemini, pointed at specific online sources, turned out to be the most reliable method.

Trophies

All trophy data was meticulously checked against dedicated Wikipedia pages listing the winners of national titles, national cups, and European competitions. The following definitions were used:

Coaches

The coach listed for each season was manually verified using a combination of Wikipedia manager lists and historical data from Transfermarkt.com. This was a crucial step to ensure accuracy according to the project's core principle: identifying the single, non-interim manager who was in charge for the majority of matches in a given season.
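The selection rule can be expressed compactly in code. The sketch below is illustrative only: the verification described above was done manually, and the `Stint` structure, its fields, and the function `season_coach` are assumptions introduced here. It also interprets "majority of matches" as "the most matches among non-interim managers", which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    """One manager's spell within a single season (hypothetical structure)."""
    coach: str
    matches: int
    interim: bool = False


def season_coach(stints):
    """Pick the non-interim coach with the most matches in the season.

    Returns None if only interim managers were in charge.
    On a tie, the coach who appears first in the input wins.
    """
    totals = {}
    for stint in stints:
        if stint.interim:
            continue  # interim spells are ignored by the project's rule
        totals[stint.coach] = totals.get(stint.coach, 0) + stint.matches
    if not totals:
        return None
    return max(totals, key=totals.get)


# Usage with made-up placeholder data:
stints = [
    Stint("Coach A", matches=12),
    Stint("Caretaker B", matches=20, interim=True),
    Stint("Coach C", matches=14),
]
print(season_coach(stints))
```

Note that the interim manager is skipped even though they oversaw the most matches in this placeholder season, which is the kind of edge case the single-coach rule cannot represent well.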

Images

Finding correct images for each coach was a significant challenge. A custom tool was built into the dashboard that queries the Wikipedia/Wikimedia API to find a corresponding image for each coach's name. However, this automated process was not flawless and required manual oversight to correct errors, such as initially displaying an image of the darts player Phil Taylor instead of the former Liverpool manager of the same name.
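A lookup of this kind can be sketched against the MediaWiki Action API using its `pageimages` property. This is not the dashboard's actual tool, which is not shown here: the function names, the thumbnail size, and the choice of the English Wikipedia endpoint are all assumptions. The sketch takes a page title rather than a bare name, because a name-only lookup is exactly what produces mix-ups like the Phil Taylor case.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def build_pageimages_url(title, thumb_size=300):
    """Build a query URL asking for the lead image of one Wikipedia page."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageimages",
        "titles": title,
        "pithumbsize": thumb_size,
        "redirects": 1,
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_thumbnail(response):
    """Return the thumbnail URL from a parsed API response, or None if absent."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        thumb = page.get("thumbnail")
        if thumb:
            return thumb.get("source")
    return None


def fetch_coach_image(title):
    """Network call: fetch and parse the response for one page title."""
    with urlopen(build_pageimages_url(title)) as resp:
        return extract_thumbnail(json.load(resp))
```

Because several people can share a name, a manually maintained mapping from coach name to exact page title is one way to keep the automated lookup from returning the wrong person.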

Methodological Choices & Known Limitations

This dataset is built on a specific set of rules and, like any dataset, has its limitations. The following choices were made to prioritize clarity and consistency in the visualization:

Future Development

This project is a continuous work in progress. A major future ambition is to expand the dataset to include multiple coaches per season. This would allow for a more nuanced visual representation of managerial instability and provide deeper insights into the core research question. (Status: August 2025)