DatabaseAB_anonymized.xlsx (352.71 kB)

anonymized_database.xlsx

dataset
modified on 2024-02-16, 13:26

The dataset for the empirical study was constructed entirely from secondary data that are available online from www.cqranking.com, www.procyclingstats.com, several news websites and social media. The study covers all cyclists who held a position in the top 200 of the CQ ranking at the end of a year between 2010 and 2019. The 2020 and 2021 cycling seasons are excluded because many races were cancelled due to the COVID-19 pandemic.


  

Performance

The variable to be explained, performance, is defined as the number of CQ points a rider obtained during the studied period of 120 (resp. 150) days. For both time spans, performance is slightly higher in the reference period, at 173.68 (resp. 209.43) CQ points, than in the period after childbirth, at 148.01 (resp. 189.42) CQ points.

Childbirth

The central explanatory variable in the analysis, "childbirth", is used to test whether cyclists' performance during the 120 (resp. 150) days after the birth of a child differs from their performance during the same period in the previous year. The variable equals 1 for the test period after the birth and 0 for the reference period one year earlier.
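As an illustration of how the test and reference periods relate, the sketch below derives both windows from a birth date and attaches the dummy value. The function names, the choice to start the window on the birth date itself, and the handling of 29 February are assumptions made for this example; the dataset description does not specify them.

```python
from datetime import date, timedelta


def observation_windows(birth: date, span_days: int = 120):
    """Return (test, reference) windows as (start, end) date pairs.

    Assumption: the test window starts on the birth date and the
    reference window starts on the same calendar day one year earlier.
    """
    test = (birth, birth + timedelta(days=span_days))
    try:
        ref_start = birth.replace(year=birth.year - 1)
    except ValueError:
        # 29 February has no counterpart in the previous year: use 28 February.
        ref_start = birth.replace(year=birth.year - 1, day=28)
    reference = (ref_start, ref_start + timedelta(days=span_days))
    return test, reference


def childbirth_rows(birth: date, span_days: int = 120):
    """Build the two observations for one birth: dummy 1 (test), 0 (reference)."""
    test, reference = observation_windows(birth, span_days)
    return [
        {"childbirth": 1, "start": test[0], "end": test[1]},
        {"childbirth": 0, "start": reference[0], "end": reference[1]},
    ]


rows = childbirth_rows(date(2015, 3, 10), span_days=120)
```

Calling the same function with `span_days=150` gives the longer time span used in the study.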

Number of race kilometers

The number of race kilometers during the periods under study is an important control variable in the regression. A higher number of race kilometers provides more opportunities to score points, so performance should be higher for riders who have ridden more kilometers. As Table 1 indicates, the average number of race kilometers is slightly lower after having a child than in the previous year, for both time spans. However, a t-test shows that the null hypothesis of a zero difference cannot be rejected (p > 0.1).

Age

Research from ProCyclingStats has shown that riders with a career of ten years or more reach their performance peak, on average, at the age of 28, and that their performance begins to decline steadily from the age of 30 (Donlevy, 2020). In the broadest sample of this study, the mean age of the rider at the birth of his child is 29.24 years (28.24 years at the start of the reference period), with a standard deviation of 3.36 years. Most riders in the sample are therefore observed during the peak period, before performance starts to decline.

First year with current team

According to previous research, the number of years a rider has been riding with a particular team does not impact his performance (Prinz & Wicker, 2012; Rodríguez-Gutiérrez, 2014). However, Rodríguez-Gutiérrez (2014) does find a significant negative effect for riders who have just changed teams and thus are riding for the first year with a particular team. Therefore, we include the variable "first year with current team", a dummy variable that equals 1 when the rider rides with a different team during the time period under study than in the year before. This is the case for just over a quarter of the observations in the dataset. Table 2 shows the frequencies of all categorical control variables in more detail.  

Relative role in the team

Cycling teams have a hierarchy in which the higher-ranking team leaders ('leaders') are supported by lower-ranking helpers ('domestiques') who sacrifice their own chances of winning. For this reason, it can be assumed that riders perform better the stronger their leadership role is. This study measures relative leadership ("Relative role in the team") as the CQ score of a rider divided by the average CQ score of his teammates (excluding the rider) during the period under study. The variable thus expresses how many times stronger a rider is than the average of his teammates over the considered period.
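A minimal sketch of this ratio is given below. The function name and the error handling for an empty team or a zero teammate average are assumptions for illustration; the description does not say how such cases are treated in the dataset.

```python
def relative_role(rider_points: float, teammate_points: list[float]) -> float:
    """CQ points of the rider divided by the mean CQ points of his teammates.

    `teammate_points` must exclude the rider himself.
    """
    if not teammate_points:
        raise ValueError("at least one teammate is required")
    teammate_mean = sum(teammate_points) / len(teammate_points)
    if teammate_mean == 0:
        # Assumption: the ratio is treated as undefined rather than infinite.
        raise ValueError("teammates scored no points; ratio is undefined")
    return rider_points / teammate_mean


# Hypothetical numbers: the rider scored 300 points, his teammates average 100.
role = relative_role(300, [100, 50, 150])  # 3.0
```

A value above 1 indicates a rider who scored more than his average teammate in the period, a value below 1 a rider who scored less.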

Team strength

A logical hypothesis is that a stronger team positively influences the performance of a rider. This hypothesis assumes that the riders under study are team leaders. However, higher team quality can also harm the performance of team leaders if it means that the team has multiple leaders, so that helpers have to divide their support among them (Prinz & Wicker, 2012). The variable "Relative role in the team" controls for these issues. In line with the approach of Rodríguez-Gutiérrez (2014), a proxy variable for team strength is constructed. This variable is defined as the total number of CQ points accumulated by the rider's teammates (excluding the rider's own CQ points) over the period under study.
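The team strength proxy can be sketched as a sum over the team roster that leaves out the rider himself. The function name, the dictionary layout and the rider labels are hypothetical and serve only to illustrate the definition.

```python
def team_strength(team_points: dict[str, float], rider: str) -> float:
    """Total CQ points of the rider's teammates, excluding the rider's own points."""
    if rider not in team_points:
        raise KeyError(f"{rider!r} is not on this team")
    return sum(points for name, points in team_points.items() if name != rider)


# Hypothetical roster with CQ points scored during the period under study.
roster = {"rider_a": 300, "rider_b": 100, "rider_c": 50}
strength = team_strength(roster, "rider_a")  # 150
```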

Proportion of races in home country

The variable "proportion of races in home country" is calculated as the number of races held in a rider's home country in which he participated during the studied period, divided by the total number of races in which he participated during that period. The theoretical assumption, which was empirically confirmed by Rodríguez-Gutiérrez (2014), is that riders are more familiar with the terrain in their home country and/or receive more support from local fans there, so they perform better when a larger share of their races takes place in their home country.
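The calculation can be sketched as follows, assuming each race a rider started is represented by the country in which it was held. The function name and the three-letter country codes are illustrative assumptions, not the coding used in the dataset.

```python
def home_race_share(race_countries: list[str], home_country: str) -> float:
    """Share of a rider's races in the period that took place in his home country."""
    if not race_countries:
        # Assumption: a rider without races in the period has no defined share.
        raise ValueError("rider did not take part in any race in this period")
    home_races = sum(1 for country in race_countries if country == home_country)
    return home_races / len(race_countries)


# Hypothetical rider from Belgium who started four races, two of them at home.
share = home_race_share(["BEL", "FRA", "BEL", "ITA"], "BEL")  # 0.5
```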

Olympic Games

In general, different cycling seasons in the 21st century can be compared well on the basis of CQ scores, since the races that offer many CQ points are organized at about the same time each year. An important irregularity is the Olympic Games. They take place only once every four years, but they offer a lot of CQ points (400 for the winner of the road race and 250 for the winner of the individual time trial). The event may also have an impact on riders who do not participate in the Games, since the field of participants will on average be weaker in the other races taking place around this period. To control for this, a dummy variable "Olympics" is added, which takes the value 1 when an Olympic cycling competition takes place during the relevant period.
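One way to derive such a dummy is to check whether any Olympic event date falls inside the observation window, as sketched below. The function name and the inclusive treatment of the window boundaries are assumptions. The listed dates are the men's road race and individual time trial of the 2012 and 2016 Games as recalled here; they are not taken from the dataset and should be checked against the official schedules.

```python
from datetime import date

# Assumed event dates (men's road race and time trial), to be verified.
OLYMPIC_EVENT_DATES = [
    date(2012, 7, 28), date(2012, 8, 1),   # London 2012
    date(2016, 8, 6), date(2016, 8, 10),   # Rio 2016
]


def olympics_dummy(start: date, end: date, event_dates=OLYMPIC_EVENT_DATES) -> int:
    """Return 1 if any Olympic cycling event date lies within [start, end], else 0."""
    return int(any(start <= event <= end for event in event_dates))


in_olympic_year = olympics_dummy(date(2012, 6, 1), date(2012, 9, 29))    # 1
in_regular_year = olympics_dummy(date(2013, 6, 1), date(2013, 9, 29))    # 0
```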