Assessment of the Service Quality Measurement Model for Youth Football Academies

Abstract: As football in China enjoys broad public enthusiasm driven by unprecedented government promotion, encouraging more young people to participate in the sport has become essential to the revitalization of Chinese football. Under these circumstances, the youth football training industry has been highly capitalized, and both foreign and domestic youth football academies have sprung up. Guided by the service quality literature, it can be argued that youth football academies need to understand which aspects influence players' perceived service quality if they are to attract and retain youth players in the long term. Presented herein is a multidimensional and hierarchical service quality conceptualization with four primary dimensions: physical aspects, program, personnel, and personal development. Ten sub-dimensions were initially proposed to support these four higher-order dimensions. Data were collected from youth players (n = 543) at two youth football academies located in two cities in south-eastern China. In phase one of the study, an exploratory factor analysis (EFA) was conducted to specify the factor structure of the proposed measurement scale. In phase two, confirmatory factor analysis was used to further validate the model as revised on the basis of the EFA results. The results provide support for internal reliability, convergent validity, and discriminant validity, with 46 items retained and two sub-dimensions (i.e., employee expertise and employee attitude) merged and relabeled as employee trustworthiness. This study represents an initial attempt to provide both theoretical and practical insights into perceived service quality assessment in the context of youth football academies.


Introduction
Empirical evidence supports that sport participation is an important form of children's physical activity (Sabo & Veliz, 2008). From a public health perspective, youth sport participation has been widely linked to positive outcomes, including life skills development and mental health improvement (Cairney et al., 2018). Over the years, an increasing number of scholars have focused on identifying key elements in generating desired outcomes from youth sport participation (Bean & Forneris, 2016). For example, Eccles and Gootman (2002) proposed several features to strengthen the experience of youth sport participation, such as appropriate structure, opportunities for belonging, support for efficacy and mattering, and opportunities for skill-building. A Canadian sport activity organization named Sport for Life has also highlighted three components (good programs, good people, and good places) for quality sport program assessment (Bean, 2018). To this day, however, only a few studies have examined measures for assessing service quality in youth sport, and these are either too broad to be applied to a specific context (e.g., Shonk & Chelladurai, 2008) or focused on only one specific quality dimension such as program quality (Bean et al., 2018) or outcome quality (Bean & Kramers, 2017).
Sport Marketing Quarterly, 2022, 31, 305-321, © 2022 West Virginia University
Moreover, although a plethora of previous studies in the participant sport industry have modelled various dimensions into service quality measurement scales under different scenarios such as fitness and recreation service (Afthinos et al., 2005; Alexandris et al., 2006; Chang & Chelladurai, 2003; Lam et al., 2005), sport centers (Ko & Pastore, 2005; Howat & Assaker, 2016), and sport tourism (Shonk & Chelladurai, 2008), there is a lack of research assessing youth players' comprehensive perceptions of the quality of service they receive from sport organizers. This is particularly the case for youth football training in China, where the government has great ambitions for developing the sport nationwide but no context-specific measurement models are available for the assessment of service quality (Qian et al., 2017). With over 6,000 football academies currently operating across the country, youth players' perceptions of service quality have become crucial for both researchers and practitioners (Nong, 2017). As offering quality service to satisfy customers is a key determinant of the success of sport organizations (Ko & Pastore, 2005), it is imperative to clearly define the service quality concept and develop a psychometrically sound measurement tool for youth football participation through a thorough review of existing literature.
A review of the existing literature indicates that a gap-based concept was originally employed to define service quality, with customers' service quality evaluation viewed as the discrepancy between their expectations and the services actually delivered (perceptions) (Parasuraman et al., 1988). However, methodological and applicability challenges (e.g., fluctuation of customer expectations and application in other organizational contexts) have been identified by further studies. Previous scholars argued that service quality is a long-term attitude, whereas consumer judgements fluctuate over time, so it cannot simply be measured by the expectations-disconfirmation model. Thus, a performance-only concept was developed to overcome these shortcomings (Cronin & Taylor, 1994; Murray & Howat, 2002). The youth football training industry is not exempt from these concerns. Football training service is a long-term process that includes various elements of the service encounter, such as the service environment and service delivery process as well as the service program and outcome. Players therefore need to participate for a certain period of time before they can evaluate perceived service quality on these aspects. Context specificity is another issue to be considered, as simply employing existing measurements developed for other industries might miss the uniqueness of youth football academies.
Against this background, the purpose of this study is twofold: (1) to conceptualize a multidimensional and hierarchical model for the perceived service quality evaluation of youth football participation and (2) to detail the psychometric properties of the proposed model through a systematic scale development process. This study therefore aims to fill the conceptual knowledge void existing among youth football academies by providing a comprehensive and industry-specific instrument for quality evaluation. A valid measurement tool can provide practitioners with reliable evaluations of service performance through the eyes of youth participants. Consequently, this study can aid future research by laying a foundation for further examination of perceived service quality and offering a practical tool for evaluating the service quality of youth sport participation.

Multidimensional Model of Service Quality in Sport
As Tokuyama and Greenwell (2011) argued, sport can be consumed in two different ways, either as spectator sport or as participant sport. For spectator sport services, previous studies have developed specific service quality measurement models for different sport spectating settings such as football matches (Theodorakis et al., 2013), American football (Wakefield et al., 1996), and baseball games (Yoshida & James, 2010). Many researchers have then extended the concept of service quality to participant sport, mainly by focusing on the fitness and recreation service (Afthinos et al., 2005; Alexandris et al., 2004; Chang & Chelladurai, 2003; Ko & Pastore, 2005; Lam et al., 2005) and sport tourism (Andam et al., 2015; Shonk & Chelladurai, 2008). Among them, a hierarchical structure tends to be proposed (e.g., Howat & Assaker, 2016; Ko & Pastore, 2005; Lam et al., 2005; Shonk & Chelladurai, 2008). Studies conceptualizing service quality scales have supported the multidimensional structure and provided insights into the various dimensions for sport in general and participant sport in particular. Although this work is of value, the proposed quality models are tied to their own settings and cannot be applied universally: they do not fully reflect the unique service attributes of youth football academies, and they neglect the much-needed context-specific validity. While the comprehensive review of the previous service quality literature offers a deep understanding of the conceptual scales in participant sport, the need to establish a context-specific and hierarchical service quality scale has been argued repeatedly (Howat & Assaker, 2016; Ko & Pastore, 2005). In response, it is argued that a systematic investigation is needed to develop a multidimensional service quality model to assess the perceived service quality of youth football participation, as shown in Figure 1.

Model Conceptualization
Building on existing service quality measurement scales in participant sport (e.g., Howat & Assaker, 2016; Ko & Pastore, 2005; Lam et al., 2005), the model proposed in this study has a hierarchical structure with multiple dimensions on three levels. Firstly, the physical aspect mainly focuses on facilities and physical surroundings in the previous scales (Andam et al., 2015; Afthinos et al., 2005; Alexandris et al., 2004; Ko & Pastore, 2005) and comprises three subdimensions: equipment, ambiance, and convenience. Secondly, program quality is perceived from two aspects: the range of program (e.g., programs for different age groups, programs for diverse levels of players, and flexible program schedules) and the feasibility of receiving up-to-date program information. Thirdly, personnel quality mainly focuses on the appearance, knowledge, attitude, problem-solving abilities, courtesy, responsiveness, and empathy shown in the way the service is delivered (Chang & Chelladurai, 2003; Ko & Pastore, 2005; Lam et al., 2005). The three subdimensions of employee expertise, employee attitude, and employee performance are incorporated to assess the interaction quality. Lastly, personal development is comparable to outcome quality in previous studies, which represents the "perceived physical, psychological, and sociological benefits" (Alexandris et al., 2012, p. 62) throughout the service process. A two-dimensional personal development construct described by sociability and individual improvement is included. Overall, the hierarchical service quality model consists of four primary dimensions and 10 subdimensions (see Figure 1).

Physical Aspect
In this study, the term physical aspect is used in assessing youth football academies. Three subdimensions (equipment, ambiance, and convenience) are identified as contributing to the quality of the physical aspect. Firstly, the attributes associated with equipment and materials (e.g., modern-looking training equipment, variety of equipment, and overall maintenance of equipment) can be found in studies of fitness-health clubs (Afthinos et al., 2005; Aslan & Kocak, 2011; Lam et al., 2005). Supporting facilities are also identified in previous service quality studies, such as parking facilities (Afthinos et al., 2005; Ko & Pastore, 2005; Lam et al., 2005), snack bars, and accommodation (Costa et al., 2004; Lam et al., 2005).
Secondly, Kim and Kim (1995) included the non-visual elements of service surroundings, such as comfortable temperature, adequate space, brightness, cleanliness, and pleasant interior, as ambient attributes. Items related to lighting, surrounding décor, and safety were also identified in other studies (Costa et al., 2004; Howat et al., 1996; Shonk & Chelladurai, 2008). A favorable training ambiance can help youth players maintain good physical and mental conditions in training, both of which are beneficial for their training performance and skill improvement (Maughan et al., 2004).
Finally, the term convenience normally refers to the measurement of access to locations, service change options (e.g., approaches to pay bills, easy to change membership services), and hours of operation (Andam et al., 2015; Afthinos et al., 2005; Chang & Chelladurai, 2003; Ko & Pastore, 2005; Shonk & Chelladurai, 2008). In a recent study conducted by Yoshida and Nakazawa (2016), convenience innovativeness was proposed as one single dimension for the service innovative sport consumption scale. Similar to Yoshida and Nakazawa's (2016) model, the concept of convenience is proposed as a subdimension under physical aspect. Taking all the above into consideration, the following hypotheses are posed:
H1: Equipment significantly influences service quality in youth football.
H2: Ambiance significantly influences service quality in youth football.
H3: Convenience significantly influences service quality in youth football.

Program
In the youth football training context, academies that provide various types of programs covering different age groups, levels, and time slots with up-to-date program information are usually perceived by participants as offering better service quality (Howat & Assaker, 2016). Two major sub-dimensions are conceptualized under the program dimension in this study: the range of program (e.g., Afthinos et al., 2005; Aslan & Kocak, 2011; Costa et al., 2004; Howat et al., 1996; Kim & Kim, 1995; Ko & Pastore, 2005; Lam et al., 2005) and program information (e.g., Afthinos et al., 2005; Aslan & Kocak, 2011; Howat et al., 1996; Kim & Kim, 1995; Ko & Pastore, 2005; Lam et al., 2005). The range of program represents the diversity of courses or classes from which participants can choose, and previous studies in participant sport have indicated its great importance to participants' quality evaluations (Afthinos et al., 2005; Howat et al., 1996; Ko & Pastore, 2005; Lam et al., 2005).
Program information refers to access to up-to-date information about programs, which could help customers better understand the programs offered (Ko & Pastore, 2005). It also includes the availability of program time arrangements, appropriate class sizes and age groups, program brochures, and so on (Chang & Chelladurai, 2003; Howat & Assaker, 2016; Ko & Pastore, 2005; Lam et al., 2005). Building on previous research, the attributes of the range of program are assessed to see whether the academy has configured separate training sessions from beginner to advanced players, offered echelon building for different age groups, and organized various activities and competitions for players. Moreover, up-to-date information, accurate and user-friendly instructions, and informative brochures for the program introduction can be considered program information elements as well. Therefore, the following hypotheses are proposed:
H4: Range of program significantly influences service quality in youth football.
H5: Program information significantly influences service quality in youth football.

Personnel
Personnel quality has a direct influence on customers' perceptions of service providers, as most types of service involve interactions (Lam et al., 2005). It represents the dynamic interactions between service employees and customers during the service encounter process (Ko & Pastore, 2005). In this study, employee expertise, employee attitude, and employee performance are incorporated to assess the interaction quality between coaches and youth players.
Firstly, service personnel's expertise refers to staff's professional knowledge and skills, which originates from the assurance dimension of the SERVQUAL scale (Parasuraman et al., 1988). In participant sport, all reviewed studies have reconfirmed the importance of employee expertise (e.g., Afthinos et al., 2005; Aslan & Kocak, 2011; Costa et al., 2004; Ko & Pastore, 2005; Lam et al., 2005; Shonk & Chelladurai, 2008). The lack of experienced and professional coaches can be a significant reason for the poor development of youth and grassroots football (Connell, 2018). It is worth noting that many academies have now launched football-specific initiatives in this regard. For instance, bringing in foreign coaches equipped with professional qualifications and knowledge is viewed as a key differentiating factor (Lee & No, 2020). Indicators of employee expertise are designed around staff's knowledge and skills, responsibility, and consistency.
Secondly, many scholars have documented the importance of employee attitude, including courteousness and respect (Afthinos et al., 2005; Aslan & Koçak, 2011; Shonk & Chelladurai, 2008), understanding of customers' needs (Afthinos et al., 2005), and dealing with complaints (Howat et al., 1996; Lam et al., 2005). Youth football training is considered a sport participation with high levels of interaction between coaches and players. The coaches' manner of speaking and overall attitude can thus influence the players' subjective evaluation of staff quality.
Finally, employee performance can be understood as the actual behavior of a service provider during service delivery (Parasuraman et al., 1988). Indicators from previous models that relate to on-time service and promise keeping can be included in the evaluation of coaches' actual performance (Aslan & Kocak, 2011; Lam et al., 2005). Based on the previous discussion, coaches' teaching effectiveness, communications with parents and children, and the service promises made by academies contribute to the factor of actual coach performance. As a result, the following hypotheses are developed:
H6: Employee expertise significantly influences service quality in youth football.
H7: Employee attitude significantly influences service quality in youth football.
H8: Employee performance significantly influences service quality in youth football.

Personal Development
Personal development is referred to as outcome quality in previous studies, which represents what customers gain when the act of the service ends (Martinez Garcia & Martinez Caro, 2010), and it is considered two-dimensional (sociability and individual improvement) in this study. Firstly, sociability indicates customers' gains from social interaction (Ko & Pastore, 2005). Previous studies have applied elements of social experiences, such as social-interacting opportunities, sense of enjoyment among family and friends, and socialization by participating in sport, in their service quality measurement scales (Afthinos et al., 2005; Howat & Assaker, 2016; Ko & Pastore, 2005). These service attributes offer researchers a closer perspective from which to understand sport participants' emotional responses (Ko & Pastore, 2005). For youth football participants, sociability refers to more implicit achievements including teamwork spirit, respect for coaches and teammates, persistence, and so on.
Further, individual improvement represents physical, mental, and ability benefits received after the service consumption process (Brady & Cronin, 2001). Competition success, physical change (e.g., improving health), psychological well-being (e.g., reducing stress and improving mood), and relaxation and stress release have all been confirmed as ingredients of outcome quality (Alexandris et al., 2004; Howat & Assaker, 2016). Individual improvement in the youth football training setting is thus related to elements of physical fitness change, skills and techniques progress, and psychological gains. Hypotheses are thus proposed as follows:
H9: Sociability significantly influences service quality in youth football.
H10: Individual improvement significantly influences service quality in youth football.
Taking all of the above into consideration, the service quality scale is proposed as a hierarchical structure with four primary dimensions supported by 10 subdimensions. The method employed to test the reliability and validity of these measures is described next.

Research Participants and Data Collection
To test the proposed model, rigorous analytical techniques were applied, and youth football participants in China were targeted for data collection. The context was chosen because the unprecedented boom of football in China has stimulated the flourishing development of the youth football training industry, making it fertile ground for further investigation (Connell, 2018). Research ethics approval was granted by the primary researcher's university according to the institutional guidelines.
The process of data collection in this study went through two stages. In the first stage, the researchers used their own personal connections in the youth football training industry in China to recruit suitable participants. The researchers contacted the sport director of one overseas-oriented academy and explained the aim and process of conducting the questionnaire survey among youth football players. Because the target sample consisted of underage participants, all instructors were asked to send the documents to the players' parents or legal guardians, and consent was sought for their participation in the study. Once approved, the survey link was sent to players. However, invalid data needed to be excluded due to missing values on several items and/or a repeated or consecutive pattern of answers. After the deletion of poor-quality data, the remaining 134 datasets fell short of the number of samples required for the analyses. Thus, the researchers contacted a Chinese marketing and consulting company named iResearch, which is the leading provider of data products, analytics, and consulting services in China, to collect further data. As iResearch is headquartered in Shanghai, an overseas-oriented football academy in Shanghai was targeted for the second stage of data collection, and 409 datasets were collected with a non-probability convenience sampling technique. Together with the 134 datasets from stage one, a total of 543 valid datasets were finally obtained, which is appropriate for statistical analysis.

Item Generation
An initial pool of items was mainly generated through a systematic review of existing service quality literature in participant sport (e.g., fitness-health clubs, recreational sport, and sport tourism) for the specified dimensions. It is necessary to note that some items were self-generated due to the limited number of service quality studies on youth football training. The aim of item generation at this stage was to form an initial item pool that clearly and accurately represents the proposed dimensions in the context of youth football academies in China. As a result, the initial pool with 102 items measuring 10 dimensions was prepared for further qualitative examinations.

Content Validity
In order to assess the content validity of the items, a panel of five experts was invited, comprising four knowledgeable scholars in the fields of sport marketing and/or scale development in the United Kingdom and Singapore and one qualified U14 elite football head coach in China. The role of the panel of experts was to determine whether the contents of the measures represented all facets of their respective construct or if there were any missing components. The panel was also asked to provide comments on other aspects of item quality such as clarity, conciseness, grammar, face validity, redundancy, and the need to add or delete items (Worthington & Whittaker, 2006). Based on the feedback from the panel members, 38 out of 102 items were revised: some words that were too absolute or strong were toned down, and ambiguous phrases were clarified to be more accurate for participants' understanding. In addition, it was suggested that five items be moved to different dimensions. After eliminating nine items from the initial item pool because of irrelevance or irrationality, 93 items were brought to the next stage.

Q-sort Analysis
After the content validation, a Q-sorting analysis was implemented to evaluate the associations between the hierarchical dimensions and the items related to those dimensions. Participants for the Q-sorting analysis were recruited from a wide pool of full-time students enrolled in sport-related postgraduate programs. The Q-sort survey was completed by 39 postgraduate students attending a large university in the UK, 28 of whom were master's students. All 93 items were randomly shuffled, and the participants were asked to sort items into the specific dimensions that best described their opinions. To evaluate the Q-sort survey feedback, the criteria suggested by Ekinci and Riley (2001) were adopted. They asserted that an item could be retained as long as it was allocated to the same category by over 60% of the participants (Ekinci & Riley, 2001). After removing 16 items that did not meet the cut-off, the remaining 77 items were retained for the questionnaire.
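The 60% retention rule can be illustrated with a minimal sketch. The item labels, dimension names, and sorter responses below are hypothetical toy data, and the function name `qsort_retained` is an assumption made for illustration; only the "over 60%" threshold comes from the text.

```python
from collections import Counter

def qsort_retained(assignments, intended, cutoff=0.60):
    """Keep items whose intended dimension was chosen by more than `cutoff` of sorters."""
    retained = []
    for item, choices in assignments.items():
        # Share of sorters who placed the item in its intended dimension
        share = Counter(choices)[intended[item]] / len(choices)
        if share > cutoff:
            retained.append(item)
    return retained

# Hypothetical toy data: five sorters, two items
assignments = {
    "EQ1": ["equipment", "equipment", "equipment", "equipment", "ambiance"],
    "AM1": ["ambiance", "equipment", "convenience", "ambiance", "equipment"],
}
intended = {"EQ1": "equipment", "AM1": "ambiance"}

kept = qsort_retained(assignments, intended)
print(kept)
```

In this toy case EQ1 is placed in its intended dimension by four of five sorters (80%) and is kept, whereas AM1 reaches only 40% and is dropped.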

Instrument Design
Randomly shuffled items were presented in the third part, covering 10 dimensions: equipment (7 items); ambiance (8 items); convenience (9 items); range of program (7 items); program information (9 items); employee expertise (8 items); employee attitude (9 items); employee performance (6 items); individual improvement (7 items); and sociability (7 items). A 7-point Likert scale, anchored with strongly disagree (1) and strongly agree (7), was used. In addition, several demographic items such as age, gender, education level, and name of football academy were included to understand the participants' basic demographic profile. To ensure the quality of the translated questionnaire, the primary English version was translated into Chinese using a back-translation method (Brislin, 1990). The original English version was first translated into Chinese by a bilingual Chinese individual who was fluent in both languages. Then, one Chinese doctoral researcher, who was fluent in English and had not seen the original version, was asked to translate it back into English. After that, another doctoral researcher was asked to assess the differences between the two versions and suggest modifications to enhance the accuracy of the translation. Finally, the back-translation version was examined and corrected, and it was then back-translated into Chinese again to form the final questionnaire in Chinese.

Data Analysis
Data were randomly split into two sets and analyzed in the following steps: preliminary analysis and exploratory factor analysis (EFA) for the first sample set, and preliminary analysis and two confirmatory factor analyses (CFAs) for the second sample set. The preliminary analysis drew an overall picture of the data by screening data, identifying outliers, checking normality, and testing internal consistency of the measures, which are essential preparations for further analysis (Hair et al., 2010). Then, EFA was undertaken to reduce the large number of variables into smaller sets and group them together (Kline, 2011). This helped to identify inappropriate items and examine whether items loaded on their intended dimensions. After that, a two-step method was employed to conduct the CFA based on the EFA-derived scale. Specifically, the measurement model with nine lower-order factors was assessed first. In this stage, the relationships between the observable items and their latent constructs (i.e., nine subdimensions) were specified in the model, and the overall model fit, reliability, and validity of the individual measures were examined. Next, the full measurement model was tested by determining the relationships between the nine sub-dimensions and the four primary dimensions as well as the relationships between the four primary dimensions and overall service quality. Overall model fit, convergent validity, and discriminant validity were tested as well.
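The random split can be sketched as follows. This is only an illustration: the integer case IDs, the seed, and the function name `split_sample` are assumptions, the first set is sized to the 203 cases reported for the EFA, and the size of the second set (340) is inferred by subtraction from the 543 valid responses rather than stated in the text.

```python
import random

def split_sample(ids, first_size, seed=0):
    """Shuffle case IDs reproducibly and cut them into two disjoint sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local generator, so global state is untouched
    return shuffled[:first_size], shuffled[first_size:]

# Hypothetical IDs 1..543 standing in for the valid responses
efa_ids, cfa_ids = split_sample(range(1, 544), first_size=203)
print(len(efa_ids), len(cfa_ids))
```

Fixing the seed keeps the split reproducible, so the same cases feed the EFA and CFA stages on every run.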

Preliminary Analysis
Data screening was conducted in the initial data preparation phase. Extreme responses including consecutive patterns (e.g., 7, 7, 7 …) and repeated patterns (e.g., 6, 7, 6, 7 …) were deleted, as they could harm the quality of the data and bias the results of further analyses (Meade & Craig, 2012; Tabachnick & Fidell, 2001). Demographic information gathered included sex, age, and education level. For the first data set for EFA (n = 203), male and female participants accounted for 97.45% and 2.55% of the sample, respectively. The age of respondents ranged from 7 to 16 years old. The largest age groups were nine and 11 years old, which represented 13.79% and 14.78% of participants, respectively. The least represented age group was 16 years old, accounting for 1.48%. In addition, respondents were asked for their education level. Most youth players attended primary school (71.43%), and just over a quarter (26.11%) were in middle school. Only 2.46% of participants were in high school.
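One simple way to operationalize the screening of consecutive and repeated answer patterns is sketched below. The exact rule used in the study is not reported, so the logic here (flagging a response only when the whole answer vector is constant or a strict two-value alternation, with a hypothetical minimum length of four answers) is an assumption for illustration.

```python
def is_patterned(responses, min_length=4):
    """Flag straight-lining (7, 7, 7, ...) or a strict two-value alternation (6, 7, 6, 7, ...)."""
    if len(responses) < min_length:
        return False  # too short to judge; threshold is an assumption
    straight = len(set(responses)) == 1
    alternating = (
        responses[0] != responses[1]
        and all(r == responses[i % 2] for i, r in enumerate(responses))
    )
    return straight or alternating

# Hypothetical respondents on a 7-point scale
print(is_patterned([7, 7, 7, 7, 7, 7]))  # consecutive pattern
print(is_patterned([6, 7, 6, 7, 6, 7]))  # repeated pattern
print(is_patterned([5, 6, 7, 4, 5, 6]))  # varied answers
```

A production screen would likely also look at long runs within a response rather than only whole-vector patterns; that refinement is left out to keep the sketch short.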
All measures were subjected to examinations of normality and outliers. Based on the rule of thumb from Kline (2011), variables with absolute skewness and kurtosis values less than two are considered to demonstrate a normal distribution. In addition, outliers are items with standardized values (z-scores) of four or higher, which could also add bias to the results (Kline, 2011). Descriptive statistics showed that skewness and kurtosis values ranged from -0.46 to 1.04 and from -1.06 to 0.15, respectively. No outliers were found either. The results showed that all items met the criteria and thus were acceptable, and no item was deleted.
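The normality and outlier rules of thumb can be expressed in a short sketch. It assumes moment-based skewness and excess kurtosis (statistical packages often report bias-corrected versions, which differ slightly in small samples) and z-scores based on the sample standard deviation; the function name and the toy scores are hypothetical.

```python
import math

def screen_item(values, shape_limit=2.0, z_limit=4.0):
    """Return skewness, excess kurtosis, and flags for the |value| < 2 and |z| >= 4 rules."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0                      # excess kurtosis (normal = 0)
    sd = math.sqrt(m2 * n / (n - 1))               # sample standard deviation
    max_abs_z = max(abs(v - mean) / sd for v in values)
    return {
        "skewness": skew,
        "kurtosis": kurt,
        "normal_ok": abs(skew) < shape_limit and abs(kurt) < shape_limit,
        "has_outlier": max_abs_z >= z_limit,
    }

# Hypothetical scores for one item on the 7-point scale
result = screen_item([1, 2, 3, 4, 5, 6, 7])
print(result)
```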
Reliability was also examined prior to the EFA. Cronbach's alpha (> .70) and corrected item-to-total correlation (> .50) values were calculated to examine the internal consistency of the measures (Hair et al., 2010). The Cronbach's alpha of all dimensions was higher than .70, ranging from .85 to .95. However, several items showed low item-to-total correlation values and needed to be removed to strengthen internal consistency. More specifically, two items (AM5 and AM6) in ambiance, one item (CO8) in convenience, one item (RP7) in range of program, two items (PI8 and PI9) in program information, one item (EA4) in employee attitude, two items (IN6 and IN7) in individual improvement, and two items (SO4 and SO5) in sociability had item-to-total correlation scores below the .50 cut-off. These eleven items were removed following the reliability tests, and thus 66 items were retained for the EFA.
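For readers who wish to reproduce this kind of check, the following sketch computes Cronbach's alpha and the corrected item-to-total correlation (each item correlated with the sum of the remaining items in its dimension). The scores are hypothetical toy data and the function names are assumptions; only the .70 and .50 cut-offs come from the text.

```python
import math
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))

def corrected_item_total(items):
    """Correlate each item with the sum of the other items."""
    result = []
    for i, col in enumerate(items):
        rest = [sum(row) - row[i] for row in zip(*items)]
        result.append(pearson(col, rest))
    return result

# Hypothetical dimension with three consistent items and one weak item
items = [
    [1, 2, 3, 4, 5, 6, 7],
    [2, 2, 3, 5, 5, 6, 7],
    [1, 3, 3, 4, 6, 6, 7],
    [4, 1, 6, 2, 7, 3, 5],  # weak item
]
alpha = cronbach_alpha(items)
r_it = corrected_item_total(items)
weak = [i for i, r in enumerate(r_it) if r < 0.50]
print(round(alpha, 2), [round(r, 2) for r in r_it], weak)
```

In this toy example only the fourth item falls below the .50 cut-off, which mirrors how candidate items for removal would be identified.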

Exploratory Factor Analysis
Using the first split sample set (n = 203), the internal consistency test and EFA were conducted in step one. Eleven items were removed during the reliability test due to item-total correlation scores below the .50 cut-off (Nunnally & Bernstein, 1994), and the remaining 66 items were ready for EFA. For EFA, the KMO statistic was .95, which is above the minimum requirement of .70 (Hair et al., 2010). This indicated that a high proportion of the variance among the variables was common variance and that the variables were adequate for factor analysis. Furthermore, Bartlett's test of sphericity was significant (p < .05), which supported the suitability of the data for factor analysis. Next, a total of 11 factors with eigenvalues ranging from 1.04 to 30.06 were extracted, explaining 76.90% of the total variance. Finally, the rotated component matrix revealed several problems, namely items with loadings below .40, items loading on the wrong factors, merged factors, and cross-loading items; 14 items (EQ3, EQ4, CO6, CO7, CO9, EE1, EE2, EE3, EE5, EE7, EE8, EA2, EA8, and S2) were then eliminated. In addition, employee expertise and employee attitude loaded on the same factor and were merged and relabeled as employee trustworthiness.
In conclusion, the refined model had a hierarchical structure with four primary dimensions supported by a total of nine subdimensions. A total of 52 items were retained for a further validation test. Table 1 shows the results of the EFA.
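The item screening applied to the rotated loadings can be sketched as a simple rule set. The .40 minimum loading comes from the text; the cross-loading cut-off (a second loading of .40 or more), the item and factor labels, and the loading values are assumptions for illustration, and each item is assumed to have loadings on at least two factors.

```python
def screen_loadings(loadings, intended, min_loading=0.40, cross_cutoff=0.40):
    """loadings: {item: {factor: loading}}; intended: {item: factor}.
    Returns {item: reason} for items that fail a screening rule."""
    flagged = {}
    for item, row in loadings.items():
        # Rank factors by absolute loading, strongest first
        ranked = sorted(row.items(), key=lambda kv: abs(kv[1]), reverse=True)
        (top_factor, top), (_, second) = ranked[0], ranked[1]
        if abs(top) < min_loading:
            flagged[item] = "low loading"
        elif top_factor != intended[item]:
            flagged[item] = "wrong factor"
        elif abs(second) >= cross_cutoff:
            flagged[item] = "cross-loading"
    return flagged

# Hypothetical rotated loadings for four items on two factors
loadings = {
    "X1": {"F1": 0.82, "F2": 0.10},
    "X2": {"F1": 0.35, "F2": 0.20},
    "X3": {"F1": 0.15, "F2": 0.70},
    "X4": {"F1": 0.62, "F2": 0.48},
}
intended = {"X1": "F1", "X2": "F1", "X3": "F1", "X4": "F1"}

flagged = screen_loadings(loadings, intended)
print(flagged)
```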

Step One: CFA for the First-Order Measurement Model
The overall model fit was examined first and found to be adequate: χ²(1238) = 1883.19, χ²/df = 1.52, RMSEA = .05, SRMR = .06, TLI = .91, CFI = .92 (Hair et al., 2010; Kline, 2011). For the reliability test, the CR values of all dimensions, ranging from .87 to .93, were satisfactory. For convergent validity, nine subdimensions were measured by 52 indicators. Specifically, one item (ET8) had a very low loading (.16), and five items (i.e., ET7, PI5, PI7, EP3, and EP4) had factor loadings ranging from .57 to .69, which were close to but still lower than the cut-off of .707; all six items were therefore removed after the first CFA. All dimensions had AVEs above .50, ranging from .55 to .71, supporting convergent validity of the measures (see Table 1 for more details). Details of the individual factor loadings, CRs, and AVEs are presented in Table 2.
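The CR and AVE statistics can be computed from standardized loadings with the conventional formulas, CR = (Σλ)² / [(Σλ)² + Σ(1 − λ²)] and AVE = Σλ² / k. The sketch below assumes standardized loadings with uncorrelated errors; the loading values are hypothetical. It also shows why .707 is a natural loading cut-off: a loading of .707 corresponds to roughly 50% of an indicator's variance being explained by its factor.

```python
def composite_reliability(loadings):
    """CR from standardized loadings, assuming uncorrelated measurement errors."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)

def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for one subdimension
lam = [0.80, 0.80, 0.80]
cr = composite_reliability(lam)
ave = average_variance_extracted(lam)
print(round(cr, 3), round(ave, 3))

# A single indicator at the .707 cut-off explains about half of its variance
print(round(average_variance_extracted([0.707]), 3))
```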
Moreover, in relation to discriminant validity, correlation estimates between dimensions need to be lower than .85 (Kline, 2011). Table 3 shows that the correlation estimates for all nine dimensions, ranging from .07 to .72, were lower than this threshold. Furthermore, the square roots of the AVE values of all subdimensions, ranging from .74 to .84, were greater than their respective correlation coefficients, indicating that the indicators had more in common with their respective dimensions than with the other dimensions in the study domain, which supported discriminant validity.
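The comparison of square-root AVE values against inter-dimension correlations can be sketched as follows. The helper and the placeholder dimension names "A" and "B" are hypothetical; the numeric inputs are the extremes of the ranges reported above (lowest AVE of .55, highest AVE of .71, highest correlation of .72), not the full Table 3.

```python
import math
from typing import Dict, Tuple


def fornell_larcker_ok(ave: Dict[str, float],
                       correlations: Dict[Tuple[str, str], float]) -> bool:
    """True if, for every pair, |r| is below the square root of the AVE
    of both dimensions involved in that correlation."""
    for (a, b), r in correlations.items():
        if abs(r) >= math.sqrt(ave[a]) or abs(r) >= math.sqrt(ave[b]):
            return False
    return True


# Square roots of the lowest and highest reported first-order AVEs
sqrt_low = round(math.sqrt(0.55), 2)
sqrt_high = round(math.sqrt(0.71), 2)

# Worst case built from the reported ranges: two dimensions with the
# lowest AVE paired with the highest reported correlation.
worst_case = fornell_larcker_ok({"A": 0.55, "B": 0.55}, {("A", "B"): 0.72})

print(sqrt_low, sqrt_high, worst_case)
```

Because even this worst-case pairing passes, the reported ranges are internally consistent with the conclusion that discriminant validity holds at the first-order level.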
Step Two: CFA for the Second-Order Measurement Model
Since the current model was developed as a hierarchical and multilevel structure, the four primary dimensions were proposed as second-order dimensions that explained nine first-order subdimensions. Therefore, a second-order CFA was conducted to test the relationships between the four primary dimensions and the nine first-order subdimensions using the remaining 46 indicators.
Firstly, the model fit indices for the second-order model were found to be adequate: χ²(1261) = 1445.72, χ²/df = 1.48, RMSEA = .05, SRMR = .07, CFI = .93, TLI = .93. Similar to the first-order CFA results, the CR values for all four primary dimensions, ranging from .71 to .84, were over the .70 cut-off (Hair et al., 2010; see Table 4). Therefore, the internal consistency was demonstrated, and the construct reliability for the second-order model was also established. Next, with regard to convergent validity, three subdimensions (equipment, convenience, and individual improvement) had factor loadings lower than the cut-off value of .707, namely .62, .56, and .52, respectively (see Table 4). In addition, the AVE value for physical aspects, which comprises equipment, ambiance, and convenience, was .46, revealing a lack of convergent validity. Finally, for discriminant validity, the correlation estimates among the four primary dimensions were all under the .85 cut-off, ranging from .29 to .56 (Kline, 2011). Moreover, the square roots of the AVE values of all four primary dimensions, ranging from .68 to .85, were larger than their corresponding inter-construct correlation estimates, which offered further evidence for discriminant validity. Hence, the dimensions in the model were deemed distinct from each other and satisfied the requirements of discriminant validity.
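To make the second-order loadings easier to interpret, the sketch below squares each reported loading to obtain the share of a subdimension's variance accounted for by its primary dimension. The three loadings are those reported in the paragraph above; the helper name is illustrative, and the interpretation assumes standardized loadings.

```python
CUTOFF = 0.707  # loading threshold used in the text

# Second-order loadings reported above
# (subdimension -> loading on its primary dimension)
reported = {
    "equipment": 0.62,
    "convenience": 0.56,
    "individual improvement": 0.52,
}


def shared_variance(loading: float) -> float:
    """Squared standardized loading: share of variance explained by the factor."""
    return loading ** 2


shared = {name: shared_variance(l) for name, l in reported.items()}
threshold_share = shared_variance(CUTOFF)

for name, share in shared.items():
    status = "below" if share < threshold_share else "at or above"
    print(f"{name}: {share:.2f} ({status} the {threshold_share:.2f} implied by the cut-off)")
```

Each of the three subdimensions shares well under half of its variance with its primary dimension, which is the sense in which these loadings fall short of the .707 criterion.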

Discussion
This study sought to propose and validate a service quality framework for youth football academies while demonstrating the hierarchical structure with four higher-order dimensions that are further supported by nine subdimensions. Compared with the conceptual framework, the final results indicate that the hierarchical measurement scale is psychometrically sound with some modifications made.
In the EFA stage, 14 items were identified as problematic through examination of the rotated pattern matrix, owing to low factor loadings, cross-loadings, or loadings on unintended factors. Specifically, EQ3, EQ4, CO6, CO7, CO9, and SO2 were eliminated because ambiguous wording in the item descriptions led to biased understanding. The words "sufficient" and "well maintained" used in EQ3 and EQ4, for instance, were too ambiguous to be clearly comprehended and thus caused misunderstandings among the respondents. Furthermore, the dimensions of employee attitude and employee expertise were merged and relabeled as employee trustworthiness, since many items proposed for these two dimensions loaded on a single factor. This phenomenon is identified as under-factoring, which indicates that extracted variables are mixed and load on the wrong factor (Fava & Velicer, 1996). For instance, phrases such as "to organize" and "help us" in EE4 in the employee expertise dimension might lead respondents to perceive this item as an evaluation of coaches' initiative and enthusiasm in organizing and helping during training, which is not relevant to coaches' professionalism. The results are consistent with previous studies that include both expertise- and attitude-related attributes under a single dimension (Lam et al., 2005; Shonk & Chelladurai, 2008). After these revisions, 52 items were retained for the CFA.
Two CFAs were then conducted on the first-order and second-order factor models independently, and the model goodness-of-fit, reliability, and validity were examined. Overall, satisfactory results were obtained throughout the two-stage CFA, showing strong evidence of the psychometric properties of the measures in both models. To be specific, all model fit estimates for the first-order CFA were acceptable, which demonstrated a good global fit. For the internal fit, the loadings of ET8 (γ = .16), ET7 (γ = .57), PI5 (γ = .68), PI7 (γ = .68), EP3 (γ = .68), and EP4 (γ = .69) indicated that those items had more unique variance than common variance, and they were thus removed. The data evidenced reliability, convergent validity, and discriminant validity for the first-order measurement model. For the second-order CFA, the results of both absolute and relative model fits were satisfactory. After eliminating six items based on the first-order CFA results, the factor loadings of the remaining 46 items were satisfactory (> .707). However, problematic factor loading values in higher-order dimensions were identified. The relationships between equipment and physical aspect (.62), convenience and physical aspect (.56), individual improvement and personal development (.52), physical aspect and overall service quality (.64), personnel and overall service quality (.69), and personal development and overall service quality (.42) were lower than the .707 threshold. It should be noted that the use of CFA in evaluating factorial validity is not straightforward, especially when examining a multidimensional scale with many psychometric measures (Hopwood & Donnellan, 2010). Hopwood and Donnellan (2010) added that applying the same criteria used for relatively short and simple models is not appropriate for lengthy and complex measures.
It would be lopsided to reject a model solely on the basis of a weak model fit index or weak factor loadings; other types of validity evidence should also be considered. Taking the model complexity into consideration, the .707 cut-off could be too stringent for determining validity. As the CR values of all dimensions were acceptable, convergent validity was deemed adequate, even though the AVE of the physical aspect dimension was less than .50. The validated hierarchical and multidimensional structure is in line with previous studies (Ko & Pastore, 2005; Howat & Assaker, 2016; Shonk & Chelladurai, 2008). For instance, the three subdimensions under the physical aspect, which have been widely identified in the participant sport industry, are considered essential for youth football academies (Alexandris et al., 2004; Andam et al., 2015; Aslan & Koçak, 2011; Ko & Pastore, 2005; Lam et al., 2005). Moreover, the two-dimensional personal development quality is similar to that in the study of Ko and Pastore (2005). Therefore, considering the model complexity with multi-order dimensions, the second-order CFA is deemed to exhibit a psychometrically sound model. A total of 46 items were finally retained for the current service quality measurement model.

Theoretical Implications
Since previous studies have asserted that service quality evaluation is a vital practice in creating superior value and fulfilling customers' satisfaction, it is desirable to understand the determinants of service quality in youth football academies from the players' perspective. Toward this end, this multidimensional and hierarchical conceptualization is presented and validated, which offers a theoretical contribution to the quality assessment of participant sport service in two ways. Firstly, by extending the previous scales of participant sport, the proposed hierarchical model contributes to the sport marketing literature by establishing the construct validity of the proposed scale. Specifically, although Ko and Pastore (2005) conceptualized and tested the scale of service quality in recreational sport (SSQRS) model for recreational sport segments, it is still too contextual and cannot necessarily be utilized beyond the context of campus recreational sport service. In summary, the hierarchical and integrated model of service quality may fill the conceptual gap in understanding service quality assessment for youth football participation, a gap that exists in the wider research realm of service quality in the participant sport industry.
Secondly, this study offers a deeper examination of outcome quality for participant sport from a discrete rather than blended perspective, which is similar to Ko and Pastore (2005) and Howat and Assaker (2016), who proposed multiple subdimensions to represent the overall outcome quality. Outcome quality is similar in nature to personnel quality, in that participants are cocreators of the service and their perceived quality is formed through the interactions during service encounters. As a result, they tend to assess outcome quality based on social, physical, and mental changes, which are less controlled by service providers. However, previous studies have ignored participants' mental improvement attributes and loaded all outcome items on a single dimension (Alexandris et al., 2004; Ko & Pastore, 2005). In youth football academies, players come to learn football with different purposes (e.g., to make more friends, to improve football skills, or to cultivate strong teamwork spirit). Therefore, the cocreated outcome perceptions that players might have about academies need to be differentiated so that their quality assessments of different aspects can be captured. The CFA results revealed clear evidence of convergent and discriminant validity of the measures in sociability and individual improvement, providing strong support for the multidimensional structure of personal development quality. Future research could consider refining measures so that they are relevant to different purposes and constructs rather than combining them into a universal dimension.

Practical Implications
The present findings provide key practical implications for the youth football training industry. Firstly, the proposed model can provide academy managers with a reliable and valid analytical tool for measuring their youth players' perceptions of different service quality aspects. Academy managers can obtain scores across all subdimensions, which help them to predict results for the four primary dimensions and for overall service quality as well. These quality perception scores are practical evidence from which academies can learn their merits and demerits from the players' point of view. More specifically, the nine subdimensions can be used to identify potential problems in delivering training service and then offer guidance for future improvements. Managers can track the level of service perceived by players and adjust their service performance to satisfy various needs. For example, when players pay greater attention to coaches' credibility and reliability, managers can provide more job training for their coaches and encourage them to show greater enthusiasm and willingness to help during training delivery. Youth football academies would be more successful if they better understood what ingredients influence their players' quality assessment at an overall abstract level, a dimensional level, or a subdimensional level, and thus improved their on-pitch and off-pitch services.
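One way a manager might turn questionnaire responses into scores at the three levels described above is sketched below. This is a hypothetical illustration, not the scoring procedure of the study: it uses simple unweighted means rather than model-based weights, the response values are invented, and the assignment of the program and personnel subdimensions to their primary dimensions is inferred from the subdimension names.

```python
from statistics import mean

# Primary dimension -> subdimensions (program and personnel groupings
# are inferred from the subdimension names; treat them as assumptions)
DIMENSIONS = {
    "physical aspects": ["equipment", "ambiance", "convenience"],
    "program": ["range of program", "program information"],
    "personnel": ["employee trustworthiness", "employee performance"],
    "personal development": ["sociability", "individual improvement"],
}


def score_player(item_responses: dict) -> dict:
    """Average item responses into subdimension, dimension, and overall scores.

    item_responses maps each subdimension name to a list of item ratings.
    """
    sub = {name: mean(values) for name, values in item_responses.items()}
    dims = {dim: mean(sub[s] for s in subs) for dim, subs in DIMENSIONS.items()}
    return {"subdimensions": sub, "dimensions": dims, "overall": mean(dims.values())}


# Hypothetical player: rates everything 5 except equipment
example = {s: [5, 5, 5] for subs in DIMENSIONS.values() for s in subs}
example["equipment"] = [2, 3, 4]

result = score_player(example)
print(result["dimensions"])
print(f"overall: {result['overall']:.2f}")
```

In this invented case the low equipment rating pulls down the physical aspects score while the other three dimensions are unaffected, which is the kind of subdimensional diagnosis the paragraph describes.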

Limitations and Future Research
This study has limitations that should be acknowledged. The first limitation is the geographical location of the research population; as such, the findings might be difficult to generalize beyond Chinese youth football academies. The research data for the present study were drawn from academies in two southeastern cities in China using a convenience sampling strategy, thus limiting the generalizability of the findings. Future research could test whether the current conceptualizations of the dimensions and the subdimensions can be empirically supported in a cross-cultural setting for youth sport participation, thus enhancing external validity. Due to restricted data characteristics, additional work is needed to validate the measurement scale across different populations. Moreover, although the respondents included both male and female players, the small amount of data from female players may limit the strength of the proposed model. In addition, several subdimensions and higher-order dimensions (i.e., equipment, convenience, individual improvement, physical aspect, personnel, and personal development) had relatively low loadings due to the complexity of the hierarchical and multidimensional model. Future research needs to further explore these factors with different data to see if the factor loadings can be improved.

Conclusions
This study was designed to establish the measurement model for perceived service quality of youth football academies through conceptual and empirical developments. Consistent with existing service quality scales in the participant sport segment (e.g., Howat & Assaker, 2016; Ko & Pastore, 2005), the proposed scale in this study supports the notion that a hierarchical structure with multiple dimensions is acceptable for evaluating young players' perceptions. The final service quality scale consists of nine subdimensions with 46 items: equipment (five items), ambiance (six items), convenience (five items), range of program (six items), program information (five items), employee trustworthiness (six items), employee performance (four items), sociability (four items), and individual improvement (five items). These nine subdimensions further support the four primary dimensions (physical aspects, program, personnel, and personal development), and the overall service quality is the highest order in the hierarchical structure. This study advances our knowledge of perceived service quality among players in youth football academies. The proposed model also serves to enrich sport marketing research by filling the gap in the conceptualization of service quality in the youth football training industry and is thus particularly useful in relation to the complexity of the quality-assessing process for both youth participants and service providers.

Figure 1. The conceptual service quality measurement model

Table 1 (Continued). Results of Exploratory Factor Analysis (n = 203)
*Items removed after CFA; the remaining 46 items in this table were retained at the end of the study.