On-farm welfare assessment of dairy goat farms using animal-based indicators: the example of 30 commercial farms in Portugal

ABSTRACT Welfare assessment can play multiple roles in the path to welfare improvement. In the dairy goat area, identification of the main welfare problems across countries and different production systems is needed. By the application of a prototype welfare assessment protocol, based on animal-based indicators, we aimed to provide an insight into the main welfare problems affecting intensively kept dairy goats in Portugal. Thirty farms, organised in three size categories, were assessed. The main areas of concern were claw overgrowth, queuing at feeding and hindquarter dirtiness, with larger farms raising greater concern. Additionally, this paper aimed to investigate indicators’ consistency over time. Ten of the 30 farms were revisited four months later, a period during which no major husbandry changes were made. Our results showed an overall consistency. This study can help define intervention thresholds or minimum legal levels for each indicator, by determining their overall prevalence.


Introduction
For the past 20 years a worldwide growing interest in goat milk and goat milk products has taken place, with the most developed programmes for selection, processing and commercialisation of goat milk being situated in Europe (Morgan et al., 2003;Dubeuf et al., 2004). Europe is the only continent in which goat milk represents a substantial commercial importance (Dubeuf, 2010), and the largest producers of goat milk in 2012 were France (0.6 million tons), Spain (0.44 million tons) and Greece (0.4 million tons) (FAOSTAT, 2015).
In the face of an increasing production and intensification and the concurrent increase in consumers' ethical concerns, tools for farm animal welfare assessment are urgently needed, in order to identify and prioritise potential welfare problems (Whay, 2007). In this study the focus was set on intensive dairy production, as this system is becoming more frequent in Europe and welfare threats are potentially severe, although still largely unknown (European Commission, 2011).
There is no single gold standard indicator for overall welfare. Originally, on-farm welfare assessment focused on the evaluation of resources provided to the animal (Bartussek, 1999;Bracke et al., 2002), offering a quick, easy and reliable measurement of the animal's housing conditions (Waiblinger et al., 2001). More recently, the interest in measuring the actual welfare of the animal through direct measures has increased (Johnsen et al., 2001). Therefore the use of animal-based indicators is now predominant, following the approach suggested by EFSA (2012) for welfare evaluation. The decision to include mostly animal-based indicators in a welfare assessment protocol relies on the assumption that these indicators reproduce more precisely the actual animals' welfare status, regardless of how they are housed or managed (Webster, 2009;EFSA, 2012). In consensus with this new approach environmental-based indicators should be considered as risk factors that might affect welfare.
Up to this date only two published studies have provided a general overview of dairy goats' welfare using animal-based indicators: one was carried out in 24 commercial dairy goat farms in the United Kingdom (Anzuino et al., 2010), and the other in 30 commercial dairy goat farms in Norway (Muri et al., 2013). These studies have helped to identify the main welfare problems affecting intensive dairy goat farms, but have also identified the need for further studies in different countries and farming conditions. New studies should provide data regarding each indicator's prevalence, helping to set European thresholds that can be tailored to a specific welfare threat.
Besides analysing on-farm prevalences, it is essential to evaluate assessment consistency over time (COT). Only a few studies in other farm species have addressed the quality criteria of these measurements (e.g. Plesch et al., 2010;Temple et al., 2013). As stated by Capdeville and Veissier (2001) and Winckler et al. (2003), the consistency of results over time ensures that these are representative of the longer-term farm situation and not sensitive to small changes in environmental, management or animals' internal conditions.
The development of a dairy goat welfare assessment protocol, combining different indicators, should aim to deliver sufficiently robust outcomes, so as to provide a reliable overall picture of the animals' welfare state, regardless of farm size, and preventing some welfare problems from remaining undetected. One of the first steps in the development of the AWIN welfare assessment protocol for dairy goats in intensive husbandry systems was a prototype testing, in Portugal and in Italy (Battini et al., 2016a), combining a set of preselected animal-based welfare indicators. The objectives of the present study are: to identify the main welfare problems affecting dairy goats in intensive dairy goat Portuguese farms; to analyse whether these problems vary according to farm size and to assess the indicators' prevalence variation over time.

Farm sample
In this study, following Direcção-Geral de Alimentação e Veterinária (DGAV) livestock production systems' official classification, we define intensive housing systems as those where goats are kept indoors with no or only occasional access to pasture. In these systems, diet is generally based on total mixed ration or forage (mainly hay or haylage) and concentrate, distributed once or twice per day, and goats are milked twice a day. Kids are separated from their mothers after birth.
The prototype was applied to 30 intensive dairy goat farms in Portugal, from January to March 2014. Information regarding the study population was requested from DGAV, and records from a national database, Sistema Nacional de Informação e Registo Animal (SNIRA) for small ruminants, were obtained in the beginning of January 2014. The farms were sampled from the total national dairy goat farms under intensive production system (n = 269), in which breeds such as Murciano-Granadina, Saanen and Serrana are predominant, according to DGAV. Further details on the breeds present at the 30 farms are given in Table 1. Considering farm size distribution, three farm size categories were created: small farms (50-100 goats; n = 92), medium farms (101-500 goats; n = 161) and large farms (>500 goats; n = 16). As very small intensive dairy farms do not exist in Portugal, only those with a total number of adult dairy goats above 50 were taken into consideration. From the 269 national farms a convenience sample of 10 farms from each category, predetermined by the AWIN project, was drawn. These 10 farms from each category were selected through a simple random sampling performed in Microsoft Office Excel 2013.
Farm managers were contacted by phone before the farm visits, to discuss the visit's objectives, timetable and methods. It was also ensured that the day of the visit was a regular day to avoid events (e.g. veterinary visits) that would disrupt the normal functioning of the routine. Lastly, security and bio-security issues were discussed to assure that all farm rules were followed.

AWIN protocol: description and use in the current study
Twenty-five animal-based indicators, classified in accordance with the 4 principles and 12 criteria developed by Welfare Quality® (Botreau et al., 2007), were assessed. The AWIN prototype protocol combines 14 indicators at group level and 11 indicators at individual level, which are assessed in order to produce each indicator's prevalence. Descriptors used to assess each animal-based indicator are presented in Table 1 (group-level observations) and Table 2 (individual assessment) provided as supplementary material. Where available, research studies related to the indicators are cited, but most of the information regarding the use of these indicators may be found in the AWIN welfare assessment protocol for goats (AWIN, 2015; http://www.animal-welfare-indicators.net/site/flash/pdf/AWINProtocolGoats.pdf). For simplicity, the 'prototype protocol' is hereafter referred to as the 'prototype'. Group-level observations were made in one single pen containing only adult lactating goats, with all the animals being evaluated. The pen considered to present the potentially greatest welfare risk was selected (e.g. highest density, lower feeding/drinking space per animal). For the individual assessment, the goats were restrained and a sampling strategy similar to the strategy developed by Welfare Quality® for dairy cows (Welfare Quality®, 2009) was adopted. This strategy involved the inspection of a number of animals proportional to the pen size, with percentages ranging from 100% of the subjects (in pens with fewer than 30 animals) to a minimum of 25% of the subjects, in the pens with more than 150 animals. This strategy assumed a 50% prevalence, a 90% confidence level and an accuracy of 10%.
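The assumptions stated for the sampling strategy (50% prevalence, 90% confidence, 10% accuracy) can be illustrated with the standard sample-size formula for a proportion with a finite-population correction. This is only a sketch under those stated assumptions: the function name `sample_size` is hypothetical, and the formula is not claimed to reproduce the Welfare Quality® sampling table, which, for example, prescribes inspecting every animal in pens with fewer than 30 goats.

```python
import math
from statistics import NormalDist


def sample_size(pen_size, prevalence=0.5, confidence=0.90, accuracy=0.10):
    """Animals to inspect in a pen of `pen_size` goats (illustrative formula)."""
    # Two-sided z value for the requested confidence level (about 1.645 for 90%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * prevalence * (1 - prevalence) / accuracy ** 2
    # The finite-population correction shrinks the sample for small pens.
    n = n0 / (1 + (n0 - 1) / pen_size)
    # Never ask for more animals than the pen contains.
    return min(pen_size, math.ceil(n))


# Hypothetical pen sizes, printed as: pen size, animals sampled, share of pen.
for pen in (30, 150, 500):
    n = sample_size(pen)
    print(pen, n, f"{100 * n / pen:.0f}%")
```

The share of animals to inspect falls as the pen grows, which is the qualitative behaviour described for the protocol.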
In January 2014, a total of six assessors with varying degrees of experience working with dairy goats (four veterinarians and two animal scientists) were trained before the farm visits were initiated. The training consisted of a week's period of classroom presentations and exercises, followed by practical field assessments. Each farm assessment was carried out by two of these trainees, dressed in identical dark overalls. Inter-observer reliability (IOR) was assessed by examining test agreement between two different observers, as reported in several studies (e.g. Winckler & Willen, 2001;Mullan et al., 2011). The two observers performed the assessments simultaneously in 10 of the 30 farms. Pearson's correlation coefficient (r) was used to determine IOR for qualitative behaviour assessment (QBA)'s dimensions. For descriptors, Spearman's rank correlations (r s ) were applied assuming, for both, Martin and Bateson (2007) thresholds.
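As an illustration of the rank-based agreement measure mentioned above, the following minimal sketch computes Spearman's rank correlation between two observers' farm-level scores. The scores are invented for the example, the helper names are hypothetical, and the study itself used dedicated statistical software rather than this code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical prevalences (%) scored by two observers on the same 6 farms.
observer_a = [12.0, 30.5, 8.0, 45.0, 22.0, 18.5]
observer_b = [10.0, 28.0, 9.5, 40.0, 25.0, 15.0]
print(round(spearman(observer_a, observer_b), 3))
```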

On-farm assessment of animal-based indicators
Group-level observations began with the recording of the number of goats improperly disbudded, queuing at feeding/drinking place, with poor hair coat condition, showing oblivion behaviour or signs of thermal stress (either shivering or panting) and abnormally lying or kneeling at the feeding rack. Immediately after this assessment, QBA was conducted. Subsequently, the assessor entered the pen, and human-animal relationship (HAR) tests (latency to the first contact and avoidance distance [AD] tests) were carried out (see Table 1 of the supplementary material). Finally, observations of individual animals were performed and animal-based indicators (e.g. Body Condition Score [BCS], cleanliness, overgrown claws; see Table 2 of the supplementary material) were collected. All animal-based indicators included in the individual assessments were recorded on the same animals, with both sides (left and right) being considered, and were scored using a binary assessment system (present or absent), except for BCS and knee calluses. After individual observations, the group was again assessed to identify severe lameness and kneeling.
Assessments began immediately after feed distribution following a strict order, to ensure a continuous flow of collection, reducing the disturbance for both animals and farmers, and to guarantee that the results of the behavioural observations were not influenced by animal handling or other sources of disruption. No male goats (or bucks) were present in the assessed pens, as their presence may influence the results. A checklist was used to ensure that all the observations were completed in a standard order, and the time needed to collect each indicator was recorded (on-farm feasibility assessment). Finally, a questionnaire was delivered to the farmer to gather data concerning environmental-based indicators; the main indicators are presented in Table 3 (supplementary material). For each farm, all data were collected on the same day.

Consistency of animal-based indicators over time
To investigate the indicators' consistency across seasons (COT), 10 of the 30 farms (three small farms, three medium farms and four large farms) were revisited during the summer period (July 2014), by the same assessors who executed the first assessments. All these farm revisits followed the methods described above and were performed in random samples.
An average of 3.7 ± 1.0 months (SD≈30 days) passed between the two visits and no significant alterations, in management and housing conditions, were implemented during this period. The number of adult dairy goats on each revisited farm ranged from 46 to 2000 animals, with a mean (±SD) of 512 (±613) adult dairy goats. Group assessment was accomplished in 1116 animals, and individual examination was performed on 494 adult dairy goats. The mean number of animals in the evaluated pen was 153 (±95) animals, and the individual animals sampled varied from 32 to 61.

Data management and statistical analysis
Data were entered, compiled and statistically analysed using SPSS v22 (IBM® SPSS® Statistics, NY, USA). To perform an overall analysis, the prevalence of each welfare indicator was calculated at farm and farm category level. First, the prevalence of the indicators within each farm (number of animals affected in the farm divided by the total number of animals observed in the farm) was calculated. Afterwards, the mean prevalence of each indicator was calculated for the population of farms. In each farm, queuing at feeding and drinking indicators were recorded as the proportion of animals queuing out of the total of animals assessed. Latency to the first contact and goats' response in the AD tests were expressed in seconds (±SD) and as the proportion of contacts and/or acceptances, respectively. Frequencies of individual-level indicators regarding both sides of the animals were calculated and a Pearson's chi-square test was used to compare results. The lower and upper boundaries of the 95% confidence intervals (CIs) were calculated based on the mean and standard deviation of the prevalence of each indicator within each farm. The presence of at least one animal affected was arbitrarily assumed as a welfare problem in the farm.
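The two-step prevalence calculation described above (within-farm prevalence first, then the mean across farms) can be sketched as follows. The counts are invented, the function names are hypothetical, and the confidence interval uses a normal approximation (mean ± 1.96 × SD/√n), which is an assumption because the exact CI formula is not specified in the text.

```python
import math
from statistics import mean, stdev

# Hypothetical counts: (animals affected, animals observed) on each farm.
overgrown_claws = [(4, 40), (12, 48), (30, 60), (9, 36), (21, 42)]


def farm_prevalences(counts):
    """Within-farm prevalence (%) = affected / observed, one value per farm."""
    return [100 * affected / observed for affected, observed in counts]


def mean_prevalence_ci(prevalences, z=1.96):
    """Mean across farms with an approximate 95% CI, clipped to 0-100%."""
    m = mean(prevalences)
    half_width = z * stdev(prevalences) / math.sqrt(len(prevalences))
    return m, max(0.0, m - half_width), min(100.0, m + half_width)


def farms_with_problem(counts):
    """Farms with at least one affected animal (the arbitrary threshold used)."""
    return sum(1 for affected, _ in counts if affected >= 1)


prevalences = farm_prevalences(overgrown_claws)
m, lower, upper = mean_prevalence_ci(prevalences)
print(prevalences)
print(f"mean {m:.1f}% (95% CI {lower:.1f}-{upper:.1f}%)")
print(farms_with_problem(overgrown_claws), "farms with at least one case")
```

Averaging the farm-level prevalences weights every farm equally, whatever its size, which differs from pooling all animals into a single proportion.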
For the purpose of QBA data analysis, for each of the descriptors (e.g. aggressive, curious and lively) the distance from minimum to where the assessor ticked the VAS scale was measured in millimetres. The 13 QBA descriptors' values were used as variables and submitted to Principal Component Analysis (PCA) using a correlation matrix with no rotation. Two principal components (PCs) were extracted. A loading plot was produced in order to explore the relationships among variables on the first two PCs (word chart), and a score plot was also generated in order to visualise the position of each farm (classed according to farm size) on the first two PCs.
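A PCA on a correlation matrix without rotation amounts to extracting the leading eigenvectors of the descriptors' correlation matrix. The sketch below does this with plain power iteration and deflation, using three invented descriptors scored on five farms; it is not the SPSS procedure used in the study, and all names and values are illustrative assumptions.

```python
import math


def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (farms x descriptors)."""
    n, k = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    z = [[(x - m) / s for x in c] for c, m, s in zip(cols, means, sds)]
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(k)] for i in range(k)]


def top_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    k = len(matrix)
    v = [float(i + 1) for i in range(k)]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv)), v


def first_two_components(rows):
    """(eigenvalue, loadings direction, share of variance) for PC1 and PC2."""
    r = correlation_matrix(rows)
    k = len(r)
    l1, v1 = top_component(r)
    # Deflation removes PC1 so that power iteration then finds PC2.
    deflated = [[r[i][j] - l1 * v1[i] * v1[j] for j in range(k)]
                for i in range(k)]
    l2, v2 = top_component(deflated)
    # The trace of a correlation matrix equals k, so eigenvalue / k is the share.
    return [(l1, v1, l1 / k), (l2, v2, l2 / k)]


# Hypothetical VAS scores (mm) for 3 descriptors on 5 farms.
rows = [[20, 90, 80], [35, 70, 75], [60, 50, 40], [80, 30, 45], [100, 20, 10]]
for eigenvalue, vector, share in first_two_components(rows):
    print(round(eigenvalue, 3), [round(x, 3) for x in vector], f"{share:.1%}")
```

The sign of each component is arbitrary, so a loading plot may appear mirrored between software packages without changing the interpretation.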
The mean time necessary to collect group-level and individual-level indicators, and for the overall assessment, was calculated (mean ± SD; min-max).
To preliminarily evaluate the consistency of the indicators over time, a Wilcoxon signed rank test was performed to verify whether the results obtained during the two visits were significantly different at the 0.05 level, as performed by Temple et al. (2013) in pig farms. Also a brief overall analysis of the variation in indicators' results between visits (%Δ = Visit 1 − Visit 2) was made.
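For a paired design with 10 revisited farms, the Wilcoxon signed rank test can be computed exactly by enumerating all sign assignments. The following sketch uses invented farm-level prevalences for two visits; the function names are hypothetical, and the exact p-value shown here may differ from the asymptotic value reported by statistical packages.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(visit1, visit2):
    """Return (W, exact two-sided p) for paired observations."""
    # Pairs with no change carry no information and are dropped.
    diffs = [a - b for a, b in zip(visit1, visit2) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under H0 every sign pattern is equally likely: count those as extreme.
    extreme = 0
    for signs in product((0, 1), repeat=len(diffs)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w + 1e-12:
            extreme += 1
    return w, extreme / 2 ** len(diffs)


# Hypothetical prevalences (%) of one indicator on 10 farms at each visit.
visit_1 = [30.0, 25.0, 40.0, 22.0, 18.0, 35.0, 12.0, 27.0, 44.0, 20.0]
visit_2 = [24.0, 27.0, 31.0, 22.0, 15.0, 30.0, 16.0, 20.0, 36.0, 19.0]
w, p = wilcoxon_signed_rank(visit_1, visit_2)
# Mean variation between visits, following the definition Visit 1 - Visit 2.
delta = sum(a - b for a, b in zip(visit_1, visit_2)) / len(visit_1)
print(w, round(p, 4), round(delta, 2))
```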
Analysis of variance (One-way ANOVA) was used to compare the prevalence of each indicator within each farm depending on farm category, at the 0.05-level, with a Bonferroni statistical adjustment. In order to meet the assumptions of normality and homogeneity of residual variance, all indicators were previously arcsine square root transformed (Woodward, 2013). Similar approaches have been successfully used in other studies (e.g. Jones et al., 2006;Marchewka et al., 2015). A Normal probability-probability (P-P) plot based on the standardised residuals was used to check the validity of the model.
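The transformation and the F statistic described above can be sketched in a few lines. The prevalences below are invented, the function names are hypothetical, and the sketch stops at the F statistic and its degrees of freedom; the p-values and Bonferroni post-hoc comparisons reported in the study came from statistical software.

```python
import math


def arcsine_sqrt(prevalence_percent):
    """Arcsine square root transform of a prevalence given in percent."""
    return math.asin(math.sqrt(prevalence_percent / 100))


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical within-farm prevalences (%) for 10 farms per size category.
small = [0, 5, 10, 12, 8, 20, 15, 3, 25, 16]
medium = [10, 22, 35, 18, 40, 28, 15, 30, 45, 25]
large = [30, 55, 20, 75, 48, 60, 35, 90, 42, 30]
transformed = [[arcsine_sqrt(p) for p in group]
               for group in (small, medium, large)]
f, df_b, df_w = one_way_anova(transformed)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

With three categories of 10 farms each, the degrees of freedom are 2 and 27, matching the F(2, 27) notation used in the results.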

Welfare indicators
All farms kept the animals indoors on concrete, soil or grit covered with straw as bedding material. Goats were grouped in one or more pens depending on farm size (Table 3 of the supplementary material).

Most prevalent indicators
Total population under study. A preliminary analysis of the collected data showed that there was no statistical difference for cleanliness, abscesses, overgrown claws, lesions and knee calluses scored between the right and left side of the animal (Table 5 of the supplementary material). Therefore, the prevalences of these indicators are always from the animal's left side. Some indicators, such as overgrown claws, poor hair coat condition, queuing animals at the feeding rack and hindquarters dirtiness were shown to be highly prevalent, when compared to the others (Table 2). In general, the indicators under study presented moderate to high levels of agreement between observers, either for categorical (κ and κw: 0.5-1) or continuous data (intraclass correlation [ICC]: 0.8-1), according to specific guidelines (e.g. Landis & Koch, 1977;Shrout, 1998;Fleiss et al., 2003).
Table 3. Welfare problems of the 30 commercial dairy goat farms in Portugal, according to farm size category (mean values for the population of farms). Small: n = 10; Medium: n = 10; Large: n = 10; CI (%): 95% CI for the study population.
[...] farms. A Bonferroni post-hoc test revealed that severe lameness (F(2, 27) = 4.6, p = .019) and overgrown claws (F(2, 27) = 5.4, p = .011) presented statistically higher prevalences in large farms (3.9 ± 4.6%, 48.5 ± 29.5%, respectively), compared to small farms (0.9 ± 1.7%, 11.4 ± 12.4%, correspondingly).
According to the post-hoc test, the prevalences of lower leg lesions (F(2, 27) = 4.1, p = .029), head lesions (F(2, 27) = 3.9, p = .032) and neck lesions (F(2, 27) = 3.7, p = .039) were also statistically higher in large farms (17.5 ± 17.9%, 37.0 ± 27.4% and 20.9 ± 20.7%, respectively) in comparison to small farms (0.6 ± 1.4%, 10.9 ± 11.3% and 4.3 ± 7.2%, respectively). All the other indicators considered showed no statistical difference depending on farm category (p > .05). Detailed information concerning the variation of the indicators across farm categories is given in Table 3. The percentages of goats that showed acceptance and contact during the AD test were higher in large farms (2.3 ± 1.8% and 1.8 ± 1.5%, respectively) than in medium (0.6 ± 1% and 0.3 ± 0.4%, correspondingly) and small farms (0.9 ± 1.2% and 0.4 ± 0.8%, respectively). Large farms showed statistically higher percentages of acceptance (p = .043) and contact (p = .022), according to the Bonferroni post-hoc test performed.

Consistency of animal-based indicators over time
Welfare problems such as presence of very fat animals, knee calluses (score 2), queuing at feeding, lower legs dirtiness and body abscesses increased over time (12.6%, 11.1%, 9.1%, 6.6% and 5.6%, respectively). In contrast, during this period of time, the prevalences of hindquarters dirtiness, presence of knee calluses (score 1), head lesions and overgrown claws showed a decrease of 17.9%, 17.3%, 14.3% and 13.5%, correspondingly. Similarly, welfare issues such as poor hair coat condition, shivering (score 1) and faecal soiling also declined over time (9.5%, 9.2% and 6.8%, respectively). However, according to the Wilcoxon signed rank test, only head lesions differed significantly between the two visits (p = .037). Animals showing oblivion behaviour, lying abnormally, severely lame, kneeling in the pen, very thin and presenting vulvar discharge, hindquarters and neck abscesses showed a prevalence variation below 1%. Regarding the AD test, the variation of the percentage of goats accepting being touched/stroked was also below 1%. The prevalences of panting (score 2) and shivering (score 2) animals demonstrated no variation between visits, because no cases were recorded in either visit. Statistical differences between visits were only found for severe lameness and overgrown claws (p < .001). Based on the Bonferroni post-hoc test performed, small farms showed a statistically lower prevalence of severely lame animals in both visits (p < .05). In the first visit, overgrown claws' prevalence was significantly higher in large farms than in small farms (p = .016). During the second visit, large farms presented a significantly higher prevalence of overgrown claws than medium farms (p = .031). Also during the second visit, in medium farms the prevalence of very fat goats was statistically lower than in the other farm categories (p < .003). All the other indicators showed no statistical differences between visits, or among farm categories at each visit (p > .05).
Table 4 presents the welfare problems of the two visits according to farm category.

On-farm feasibility assessment
Overall, the mean time necessary to execute group-level observations and individual assessments was approximately 87 ± 33 min, ranging from 43 to 154 min. In small-, medium- and large-sized farms the time necessary to perform the different stages of the prototype was 71 ± 29, 88 ± 25 and 117 ± 36 min, respectively. No statistical differences were found depending on farm category (p > .05).

Discussion
The AWIN project presented the first welfare assessment of dairy goats in the Mediterranean region, incorporating only animal-based indicators. These indicators were collected through group-level observations and individual assessment of animals, adding important information to the intensive dairy goat production sector. The objectives of the present study were to identify the main welfare problems affecting dairy goats in Portuguese intensive farms and to analyse whether these problems vary according to farm size, and over time. Further discussion regarding the selection of animal-based over environmental-based indicators, and regarding the overall feasibility of the AWIN protocol, is presented in Battini et al. (2015b).

Welfare indicators and variations according to farm category
Claw overgrowth (35.5%), poor hair coat condition (22.9%), queuing animals at the feeding rack (22.8%) and hindquarters dirtiness (18%) showed high prevalence and should therefore be considered as major welfare problems. Claw overgrowth was identified in large farms at a statistically higher prevalence (48.5%) compared to smaller farms (11.4%). High prevalences of severe overgrowth were also found in the studies performed in British and Norwegian dairy goat farms, where severe claw overgrowth reached a prevalence of 32% and 14.8%, respectively (Anzuino et al., 2010;Muri et al., 2013). According to Smith and Sherman (2009), this can result not only from a lack of claw wear when animals are housed on straw bedding, as reported by Anzuino et al. (2010), but also from a lack of routine foot-trimming (Ajuda et al., 2014). Although claw trimming was performed at different times before the visits, it was noticeable that in large farms a high animals:human ratio corresponded with less time to observe and trim individual animals, as also reported by Stafford and Gregory (2008). This was confirmed in the present study, as small farms presented a much lower animals:human ratio (54) than large farms (304; Table 3 of the supplementary material).
Hair coat condition is an indicator recently developed that reflects not only goats' nutritional status but also their health status (Battini et al., 2015a). Poor hair coat condition was found at a prevalence of around 20-25% in all farm categories, functioning as a first warning on goats' nutritional or health status.
The proportion of queuing animals at the feeding rack was higher on large farms (29.8%) with high stocking densities corresponding to a high goats/feed space ratio (Table 3 of the supplementary material). The natural synchronous behaviour of goats (Miranda-de la Lama & Mattiello, 2010) increases the probability of finding this indicator and its impact on welfare. Muri et al. (2013) reported an 18% prevalence of dirty hindquarters which is also in line with our overall findings (18%). However, in Portugal, large farms presented higher prevalences (27.1%). Animal cleanliness is often used as a welfare indicator in several species (Krebs et al., 2001;Sans et al., 2014), and may provide information not only on animal comfort but also on stockpeople's attitudes and care for animals, as supported by De Rosa et al. (2009). In Portugal and within this production system, cleanliness depends mostly on how often bedding material is replaced or added, as animals are housed on straw bedding all year round. However, cleanliness assessment can be challenging in some breeds, being easier to assess in white breeds such as Saanen. This may account, to some extent, for the higher prevalences of hindquarter dirtiness found in large farms, as Saanen was one of the most common breeds in this farm category. This fact led to the non-inclusion of this animal-based indicator in the final AWIN protocol (Battini et al., 2015b).
Regarding the assessment of specific indicators in British dairy goat farms, Anzuino et al. (2010) found a 3% prevalence of very thin animals, which is close to the prevalence found in our farms (4.9%), with no differences due to farm size. Compared with our results (17.4%), a much lower prevalence of very fat animals (3%) was reported in the British study, which is probably related to the feeding strategy followed. A roughage:concentrate ratio higher than 60:40 is recommended by several authors (e.g. Bruni & Zanatta, 2009). However, in our study 33.3% of farms presented a roughage:concentrate ratio lower than 60:40 and 20% of farms had a ratio of 60:40. The inclusion of BCS in on-farm welfare assessment schemes allows the identification of animals scoring at the extreme ends (close to endpoints), as these are the ones more likely to be associated with welfare problems (Vieira et al., 2015). Obesity is usually associated with higher predisposition for metabolic diseases, and emaciation may be either a sign (e.g. chronic disease such as paratuberculosis) or a cause (e.g. pregnancy toxaemia) of welfare problems (Smith & Sherman, 2009).
In our farms we found a severe lameness prevalence of 2.1%, which is in line with Anzuino et al. (2010) and Muri et al.'s (2013) studies, which reported lameness prevalences of 3% (score 2 and score 3) and 1.7%, respectively. Herd size can be considered a risk factor for lameness, as mentioned in several studies (Alban, 1995;Katsoulos & Christodoulopoulos, 2009). Individual observation and care may be more difficult in large herds than in smaller ones, which can lead to a lower detection of lame animals, and can explain the higher prevalence values observed in large farms. In terms of overall welfare assessment, the identification of only the severely lame animals may not provide sufficient information. Lameness assessment is often performed while the animals are housed in their pens with soft straw surfaces, which tends to hide the mild cases (Anzuino et al., 2010). Therefore, in order to ensure the detection of the mild cases, alternative practical solutions for lameness scoring are needed.
Udder asymmetry is a sign of chronic alteration that can affect the welfare and production of dairy goats. This alteration has been associated with chronic intramammary infection such as Caprine arthritis and encephalitis, caprine contagious agalactia and retroviral mastitis, causing fibrosis and atrophy of one half (Alawa et al., 2000;Paterna et al., 2014). This indicator of udder health is therefore relevant in the assessment of dairy goat welfare. In our study, udder asymmetry presented prevalences between 2% and 6% in all farm categories, which is in accordance with the severe udder asymmetry prevalences reported by Anzuino et al. (2010) and Muri et al. (2013).
As regards the analysis of variance (One-way ANOVA) performed, the fact that all farms were under an intensive production system with similar management can partly account for the lack of statistical differences, at the 0.05-level, between most of the animal-based indicators and the farm categories considered. However, in general, the prevalence of most of the health indicators was affected by farm size, being higher in larger farms.
HARs were generally better in large farms than in small farms, especially considering the latency period to the first contact between goat and assessor. This was unexpected, as HAR is usually better in small farms, where the relationship between the stockperson and the animals is very close (Mattiello et al., 2009). For instance, significant differences depending on farm size were reported by Mattiello et al. (2008, 2010), namely shorter ADs in dairy goats that were reared in small old farms in comparison to large modern farms, and higher percentages of contacts in small farms than in large farms, respectively. Jackson and Hackett (2007) also found that dairy goats exposed to gentle human handling approached the observer more quickly than the control animals during the latency test, habituating faster to his presence. However, our results may potentially be explained by the breed differences among farms (Table 1). Breeds such as Saanen and Murciano-Granadina, which are very common in large-sized farms in Portugal, are reported to be docile and easier to handle, being more suited for intensive systems (Sinn & Rudenberg, 2008;Martínez et al., 2010;Escareño et al., 2012). A greater variety of breeds was found in small farms: Serrana, Alpine, Malagueña, Murciano-Granadina, Saanen, Florida, Charnequeira and crossbreeds (Saanen with Alpine). Some of these more rustic breeds are known to be more suspicious (Caroprese et al., 2015). Additionally, in small farms, goats had access to outdoor grazing, enhancing the expression of natural, foraging and exploratory behaviours (Dwyer, 2009;Ekesbo, 2011). In the majority of these farms, animals only had contact with the stockperson, reacting negatively to the presence of other humans, which is in accordance with the work of Mattiello et al. (2008).
As regular manipulation of goats during daily activities seems to decrease the animals' fear responses towards humans, controlled access to other humans might lead to a better HAR in Portuguese farms.
The AD test evaluates the distance at which an animal retreats from an approaching human. During the on-farm testing of the prototype the application of the AD test presented some feasibility limitations. For instance, breeds such as Saanen and Murciano-Granadina accepted contact or gentle stroking more frequently, sometimes even complicating the assessment by grouping around the assessor, as supported by the statistically higher percentages of acceptance and contact in large farms. On the other hand, breeds such as Serrana (Transmontano ecotype) showed strong avoidance behaviour, making it difficult to carry out the test in a standardised way. Muri et al. (2013) also reported similar limitations, which suggests that the feasibility of the AD test may depend on breed and production system differences. These findings highlight the need for further studies on the development of new, or modified, methods to assess this indicator, in order to overcome the reported limitations and obtain an objective and consistent welfare assessment outcome.
A PCA analysis on QBA revealed two dimensions of goat emotional states: PC1 and PC2. PC1 of the QBA, which carries most of the relevant variance, allows for the differentiation between farms with animals that appeared to be in a more positive mood from farms that presented animals with a more negative mood. According to Wemelsfelder and Lawrence (2001), descriptors such as agitated, lively and alert reflect the animals' experience of a situation being directly relevant to the assessment of their welfare. The homogenous overall distribution of farms throughout the two axes supports the notion that, within the same husbandry system (intensive production system), herd size is not relevant for highlighting differences in goats' emotional state, as already found in Italian dairy goat farms by Battini et al. (2016a). In general, intensive production systems prevent goats from performing much of their behavioural repertoire (Stookey, 1994). On the contrary, when goats are kept under different farming systems (e.g. intensive vs extensive), QBA has been proven to be able to discriminate between different emotional states (Grosso et al., 2016). However, the authors are aware of the limits of these results, due to the observers' moderate training, as a crucial requisite for applying QBA is thorough training, in which the observers should discuss the meaning of each QBA descriptor and watch video clips representing each descriptor, so as to standardise the evaluations (Napolitano et al., 2015).

Consistency of animal-based indicators over time
The consistency of indicators' prevalence over time identifies welfare issues that persist on-farm. An overall consistency was apparent. However, this was a preliminary analysis and further studies are required to robustly establish the significance of these indicators' variations across time. Only for head lesions was a significant difference found between the two visits, with a decrease over time. This finding might be explained by the training intensity and the break period between visits. In a study performed by Gibbons et al. (2012), a five-day break in a programme aiming at training observers to score injuries on dairy cows resulted in decreased agreement for all injury scores, which improved again the next day after practice. This highlights the importance of continual training in order to 'recalibrate' the observers to a reference standard, as defended by EFSA (2012). Regarding farm categories, almost all indicators showed no statistically significant difference.
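A comparison of an indicator between two visits to the same farms can be sketched as a paired test. The example below is an assumption-laden illustration: the statistical test used in this study is not restated here, so an exact sign-flip permutation test is swapped in, and the prevalence values and the function name `paired_sign_flip_p` are invented.

```python
from itertools import product

def paired_sign_flip_p(visit1, visit2):
    """Two-sided exact sign-flip permutation p-value for paired farm-level values."""
    diffs = [b - a for a, b in zip(visit1, visit2)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis of no change, each difference is equally likely
    # to be positive or negative, so every sign pattern is enumerated.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total

# Hypothetical prevalence (%) of one indicator on ten revisited farms.
first_visit = [12, 8, 15, 10, 6, 9, 14, 11, 7, 13]
second_visit = [9, 8, 11, 10, 5, 7, 12, 9, 7, 10]
print("p-value:", paired_sign_flip_p(first_visit, second_visit))
```

With ten farms there are only 1024 sign patterns, so full enumeration is cheap. Integer percentages are used so that the comparison with the observed sum is exact.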

On-farm feasibility assessment and provision of welfare/health information
The number of animals per pen, the stocking densities and the animals' behaviour affected the time required to collect each welfare indicator. 'Queuing' and 'Clinical scoring' were the most time-consuming, with the average time to accomplish the different stages of the prototype increasing with the number of animals that had to be sampled. Although the mean time required to perform the prototype did not differ statistically between farm categories, only one pen was assessed. In a future protocol, more pens, and therefore more animals, may have to be evaluated, which will lead to different results.
According to Willeberg (1991), preventing disease is a major animal welfare topic. Most disease issues are not only production-limiting events but also welfare problems, yet they have become inherent parts of intensive animal production for which there are no easy solutions. Nevertheless, with few exceptions (e.g. Muri et al., 2016), there are no epidemiological studies related to these welfare considerations for goats. Although the goat sector is growing, there has been less research on goats than on other production species, especially regarding welfare aspects (Sahlu & Goetsch, 2005; Anzuino et al., 2010). This makes the development of welfare assessment protocols for this species a much more difficult task.
The prevalence of welfare indicators in dairy goats provides information on the general health and welfare status of farms. Knowing the general prevalence of these indicators in goat production is paramount for setting intervention thresholds. Establishing these thresholds will allow farmers, vets and other technicians to identify the main welfare issues on their own farms, and therefore to set plans of action to improve the general welfare/health condition, since improved disease diagnosis and identification of pain and poor welfare in farm animals are important in a growing farming industry (Scott et al., 2001). Although on-farm welfare assessment has been used mostly for non-regulatory purposes, such as producer education or qualification for voluntary welfare-assurance programmes, setting these thresholds might also be very important for the implementation of welfare legislation. Prevalence analysis can also provide data for benchmarking purposes (Blokhuis et al., 2013). Recently, benchmarking has been used as an approach for helping farmers manage the welfare of their animals (e.g. planning programmes on organic farms in Europe, dairy cow comfort or road transport practices in North America; Colditz et al., 2014), allowing farmers to determine how well or badly they are performing in relation to others.
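As a simple illustration of benchmarking, a farm's prevalence for one indicator can be placed relative to the prevalences recorded on other farms. This is a minimal sketch: the function name `share_of_farms_with_lower_prevalence` and all values are hypothetical, and it is not a benchmarking scheme proposed by the cited authors.

```python
def share_of_farms_with_lower_prevalence(own, others):
    """Fraction of comparison farms whose prevalence is strictly lower than the farm's own."""
    return sum(1 for p in others if p < own) / len(others)

# Hypothetical prevalence (%) of one indicator on comparison farms.
comparison_farms = [10, 15, 25, 30]
own_prevalence = 20
share = share_of_farms_with_lower_prevalence(own_prevalence, comparison_farms)
print(f"{share:.0%} of comparison farms have a lower prevalence")
```

For an indicator where lower prevalence is better, a high share signals that the farm performs worse than most of its peers and may warrant a plan of action.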
Despite being a valuable source of information, the extent to which a convenience sample of 30 commercial dairy goat farms represents the entire population cannot be known. In fact, the number of farms may have affected not only the reliability of the results (e.g. conditions with low prevalence) but also the power to detect differences, which emphasises the need for further studies on the on-farm prevalence of the selected indicators.
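The effect of a small sample on the precision of a prevalence estimate can be illustrated with a confidence interval. The sketch below uses the Wilson score interval, chosen here for illustration and not taken from this study. The counts and the function name `wilson_interval` are hypothetical.

```python
import math

def wilson_interval(affected, sampled, z=1.96):
    """Wilson score interval (default approx. 95%) for a proportion affected/sampled."""
    p = affected / sampled
    denom = 1 + z * z / sampled
    centre = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical example: a condition observed on 0 or 3 of 30 sampled units.
for affected in (0, 3):
    low, high = wilson_interval(affected, 30)
    print(f"{affected}/30: {low:.3f} to {high:.3f}")
```

Even when a condition is not observed at all in 30 units, the upper limit of the interval remains well above zero, which is why low-prevalence conditions are estimated imprecisely with a sample of this size.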

Conclusions
In Portugal, the main welfare areas of concern were claw overgrowth, queuing at feeding and hindquarter dirtiness. Most of the assessed indicators presented prevalences similar to those reported in previous studies in northern European countries (Anzuino et al., 2010 in the UK; Muri et al., 2013 in Norway), suggesting common problems. The analysis of the variation in the indicators' prevalence between different seasons revealed an overall consistency of the indicators. This study contributes to an increased awareness of the main welfare issues affecting intensively kept dairy goats.