Adjusting for Founder Relatedness in a Linkage Analysis Using Prior Information

In genetic linkage studies, while the pedigrees are generally known, background relatedness between the founding individuals, assumed by definition to be unrelated, can seriously affect the results of the analysis. Likelihood approaches to relationship estimation from genetic marker data can all be expressed in terms of finding the most likely pedigree connecting the individuals of interest. When the true relationship is the main focus, the set of all possible alternative pedigrees can be too large to consider. However, prior information is often available which, when incorporated in a formal and structured way, can restrict this set to a manageable size thus enabling the calculation of a posterior distribution from which inferences can be drawn. Here, the unknown relationships are more of a nuisance factor than of interest in their own right, so the focus is on adjusting the results of the analysis rather than on direct estimation. In this paper, we show how prior information on founder relationships can be exploited in some applications to generate a set of candidate extended pedigrees. We then weight the relevant pedigree-specific likelihoods by their posterior probabilities to adjust the lod score statistics.

more difficult due to the probabilistic nature of genetic inheritance whereby unrelated individuals can have genetic information that indicates that they are related, while the observed genetic data may give higher support to an incorrect relationship than the true one.
All relationship estimation problems can be described in terms of reconstructing the pedigree connecting the individuals of interest. In principle, this simply involves consideration of all possible alternatives and finding the one with the highest likelihood [10]. While it can be shown that this is possible for small problems, it is not generally practical unless the set of possible alternatives can be considerably reduced [11,12]. Sequential reconstruction methods, which build a single structure by starting with the assumption that all individuals are unrelated and gradually accepting sibships based on the associated increase in log-likelihood, will not necessarily assign individuals to their true relationships [13,14].
Although it has been known for a long time that the results of a genetic analysis can be extremely sensitive to misspecification of the relationships amongst the individuals under study [15], an increase in false positive findings in a linkage study can also arise when supposed founders are mistakenly assumed to be unrelated [16,17]. Indeed, as demonstrated on a complex Hutterite pedigree, genetic association studies which assume that all individuals are unrelated can also yield dramatic increases in false positive signals if this assumption is violated [18]. Consanguineous populations are important for homozygosity mapping of rare disease susceptibility loci so the standard linkage statistics based on the assumption of outbreeding populations must be appropriately adjusted. Leutenegger et al. [19] adapted the Maximum Lod Score statistic for affected sib-pair analyses by modifying the expected IBD sharing vector to reflect known parental relatedness as parameterised by the kinship coefficient and both parental inbreeding coefficients. They also noted that the accepted threshold value of 3.36 corresponding to an overall genome-wide significance level of 5% is not appropriate for related parents and that the correct threshold depends on this relationship. Abney et al. [20] present methods for homozygosity and association mapping in large complex pedigrees with known genealogical structure. Hössjer [21] describes a hidden Markov model for the IBD configuration of a family with possible inbreeding among founders. Although the focus of the paper is on nonparametric linkage analysis for affected sib-pair designs, the method is purportedly valid for arbitrary family structures. It is, however, sensitive to the choice of input values for the kinship coefficient and expected length of chromosome segments shared IBD. Ideally these should be estimated, perhaps along the lines suggested by Leutenegger et al. [22] for estimating an individual's inbreeding coefficient, but genotype data on the founders would be required.
In this paper, we also consider the case where we suspect that the founders of a pedigree are related but do not know the precise nature of this relationship. However, we will assume that there is sufficient prior information, both on global features of the overall population from which these founders derive and on local features pertaining to individuals or specific pairwise relationships, to restrict the set of possible alternative extended pedigree structures to a manageable size. We are not interested in using the marker data to try to infer the true extended pedigree as we argue that this background relatedness is more of a nuisance factor in these applications. Rather, we propose a general method that allows for such background relatedness to be integrated out of the linkage analysis. Specifically, posterior probabilities for the alternative pedigrees in the sample space are calculated from the marker data that appear to be unlinked to the disease and then used to adjust the lod scores based on the full data. We will begin by discussing the method (described in detail in Sheehan and Egeland [12]) as it applies to this particular context. We will then demonstrate the potential usefulness of the approach by assessing its performance in a realistic simulated scenario.

The Method and the Context
In many practical situations, such as in a genetic counselling environment, families may be included in a linkage study because they appear to have an unusual number of cases of a rare autosomal disease. For a recessive disease due to a single mutation with prevalence p at a particular locus, unaffected carriers are assumed heterozygous with one copy of the disease predisposing allele. Ignoring new mutations as a possible explanation, both parents of an affected individual have to be carriers. For the diseases we have in mind, the mutated allele will be extremely rare and so carriers are far more likely to be related than not [23]. For instance, assuming Hardy-Weinberg equilibrium, the probability that they are carriers when they are first cousins is 7 times the probability that they are unrelated carriers when p = 0.01 but is 636 times that probability when p = 0.0001. Failure to account for such relatedness may lead to false positive linkage signals as the expected levels of allele sharing will be misleadingly low and the differences between expected and observed sharing hence more marked than they are in reality [9].
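The effect of allele rarity on the relative carrier probabilities can be illustrated with a short calculation. The sketch below is one possible reading, not necessarily the calculation behind the figures quoted above: it assumes that p is the frequency of the disease allele, that "carrier" means heterozygous, that Hardy-Weinberg proportions hold, and that non-inbred first cousins share exactly one allele identical by descent with probability 1/4 and none otherwise. The function name `carrier_ratio` is hypothetical. Under these assumptions the ratio is close to 7 for p = 0.01 and runs to several hundred for p = 0.0001.

```python
def carrier_ratio(q: float, k1: float = 0.25) -> float:
    """Ratio of P(both heterozygous carriers | related) to the same
    probability for an unrelated pair, for disease allele frequency q.

    k1 is the probability that the pair shares exactly one allele IBD
    (assumed 1/4 for non-inbred first cousins, with no chance of sharing two).
    """
    p = 1.0 - q
    het = 2.0 * p * q                  # Hardy-Weinberg heterozygote frequency
    unrelated = het * het              # two independent carriers
    # Sharing one allele IBD: either the shared allele is the mutation and both
    # other alleles are normal (q*p*p), or the reverse (p*q*q); the sum is p*q.
    one_ibd = p * q
    related = (1.0 - k1) * unrelated + k1 * one_ibd
    return related / unrelated


if __name__ == "__main__":
    for q in (0.01, 0.0001):
        print(q, round(carrier_ratio(q), 1))
```

The ratio grows roughly as 1/(16p), which is why relatedness between carrier parents becomes so much more plausible as the mutation becomes rarer.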
It would be extremely unusual to have no additional information whatsoever on what the possible founder relationships could be in these genetic counselling situations. A formal structured approach to incorporating such prior information is hence useful and the problem may be sufficiently tractable to formulate in the four-stage framework of Sheehan and Egeland [12]: define a sample space of (extended) pedigrees, assign a prior probability distribution on this space, calculate the likelihood of the genetic data for each pedigree and compute posterior probabilities using Bayes' Theorem. These posterior probabilities can be used to infer the true relationship [11] but, as we will show in Section 3 below, they can also be used to account for the presence of background relatedness as a weighting on the pedigree-specific likelihoods for linkage calculations.
In order to consider relationships between the founders of a pedigree, extra individuals for whom genetic data are probably not available must be included. In principle, there could be an infinite number of these. Reducing the set of alternative pedigree structures is usually the difficult step in this approach but, in this situation, the information provided by the families concerned should provide considerable restrictions. For each pedigree $g$ in the sample space, we assign a prior probability of the form

$$\pi(g) = c \prod_{i=1}^{s} M_i^{\,b_i(g)} \prod_{j=1}^{n} \prod_{k=1}^{n} R_{jk}^{\,o_{jk}(g)} \qquad (1)$$

where $c$ is the normalisation constant and $n$ is the number of individuals in the pedigree. $M_1, \ldots, M_s$ are non-negative parameters that allow pedigrees to be weighted according to $s$ specified global characteristics which might include demographic features such as cultural prejudices and marriage laws. Each integer exponent $b_i(g)$ is a particular measure of the relevant characteristic, is internal to pedigree $g$ and provides the degree of the relative weightings of different pedigrees for the $i$-th characteristic. For example, if the $i$-th characteristic is inbreeding, we could define a pedigree to be inbred if it contains marriages between individuals whose parents are related via other members of the pedigree. In this case, $b_i(g)$ could be the number of such marriages (with $b_i(g) = 0$ for noninbred structures) and is hence one measure of the extent of inbreeding. The $R_{jk}$ are the local parameters and relate to specific parts of the pedigree or to particular individuals. As given here, they allow prior information on parent-offspring links to be incorporated. The corresponding pedigree-specific exponents are defined as $o_{jk}(g) = 1$ if $j$ is the parent of $k$ in pedigree $g$ and $o_{jk}(g) = 0$ if $j$ is not the parent of $k$. (Clearly, $o_{jk}(g) + o_{kj}(g) \le 1$ for $j \ne k$ in any pedigree $g$.) Defining $0^0 \equiv 1$ means that a value of 0 for any global parameter $M_i$ will eliminate all pedigrees for which $b_i(g) \ge 1$ (e.g. all inbred pedigrees in the example above). Likewise, $R_{jk} = 0$ rules out the pedigrees featuring $j$ as a parent of $k$, i.e. those structures $g$ for which $o_{jk}(g) = 1$. Setting all local and global parameters to 1 amounts to assigning a flat prior whereby all generated pedigrees have equal probability a priori and there is no penalty associated with any particular feature. A value between 0 and 1 decreases the prior probability of the relevant characteristic while a value exceeding 1 increases it. Thus, $R_{jk} > 1$ would favour all pedigrees featuring $j$ as a parent of $k$. For a sample space of $k$ pedigrees $g_1, \ldots, g_k$ connecting the individuals of interest, and given the genetic data, the likelihood for any pedigree is $L_i = P(\text{data} \mid g_i)$. By Bayes' Theorem, the posterior probability of the $i$-th pedigree is thus

$$P(g_i \mid \text{data}) = \frac{L_i \, \pi(g_i)}{\sum_{j=1}^{k} L_j \, \pi(g_j)}$$

and the posterior distribution can be used to infer the true pedigree.
The prior probability function is generally not as cumbersome in practice as is perhaps suggested by (1) since many parameters will be set to unity. Despite interpretability issues with regard to the choice of values for the M and R parameters, precise definition of the global characteristics of interest and its multiplicative form, it is interesting to note that many existing approaches to incorporating essential non-genetic information can be expressed as special cases of this prior and are less transparent (see Sheehan and Egeland [12] for further discussion).
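The multiplicative prior and the Bayes update described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the data layout (a list of global parameters with matching exponents, a dictionary of local parameters keyed by parent-offspring pairs, and a set of the links present in a pedigree) are hypothetical choices made for the example.

```python
from math import prod


def unnormalised_prior(M, b, R, links):
    """Unnormalised prior weight of one pedigree.

    M     : global parameters M_1..M_s (non-negative numbers)
    b     : integer exponents b_1(g)..b_s(g) for this pedigree
    R     : dict mapping (parent, child) pairs to local parameters R_jk
    links : set of (parent, child) pairs present in this pedigree,
            i.e. the pairs with o_jk(g) = 1
    """
    # Python evaluates 0 ** 0 as 1, which matches the convention in the text.
    global_part = prod(m ** e for m, e in zip(M, b))
    # Pairs missing from R are treated as having R_jk = 1 (no prior preference).
    local_part = prod(R.get(link, 1.0) for link in links)
    return global_part * local_part


def posterior(prior_weights, likelihoods):
    """Bayes' theorem over a finite sample space of pedigrees."""
    joint = [w * l for w, l in zip(prior_weights, likelihoods)]
    total = sum(joint)
    return [x / total for x in joint]
```

Because the posterior is normalised at the end, the prior weights do not need to be normalised first; the constant in the prior cancels.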

An Example Linkage Scenario
Consider a linkage study with a nuclear family comprising a couple 1 and 2 and their seven children, four of whom have been affected with a rare lethal disease. By definition, the founding members, 1 and 2, are assumed to be unrelated as depicted in figure 1 (a). In fact, there is some uncertainty about this and there is a suggestion that 1 and 2 might be related via the father of 1 and the mother of 2. All relationships via other parental combinations have been ruled out. In particular, given the rarity of the disease, together with the information that they come from a community where cousin marriages are quite common, the choice of relationships can be narrowed down considerably. Suppose that, in this case, the family members feel that the most likely possibilities are that they are either half-first cousins or full first cousins and thus have one or two grandparents in common. By use of what Sheehan and Egeland [12] refer to as 'hard' prior information, the sample space of possible alternative pedigrees for this particular genetic counselling application can be restricted to the four shown in figure 1. Note that 4 unobserved males and 4 unobserved females have to be included with the original 9 typed family members to describe these four possibilities. However, as most of the pairwise relationships are fixed (1 and 2 are the parents of all seven children, 3 is the father of 1 and 4 is the mother of 2), we only need to consider the pairwise relationships between a possible common grandfather 5 and grandmother 6 with each of 3 and 4. Note that this is sufficient to cover all orderings of the four relevant pedigrees by the prior of (1). In the absence of data on these extra individuals, pedigrees (b) and (c) of figure 1 will be indistinguishable on the basis of likelihoods and lod scores unless mtDNA or Y-chromosome data are incorporated. Their adjusted lod scores may differ, however, if prior information indicates that one alternative is more likely than the other.
The local part of the prior model in Equation 1 is hence formulated by the four parameters $R_{5,3}$, $R_{5,4}$, $R_{6,3}$ and $R_{6,4}$. The first cousin alternative of figure 1 (d) is most likely a priori when these are all greater than unity and the strength of such prior belief is reflected by the magnitude of the parameter values. The half-cousin option with a common grandfather, figure 1 (b), will be favoured when $R_{5,3}$ and $R_{5,4}$ are assigned values greater than 1 while $R_{6,3}$ and $R_{6,4}$ are both given values less than 1. Setting all four parameters to 1 will make all four possibilities equally likely a priori. For simplicity, a flat prior over all global parameters could be assumed but, for illustration, we will also include the 'promiscuity' parameter $M_P$ of Egeland et al. [11] as it might be the case that multiple marriages are socially discouraged. Here, the degree of promiscuity in a pedigree is measured by the pedigree-specific index $b_P(g)$, the number of pairs of offspring in pedigree $g$ with exactly one common parent who is also in $g$, i.e. a count of the pairs of half-siblings. Thus, $M_P = 0$ excludes the two half-first cousin pedigrees while $M_P > 1$ favours them but, as noted in Sheehan and Egeland [12], different definitions of this parameter are possible. The prior probabilities for the four pedigrees in figure 1 are given in table 1.

A Simulation Study
100 sets of microsatellite marker data were generated on pedigree (b) of figure 1, i.e. 1 and 2 are half-first cousins via a common grandfather, using Allegro [24]. Recall (Section 3) that individuals 1 and 2 and their seven children have marker data. All additional individuals are unobserved. Each data set comprised 10 markers on each of 10 chromosomes, labelled 1, ..., 100 and positioned 10 cM apart, which is an acceptable spacing interval to avoid issues arising from possible linkage disequilibrium. All markers were simulated with 5 equifrequent alleles. We considered an autosomal recessive disease allele with frequency 0.0001 and penetrance probabilities of 0.98 for homozygous carriers and 0 for both non-carriers and heterozygous carriers. The disease locus was set at 49 cM on chromosome 1, i.e. 1 cM to the left of marker 5. Three different specifications of the prior function (1) were used: (1) a flat prior where all local and global parameters are set to 1: $M_P = R_{5,3} = R_{5,4} = R_{6,3} = R_{6,4} = 1$; (2) a prior that downweights multiple marriages, and makes the half-sib options half as likely a priori as the alternatives but does not favour any parent-offspring relationships: $M_P = 0.5$, $R_{5,3} = R_{5,4} = R_{6,3} = R_{6,4} = 1$; and (3) a prior that favours the common grandfather structure: $M_P = 1$, $R_{5,3} = R_{5,4} = 10$, $R_{6,3} = R_{6,4} = 1$. This parametrisation implies that pedigree (b) of figure 1 is 100 times more likely than pedigree (c) a priori and also a posteriori in this case, since the likelihoods coincide.
[Table 1. Columns: Pedigree; Prior probability distribution.]
Hum Hered 2008;65:221-231
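The three prior specifications can be evaluated directly for the four pedigrees of figure 1. The sketch below is an illustration under stated assumptions rather than a transcription of table 1: the exponents are read off the description in the text (one half-sibling pair in each half-first cousin pedigree, individual 5 as parent of both 3 and 4 in (b), individual 6 as parent of both in (c), and both 5 and 6 as parents in (d)), all other local parameters are taken to be 1, and the function name `pedigree_priors` is hypothetical.

```python
def pedigree_priors(M_P, R53, R54, R63, R64):
    """Normalised prior over pedigrees (a)-(d) of figure 1.

    Assumed exponents, inferred from the text:
      (a) unrelated:            no extra links, no half-siblings
      (b) common grandfather 5: links 5->3 and 5->4, one half-sibling pair
      (c) common grandmother 6: links 6->3 and 6->4, one half-sibling pair
      (d) full first cousins:   all four links, no half-siblings
    """
    weights = {
        "a": 1.0,
        "b": M_P * R53 * R54,
        "c": M_P * R63 * R64,
        "d": R53 * R54 * R63 * R64,
    }
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


flat = pedigree_priors(1, 1, 1, 1, 1)            # prior (1)
downweighted = pedigree_priors(0.5, 1, 1, 1, 1)  # prior (2)
grandfather = pedigree_priors(1, 10, 10, 1, 1)   # prior (3)

if __name__ == "__main__":
    for name, prior in (("flat", flat), ("downweighted", downweighted),
                        ("grandfather", grandfather)):
        print(name, {g: round(p, 4) for g, p in prior.items()})
```

Under these assumptions the second prior makes each half-first cousin pedigree half as likely as (a) or (d), and the third makes (b) 100 times as likely as (c), in line with the statements above.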

An Adjusted Lod Score Analysis
A multipoint linkage analysis was carried out for each of the four pedigrees in figure 1 using Merlin [25]. The rest of the analysis in this section was done in R (http://www.r-project.org). It was assumed that marker allele frequencies, disease allele frequency, disease penetrance and marker genetic map were known and equal to those used in the simulations. We used Haldane's map function, both for the simulations and the analysis. Let $L_{0,j}$ denote the likelihood of all the marker data for pedigree $j$ under the null hypothesis, i.e. there is no linkage with the disease. $L_{A_i,j}$ is the likelihood for the disease locus being at marker $i$, conditional on all the flanking marker data, for pedigree $j$. The lod score associated with the position of marker $i$ and pedigree $j$ is $\mathrm{lod}_{i,j} = \log_{10}(L_{A_i,j} / L_{0,j})$. As would be hoped, a linkage signal was reported on chromosome 1 at marker 5 in all cases and there was no strong evidence for linkage found on the remaining chromosomes on average over the 100 simulations. Thus, the data confirm that the markers on chromosomes 2, ..., 10 are unlinked to the disease status.
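Haldane's map function, mentioned above, converts a genetic distance into a recombination fraction under the assumption of no crossover interference. The following is a small sketch of the standard formula; the function name is hypothetical and the code is not taken from Merlin or Allegro.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction for a map distance in centimorgans,
    using Haldane's map function: theta = (1 - exp(-2d)) / 2, d in Morgans."""
    d = abs(distance_cm) / 100.0  # convert cM to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


if __name__ == "__main__":
    # Disease locus to its nearest marker (1 cM) and adjacent markers (10 cM)
    print(haldane_recombination_fraction(1.0))
    print(haldane_recombination_fraction(10.0))
```

For the 1 cM gap between the simulated disease locus and marker 5 the recombination fraction is just under 0.01, and for the 10 cM marker spacing it is about 0.09.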
To produce an adjusted lod score for each marker position, we must weight the likelihood components of the corresponding ratio. We could do this naively by using our prior probabilities for the pedigrees in the sample space or, for that matter, by using any set of probabilities that have been otherwise ascertained as reasonable. We have chosen to update our prior knowledge with the likelihoods based on the unlinked data (i.e. with chromosome 1 excluded) and will use these pedigree posterior probabilities, $P(g_j \mid \text{data})$ where $j = 1, \ldots, k$ and $k = 4$ here, as the weighting factors. Specifically, the likelihoods for marker $i$ are

$$L_{A_i} = \sum_{j=1}^{k} L_{A_i,j} \, P(g_j \mid \text{data}), \qquad L_0 = \sum_{j=1}^{k} L_{0,j} \, P(g_j \mid \text{data})$$

and the log ratio

$$\mathrm{lod}_i = \log_{10}\left( \frac{L_{A_i}}{L_0} \right) \qquad (2)$$

is the pedigree-adjusted lod score.
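The weighting just described can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' R code: it takes pedigree-specific log10 likelihoods as input (pedigree likelihoods are usually far too small to store directly) and combines them with a log-sum-exp style calculation. The function names are hypothetical.

```python
import math


def log10_weighted_sum(log10_values, weights):
    """log10 of sum_j weights[j] * 10 ** log10_values[j], computed stably."""
    terms = [lv + math.log10(w) for lv, w in zip(log10_values, weights) if w > 0]
    top = max(terms)  # factor out the largest term to avoid underflow
    return top + math.log10(sum(10.0 ** (t - top) for t in terms))


def adjusted_lod(log10_lik_alt, log10_lik_null, posterior):
    """Pedigree-adjusted lod score at one marker position.

    log10_lik_alt  : log10 likelihood under linkage, one entry per pedigree
    log10_lik_null : log10 likelihood under no linkage, one entry per pedigree
    posterior      : pedigree posterior probabilities from the unlinked data
    """
    return (log10_weighted_sum(log10_lik_alt, posterior)
            - log10_weighted_sum(log10_lik_null, posterior))
```

With a single pedigree carrying all the posterior weight, the adjusted lod score reduces to the ordinary lod score for that pedigree. In general it is not the same quantity as a posterior-weighted average of the pedigree-specific lod scores, because the weighting is applied to the likelihoods before the ratio is taken.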
Note that an expected lod score could be calculated for each marker along the lines suggested by Ott [15]:

$$\mathrm{Elod}_i = \sum_{j=1}^{k} \mathrm{lod}_{i,j} \, P(g_j \mid \text{data}) \qquad (3)$$

where $P(g_j \mid \text{data})$ is the posterior probability of pedigree $j$. This is similar to Ott's ELOD but here the expectation is taken over the posterior pedigree distribution and not over the distribution of the number of recombinations. We should stress, however, that the adjusted lod score (2) is a proper likelihood ratio statistic and hence has all the required asymptotic properties for likelihood ratio hypothesis testing. The expected lod score (3) does not have such a rigorous theoretical justification. On these grounds, we will prefer to use adjusted lod scores rather than expected lod scores.

Figure 2 shows the posterior probability distributions (over the 100 simulated datasets) calculated from the unlinked data on chromosomes 2, ..., 10, corresponding to each of the priors described in Section 4 for the four pedigrees in the sample space. It is clear from the plots corresponding to the flat prior, where no particular features have been favoured or downweighted, that there is enough information in the unlinked marker data to indicate that a common grandparent (i.e. pedigree figure 2 (b) or (c)) is generally the most likely relationship between the founding individuals 1 and 2. This relationship is deliberately downweighted by the second prior, making the first cousin relationship of pedigree figure 2 (d) more likely than the others. In this case, the evidence provided by the marker data largely counteracts the downweighting effect but a more extreme prior would induce more marked differences. Marker data alone will never distinguish between the true pedigree figure 2 (b) and the common grandmother option of pedigree figure 2 (c): strong prior information, such as is provided by the third prior, is required to make such a distinction.

Results
Mean posterior probabilities for each pedigree and each prior are given in table 2 . For the first and third prior, the true relationship (b) between 1 and 2 has the largest posterior probability on average. As can be seen from figure 2 , the posterior distributions are actually quite skewed and taking medians rather than means would accentuate this trend. For the second prior where the true relationship has been deliberately downweighted, the first cousin relationship (d) is now the most likely option a posteriori. Figure 3 shows the distribution (over 100 datasets) of multipoint lod scores at marker 5, the marker closest to the disease locus at 49 cM. The first four columns are the raw lod scores calculated for each of the four pedigrees in the sample space. The last three columns depict the distributions of adjusted lod scores at this marker position corresponding to each prior with the likelihoods weighted by the posterior probabilities of figure 2 , as given in (2). Lod scores around the disease location of 49 cM on chromosome 1 tend to be higher for the half-first cousin relationships (b) and (c) than for the unrelated (a) and first-cousin (d) alternatives, as would be anticipated. However, even when the posterior probabilities were calculated from a flat prior which does not favour any particular structure a priori, the adjusted lod scores all tended to be much higher than those for (a) and slightly higher than those for (d). Average adjusted lod scores in this area reflected this trend ( table 3 ).
As is consistent with findings reporting increased false linkage signals [17,18], multipoint lod scores generally seemed to be higher at locations away from the disease locus when the founder relationships were misspecified. Some indication of this can be seen in figure 4 showing the relevant distributions over the 100 datasets for a randomly sampled location away from chromosome 1. Although we did not look for a particular location that showed clear false linkage signals, lod scores for the unrelated case (a) are generally higher than those that take account of relatedness, either when a particular relationship is assumed, or when the lod scores are adjusted for several alternatives. There is not a marked difference between the distributions for first and half-first cousin relationships in figure 4 and the adjusted lod scores all have similar distributions and are close to the true relationship. Similar patterns were observed for other sampled locations.

Sensitivity Issues
We have presented a general method for adjusting lod scores in a linkage analysis to reflect uncertainty about founder relatedness for applications where prior information on the nature of such relationships is typically available. Our simulation study illustrates that the approach is potentially useful. However, sensitivity of the approach to choice of marker polymorphism and allele frequencies, disease location and genetic model requires investigation. We present some preliminary discussion on some of these issues here but note that a full sensitivity analysis is beyond the scope of this paper. We note that choice of sample space and local and global prior parameters in the prior specification is also important [12] but these will be particular to the application at hand and cannot be discussed satisfactorily in general terms. In order to consider the effects of allele frequency misspecification, we reanalysed our 100 sets of simulated microsatellite data by estimating the allele frequencies from the data in Merlin. The same three priors were used and the prior probability distributions over the pedigree sample space, together with the mean posterior probabilities when frequencies were assumed to be known and equal to those used in the simulations, are given in table 2. Comparing the original results with those for estimated allele frequencies in table 4, it can be seen that the two sets of posterior distributions are very similar. The mean adjusted lod scores at the marker closest to the disease locus are identical for pedigree (a) for known and estimated frequencies since the founders are typed, and are very similar for the other three. In this particular example, the effects of estimating allele frequencies are practically negligible. However, this will not generally be the case.
[Table 2. The mean posterior probabilities from 100 simulations for each pedigree in figure 1 and each prior in Section 4.]
The approach we have presented in this paper is not restricted to microsatellite markers. Diallelic SNP data can also be used. To illustrate this, we also simulated 100 data sets of 100 SNP markers with equifrequent alleles on 10 chromosomes at 10 cM intervals on pedigree (b). These were analysed in Merlin, as described above, when allele frequencies were known (and equal to 0.5), and when they were estimated from the data. Now the posterior probability distributions based on the unlinked data on chromosomes 2, ..., 10 were not informative enough to place more weight on pedigrees (b) and (c). Lod scores calculated when allele frequencies were estimated (not shown) were again practically identical to those calculated for known frequencies. Table 5 shows the mean adjusted lod scores at the marker closest to the disease locus, for both SNP and microsatellite data. Expected lod scores, calculated as described in Equation (3), are also shown in table 5 for comparison although, for the reasons given in Section 4.1, we would advise that adjusted, rather than expected, lod scores be reported. Not surprisingly, the lod scores based on SNPs are lower than those for microsatellites as 100 microsatellites are more informative than 100 SNPs. However, with huge numbers of SNP data becoming more readily available, this lack of informativeness is not really an issue. The SNP mean adjusted lod scores, however, are also all higher than the mean lod score calculated under the assumption that pedigree (a) was the true relationship i.e. when 1 and 2 were assumed to be unrelated. The same pattern is observed for the expected lod scores.
How does this approach perform when the true relationship is not actually one of those listed in the sample space? For the example presented here, suppose that individuals 1 and 2 could be second cousins. We simulated microsatellite data on 100 datasets exactly as before for the second cousin pedigree of figure 5 and then analysed them as before using the original sample space of figure 1 . Since the strength of the second cousin relationship is between the unrelated and half first cousin options, it is perhaps not surprising ( table 6 ) that the posterior probability distribution based on the unlinked data for the flat prior assigns more weight to pedigrees (b) and (c) than to (a) and (d). The mean adjusted lod scores at marker 5 ( table 7 ) are all higher than the mean lod score for the unrelated case (a) and so the method appears to work reasonably well even when the true relationship is not among those considered. However, we should note that the true relationship was still reasonably close to those of the sample space and further investigations on more distant possibilities are required.

Discussion
This paper presents an approach to adjusting the lod scores of a linkage analysis when pedigree founders are believed to be related. We exploit our previous method [12] for formal incorporation of prior information in relationship estimation problems. Importantly, we rely on being able to restrict the set of alternative extended pedigrees to a reasonable size and so consider applications where such information is often available. The focus is quite different, however. Here we are not interested in estimating the founder relationship as we believe this to be a nuisance factor for these applications. Instead, we update our prior probabilities with genetic data believed to be unlinked to the disease locus and use these posterior probabilities to weight the relevant likelihoods for the marker lod scores. The prior probabilities could be heavily influential when the data are not as informative as they were for our simulations and will always be required to separate pedigrees that have the same degree of relatedness. The choice of parameters for the prior distribution depends heavily on the particular application and general recommendations are not really sensible. However any reasonable set of probabilities on pedigrees, based possibly on expert knowledge, could in principle be used. We also note that adjusted lod scores should be calculated in preference to expected lod scores. Lod scores can be parametric (as presented here) or non-parametric.
The simulation study was designed to test the potential usefulness of the method and we used suitably polymorphic loci. There is no reason why SNP marker data cannot be used but the density of SNP marker maps may necessitate consideration of linkage disequilibrium between the markers since ignoring it will lead to severe biases in lod score calculations [26]. Linkage disequilibrium also depends on the population being studied. Our approach will benefit from extensive sensitivity checking to marker polymorphism and allele frequencies, disease location and genetic model as we have barely touched on these issues here. It made sense to ignore the data from markers that we suspected were linked to the disease (i.e. those on chromosome 1 in our simulations) for our posterior probability calculations. Alternatively, given the number of markers one would expect to have in practice, one could simply ignore markers with lod scores exceeding a fairly conservative threshold (e.g. 1) and their immediate neighbours.
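The threshold rule suggested above, for choosing which markers to treat as unlinked, might be sketched as follows. This is only an illustration: the function name is hypothetical, the input is assumed to be the ordered lod scores for the markers of a single chromosome (so that "immediate neighbours" are adjacent list entries), and the default threshold of 1 is simply the example value mentioned in the text.

```python
def unlinked_marker_indices(lods, threshold=1.0):
    """Indices of markers kept for the posterior probability calculation.

    A marker is dropped if its lod score exceeds the threshold, and so are
    its immediate neighbours on the same chromosome.
    """
    n = len(lods)
    excluded = set()
    for i, lod in enumerate(lods):
        if lod > threshold:
            # Drop the marker itself and the markers on either side of it
            excluded.update(j for j in (i - 1, i, i + 1) if 0 <= j < n)
    return [i for i in range(n) if i not in excluded]


if __name__ == "__main__":
    # Third marker exceeds the threshold, so markers 1, 2 and 3 are dropped
    print(unlinked_marker_indices([0.1, 0.2, 1.5, 0.3, 0.0]))
```

Applying the function chromosome by chromosome avoids treating the last marker of one chromosome as a neighbour of the first marker of the next.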
As with any statistical analysis, violation of the underlying assumptions could lead to the wrong answers. If the true relationship between the founders is known, it should be used and the linkage calculations modified appropriately [19,20]. Alternatively, parameters describing the relationship could be estimated [21,22]. However, it is difficult to estimate quantities like the inbreeding coefficient in clinical practice because there is rarely sufficient information. In human genetics, it is customary to take the closest relationship between the founders (i.e. the shortest loop) as the best estimate and ignore all the more distant possibilities, but this can lead to serious underestimation of homozygosity in consanguineous populations [23]. Our approach is to look at a finite set of alternative possible relationships connecting all the individuals of interest, rather than specific relatedness parameters for the founders. In particular, it does not restrict us to considering pairs of founders: relationships between any number of founders can be specified. It does rely on having sufficient prior information on family histories, as we believe would often be the case for the applications we have in mind, to suitably restrict the set of possible alternatives. All the calculations we have presented can be done with existing freely available software. It would seem plausible that the degree of relatedness or precise relationship amongst the founders in a linkage analysis is not quite as important as the realisation that they are not unrelated.
[Table 6. Mean posterior probabilities (based on unlinked data) from 100 simulations for each pedigree of figure 1 and each prior in Section 4 when the data were simulated on the second cousin pedigree of figure 5.]
In our simple experiment with the second cousin example, we did not suffer too badly from not having the true relationship in the sample space and our adjusted lod scores were quite insensitive to the priors we used. It would be interesting to see how much of an improvement over the unadjusted analysis can be obtained from a simple modification that allows for 'general relatedness', in some sense. Liu et al. [27] consider highly looped inbred pedigree structures which are computationally too complex to handle. Their proposal is to simplify the pedigree but keep the inbreeding values close to the true values thus creating 'a single hypothetical loop' for each individual. They demonstrated that this approach yielded a considerable decrease in the number of false positive linkage signals when compared with the standard approach of using only the shortest loops. The authors note that this hypothetical loop will ignore the differences in genome sharing between closely related individuals and those more distantly related, due to the different numbers of recombinations expected. Consequently, the probability of recombination may be underestimated which, in turn will affect the power to detect linkage. Nonetheless, the idea is appealing and could perhaps be adapted to other applications.
Since the precise relationship amongst the founders is not particularly of interest in these applications, sequential approaches to relationship estimation which will quickly provide a good estimate could also be used here. However, even when inbreeding between founders has been accounted for, additional bias may be introduced when combining information from several families in a linkage study. As noted by Leutenegger et al. [19], the sample likelihood is not necessarily the product of the individual family likelihoods when the families all derive from a genetic isolate or a population with a long history of consanguineous marriages. That aside, we have shown that it is not necessary to require founders to be unrelated in linkage applications when that assumption is dubious and when some information on the possible form of their relatedness is available. As more and more marker data become available, estimating relationships will be easier, the effects of the prior on the posterior distribution will be less, and the adjusted lod scores we have presented here will be closer to the 'true' lod scores.