Sex chromosomes and genetic association studies

Although the literature concerning statistical testing for genotype-phenotype association in family-based and population-based studies is very extensive, until recently the sex chromosomes have received little attention. Here it is shown that the X chromosome in particular presents special problems with respect to efficient analysis of mixed-sex population studies, and as a result of X inactivation. This paper reviews recent developments in approaching these problems.


Introduction
The statistical problem of testing for association between phenotype and genetic markers on the sex chromosomes has received less attention than tests for autosomal markers. The advent of genome-wide association studies has hugely increased the number of studies of associations with the sex chromosomes and, in this context, it has recently been recognized that the X chromosome, in particular, poses special problems [1].
Firstly, in population-based case-control studies involving both male and female subjects, associations can be confounded by differences in sex ratio between cases and controls even when, as is usually the case, allele frequencies do not differ between the sexes. Conventional epidemiological approaches to deal with this confounding can be very inefficient.
Secondly, the phenomenon of X inactivation, which affects most loci on the X chromosome in females, means that the risk attributable to a single allele would generally be expected to be less in females than in males. An efficient statistical test would allow for this. This review describes approaches to statistical testing for association with loci on the sex chromosomes, largely in the context of case-control studies of binary phenotypes. The X chromosome will be the focus of most of the review. Later sections will briefly discuss family-based association studies, quantitative phenotypes and methods for the Y chromosome.

Case-control studies
Before turning to the special problems presented by the X chromosome, we shall review simple methods of analysis for autosomal loci in case-control studies.

Autosomal loci
Counting chromosomes
Many early analyses of association between a binary phenotype and a genetic marker used simple tests for association in contingency tables in which cell entries were counts of chromosomes rather than people. Thus, for an autosomal locus, the total cell count is twice the number of subjects studied, and associations were tested simply by comparing allele frequencies between cases and controls. In the diallelic case, this reduces to the analysis of a 2 × 2 table (Table 1). The most commonly used test was the familiar chi-squared test for association which, here, has one degree of freedom (df). The calculation of the chi-squared test statistic, T say, can be broken down, in a manner which aids later discussion, into a score, U, and its variance, V, with $T = U^2/V$. Counting chromosomes in this way treats the two copies carried by each subject as separate observations. However, under the assumption of Hardy-Weinberg equilibrium (HWE), the two copies carried by each subject can be regarded as drawn randomly and independently from a population of chromosomes. Thus, with this assumption, the analysis is valid in the sense of maintaining the correct type 1 error rate [2].
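The breakdown of the chromosome-counting test into a score and its variance can be sketched as follows. This is a minimal illustration rather than the paper's own notation: the function name and argument names are hypothetical, and the variance shown is the usual hypergeometric variance for a 2 × 2 table (with N − 1 in the denominator, where N is the total number of chromosomes).

```python
def allele_count_test(a_cases, a_controls, n_cases, n_controls):
    """One-df chi-squared test from a 2 x 2 table of chromosome counts.

    a_cases, a_controls: numbers of A alleles among case / control chromosomes.
    n_cases, n_controls: total numbers of chromosomes in cases / controls.
    Returns (U, V, T) with T = U^2 / V.
    """
    n = n_cases + n_controls           # total chromosomes, N
    n_a = a_cases + a_controls         # total count of allele A
    # Score: observed minus expected count of A alleles in cases
    u = a_cases - n_cases * n_a / n
    # Hypergeometric variance of the score under the null hypothesis
    v = n_cases * n_controls * n_a * (n - n_a) / (n * n * (n - 1))
    return u, v, u * u / v


# 100 cases and 100 controls, hence 200 chromosomes in each group
u, v, t = allele_count_test(a_cases=90, a_controls=70,
                            n_cases=200, n_controls=200)
print(u, v, t)
```

Replacing N − 1 by N in the variance gives the familiar Pearson chi-squared statistic for the same table.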

Counting subjects
This analysis can be contrasted with testing for association in the 3 × 2 contingency table of genotype (A/A, A/a, or a/a) against phenotype, in which the counts are of subjects. This makes no HWE assumption, but delivers a chi-squared test on two df rather than on one df. The difference in df between these tests reflects the different alternative hypotheses against which they are powerful. The one-df test based on chromosome counts turns out to be most powerful against a rather restrictive 'trend' model in which the odds ratios between adjacent rows (that is, A/A versus A/a, and A/a versus a/a) are equal [2]. For a rare disease, these odds ratios correspond to relative risks for disease in the population, so that this alternative hypothesis corresponds to a model in which each copy of the A allele multiplies the risk by a constant (the 'allelic' odds ratio or relative risk). Since this test is a score test in the wider class of models in which the expected case:control ratio for the A/a genotype is intermediate between those for a/a and A/A genotypes, it is also locally most powerful against small effects in this wider class of models. In contrast, the two-df test is powerful against unrestricted alternatives but, in consequence, is considerably less powerful against the trend alternative.
In modern complex disease genetics, most associations discovered to date have been of the trend type and one-df tests are therefore usually regarded as the most useful (although most analysts would also carry out a two-df test). However, chromosome counting in the 2 × 2 table has largely been abandoned owing to its reliance on the HWE assumption, having been supplanted by the Cochran-Armitage test for trend [3][4][5] in the 3 × 2 tabulation of subjects by genotype and phenotype. Writing $x_i$ as the genotype for subject i, scored 0, 1 or 2; $\bar{x}$ as its mean over all subjects; and $M_1$, $M_0$ and $M = M_1 + M_0$ as the numbers of cases, controls and subjects, respectively, this test can be calculated as follows:
$$U = \sum_{i \in \text{cases}} (x_i - \bar{x}), \qquad V = \frac{M_1 M_0}{M(M - 1)} \sum_{i=1}^{M} (x_i - \bar{x})^2, \qquad T = \frac{U^2}{V}.$$
In this test, the 'score', U, is identical to that for the test on chromosome counts. The difference between the two approaches is in the formula for its variance, V; the Cochran-Armitage test uses a variance estimate that does not assume HWE. (As for the simple 2 × 2 table, $M - 1$ is replaced by $M$ in most derivations, although this leads to a very slightly biased variance estimate.)
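As an illustration of the trend test, the following minimal sketch computes the score, its variance and the one-df statistic from per-subject genotype scores. The function name and the toy data are hypothetical; the variance uses M − 1 in the denominator, as discussed in the text.

```python
def trend_test(genotypes, is_case):
    """Cochran-Armitage-type trend test from per-subject data.

    genotypes: genotype scores x_i (0, 1 or 2 copies of allele A).
    is_case:   1 for cases, 0 for controls, in the same order.
    Returns (U, V, T) with T = U^2 / V.
    """
    m = len(genotypes)
    m1 = sum(is_case)                  # number of cases
    m0 = m - m1                        # number of controls
    xbar = sum(genotypes) / m
    # Score: sum over cases of deviations from the overall mean genotype
    u = sum(x - xbar for x, y in zip(genotypes, is_case) if y)
    # Variance from the empirical spread of genotype scores (no HWE assumed)
    ss = sum((x - xbar) ** 2 for x in genotypes)
    v = m1 * m0 * ss / (m * (m - 1))
    return u, v, u * u / v


# Toy data: four cases followed by four controls
x = [2, 2, 1, 1, 1, 1, 0, 0]
y = [1, 1, 1, 1, 0, 0, 0, 0]
u, v, t = trend_test(x, y)
print(u, v, t)
```

Under the null hypothesis T is referred to a chi-squared distribution on one df; with samples as small as this toy example the approximation would not be trusted in practice.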

Control for confounding and interaction
When there is a danger of confounding, for example by population structure, these tests can be extended. Cases and controls are classified into strata, for example by a score based on the first few principal components in a genome-wide study [6]. Each stratum then provides a 2 × 2 or 3 × 2 table. Extended tests which combine the evidence across strata may be carried out by:
• calculating U and V in each stratum,
• summing U and V values over strata to form a combined U and V, and
• calculating the test statistic in the usual way: $T = U^2/V$.
The same method can be used to combine data from different studies in meta-analysis. The resultant tests still have one df and maintain high power against the trend alternative. However, this power is obtained at a cost of an additional assumption: that the effect of genotype on phenotype (as measured by the allelic odds ratio) does not vary across strata. In the case of a 2 × 2 table, this approach yields the Mantel-Haenszel test [7], while, for trend tests in 3 × 2 tables, it yields the Mantel-extension test [8]. The two-df test for the 3 × 2 table may be extended in a similar manner [5,9]. An alternative to the use of stratification to control for confounding is logistic regression analysis, with case-control status treated as the binary dependent variable [10].
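The three steps for combining evidence across strata amount to very little computation once U and V are available for each stratum. A minimal sketch, in which the function name and the per-stratum values are purely illustrative:

```python
def combine_strata(strata):
    """Combine per-stratum scores and variances into a single one-df test.

    strata: list of (U, V) pairs, one pair per stratum (or per study).
    Returns the combined (U, V, T) with T = U^2 / V.
    """
    u = sum(u_k for u_k, _ in strata)  # sum the scores over strata
    v = sum(v_k for _, v_k in strata)  # sum the variances over strata
    return u, v, u * u / v


# Hypothetical (U, V) values from three strata
u, v, t = combine_strata([(2.0, 1.0), (1.0, 1.5), (3.0, 1.5)])
print(u, v, t)
```

Because the scores are summed before squaring, contributions of opposite sign cancel, which is why this combination relies on the genotype effect being in the same direction in every stratum.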
When the assumption of constant genotype effect across strata is violated, there is said to be 'interaction', and the power of the above tests is reduced; in the rather unlikely case in which the effect is in opposite directions in males and females, little or no power would remain. An alternative way to combine evidence is to sum chi-squared values over strata, while summing df in the same way. Thus, if there are K strata, summing the one-df tests results in a test with K df, and summing the two-df tests leads to a 2K df test. Such tests preserve power in the case where there is strong interaction, although they are inefficient otherwise.

Table 1. A 2 × 2 table of chromosome counts: counts of chromosomes for cases and controls according to allele of a diallelic marker locus, where N is the total sample size and A and a the two alleles at the locus. (Columns: Allele, Cases, Controls, Total.)

The X chromosome
In this section we consider the applicability of the above standard methods to loci on the X chromosome, and discuss some recently developed improvements. For simplicity we will concentrate on the case of a diallelic locus, although the methods described can be generalized.
Most of the difficulties concern mixed-sex studies [11], particularly those in which the sex ratio differs, perhaps markedly, between cases and controls. Although in general one might consider this to be a failure of design, many large-scale genome-wide association studies make use of genotype data for standard control sets [1]. The sex ratio can then be very different between cases and controls.

Counting chromosomes
For markers on the X chromosome (other than those in the pseudo-autosomal region shared with the Y chromosome), the derivation of association tests is more complicated. If all subjects were female, we would analyze the 3 × 2 table of subject counts. Conversely, if all subjects were male, each subject would contribute only one chromosome and the analysis would revert to that of 2 × 2 tables. But what if the study contains both male and female subjects? There are several simple approaches to this difficulty. The first approach is to revert to chromosome counting, analyzing the 2 × 2 table in which each female subject contributes two observations and each male subject one. This has two obvious problems:
• it relies on the HWE assumption for females, and
• the association could be confounded by sex if (a) allele frequencies differ between the sexes, and (b) the sex ratio differs between cases and controls.
The second of these problems can be addressed by standard methods for control for confounding [7]. However, in the case where the sex ratio varies between cases and controls but allele frequencies do not differ by sex, sex is not truly a confounder and treating it as such can be very inefficient. To take an extreme example, in a study of breast cancer in which the available control sample contained both sexes, stratification by sex would effectively discard the data from male controls from the comparison. Although males rarely contract breast cancer, they nevertheless provide valuable information concerning allele frequencies in the population.

Counting subjects
It is not immediately obvious how one would apply methods based on comparing distributions of genotypes in mixed-sex studies, since male and female genotypes are qualitatively different. One might decide to combine each of the two male genotypes with one of the three female genotypes (although quite how to do this is not obvious), or one could make the judgement that there are five distinct genotypes and analyze the 5 × 2 contingency table. Neither of these approaches is satisfactory, not least because any differences in the sex ratio between cases and controls would give rise to an apparent association, even when allele frequencies do not differ between the sexes.
An alternative approach is to stratify by sex. Females then contribute a 3 × 2 table and males a 2 × 2 table. Assuming no marked interaction between sex and genotype, a one-df trend test can be obtained by combining the trend tests for these tables as for autosomal loci, while, for the contribution of females, a variance estimate that allows for deviation from HWE is used. In the presence of strong interaction, better power would be obtained by adding the chi-squared values to yield a two-df test. Zheng et al. discussed these tests, and proposed two alternative ways of combining evidence across strata [11].
Only females allow calculation of a two-df genotype-based test. Assuming no interaction between sex and genotype, a combined two-df test can be obtained by adding the difference between the one- and two-df tests in females to the combined one-df test. Alternatively, to allow for strong interaction with sex, the two-df chi-squared for females can be added to the one-df chi-squared for males, to obtain a three-df test.
Stratification by sex avoids the HWE assumption in females but, as for methods based on chromosome counts, can be very inefficient if the sex ratio differs between cases and controls. If the allele frequency also differs between sexes, then sex confounds the association and stratification is essential. Otherwise, this loss of efficiency is unnecessary; we shall describe how it can be avoided in the next section.

The role of X inactivation: 'dosage compensation'
The above approaches suffer from a further, less obvious, problem. Unless male and female genotypes are to be regarded as completely different (as in an analysis of the 5 × 2 table), the effect of an allele is implicitly assumed to be the same in males and females. Formally, the alternative hypothesis against which these approaches would be most powerful assumes that the allelic odds ratio would not differ between the sexes. This is unlikely to be the case; most loci on the X chromosome are subject to X inactivation [12] in females; only one allele from each pair of alleles is expressed. Inactivation takes place at an early stage of fetal development and, except in rare circumstances, the inactivated allele in each cell is selected at random, so that, on average, 50% of cells in the adult female will express one allele and 50% of cells the other [13]. A consequence of this is that the effect of the A allele in males should be equivalent to the difference between a/a and A/A homozygous females. To preserve optimal power against this alternative, males must be given twice the weight of females.
In the chromosome-counting analysis, we must either count each allele twice in males or, equivalently and more intuitively, count each allele in females as ½, reflecting a 'dosage compensation' for X inactivation. This was the approach employed by the Wellcome Trust Case-Control Consortium [1] and described in detail by Clayton [14].
In the sex-stratified analysis, differential weighting of males and females is straightforward. The score, $U_F$, for the 3 × 2 table of genotype frequencies in females is weighted by ½, while the score, $U_M$, for the 2 × 2 table of allele frequencies in males is unweighted. Writing $V_F$ and $V_M$ for the corresponding variances, the combined, stratified test is given by $U = U_F/2 + U_M$ and $V = V_F/4 + V_M$, and the chi-squared test statistic is given by $T = U^2/V$ as usual.
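A minimal sketch of this sex-stratified, dosage-compensated test follows. It assumes females are scored 0, 1 or 2 and males 0 or 1 (copies of allele A), and applies the same score and variance formula within each sex; for the 0/1 male coding that formula reduces to the usual 2 × 2 table variance. The function names and toy data are hypothetical.

```python
def score_and_variance(x, y):
    """Score U and variance V for genotype scores x and case indicators y."""
    m = len(x)
    m1 = sum(y)
    xbar = sum(x) / m
    u = sum(xi - xbar for xi, yi in zip(x, y) if yi)
    ss = sum((xi - xbar) ** 2 for xi in x)
    v = m1 * (m - m1) * ss / (m * (m - 1))
    return u, v


def x_stratified_test(x_f, y_f, x_m, y_m):
    """Sex-stratified one-df test with females given half the weight of males."""
    u_f, v_f = score_and_variance(x_f, y_f)   # 3 x 2 table in females
    u_m, v_m = score_and_variance(x_m, y_m)   # 2 x 2 table in males
    u = u_f / 2 + u_m                         # female score weighted by 1/2
    v = v_f / 4 + v_m                         # so its variance is weighted by 1/4
    return u, v, u * u / v


# Toy data: within each sex, four cases followed by four controls
x_f = [2, 2, 1, 1, 1, 1, 0, 0]
x_m = [1, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 1, 1, 0, 0, 0, 0]
u, v, t = x_stratified_test(x_f, y, x_m, y)
print(u, v, t)
```

Setting the female weight to 1 instead of ½ would give the corresponding test without dosage compensation.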
As pointed out above, stratification by sex loses power and should be avoided when, as will almost always be the case, allele frequencies can be assumed not to differ between the sexes. In Clayton's test, U is calculated from allele counts pooled across the two sexes, in females counting each allele as ½, as described above, and a variance estimate is used which allows for deviation from HWE in females. Clayton also proposed a two-df test in which the additional degree of freedom is based on data from females alone [14]. Although this test employed dosage compensation for X inactivation, it could easily be adapted for loci in which X inactivation is thought to be unlikely.

Regression generalizations
To control for an additional extraneous factor, the tests described in the previous section may be extended to allow for a stratification of the data; as before, the score statistics U, and their variances, V, are calculated for each stratum and simply added over strata. However, some situations may call for use of regression models, for example when multiple covariates are involved. Regression programs also potentially provide a way of carrying out tests similar to those described above when specialist software is not available. Regression generalization of the testing problem may be approached in two ways, depending on whether the genotype or the phenotype is treated as the dependent variable.

Genotype as dependent variable
The natural regression generalization of the chromosome-counting approach for autosomal loci is to treat the measured genotype score, $x_i$, as representing the number of 'successes' in two binomial trials. This is declared as the dependent variable in a logistic regression model in which case/control status appears as one of the explanatory variables. The coefficient of case/control status is then the allelic odds ratio. This approach assumes HWE (conditional upon explanatory variables), but this assumption can be relaxed by use of 'robust' estimates of the variance of regression coefficients [15,16]. These are available in many computer packages. This approach generalizes the one-df testing procedure, but does not lead to natural generalization of the two-df test.
For the X chromosome, the genotype in a female could be treated in exactly the same way, while treating the genotype for a male as the outcome of a single trial. However, this would make no allowance for X inactivation. The alternative approach is to treat the male genotypes as either a/a or A/A ($x_i$ = 0 or 2), but giving them weight ½ in the regression analysis in view of their greater variance. Use of robust variance estimates for coefficients is then obligatory.

Phenotype as dependent variable
In epidemiological studies of disease outcome, it is more usual to treat phenotype (disease status) as the dependent variable in a logistic regression, even when the data have been obtained by case/control sampling [17]. A generalization of the one-df test can be obtained using this approach by entering genotype codes, $x_i$, as an explanatory variable in a logistic regression with case/control status as dependent variable. X inactivation can be taken into account by scoring $x_i$ as 0, 1 or 2 in females, and 0 or 2 in males. If the sex ratio varies between cases and controls, the disease status depends on sex and it would be natural to include sex as an additional explanatory variable. This mirrors the simple analysis in which sex is introduced as a stratification and, as we have seen, this can be inefficient. If allele frequency does not depend on sex, sex can be omitted from the regression without compromising the test of the regression coefficient for genotype, but, if sex and disease status are truly related, omission of sex means that the model is mis-specified and it is necessary to use robust variance estimates for coefficients. An attraction of this method of analysis is that it allows several markers to be entered into the regression simultaneously in analyses whose aim is to narrow down potential causal variants when several markers in linkage disequilibrium with one another are all related to phenotype.
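The dosage-compensated coding, with sex omitted from the model, can be sketched as a score test for the genotype coefficient in a logistic regression containing only an intercept. The variance used below is an empirical, sandwich-type estimate (the sum of squared per-subject score contributions); this is one common robust choice and is an assumption of this sketch, not necessarily the estimator of the cited references. All names and data are hypothetical.

```python
def dosage_compensated_code(sex, copies_of_a):
    """Score 0, 1 or 2 in females; 0 or 2 in males (one copy counted twice)."""
    return copies_of_a if sex == "F" else 2 * copies_of_a


def robust_score_test(x, y):
    """Score test for genotype in an intercept-only logistic model.

    x: genotype codes; y: 1 for cases, 0 for controls.
    Returns (U, V, T), where V is an empirical (robust) variance estimate.
    """
    m = len(x)
    xbar = sum(x) / m
    ybar = sum(y) / m
    # Per-subject contributions to the score at the null hypothesis
    contrib = [(yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)]
    u = sum(contrib)
    v = sum(c * c for c in contrib)    # sum of squared contributions
    return u, v, u * u / v


# Toy data: (sex, copies of allele A, case status)
subjects = [("F", 2, 1), ("F", 1, 1), ("M", 1, 1), ("M", 0, 1),
            ("F", 1, 0), ("F", 0, 0), ("M", 0, 0), ("M", 1, 0)]
x = [dosage_compensated_code(s, g) for s, g, _ in subjects]
y = [d for _, _, d in subjects]
u, v, t = robust_score_test(x, y)
print(x, u, v, t)
```

In practice one would more likely fit the logistic regression in a statistical package and request robust standard errors; the sketch only makes the coding and the form of the test explicit.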
At first sight, this approach could also provide a two-df test by adding an explanatory indicator variable that contrasts the heterozygous genotype from the two homozygous genotypes (males again being coded as homozygous). Unfortunately, this indicator variable is related to sex and, if sex and disease status are related but sex has been omitted from the regression in the interest of efficiency, its coefficient is confounded and the test is not valid. Clayton suggests that the contribution to the test of the second degree of freedom can be estimated from a second regression analysis carried out only in females [14].

Family-based studies
Association tests for loci on the X chromosome have received rather more attention in the context of family-based studies.

Case-parent trios
The case-parent trio design has been widely advocated as providing protection against false associations due to confounding by population structure. The transmission/disequilibrium test (TDT) [18] is a one-df test for association, which is optimal against the same alternative hypothesis as the one-df test for population-based studies, which was discussed in the first section of this review. Transmissions of alleles from heterozygous parents are counted, and the TDT tests whether counts of transmissions of A or a alleles depart from a 1:1 ratio. If these counts are denoted $X_A$ and $X_a$, respectively, a chi-squared test on one df may be calculated as follows:
$$U = X_A - X_a, \qquad V = X_A + X_a, \qquad T = \frac{U^2}{V}.$$
More general derivations [19][20][21] lead also to a two-df test against the wider class of alternative hypotheses, by comparing transmissions of genotypes from parent to affected offspring against expected frequencies based on Mendelian transmission.
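The TDT calculation is short enough to show in full. In this minimal sketch the function name and the example counts are hypothetical; the inputs are the numbers of transmissions of each allele from heterozygous parents to affected offspring.

```python
def tdt(x_big_a, x_small_a):
    """Transmission/disequilibrium test.

    x_big_a, x_small_a: counts of transmissions of allele A and allele a
    from heterozygous parents to affected offspring.
    Returns (U, V, T) with T = U^2 / V, on one df.
    """
    u = x_big_a - x_small_a            # excess transmissions of A
    v = x_big_a + x_small_a            # number of informative transmissions
    return u, v, u * u / v


u, v, t = tdt(60, 40)
print(u, v, t)
```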
For the X chromosome, transmissions from fathers are confounded with the sex of the affected offspring and are therefore uninformative; only transmissions from heterozygous mothers contribute to the test, which has been termed the XTDT [22] (such acronyms are used liberally in this literature but serve to confuse rather than illuminate; they will not be used further in this review). Note that mother-son transmission can be determined unambiguously without knowledge of the father's genotype but, for mother-daughter transmissions, the father's genotype must also be available in order to determine which of the daughter's two copies was received from the mother and hence how to score the maternal transmission. Calculation of a one-df test follows that for the TDT, except that, to allow for X inactivation, mother-son transmissions should be given twice the weight given to mother-daughter transmissions [23]. Thus, transmissions are counted separately for male and female affected offspring and, using subscripts M and F to denote male and female contributions to the test, we calculate $U = U_F/2 + U_M$ and $V = V_F/4 + V_M$. This mirrors the analysis of a population-based study stratified by sex.
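The weighting of maternal transmissions by sex of the affected offspring can be sketched as below, using the formulas U = U_F/2 + U_M and V = V_F/4 + V_M given in the text. The function and argument names, and the counts, are hypothetical; each count is a number of transmissions from heterozygous mothers.

```python
def x_tdt(sons_big_a, sons_small_a, daughters_big_a, daughters_small_a):
    """One-df X-chromosome transmission test with dosage compensation.

    Arguments are counts of maternal transmissions of allele A and allele a
    to affected sons and to affected daughters.
    """
    u_m = sons_big_a - sons_small_a
    v_m = sons_big_a + sons_small_a
    u_f = daughters_big_a - daughters_small_a
    v_f = daughters_big_a + daughters_small_a
    u = u_f / 2 + u_m                  # daughters given half the weight of sons
    v = v_f / 4 + v_m
    return u, v, u * u / v


u, v, t = x_tdt(sons_big_a=30, sons_small_a=20,
                daughters_big_a=28, daughters_small_a=12)
print(u, v, t)
```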
As for population-based studies, a two-df test may be calculated by adding an additional contribution reflecting deviation from the trend model which underlies the one-df test. Only mother-daughter transmissions contribute to this. The counts of transmissions of allele A or allele a to daughters are further subdivided according to whether the father carries A or a on his X chromosome, thus yielding a 2 × 2 table. The chi-squared test for association in this table provides the additional contribution which, when added to the one-df test, provides the two-df test.

Discordant sib pairs
An alternative to the case-parent trio design, often advocated for late-onset diseases in which parents are not available, is to compare genotypes of sib pairs discordant for disease status. This design is simply an example of the one-to-one matched case-control study which is widely used in epidemiology and, for autosomal loci, its analysis follows standard methods [10]. In effect these are stratified analyses in which each sib pair forms a stratum [7]. Here the difference referred to earlier between $M$ and $M - 1$ in the variance formulae in tests such as the Cochran-Armitage test is important (since $M = 2$).
For loci on the X chromosome, the methods discussed above can readily be adapted by stratification by sib pair. However, a complication is presented by unlike-sex sib pairs. The analysis must condition on both genotypes, the information for association coming from whether the male sib was affected and the female sib unaffected, or vice versa. But, as pointed out by Horvath et al. [22], the probabilities of these outcomes are also affected by sex differences in disease risk. Horvath et al. discussed inclusion of an additional parameter in the model, but were concerned about possible model mis-specification and advocated omission of such sib pairs, as have later authors [24,25].

More general pedigrees
The above methods can be generalized to allow integration of information from nuclear families including both parents and disease-discordant siblings, those containing more than two sibs, and in more general pedigrees. The general idea is to combine contributions to a score-type statistic, U, for all trios and discordant sib pairs first within each pedigree, and then over all pedigrees. The variance of the test statistic is then estimated using an estimator robust to associations between the contributions within pedigrees, for example, in the 'pedigree disequilibrium test' [26], by the sum of the squared contributions of each pedigree.
A practical problem with such tests is incomplete data. Horvath et al. [22] discussed extension of the 'reconstruction combined' TDT [27] to the X chromosome. This test fills in missing genotypes when they can be inferred from other family members, correcting for the biases that occur as a result. Ding et al. [24] extended the pedigree disequilibrium test to the X chromosome, using a Monte Carlo approach to deal with missing genotypes. This approach to missing data used estimates of allele frequencies without allowing for their uncertainty, a deficiency later corrected in the approach of Chung et al. [25]. Finally, Schneiter et al. [28] described an extension, to X loci, of the Rabinowitz and Laird approach to family-based association testing in the presence of incomplete data [29]. None of these methods employed dosage compensation for X inactivation.

Quantitative traits
Analysis of population-based studies of quantitative traits may be carried out by conventional regression methods, with or without dosage compensation. This follows similar lines to the analysis of case-control studies by logistic regression methods as discussed in the last section. Alternatively, Clayton's test discussed above, with a change of notation, can be applied to quantitative phenotypes [14]. However, analysis of family-based studies is more challenging, requiring achievement of two aims:
• partition of the information for association into between-family and within-family components, where only the latter is robust to confounding by hidden population structure [30], and
• allowance for correlations within family.
Zhang et al. proposed such a method, dealing with the problem of correlation by using a mixed model [31]. This method also allows analysis of effects of two-locus haplotypes, using the Expectation-Maximization (EM) algorithm to reconstruct missing parental genotypes and to impute haplotype phase in females. Zhang et al. also discussed implementation of their method both with and without dosage compensation, since not all loci on the X chromosome are subject to X inactivation. They suggested using both tests in a sequential procedure, with the dosage-compensated test used first. Since most loci are subject to X inactivation, they suggested choosing an α level for the dosage-compensated test four times that for the non-compensated test.

The Y chromosome
The Y chromosome presents none of the problems discussed above. Loci in the pseudo-autosomal region can be treated as autosomal, while marker loci in the nonhomologous region only occur in males, who each carry just one copy inherited from their father. A similar situation applies for mitochondrial loci, in which there is a single copy inherited from the mother.
Analysis of single loci on the Y chromosome is straightforward. However, since cross-over recombination cannot occur, linkage disequilibrium extends across the entire chromosome and association data provide no information about the location of the causal variant or variants. Perhaps partly because of this, and partly because there are no problems of haplotype phase uncertainty on the Y chromosome, most studies of disease associations with the Y chromosome do not stop at single marker analyses, but go on to perform cladistic analyses of haplotype risk. Since such analyses can be carried out for markers on other chromosomes, albeit with some extra difficulty due to phase uncertainty, they will not be discussed further here.
Studies of case-parent trios are uninformative about associations on the Y chromosome.

Conclusion
In conclusion, it has been demonstrated that testing for genetic associations with loci on the X chromosome is not as straightforward as was often imagined. Although in population-based studies involving single sexes there is no difficulty in applying standard methods, in mixed-sex studies difficulties are encountered, both in finding a way to combine evidence from the two sexes without necessarily assuming that sex is a confounder (that is, that allele frequencies differ between sexes, an assumption that can lead to substantial loss of power), and in appropriately weighting the evidence taking account of the likely effect of X inactivation.
For family-based association studies, design and analysis can also be modified when the interest is in the X chromosome. For affected sons, it is necessary only to genotype the mother in order to obtain all relevant transmission data and, as for population-based studies, the evidence from affected sons and daughters should be differentially weighted. Taking account of both of these considerations, it follows that mother-son pairs would be expected to be more informative than daughter-parent trios. In studies of disease-discordant sib pairs, mixed-sex pairs are particularly problematic and are best avoided.