 Review
 Open Access
Sex chromosomes and genetic association studies
Genome Medicine volume 1, Article number: 110 (2009)
Abstract
Although the literature concerning statistical testing for genotype-phenotype association in family-based and population-based studies is very extensive, until recently the sex chromosomes have received little attention. Here it is shown that the X chromosome in particular presents special problems, both with respect to efficient analysis of mixed-sex population studies and as a result of X inactivation. This paper reviews recent developments in approaching these problems.
Introduction
The statistical problem of testing for association between phenotype and genetic markers on the sex chromosomes has received less attention than tests for autosomal markers. The advent of genome-wide association studies has hugely increased the number of studies of associations with the sex chromosomes and, in this context, it has recently been recognized that the X chromosome, in particular, poses special problems [1].
Firstly, in population-based case-control studies involving both male and female subjects, associations can be confounded by differences in sex ratio between cases and controls even when, as is usually the case, allele frequencies do not differ between the sexes. Conventional epidemiological approaches to deal with this confounding can be very inefficient.
Secondly, the phenomenon of X inactivation, which affects most loci on the X chromosome in females, means that the risk attributable to a single allele would generally be expected to be less in females than in males. An efficient statistical test would allow for this.
This review describes approaches to statistical testing for association with loci on the sex chromosomes, largely in the context of case-control studies of binary phenotypes. The X chromosome will be the focus of most of the review. Later sections will briefly discuss family-based association studies, quantitative phenotypes and methods for the Y chromosome.
Case-control studies
Before turning to the special problems presented by the X chromosome, we shall review simple methods of analysis for autosomal loci in case-control studies.
Autosomal loci
Counting chromosomes
Many early analyses of association between a binary phenotype and a genetic marker used simple tests for association in contingency tables in which cell entries were counts of chromosomes rather than people. Thus, for an autosomal locus, the total cell count is twice the number of subjects studied, and associations were tested simply by comparing allele frequencies between cases and controls. In the diallelic case, this reduces to the analysis of a 2 × 2 table (Table 1). The most commonly used test was the familiar chi-squared test for association which, here, has one degree of freedom (df). The calculations of the chi-squared test statistic, T say, can be broken down in a manner which aids later discussion as follows, where N is the total sample size and A and a the two alleles at the locus:
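One way of writing this calculation is sketched below. The cell notation is an assumption made for illustration and is not taken from Table 1: n_{A1} is the number of A alleles on case chromosomes, N_1 and N_0 are the numbers of case and control chromosomes (N = N_1 + N_0), and n_A and n_a are the total counts of each allele. The top left-hand cell is assumed to be the case/A cell.

```latex
U = n_{A1} - \frac{N_1\, n_A}{N}, \qquad
V = \frac{N_1\, N_0\, n_A\, n_a}{N^{2}\,(N - 1)}, \qquad
T = \frac{U^{2}}{V}
```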
Here, U is a test 'score' which, under the 'null' hypothesis of no association, has expected value zero; V is its variance, again under the null hypothesis. Note that U corresponds with the usual 'observed minus expected' frequency calculation for the top left-hand cell in the table. (The traditional test based on Σ(O − E)^{2}/E over all four cells of the table is equivalent to the above but with a slightly biased variance estimate in which (N − 1) is replaced by N.) At first sight, doubling the sample size by counting chromosomes rather than people would seem questionable.
However, under the assumption of Hardy-Weinberg equilibrium (HWE), the two copies carried by each subject can be regarded as drawn randomly and independently from a population of chromosomes. Thus, with this assumption, the analysis is valid in the sense of maintaining the correct type 1 error rate [2].
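The score, variance and test statistic described above can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical, and the variance used is the usual hypergeometric one with (N − 1) in the denominator, which is an assumption about the exact form intended. The second function computes the traditional Σ(O − E)^{2}/E statistic so that the N/(N − 1) relationship between the two can be seen directly.

```python
def allele_count_test(case_A, case_a, control_A, control_a):
    """Score test on a 2 x 2 table of chromosome (allele) counts.

    Returns (U, V, T): the score for the case/A cell, its variance under
    the null hypothesis, and the one-df statistic T = U^2 / V.
    """
    n_case = case_A + case_a            # chromosomes carried by cases
    n_control = control_A + control_a   # chromosomes carried by controls
    n_A = case_A + control_A            # total copies of allele A
    n_a = case_a + control_a            # total copies of allele a
    n = n_case + n_control              # total chromosomes, N
    u = case_A - n_case * n_A / n       # observed minus expected
    v = n_case * n_control * n_A * n_a / (n ** 2 * (n - 1))
    return u, v, u * u / v


def pearson_chi2(case_A, case_a, control_A, control_a):
    """Traditional sum of (O - E)^2 / E over the four cells."""
    rows = [(case_A, case_a), (control_A, control_a)]
    cols = (case_A + control_A, case_a + control_a)
    n = sum(cols)
    total = 0.0
    for row in rows:
        for observed, col_total in zip(row, cols):
            expected = sum(row) * col_total / n
            total += (observed - expected) ** 2 / expected
    return total


# Made-up counts: 200 case chromosomes and 200 control chromosomes.
u, v, t = allele_count_test(60, 140, 40, 160)
print(u, v, t)
# The two statistics differ only by the factor N / (N - 1), here 400 / 399.
print(pearson_chi2(60, 140, 40, 160), t * 400 / 399)
```

With 400 chromosomes the two versions are nearly indistinguishable; the distinction matters only for very small tables.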
Counting subjects
This analysis can be contrasted with testing for association in the 3 × 2 contingency table of genotype (A/A, A/a, or a/a) against phenotype, in which the counts are of subjects. This makes no HWE assumption, but delivers a chi-squared test on two df rather than on one df. The difference in df between these tests reflects the different alternative hypotheses against which they are powerful. The one-df test based on chromosome counts turns out to be most powerful against a rather restrictive 'trend' model in which the odds ratios between adjacent rows (that is, A/A versus A/a, and A/a versus a/a) are equal [2]. For a rare disease, these odds ratios correspond to relative risks for disease in the population, so that this alternative hypothesis corresponds to a model in which each copy of the A allele multiplies the risk by a constant (the 'allelic' odds ratio or relative risk). Since this test is a score test in the wider class of models in which the expected case:control ratio for the A/a genotype is intermediate between those for a/a and A/A genotypes, it is also locally most powerful against small effects in this wider class of models. In contrast, the two-df test is powerful against unrestricted alternatives but, in consequence, is considerably less powerful against the trend alternative.
In modern complex disease genetics, most associations discovered to date have been of the trend type and one-df tests are therefore usually regarded as the most useful (although most analysts would also carry out a two-df test). However, chromosome counting in the 2 × 2 table has largely been abandoned owing to its reliance on the HWE assumption, having been supplanted by the Cochran-Armitage test for trend [3–5] in the 3 × 2 tabulation of subjects by genotype and phenotype. Writing x_{i} as the genotype for subject i, scored 0, 1 or 2; x̄ as its mean over all subjects; and M_{1}, M_{0} and M = M_{1} + M_{0} as the numbers of cases, controls and subjects, respectively, this test can be calculated as follows:
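A standard way of writing the Cochran-Armitage calculation in this notation is sketched below. It is offered as a reconstruction consistent with the surrounding description (a score identical to that of the chromosome-counting test, and a variance with M − 1 in the denominator), not as a quotation of the original display.

```latex
U = \sum_{i \in \text{cases}} \left( x_i - \bar{x} \right), \qquad
V = \frac{M_1 M_0}{M\,(M - 1)} \sum_{i=1}^{M} \left( x_i - \bar{x} \right)^{2}, \qquad
T = \frac{U^{2}}{V}
```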
In this test, the 'score', U, is identical to that for the test on chromosome counts. The difference between the two approaches is in the formula for its variance, V; the Cochran-Armitage test uses a variance estimate that does not assume HWE. (As for the simple 2 × 2 table, M − 1 is replaced by M in most derivations, although this leads to a very slightly biased variance estimate.)
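As a rough illustration, the trend test can be computed from the 3 × 2 table of genotype counts as in the sketch below. The function name and the example counts are hypothetical, and the variance is the permutation-type estimate with M − 1 in the denominator, which is assumed here to be the form intended.

```python
def cochran_armitage(case_counts, control_counts):
    """One-df trend test from genotype counts ordered (a/a, A/a, A/A).

    Genotypes are scored 0, 1, 2 copies of allele A.
    Returns (U, V, T) with T = U^2 / V.
    """
    scores = (0, 1, 2)
    m1 = sum(case_counts)        # number of cases, M_1
    m0 = sum(control_counts)     # number of controls, M_0
    m = m1 + m0                  # number of subjects, M
    totals = [c + d for c, d in zip(case_counts, control_counts)]
    mean = sum(s * t for s, t in zip(scores, totals)) / m
    # Score: sum over cases of (x_i - mean).
    u = sum(s * c for s, c in zip(scores, case_counts)) - m1 * mean
    # Sum of squared deviations over all subjects; no HWE assumption.
    ss = sum(t * (s - mean) ** 2 for s, t in zip(scores, totals))
    v = m1 * m0 * ss / (m * (m - 1))
    return u, v, u * u / v


# Made-up counts for 200 cases and 200 controls.
u, v, t = cochran_armitage((100, 80, 20), (120, 70, 10))
print(u, v, t)
```

For these counts the cases carry 120 copies of A out of 400 chromosomes and the controls 90 out of 400, so the score equals the one the chromosome-counting test would give for the same data; only V differs.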
Control for confounding and interaction
When there is a danger of confounding, for example by population structure, these tests can be extended. Cases and controls are classified into strata, for example by a score based on the first few principal components in a genome-wide study [6]. Each stratum then provides a 2 × 2 or 3 × 2 table. Extended tests which combine the evidence across strata may be carried out by:
• calculating U and V in each stratum,
• summing U and V values over strata to form a combined U and V, and
• calculating the test statistic in the usual way: T = U^{2}/V.
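The three steps above can be sketched in a few lines of Python for the 2 × 2 case. The helper names and the counts are hypothetical, and each stratum's variance is the hypergeometric one assumed in the sketch of the single-table test.

```python
def score_2x2(case_A, case_a, control_A, control_a):
    """Return (U, V) for one stratum's 2 x 2 table of allele counts."""
    n_case = case_A + case_a
    n_control = control_A + control_a
    n_A = case_A + control_A
    n_a = case_a + control_a
    n = n_case + n_control
    u = case_A - n_case * n_A / n
    v = n_case * n_control * n_A * n_a / (n ** 2 * (n - 1))
    return u, v


def combined_test(strata):
    """Sum U and V over strata, then form the one-df statistic."""
    u_total = 0.0
    v_total = 0.0
    for table in strata:
        u, v = score_2x2(*table)
        u_total += u
        v_total += v
    return u_total, v_total, u_total ** 2 / v_total


# Two made-up strata, each given as (case_A, case_a, control_A, control_a).
strata = [(60, 140, 40, 160), (30, 70, 20, 80)]
print(combined_test(strata))
```

Because the scores are added before squaring, strata that point in the same direction reinforce one another, which is where the power against a common allelic odds ratio comes from.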
The same method can be used to combine data from different studies in meta-analysis. The resultant tests still have one df and maintain high power against the trend alternative. However, this power is obtained at a cost of an additional assumption: that the effect of genotype on phenotype (as measured by the allelic odds ratio) does not vary across strata. In the case of a 2 × 2 table, this approach yields the Mantel-Haenszel test [7], while, for trend tests in 3 × 2 tables, it yields the Mantel-extension test [8]. The two-df test for the 3 × 2 table may be extended in a similar manner [5, 9]. An alternative to the use of stratification to control for confounding is logistic regression analysis, with case-control status treated as the binary dependent variable [10].
When the assumption of constant genotype effect across strata is violated, there is said to be 'interaction', and the power of the above tests is reduced; in the rather unlikely case in which the effect is in opposite directions in males and females, little or no power would remain. An alternative way to combine evidence is to sum chi-squared values over strata, while summing df in the same way. Thus, if there are K strata, summing the one-df tests results in a test with K df, and summing the two-df tests leads to a 2K df test. Such tests preserve power in the case where there is strong interaction, although they are inefficient otherwise.
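The contrast between the two ways of combining evidence can be seen in a deliberately extreme, made-up example with two strata whose effects are equal and opposite. The sketch below, with hypothetical function names, compares the pooled one-df statistic with the sum of the per-stratum one-df statistics (which would be referred to a chi-squared distribution on K = 2 df).

```python
def score_2x2(case_A, case_a, control_A, control_a):
    """Return (U, V) for one stratum's 2 x 2 table of allele counts."""
    n_case = case_A + case_a
    n_control = control_A + control_a
    n_A = case_A + control_A
    n_a = case_a + control_a
    n = n_case + n_control
    u = case_A - n_case * n_A / n
    v = n_case * n_control * n_A * n_a / (n ** 2 * (n - 1))
    return u, v


def pooled_and_summed(strata):
    """Return (pooled one-df T, summed T, df of the summed test)."""
    scores = [score_2x2(*table) for table in strata]
    u_total = sum(u for u, _ in scores)
    v_total = sum(v for _, v in scores)
    pooled = u_total ** 2 / v_total
    summed = sum(u * u / v for u, v in scores)
    return pooled, summed, len(scores)


# Allele A is enriched in cases in the first stratum and in controls
# in the second, by the same amount.
strata = [(60, 140, 40, 160), (40, 160, 60, 140)]
pooled, summed, df = pooled_and_summed(strata)
print(pooled, summed, df)
```

Here the two scores cancel, so the pooled statistic carries no evidence at all, while the summed statistic retains the evidence from both strata at the price of an extra degree of freedom.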
The X chromosome
In this section we consider the applicability of the above standard methods to loci on the X chromosome, and discuss some recently developed improvements. For simplicity we will concentrate on the case of a diallelic locus, although the methods described can be generalized. Most of the difficulties concern mixed-sex studies [11], particularly those in which the sex ratio differs, perhaps markedly, between cases and controls. Although in general one might consider this to be a failure of design, many large-scale genome-wide association studies make use of genotype data for standard control sets [1]. The sex ratio can then be very different between cases and controls.
Counting chromosomes
For markers on the X chromosome (other than those markers in the pseudoautosomal region of the Y chromosome), the derivation of association tests is more complicated. If all subjects were female, we would analyze the 3 × 2 table of subject counts. Conversely, if all subjects were male, each subject would contribute only one chromosome and the analysis would revert to that of 2 × 2 tables. But what if the study contains both male and female subjects? There are several simple approaches to this difficulty. The first approach is to revert to chromosome counting, analyzing the 2 × 2 table in which each female subject has contributed two observations and each male subject one. This has two obvious problems:
• it relies on the HWE assumption for females, and
• the association could be confounded by sex if (a) allele frequencies differ between the sexes, and (b) the sex ratio differs between cases and controls.
The second of these problems can be addressed by standard methods for control for confounding [7]. However, in the case where the sex ratio varies between cases and controls but allele frequencies do not differ by sex, sex is not truly a confounder and treating it as such can be very inefficient. To take an extreme example, in a study of breast cancer in which the available control sample contained both sexes, stratification by sex would effectively discard the data from male controls from the comparison. Although males rarely contract breast cancer, they nevertheless provide valuable information concerning allele frequencies in the population.
Counting subjects
It is not immediately obvious how one would apply methods based on comparing distributions of genotypes in mixed-sex studies, since male and female genotypes are qualitatively different. One might decide to combine each of the two male genotypes with one of the three female genotypes (although quite how to do this is not obvious), or one could make the judgement that there are five distinct genotypes and analyze the 5 × 2 contingency table. Neither of these approaches is satisfactory, not least because any differences in the sex ratio between cases and controls would give rise to an apparent association, even when allele frequencies do not differ between the sexes.
An alternative approach is to stratify by sex. Females then contribute a 3 × 2 table and males a 2 × 2 table. Assuming no marked interaction between sex and genotype, a one-df trend test can be obtained by combining the trend tests for these tables as for autosomal loci, while, for the contribution of females, a variance estimate that allows for deviation from HWE is used. In the presence of strong interaction, better power would be obtained by adding the chi-squared values to yield a two-df test. Zheng et al. discussed these tests, and proposed two alternative ways of combining evidence across strata [11].
Only females allow calculation of a two-df genotype-based test. Assuming no interaction between sex and genotype, a combined two-df test can be obtained by adding the difference between the one-df and two-df tests in females to the combined one-df test. Alternatively, to allow for strong interaction with sex, the two-df chi-squared for females can be added to the one-df chi-squared for males, to obtain a three-df test.
Stratification by sex avoids the HWE assumption in females but, as for methods based on chromosome counts, can be very inefficient if the sex ratio differs between cases and controls. If the allele frequency also differs between sexes, then sex confounds the association and stratification is essential. Otherwise, this loss of efficiency is unnecessary; we shall describe how it can be avoided in the next section.
The role of X inactivation: 'dosage compensation'
The above approaches suffer from a further, less obvious, problem. Unless male and female genotypes are to be regarded as completely different (as in an analysis of the 5 × 2 table), the effect of an allele is implicitly assumed to be the same in males and females. Formally, the alternative hypothesis against which these approaches would be most powerful assumes that the allelic odds ratio would not differ between the sexes. This is unlikely to be the case; most loci on the X chromosome are subject to X inactivation [12] in females; only one allele from each pair of alleles is expressed. Inactivation takes place at an early stage of fetal development and, except in rare circumstances, the inactivated allele in each cell is selected at random, so that, on average, 50% of cells in the adult female will express one allele and 50% of cells the other [13]. A consequence of this is that the effect of the A allele in males should be equivalent to the difference between a/a and A/A homozygous females. To preserve optimal power against this alternative, males must be given twice the weight of females.
In the chromosome-counting analysis, we must either count each allele twice in males or, equivalently and more intuitively, count each allele in females as 1/2, reflecting a 'dosage compensation' for X inactivation. This was the approach employed by the Wellcome Trust Case Control Consortium [1] and described in detail by Clayton [14].
In the sex-stratified analysis, differential weighting of males and females is straightforward. The score, U_{F}, for the 3 × 2 table of genotype frequencies in females is weighted by 1/2, while the score, U_{M}, for the 2 × 2 table of allele frequencies in males is unweighted. Writing V_{F} and V_{M} for the corresponding variances, the combined, stratified test is given by:
U = U_{F}/2 + U_{M}, V = V_{F}/4 + V_{M},
and the chi-squared test statistic is given by T = U^{2}/V as usual.
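A minimal sketch of the sex-stratified, dosage-compensated test follows, using the weighting U = U_{F}/2 + U_{M} and V = V_{F}/4 + V_{M}. The function names and counts are hypothetical. Females are scored 0, 1 or 2 and males 0 or 1 within their own strata, and the per-stratum variance is the permutation-type estimate with M − 1 in the denominator; both choices are assumptions made for illustration.

```python
def trend_score(scores, case_counts, control_counts):
    """Return (U, V) for a trend test with the given genotype scores."""
    m1 = sum(case_counts)
    m0 = sum(control_counts)
    m = m1 + m0
    totals = [c + d for c, d in zip(case_counts, control_counts)]
    mean = sum(s * t for s, t in zip(scores, totals)) / m
    u = sum(s * c for s, c in zip(scores, case_counts)) - m1 * mean
    ss = sum(t * (s - mean) ** 2 for s, t in zip(scores, totals))
    v = m1 * m0 * ss / (m * (m - 1))
    return u, v


def x_stratified_test(f_cases, f_controls, m_cases, m_controls):
    """One-df X chromosome test, stratified by sex, females weighted 1/2.

    Female counts are ordered (a/a, A/a, A/A); male counts are (a, A).
    """
    u_f, v_f = trend_score((0, 1, 2), f_cases, f_controls)
    u_m, v_m = trend_score((0, 1), m_cases, m_controls)
    u = u_f / 2 + u_m       # female score down-weighted for X inactivation
    v = v_f / 4 + v_m       # variance scales with the square of the weight
    return u, v, u * u / v


# Made-up counts: 200 female cases and controls, 100 male cases and controls.
print(x_stratified_test((100, 80, 20), (120, 70, 10), (70, 30), (80, 20)))
```

Halving the female score is equivalent to scoring males 0 or 2 and leaving females unweighted; the test statistic is unchanged by the overall scale.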
As pointed out above, stratification by sex loses power and should be avoided when, as will almost always be the case, allele frequencies can be assumed not to differ between the sexes. In Clayton's test, U is calculated from allele counts pooled across the two sexes, in females counting each allele as 1/2, as described above, and a variance estimate is used which allows for deviation from HWE in females. Clayton also proposed a two-df test in which the additional degree of freedom is based on data from females alone [14]. Although this test employed dosage compensation for X inactivation, it could easily be adapted for loci in which X inactivation is thought to be unlikely.
Regression generalizations
To control for an additional extraneous factor, the tests described in the previous section may be extended to allow for a stratification of the data; as before, the score statistics U, and their variances, V, are calculated for each stratum and simply added over strata. However, some situations may call for use of regression models, for example when multiple covariates are involved. Regression programs also potentially provide a way of carrying out tests similar to those described above when specialist software is not available. Regression generalization of the testing problem may be approached in two ways, depending on whether the genotype or the phenotype is treated as the dependent variable.
Genotype as dependent variable
The natural regression generalization of the chromosome-counting approach for autosomal loci is to treat the measured genotype score, x_{i}, as representing the number of 'successes' in two binomial trials. This is declared as the dependent variable in a logistic regression model in which case/control status appears as one of the explanatory variables. The coefficient of case/control status is then the allelic odds ratio. This approach assumes HWE (conditional upon explanatory variables), but this assumption can be relaxed by use of 'robust' estimates of the variance of regression coefficients [15, 16]. These are available in many computer packages. This approach generalizes the one-df testing procedure, but does not lead to natural generalization of the two-df test.
For the X chromosome, the genotype in a female could be treated in exactly the same way, while treating the genotype for a male as the outcome of a single trial. However, this would make no allowance for X inactivation. The alternative approach is to treat the male genotypes as either a/a or A/A (x_{i} = 0 or 2), but giving them weight 1/2 in the regression analysis in view of their greater variance. Use of robust variance estimates for coefficients is then obligatory.
Phenotype as dependent variable
In epidemiological studies of disease outcome, it is more usual to treat phenotype (disease status) as the dependent variable in a logistic regression, even when the data have been obtained by case/control sampling [17]. A generalization of the one-df test can be obtained using this approach by entering genotype codes, x_{i}, as an explanatory variable in a logistic regression with case/control status as dependent variable. X inactivation can be taken into account by scoring x_{i} as 0, 1 or 2 in females, and 0 or 2 in males. If the sex ratio varies between cases and controls, the disease status depends on sex and it would be natural to include sex as an additional explanatory variable. This mirrors the simple analysis in which sex is introduced as a stratification and, as we have seen, this can be inefficient. If allele frequency does not depend on sex, sex can be omitted from the regression without compromising the test of the regression coefficient for genotype, but, if sex and disease status are truly related, omission of sex means that the model is misspecified and it is necessary to use robust variance estimates for coefficients. An attraction of this method of analysis is that it allows several markers to be entered into the regression simultaneously in analyses whose aim is to narrow down potential causal variants when several markers in linkage disequilibrium with one another are all related to phenotype.
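The genotype coding described above is simple to express as a small helper that prepares the explanatory variable before it is passed to whatever regression software is in use. The function name, the "F"/"M" sex labels and the `dosage_compensation` switch are all hypothetical; the sketch only illustrates the scoring rule and does not fit any model.

```python
def x_genotype_score(sex, copies_of_A, dosage_compensation=True):
    """Code an X chromosome genotype as an explanatory variable.

    Females ("F") carry 0, 1 or 2 copies of allele A and are scored as such.
    Males ("M") carry 0 or 1 copy; with dosage compensation they are scored
    0 or 2, i.e. as if they were homozygous females.
    """
    if sex == "F":
        if copies_of_A not in (0, 1, 2):
            raise ValueError("a female carries 0, 1 or 2 copies")
        return copies_of_A
    if sex == "M":
        if copies_of_A not in (0, 1):
            raise ValueError("a male carries 0 or 1 copy")
        return 2 * copies_of_A if dosage_compensation else copies_of_A
    raise ValueError("sex must be 'F' or 'M'")


# A few made-up subjects given as (sex, copies of allele A).
subjects = [("F", 0), ("F", 1), ("F", 2), ("M", 0), ("M", 1)]
print([x_genotype_score(sex, copies) for sex, copies in subjects])
```

Setting `dosage_compensation=False` gives the coding that would suit a locus thought to escape X inactivation.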
At first sight, this approach could also provide a two-df test by adding an explanatory indicator variable that contrasts the heterozygous genotype from the two homozygous genotypes (males again being coded as homozygous). Unfortunately, this indicator variable is related to sex and, if sex and disease status are related but sex has been omitted from the regression in the interest of efficiency, its coefficient is confounded and the test is not valid. Clayton suggests that the contribution to the test of the second degree of freedom can be estimated from a second regression analysis carried out only in females [14].
Family-based studies
Association tests for loci on the X chromosome have received rather more attention in the context of family-based studies.
Case-parent trios
The case-parent trio design has been widely advocated as providing protection against false associations due to confounding by population structure. The transmission/disequilibrium test (TDT) [18] is a one-df test for association, which is optimal against the same alternative hypothesis as the one-df test for population-based studies, which was discussed in the first section of this review. Transmissions of alleles from heterozygous parents are counted, and the TDT tests whether counts of transmissions of A or a alleles depart from a 1:1 ratio. If these counts are denoted X_{A} and X_{a}, respectively, a chi-squared test on one df may be calculated as follows:
T = (X_{A} − X_{a})^{2}/(X_{A} + X_{a})
More general derivations [19–21] lead also to a two-df test against the wider class of alternative hypotheses, by comparing transmissions of genotypes from parent to affected offspring against expected frequencies based on Mendelian transmission.
For the X chromosome, transmissions from fathers are confounded with the sex of the affected offspring and are therefore uninformative; only transmissions from heterozygous mothers contribute to the test, which has been termed the XTDT [22] (such acronyms are used liberally in this literature but serve to confuse rather than illuminate; they will not be used further in this review). Note that mother-son transmission can be determined unambiguously without knowledge of the father's genotype but, for mother-daughter transmissions, the father's genotype must also be available in order to determine which of the daughter's two copies was received from the mother and hence how to score the maternal transmission. Calculation of a one-df test follows that for the TDT, except that, to allow for X inactivation, mother-son transmissions should be given twice the weight given to mother-daughter transmissions [23]. Thus, transmissions are counted separately for male and female affected offspring and, using subscripts M and F to denote male and female contributions to the test, we calculate U = U_{F}/2 + U_{M} and V = V_{F}/4 + V_{M}. This mirrors the analysis of a population-based study stratified by sex.
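The weighted calculation can be sketched as follows. The per-stratum definitions are an assumption made for illustration: each transmission from a heterozygous mother is treated as a binary outcome with expectation 1/2 and variance 1/4 under the null, so that U = (X_{A} − X_{a})/2 and V = (X_{A} + X_{a})/4 within each sex of offspring. The function names and counts are hypothetical.

```python
def tdt_score(n_A, n_a):
    """Return (U, V) from counts of A and a transmissions.

    With these definitions U^2 / V equals (n_A - n_a)^2 / (n_A + n_a),
    the usual TDT statistic.
    """
    u = (n_A - n_a) / 2
    v = (n_A + n_a) / 4
    return u, v


def x_tdt(sons, daughters):
    """Dosage-compensated one-df test from maternal transmissions.

    `sons` and `daughters` are (n_A, n_a) counts of alleles transmitted by
    heterozygous mothers to affected sons and affected daughters.
    """
    u_m, v_m = tdt_score(*sons)
    u_f, v_f = tdt_score(*daughters)
    u = u_f / 2 + u_m       # mother-son transmissions get twice the weight
    v = v_f / 4 + v_m
    return u, v, u * u / v


# Made-up transmission counts.
print(x_tdt(sons=(30, 20), daughters=(40, 24)))
```

For these counts U = 8/2 + 5 = 9 and V = 16/4 + 12.5 = 16.5.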
As for population-based studies, a two-df test may be calculated by adding an additional contribution reflecting deviation from the trend model which underlies the one-df test. Only mother-daughter transmissions contribute to this. The counts of transmissions of allele A or allele a to daughters are further subdivided according to whether the father carries A or a on his X chromosome, thus yielding a 2 × 2 table. The chi-squared test for association in this table provides the additional contribution which, when added to the one-df test, provides the two-df test.
Discordant sib pairs
An alternative to the case-parent trio design, often advocated for late-onset diseases in which parents are not available, is to compare genotypes of sib pairs discordant for disease status. This design is simply an example of the one-to-one matched case-control study which is widely used in epidemiology and, for autosomal loci, its analysis follows standard methods [10]. In effect these are stratified analyses in which each sib pair forms a stratum [7]. Here the difference referred to earlier between M and M − 1 in the variance formulae in tests such as the Cochran-Armitage test is important (since M = 2).
For loci on the X chromosome, the methods discussed above can readily be adapted by stratification by sib pair. However, a complication is presented by unlike-sex sib pairs. The conditional argument requires one to argue conditionally on both genotypes, the information for association coming from whether the male sib was affected and the female sib unaffected, or vice versa. But, as pointed out by Horvath et al. [22], the probabilities of these outcomes are also affected by sex differences in disease risk. Horvath et al. discussed inclusion of an additional parameter in the model, but were concerned about possible model misspecification and advocated omission of such sib pairs, as have later authors [24, 25].
More general pedigrees
The above methods can be generalized to allow integration of information from nuclear families including both parents and disease-discordant siblings, those containing more than two sibs, and in more general pedigrees. The general idea is to combine contributions to a score-type statistic, U, for all trios and discordant sib pairs first within each pedigree, and then over all pedigrees. The variance of the test statistic is then estimated using an estimator robust to associations between the contributions within pedigrees; for example, in the 'pedigree disequilibrium test' [26], by the sum of the squared contributions of each pedigree.
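The robust variance idea can be illustrated with a short sketch. It assumes that each pedigree's contribution to the score has already been computed and summed within the pedigree; how those contributions are formed from trios and discordant sib pairs is not shown. The function name and the example values are hypothetical.

```python
def pedigree_score_test(contributions):
    """One-df test from per-pedigree score contributions.

    U is the sum of the contributions over pedigrees, and its variance is
    estimated by the sum of their squares, which does not require the
    contributions within a pedigree to be independent of one another.
    """
    u = sum(contributions)
    v = sum(d * d for d in contributions)
    if v == 0:
        raise ValueError("no informative pedigrees")
    return u, v, u * u / v


# Made-up contributions from four pedigrees.
print(pedigree_score_test([1.0, 2.0, -1.0, 3.0]))
```

Pedigrees are assumed to be independent of one another, which is what justifies summing their squared contributions.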
A practical problem with such tests is incomplete data. Horvath et al. [22] discussed extension of the 'reconstruction combined' TDT [27] to the X chromosome. This test fills in missing genotypes when they can be inferred from other family members, correcting for the biases that occur as a result. Ding et al. [24] extended the pedigree disequilibrium test to the X chromosome, using a Monte Carlo approach to deal with missing genotypes. This approach to missing data used estimates of allele frequencies without allowing for their uncertainty, a deficiency later corrected in the approach of Chung et al. [25]. Finally, Schneiter et al. [28] described an extension to X loci of the Rabinowitz and Laird approach to family-based association testing in the presence of incomplete data [29]. None of these methods employed dosage compensation for X inactivation.
Quantitative traits
Analysis of population-based studies of quantitative traits may be carried out by conventional regression methods, with or without dosage compensation. This follows similar lines to the analysis of case-control studies by logistic regression methods as discussed in the last section. Alternatively, Clayton's test discussed above, with a change of notation, can be applied to quantitative phenotypes [14]. However, analysis of family-based studies is more challenging, requiring achievement of two aims:
• partition of the information for association into between-family and within-family components, where only the latter is robust to confounding by hidden population structure [30], and
• allowance for correlations within family.
Zhang et al. proposed such a method, dealing with the problem of correlation by using a mixed model [31]. This method also allows analysis of effects of two-locus haplotypes, using the Expectation-Maximization (EM) algorithm to reconstruct missing parental genotypes and to impute haplotype phase in females. Zhang et al. also discussed implementation of their method both with and without dosage compensation, since not all loci on the X chromosome are subject to X inactivation. They suggested using both tests in a sequential procedure, with the dosage-compensated test used first. Since most loci are subject to X inactivation, they suggested choosing an α level for the dosage-compensated test four times that for the non-compensated test.
The Y chromosome
The Y chromosome presents none of the problems discussed above. Loci in the pseudoautosomal region can be treated as autosomal, while marker loci in the nonhomologous region only occur in males, who each carry just one copy inherited from their father. A similar situation applies for mitochondrial loci, in which there is a single copy inherited from the mother.
Analysis of single loci on the Y chromosome is straightforward. However, since crossover recombination cannot occur, linkage disequilibrium extends across the entire chromosome and association data provide no information about the location of the causal variant or variants. Perhaps partly because of this, and partly because there are no problems of haplotype phase uncertainty on the Y chromosome, most studies of disease associations with the Y chromosome do not stop at single marker analyses, but go on to perform cladistic analyses of haplotype risk. Since such analyses can be carried out for markers on other chromosomes, albeit with some extra difficulty due to phase uncertainty, they will not be discussed further here.
Studies of case-parent trios are uninformative about associations on the Y chromosome.
Conclusion
In conclusion, it has been demonstrated that testing for genetic associations with loci on the X chromosome is not as straightforward as was often imagined. Although in population-based studies involving single sexes there is no difficulty in applying standard methods, in mixed-sex studies difficulties are encountered, both in finding a way to combine evidence from the two sexes without necessarily assuming that sex is a confounder (that is, that allele frequencies differ between sexes, an assumption that can lead to substantial loss of power), and in appropriately weighting the evidence taking account of the likely effect of X inactivation.
For family-based association studies, design and analysis can also be modified when the interest is in the X chromosome. For affected sons, it is necessary only to genotype the mother in order to obtain all relevant transmission data and, as for population-based studies, the evidence from affected sons and daughters should be differentially weighted. Taking account of both of these considerations, it follows that mother-son pairs would be expected to be more informative than daughter-parent trios. In studies of disease-discordant sib pairs, mixed-sex pairs are particularly problematic and are best avoided.
Abbreviations
df: degree of freedom
HWE: Hardy-Weinberg equilibrium
TDT: transmission/disequilibrium test
References
1. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007, 447: 661-678. 10.1038/nature05911.
2. Sasieni P: From genotypes to genes: doubling the sample size. Biometrics. 1997, 53: 1253-1261. 10.2307/2533494.
3. Cochran W: Some methods of strengthening the common χ^{2} test. Biometrics. 1954, 10: 417-451. 10.2307/3001616.
4. Armitage P: Test for linear trend in proportions and frequencies. Biometrics. 1955, 11: 375-386. 10.2307/3001775.
5. Agresti A: Categorical Data Analysis. 1990, New York: Wiley
6. Cooper J, Smyth D, Smiles A, Plagnol V, Walker N, Allen J, Downes K, Barrett J, Healy B, Mychaleckyj J, Warram J, Todd J: Meta-analysis of genome-wide association study data identifies additional type 1 diabetes loci. Nat Genet. 2008, 40: 1399-1401. 10.1038/ng.249.
7. Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959, 22: 719-748.
8. Mantel N: Chi-square tests with one degree of freedom: extension of the Mantel-Haenszel procedure. J Am Stat Assoc. 1963, 58: 690-700. 10.2307/2282717.
9. Birch M: The detection of partial association II: The general case. J R Stat Soc Series B Stat Methodol. 1965, 27: 111-124.
10. Breslow N, Day N: Statistical Methods in Cancer Research. The Analysis of Case-Control Studies. 1980, Lyon: IARC Scientific Publications, I:
11. Zheng G, Joo J, Zhang C, Geller NL: Testing association for markers on the X chromosome. Genet Epidemiol. 2007, 31: 834-843. 10.1002/gepi.20244.
12. Chow J, Yen Z, Ziesche S, Brown C: Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet. 2005, 6: 69-92. 10.1146/annurev.genom.6.080604.162350.
13. Amos-Landgraf JM, Cottle A, Plenge RM, Friez M, Schwartz CE, Longshore J, Willard HF: X chromosome-inactivation patterns in 1,005 phenotypically unaffected females. Am J Hum Genet. 2006, 79: 493-499. 10.1086/507565.
14. Clayton D: Testing for association on the X chromosome. Biostatistics. 2008, 9: 593-600. 10.1093/biostatistics/kxn007.
15. Huber P: The behaviour of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability. Edited by: Cam LML, Neyman J. 1967, Berkeley, CA: University of California Press, 221-233.
16. White H: A heteroskedasticity-consistent covariance matrix estimate and a direct test for heteroskedasticity. Econometrica. 1980, 48: 817-830. 10.2307/1912934.
17. Prentice R, Pyke R: Logistic disease incidence models and case-control studies. Biometrika. 1979, 66: 403-411. 10.1093/biomet/66.3.403.
18. Spielman R, McGinnis R, Ewens W: Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus. Am J Hum Genet. 1993, 52: 506-516.
19. Self S, Longton G, Kopecky K, Liang K: On estimating HLA-disease association with application to a study of aplastic anemia. Biometrics. 1991, 47: 53-61. 10.2307/2532495.
20. Schaid D, Sommer S: Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet. 1993, 53: 1114-1126.
21. Schaid DJ: General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol. 1996, 13: 423-449. 10.1002/(SICI)1098-2272(1996)13:5<423::AID-GEPI1>3.0.CO;2-3.
22. Horvath S, Laird NM, Knapp M: The transmission/disequilibrium test and parental-genotype reconstruction for X-chromosomal markers. Am J Hum Genet. 2000, 66: 1161-1167. 10.1086/302823.
23. Barrett J, Clayton D, Concannon P, Akolkar B, Cooper J, Erlich H, Julier C, Morahan G, Nerup J, Nierras C, Plagnol V, Pociot F, Schuilenberg H, Smyth D, Stevens H, Todd J, Walker N, Rich S, the Type 1 Diabetes Genetics Consortium: A genome-wide association study and meta-analysis indicate that over 40 loci affect risk of type 1 diabetes. Nat Genet.
24. Ding J, Lin S, Liu Y: Monte Carlo pedigree disequilibrium test for markers on the X chromosome. Am J Hum Genet. 2006, 79: 567-573. 10.1086/507609.
25. Chung RH, Morris RW, Zhang L, Li YJ, Martin ER: X-APL: an improved family-based test of association in the presence of linkage for the X chromosome. Am J Hum Genet. 2007, 80: 59-68. 10.1086/510630.
26. Martin E, Monks S, Warren L, Kaplan N: A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am J Hum Genet. 2000, 67: 146-154. 10.1086/302957.
27. Knapp M: A note on power approximations for the transmission/disequilibrium test. Am J Hum Genet. 1999, 64: 861-870. 10.1086/302285.
28. Schneiter K, Degnan JH, Corcoran C, Xu X, Laird N: E-FBAT: exact family-based association tests. BMC Genet. 2007, 8: 86. 10.1186/1471-2156-8-86.
29. Rabinowitz D, Laird N: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary patterns of missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
30. Abecasis G, Cardon L, Cookson W: A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000, 66: 279-292. 10.1086/302698.
31. Zhang L, Martin ER, Morris RW, Li YJ: Association test for X-Linked QTL in family-based designs. Am J Hum Genet. 2009, 84: 431-444. 10.1016/j.ajhg.2009.02.010.
Acknowledgements
Thanks are due to the referees for helpful comments on an earlier draft and for pointing out much literature that I was not aware of. The author is supported by a Wellcome Trust Principal Research Fellowship.
Competing interests
The author declares that they have no competing interests.
Cite this article
Clayton, D.G. Sex chromosomes and genetic association studies. Genome Med 1, 110 (2009). https://doi.org/10.1186/gm110
Keywords
 Dosage Compensation
 Autosomal Locus
 Female Genotype
 Robust Variance Estimate
 Pedigree Disequilibrium Test