Objectives: Genome-wide genotypes of large samples of unrelated individuals can be used to estimate the proportion of phenotypic variance captured by SNPs. For case-control samples, the estimated proportion is most interpretable when expressed on the underlying liability scale of the population, even though the variances are estimated on the 0/1 disease scale in a sample in which cases are heavily over-represented relative to a population sample. We previously derived formulae to transform the estimated sample variance on the binary scale to the population variance on the liability scale when variances were estimated using maximum likelihood. However, extreme ascertainment makes estimation of the variance components challenging, especially because the variance explained by SNPs can exceed the phenotypic variance.
Methods: We estimated variance components from simulated case/control data using least squares estimation (LSE) and compared the results with those from restricted maximum likelihood (REML). For LSE we used Haseman-Elston regression, regressing the disease concordance or discordance of pairs of individuals on a genome-wide estimate of their relatedness from SNP data.
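The regression described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names (`standardize`, `haseman_elston`), the simulation settings, and the use of a simulated quantitative trait in an unascertained sample are all assumptions made for illustration. The sketch uses the cross-product form of Haseman-Elston regression, in which the product of standardized phenotypes for each pair is regressed on the pair's SNP-based relatedness. For a 0/1 disease status this product is positive for concordant pairs and negative for discordant pairs. The slope is an estimate on the observed scale only; no transformation to the liability scale is attempted here.

```python
import numpy as np

def standardize(genotypes):
    """Standardize an (n_individuals, n_snps) array of 0/1/2 allele counts."""
    freq = genotypes.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)          # drop monomorphic SNPs
    g, p = genotypes[:, keep], freq[keep]
    return (g - 2 * p) / np.sqrt(2 * p * (1 - p))

def haseman_elston(y, A):
    """Least squares slope of phenotype cross-products on relatedness (pairs i < j)."""
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)       # off-diagonal pairs only
    prod = np.outer(y, y)[iu]
    a = A[iu] - A[iu].mean()
    return np.sum(a * (prod - prod.mean())) / np.sum(a ** 2)

# Hypothetical simulation settings, chosen only to keep the example small.
rng = np.random.default_rng(1)
n, m, h2 = 1000, 500, 0.5
freq = rng.uniform(0.1, 0.9, size=m)
genotypes = rng.binomial(2, freq, size=(n, m)).astype(float)

Z = standardize(genotypes)
A = Z @ Z.T / Z.shape[1]                    # genome-wide relatedness estimate

# Every SNP is causal, with effects scaled so the genetic variance is about h2.
beta = rng.normal(0.0, np.sqrt(h2 / Z.shape[1]), size=Z.shape[1])
y = Z @ beta + rng.normal(0.0, np.sqrt(1 - h2), size=n)

h2_hat = haseman_elston(y, A)
print(f"simulated h2 = {h2}, Haseman-Elston estimate = {h2_hat:.3f}")
```

Because the slope is a plain least squares estimate, it is not constrained to lie between 0 and 1, so it can return values outside that range when the data call for it.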
Results and Conclusions: Across all simulation scenarios, the LSE from Haseman-Elston regression gave unbiased estimates of the additive variance components, even when the variance explained by SNPs exceeded the phenotypic variance. In contrast, REML underestimated the variance explained by SNPs in some scenarios, depending on factors such as sample size, the number of effective causal variants, the prevalence of the disease, and heritability. For the WTCCC data sets, the estimates of variance components from both methods were similar.