Analysis of admixed Greenlandic siblings shows that the mean genotypic values for metabolic phenotypes differ between Inuit and Europeans

Background
Disease prevalence and mean phenotype values differ between many populations, including Inuit and Europeans. Whether these differences are partly explained by genetic differences or are solely due to differences in environmental exposures remains unknown. This is because estimates of the genetic contribution to these means, which we here refer to as mean genotypic values, are easily confounded, and because studies across genetically diverse populations are lacking.

Methods
Leveraging the unique genetic properties of the small, admixed and historically isolated Greenlandic population, we estimated the difference in mean genotypic value between Inuit and European genetic ancestry using an admixed sibling design. Analyses were performed across 26 metabolic phenotypes in 1474 admixed sibling pairs present in a cohort of 5996 Greenlanders.

Results
After FDR correction for multiple testing, we found significantly lower mean genotypic values in Inuit genetic ancestry compared with European genetic ancestry for body weight (effect size per percentage of Inuit genetic ancestry (se): −0.51 (0.16) kg/%), body mass index (−0.20 (0.06) kg/m²/%), fat percentage (−0.38 (0.13) %/%), waist circumference (−0.42 (0.16) cm/%), hip circumference (−0.38 (0.11) cm/%) and fasting serum insulin levels (−1.07 (0.51) pmol/l/%). The direction of the effects was consistent with the observed mean phenotype differences between Inuit and European genetic ancestry. No difference in mean genotypic value was observed for height, markers of glucose homeostasis or circulating lipid levels.

Conclusions
We show that mean genotypic values for some metabolic phenotypes differ between two human populations, using a method that is not easily confounded by possible differences in environmental exposures. Our study illustrates the importance of performing genetic studies in diverse populations.

Supplementary Information
The online version contains supplementary material available at 10.1186/s13073-024-01326-3.
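The logic of the admixed sibling design described in the Methods can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: it assumes a purely additive model in which each sibling's phenotype is a shared family environment plus ∆G times the sibling's ancestry proportion plus noise, with the family environment deliberately correlated with family ancestry. The within-pair estimator regresses phenotype differences on ancestry differences through the origin; a naive regression pooling all individuals is shown for contrast. All function names and parameter values are hypothetical.

```python
import random

def simulate_pairs(n_pairs, delta_g, env_effect=10.0, env_sd=5.0,
                   noise_sd=1.0, seed=1):
    """Simulate hypothetical admixed sibling pairs as ((q1, y1), (q2, y2)).

    The shared family environment rises with family ancestry, so comparing
    individuals across families confounds ancestry with environment.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        family_q = rng.uniform(0.3, 0.9)  # family mean ancestry proportion
        shared_env = env_effect * family_q + rng.gauss(0.0, env_sd)
        sibs = []
        for _ in range(2):
            # Siblings scatter around the family mean through segregation
            q = min(1.0, max(0.0, family_q + rng.gauss(0.0, 0.04)))
            y = shared_env + delta_g * q + rng.gauss(0.0, noise_sd)
            sibs.append((q, y))
        pairs.append((sibs[0], sibs[1]))
    return pairs

def within_pair_slope(pairs):
    """Least-squares slope through the origin of dY on dQ within pairs.

    shared_env is identical for both siblings, so it cancels in dY.
    """
    num = den = 0.0
    for (q1, y1), (q2, y2) in pairs:
        dq, dy = q1 - q2, y1 - y2
        num += dq * dy
        den += dq * dq
    return num / den

def pooled_slope(pairs):
    """Ordinary least-squares slope of Y on Q across all individuals."""
    pts = [sib for pair in pairs for sib in pair]
    mq = sum(q for q, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((q - mq) * (y - my) for q, y in pts)
    sxx = sum((q - mq) ** 2 for q, _ in pts)
    return sxy / sxx

pairs = simulate_pairs(5000, delta_g=-20.0)
print(round(within_pair_slope(pairs), 1))  # close to the simulated -20
print(round(pooled_slope(pairs), 1))       # pulled toward zero by the environment
```

The within-pair slope recovers the simulated effect because the shared environment is differenced away, while the pooled slope mixes the genetic effect with the ancestry-correlated environment.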


Figure S1. Illustration of the issue with extrapolating outside the data range for weight. The difference in European ancestry proportion between siblings ranges from −0.133 to 0.105. The blue lines are sliding means of the weight difference along the ancestry-proportion difference, computed in steps of 50. The red line is the linear regression fitted to the data. The horizontal green line spans x values from 0 to 1. The vertical green line represents the overestimated difference in genotypic values between European and Inuit ancestries that results from the absence of data for x between 0.105 and 1.

Simulations of participation bias. We define siblings as at risk of not participating if their mother has a high Inuit ancestry proportion (above 0.8) and a low phenotype value. Siblings at risk have a 20% probability of not participating in the study. Both mothers and their offspring were simulated to have phenotype values following $Y_i \sim N(Q_i \Delta G, \sigma^2)$, where $Q_i$ is the ancestry proportion of individual $i$, with $\sigma^2 = 1$ in panel A and $\sigma^2 = 1/10$ in panel B. We estimated the ancestry proportions of both parents, but we do not know which parent is the mother, so we chose one parent at random as the mother. We defined the threshold for a low phenotype from the quantiles of the mothers' simulated phenotypes and performed simulations for different quantiles. We performed 10,000 simulations for each threshold, estimated ∆G using model 1b, and calculated the mean phenotype value of all participating siblings as $\bar{Y}$.
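The blue curves in Figures S1 to S3 are described as sliding means computed "in a step of 50". The captions do not define this further; one plausible reading, assumed here, is a moving average over windows of 50 consecutive observations after sorting by the x-axis variable, advancing one observation at a time. The function name `sliding_means` is hypothetical.

```python
def sliding_means(xs, ys, window=50):
    """Moving average of (x, y) over windows of `window` points sorted by x.

    Returns one (mean_x, mean_y) tuple per window position; the window
    advances one observation at a time (an assumed reading of "step of 50").
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < window:
        raise ValueError("need at least `window` points")
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    # Running sums make each subsequent window an O(1) update
    sum_x, sum_y = sum(sx[:window]), sum(sy[:window])
    out = [(sum_x / window, sum_y / window)]
    for i in range(window, len(sx)):
        sum_x += sx[i] - sx[i - window]
        sum_y += sy[i] - sy[i - window]
        out.append((sum_x / window, sum_y / window))
    return out

xs = list(range(100))
ys = [2 * x for x in xs]
curve = sliding_means(xs, ys)
print(len(curve), curve[0])  # 51 (24.5, 49.0)
```

If the authors instead used non-overlapping bins of 50 observations, the loop would advance by `window` rather than by one; the captions alone do not settle which.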

Figure S2. Linear relationship between the within-pair phenotype difference and the within-pair difference in ancestry proportion (∂Q). Each row shows three plots for one trait. From left to right: (1) the raw data distribution of the phenotype difference along ∂Q, (2) the quantile-transformed phenotype difference along ∂Q, and (3) residuals against fitted values from model 1b using the data in (2). The blue lines are sliding means computed in steps of 50, binned by the x-axis variable. The red lines are locally weighted scatterplot smoothing (LOWESS) curves that help identify patterns in the residuals.
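Panel (2) of Figure S2 uses quantile-transformed data, and Figure S4 refers to transformed phenotypes. The captions do not specify the transformation; a common choice in genetic association work, assumed here, is the rank-based inverse normal transformation, which maps ranks to standard normal quantiles. The name `inverse_normal_transform` and the (rank − 0.5)/n offset are illustrative choices, not taken from the source.

```python
from statistics import NormalDist

def inverse_normal_transform(values):
    """Rank-based inverse normal transformation (an assumed choice).

    Ties receive their average rank; each rank r is mapped to
    (r - 0.5) / n and then to the standard normal quantile.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]

print(inverse_normal_transform([3.0, 1.0, 2.0]))  # middle value maps to 0.0
```

The transform forces an approximately normal marginal distribution, which keeps a few extreme phenotype values from dominating a least-squares fit; the LOWESS curves in panel (3) are a separate diagnostic and are not sketched here.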

Figure S3. Distribution of height according to birth year. Participants' birth years were obtained by subtracting their age at sampling, which was recorded together with height, from the sampling year. The blue lines are sliding means of height along birth year, computed in steps of 50. We used 2010 as an approximate sampling year for IHIT, 1999 for B99 and 2018 for B2018. A linear regression of height on birth year across all individuals shows an increase of 0.206 cm per year. Individuals are coloured by sex.
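The birth-year derivation and the height trend in Figure S3 amount to simple arithmetic followed by a least-squares slope. A minimal sketch, using the approximate cohort sampling years stated in the caption; the records below are made up, and because ages are usually recorded in whole years the derived birth year can be off by one.

```python
# Approximate sampling years per cohort, as stated in the Figure S3 caption
SAMPLING_YEAR = {"IHIT": 2010, "B99": 1999, "B2018": 2018}

def birth_year(cohort, age_at_sampling):
    """Approximate birth year: sampling year minus age recorded at sampling."""
    return SAMPLING_YEAR[cohort] - age_at_sampling

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (model with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical records: (cohort, age at sampling, height in cm)
records = [("B99", 60, 160.0), ("IHIT", 40, 165.5), ("B2018", 25, 168.0)]
years = [birth_year(c, a) for c, a, _ in records]
heights = [h for _, _, h in records]
print(years)                      # [1939, 1970, 1993]
print(ols_slope(years, heights))  # cm per birth year for these made-up records
```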

Figure S4. Estimated difference in mean genotypic values by model 2. Estimates of the difference in mean genotypic value, ∆G, obtained from admixed full siblings using model 2, a more complex model that accounts for the fact that some of the siblings in the dataset come from the same families. Points show the estimates for transformed phenotypes with age and sex as covariates. Error bars are 95% confidence intervals. Red indicates an FDR-adjusted p value < 0.05.
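Figure S4 marks estimates with an FDR-adjusted p value below 0.05 and draws 95% confidence intervals. The captions do not name the procedures; a minimal sketch follows, assuming the Benjamini-Hochberg procedure for the FDR adjustment and normal-approximation (Wald) intervals. The interval example reuses the body-weight estimate and standard error from the abstract purely as arithmetic input; the function names are hypothetical.

```python
from statistics import NormalDist

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def wald_ci(estimate, se, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approx [0.02, 0.04, 0.04, 0.02]
print(wald_ci(-0.51, 0.16))                           # approx (-0.824, -0.196)
```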

Figure S5. Estimation of ∆G and ∆Y with a simulated ancestry-by-environment interaction based on each family's average ancestry proportion. Results from a simulation of ancestry-by-environment interaction in which the environment is correlated with the family's average ancestry proportion, $\bar{Q}$, and each sibling's phenotype value is drawn as $Y_i \sim N(Q_i \Delta G + \beta \bar{Q}, \sigma^2)$, where $Q_i$ is the ancestry proportion of sibling $i$. The simulations were based on the sibling pairs from this study and their admixture proportions. The left plot shows the estimated difference in mean genotypic values using model 1b, and the right plot shows the estimated difference in mean phenotypic values using model 1a. We performed 10,000 simulations for each value of $\beta$.
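The Figure S5 scenario can be reproduced in miniature. This sketch assumes, as the captions do not state, that model 1b is a regression of within-pair phenotype differences on within-pair ancestry differences through the origin and that model 1a is a pooled regression of phenotype on ancestry across individuals. The family average $\bar{Q}$ is approximated by the mean of the two siblings' ancestry proportions, and the ancestry values are hypothetical rather than the study's sibling pairs.

```python
import random

def simulate_family_env(delta_g, beta, n_pairs=5000, sigma=1.0, seed=7):
    """Hypothetical pairs with Y_i ~ N(Q_i*delta_g + beta*Q_bar, sigma^2).

    Q_bar is the mean of the two siblings' ancestry proportions, standing
    in for the family's average ancestry (an assumption).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        family_q = rng.uniform(0.3, 0.9)
        q1 = family_q + rng.gauss(0.0, 0.04)
        q2 = family_q + rng.gauss(0.0, 0.04)
        q_bar = (q1 + q2) / 2
        y1 = rng.gauss(q1 * delta_g + beta * q_bar, sigma)
        y2 = rng.gauss(q2 * delta_g + beta * q_bar, sigma)
        pairs.append(((q1, y1), (q2, y2)))
    return pairs

def estimate_within_pair(pairs):
    """Assumed stand-in for model 1b: slope of dY on dQ through the origin."""
    num = sum((q1 - q2) * (y1 - y2) for (q1, y1), (q2, y2) in pairs)
    den = sum((q1 - q2) ** 2 for (q1, _), (q2, _) in pairs)
    return num / den

def estimate_pooled(pairs):
    """Assumed stand-in for model 1a: OLS slope of Y on Q over individuals."""
    pts = [sib for pair in pairs for sib in pair]
    mq = sum(q for q, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((q - mq) * (y - my) for q, y in pts)
    sxx = sum((q - mq) ** 2 for q, _ in pts)
    return sxy / sxx

pairs = simulate_family_env(delta_g=1.0, beta=2.0)
print(estimate_within_pair(pairs))  # near 1.0: beta * Q_bar cancels within pairs
print(estimate_pooled(pairs))       # close to delta_g + beta: environment absorbed
```

Because $\bar{Q}$ is shared by both siblings, the environmental term drops out of the within-pair difference, while the pooled regression attributes most of it to ancestry.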

Figure S6. Estimation of ∆G and ∆Y with a simulated ancestry-by-environment interaction based on each sibling's ancestry proportion. Results from a simulation of ancestry-by-environment interaction in which the environment is correlated with the individual sibling's ancestry proportion and each sibling's phenotype value is drawn as $Y_i \sim N(Q_i \Delta G + \beta Q_i, \sigma^2)$, where $Q_i$ is the ancestry proportion of sibling $i$. The simulations were based on the sibling pairs from this study and their admixture proportions. The left plot shows the estimated difference in mean genotypic values using model 1b, and the right plot shows the estimated difference in mean phenotypic values using model 1a. We performed 10,000 simulations for each value of $\beta$.
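The different behaviour expected in Figures S5 and S6 follows from differencing the two simulation models for siblings 1 and 2 of the same family. This worked step assumes, as the captions do not state, that model 1b regresses the within-pair phenotype difference on the within-pair ancestry difference. Here β is the ancestry-by-environment coefficient, $\bar{Q}$ the family's average ancestry proportion, and $\varepsilon_i \sim N(0, \sigma^2)$ the sibling-level noise implied by the simulation models:

```latex
\begin{aligned}
\text{Figure S5:}\quad Y_1 - Y_2 &= \Delta G\,(Q_1 - Q_2) + \beta\,(\bar{Q} - \bar{Q}) + (\varepsilon_1 - \varepsilon_2)
  = \Delta G\,(Q_1 - Q_2) + (\varepsilon_1 - \varepsilon_2),\\
\text{Figure S6:}\quad Y_1 - Y_2 &= \Delta G\,(Q_1 - Q_2) + \beta\,(Q_1 - Q_2) + (\varepsilon_1 - \varepsilon_2)
  = (\Delta G + \beta)(Q_1 - Q_2) + (\varepsilon_1 - \varepsilon_2).
\end{aligned}
```

Under these assumptions, a family-level environment cancels and the within-pair slope targets ∆G, whereas an environment that tracks each sibling's own ancestry is indistinguishable from a genetic effect, so the within-pair slope targets ∆G + β.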

Figure S7. Estimation of ∆G and ∆Y with simulated participation bias based on siblings' phenotypic values. Results from a simulation of participation bias in which individuals with a "high" phenotype value have a 20% probability of not participating in the study. The left plot shows the mean phenotypic value across all participating individuals, and the right plot shows the estimated difference in mean genotypic values between the populations. In both plots, each box corresponds to a specific threshold for defining a "high" phenotype value, based on the top quantiles of the data (ranging from 0 to 10%). If one individual does not participate, their sibling is also removed. The simulations were based on the sibling pairs from this study and their admixture proportions. To keep the number of individuals constant regardless of the degree of participation bias, a new phenotype was simulated for non-participating siblings.
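One replicate of the Figure S7 simulation can be sketched as follows. Several details are assumptions rather than statements from the caption: phenotypes are drawn as $N(Q_i \Delta G, \sigma^2)$, the "high" threshold is the upper quantile of an initial draw of all phenotypes, a lost pair is redrawn until both siblings participate (one reading of "a new phenotype was simulated"), and ∆G is estimated by a within-pair regression through the origin as a stand-in for model 1b. The ancestry values and all function names are hypothetical.

```python
import random

def hypothetical_pairs(n_pairs, seed=5):
    """Made-up sibling ancestry proportions (q1, q2) around a family mean."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        fam = rng.uniform(0.3, 0.9)
        out.append((fam + rng.gauss(0.0, 0.04), fam + rng.gauss(0.0, 0.04)))
    return out

def simulate_participation(pairs_q, delta_g, top_fraction, p_drop=0.2,
                           sigma=1.0, seed=3):
    """One replicate of sibling-based participation bias (assumed rules)."""
    rng = random.Random(seed)

    def draw(q):
        return rng.gauss(q * delta_g, sigma)

    if top_fraction > 0:
        # "High" means above the (1 - top_fraction) quantile of an initial draw
        initial = sorted(draw(q) for pair in pairs_q for q in pair)
        threshold = initial[int((1 - top_fraction) * len(initial))]
    else:
        threshold = float("inf")  # no participation bias

    def skips(y):
        return y > threshold and rng.random() < p_drop

    ys = []
    for q1, q2 in pairs_q:
        # A pair is lost if either sibling skips; redraw to keep n fixed
        while True:
            y1, y2 = draw(q1), draw(q2)
            if not skips(y1) and not skips(y2):
                break
        ys.append((y1, y2))

    all_y = [y for pair in ys for y in pair]
    num = sum((q1 - q2) * (y1 - y2) for (q1, q2), (y1, y2) in zip(pairs_q, ys))
    den = sum((q1 - q2) ** 2 for q1, q2 in pairs_q)
    return {"n": len(all_y), "mean_y": sum(all_y) / len(all_y),
            "delta_g_hat": num / den}

pq = hypothetical_pairs(5000)
for frac in (0.0, 0.05, 0.10):
    print(frac, simulate_participation(pq, delta_g=-1.0, top_fraction=frac))
```

Repeating this over many seeds for each threshold (the caption uses boxes summarising many simulations) and collecting `mean_y` and `delta_g_hat` gives the two quantities plotted in the figure.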

Figure S8. Estimation of ∆G and ∆Y with simulated participation bias based on mothers' phenotypic values.

Figure S9. Bias in the estimation caused by error in ancestry inference, estimated using simulations. Each box shows the estimates of ∆G, measured in SD, for a given mean absolute error in ∂Q; the middle line shows the median and the box spans the 25th to 75th percentiles. We used the ∂Q of siblings in our study as the true ∂Q. The orange line shows the mean of the same estimates. The red dotted line shows the simulated true difference in genotypic values, and the blue dotted line shows an estimate of the error in ∂Q present in the data analysed in this study, based on F1 siblings (whose true ∂Q is expected to be zero). All estimates were obtained with model 1b.
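Figure S9 examines how error in the inferred ancestry proportions propagates into ∆G. A minimal sketch of one mechanism follows, assuming classical measurement error (independent Gaussian noise added to each sibling ∂Q) and assuming model 1b is a within-pair regression through the origin. Under these assumptions the noise attenuates the slope toward zero by roughly var(true ∂Q) / (var(true ∂Q) + var(error)). The spread of true ∂Q (standard deviation 0.0566) and all other values are hypothetical, and the caption does not say which error model was simulated.

```python
import random

def attenuation_demo(delta_g, error_sd, n_pairs=20000, sigma=1.0, seed=11):
    """Slope of dY on a noisy dQ, plus the mean absolute error added to dQ.

    Assumes classical error: independent Gaussian noise on each dQ.
    """
    rng = random.Random(seed)
    num = den = abs_err = 0.0
    for _ in range(n_pairs):
        true_dq = rng.gauss(0.0, 0.0566)  # hypothetical spread of sibling dQ
        # Difference of two N(0, sigma^2) noise terms has sd sigma * sqrt(2)
        dy = delta_g * true_dq + rng.gauss(0.0, sigma * 2 ** 0.5)
        err = rng.gauss(0.0, error_sd)
        obs_dq = true_dq + err
        num += obs_dq * dy
        den += obs_dq * obs_dq
        abs_err += abs(err)
    return num / den, abs_err / n_pairs

for sd in (0.0, 0.02, 0.0566):
    slope, mae = attenuation_demo(delta_g=5.0, error_sd=sd)
    expected = 5.0 * 0.0566 ** 2 / (0.0566 ** 2 + sd ** 2)
    print(f"MAE {mae:.3f}  slope {slope:.2f}  theory {expected:.2f}")
```

Because sibling ancestry differences are small, even modest absolute errors in ∂Q are large relative to the true spread, which is why an external estimate of the error size, such as one from F1 siblings, matters for interpreting ∆G.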