.0 | Summary
Race and genetic ancestry are distinct concepts with distinct causes and consequences. Race is defined within a social context. Genetic ancestry is defined within the context of reference populations.
Race, whether in the historical “essentialist” view or the more contemporary “population” view, provides a poor model of true genetic variation. Genetic variation follows a “nested subset” model in which the most variation is observed in Africa and subsets of that variation are observed in Europe, Asia, and the Americas. Fitting a race-based model to genetic data gives a much worse fit than fitting a nested subset model, which is itself incompatible with race (Long, Li, and Healy 2009).
When pairs of sequenced individuals are compared, the largest number of differing sites is found within Africa (consistent with a nested evolutionary model): a Yoruba/Yoruba pair has more differences than a Yoruba/French pair (Biddanda, Rice, and Novembre 2020). Most of the differences between pairs of individuals arose because one individual carried a globally common allele and the other did not.
Estimates of FST between divergent geographic populations range between 0.11 and 0.15 (Bhatia et al. 2013). In other words, if we take the individual-level genetic variation at a typical polymorphism and condition out population labels, we will still be left with roughly 85% to 89% of the original variance. This is in stark contrast to historic racial models that assumed racial groups were largely homozygous within populations and highly divergent between them (i.e. FST close to 1.0).
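To make the FST arithmetic above concrete, here is a minimal sketch of the textbook definition, FST = (H_T - H_S) / H_T, for a single biallelic site in two equally sized populations. The function name and the example allele frequencies are illustrative assumptions; this is not the estimator used by Bhatia et al. (2013), which is computed across many sites from sample data.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Textbook F_ST at one biallelic site for two equally sized populations.

    p1 and p2 are the frequencies of the same allele in each population
    (hypothetical inputs chosen for illustration).
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if both populations were pooled into one.
    h_total = 2 * p_bar * (1 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # site is monomorphic everywhere; no variation to partition
    return (h_total - h_within) / h_total


# Moderately different allele frequencies, in the range reported for humans.
moderate = fst_two_populations(0.325, 0.675)
# Fixed differences, as assumed by historic racial models.
fixed = fst_two_populations(0.0, 1.0)

print(f"F_ST = {moderate:.4f}, within-population share = {1 - moderate:.4f}")
print(f"F_ST with fixed differences = {fixed:.1f}")
```

In this sketch, a frequency difference as large as 0.325 versus 0.675 still leaves most of the variation within populations, while FST reaches 1.0 only when the two populations share no alleles at the site.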
A brief tour of methods for analyzing population structure in genetic data:
Principal Components Analysis (PCA), an eigendecomposition of the sample relatedness matrix, can identify individuals drawn from populations with differing allele frequencies. Theory indicates that PCA is extremely sensitive and is expected to identify structure in most large datasets (Patterson, Price, and Reich 2006). While PCA has useful genealogical properties, it is easily distorted by the sampling of individuals (McVean 2009) and can also produce arbitrary non-linearities in the presence of spatially local structure (Novembre and Stephens 2008).
Model-based clustering (STRUCTURE) attempts to identify individuals as mixtures of alleles from a fixed set of populations (Pritchard, Stephens, and Donnelly 2000). STRUCTURE is similarly distorted by the sampling of individuals, as well as by the number of populations specified, and can identify admixtures that do not exist or miss admixtures that do (Lawson, van Dorp, and Falush 2018).
Parametric models (Admixture Graphs) attempt to fit populations to trees or graphs based on tests of cross-population allele sharing. Admixture graph fitting can often return incorrect graphs that fit the data better than the true graph, or many equally likely graphs. Reanalysis of published admixture graph studies demonstrated several instances where historical conclusions were drawn from data that were compatible with many different graphs (Maier et al. 2023).
All ancestry inference methods are biased by the sampling process and/or model parameters, and no method can identify “true” ancestry because the true sampling process is unknown.
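As an illustration of the PCA approach described above, the following sketch simulates genotypes for two populations with different allele frequencies, builds a sample relatedness matrix, and extracts its leading eigenvector. Everything here is an assumption made for illustration: the simulation settings, the function names, the per-SNP standardization to unit sample variance, and the use of power iteration in place of a full eigendecomposition (chosen only to avoid external dependencies). Real analyses use dedicated software on far larger data.

```python
import math
import random


def simulate_genotypes(n_per_pop, n_snps, freq_a, freq_b, rng):
    """Draw diploid genotypes (0, 1, or 2 allele copies) for two populations."""
    rows = []
    for freq in (freq_a, freq_b):
        for _ in range(n_per_pop):
            rows.append([(rng.random() < freq) + (rng.random() < freq)
                         for _ in range(n_snps)])
    return rows


def standardize_columns(genotypes):
    """Center each SNP and scale it to unit variance; drop monomorphic SNPs."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var > 0:
            sd = math.sqrt(var)
            columns.append([(g - mean) / sd for g in col])
    # Transpose back to an individuals x SNPs layout.
    return [list(ind) for ind in zip(*columns)]


def relatedness_matrix(z):
    """K[i][j] = mean product of standardized genotypes for individuals i, j."""
    m = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / m for zj in z] for zi in z]


def leading_pc(k, rng, n_iter=200):
    """Approximate the top eigenvector of K by power iteration."""
    v = [rng.gauss(0, 1) for _ in k]  # random start, not the all-ones vector
    for _ in range(n_iter):
        v = [sum(kij * vj for kij, vj in zip(row, v)) for row in k]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


rng = random.Random(0)
genotypes = simulate_genotypes(n_per_pop=20, n_snps=300,
                               freq_a=0.35, freq_b=0.65, rng=rng)
pc1 = leading_pc(relatedness_matrix(standardize_columns(genotypes)), rng)

# Individuals 0-19 come from population A, 20-39 from population B.
signs_a = {x > 0 for x in pc1[:20]}
signs_b = {x > 0 for x in pc1[20:]}
print("PC1 separates the two simulated populations:",
      len(signs_a) == 1 and len(signs_b) == 1 and signs_a != signs_b)
```

Note that the result depends entirely on who was sampled: changing `n_per_pop` for one group, or adding a third simulated population, changes the axes that PCA returns, which is the sampling sensitivity noted above.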
A brief tour of genetic ancestry in large-scale datasets:
All large biobanks exhibit continuous population structure that is poorly explained by conventional racial groups (Wojcik et al. 2019).
When restricting to “homogenous” ancestry populations enriched for European origin, continuous PCs are observed that reflect ancestry from reference populations within Europe (Galinsky et al. 2016). When restricting to homogenous white individuals in a single European country (the UK), county-level PCs are observed (Agrawal et al. 2020).
The same patterns arise in other countries: a large Chinese biobank identified PCs that correlated with within-city neighborhoods (Walters et al. 2023); a large Japanese biobank identified PCs that correlated with dozens of local regions (Sakaue et al. 2020).
A brief tour of human history from population genetics:
Diverse studies of modern and ancient DNA have demonstrated that historic admixtures and migrations were ubiquitous and highly dynamic. Genetic ancestry rarely reflects current geographic patterns and disputes simple models of isolated human development (Pickrell and Reich 2014).
Modern individuals from the Americas are generally more similar in their ancestry to European than to Native American reference individuals (Moreno-Estrada et al. 2013; Gravel et al. 2013). Native American reference individuals exhibit complex relationships to ancient Siberian genomes as well as modern Polynesian populations (Ioannidis et al. 2020), where the latter appear to have been settled directly by East Asian groups (Skoglund et al. 2016).
European individuals derive ancestry from historic populations that often no longer exist in unadmixed form (Lazaridis et al. 2014; Sikora et al. 2019) or were rapidly displaced (Olalde et al. 2018). In more recent history (the Bronze Age) both extensive migration and population structure have been observed across Europe (Antonio et al. 2024).
Admixture and migration are extensive in Africa in both modern (Fan et al. 2023) and ancient data (Skoglund et al. 2017; Lipson et al. 2020). Yet there are massive gaps in our understanding of African population history, including competing theories of continuously mixing pan-African “metapopulations” (Scerri, Chikhi, and Thomas 2019; Ragsdale et al. 2023) versus recent “back to Africa” migrations (Cole et al. 2020).
In short, human history has been highly dynamic, with extensive admixture, instances of rapid migration, geographic shifts, ancient introgression events, and historic populations that no longer exist in unadmixed form. Conventional models of race are irrelevant to the study of genetic variation, and even models of simple population relationships are proving to be fundamentally wrong.