class: title-slide center middle inverse <br> # Improved ancestry and admixture detection<br>using principal component analysis<br>of genetic data <br> ## Florian Privé ### Aarhus University, Denmark #### <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 576 512" width="1em" height="1em"><path d="M407.8 294.7c-3.3-.4-6.7-.8-10-1.3c3.4 .4 6.7 .9 10 1.3zM288 227.1C261.9 176.4 190.9 81.9 124.9 35.3C61.6-9.4 37.5-1.7 21.6 5.5C3.3 13.8 0 41.9 0 58.4S9.1 194 15 213.9c19.5 65.7 89.1 87.9 153.2 80.7c3.3-.5 6.6-.9 10-1.4c-3.3 .5-6.6 1-10 1.4C74.3 308.6-9.1 342.8 100.3 464.5C220.6 589.1 265.1 437.8 288 361.1c22.9 76.7 49.2 222.5 185.6 103.4c102.4-103.4 28.1-156-65.8-169.9c-3.3-.4-6.7-.8-10-1.3c3.4 .4 6.7 .9 10 1.3c64.1 7.1 133.6-15.1 153.2-80.7C566.9 194 576 75 576 58.4s-3.3-44.7-21.6-52.9c-15.8-7.1-40-14.9-103.2 29.8C385.1 81.9 314.1 176.4 288 227.1z" fill="white"/></svg> <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 
5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> privefl --- class: inverse, center, middle # Principal Component Analysis (PCA)<br>of genetic data<br>captures population structure --- ### PCA of genetic data captures continental population structure <br> <div class="figure" style="text-align: center"> <img src="figures/PCA-UKBB-Bycroft.png" alt="in the UK Biobank data" width="100%" /> <p class="caption">in the UK Biobank data</p> </div> --- ### PCA also captures sub-continental population structure <div class="figure" style="text-align: center"> <img src="figures/PCA-POPRES-EUR.png" alt="in the POPRES data (European subset)" width="75%" /> <p class="caption">in the POPRES data (European subset)</p> </div> --- ### Distance in PCA measures genetic distance <div class="figure" style="text-align: center"> <img src="figures/compare-Euclidean-to-Fst.png" alt="in the 1000 Genomes data" width="100%" /> <p class="caption">in the 1000 Genomes data</p> </div> .footnote[Source: 10.1016/j.ajhg.2021.11.008] --- class: inverse, center, middle # —Previous work— # Genetic Ancestry Deconvolution ## (with reference populations) <br> ## All individuals are genetically admixed<br>from L reference populations (known) --- ### My motivation <br> - Get a sense of the **ancestry composition of a GWAS dataset** (including when only GWAS summary statistics are available to me) - Use this information to e.g. 
design an LD reference for follow-up analyses (polygenic scores, fine-mapping, etc.) --- ### Summix: ancestry estimation from GWAS allele frequencies (AF) Source: 10.1016/j.ajhg.2021.05.016 (not by me) <br> Estimate ancestry proportions `\(q\)` such that <style type="text/css"> .small2 { font-size: 17px; } .small3 { font-size: 18px; } .footnote2 { position: absolute; bottom: 1.6em; padding-right: 4em; font-size: 60%; } </style> <span class="small2"> `$$AF = q_\text{AFR} AF_\text{AFR} + q_\text{EAS} AF_\text{EAS} + q_\text{EUR} AF_\text{EUR} + q_\text{SAS} AF_\text{SAS} + q_\text{IAM} AF_\text{IAM} + \epsilon ~,$$` </span> where all `\(q\)` are non-negative and sum to 1. -- <br> More general formulation: <span class="small2"> `$$\min_{\forall l,~q_l \ge 0 \\ \sum_l {q_l}=1} ~~ \sum_{j=1}^M \left( AF_j - \sum_{l=1}^L q_l AF_j^{\text{ref}~(l)} \right)^2$$` </span> <!-- = \min_{q \ge 0 \\ \mathbb{1}^Tq=1} ~~ ||AF - AF^{\text{ref}} \cdot q||_2^2 --> <br> `\(\Rightarrow\)` Quadratic programming with linear constraints --- <img src="figures/paper9-2.png" width="100%" style="display: block; margin: auto;" /> -- <br> - Curate the UK Biobank data to define 18 worldwide reference groups -- - Use PCA to maximize power to distinguish between these populations: <span class="small2"> `$$\min_{\forall l,~q_l \ge 0 \\ \sum_l {q_l}=1} ~~ \sum_{j=1}^M \left( AF_j - \sum_{l=1}^L q_l AF_j^{\text{ref}~(l)} \right)^2$$` </span> is replaced by (with AFs projected onto the PC space using the PC loadings) <span class="small2"> `$$\min_{\forall l,~q_l \ge 0 \\ \sum_l {q_l}=1} ~~ \sum_{k=1}^K \left( PC_k - \sum_{l=1}^L q_l PC_k^{\text{ref}~(l)} \right)^2$$` </span> --- With my method (using the PC loadings and reference AFs I provide): <img src="figures/ancestry-bigsnpr.png" width="95%" style="display: block; margin: auto;" /> -- With Summix (using the reference AFs I provide): <img src="figures/ancestry-summix.png" width="95%" style="display: block; margin: auto;" /> --- ### Admixture coefficients for
individual-level data <br> **My method also works for individual-level data!** (by simply using genotypes, divided by 2, in place of allele frequencies) This is similar to ADMIXTURE's projection analysis, but should have more power. -- <br> Application to iPSYCH (genetic study in Denmark): Out of 134K individuals, it identifies many non-European individuals: - Middle East: 2600 - East Africa: 450 - North Africa: 330 - South Asia: 840 - East Asia: 280 --- class: inverse, center, middle # —Current work— # Genetic Ancestry Deconvolution ## (without reference populations) <br> ## All individuals are genetically admixed<br>from L reference populations (unknown) --- ### Admixture model and ADMIXTURE method `$$G \approx Q \cdot 2F$$` - `\(Q\)` are the admixture proportions (for each sample `\(i\)` and reference `\(l\)`) - `\(F\)` are the allele frequencies (for each reference `\(l\)` and variant `\(j\)`) -- *** ADMIXTURE uses Maximum Likelihood Estimation of <span class="small3"> `$$L(Q, F) = \sum_i \sum_j \left\{ G_{i,j} \log\left[\sum_l Q_{i,l} F_{l,j}\right] + (2 - G_{i,j}) \log\left[1 - \sum_l Q_{i,l} F_{l,j}\right] \right\}$$` </span> with constraints: `\(~0 \le F_{l,j} \le 1~\)` and `\(~Q_{i,l} \ge 0~\)` and `\(~\sum_l Q_{i,l} = 1\)` -- *** For simplicity, ADMIXTURE iteratively estimates - each `\(Q_{i,.}\)` independently, with `\(F\)` fixed - each `\(F_{.,j}\)` independently, with `\(Q\)` fixed --- ### My proposed deconvolution method `$$G \cdot V \approx Q \cdot 2F \cdot V$$` where `\(V\)` are the PC loadings of `\(G\)` `$$\Rightarrow~ PC \approx Q \cdot PC^\text{ref}$$` -- <img src="figures/PC_ukbb_with3.png" width="85%" style="display: block; margin: auto;" /> --- #### Estimating admixture coefficients `\(Q_{i,.}\)` with `\(PC^\text{ref}\)` fixed <span class="small3"> `$$\min_{\forall l,~Q_{i,l} \ge 0 \\ \sum_l Q_{i,l}=1} ~~ \sum_{k=1}^K \left( PC_{i,k} - \sum_{l=1}^L Q_{i,l} PC_{l,k}^{\text{ref}} \right)^2$$` </span> -- <br> #### 
Estimating reference positions `\(PC_{l,.}^{\text{ref}}\)` with `\(Q\)` fixed <span class="small3"> `$$PC_{l,.}^{\text{ref}} = \dfrac{\sum_i {Q_{i,l}}^m \cdot PC_{i,.}}{\sum_i {Q_{i,l}}^m}$$` </span> <br> - this simple weighted-mean update is used in e.g. fuzzy K-means - this is also related to archetypal analysis: `\(PC^{\text{ref}} = W^T \cdot PC\)` (references are weighted combinations of existing samples) `\(\Rightarrow\)` Reference allele frequencies: `\(2F = W^T \cdot G\)` --- ### Complete deconvolution algorithm <br> Iterate between - estimating admixture coefficients `\(Q_{i,.}\)`, with `\(PC^{\text{ref}}\)` fixed - estimating reference positions `\(PC_{l,.}^{\text{ref}}\)`, with `\(Q\)` fixed -- <br> But a starting point is needed. A naive approach would pick L initial `\(PC_{l,.}^{\text{ref}}\)` at random - non-deterministic - starting point far from the optimal solution (slow convergence) - easy to get trapped in a local optimum Instead, I use an **iterative procedure with warm starts** to make the algorithm **deterministic** and much **faster to converge** --- ### Start with PC1 and 2 refs, then add 3rd ref when considering 2 PCs <img src="figures/PC_ukbb_add3.png" width="90%" style="display: block; margin: auto;" /> --- ### Add 4th reference when considering 3 PCs <br> <img src="figures/PC_ukbb_add4.png" width="100%" style="display: block; margin: auto;" /> --- ### Add 5th reference when considering 4 PCs <br> <img src="figures/PC_ukbb_add5.png" width="95%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Comparing to ADMIXTURE # in simulated data --- ### A complex admixture simulation (using R pkgs {bnpsd} and {ape}) <img src="figures/simu-admixture.png" width="90%" style="display: block; margin: auto;" /> --- ### PCA of the simulated genetic data (500 x 10,000) <img src="figures/simu-admixture-PCA.png" width="95%" style="display: block; margin: auto;" /> --- ### Reference PC positions (`\(PC^{\text{ref}}\)` vs `\(2F \cdot V\)`)
are very similar <div class="figure" style="text-align: center"> <img src="figures/simu-admixture-ref.png" alt="A: with my method &#8212; B: ADMIXTURE's ref AFs projected" width="95%" /> <p class="caption">A: with my method — B: ADMIXTURE's ref AFs projected</p> </div> --- ### Admixture coefficients `\(Q\)` are very similar <img src="figures/simu-admixture-allres.png" width="100%" style="display: block; margin: auto;" /> --- ### Reference allele frequencies `\(F\)` are close to the simulated ones <br> <img src="figures/simu-admixture-AFref.png" width="100%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Comparing to ADMIXTURE # in the 1000 Genomes data --- ### Admixture coefficients `\(Q\)` using L=5 reference populations <div class="figure" style="text-align: center"> <img src="figures/1000G-Q-Q2-5.png" alt="A: with my method; B: with ADMIXTURE" width="100%" /> <p class="caption">A: with my method; B: with ADMIXTURE</p> </div> .footnote2[**ACB**: African Caribbean in Barbados; **ASW**: African Ancestry in Southwest US; **ESN**: Esan in Nigeria; **GWD**: Gambian in Western Division, The Gambia; **LWK**: Luhya in Webuye, Kenya; **MSL**: Mende in Sierra Leone; **YRI**: Yoruba in Ibadan, Nigeria; **CLM**: Colombian in Medellin, Colombia; **MXL**: Mexican Ancestry in Los Angeles, California; **PEL**: Peruvian in Lima, Peru; **PUR**: Puerto Rican in Puerto Rico; **CDX**: Chinese Dai in Xishuangbanna, China; **CHB**: Han Chinese in Beijing, China; **CHS**: Southern Han Chinese, China; **JPT**: Japanese in Tokyo, Japan; **KHV**: Kinh in Ho Chi Minh City, Vietnam; **CEU**: Utah residents with Northern and Western European ancestry; **FIN**: Finnish in Finland; **GBR**: British in England and Scotland; **IBS**: Iberian populations in Spain; **TSI**: Toscani in Italy; **BEB**: Bengali in Bangladesh; **GIH**: Gujarati Indian in Houston, TX; **ITU**: Indian Telugu in the UK; **PJL**: Punjabi in Lahore, Pakistan; **STU**: Sri Lankan Tamil in the UK]
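---

### Sketch: projecting allele frequencies with the PC loadings

A minimal Python/NumPy sketch of the projection `\(2F \cdot V\)`, as used for ADMIXTURE's reference AFs here and for GWAS AFs in the summary-statistics setting. It assumes the PCA was computed on genotypes centered and scaled per variant (scaling by the standard deviation is an assumption), and that allele frequencies are centered and scaled the same way before multiplying by `\(V\)`. The data are simulated and the function names are hypothetical, not the bigsnpr API. Since each row of `\(Q\)` sums to 1, the centering cancels out: projecting `\(Q \cdot 2F\)` gives exactly `\(Q \cdot PC^\text{ref}\)`, which is why `\(PC \approx Q \cdot PC^\text{ref}\)`.

```python
import numpy as np

def fit_pca(G, k):
    """PCA of a genotype matrix G (n samples x m variants, coded 0/1/2)."""
    center = G.mean(axis=0)             # 2 x (estimated allele frequency) of each variant
    scale = G.std(axis=0)               # assumes no monomorphic variant
    _, _, Vt = np.linalg.svd((G - center) / scale, full_matrices=False)
    return center, scale, Vt[:k].T      # V: m x k matrix of PC loadings

def project(X, center, scale, V):
    """Project rows of X (genotypes, or 2 x allele frequencies) with loadings V."""
    return ((X - center) / scale) @ V

# Simulated toy data (hypothetical sizes): G ~ Binomial(2, Q F)
rng = np.random.default_rng(1)
n, m, L = 300, 2000, 3
F = rng.uniform(0.1, 0.9, size=(L, m))      # reference allele frequencies
Q = rng.dirichlet(np.ones(L), size=n)       # admixture proportions, rows sum to 1
G = rng.binomial(2, Q @ F).astype(float)

center, scale, V = fit_pca(G, k=L - 1)      # K = L - 1 PCs
PC = project(G, center, scale, V)           # PC scores of the samples
PC_ref = project(2 * F, center, scale, V)   # projected references, i.e. 2F . V

# Centering cancels because rows of Q sum to 1: proj(Q 2F) == Q proj(2F)
print(np.allclose(project(Q @ (2 * F), center, scale, V), Q @ PC_ref))  # True
```

The residual `PC - Q @ PC_ref` is then only the projected binomial sampling noise; using `K = L - 1` PCs mirrors the `\(L=K+1\)` setting of these comparisons.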
--- ### Projected reference allele frequencies (`\(2F \cdot V\)`) from ADMIXTURE <img src="figures/1000G-admixture-5refs.png" width="80%" style="display: block; margin: auto;" /> #### `\(\Rightarrow\)` highly similar to what I get with my method #### `\(\Rightarrow\)` supports using only `\(K\)` PCs for `\(L=K+1\)` reference populations --- ### Admixture coefficients `\(Q\)` using L=9 reference populations <div class="figure" style="text-align: center"> <img src="figures/1000G-Q-Q2-9.png" alt="A: with my method; B: with ADMIXTURE" width="100%" /> <p class="caption">A: with my method; B: with ADMIXTURE</p> </div> .footnote2[**ACB**: African Caribbean in Barbados; **ASW**: African Ancestry in Southwest US; **ESN**: Esan in Nigeria; **GWD**: Gambian in Western Division, The Gambia; **LWK**: Luhya in Webuye, Kenya; **MSL**: Mende in Sierra Leone; **YRI**: Yoruba in Ibadan, Nigeria; **CLM**: Colombian in Medellin, Colombia; **MXL**: Mexican Ancestry in Los Angeles, California; **PEL**: Peruvian in Lima, Peru; **PUR**: Puerto Rican in Puerto Rico; **CDX**: Chinese Dai in Xishuangbanna, China; **CHB**: Han Chinese in Beijing, China; **CHS**: Southern Han Chinese, China; **JPT**: Japanese in Tokyo, Japan; **KHV**: Kinh in Ho Chi Minh City, Vietnam; **CEU**: Utah residents with Northern and Western European ancestry; **FIN**: Finnish in Finland; **GBR**: British in England and Scotland; **IBS**: Iberian populations in Spain; **TSI**: Toscani in Italy; **BEB**: Bengali in Bangladesh; **GIH**: Gujarati Indian in Houston, TX; **ITU**: Indian Telugu in the UK; **PJL**: Punjabi in Lahore, Pakistan; **STU**: Sri Lankan Tamil in the UK] --- ### Projected reference allele frequencies (`\(2F \cdot V\)`) from ADMIXTURE <img src="figures/1000G-admixture-9refs.png" width="96%" style="display: block; margin: auto;" /> --- ### Admixture coefficients `\(Q\)` using L=18 reference populations <div class="figure" style="text-align: center"> <img src="figures/1000G-Q-Q2-18.png" alt="A: with my
method; B: with ADMIXTURE" width="100%" /> <p class="caption">A: with my method; B: with ADMIXTURE</p> </div> .footnote2[**ACB**: African Caribbean in Barbados; **ASW**: African Ancestry in Southwest US; **ESN**: Esan in Nigeria; **GWD**: Gambian in Western Division, The Gambia; **LWK**: Luhya in Webuye, Kenya; **MSL**: Mende in Sierra Leone; **YRI**: Yoruba in Ibadan, Nigeria; **CLM**: Colombian in Medellin, Colombia; **MXL**: Mexican Ancestry in Los Angeles, California; **PEL**: Peruvian in Lima, Peru; **PUR**: Puerto Rican in Puerto Rico; **CDX**: Chinese Dai in Xishuangbanna, China; **CHB**: Han Chinese in Beijing, China; **CHS**: Southern Han Chinese, China; **JPT**: Japanese in Tokyo, Japan; **KHV**: Kinh in Ho Chi Minh City, Vietnam; **CEU**: Utah residents with Northern and Western European ancestry; **FIN**: Finnish in Finland; **GBR**: British in England and Scotland; **IBS**: Iberian populations in Spain; **TSI**: Toscani in Italy; **BEB**: Bengali in Bangladesh; **GIH**: Gujarati Indian in Houston, TX; **ITU**: Indian Telugu in the UK; **PJL**: Punjabi in Lahore, Pakistan; **STU**: Sri Lankan Tamil in the UK] --- ### Runtimes <br> - for ADMIXTURE (with 15 cores), it takes - 1 hour for L=5 - 4 hours for L=9 - 13 hours for L=18 <br> - for my method (with 1 core), it takes - 2 minutes to get all solutions for L=3 to L=18 --- class: inverse, center, middle # In the UK Biobank data --- ### After convergence with 17 references and 16 PCs from the UK Biobank <br> <img src="figures/PC_ukbb_with17.png" width="100%" style="display: block; margin: auto;" /> --- ### Country (of birth) counts with ancestry > 0.6 for each reference <ul class="small2"> <li>United Kingdom: 126045 – NA: 1352 – Germany: 915 – South Africa: 477 – Netherlands: 443 – USA: 400 – France: 300 – Australia: 226 – Denmark: 197 – Canada: 195 – ...</li> <li>United Kingdom: 22206 – NA: 86 – Germany: 16 – Ireland: 14</li> <li>United Kingdom: 32123 – Ireland: 289 – NA: 265 – New Zealand: 75 – Canada: 58
– India: 54 – Germany: 47 – South Africa: 43 – Australia: 40 – Kenya: 36 – Malaysia: 29 – ...</li> <li>United Kingdom: 10647 – Ireland: 9360 – NA: 290 – USA: 47 – Australia: 43 – ...</li> <li>United Kingdom: 1347 – NA: 30</li> <li>United Kingdom: 4080 – NA: 77</li> <li>NA: 752 – Poland: 599 – United Kingdom: 415 – Russia: 131 – Finland: 105 – Germany: 87 – Lithuania: 71 – Ukraine: 55 – Czech Republic: 53 – Latvia: 52 – Slovakia: 28 – ...</li> <li>India: 1852 – Kenya: 782 – Sri Lanka: 653 – NA: 547 – Pakistan: 410 – Mauritius: 273 – Bangladesh: 235 – Uganda: 231 – Tanzania: 175 – Caribbean: 114 – The Guianas: 83 – ...</li> <li>Caribbean: 2110 – NA: 2100 – Nigeria: 1017 – Ghana: 866 – Barbados: 255 – Sierra Leone: 202 – The Guianas: 151 – Gambia: 39 – Ivory Coast: 32 – ...</li> <li>Italy: 389 – NA: 353 – Cyprus: 170 – United Kingdom: 168 – Egypt: 147 – Malta: 116 – Greece: 99 – Algeria: 68 – Lebanon: 50 – Morocco: 46 – Libya: 40 – Palestine: 30 – ...</li> <li>United Kingdom: 1844 – NA: 830 – USA: 169 – South Africa: 95 – Israel: 41 – ...</li> <li>Iran: 476 – Iraq: 140 – NA: 59 – Turkey: 54 – India: 36 – Afghanistan: 13 – Pakistan: 10</li> <li>China: 287 – Japan: 241 – Malaysia: 185 – Hong Kong: 161 – Nepal: 123 – NA: 63 – Singapore: 56 – South Korea: 26 – Mauritius: 25 – Taiwan: 25 – Indonesia: 15 – ...</li> <li>Zimbabwe: 268 – Congo: 133 – Uganda: 115 – Kenya: 73 – South Africa: 59 – Zambia: 56 – NA: 41 – Tanzania: 26 – Angola: 23 – Burundi: 17 – Rwanda: 16 – Seychelles: 14 – ...</li> <li>Philippines: 315 – Malaysia: 20 – NA: 17 – Indonesia: 15 – Thailand: 13</li> <li>Peru: 33 – Ecuador: 25 – Mexico: 20 – Colombia: 17 – Bolivia: 14 – Chile: 11</li> <li>Somalia: 81 – Ethiopia: 58 – Sudan: 51 – Eritrea: 45 – NA: 20</li> </ul> --- class: inverse, center, middle # Capturing more population structure # (with fewer individuals) --- <img src="figures/paper4-2.png" width="95%" style="display: block; margin: auto auto auto 0;" /> -- *** In the UK Biobank data, - only the
first 16 PCs actually capture population structure (PCs 19–40 capture LD only; never use them!) -- When subsampling British and Irish individuals (self-reported ancestry) - one can obtain 40 PCs that capture some population structure - using the best practices for PCA of genetic data -- <br> In my current work: - I've also looked at using `\(Q\)` to do the subsampling - I've run my deconvolution algorithm on K=41 PCs to get L=42 references --- ### More PCs capturing population structure when subsampling <img src="figures/PC_ukbb_new.png" width="100%" style="display: block; margin: auto;" /> --- ### Rerun the algorithm with new PCs (K=41, L=42) <style> .grid-container { display: grid; grid-template-columns: 1fr 1fr; margin: auto; font-size: 14px; } </style> <div class="grid-container"> <ul> <li>United Kingdom: 414629 – Ireland: 12416 – NA: 4731 – Germany: 1498 – South Africa: 970 – USA: 956 – Australia: 853 – New Zealand: 656 – Canada: 644 – ...</li> <li>NA: 709 – Poland: 592 – United Kingdom: 389 – Russia: 123 – Germany: 71 – Lithuania: 71 – Ukraine: 53 – Latvia: 52 – ...</li> <li>Italy: 34</li> <li>Spain: 30</li> <li>United Kingdom: 1838 – NA: 831 – USA: 170 – South Africa: 95 – Israel: 40 – Canada: 18 – Hungary: 18 – France: 12</li> <li>Finland: 125</li> <li>Nigeria: 975 – NA: 292 – Caribbean: 155 – Sierra Leone: 42 – Ghana: 13</li> <li>Sri Lanka: 635 – India: 493 – Mauritius: 190 – NA: 156 – Kenya: 90 – Caribbean: 71 – Malaysia: 67 – The Guianas: 52 – ...</li> <li>Malta: 114 – United Kingdom: 15 – NA: 12 – Egypt: 10</li> <li>Iran: 494 – Iraq: 247 – Turkey: 114 – NA: 58 – Syria: 11 – United Kingdom: 10</li> <li>Ghana: 817 – NA: 68 – Ivory Coast: 27</li> <li>India: 571 – NA: 207 – Kenya: 40 – Pakistan: 28 – Malaysia: 23 – Singapore: 13</li> <li>India: 28</li> <li>Yemen: 26 – Egypt: 18 – NA: 12</li> <li>Congo: 129 – Angola: 30 – Zambia: 30 – NA: 25 – Cameroon: 24</li> <li>India: 224 – Kenya: 179 – NA: 55 – Uganda: 28 – Pakistan: 21 – Tanzania: 21</li> </ul> <ul>
<li>Japan: 241 – South Korea: 26</li> <li>Thailand: 61 – Vietnam: 40 – Malaysia: 10</li> <li>Algeria: 69 – Morocco: 66 – Libya: 27 – NA: 10</li> <li>Kenya: 18 – India: 13</li> <li>Philippines: 310 – NA: 16</li> <li>Pakistan: 76 – NA: 20</li> <li>Kenya: 36 – India: 25</li> <li>India: 70 – Afghanistan: 25 – NA: 19</li> <li>India: 17 – NA: 11 – Malawi: 10</li> <li>Colombia: 115</li> <li>Sierra Leone: 38 – Gambia: 33</li> <li>India: 90 – NA: 32</li> <li>Tanzania: 24</li> <li>Pakistan: 146 – NA: 42 – India: 22 – Kenya: 18</li> <li>India: 135 – Kenya: 120 – Uganda: 80 – NA: 37 – Tanzania: 24</li> <li>Nepal: 125 – NA: 14</li> <li>Peru: 31 – Ecuador: 20 – Bolivia: 14 – Mexico: 13</li> <li>Uganda: 69 – Tanzania: 43 – Kenya: 40 – India: 24</li> <li>Kenya: 42 – India: 39 – NA: 16 – Tanzania: 14</li> <li>Uganda: 101 – Kenya: 28 – Tanzania: 11</li> <li>Kenya: 114</li> <li>India: 43 – Kenya: 38 – NA: 19</li> <li>South Africa: 48 – Zimbabwe: 25</li> <li>Sudan: 17</li> <li></li> <li>Somalia: 78</li> </ul> </div> --- ### Conclusion - A very efficient admixture deconvolution algorithm -- - Also very powerful; it can identify many reference groups (subsampling before PCA is beneficial to capture more structure) - I expected more power than ADMIXTURE, e.g. for the 1000 Genomes data -- - One can (should) check the results visually (and use L=K+1 references with K PCs) -- - The algorithm is not specific to genetic data (merely a deconvolution algorithm based on PCA) `\(\Rightarrow\)` may be used for cell type deconvolution of methylation data -- - I will provide a new set of reference populations for people to use directly -- <br> .center[ ### Thank you for your attention Presentation available at [bit.ly/privefl220525](https://bit.ly/privefl220525) ]
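---

### Backup: a minimal sketch of the alternating algorithm

A simplified Python/NumPy sketch of the iteration described earlier: estimate `\(Q\)` with `\(PC^\text{ref}\)` fixed, then update `\(PC^\text{ref}\)` as the `\(Q^m\)`-weighted mean of the samples with `\(Q\)` fixed. It is not the actual implementation: the Q-step uses projected gradient descent as a stand-in for a quadratic programming solver, the exponent `m = 2` is an arbitrary illustrative choice, the starting point is simply given (the warm-start procedure that adds one reference per PC is not reproduced), and all function names are hypothetical.

```python
import numpy as np

def project_simplex(Y):
    """Euclidean projection of each row of Y onto the probability simplex
    (standard sort-based algorithm)."""
    n, L = Y.shape
    U = -np.sort(-Y, axis=1)                     # each row sorted in decreasing order
    css = np.cumsum(U, axis=1) - 1.0
    rho = np.sum(U - css / np.arange(1, L + 1) > 0, axis=1)  # size of the support
    theta = css[np.arange(n), rho - 1] / rho     # shift that makes each row sum to 1
    return np.maximum(Y - theta[:, None], 0.0)

def solve_q(PC, PC_ref, n_iter=500):
    """Q-step: for each sample i, minimize ||PC[i] - Q[i] @ PC_ref||^2 subject to
    Q[i] >= 0 and sum(Q[i]) = 1, by projected gradient descent."""
    L = PC_ref.shape[0]
    step = 1.0 / (2.0 * np.linalg.eigvalsh(PC_ref @ PC_ref.T).max())  # 1 / Lipschitz constant
    Q = np.full((PC.shape[0], L), 1.0 / L)       # start from equal proportions
    for _ in range(n_iter):
        grad = -2.0 * (PC - Q @ PC_ref) @ PC_ref.T
        Q = project_simplex(Q - step * grad)
    return Q

def update_ref(PC, Q, m=2.0):
    """Ref-step: PC_ref = W^T PC with W proportional to Q**m (as in fuzzy K-means).
    Assumes every column of Q has at least one positive entry."""
    W = Q ** m
    W = W / W.sum(axis=0)                        # columns sum to 1: convex combinations of samples
    return W.T @ PC

def deconvolve(PC, PC_ref_init, m=2.0, n_iter=100, tol=1e-6):
    """Alternate the Q-step and the ref-step, starting from a given PC_ref."""
    PC_ref = PC_ref_init.copy()
    for _ in range(n_iter):
        new_ref = update_ref(PC, solve_q(PC, PC_ref), m)
        converged = np.max(np.abs(new_ref - PC_ref)) < tol
        PC_ref = new_ref
        if converged:
            break
    return solve_q(PC, PC_ref), PC_ref

# Toy usage on simulated 2-D PC scores with 3 references (K = 2, L = K + 1)
rng = np.random.default_rng(42)
R_true = np.array([[-3.0, -2.0], [4.0, -1.0], [0.0, 5.0]])
Q_true = rng.dirichlet(np.ones(3), size=200)
PC = Q_true @ R_true + 0.05 * rng.normal(size=(200, 2))
Q_hat, PC_ref = deconvolve(PC, R_true + 0.3, n_iter=20)
```

With `PC_ref` fixed, `solve_q()` alone corresponds to the ancestry estimation with known references (individual-level PC scores, or one row of projected GWAS allele frequencies). In this simplified ref-step, references remain weighted means of the samples, so they stay inside the data cloud.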