22_May_2025

<br>

# Improved ancestry and admixture detection<br>using principal component analysis<br>of genetic data

<br>

## Florian Privé

### Aarhus University, Denmark

#### <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 576 512" width="1em" height="1em"><path d="M407.8 294.7c-3.3-.4-6.7-.8-10-1.3c3.4 .4 6.7 .9 10 1.3zM288 227.1C261.9 176.4 190.9 81.9 124.9 35.3C61.6-9.4 37.5-1.7 21.6 5.5C3.3 13.8 0 41.9 0 58.4S9.1 194 15 213.9c19.5 65.7 89.1 87.9 153.2 80.7c3.3-.5 6.6-.9 10-1.4c-3.3 .5-6.6 1-10 1.4C74.3 308.6-9.1 342.8 100.3 464.5C220.6 589.1 265.1 437.8 288 361.1c22.9 76.7 49.2 222.5 185.6 103.4c102.4-103.4 28.1-156-65.8-169.9c-3.3-.4-6.7-.8-10-1.3c3.4 .4 6.7 .9 10 1.3c64.1 7.1 133.6-15.1 153.2-80.7C566.9 194 576 75 576 58.4s-3.3-44.7-21.6-52.9c-15.8-7.1-40-14.9-103.2 29.8C385.1 81.9 314.1 176.4 288 227.1z" fill="white"/></svg> <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg">  <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> privefl

---

# Principal Component Analysis (PCA)<br>of genetic data<br>captures population structure

---

### PCA of genetic data captures continental population structure

<br>

<div class="figure" style="text-align: center">
<img src="figures/PCA-UKBB-Bycroft.png" alt="in the UK Biobank data" width="100%" />
<p class="caption">in the UK Biobank data</p>
</div>

---

### PCA also captures sub-continental population structure

<div class="figure" style="text-align: center">
<img src="figures/PCA-POPRES-EUR.png" alt="in the POPRES data (European subset)" width="75%" />
<p class="caption">in the POPRES data (European subset)</p>
</div>

---

### Distance in PCA measures genetic distance

<div class="figure" style="text-align: center">
<img src="figures/compare-Euclidean-to-Fst.png" alt="in the 1000 Genomes data" width="100%" />
<p class="caption">in the 1000 Genomes data</p>
</div>

---

# &mdash;Previous work&mdash;

# Genetic Ancestry Deconvolution

## (with reference populations)

<br>

## All individuals are genetically admixed<br>from L reference populations (known)

---

### My motivation

<br>

- Get a sense of the **ancestry composition of a GWAS dataset**    
(including when only GWAS summary statistics are available to me)

- Use this information to e.g. design an LD reference for follow-up analyses (polygenic scores, fine-mapping, etc)

---

### Summix: ancestry estimation from GWAS allele frequencies (AF)

Source: 10.1016/j.ajhg.2021.05.016 (not by me)

<br>

Estimate ancestry proportions `$q$` such that

<span class="small2"> `$$AF = q_\text{AFR} AF_\text{AFR} + q_\text{EAS} AF_\text{EAS} + q_\text{EUR} AF_\text{EUR} + q_\text{SAS} AF_\text{SAS} + q_\text{IAM} AF_\text{IAM} + \epsilon ~,$$` </span>

where all `$q$` are positive and sum to 1.

<br>

More general formulation:

<span class="small2"> `$$\min_{\forall l,~q_l \ge 0 \\ \sum_l {q_l}=1} ~~ \sum_{j=1}^M \left( AF_j - \sum_{l=1}^L q_l AF_j^{\text{ref}~(l)} \right)^2$$` </span>

<br>

`$\Rightarrow$` Quadratic programming with linear constraints

---

<br>

- Curate the UK Biobank to define 18 worldwide reference groups

- Use PCA to maximize power to distinguish between these populations:

<span class="small2"> `$$\min_{\forall l,~q_l \ge 0 \\ \sum_l {q_l}=1} ~~ \sum_{j=1}^M \left( AF_j - \sum_{l=1}^L q_l AF_j^{\text{ref}~(l)} \right)^2$$` </span>
    
    is replaced by (AFs are projected to the PC space using PC loadings)
    
    <span class="small2"> `$$\min_{\forall l,~q_l \ge 0 \\ \sum_l {q_l}=1} ~~ \sum_{k=1}^K \left( PC_k - \sum_{l=1}^L q_l PC_k^{\text{ref}~(l)} \right)^2$$` </span>
    
---

With my method (using the PC loadings and reference AFs I provide):

With Summix (using the reference AFs I provide):

---

### Admixture coefficients for individual-level data

<br>

**My developed method also works for individual-level data!**

(by simply using genotypes, divided by 2, in place of allele frequencies)

This is similar to the projection analysis from ADMIXTURE,    
but should have more power..

<br>

Application to iPSYCH (genetic study in Denmark):

Out of 134K individuals, can identify many non-European individuals:

- Middle East: 2600
- East Africa: 450
- North Africa: 330
- South Asia: 840
- East Asia: 280

---

# &mdash;Current work&mdash;

# Genetic Ancestry Deconvolution

## (without reference populations)

<br>

## All individuals are genetically admixed<br>from L reference populations (unknown)

---

### Admixture model and ADMIXTURE method

`$$G \approx Q \cdot 2F$$`

- `$Q$` are the admixture proportions (for each sample `$i$` and reference `$l$`)

- `$F$` are the allele frequencies (for each each reference `$l$` and variant `$j$`)

***

ADMIXTURE uses Maximum Likelihood Estimation of

<span class="small3"> `$$L(Q, F) = \sum_i \sum_j \left\{ G_{i,j} \log\left[\sum_l Q_{i,l} F_{l,j}\right] + (2 - G_{i,j}) \log\left[1 - \sum_l Q_{i,l} F_{l,j}\right] \right\}$$` </span>

with constraints: `$~0 \le F_{l,j} \le 1~$` and `$~Q_{i,l} \ge 0~$` and `$~\sum_l Q_{i,l} = 1$`

***

For simplicity, ADMIXTURE iteratively estimates

- each `$Q_{i,.}$` independently, with `$F$` fixed

- each `$F_{.,j}$` independently, with `$Q$` fixed

---

### My proposed deconvolution method

`$$G \cdot V \approx Q \cdot 2F \cdot V$$`
where `$V$` are the PC loadings of `$G$`

`$$\Rightarrow~ PC \approx Q \cdot PC^\text{ref}$$`
--

---

#### Estimating admixture coefficients `$Q_{i,.}$` with `$PC^\text{ref}$` fixed

<span class="small3"> `$$\min_{\forall l,~Q_{i,l} \ge 0 \\ \sum_l Q_{i,l}=1} ~~ \sum_{k=1}^K \left( PC_{i,k} - \sum_{l=1}^L Q_{i,l} PC_{l,k}^{\text{ref}} \right)^2$$` </span>

<br>

#### Estimating admixture coefficients `$PC_{l,.}^{\text{ref}}$` with `$Q$` fixed

<span class="small3"> `$$PC_{l,.}^{\text{ref}} = \dfrac{\sum_i {Q_{i,l}}^m \cdot PC_{i,.}}{\sum_i {Q_{i,l}}^m}$$` </span>

<br>

- this simple formula is used in e.g. fuzzy K-means

- this is also related to archetypal analysis: `$PC^{\text{ref}} = W^T \cdot PC$`    
(references are weighted combinations of existing samples)

`$\Rightarrow$` Reference allele frequencies: `$2F = W^T \cdot G$`
  
---

### Complete deconvolution algorithm

<br>

Iterate between

- estimating admixture coefficients `$Q_{i,.}$`, with `$PC^{\text{ref}}$` fixed

- estimating reference positions `$PC_{l,.}^{\text{ref}}$`, with `$Q$` fixed

<br>

But a starting point is needed..

A naive approach would pick L initial `$PC_{l,.}^{\text{ref}}$` at random
- non-deterministic

- starting point far from the optimal solution (slow convergence)

- easy to get trapped in a local optimum

Instead, I use an **iterative procedure with warm starts**    
to make the algorithm **deterministic** and much **faster to converge**

---

### Start with PC1 and 2 refs, then add 3rd ref when considering 2 PCs

---

### Add 4th reference when considering 3 PCs

<br>

---

### Add 5th reference when considering 4 PCs

<br>

---

# Comparing to ADMIXTURE
# in simulated data

---

### A complex admixture simulation (using R pkgs {bnpsd} and {ape})

---

### PCA of the simulated genetic data (500 x 10,000)

---

### Reference PC positions (`$PC^{\text{ref}}$` vs `$2F \cdot V$`) are very similar

<div class="figure" style="text-align: center">
<img src="figures/simu-admixture-ref.png" alt="A: with my method &amp;#8212; B: ADMIXTURE's ref AFs projected" width="95%" />
<p class="caption">A: with my method &#8212; B: ADMIXTURE's ref AFs projected</p>
</div>

---

### Admixture coefficients `$Q$` are very similar

---

### Reference allele frequencies `$F$` are close to the simulated ones

<br>

---

# Comparing to ADMIXTURE
# in the 1000 Genomes data

---

### Admixture coefficients `$Q$` using L=5 reference populations

<div class="figure" style="text-align: center">
<img src="figures/1000G-Q-Q2-5.png" alt="A: with my method; B: with ADMIXTURE" width="100%" />
<p class="caption">A: with my method; B: with ADMIXTURE</p>
</div>

.footnote2[**ACB**: African Caribbean in Barbados; **ASW**: African Ancestry in Southwest US; **ESN**: Esan in Nigeria; **GWD**: Gambian in Western Division, The Gambia; **LWK**: Luhya in Webuye, Kenya; **MSL**: Mende in Sierra Leone; **YRI**: Yoruba in Ibadan, Nigeria; **CLM**: Colombian in Medellin, Colombia; **MXL**: Mexican Ancestry in Los Angeles, California; **PEL**: Peruvian in Lima, Peru; **PUR**: Puerto Rican in Puerto Rico; **CDX**: Chinese Dai in Xishuangbanna, China; **CHB**: Han Chinese in Bejing, China; **CHS**: Southern Han Chinese, China; **JPT**: Japanese in Tokyo, Japan; **KHV**: Kinh in Ho Chi Minh City, Vietnam; **CEU**: Utah residents with Northern and Western European ancestry; **FIN**: Finnish in Finland; **GBR**: British in England and Scotland; **IBS**: Iberian populations in Spain; **TSI**: Toscani in Italy; **BEB**: Bengali in Bangladesh; **GIH**: Gujarati Indian in Houston,TX; **ITU**: Indian Telugu in the UK; **PJL**: Punjabi in Lahore,Pakistan; **STU**: Sri Lankan Tamil in the UK]

---

### Projected reference allele frequencies (`$2F \cdot V$`) from ADMIXTURE

#### `$\Rightarrow$` highly similar to what I get with my method

#### `$\Rightarrow$` supports using `$K$` PCs only for `$L=K+1$` reference populations

---

### Admixture coefficients `$Q$` using L=9 reference populations

<div class="figure" style="text-align: center">
<img src="figures/1000G-Q-Q2-9.png" alt="A: with my method; B: with ADMIXTURE" width="100%" />
<p class="caption">A: with my method; B: with ADMIXTURE</p>
</div>

---

### Projected reference allele frequencies (`$2F \cdot V$`) from ADMIXTURE

---

### Admixture coefficients `$Q$` using L=18 reference populations

<div class="figure" style="text-align: center">
<img src="figures/1000G-Q-Q2-18.png" alt="A: with my method; B: with ADMIXTURE" width="100%" />
<p class="caption">A: with my method; B: with ADMIXTURE</p>
</div>

---

### Runtimes

<br>

- for ADMIXTURE (with 15 cores), it takes

- 1 hour for L=5
    
    - 4 hours for L=9
    
    - 13 hours for L=18

<br>

- for my method (with 1 core), it takes

- 2 minutes to get all solutions for L=3 to L=18

---

# In the UK Biobank data

---

### After convergence with 17 references and 16 PCs from the UK Biobank

<br>

---

### Country (of birth) counts with ancestry > 0.6 for each reference

<ul class="small2">
    <li>United Kingdom: 126045 &#8211; NA: 1352 &#8211; Germany: 915 &#8211; South Africa: 477 &#8211; Netherlands: 443 &#8211; USA: 400 &#8211; France: 300 &#8211; Australia: 226 &#8211; Denmark: 197 &#8211; Canada: 195 &#8211; ...</li>
    <li>United Kingdom: 22206 &#8211; NA: 86 &#8211; Germany: 16 &#8211; Ireland: 14</li>
    <li>United Kingdom: 32123 &#8211; Ireland: 289 &#8211; NA: 265 &#8211; New Zealand: 75 &#8211; Canada: 58 &#8211; India: 54 &#8211; Germany: 47 &#8211; South Africa: 43 &#8211; Australia: 40 &#8211; Kenya: 36 &#8211; Malaysia: 29 &#8211; ...</li>
    <li>United Kingdom: 10647 &#8211; Ireland: 9360 &#8211; NA: 290 &#8211; USA: 47 &#8211; Australia: 43 &#8211; ...</li>
    <li>United Kingdom: 1347 &#8211; NA: 30</li>
    <li>United Kingdom: 4080 &#8211; NA: 77</li>
    <li>NA: 752 &#8211; Poland: 599 &#8211; United Kingdom: 415 &#8211; Russia: 131 &#8211; Finland: 105 &#8211; Germany: 87 &#8211; Lithuania: 71 &#8211; Ukraine: 55 &#8211; Czech Republic: 53 &#8211; Latvia: 52 &#8211; Slovakia: 28 &#8211; ...</li>
    <li>India: 1852 &#8211; Kenya: 782 &#8211; Sri Lanka: 653 &#8211; NA: 547 &#8211; Pakistan: 410 &#8211; Mauritius: 273 &#8211; Bangladesh: 235 &#8211; Uganda: 231 &#8211; Tanzania: 175 &#8211; Caribbean: 114 &#8211; The Guianas: 83 &#8211; ...</li>
    <li>Caribbean: 2110 &#8211; NA: 2100 &#8211; Nigeria: 1017 &#8211; Ghana: 866 &#8211; Barbados: 255 &#8211; Sierra Leone: 202 &#8211; The Guianas: 151 &#8211; Gambia: 39 &#8211; Ivory Coast: 32 &#8211; ...</li>
    <li>Italy: 389 &#8211; NA: 353 &#8211; Cyprus: 170 &#8211; United Kingdom: 168 &#8211; Egypt: 147 &#8211; Malta: 116 &#8211; Greece: 99 &#8211; Algeria: 68 &#8211; Lebanon: 50 &#8211; Morocco: 46 &#8211; Libya: 40 &#8211; Palestine: 30 &#8211; ...</li>
    <li>United Kingdom: 1844 &#8211; NA: 830 &#8211; USA: 169 &#8211; South Africa: 95 &#8211; Israel: 41 &#8211; ...</li>
    <li>Iran: 476 &#8211; Iraq: 140 &#8211; NA: 59 &#8211; Turkey: 54 &#8211; India: 36 &#8211; Afghanistan: 13 &#8211; Pakistan: 10</li>
    <li>China: 287 &#8211; Japan: 241 &#8211; Malaysia: 185 &#8211; Hong Kong: 161 &#8211; Nepal: 123 &#8211; NA: 63 &#8211; Singapore: 56 &#8211; South Korea: 26 &#8211; Mauritius: 25 &#8211; Taiwan: 25 &#8211; Indonesia: 15 &#8211; ...</li>
    <li>Zimbabwe: 268 &#8211; Congo: 133 &#8211; Uganda: 115 &#8211; Kenya: 73 &#8211; South Africa: 59 &#8211; Zambia: 56 &#8211; NA: 41 &#8211; Tanzania: 26 &#8211; Angola: 23 &#8211; Burundi: 17 &#8211; Rwanda: 16 &#8211; Seychelles: 14 &#8211; ...</li>
    <li>Philippines: 315 &#8211; Malaysia: 20 &#8211; NA: 17 &#8211; Indonesia: 15 &#8211; Thailand: 13</li>
    <li>Peru: 33 &#8211; Ecuador: 25 &#8211; Mexico: 20 &#8211; Colombia: 17 &#8211; Bolivia: 14 &#8211; Chile: 11</li>
    <li>Somalia: 81 &#8211; Ethiopia: 58 &#8211; Sudan: 51 &#8211; Eritrea: 45 &#8211; NA: 20</li>
</ul>

---

# Capturing more population structure
# (with less individuals)

---

***

In the UK Biobank data,

- only the first 16 PCs actually capture population structure    
(PC 19&#8211;40 capture LD only; never use them!)

When subsampling British and Irish individuals (self-reported ancestry)

- can obtain 40 PCs that capture some population structure

- using the best practices for PCA of genetic data

<br>

In my current work:

- I've also looked at using `$Q$` to do the subsampling

- I've run my deconvolution algorithm on K=41 PCs to get L=42 references

---

### More PCs capturing population structure when subsampling

---

### Rerun the algorithm with new PCs (K=41, L=42)

<div class="grid-container">
<ul>
    <li>United Kingdom: 414629 &#8211; Ireland: 12416 &#8211; NA: 4731 &#8211; Germany: 1498 &#8211; South Africa: 970 &#8211; USA: 956 &#8211; Australia: 853 &#8211; New Zealand: 656 &#8211; Canada: 644 &#8211; ...</li>
    <li>NA: 709 &#8211; Poland: 592 &#8211; United Kingdom: 389 &#8211; Russia: 123 &#8211; Germany: 71 &#8211; Lithuania: 71 &#8211; Ukraine: 53 &#8211; Latvia: 52&#8211; ..</li>
    <li>Italy: 34</li>
    <li>Spain: 30</li>
    <li>United Kingdom: 1838 &#8211; NA: 831 &#8211; USA: 170 &#8211; South Africa: 95 &#8211; Israel: 40 &#8211; Canada: 18 &#8211; Hungary: 18 &#8211; France: 12</li>
    <li>Finland: 125</li>
    <li>Nigeria: 975 &#8211; NA: 292 &#8211; Caribbean: 155 &#8211; Sierra Leone: 42 &#8211; Ghana: 13</li>
    <li>Sri Lanka: 635 &#8211; India: 493 &#8211; Mauritius: 190 &#8211; NA: 156 &#8211; Kenya: 90 &#8211; Caribbean: 71 &#8211; Malaysia: 67 &#8211; The Guianas: 52 &#8211; ...</li>
    <li>Malta: 114 &#8211; United Kingdom: 15 &#8211; NA: 12 &#8211; Egypt: 10</li>
    <li>Iran: 494 &#8211; Iraq: 247 &#8211; Turkey: 114 &#8211; NA: 58 &#8211; Syria: 11 &#8211; United Kingdom: 10</li>
    <li>Ghana: 817 &#8211; NA: 68 &#8211; Ivory Coast: 27</li>
    <li>India: 571 &#8211; NA: 207 &#8211; Kenya: 40 &#8211; Pakistan: 28 &#8211; Malaysia: 23 &#8211; Singapore: 13</li>
    <li>India: 28</li>
    <li>Yemen: 26 &#8211; Egypt: 18 &#8211; NA: 12</li>
    <li>Congo: 129 &#8211; Angola: 30 &#8211; Zambia: 30 &#8211; NA: 25 &#8211; Cameroon: 24</li>
    <li>India: 224 &#8211; Kenya: 179 &#8211; NA: 55 &#8211; Uganda: 28 &#8211; Pakistan: 21 &#8211; Tanzania: 21</li>
</ul>

<ul>
    <li>Japan: 241 &#8211; South Korea: 26</li>
    <li>Thailand: 61 &#8211; Vietnam: 40 &#8211; Malaysia: 10</li>
    <li>Algeria: 69 &#8211; Morocco: 66 &#8211; Libya: 27 &#8211; NA: 10</li>
    <li>Kenya: 18 &#8211; India: 13</li>
    <li>Philippines: 310 &#8211; NA: 16</li>
    <li>Pakistan: 76 &#8211; NA: 20</li>
    <li>Kenya: 36 &#8211; India: 25</li>
    <li>India: 70 &#8211; Afghanistan: 25 &#8211; NA: 19</li>
    <li>India: 17 &#8211; NA: 11 &#8211; Malawi: 10</li>
    <li>Colombia: 115</li>
    <li>Sierra Leone: 38 &#8211; Gambia: 33</li>
    <li>India: 90 &#8211; NA: 32</li>
    <li>Tanzania: 24</li>
    <li>Pakistan: 146 &#8211; NA: 42 &#8211; India: 22 &#8211; Kenya: 18</li>
    <li>India: 135 &#8211; Kenya: 120 &#8211; Uganda: 80 &#8211; NA: 37 &#8211; Tanzania: 24</li>
    <li>Nepal: 125 &#8211; NA: 14</li>
    <li>Peru: 31 &#8211; Ecuador: 20 &#8211; Bolivia: 14 &#8211; Mexico: 13</li>
    <li>Uganda: 69 &#8211; Tanzania: 43 &#8211; Kenya: 40 &#8211; India: 24</li>
    <li>Kenya: 42 &#8211; India: 39 &#8211; NA: 16 &#8211; Tanzania: 14</li>
    <li>Uganda: 101 &#8211; Kenya: 28 &#8211; Tanzania: 11</li>
    <li>Kenya: 114</li>
    <li>India: 43 &#8211; Kenya: 38 &#8211; NA: 19</li>
    <li>South Africa: 48 &#8211; Zimbabwe: 25</li>
    <li>Sudan: 17</li>
    <li></li>
    <li>Somalia: 78</li>
</ul>
</div>

---

### Conclusion

- A very efficient admixture deconvolution algorithm

- Also very powerful; it can identify many reference groups    
(subsampling before PCA is beneficial to capture more structure)

- I expected more power than ADMIXTURE for e.g. 1000 Genomes data

- One can (should) check the results visually (also, L=K+1)

- The algorithm is not specific to genetic data    
(merely a deconvolution algorithm based on PCA)    
`$\Rightarrow$` may be used for cell type deconvolution of methylation data

- I will provide a new set of reference populations for people to use directly

<br>

Presentation available at [bit.ly/privefl220525](https://bit.ly/privefl220525)
]