Match alleles between summary statistics and SNP information. Variants are matched by ("chr", "a0", "a1") and ("pos" or "rsid"), accounting for possible strand flips and reversed reference alleles (opposite effects).
snp_match(
  sumstats,
  info_snp,
  strand_flip = TRUE,
  join_by_pos = TRUE,
  remove_dups = TRUE,
  match.min.prop = 0.2,
  return_flip_and_rev = FALSE
)
sumstats: A data frame with columns "chr", "pos", "a0", "a1" and "beta".
info_snp: A data frame with columns "chr", "pos", "a0" and "a1".
strand_flip: Whether to try to flip strand (default is TRUE). If so, ambiguous alleles A/T and C/G are removed.
join_by_pos: Whether to join by chromosome and position (default), or instead by rsid.
remove_dups: Whether to remove duplicates (same physical position). Default is TRUE.
match.min.prop: Minimum proportion of variants in the smallest data to be matched; otherwise the function stops with an error. Default is 20%.
return_flip_and_rev: Whether to return the internal boolean variables "_FLIP_" (whether the alleles must be flipped: A <--> T & C <--> G, because on the opposite strand) and "_REV_" (whether the alleles must be swapped: $a0 <--> $a1, in which case the corresponding $beta are multiplied by -1). Default is FALSE.
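The terms "flipped", "reversed" and "ambiguous" used above can be illustrated with a few lines of base R. This is only a sketch of the terminology, not the function's actual implementation; the helpers flip_strand() and is_ambiguous() are hypothetical names introduced here for illustration.

```r
# Hypothetical helpers (not part of the package), for illustration only.

# Strand flip: complement each allele, A <--> T and C <--> G.
flip_strand <- function(allele) {
  chartr("ACGT", "TGCA", allele)
}

# A pair is strand-ambiguous when its two alleles are complements of each
# other (A/T, T/A, C/G, G/C): a flip cannot be told apart from a reversal.
is_ambiguous <- function(a0, a1) {
  paste(a0, a1) %in% c("A T", "T A", "C G", "G C")
}

a0   <- c("T", "G", "C", "A", "T", "G")
a1   <- c("G", "A", "T", "G", "A", "A")
beta <- c(-1.868, 0.250, -0.671, 2.112, 0.239, 1.272)

flipped_a0 <- flip_strand(a0)       # "A" "C" "G" "T" "A" "C"
ambiguous  <- is_ambiguous(a0, a1)  # only the fifth pair (T/A) is TRUE

# Reversal: swap a0 and a1 for a variant and negate its effect size.
rev_idx <- 2
tmp <- a0[rev_idx]
a0[rev_idx] <- a1[rev_idx]
a1[rev_idx] <- tmp
beta[rev_idx] <- -beta[rev_idx]
```

After the reversal step, the second variant reads A/G with an effect of -0.250 instead of G/A with 0.250.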
A single data frame with matched variants. Values in column $beta are multiplied by -1 for variants with alleles reversed (i.e. swapped). The new variable "_NUM_ID_.ss" gives the corresponding row indices of the input sumstats (first argument of this function), and "_NUM_ID_" gives the row indices of the input info_snp (second argument).
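Because "_NUM_ID_.ss" holds row indices into the input sumstats, it can be used to see which input rows were kept or dropped. The sketch below does not call snp_match(); the data frame matched is a hand-built stand-in that copies the index columns shown in the strand_flip = FALSE example output, and the names kept and dropped are introduced here for illustration.

```r
sumstats <- data.frame(
  chr = 1,
  pos = c(86303, 86331, 162463, 752566, 755890, 758144),
  beta = c(-1.868, 0.250, -0.671, 2.112, 0.239, 1.272)
)

# Stand-in for a snp_match() result (assumption: only the columns needed here).
# check.names = FALSE keeps the column names starting with an underscore intact.
matched <- data.frame(
  pos = c(86303, 86331, 752566, 755890),
  "_NUM_ID_.ss" = c(1, 2, 4, 5),
  "_NUM_ID_" = c(1, 2, 4, 5),
  check.names = FALSE
)

# These names are not syntactic, so use [[ ]] (or backticks) to access them.
idx_ss <- matched[["_NUM_ID_.ss"]]

kept    <- sumstats[idx_ss, ]   # input rows that were matched
dropped <- sumstats[-idx_ss, ]  # input rows that were not matched
```

Here dropped contains the variants at positions 162463 and 758144, i.e. the two rows of sumstats that are absent from the matched output.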
sumstats <- data.frame(
  chr = 1,
  pos = c(86303, 86331, 162463, 752566, 755890, 758144),
  a0 = c("T", "G", "C", "A", "T", "G"),
  a1 = c("G", "A", "T", "G", "A", "A"),
  beta = c(-1.868, 0.250, -0.671, 2.112, 0.239, 1.272),
  p = c(0.860, 0.346, 0.900, 0.456, 0.776, 0.383)
)
info_snp <- data.frame(
  id = c("rs2949417", "rs115209712", "rs143399298", "rs3094315", "rs3115858"),
  chr = 1,
  pos = c(86303, 86331, 162463, 752566, 755890),
  a0 = c("T", "A", "G", "A", "T"),
  a1 = c("G", "G", "A", "G", "A")
)
snp_match(sumstats, info_snp)
#> 6 variants to be matched.
#> 1 ambiguous SNPs have been removed.
#> 4 variants have been matched; 1 were flipped and 1 were reversed.
#>   chr    pos a0 a1   beta     p _NUM_ID_.ss          id _NUM_ID_
#> 1   1  86303  T  G -1.868 0.860           1   rs2949417        1
#> 2   1  86331  A  G -0.250 0.346           2 rs115209712        2
#> 3   1 162463  G  A -0.671 0.900           3 rs143399298        3
#> 4   1 752566  A  G  2.112 0.456           4   rs3094315        4
snp_match(sumstats, info_snp, strand_flip = FALSE)
#> 6 variants to be matched.
#> 4 variants have been matched; 0 were flipped and 1 were reversed.
#>   chr    pos a0 a1   beta     p _NUM_ID_.ss          id _NUM_ID_
#> 1   1  86303  T  G -1.868 0.860           1   rs2949417        1
#> 2   1  86331  A  G -0.250 0.346           2 rs115209712        2
#> 3   1 752566  A  G  2.112 0.456           4   rs3094315        4
#> 4   1 755890  T  A  0.239 0.776           5   rs3115858        5
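
In both calls above, 4 variants are matched out of 6 in sumstats and 5 in info_snp, which is well above the default match.min.prop of 0.2. The sketch below shows one way such a check could be computed, assuming that "the smallest data" means the input with fewer rows; check_match_prop() is a hypothetical helper, and the exact comparison and error message used by snp_match() may differ.

```r
# Hypothetical helper (not part of the package): stop if too few variants match.
check_match_prop <- function(n_matched, n_sumstats, n_info, match.min.prop = 0.2) {
  n_smallest <- min(n_sumstats, n_info)  # assumption: "smallest data" = fewer rows
  prop <- n_matched / n_smallest
  if (prop < match.min.prop) {
    stop("Not enough variants have been matched.")
  }
  prop
}

# Numbers taken from the examples: 4 matched, 6 in sumstats, 5 in info_snp.
prop <- check_match_prop(n_matched = 4, n_sumstats = 6, n_info = 5)

# With a stricter (hypothetical) threshold of 0.9, the same counts would fail.
too_strict <- tryCatch(
  check_match_prop(4, 6, 5, match.min.prop = 0.9),
  error = function(e) conditionMessage(e)
)
```

With these counts the matched proportion is 4 / 5, so the default threshold is satisfied.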