Match alleles between summary statistics and SNP information. Variants are matched by ("chr", "a0", "a1") and either "pos" or "rsid", accounting for possible strand flips and reversed reference alleles (which imply opposite effects).
snp_match(
  sumstats,
  info_snp,
  strand_flip = TRUE,
  join_by_pos = TRUE,
  remove_dups = TRUE,
  match.min.prop = 0.2,
  return_flip_and_rev = FALSE
)
sumstats: A data frame with columns "chr", "pos", "a0", "a1" and "beta".
info_snp: A data frame with columns "chr", "pos", "a0" and "a1".
strand_flip: Whether to try flipping strands. Default is TRUE. If so, ambiguous alleles (A/T and C/G) are removed.
join_by_pos: Whether to join by chromosome and position (default), or instead by rsid.
remove_dups: Whether to remove duplicates (same physical position). Default is TRUE.
match.min.prop: Minimum proportion of variants in the smaller dataset that must be matched; otherwise the function stops with an error. Default is 20%.
return_flip_and_rev: Whether to return the internal boolean variables "_FLIP_" (whether the alleles must be flipped, A <--> T and C <--> G, because they are on the opposite strand) and "_REV_" (whether the alleles must be swapped, $a0 <--> $a1, in which case the corresponding $beta are multiplied by -1). Default is FALSE.
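The flip, reverse and ambiguity notions described above can be illustrated with a small base R sketch. The helpers `flip_strand()`, `classify_alleles()` and `is_ambiguous()` are hypothetical names introduced here for illustration only; they are not part of the package and do not claim to reproduce how snp_match() is implemented internally.

```r
# Complement of each nucleotide, i.e. the allele read on the opposite strand.
flip_strand <- function(allele) {
  unname(c(A = "T", C = "G", G = "C", T = "A")[allele])
}

# Hypothetical helper: describe how an (a0, a1) pair from the summary
# statistics relates to the reference pair (ref0, ref1).
classify_alleles <- function(a0, a1, ref0, ref1) {
  same     <- a0 == ref0 & a1 == ref1
  reversed <- a0 == ref1 & a1 == ref0   # swapped alleles: beta changes sign
  flipped  <- flip_strand(a0) == ref0 & flip_strand(a1) == ref1
  flip_rev <- flip_strand(a0) == ref1 & flip_strand(a1) == ref0
  ifelse(same, "same",
    ifelse(reversed, "reversed",
      ifelse(flipped, "flipped",
        ifelse(flip_rev, "flipped+reversed", "no match"))))
}

# A/T and C/G pairs look identical after a strand flip, so a flip cannot
# be told apart from a reversal; these pairs are called ambiguous.
is_ambiguous <- function(a0, a1) {
  paste(a0, a1) %in% c("A T", "T A", "C G", "G C")
}

# Alleles of the five positions shared by the two example data frames
a0   <- c("T", "G", "C", "A", "T")
a1   <- c("G", "A", "T", "G", "A")
ref0 <- c("T", "A", "G", "A", "T")
ref1 <- c("G", "G", "A", "G", "A")

classify_alleles(a0, a1, ref0, ref1)
#> [1] "same"     "reversed" "flipped"  "same"     "same"
is_ambiguous(a0, a1)
#> [1] FALSE FALSE FALSE FALSE  TRUE
```

The last pair (T/A) is ambiguous, which is consistent with one variant being removed when strand flipping is enabled.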
A single data frame with matched variants. Values in column $beta
are multiplied by -1 for variants with alleles reversed (i.e. swapped).
The new variable "_NUM_ID_.ss" gives the corresponding row indices of the
input sumstats (first argument of this function), and "_NUM_ID_" gives
those of the input info_snp (second argument).
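One possible use of the "_NUM_ID_" column is to place the matched effect sizes back in the row order of info_snp. The sketch below uses base R only. The data frame `matched` is a hand-built stand-in for the returned value, with values copied from the first example output on this page; `n_ref` and `beta_ref` are names chosen here for illustration.

```r
# Stand-in for the data frame returned by snp_match(); check.names = FALSE
# keeps the non-syntactic column names "_NUM_ID_.ss" and "_NUM_ID_" as they are.
matched <- data.frame(
  chr  = 1,
  pos  = c(86303, 86331, 162463, 752566),
  beta = c(-1.868, -0.250, -0.671, 2.112),
  `_NUM_ID_.ss` = 1:4,
  `_NUM_ID_`    = 1:4,
  check.names = FALSE
)

n_ref <- 5  # number of rows of info_snp in that example

# Effect sizes aligned to the rows of info_snp; unmatched variants stay NA.
beta_ref <- rep(NA_real_, n_ref)
beta_ref[matched[["_NUM_ID_"]]] <- matched$beta
beta_ref
#> [1] -1.868 -0.250 -0.671  2.112     NA
```

The same indexing with "_NUM_ID_.ss" retrieves any extra column from the original sumstats rows.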
# Summary statistics for six variants on chromosome 1
sumstats <- data.frame(
  chr = 1,
  pos = c(86303, 86331, 162463, 752566, 755890, 758144),
  a0 = c("T", "G", "C", "A", "T", "G"),
  a1 = c("G", "A", "T", "G", "A", "A"),
  beta = c(-1.868, 0.250, -0.671, 2.112, 0.239, 1.272),
  p = c(0.860, 0.346, 0.900, 0.456, 0.776, 0.383)
)
# Reference SNP information for the first five of those positions
info_snp <- data.frame(
  id = c("rs2949417", "rs115209712", "rs143399298", "rs3094315", "rs3115858"),
  chr = 1,
  pos = c(86303, 86331, 162463, 752566, 755890),
  a0 = c("T", "A", "G", "A", "T"),
  a1 = c("G", "G", "A", "G", "A")
)
# Default matching: strand flips are tried, so ambiguous A/T and C/G variants are removed
snp_match(sumstats, info_snp)
#> 6 variants to be matched.
#> 1 ambiguous SNPs have been removed.
#> 4 variants have been matched; 1 were flipped and 1 were reversed.
#>   chr    pos a0 a1   beta     p _NUM_ID_.ss          id _NUM_ID_
#> 1   1  86303  T  G -1.868 0.860           1   rs2949417        1
#> 2   1  86331  A  G -0.250 0.346           2 rs115209712        2
#> 3   1 162463  G  A -0.671 0.900           3 rs143399298        3
#> 4   1 752566  A  G  2.112 0.456           4   rs3094315        4
snp_match(sumstats, info_snp, strand_flip = FALSE)
#> 6 variants to be matched.
#> 4 variants have been matched; 0 were flipped and 1 were reversed.
#>   chr    pos a0 a1   beta     p _NUM_ID_.ss          id _NUM_ID_
#> 1   1  86303  T  G -1.868 0.860           1   rs2949417        1
#> 2   1  86331  A  G -0.250 0.346           2 rs115209712        2
#> 3   1 752566  A  G  2.112 0.456           4   rs3094315        4
#> 4   1 755890  T  A  0.239 0.776           5   rs3115858        5