This is my online notebook to document and share locus-level summary results of fitting a “baseline” multiple-SNP model to GWAS summary statistics of 31 human complex traits. These baseline results serve as the inferential basis for my enrichment and prioritization analyses of the same 31 traits. Please see this manuscript for more details.
If you have any questions about this notebook, please feel free to contact me: Xiang Zhu, xiangzhu@uchicago.edu or xiangzhu@stanford.edu.
Before displaying the results, we first clarify a few key terms.
The baseline multiple-SNP model considered here consists of a multiple regression likelihood based on single-SNP association summary statistics (Zhu and Stephens, 2017) and a previously proposed spike-and-slab prior (Guan and Stephens, 2011). Specifically, the model has the following form:
\[ \begin{aligned} \widehat{\beta} &\sim \mbox{Normal}~\left(\widehat{S}\widehat{R}\widehat{S}^{-1}\beta,~\widehat{S}\widehat{R}\widehat{S}\right),\\ \beta_j &\sim \pi\cdot \mbox{Normal}~\left(0,~\sigma_\beta^2\right)+(1-\pi)\cdot \delta_0,\\ \sigma_\beta^2 &= h \cdot \left(\sum_{j=1}^p \pi n^{-1}\hat{s}_j^{-2}\right)^{-1},\\ \pi &= \left(1+10^{-\theta_0}\right)^{-1}, \end{aligned} \]
where \(\widehat{\beta}:=(\hat{\beta}_1,\ldots,\hat{\beta}_p)^\intercal\), \(\widehat{S}:=\mbox{diag}(\hat{s})\), \(\hat{s}:=(\hat{s}_1,\ldots, \hat{s}_p)^\intercal\), \(\{\hat{\beta}_j,\hat{s}_j\}\) are single-SNP association summary data, \(\widehat{R}\) is the LD matrix estimated from an external reference panel (Wen and Stephens, 2010), and \(\beta:=(\beta_1,\ldots,\beta_p)^\intercal\) are the true genetic effects of SNPs on a target trait under the multiple-SNP model. Here \(n\) is the GWAS sample size, \(\delta_0\) denotes a point mass at zero, and \(h\) and \(\theta_0\) are hyperparameters.
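To make the roles of the hyperparameters concrete, here is a minimal Python/NumPy sketch (not the code used for the analyses in this notebook). It maps \((\theta_0, h)\) to \((\pi, \sigma_\beta^2)\), draws \(\beta\) from the spike-and-slab prior, and computes the mean and covariance of \(\widehat{\beta}\) under the likelihood. The function names are hypothetical, and the toy values of `theta0`, `h`, `n`, `s_hat` and the LD matrix `R` are made up for illustration.

```python
import numpy as np


def baseline_prior_params(theta0, h, n, s_hat):
    """Map hyperparameters (theta0, h) to (pi, sigma_beta^2).

    pi           = (1 + 10^(-theta0))^(-1)
    sigma_beta^2 = h * (sum_j pi * n^(-1) * s_hat_j^(-2))^(-1)
    """
    s_hat = np.asarray(s_hat, dtype=float)
    pi = 1.0 / (1.0 + 10.0 ** (-theta0))
    sigma2 = h / np.sum(pi / (n * s_hat ** 2))
    return pi, sigma2


def sample_beta(pi, sigma2, p, rng):
    """Draw beta_1..beta_p iid from pi * Normal(0, sigma2) + (1 - pi) * delta_0."""
    gamma = rng.random(p) < pi                          # indicators 1{beta_j != 0}
    slab = rng.normal(0.0, np.sqrt(sigma2), size=p)     # draws from the slab
    return np.where(gamma, slab, 0.0), gamma


def likelihood_moments(beta, s_hat, R):
    """Mean S R S^{-1} beta and covariance S R S of beta_hat, with S = diag(s_hat)."""
    beta = np.asarray(beta, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    mean = s_hat * (R @ (beta / s_hat))
    cov = np.outer(s_hat, s_hat) * R                    # diag(s) R diag(s)
    return mean, cov


# Toy example (all values made up for illustration).
rng = np.random.default_rng(0)
p, n = 4, 10_000
s_hat = np.full(p, 0.01)
R = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.2],
              [0.0, 0.0, 0.2, 1.0]])                    # toy LD matrix
pi, sigma2 = baseline_prior_params(theta0=-1.0, h=0.1, n=n, s_hat=s_hat)
beta, gamma = sample_beta(pi, sigma2, p, rng)
mean, cov = likelihood_moments(beta, s_hat, R)
beta_hat = rng.multivariate_normal(mean, cov)           # one simulated summary vector
```

Because \(\sigma_\beta^2\) is defined through \(h\), \(\pi\) and the standard errors, increasing \(\theta_0\) (more associated SNPs) shrinks the prior effect variance for a fixed \(h\).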
For each trait, we fit the baseline model to its GWAS summary statistics twice (Round 1 and Round 2), each time using a grid of hyperparameter values.
For details of these grids, please see Supplementary Table 3 of the corresponding manuscript.
To summarize the locus-level results of baseline model fitting, we consider two ways to define “locus”.
Divide the entire genome into overlapping loci of 50 SNPs each, with an overlap of 25 SNPs between neighboring loci. This approach was used in Carbonetto and Stephens (2013). We term this approach “50-SNP loci”.
Download gene definitions from the Homo sapiens reference genome GRCh37, and then define the transcribed region \(\pm\) 100 kb of each gene as a locus. We term this approach “genes”.
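Both locus definitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used here: SNPs on one chromosome are assumed to be indexed 0, …, p−1 in positional order, a trailing window shorter than 50 SNPs is assumed to be kept as is, and gene loci are assumed to be clipped at base-pair position 1. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def snp_windows(num_snps: int, size: int = 50, step: int = 25) -> List[Tuple[int, int]]:
    """Half-open index ranges [start, end) of overlapping windows of `size` SNPs.

    Neighboring windows share size - step SNPs (25 when size=50, step=25).
    Assumption: a trailing window shorter than `size` is kept as is.
    """
    if num_snps <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + size, num_snps)
        windows.append((start, end))
        if end == num_snps:
            break
        start += step
    return windows


def gene_locus(tx_start: int, tx_end: int, flank: int = 100_000) -> Tuple[int, int]:
    """Transcribed region +/- `flank` bp, clipped at position 1 (assumption)."""
    return max(1, tx_start - flank), tx_end + flank


def snps_in_locus(positions: Sequence[int], locus: Tuple[int, int]) -> List[int]:
    """Indices of SNPs whose base-pair position lies in the closed interval `locus`."""
    lo, hi = locus
    return [i for i, pos in enumerate(positions) if lo <= pos <= hi]
```

For example, `snp_windows(120)` gives `[(0, 50), (25, 75), (50, 100), (75, 120)]`, and a hypothetical gene spanning 200,000 to 250,000 bp gives the locus `(100000, 350000)`.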
abz, p[n]p and ens
For a given locus, we summarize the association signal via three statistics: abz, p[n]p and ens.
abz: the maximum absolute single-SNP \(z\)-score in the locus, \[\max_{j\in\mbox{locus}}|z_j|,\] where \(z_j:=\hat{\beta}_j/\hat{s}_j\).
p[n]p: the posterior probability that the locus contains at least \(n\) trait-associated SNPs, \[\mbox{Pr}(\#\{j\in\mbox{locus}:\beta_j\neq 0\}\geq n~|~\mbox{Data})\] The tables below report \(n=1\) and \(n=2\) as p1p and p2p.
ens: the posterior expected number of trait-associated SNPs in the locus, \[\sum_{j\in\mbox{locus}}\mbox{Pr}(\beta_j\neq 0~|~\mbox{Data})\]
Note that abz is a direct summary of single-SNP GWAS summary statistics, whereas p[n]p and ens depend on the multiple-SNP baseline model fitting.
Here we summarize results by trait. For each trait, there are two rounds of analysis (Round 1 and Round 2), and for each round we show two tables.
The first type of table reports results based on the “50-SNP loci” approach. Below is an example.
chr | start | abz | p1p | p2p | ens |
---|---|---|---|---|---|
2 | 114500473 | 1.897 | 0.003 | 0 | 0.003 |
4 | 17304974 | 1.317 | 0.003 | 0 | 0.003 |
6 | 2276503 | 1.720 | 0.002 | 0 | 0.002 |
8 | 4325476 | 2.967 | 0.013 | 0 | 0.013 |
10 | 54042782 | 1.560 | 0.002 | 0 | 0.002 |
The second type of table reports results based on the “genes” approach. Below is an example.
gene | chr | start | stop | abz | p1p | p2p | ens |
---|---|---|---|---|---|---|---|
SNTG2 | 2 | 946554 | 1371385 | 2.604 | 0.015 | 0 | 0.015 |
PLSCR1 | 3 | 146232967 | 146262628 | 2.109 | 0.008 | 0 | 0.008 |
ZNF391 | 6 | 27356524 | 27369227 | 1.730 | 0.004 | 0 | 0.004 |
LYN | 8 | 56792386 | 56923940 | 2.189 | 0.007 | 0 | 0.007 |
OR56A1 | 11 | 6047901 | 6048971 | 2.206 | 0.006 | 0 | 0.006 |
Below are links to the baseline results for the 31 traits used in our manuscript. The tables for each trait may take a while to load, so please be patient.
This R Markdown site was created with workflowr