This is my online notebook to document and share locus-level summary results from fitting the “baseline” multiple-SNP model to GWAS summary statistics of 31 human complex traits. These baseline results serve as the inferential basis for my enrichment and prioritization analyses of the same 31 traits. Please see this manuscript for more details.

If you have any questions about this notebook, please feel free to contact me: Xiang Zhu, xiangzhu@uchicago.edu or xiangzhu@stanford.edu.


Background

Before displaying the results, we first clarify a few key terms.

What is the “baseline model”?

The baseline multiple-SNP model considered here consists of a multiple regression likelihood based on single-SNP association summary statistics (Zhu and Stephens, 2017) and a previously proposed spike-and-slab prior (Guan and Stephens, 2011). Specifically, the model has the following form:

\[ \begin{aligned} \widehat{\beta} &\sim \mbox{Normal}~\left(\widehat{S}\widehat{R}\widehat{S}^{-1}\beta,~\widehat{S}\widehat{R}\widehat{S}\right),\\ \beta_j &\sim \pi\cdot \mbox{Normal}~\left(0,~\sigma_\beta^2\right)+(1-\pi)\cdot \delta_0,\\ \sigma_\beta^2 &= h \cdot \left(\sum_{j=1}^p \pi n^{-1}\hat{s}_j^{-2}\right)^{-1},\\ \pi &= \left(1+10^{-\theta_0}\right)^{-1}, \end{aligned} \]

where \(\widehat{\beta}:=(\hat{\beta}_1,\ldots,\hat{\beta}_p)^\intercal\), \(\widehat{S}:=\mbox{diag}(\hat{s})\), \(\hat{s}:=(\hat{s}_1,\ldots, \hat{s}_p)^\intercal\), \(\{\hat{\beta}_j,\hat{s}_j\}\) are single-SNP association summary data, \(\widehat{R}\) is the LD matrix estimated from an external reference panel (Wen and Stephens, 2010), \(\beta:=(\beta_1,\ldots,\beta_p)^\intercal\) are the true genetic effects of SNPs on a target trait under the multiple-SNP model, \(\delta_0\) denotes a point mass at zero, and \(n\) is the GWAS sample size.
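To make the prior concrete, here is a minimal sketch of how the two hyperparameters \(\{h,\theta_0\}\) map to the prior inclusion probability and the slab variance in the formulas above. The function names and all input values are hypothetical and chosen only for illustration, and the sketch assumes that \(n\) is a single sample size shared by all SNPs. It is not the code used to produce the results in this notebook.

```python
def prior_inclusion_prob(theta0):
    """pi = (1 + 10^(-theta0))^(-1), so theta0 is the log10 prior odds of inclusion."""
    return 1.0 / (1.0 + 10.0 ** (-theta0))


def slab_variance(h, theta0, n, se):
    """sigma_beta^2 = h * (sum_j pi * n^(-1) * s_j^(-2))^(-1)."""
    pi = prior_inclusion_prob(theta0)
    total = sum(pi / (n * s * s) for s in se)
    return h / total


# Hypothetical inputs, for illustration only.
se = [0.010, 0.012, 0.011, 0.009]  # single-SNP standard errors s_j
n = 10000                          # assumed common sample size
h, theta0 = 0.3, -2.0              # one point on a hyperparameter grid

pi = prior_inclusion_prob(theta0)          # equals 1/101 for theta0 = -2
sigma2 = slab_variance(h, theta0, n, se)
print(pi, sigma2)
```

Because \(\pi\) is a logistic function of \(\theta_0\) on the log10 scale, decreasing \(\theta_0\) by one unit divides the prior odds of a SNP being trait-associated by ten.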

What are “Round 1 & 2”?

For each trait, we fit the baseline model to its GWAS summary statistics twice:

  • “Round 1” analysis, using a wide and coarse grid of hyperparameters \(\{h,\theta_0\}\);
  • “Round 2” analysis, using a narrow and fine grid of \(\{h,\theta_0\}\) (informed by Round 1 analysis).

For details of these grids, please see Supplementary Table 3 of the corresponding manuscript.
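As a rough illustration of the coarse-to-fine idea, the sketch below builds a narrower grid around the best-scoring point of a wider one. Everything here is an assumption made for illustration: the grid values, the scores, and the refinement rule (`refine_grid`) are hypothetical, and the grids actually used are those listed in the supplementary table mentioned above.

```python
def linspace(lo, hi, num):
    """Evenly spaced values from lo to hi, both endpoints included."""
    if num == 1:
        return [lo]
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def refine_grid(coarse, scores, num=5):
    """Return a finer grid spanning the neighbours of the best coarse point.

    `scores` holds one goodness-of-fit value per coarse grid point
    (larger is better); how such a score is computed is left open here.
    """
    best = max(range(len(coarse)), key=lambda i: scores[i])
    lo = coarse[max(best - 1, 0)]
    hi = coarse[min(best + 1, len(coarse) - 1)]
    return linspace(lo, hi, num)


# Hypothetical Round 1 grid for theta0 with made-up scores.
theta0_round1 = [-6.0, -5.0, -4.0, -3.0, -2.0]
scores_round1 = [-120.0, -95.0, -80.0, -88.0, -110.0]

theta0_round2 = refine_grid(theta0_round1, scores_round1, num=5)
print(theta0_round2)
```

The same refinement could be applied to \(h\) separately, or to the two hyperparameters jointly on a two-dimensional grid.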

What is a “locus”?

To summarize the locus-level results of fitting the baseline model, we consider two ways to define a “locus”.

  1. Divide the entire genome into overlapping loci of 50 SNPs (with an overlap of 25 SNPs between neighboring loci). This approach is used in Carbonetto and Stephens (2013). We term this approach “50-SNP loci”.

  2. Download gene definitions from the Homo sapiens reference genome GRCh37, and then define the transcribed region \(\pm\) 100 kb of each gene as a locus. We term this approach “genes”.
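The two locus definitions can be sketched as follows. This is a minimal illustration with hypothetical helper names (`snp_windows`, `gene_locus`, `snps_in_locus`). It assumes SNPs are already sorted by position within a chromosome, that the last window is simply truncated at the end of the SNP list, and that gene boundaries are clipped at position 1; the actual analysis may handle these edge cases differently.

```python
def snp_windows(num_snps, size=50, step=25):
    """Half-open index ranges [start, stop) of overlapping SNP windows.

    Consecutive windows share size - step SNPs (25 with the defaults).
    """
    windows = []
    start = 0
    while start < num_snps:
        stop = min(start + size, num_snps)
        windows.append((start, stop))
        if stop == num_snps:  # reached the last SNP; no further window needed
            break
        start += step
    return windows


def gene_locus(tx_start, tx_stop, flank=100_000):
    """Transcribed region extended by `flank` base pairs on each side."""
    return max(tx_start - flank, 1), tx_stop + flank


def snps_in_locus(positions, lo, hi):
    """Indices of SNPs whose base-pair position lies in [lo, hi]."""
    return [i for i, pos in enumerate(positions) if lo <= pos <= hi]


print(snp_windows(120))
print(gene_locus(56792386, 56923940))
```

With the defaults, 120 SNPs yield the windows (0, 50), (25, 75), (50, 100) and (75, 120), so every SNP away from the ends falls into exactly two “50-SNP loci”.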

What are abz, p[n]p and ens?

For a given locus, we summarize the association signal via three statistics: abz, p[n]p and ens.

  1. abz: the maximum absolute single-SNP \(z\)-score in the locus \[\max_{j\in\mbox{locus}}|z_j|\]

  2. p[n]p: the posterior probability that the locus contains at least \(n\) trait-associated SNPs \[\mbox{Pr}(\#\{j\in\mbox{locus}:\beta_j\neq 0\}\geq n~|~\mbox{Data})\]

  3. ens: the posterior expected number of trait-associated SNPs in the locus \[\sum_{j\in\mbox{locus}}\mbox{Pr}(\beta_j\neq 0~|~\mbox{Data})\]

Note that abz is a direct summary of single-SNP GWAS summary statistics, whereas p[n]p and ens depend on the multiple-SNP baseline model fitting.
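The three statistics can be sketched in a few lines. The sketch assumes that we are given, for each SNP in a locus, its \(z\)-score and its posterior inclusion probability \(\mbox{Pr}(\beta_j\neq 0~|~\mbox{Data})\), and, importantly, that inclusions are treated as independent across SNPs when computing p[n]p. That independence is an assumption of this illustration (it would hold under a fully factorized posterior approximation), and the function names and inputs are hypothetical.

```python
def abz(z):
    """Maximum absolute single-SNP z-score in the locus."""
    return max(abs(v) for v in z)


def ens(pip):
    """Posterior expected number of trait-associated SNPs in the locus."""
    return sum(pip)


def pnp(pip, n):
    """Pr(at least n SNPs included), treating inclusions as independent."""
    # dist[k] = Pr(exactly k SNPs included among those processed so far)
    dist = [1.0]
    for p in pip:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # current SNP excluded
            new[k + 1] += q * p       # current SNP included
        dist = new
    return sum(dist[n:])


# Hypothetical locus with two SNPs.
z = [-2.5, 1.0]
pip = [0.5, 0.5]
print(abz(z), pnp(pip, 1), pnp(pip, 2), ens(pip))
```

Since the expected count equals the sum of p[n]p over all \(n\geq 1\), ens is never smaller than p1p, and the two nearly coincide whenever p2p is close to zero.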

Full results

Here we summarize results by trait. For each trait, there are two rounds of analysis (Round 1 and Round 2), and for each round we show two tables.

The first type of table reports results based on the “50-SNP loci” approach. Below is an excerpt.

chr  start      abz    p1p    p2p  ens
2    114500473  1.897  0.003  0    0.003
4    17304974   1.317  0.003  0    0.003
6    2276503    1.720  0.002  0    0.002
8    4325476    2.967  0.013  0    0.013
10   54042782   1.560  0.002  0    0.002

The second type of table reports results based on the “genes” approach. Below is an excerpt.

gene    chr  start      stop       abz    p1p    p2p  ens
SNTG2   2    946554     1371385    2.604  0.015  0    0.015
PLSCR1  3    146232967  146262628  2.109  0.008  0    0.008
ZNF391  6    27356524   27369227   1.730  0.004  0    0.004
LYN     8    56792386   56923940   2.189  0.007  0    0.007
OR56A1  11   6047901    6048971    2.206  0.006  0    0.006

Below are links to baseline results of 31 traits used in our manuscript. It may take a while to load the tables for each trait. Please be patient.

