Baseline results of Zhu and Stephens (2017)

This is my online notebook to document and share locus-level summary results from fitting a “baseline” multiple-SNP model to GWAS summary statistics of 31 human complex traits. These baseline results serve as the inferential basis for my enrichment-prioritization analyses of the same 31 traits. Please see this manuscript for more details.

If you have any questions about this notebook, please feel free to contact me: Xiang Zhu, xiangzhu@uchicago.edu or xiangzhu@stanford.edu.

Background

Before displaying the results, we first clarify a few key terms.

What is the “baseline model”?

The baseline multiple-SNP model considered here consists of a multiple regression likelihood based on single-SNP association summary statistics (Zhu and Stephens, 2017) and a previously proposed spike-and-slab prior (Guan and Stephens, 2011). Specifically, the model has the following form:

\[
\begin{aligned}
\widehat{\beta} &\sim \mbox{Normal}~\left(\widehat{S}\widehat{R}\widehat{S}^{-1}\beta,~\widehat{S}\widehat{R}\widehat{S}\right),\\
\beta_j &\sim \pi\cdot \mbox{Normal}~\left(0,~\sigma_\beta^2\right)+(1-\pi)\cdot \delta_0,\\
\sigma_\beta^2 &= h \cdot \left(\sum_{j=1}^p \pi n^{-1}\hat{s}_j^{-2}\right)^{-1},\\
\pi &= \left(1+10^{-\theta_0}\right)^{-1},
\end{aligned}
\]

where \(\widehat{\beta}:=(\hat{\beta}_1,\ldots,\hat{\beta}_p)^\intercal\), \(\widehat{S}:=\mbox{diag}(\hat{s})\), \(\hat{s}:=(\hat{s}_1,\ldots, \hat{s}_p)^\intercal\), \(\{\hat{\beta}_j,\hat{s}_j\}\) are single-SNP association summary data, \(\widehat{R}\) is the LD matrix estimated from an external reference panel (Wen and Stephens, 2010), and \(\beta:=(\beta_1,\ldots,\beta_p)^\intercal\) are the true genetic effects of SNPs on a target trait under the multiple-SNP model. Here \(\delta_0\) denotes a point mass at zero, \(n\) is the GWAS sample size, and \(\{h,\theta_0\}\) are hyperparameters. Since \(\pi/(1-\pi)=10^{\theta_0}\), \(\theta_0\) is the \(\log_{10}\) prior odds that a SNP has a nonzero effect.
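To make the last two lines of the model concrete, here is a minimal Python sketch that maps the hyperparameters \(\{h,\theta_0\}\) to the prior inclusion probability \(\pi\) and the slab variance \(\sigma_\beta^2\). The function names and the input values are illustrative assumptions, not part of the software used for these results.

```python
def prior_inclusion_probability(theta0):
    """pi = (1 + 10^(-theta0))^(-1), the prior probability of a nonzero effect."""
    return 1.0 / (1.0 + 10.0 ** (-theta0))


def slab_variance(h, theta0, n, se):
    """sigma_beta^2 = h * (sum_j pi * n^(-1) * s_j^(-2))^(-1).

    h      : hyperparameter h from the model
    theta0 : log10 prior odds of a nonzero effect
    n      : sample size (assumed identical across SNPs in this sketch)
    se     : list of single-SNP standard errors s_j
    """
    pi = prior_inclusion_probability(theta0)
    total = sum(pi / (n * s ** 2) for s in se)
    return h / total


# Hypothetical inputs, chosen only to show the call pattern.
pi = prior_inclusion_probability(-2.0)  # 1 / (1 + 100)
sigma2 = slab_variance(h=0.3, theta0=-2.0, n=10000, se=[0.01, 0.012, 0.009])
print(pi, sigma2)
```

Smaller values of \(\theta_0\) give a sparser prior (smaller \(\pi\)), and for fixed \(h\) this increases the variance assigned to each nonzero effect.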

What are “Round 1 & 2”?

For each trait, we fit the baseline model to its GWAS summary statistics twice:

“Round 1” analysis, using a wide and coarse grid of hyperparameters \(\{h,\theta_0\}\);
“Round 2” analysis, using a narrow and fine grid of \(\{h,\theta_0\}\) (informed by Round 1 analysis).

For details of these grids, please see Supplementary Table 3 of the corresponding manuscript.
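As a rough illustration of the coarse-then-fine idea, the Python sketch below builds a wide grid, picks its best point, and then builds a narrower grid around that point. All grid ranges, the `toy_score` function, and the refinement rule (one coarse step on either side of the Round 1 optimum) are hypothetical assumptions; the grids actually used are the ones listed in Supplementary Table 3.

```python
def linspace(lo, hi, num):
    """Return num evenly spaced values from lo to hi (inclusive)."""
    if num == 1:
        return [lo]
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def best_point(grid_h, grid_theta0, score):
    """Return the (h, theta0) pair with the highest score on the grid."""
    pairs = [(h, t) for h in grid_h for t in grid_theta0]
    return max(pairs, key=lambda pair: score(*pair))


def toy_score(h, theta0):
    """Hypothetical stand-in for a model-fit criterion; not the real one."""
    return -((h - 0.32) ** 2) - (theta0 + 2.8) ** 2


# "Round 1": wide and coarse grid (hypothetical ranges).
h1 = linspace(0.1, 0.9, 5)         # spacing 0.2
theta1 = linspace(-5.0, -1.0, 5)   # spacing 1.0
h_best, theta_best = best_point(h1, theta1, toy_score)

# "Round 2": narrow and fine grid centered on the Round 1 optimum.
h2 = linspace(h_best - 0.2, h_best + 0.2, 9)
theta2 = linspace(theta_best - 1.0, theta_best + 1.0, 9)
h_final, theta_final = best_point(h2, theta2, toy_score)
print(h_best, theta_best, h_final, theta_final)
```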

What is a “locus”?

To summarize the locus-level results of baseline model fitting, we consider two ways to define a “locus”.

Divide the entire genome into overlapping loci of 50 SNPs (with an overlap of 25 SNPs between neighboring loci). This approach is used in Carbonetto and Stephens (2013). We term this approach “50-SNP loci”.
Download gene definitions from the Homo sapiens reference genome GRCh37, and then define the transcribed region \(\pm\) 100 kb of each gene as a locus. We term this approach “genes”.
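The two locus definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, SNPs are assumed to be ordered along a chromosome and indexed from 0, and the handling of the last window and of genes near a chromosome start are my assumptions rather than documented behavior.

```python
def snp_windows(num_snps, size=50, overlap=25):
    """Half-open index ranges (start, stop) of overlapping SNP windows.

    Neighboring windows share `overlap` SNPs. The final window is
    truncated at the last SNP (an assumption of this sketch).
    """
    step = size - overlap
    windows = []
    start = 0
    while start < num_snps:
        stop = min(start + size, num_snps)
        windows.append((start, stop))
        if stop == num_snps:
            break
        start += step
    return windows


def gene_locus(tx_start, tx_stop, flank=100_000):
    """Transcribed region extended by `flank` base pairs on each side.

    The lower bound is clamped at position 1 (an assumption of this sketch).
    """
    return (max(1, tx_start - flank), tx_stop + flank)


# 120 SNPs give windows starting at SNP 0, 25, 50 and 75.
print(snp_windows(120))
# A hypothetical gene transcribed from 500,000 to 600,000 bp.
print(gene_locus(500_000, 600_000))
```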

What are `abz`, `p[n]p` and `ens`?

For a given locus, we summarize the association signal via three statistics: abz, p[n]p and ens.

abz: the maximum absolute single-SNP \(z\)-score in the locus \[\max_{j\in\mbox{locus}}|z_j|\]
p[n]p: the posterior probability that the locus contains at least \(n\) trait-associated SNPs \[\mbox{Pr}(\#\{j\in\mbox{locus}:\beta_j\neq 0\}\geq n~|~\mbox{Data})\]
ens: the posterior expected number of trait-associated SNPs in the locus \[\sum_{j\in\mbox{locus}}\mbox{Pr}(\beta_j\neq 0~|~\mbox{Data})\]

Note that abz is a direct summary of single-SNP GWAS summary statistics, whereas p[n]p and ens depend on the multiple-SNP baseline model fitting.
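For intuition, the three statistics can be computed as follows when one has the z-scores in a locus and a set of posterior draws of the inclusion indicators \(1\{\beta_j\neq 0\}\). This is a hedged Python sketch: the draw-based estimates and the toy numbers are assumptions for illustration and do not describe how the posterior quantities on this page were actually computed.

```python
def abz(z_scores):
    """Maximum absolute single-SNP z-score in the locus."""
    return max(abs(z) for z in z_scores)


def pnp(indicator_draws, n):
    """Estimate Pr(at least n SNPs in the locus have nonzero effects | Data).

    indicator_draws: list of posterior draws; each draw is a list of 0/1
    indicators, one per SNP in the locus.
    """
    hits = sum(1 for draw in indicator_draws if sum(draw) >= n)
    return hits / len(indicator_draws)


def ens(indicator_draws):
    """Estimate the posterior expected number of nonzero-effect SNPs.

    Averaging the per-draw counts equals summing the per-SNP
    posterior inclusion frequencies over the locus.
    """
    return sum(sum(draw) for draw in indicator_draws) / len(indicator_draws)


# Toy locus with three SNPs and four hypothetical posterior draws.
z = [1.2, -2.9, 0.4]
draws = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0]]
print(abz(z), pnp(draws, 1), pnp(draws, 2), ens(draws))
```

When the posterior probability of any signal in a locus is small, draws with two or more associated SNPs are rare, so p1p and ens nearly coincide. This matches the table excerpts shown under “Full results”.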

Full results

Here we summarize results by trait. Each trait has two rounds of analysis (Round 1 and Round 2), and each round produces two tables.

The first type of table reports results based on the “50-SNP loci” approach. Below is an excerpt.

chr	start	abz	p1p	ens
2	114500473	1.897	0.003	0.003
4	17304974	1.317	0.003	0.003
6	2276503	1.720	0.002	0.002
8	4325476	2.967	0.013	0.013
10	54042782	1.560	0.002	0.002

The second type of table reports results based on the “genes” approach. Below is an excerpt.

gene	chr	start	stop	abz	p1p	ens
SNTG2	2	946554	1371385	2.604	0.015	0.015
PLSCR1	3	146232967	146262628	2.109	0.008	0.008
ZNF391	6	27356524	27369227	1.730	0.004	0.004
LYN	8	56792386	56923940	2.189	0.007	0.007
OR56A1	11	6047901	6048971	2.206	0.006	0.006
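If a table like the one above is saved as tab-separated text, it can be loaded and ranked with the Python standard library. The sketch below embeds the excerpt directly; the tab-separated layout and the function name `read_locus_table` are assumptions, since the linked result pages may serve the tables in a different format.

```python
import csv
import io

# The "genes" excerpt shown above, written as tab-separated text.
GENE_TABLE = (
    "gene\tchr\tstart\tstop\tabz\tp1p\tens\n"
    "SNTG2\t2\t946554\t1371385\t2.604\t0.015\t0.015\n"
    "PLSCR1\t3\t146232967\t146262628\t2.109\t0.008\t0.008\n"
    "ZNF391\t6\t27356524\t27369227\t1.730\t0.004\t0.004\n"
    "LYN\t8\t56792386\t56923940\t2.189\t0.007\t0.007\n"
    "OR56A1\t11\t6047901\t6048971\t2.206\t0.006\t0.006\n"
)


def read_locus_table(text):
    """Parse a tab-separated locus table and convert the statistics to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for key in ("abz", "p1p", "ens"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


rows = read_locus_table(GENE_TABLE)
# Rank loci by the posterior probability of containing at least one signal.
ranked = sorted(rows, key=lambda r: r["p1p"], reverse=True)
for r in ranked:
    print(r["gene"], r["chr"], r["abz"], r["p1p"], r["ens"])
```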

Below are links to baseline results of the 31 traits used in our manuscript. It may take a while to load the tables for each trait. Please be patient.

Anthropometric traits

Hematopoietic traits

Immune-related traits

Metabolic traits

Neurological traits

Xiang Zhu

2017-07-17