Overview
This is my online notebook to document and share the full results of
genome-wide enrichment and prioritization analyses described in the
article:
Xiang Zhu and Matthew Stephens (2018). Large-scale genome-wide
enrichment analyses identify new trait-associated genes and pathways
across 31 human phenotypes. Nature Communications 9, 4361. https://doi.org/10.1038/s41467-018-06805-x.
We developed a new statistical method, RSS-E, to generate the results
for this study. The software that implements RSS-E is freely available
at stephenslab/rss.
We also provide an end-to-end example
illustrating how to use RSS-E to perform the reported genome-wide
enrichment and prioritization analyses of GWAS summary statistics. This
software can be referenced in a journal’s “Code availability” section as
.
In addition, all 4,026 pre-processed gene sets used in this study
(including 3,913 biological pathways and 113 tissue-based gene sets) are
freely available at xiangzhu/rss-gsea.
These gene sets can be referenced in a journal’s “Data availability”
section as .
If you find the analysis results, the pre-processed gene sets, the
statistical methods, and/or the open-source software useful for your
work, please cite our article listed above, Zhu and
Stephens (2018).
If you have any questions about the notebook and/or the article,
please feel free to contact me: Xiang Zhu,
xiangzhu[at]uchicago[and/or]stanford.edu.
Additional resources
How can I perform similar analyses on a new GWAS summary dataset
using RSS-E?
The software that generated the results of this study is freely available
at stephenslab/rss.
I also wrote a step-by-step RSS-E tutorial
that illustrates how to use this software to perform genome-wide
enrichment and prioritization analyses on GWAS summary statistics.
Compared with most existing enrichment methods, the most appealing
feature of RSS-E is the automatic gene prioritization in light of
inferred enrichments. Is this gene prioritization feature available in
your software?
Yes. This feature is implemented as the function compute_pip.m
in RSS-E. The step-by-step RSS-E tutorial
illustrates how to use this function.
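The idea of ranking loci by a posterior probability can be illustrated with a minimal Python sketch. This is not the code in compute_pip.m, and it may differ from what that function actually computes. It assumes each SNP in a locus has a posterior inclusion probability and that these probabilities are treated as independent, so the probability that the locus contains at least one trait-associated SNP is one minus the product of the per-SNP exclusion probabilities. The function name, gene names, and numbers below are hypothetical.

```python
from math import prod


def locus_level_probability(snp_pips):
    """Probability that at least one SNP in a locus is trait-associated.

    Assumption (not taken from compute_pip.m): the SNP-level posterior
    inclusion probabilities are treated as independent.
    """
    if any(not 0.0 <= p <= 1.0 for p in snp_pips):
        raise ValueError("each inclusion probability must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in snp_pips)


# Hypothetical SNP-level inclusion probabilities for two loci (made-up numbers).
loci = {
    "gene_A": [0.01, 0.02, 0.60],
    "gene_B": [0.05, 0.05, 0.05],
}

# Rank loci from most to least likely to harbor an associated SNP.
ranked = sorted(loci, key=lambda g: locus_level_probability(loci[g]), reverse=True)
```

Running the same calculation with inclusion probabilities from a baseline fit and from an enrichment fit would show which loci gain support once the enrichment is taken into account.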
There are two sanity checks for the more sophisticated RSS-E analysis
in Zhu and
Stephens (2018): an eyeball test and a likelihood ratio calculation.
Do you have software for these sanity checks?
Yes. The eyeball test simply plots the marginal distributions of
GWAS z-scores, stratified by SNP-level annotations based on a given gene
set. Here we used ggplot2::geom_density
(default settings). For the likelihood ratio check, I wrote a
stand-alone script, ash_lrt_31traits.R.
Please read the instructions in this script carefully. For more details
on these two sanity checks, please see the caption of Supplementary
Figure 17 in Zhu and
Stephens (2018).
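The eyeball test can also be sketched without R. The Python example below is only an analogue of the ggplot2 plot, not the code used in the study. It simulates z-scores for SNPs inside and outside a hypothetical gene set, then computes a Gaussian kernel density estimate for each stratum with a Silverman-style rule-of-thumb bandwidth. The data, function names, and bandwidth details are assumptions made for illustration, and the curves may not match ggplot2 output exactly.

```python
import math
import random
import statistics


def gaussian_kde(values, grid):
    """Gaussian kernel density estimate evaluated at each point of `grid`.

    Bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5), a rule-of-thumb choice.
    """
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    bandwidth = 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
        for x in grid
    ]


def tail_mass(density, grid, cutoff=3.0, step=0.1):
    """Approximate density mass at |z| > cutoff by a Riemann sum."""
    return sum(d * step for d, x in zip(density, grid) if abs(x) > cutoff)


# Simulated z-scores (illustration only): SNPs inside the hypothetical
# gene set are given a wider spread than SNPs outside it.
rng = random.Random(0)
z_outside = [rng.gauss(0.0, 1.0) for _ in range(2000)]
z_inside = [rng.gauss(0.0, 1.5) for _ in range(500)]

grid = [i / 10.0 for i in range(-80, 81)]  # -8.0 to 8.0 in steps of 0.1
density_outside = gaussian_kde(z_outside, grid)
density_inside = gaussian_kde(z_inside, grid)
```

Plotting density_inside and density_outside against grid gives the stratified picture. In this simulation, the heavier tails of the inside-set curve are the visual cue the eyeball test looks for.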
Where can I download all 4,026 pre-processed gene sets used in this
work?
All 4,026 gene sets used in this study are freely available at xiangzhu/rss-gsea,
where the folder biological_pathway
contains 3,913 biological pathways, and the folder tissue_set
contains 113 GTEx tissue-based gene sets. More details about these gene
sets can be found here.
Where can I find RSS-E “baseline” model fitting results of all 31
traits?
You can find summary results of “baseline” model fitting at xiangzhu/rss-gsea-baseline.
For me, the baseline model fitting results are merely inferential
“bases” for the enrichment model fitting results shown in the “Main
results” section above. However, when I was presenting the enrichment
results during my Ph.D. thesis
defense, Prof. John Novembre and
Prof. Xin He both pointed out that these
baseline results might be useful for other ongoing research projects on
the “fourth floor” (i.e., the fantastic computational
space shared with the labs of Matthew Stephens, John Novembre and Xin
He). Their comments motivated me to create a separate online notebook xiangzhu/rss-gsea-baseline
to share the baseline summary results.
Where can I find “Round 1” RSS-E results of all 3,913 biological
pathways?
Currently, you need to contact me directly to view our “Round 1”
results of all 3,913 pathways. When this work was under review, one
referee pointed out that our previous online results, especially our
“Round 1” analysis results, were “needlessly complicated” and did not
have “any obvious benefit”. Hence, I removed the “Round 1” analysis
results from this notebook to simplify the presentation. I sincerely
hope that this change can address this referee’s comment.