SNPSelect may be used in four different ways:
Users may upload variants in one of two formats: a VCF file or a list of rsIDs. We annotate the uploaded file against our custom database of variants affecting TFBSs (available for download from the FTP repository) and report the matches. The annotated files can be downloaded by the user.
This mode outputs three plots: a plot of the number of variants affecting each TF, a TF enrichment graph in which users can visualize the TFs most affected by their variants, and an annotation plot of selected regulatory variants based on their context.
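The matching step described above can be pictured as a simple lookup. The following is a minimal sketch, not the tool's actual implementation: the two-column table layout (`rsid`, `tf`), the example rsIDs, and the function names `load_db` and `annotate` are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the variant database: one row per (rsID, TF) pair.
# The real database layout may differ; these column names are assumed.
DB_TSV = "rsid\ttf\nrs123\tCTCF\nrs123\tSP1\nrs456\tGATA1\n"


def load_db(handle):
    """Map each rsID to the list of TFs whose binding it is recorded as affecting."""
    db = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        db.setdefault(row["rsid"], []).append(row["tf"])
    return db


def annotate(rsids, db):
    """Return (rsID, TF) pairs for every user rsID that is present in the database."""
    return [(rsid, tf) for rsid in rsids for tf in db.get(rsid, [])]


db = load_db(io.StringIO(DB_TSV))
user_rsids = ["rs123", "rs999"]          # rs999 is absent from the toy database
matches = annotate(user_rsids, db)
tf_counts = Counter(tf for _, tf in matches)  # variants per TF, as in the first plot
```

Counting matches per TF, as in the last line, gives the kind of tally that a "variants per TF" plot would be drawn from.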
Users may upload a list of genomic regions in BED format (narrowPeak format is also accepted). Bedtools is used to intersect the user's regions with our set of variants selected for their significant effect on TF binding.
The tool outputs a pie chart showing the percentage of genomic regions that overlap TF-related variants, a TF enrichment plot, and a variant annotation plot.
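To make the intersection step concrete, here is a minimal pure-Python sketch of the overlap logic. It is a stand-in for illustration only, not a call to Bedtools, and it assumes BED-style 0-based, half-open coordinates; the region and variant tuples and the names `overlaps` and `intersect` are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def intersect(regions, variants):
    """Return (chrom, start, end, variant_id) for each region/variant overlap."""
    hits = []
    for chrom, start, end in regions:
        for v_chrom, v_start, v_end, v_id in variants:
            if chrom == v_chrom and overlaps(start, end, v_start, v_end):
                hits.append((chrom, start, end, v_id))
    return hits


# Toy user regions and toy TF-related variants (assumed data, not from the tool).
regions = [("chr1", 100, 200), ("chr2", 50, 60)]
variants = [
    ("chr1", 150, 151, "rsA"),  # inside the first region
    ("chr1", 200, 201, "rsB"),  # starts exactly at the region end, so no overlap
    ("chr2", 10, 11, "rsC"),    # same chromosome as the second region, but outside it
]

hits = intersect(regions, variants)
# Fraction of user regions with at least one overlapping variant (pie chart input).
fraction_overlapping = len({hit[:3] for hit in hits}) / len(regions)
```

The nested loop is quadratic and only suitable for small inputs; a dedicated interval tool is the better choice for genome-scale files.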
Users may select single PWMs representing TFBSs from the JASPAR CORE 2014 vertebrates library.
The tool outputs a Venn diagram showing selected variants affecting the specific TF, together with a variant annotation bar plot. Variant match files in various formats (tab-delimited text, annotated and BED) are also included for download.
Users may select single genes by giving the gene name.
The output includes both the variant annotation and the TF enrichment plots.
The set of variants we have chosen as an example is a selected list of rsIDs from the NHGRI-EBI Catalog of published genome-wide association studies (https://www.ebi.ac.uk/gwas/home). We downloaded all the variants that are reported to be associated with diabetes in multiple studies.
Lists of the variants and related details may be found at GWAS Catalog (https://www.ebi.ac.uk/gwas/search?query=diabetes).
The set of variants we have chosen as an example is a selected list of rsIDs in VCF format from the 1000 Genomes Project, a resource coordinated by the International Genome Sample Resource (IGSR) at EMBL-EBI.
NA12878 genotypes were downloaded from 1000 Genomes (FTP Site: ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502) and variants with no alternate allele in the individual were filtered out. The first 10,000 lines of the filtered file are used as the example here.
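The filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the script actually used: it assumes a single-sample VCF whose sample column is at index 9 and whose FORMAT field begins with GT, and the function names and toy records are hypothetical.

```python
import re


def has_alt_allele(sample_field):
    """True if the genotype (first ':'-separated subfield) carries a non-reference allele."""
    genotype = sample_field.split(":")[0]
    alleles = re.split(r"[/|]", genotype)  # handles unphased (/) and phased (|) calls
    return any(allele not in ("0", ".") for allele in alleles)


def filter_vcf(lines, sample_col=9):
    """Keep header lines and records where the sample has at least one alternate allele."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if has_alt_allele(fields[sample_col]):
            kept.append(line)
    return kept


# Toy single-sample VCF (assumed records, for illustration only).
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|0",  # homozygous reference: dropped
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0|1",  # heterozygous: kept
    "1\t300\trs3\tG\tA\t.\tPASS\t.\tGT\t1|1",  # homozygous alternate: kept
]

filtered = filter_vcf(vcf_lines)
```

For a multi-sample file, the sample column index would first have to be looked up from the `#CHROM` header line rather than assumed.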
This sample series represents a comprehensive set of human transcription factor binding sites based on ChIP-seq experiments generated by production groups in the ENCODE Consortium. It contains 690 ChIP-seq datasets representing 161 unique regulatory factors (generic and sequence-specific factors). The series represents peak calls (regions of enrichment) that were generated by the ENCODE Analysis Working Group (AWG) based on a uniform processing pipeline developed for the ENCODE Integrative Analysis effort.
The data set was downloaded from the UCSC Genome Browser at https://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/.