GSE31755, Histone Modifications by ChIP-seq

Description

This track, produced as part of the ENCODE Project, displays maps of histone modifications genome-wide using ChIP-seq in different cell lines. The ChIP-seq method involves first using formaldehyde to cross-link histones and other DNA-associated proteins to genomic DNA within cells. The cross-linked chromatin is subsequently extracted, sheared, and immunoprecipitated using specific antibodies. After reversal of cross-links, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody-target (epitope) across the genome is inferred from the density of mapped fragments.

Chemical modifications (e.g., methylation or acetylation) of the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription factors. For each experiment (defined as a particular antibody and a particular cell type), the track shows the enrichment of the specifically modified histone (Signal), along with the sites that have the greatest enrichment (Peaks). Each cell type also includes the input signal, which represents the control condition in which no antibody targeting was performed. In general, the following modifications are associated with particular functional states of chromatin:

H3K4me3 and H3K9ac are considered to be marks of active or potentially active promoter regions. H3K4me1 and H3K27ac are considered to be marks of active or potentially active enhancer regions. H3K36me3 and H3K79me2 are considered to be marks of transcriptional elongation. H3K27me3 and H3K9me3 are considered to be marks of inactive regions.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Overall design

Cells were grown according to the approved ENCODE cell culture protocols. Briefly, cells were crosslinked, chromatin was extracted and sonicated using a Bioruptor sonicator (Diagenode) to an average size of 300-500 bp, and individual ChIP assays were performed using antibodies to modified histones. For the K562 and Ntera2 (NT2-D1) histone ChIP-seq samples, immunoprecipitates were collected using protein G-coupled magnetic beads; a detailed ChIP and library protocol can be found at http://www.roadmapepigenomics.org/protocols. For the U2OS histone ChIP-seq samples, immunoprecipitates were collected using StaphA cells; a detailed protocol can be found at http://expression.genomecenter.ucdavis.edu/chip.html. Library DNA was quantitated using either a Nanodrop or a BioAnalyzer and sequenced on an Illumina GA2.

The sequencing reads were mapped to the genome using the Eland alignment program. ChIP-seq data were scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, in which the signal height is the number of overlapping fragments at each nucleotide position in the genome.
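The signal-map construction described above can be illustrated with a minimal Python sketch. It is not the ENCODE implementation: it assumes reads are given as 0-based, half-open intervals with a strand, that each read is extended to a fixed 200 bp fragment in its 3' direction, and the names `fragment_interval` and `signal_map` are hypothetical.

```python
from collections import defaultdict

# Assumption: a fixed length taken from the "~200 bp" average fragment length in the text.
FRAGMENT_LENGTH = 200


def fragment_interval(start, end, strand, fragment_length=FRAGMENT_LENGTH):
    """Extend a mapped read (0-based, half-open) to a full fragment."""
    if strand == "+":
        return start, start + fragment_length
    # Minus-strand reads: the fragment extends toward lower coordinates.
    return max(0, end - fragment_length), end


def signal_map(reads, fragment_length=FRAGMENT_LENGTH):
    """Return {chrom: [(start, end, height), ...]} for runs with height > 0.

    `height` is the number of fragments overlapping every base in [start, end).
    """
    # Record +1 where a fragment starts and -1 where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, strand in reads:
        frag_start, frag_end = fragment_interval(start, end, strand, fragment_length)
        deltas[chrom][frag_start] += 1
        deltas[chrom][frag_end] -= 1

    signal = {}
    for chrom, events in deltas.items():
        positions = sorted(events)
        height = 0
        runs = []
        # A running sum of the deltas gives the pileup height between breakpoints.
        for here, nxt in zip(positions, positions[1:]):
            height += events[here]
            if height > 0:
                runs.append((here, nxt, height))
        signal[chrom] = runs
    return signal


if __name__ == "__main__":
    example_reads = [
        ("chr1", 100, 130, "+"),
        ("chr1", 150, 180, "+"),
        ("chr1", 320, 350, "-"),
    ]
    for run in signal_map(example_reads)["chr1"]:
        print(run)
```

Storing runs of constant height rather than one value per base keeps the sketch small; a genome-scale implementation would more likely use a dedicated coverage tool.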

For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.05, where the number of peaks above the threshold is compared with the number obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is then compared with the normalized number of mapped tags in the same region from an input DNA control (normalization is done by correlating tag counts in genomic 10 kb windows). Using a binomial test, only regions with a p-value <= 0.05 are considered significantly enriched relative to the input DNA control.
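The comparison against the input control can be sketched as follows. This covers only the second step (the binomial test), not the simulation-based threshold, and it rests on assumptions that the text does not confirm: the normalization factor is taken as the least-squares slope through the origin of ChIP counts on input counts across 10 kb windows, and the test is a one-sided binomial test with success probability 0.5 on the ChIP tags out of the combined ChIP and scaled input tags. The function names are hypothetical.

```python
from math import comb


def scaling_factor(chip_window_counts, input_window_counts):
    """Assumed normalization: slope through the origin of ChIP on input counts."""
    numerator = sum(c * i for c, i in zip(chip_window_counts, input_window_counts))
    denominator = sum(i * i for i in input_window_counts)
    if denominator == 0:
        raise ValueError("input control has no tags in any window")
    return numerator / denominator


def enrichment_p_value(chip_tags, input_tags, scale):
    """One-sided binomial p-value for seeing at least `chip_tags` ChIP tags.

    Assumption: under the null, each tag in the region is equally likely to
    come from the ChIP sample or from the scaled input control (p = 0.5).
    """
    scaled_input = round(input_tags * scale)
    trials = chip_tags + scaled_input
    if trials == 0:
        return 1.0
    # Exact upper tail using integer arithmetic: P(X >= chip_tags), X ~ Bin(trials, 0.5).
    upper_tail = sum(comb(trials, k) for k in range(chip_tags, trials + 1))
    return upper_tail / 2 ** trials


def is_enriched(chip_tags, input_tags, scale, alpha=0.05):
    """Apply the p-value <= 0.05 cutoff stated in the text."""
    return enrichment_p_value(chip_tags, input_tags, scale) <= alpha


if __name__ == "__main__":
    scale = scaling_factor([10, 20, 30], [5, 10, 15])
    print(scale, enrichment_p_value(8, 1, scale), is_enriched(8, 1, scale))
```

The exact integer sum is adequate for the small tag counts of a single region; for large counts, a statistics library would be a more practical choice.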

Source

Files downloaded from GEO series: GSE31755
Input file format: SRA

Samples

From H. sapiens (March 2006 NCBI36/hg18).

	Filename	Description	Feature	GEO-ID
1	GSM788088.sga	K562 H3K27me3B	H3K27me3B	GSM788088
2	GSM788085.sga	K562 H3K4me1	H3K4me1	GSM788085
3	GSM788087.sga	K562 H3K4me3B	H3K4me3B	GSM788087
4	GSM788082.sga	K562 H3K9acB	H3K9acB	GSM788082
5	GSM788074.sga	K562 Input	Input	GSM788074
6	GSM788071.sga	NT2-D1 H3K27me3B	H3K27me3B	GSM788071
7	GSM788081.sga	NT2-D1 H3K36me3B	H3K36me3B	GSM788081
8	GSM788083.sga	NT2-D1 H3K4me1	H3K4me1	GSM788083
9	GSM788072.sga	NT2-D1 H3K4me3B	H3K4me3B	GSM788072
10	GSM788086.sga	NT2-D1 H3K9acB	H3K9acB	GSM788086
11	GSM788080.sga	NT2-D1 H3K9me3	H3K9me3	GSM788080
12	GSM788077.sga	NT2-D1 Input	Input	GSM788077
13	GSM818826.sga	PANC-1 H3K27ac	H3K27ac	GSM818826
14	GSM818827.sga	PANC-1 H3K4me1_pAb-037-050	H3K4me1_pAb-037-050	GSM818827
15	GSM818828.sga	PANC-1 Input	Input	GSM818828
16	GSM788073.sga	PBMC H3K27me3B	H3K27me3B	GSM788073
17	GSM788084.sga	PBMC H3K4me1	H3K4me1	GSM788084
18	GSM788075.sga	PBMC H3K4me3B	H3K4me3B	GSM788075
19	GSM788079.sga	PBMC H3K9me3	H3K9me3	GSM788079
20	GSM788070.sga	PBMC Input	Input	GSM788070
21	GSM788076.sga	U2OS H3K36me3B	H3K36me3B	GSM788076
22	GSM788078.sga	U2OS H3K9me3	H3K9me3	GSM788078
23	GSM788069.sga	U2OS Input	Input	GSM788069

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

Extract FASTQ from SRA file:
```
fastq-dump SAMPLE.sra
```

Map reads to genome using Bowtie:

```
bowtie --best --strata -m1 --sam -l 36 -n 3 h_sapiens_ncbi36 -q SAMPLE.fastq > SAMPLE.sam
```

Remove unmapped reads from the results:

```
awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
```

Make BAM file:

```
samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
```

Sort the BAM file:
```
samtools sort SAMPLE.bam SAMPLE_sorted
```

Make BED file:

```
bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
```

Make SGA file:

```
bed2sga.pl -s hg18 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
```
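To apply the same steps to many samples, the command lines can be generated from a sample name and a feature name. The Python sketch below only builds the command strings; it does not run them. The function name `pipeline_commands` and its parameters are hypothetical, and the flags are copied from the commands listed above rather than checked against any particular tool version.

```python
import shlex

# Equivalent of the awk filter above: keep SAM lines whose column 3 (reference name) is not "*".
AWK_FILTER = r"""awk 'BEGIN {FS="\t"} $3 != "*" {print $0}'"""


def pipeline_commands(sample, feature, index="h_sapiens_ncbi36", assembly="hg18"):
    """Return the seven shell command lines for one sample, in pipeline order."""
    q = shlex.quote  # quote file names so unusual sample names stay safe in a shell
    return [
        f"fastq-dump {q(sample + '.sra')}",
        f"bowtie --best --strata -m1 --sam -l 36 -n 3 {q(index)} "
        f"-q {q(sample + '.fastq')} > {q(sample + '.sam')}",
        f"{AWK_FILTER} {q(sample + '.sam')} > {q(sample + '_clean.sam')}",
        f"samtools view -bS -o {q(sample + '.bam')} {q(sample + '_clean.sam')}",
        f"samtools sort {q(sample + '.bam')} {q(sample + '_sorted')}",
        f"bamToBed -i {q(sample + '_sorted.bam')} > {q(sample + '.bed')}",
        f"bed2sga.pl -s {q(assembly)} -f {q(feature)} < {q(sample + '.bed')} "
        f"| sort -s -k1,1 -k3,3n -k4,4 | compactsga > {q(sample + '.sga')}",
    ]


if __name__ == "__main__":
    # Example values taken from the first row of the sample table.
    for command in pipeline_commands("GSM788088", "H3K27me3B"):
        print(command)
```

Printing the commands rather than executing them makes it easy to review the generated script before running it.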

References

GEO series GSE31755 Histone Modifications by ChIP-seq from ENCODE/Stanford/Yale/Davis/Harvard.