GSE32218, Histone Modifications in two cell types

Description

This track shows probable locations of the specified histone modifications in the given cell types as determined by chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq). Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. For each experiment (cell type vs. antibody) this track shows a graph of enrichment for histone modification (Signal), along with sites that have the greatest evidence of histone modification, as identified by the PeakSeq algorithm (Peaks).

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Overall design

Cells were grown according to the approved ENCODE cell culture protocols. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et. al., (2007), Rozowsky et. al. (2009) and Auerbach et. al. (2009).

DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data was scored based on sequence reads (length ~30 bps) that align uniquely to the human genome. From the mapped tags a signal map of ChIP DNA fragments (average fragment length ~ 200 bp) was constructed where the signal height is the number of overlapping fragments at each nucleotide position in the genome.

For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.01 when comparing the number of peaks above said threshold to the number of peaks obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mapable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is compared to the normalized (normalized by correlating tag counts in genomic 10 kb windows) number of mapped tags in the same region from an input DNA control. Using a binomial test, only regions that have a p-value <= 0.01 are considered to be significantly enriched compared to the input DNA control.

Source

Files downloaded from FTP site: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP008/SRP008276
Input file format: SRA

Samples

From M. musculus (July 2007 NCBI37/mm9).

	Filename	Description	Feature	GEO-ID
1	GSM798324.sga	MEL H3K4me3	H3K4me3	GSM798324
2	GSM798328.sga	MEL H3K4me3	H3K4me3	GSM798328
3	GSM798327.sga	CH12 H3K4me3	H3K4me3	GSM798327
4	GSM798323.sga	MEL Input treated	Input	GSM798323
5	GSM798325.sga	MEL Input	Input	GSM798325
6	GSM798326.sga	CH12 Input	Input	GSM798326

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

Extract FASTQ from SRA file:
```
fastq-dump SAMPLE.sra
```

Map reads to genome using Bowtie:

bowtie --sam -l 36 -n 3 mm9 -q SAMPLE.fastq > SAMPLE.sam

Clean the results from unmapped reads:

awk 'BEGIN {FS="\t"} $3 != "\*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam

Make BAM file:

samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam

Sort it:
```
samtools sort SAMPLE.bam SAMPLE_sorted
```

Make BED file:

bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed

Make SGA file:

bed2sga.pl -s mm9 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga

References

GEO series GSE32218 Histone Modifications by ChIP-seq from ENCODE/Stanford/Yale