For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Overall design
Cells were grown according to the approved ENCODE cell culture protocols. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007), Rozowsky et al. (2009), and Auerbach et al. (2009).
DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data were scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, where the signal height at each nucleotide position in the genome is the number of fragments overlapping that position.
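The signal-map construction described above can be sketched in bash and awk. This is a minimal illustration, not the authors' implementation: the function name `signal_map`, the three-column input format (chromosome, 5' tag position, strand), and the use of one fixed fragment length for every tag are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: pile up fixed-length fragments into a per-base signal map.
# Assumed input (stdin): tab-separated "chrom <TAB> 5prime_pos <TAB> strand",
# with 1-based positions. Assumed argument: fragment length (default 200).
# Output: "chrom <TAB> pos <TAB> height" for every position with height > 0.
signal_map() {
  local frag_len="${1:-200}"
  awk -F'\t' -v OFS='\t' -v L="$frag_len" '
    {
      # Extend the tag from its 5-prime end in the direction of its strand.
      if ($3 == "+") { s = $2;         e = $2 + L - 1 }
      else           { s = $2 - L + 1; e = $2 }
      if (s < 1) s = 1
      # Signal height = number of fragments covering each position.
      for (p = s; p <= e; p++) h[$1 OFS p]++
    }
    END { for (k in h) print k, h[k] }
  ' | sort -k1,1 -k2,2n
}
```

For example, `printf 'chr1\t100\t+\nchr1\t150\t+\n' | signal_map 100` reports height 1 for positions 100-149 and 200-249, and height 2 for positions 150-199. Counting per base in an associative array is simple but memory-hungry, so this form is only suitable for small regions.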
For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.01 when comparing the number of peaks above that threshold to the number of peaks obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is then compared to the number of mapped tags in the same region from an input DNA control, after normalizing the control by correlating tag counts in genomic 10 kb windows. Using a binomial test, only regions with a p-value <= 0.01 are considered significantly enriched compared to the input DNA control.
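The enrichment test against the input control can be illustrated with a small awk-based function. The text does not spell out the null model, so this sketch makes its own assumptions: under the null, each tag in the region is equally likely to come from the ChIP sample or from the scaled input (success probability 0.5), and the p-value is the one-sided upper tail. The function name `binom_pvalue` and the optional scaling argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one-sided binomial p-value for ChIP enrichment over input.
# Usage: binom_pvalue CHIP_TAGS INPUT_TAGS [SCALE]
# Assumptions (not stated in the text): null probability 0.5, and the input
# count is multiplied by SCALE (default 1) and rounded to an integer.
binom_pvalue() {
  awk -v k="$1" -v c="$2" -v s="${3:-1}" 'BEGIN {
    m = int(c * s + 0.5)        # scaled input tag count, rounded
    n = k + m                   # total tags in the region
    logc = 0                    # running log of "n choose i"
    p = 0
    for (i = 0; i <= n; i++) {
      if (i > 0) logc += log(n - i + 1) - log(i)
      # Add P(X = i) = C(n, i) * 0.5^n for every i >= observed ChIP count.
      if (i >= k) p += exp(logc + n * log(0.5))
    }
    printf "%.6g\n", p
  }'
}
```

With 3 ChIP tags and 1 scaled input tag, `binom_pvalue 3 1` prints `0.3125`, which would not pass the p-value <= 0.01 cutoff, whereas a region with 30 ChIP tags and 5 input tags would.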
No. | Filename | Description | Feature | GEO-ID |
1 | GSM798324.sga | MEL H3K4me3 | H3K4me3 | GSM798324 |
2 | GSM798328.sga | MEL H3K4me3 | H3K4me3 | GSM798328 |
3 | GSM798327.sga | CH12 H3K4me3 | H3K4me3 | GSM798327 |
4 | GSM798323.sga | MEL Input treated | Input | GSM798323 |
5 | GSM798325.sga | MEL Input | Input | GSM798325 |
6 | GSM798326.sga | CH12 Input | Input | GSM798326 |
SRA files were downloaded from GEO and processed using the following bash commands:
fastq-dump SAMPLE.sra
bowtie --sam -l 36 -n 3 mm9 -q SAMPLE.fastq > SAMPLE.sam
awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
samtools sort SAMPLE.bam SAMPLE_sorted
bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
bed2sga.pl -s mm9 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
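The commands above use SAMPLE and FEATURE as placeholders. One way to apply them to every row of the table is sketched below. The sketch only prints the commands for each sample, so they can be reviewed before being run. It assumes that each SRA file is named after its GEO ID (for example GSM798324.sra), which the text does not state, and the function names `samples`, `pipeline_commands`, and `all_commands` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate the per-sample command list for all six samples.
# Assumption: SRA files are named <GEO-ID>.sra; IDs and features are
# copied from the sample table.
samples() {
  cat <<'EOF'
GSM798324 H3K4me3
GSM798328 H3K4me3
GSM798327 H3K4me3
GSM798323 Input
GSM798325 Input
GSM798326 Input
EOF
}

# Fill the SAMPLE and FEATURE placeholders of the command template.
pipeline_commands() {
  sed -e "s/SAMPLE/$1/g" -e "s/FEATURE/$2/g" <<'EOF'
fastq-dump SAMPLE.sra
bowtie --sam -l 36 -n 3 mm9 -q SAMPLE.fastq > SAMPLE.sam
awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
samtools sort SAMPLE.bam SAMPLE_sorted
bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
bed2sga.pl -s mm9 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
EOF
}

# Print the commands for every sample; nothing is executed here.
all_commands() {
  samples | while read -r id feature; do
    pipeline_commands "$id" "$feature"
  done
}
```

Calling `all_commands` writes the full command list to standard output; redirecting it to a file gives a script that can be inspected and then run with bash.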