GSE31039, Histone Modifications in several tissues

Description:

This track shows a comprehensive survey of cis-regulatory elements in the mouse genome, using ChIP-seq (Robertson et al., 2007) to identify transcription factor binding sites and chromatin modification profiles in many mouse (C57Bl/6) tissues and primary cells, including bone marrow, cerebellum, cortex, heart, kidney, liver, lung, spleen, mouse embryonic fibroblast cells (MEFs) and embryonic stem (ES) cells.
Specifically, the Ren lab examined RNA polymerase II (PolII), the co-activator protein p300, the insulator protein CTCF, and two chromatin modification marks, H3K4me3 and H3K4me1, because of their demonstrated utility in identifying promoters, enhancers and insulator elements (Barski et al., 2007; Blow et al., 2010; Heintzman et al., 2009; Kim et al., 2007; Kim et al., 2005a; Visel et al., 2009). Enrichment of H3K4me3 or PolII signal is a strong indicator of an active promoter, while the presence of p300 or H3K4me1 outside of promoter regions has been used as a mark for enhancers. CTCF binding sites are considered a mark for potential insulator elements. For each transcription factor or chromatin mark in each tissue, ChIP-seq was carried out with at least two biological replicates. Each experiment produced 20-30 million monoclonal, uniquely mapped tags.

Overall design

Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse/).
Enrichment and Library Preparation: Chromatin immunoprecipitation was performed according to the Ren Lab ChIP Protocol (http://bioinformatics-renlab.ucsd.edu/RenLabChipProtocolV1.pdf).
Library construction was performed according to the Ren Lab Library Protocol (http://bioinformatics-renlab.ucsd.edu/RenLabLibraryProtocolV1.pdf).
Sequencing and Analysis: Samples were sequenced on Illumina Genome Analyzer II, Genome Analyzer IIx, and HiSeq 2000 platforms for 36 cycles. Image analysis, base calling and alignment to the mouse genome version mm9 were performed using Illumina's RTA and Genome Analyzer Pipeline software. Alignment to the mouse genome was performed using ELAND or Bowtie (Langmead et al., 2009) with a seed length of 25 and allowing up to two mismatches. Only the sequences that mapped to one location were used for further analysis. Of those sequences, clonal reads, defined as having the same start position on the same strand, were discarded. BED and wig files were created using custom Perl scripts.
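
The clonal-read filter described above can be illustrated with a small sketch. It assumes the reads are available as six-column BED records and interprets "start position" as the 5' end of the read (BED start on the + strand, BED end on the - strand). It also assumes that one read per group is kept rather than the whole group being dropped. These interpretations, the function name `drop_clonal` and the sample records are assumptions for illustration; this is not the custom script used by the data producers.

```shell
#!/usr/bin/env bash
# Sketch: keep only the first read seen for each (chromosome, 5-prime end, strand).
# Assumes six-column BED input: chrom, start, end, name, score, strand.
drop_clonal() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # 5-prime end: BED start on the + strand, BED end on the - strand (assumption).
    five_prime = ($6 == "-") ? $3 : $2
    key = $1 FS five_prime FS $6
    if (!(key in seen)) { seen[key] = 1; print }
  }'
}

# Hypothetical records: r2 duplicates r1; r3 lies on the opposite strand and is kept.
printf 'chr1\t100\t136\tr1\t255\t+\nchr1\t100\t136\tr2\t255\t+\nchr1\t100\t136\tr3\t255\t-\n' | drop_clonal
```

With the three example records, only `r1` and `r3` are printed.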

Source

Files downloaded from FTP site: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP007/SRP007600
Input file format: SRA

Samples

From M. musculus (July 2007 NCBI37/mm9).

	Filename	Description	Feature	GEO-ID
1	GSM769008.sga	ES-Bruce4 H3K4me3	H3K4me3	GSM769008
2	GSM769009.sga	ES-Bruce4 H3K4me1	H3K4me1	GSM769009
3	GSM769012.sga	Lung H3K4me3	H3K4me3	GSM769012
4	GSM769013.sga	Lung H3K4me1	H3K4me1	GSM769013
5	GSM769014.sga	Liver H3K4me3	H3K4me3	GSM769014
6	GSM769015.sga	Liver H3K4me1	H3K4me1	GSM769015
7	GSM769016.sga	Kidney H3K4me3	H3K4me3	GSM769016
8	GSM769023.sga	Kidney H3K4me1	H3K4me1	GSM769023
9	GSM769017.sga	Heart H3K4me3	H3K4me3	GSM769017
10	GSM769025.sga	Heart H3K4me1	H3K4me1	GSM769025
11	GSM769036.sga	Spleen H3K4me3	H3K4me3	GSM769036
12	GSM769031.sga	Spleen H3K4me1	H3K4me1	GSM769031
13	GSM769027.sga	Cerebellum H3K4me3	H3K4me3	GSM769027
14	GSM769018.sga	Cerebellum H3K4me1	H3K4me1	GSM769018
15	GSM769021.sga	BoneMarrow H3K4me3	H3K4me3	GSM769021
16	GSM769024.sga	BoneMarrow H3K4me1	H3K4me1	GSM769024
17	GSM769026.sga	Cortex H3K4me3	H3K4me3	GSM769026
18	GSM769022.sga	Cortex H3K4me1	H3K4me1	GSM769022
19	GSM769029.sga	MEF H3K4me3	H3K4me3	GSM769029
20	GSM769028.sga	MEF H3K4me1	H3K4me1	GSM769028
21	GSM769032.sga	Heart Input	Input	GSM769032
22	GSM769033.sga	Kidney Input	Input	GSM769033
23	GSM769034.sga	Liver Input	Input	GSM769034
24	GSM769035.sga	Lung Input	Input	GSM769035
25	GSM769037.sga	Spleen Input	Input	GSM769037
26	GSM769010.sga	ES-Bruce4 Input	Input	GSM769010
27	GSM769011.sga	BoneMarrow Input	Input	GSM769011
28	GSM769019.sga	Cortex Input	Input	GSM769019
29	GSM769020.sga	Cerebellum Input	Input	GSM769020
30	GSM769030.sga	MEF Input	Input	GSM769030

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

Extract FASTQ from SRA file:
```
fastq-dump SAMPLE.sra
```

Map reads to genome using Bowtie:

```
bowtie --sam -l 36 -n 3 mm9 -q SAMPLE.fastq > SAMPLE.sam
```

Remove unmapped reads from the results:

```
awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
```
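
As an alternative to testing the reference-name column, unmapped reads can be recognized by the SAM FLAG field, in which bit 0x4 marks an unmapped read. The sketch below does this in portable awk, using integer arithmetic because not every awk implementation provides bitwise functions. The function name `drop_unmapped` and the sample records are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop SAM records whose FLAG has the 0x4 (unmapped) bit set.
drop_unmapped() {
  awk 'BEGIN { FS = "\t" }
  /^@/ { print; next }              # keep header lines unchanged
  int($2 / 4) % 2 == 0 { print }    # keep reads whose 0x4 bit is clear
  '
}

# Hypothetical records: r1 (flag 0) and r3 (flag 16) are mapped, r2 (flag 4) is not.
printf '@HD\tVN:1.0\nr1\t0\tchr1\t101\t255\t36M\t*\t0\t0\tACGT\tIIII\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\nr3\t16\tchr2\t500\t255\t36M\t*\t0\t0\tACGT\tIIII\n' | drop_unmapped
```

The header line and the records for `r1` and `r3` pass through; `r2` is removed.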

Make BAM file:

```
samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
```

Sort the BAM file:
```
samtools sort SAMPLE.bam SAMPLE_sorted
```

Make BED file:

```
bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
```

Make SGA file:

```
bed2sga.pl -s mm9 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
```
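
To make this last step more concrete, the sketch below mimics in plain awk what the `bed2sga.pl | sort | compactsga` chain appears to do. It assumes that an SGA line has five tab-separated fields (sequence, feature, position, strand, count), which is consistent with the sort keys above. The choice of position (BED start + 1 on the + strand, BED end on the - strand), the use of chromosome names exactly as they appear in the BED file, and the function name `bed_to_sga_sketch` are assumptions for illustration. The real tools may behave differently, for example in how they name sequences.

```shell
#!/usr/bin/env bash
# Sketch of a BED -> SGA-like conversion followed by compaction.
# Usage: bed_to_sga_sketch FEATURE < reads.bed
bed_to_sga_sketch() {
  local feature=$1
  # Step 1: emit one five-column line per read with a count of 1.
  awk -v feature="$feature" 'BEGIN { FS = OFS = "\t" }
  {
    pos = ($6 == "-") ? $3 : $2 + 1   # 1-based 5-prime end (assumption)
    print $1, feature, pos, $6, 1
  }' |
  # Step 2: same sort keys as in the command above (sequence, position, strand).
  sort -s -k1,1 -k3,3n -k4,4 |
  # Step 3: merge adjacent lines with identical sequence, feature, position
  # and strand, summing their counts.
  awk 'BEGIN { FS = OFS = "\t" }
  {
    key = $1 FS $2 FS $3 FS $4
    if (NR > 1 && key != prev) { print prev, n; n = 0 }
    prev = key; n += $5
  }
  END { if (NR > 0) print prev, n }'
}

# Hypothetical records: two + strand reads share a start, one - strand read stands alone.
printf 'chr1\t100\t136\tr1\t255\t+\nchr1\t100\t136\tr2\t255\t+\nchr1\t200\t236\tr3\t255\t-\n' | bed_to_sga_sketch H3K4me3
```

For the example records this yields two lines: position 101 on the + strand with a count of 2, and position 236 on the - strand with a count of 1.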

References

GEO series GSE31039 Histone Modifications by ChIP-seq from ENCODE/LICR