GSE31039, Histone Modifications in several tissues

Description:

This track shows a comprehensive survey of cis-regulatory elements in the mouse genome by using ChIP-seq (Robertson et al., 2007) to identify transcription factor binding sites and chromatin modification profiles in many mouse (C57Bl/6) tissues and primary cells, including bone marrow, cerebellum, cortex, heart, kidney, liver, lung, spleen, mouse embryonic fibroblast cells (MEFs) and embryonic stem (ES) cells.
Specifically, the Ren lab examined RNA polymerase II (PolII), the co-activator protein p300, the insulator protein CTCF, and two chromatin modification marks, H3K4me3 and H3K4me1, because of their demonstrated utility in identifying promoters, enhancers and insulator elements (Barski et al., 2007; Blow et al., 2010; Heintzman et al., 2009; Kim et al., 2007; Kim et al., 2005a; Visel et al., 2009). Enrichment of H3K4me3 or PolII signal is a strong indicator of an active promoter, while the presence of p300 or H3K4me1 outside promoter regions has been used as a mark for enhancers. CTCF binding sites are considered a mark for potential insulator elements. For each transcription factor or chromatin mark in each tissue, ChIP-seq was carried out with at least two biological replicates. Each experiment produced 20-30 million monoclonal, uniquely mapped tags.

Overall design

Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse/).
Enrichment and Library Preparation: Chromatin immunoprecipitation was performed according to the Ren Lab ChIP Protocol (http://bioinformatics-renlab.ucsd.edu/RenLabChipProtocolV1.pdf).
Library construction was performed according to the Ren Lab Library Protocol (http://bioinformatics-renlab.ucsd.edu/RenLabLibraryProtocolV1.pdf).
Sequencing and Analysis: Samples were sequenced on Illumina Genome Analyzer II, Genome Analyzer IIx, and HiSeq 2000 platforms for 36 cycles. Image analysis, base calling and alignment to the mouse genome version mm9 were performed using Illumina's RTA and Genome Analyzer Pipeline software. Alignment to the mouse genome was performed using ELAND or Bowtie (Langmead et al., 2009) with a seed length of 25 and allowing up to two mismatches. Only sequences that mapped to a single location were used for further analysis. Of those sequences, clonal reads, defined as having the same start position on the same strand, were discarded. BED and wig files were created using custom Perl scripts.

Source

Files downloaded from FTP site: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP007/SRP007600
Input file format: SRA

Samples

From M. musculus (July 2007 NCBI37/mm9).

 #  Filename       Description         Feature  GEO-ID
 1  GSM769008.sga  ES-Bruce4 H3K4me3   H3K4me3  GSM769008
 2  GSM769009.sga  ES-Bruce4 H3K4me1   H3K4me1  GSM769009
 3  GSM769012.sga  Lung H3K4me3        H3K4me3  GSM769012
 4  GSM769013.sga  Lung H3K4me1        H3K4me1  GSM769013
 5  GSM769014.sga  Liver H3K4me3       H3K4me3  GSM769014
 6  GSM769015.sga  Liver H3K4me1       H3K4me1  GSM769015
 7  GSM769016.sga  Kidney H3K4me3      H3K4me3  GSM769016
 8  GSM769023.sga  Kidney H3K4me1      H3K4me1  GSM769023
 9  GSM769017.sga  Heart H3K4me3       H3K4me3  GSM769017
10  GSM769025.sga  Heart H3K4me1       H3K4me1  GSM769025
11  GSM769036.sga  Spleen H3K4me3      H3K4me3  GSM769036
12  GSM769031.sga  Spleen H3K4me1      H3K4me1  GSM769031
13  GSM769027.sga  Cerebellum H3K4me3  H3K4me3  GSM769027
14  GSM769018.sga  Cerebellum H3K4me1  H3K4me1  GSM769018
15  GSM769021.sga  BoneMarrow H3K4me3  H3K4me3  GSM769021
16  GSM769024.sga  BoneMarrow H3K4me1  H3K4me1  GSM769024
17  GSM769026.sga  Cortex H3K4me3      H3K4me3  GSM769026
18  GSM769022.sga  Cortex H3K4me1      H3K4me1  GSM769022
19  GSM769029.sga  MEF H3K4me3         H3K4me3  GSM769029
20  GSM769028.sga  MEF H3K4me1         H3K4me1  GSM769028
21  GSM769032.sga  Heart Input         Input    GSM769032
22  GSM769033.sga  Kidney Input        Input    GSM769033
23  GSM769034.sga  Liver Input         Input    GSM769034
24  GSM769035.sga  Lung Input          Input    GSM769035
25  GSM769037.sga  Spleen Input        Input    GSM769037
26  GSM769010.sga  ES-Bruce4 Input     Input    GSM769010
27  GSM769011.sga  BoneMarrow Input    Input    GSM769011
28  GSM769019.sga  Cortex Input        Input    GSM769019
29  GSM769020.sga  Cerebellum Input    Input    GSM769020
30  GSM769030.sga  MEF Input           Input    GSM769030

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

  1. Extract FASTQ from SRA file:
    fastq-dump SAMPLE.sra
    
  2. Map reads to genome using Bowtie:
    bowtie --sam -l 36 -n 3 mm9 -q SAMPLE.fastq > SAMPLE.sam
    
  3. Clean the results from unmapped reads:
    awk 'BEGIN {FS="\t"} $3 != "\*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
    
  4. Make BAM file:
    samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
    
  5. Sort it:
    samtools sort SAMPLE.bam SAMPLE_sorted
    
  6. Make BED file:
    bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
    
  7. Make SGA file:
    bed2sga.pl -s mm9 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
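
Step 3 relies on the SAM convention that a read without an alignment carries "*" in its third column (the reference name). As a small illustration of that filter, the sketch below wraps an equivalent awk expression in a function. The name drop_unmapped is hypothetical, and the explicit header test (/^@/) is an addition that only makes visible that header lines pass through, as they already do in step 3 because their third field is never "*".

```shell
# Hypothetical helper: keep SAM header lines (starting with "@") and
# reads whose reference name (column 3) is not "*", i.e. aligned reads.
drop_unmapped() {
  awk 'BEGIN { FS = "\t" } /^@/ || $3 != "*"' "$@"
}
```

It can stand in for the awk call of step 3:

    drop_unmapped SAMPLE.sam > SAMPLE_clean.sam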
    

References

  1. GEO series GSE31039 Histone Modifications by ChIP-seq from ENCODE/LICR