GSE33213, TFBSs and open chromatin in multiple cell lines


Pol-2, CTCF and c-Myc profiles were analysed in several cell lines. These samples are part of the UCSC tracks on transcription factor binding sites identified in multiple cell types by ChIP-seq.

Additional information on the cell lines can be found on the ENCODE Common Cell Types page at UCSC.


Files downloaded from GEO series: GSE33213
Input file format: SRA


From H. sapiens (March 2006 NCBI36/hg18).

    Filename       Description            Feature  GEO-ID
 1  GSM822288.sga  A549 Pol2              Pol2     GSM822288
 2  GSM822312.sga  GM12878 CTCF           CTCF     GSM822312
 3  GSM822292.sga  GM12878 Input          Input    GSM822292
 4  GSM822270.sga  GM12878 Pol2           Pol2     GSM822270
 5  GSM822290.sga  GM12878 c-Myc          c-Myc    GSM822290
 6  GSM822294.sga  GM12891 CTCF           CTCF     GSM822294
 7  GSM822299.sga  GM12892 CTCF           CTCF     GSM822299
 8  GSM822278.sga  GM19238 CTCF           CTCF     GSM822278
 9  GSM822277.sga  GM19239 CTCF           CTCF     GSM822277
10  GSM822276.sga  GM19240 CTCF           CTCF     GSM822276
11  GSM822303.sga  Gliobla CTCF           CTCF     GSM822303
12  GSM822268.sga  Gliobla Input          Input    GSM822268
13  GSM822302.sga  Gliobla Pol2           Pol2     GSM822302
14  GSM822297.sga  H1-hESC CTCF           CTCF     GSM822297
15  GSM822300.sga  H1-hESC Pol2           Pol2     GSM822300
16  GSM822274.sga  H1-hESC c-Myc          c-Myc    GSM822274
17  GSM822279.sga  HUVEC CTCF             CTCF     GSM822279
18  GSM822280.sga  HUVEC Input            Input    GSM822280
19  GSM822306.sga  HUVEC Pol2             Pol2     GSM822306
20  GSM822298.sga  HUVEC c-Myc            c-Myc    GSM822298
21  GSM822285.sga  HeLa-S3 CTCF           CTCF     GSM822285
22  GSM822313.sga  HeLa-S3 Input          Input    GSM822313
23  GSM822273.sga  HeLa-S3 Pol2           Pol2     GSM822273
24  GSM822286.sga  HeLa-S3 c-Myc          c-Myc    GSM822286
25  GSM822287.sga  HepG2 CTCF             CTCF     GSM822287
26  GSM822314.sga  HepG2 Input            Input    GSM822314
27  GSM822284.sga  HepG2 Pol2             Pol2     GSM822284
28  GSM822291.sga  HepG2 c-Myc            c-Myc    GSM822291
29  GSM822311.sga  K562 CTCF              CTCF     GSM822311
30  GSM822293.sga  K562 Input             Input    GSM822293
31  GSM822275.sga  K562 Pol2              Pol2     GSM822275
32  GSM822310.sga  K562 c-Myc             c-Myc    GSM822310
33  GSM822305.sga  MCF-7 CTCF             CTCF     GSM822305
34  GSM822308.sga  MCF-7 CTCF treated     CTCF     GSM822308
35  GSM822283.sga  MCF-7 Input            Input    GSM822283
36  GSM822295.sga  MCF-7 Pol2             Pol2     GSM822295
37  GSM822304.sga  MCF-7 c-Myc            c-Myc    GSM822304
38  GSM822296.sga  Monocytes-CD14+ Input  Input    GSM822296
39  GSM822271.sga  NHEK CTCF              CTCF     GSM822271
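
The table can double as a sample sheet when several files need to be handled together. The following is a minimal sketch, assuming the rows have been saved as a tab-separated file with the five columns shown above (index, filename, description, feature, GEO-ID). The file name `samples.tsv` and the function name `files_for_feature` are hypothetical and not part of this data set.

```shell
# List the SGA filenames recorded for one feature (e.g. CTCF, Pol2, c-Myc, Input).
# Assumes a tab-separated sample sheet with the columns:
#   index, filename, description, feature, GEO-ID
files_for_feature() {
    feature=$1   # value to match in the Feature column
    sheet=$2     # path to the sample sheet (hypothetical file)
    awk -F '\t' -v f="$feature" '$4 == f { print $2 }' "$sheet"
}

# Example call (samples.tsv is an assumed file name):
#   files_for_feature CTCF samples.tsv
```

Matching on the Feature column rather than on the description keeps entries such as "MCF-7 CTCF treated" together with the other CTCF samples.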

Technical Notes:

SRA files were downloaded from GEO and processed using the following bash commands:

  1. Extract FASTQ from SRA file:
    fastq-dump SAMPLE.sra
  2. Map reads to genome using Bowtie:
    bowtie --best --strata -m1 --sam -l 36 -n 3 h_sapiens_ncbi36 -q SAMPLE.fastq > SAMPLE.sam
  3. Remove unmapped reads (those with "*" as reference name):
    awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
  4. Make BAM file (using SamTools):
    samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
  5. Sort it:
    samtools sort SAMPLE.bam SAMPLE_sorted
  6. Make BED file:
    bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
  7. Make SGA file (using ChIP-seq): -s hg18 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
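
Steps 1 to 6 above can be chained into a small script so that a failing step stops the run. The following is a minimal sketch under stated assumptions, not the script actually used to build this data set. The function names `filter_mapped` and `run_sample` are illustrative, and the tool invocations simply follow the list above. Step 7 is left out because it depends on the ChIP-Seq tools.

```shell
#!/bin/sh
# Sketch of steps 1-6 for a single sample. Function names are illustrative only.

# Step 3 as a stream filter: drop reads whose reference name (column 3)
# is "*", i.e. unmapped reads. SAM header lines pass through unchanged
# because their third field is never "*".
filter_mapped() {
    awk 'BEGIN { FS = "\t" } $3 != "*" { print $0 }'
}

# Steps 1-6 chained with && so that the first failing command ends the run.
# Requires fastq-dump, bowtie, samtools and bamToBed on the PATH, plus the
# Bowtie index h_sapiens_ncbi36 named in step 2.
run_sample() {
    sample=$1
    fastq-dump "$sample.sra" &&
    bowtie --best --strata -m1 --sam -l 36 -n 3 h_sapiens_ncbi36 \
        -q "$sample.fastq" > "$sample.sam" &&
    filter_mapped < "$sample.sam" > "${sample}_clean.sam" &&
    samtools view -bS -o "$sample.bam" "${sample}_clean.sam" &&
    samtools sort "$sample.bam" "${sample}_sorted" &&
    bamToBed -i "${sample}_sorted.bam" > "$sample.bed"
}

# Run only when a sample name is given, e.g.: sh pipeline.sh SAMPLE
if [ "$#" -gt 0 ]; then
    run_sample "$1"
fi
```

Note that `samtools sort SAMPLE.bam SAMPLE_sorted` is the legacy output-prefix syntax. Recent samtools releases expect `samtools sort -o SAMPLE_sorted.bam SAMPLE.bam` instead.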


  1. GEO series GSE33213 Open Chromatin TFBS by ChIP-seq from ENCODE/Open Chrom(UT Austin).