Irreproducible Discovery Rate

The IDR code below is taken from the ENCODE project. For more details, please refer to the IDR page.

Navigate to the directory containing the data and launch R. Then load the following file:

source("functions-all-clayton-12-13.R")

Read peak files and genome table:

peakfile1 <- "HUVEC_hg19_CTCF_rep1.bed"
peakfile2 <- "HUVEC_hg19_CTCF_rep2.bed"

chr.size <- read.table("genome_table.txt")
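Before running the pipeline, it can help to confirm that the inputs have the expected shape. The sketch below is not part of the ENCODE code. It assumes that the peak files follow the standard 10-column narrowPeak layout and that the genome table has two columns (chromosome name and chromosome length). The toy rows and the column names are illustrative only.

```r
# Column names for the 10 narrowPeak fields (assigned here for readability;
# the files themselves have no header line).
narrowpeak.cols <- c("chr", "start", "stop", "name", "score", "strand",
                     "signal.value", "p.value", "q.value", "summit")

# Toy two-line peak file, inlined so that the example runs on its own.
toy.peaks <- read.table(
    text = "chr1\t100\t300\t.\t0\t.\t12.5\t8.1\t6.2\t90\nchr2\t500\t800\t.\t0\t.\t30.0\t15.4\t12.9\t140",
    sep = "\t", col.names = narrowpeak.cols, stringsAsFactors = FALSE)

# Toy genome table (assumed layout: chromosome name, chromosome length).
toy.chr.size <- read.table(text = "chr1\t1000\nchr2\t2000",
                           sep = "\t", stringsAsFactors = FALSE)

# Basic sanity checks: column count, known chromosomes, valid intervals.
stopifnot(ncol(toy.peaks) == 10)
stopifnot(all(toy.peaks$chr %in% toy.chr.size$V1))
stopifnot(all(toy.peaks$start < toy.peaks$stop))
```

The same checks can be applied to the real files by replacing the inlined text with the file names.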

Define parameters:

half.width <- NULL
overlap.ratio <- 0
is.broadpeak <- F
sig.value <- "p.value"

[half.width]: Set this to NULL (given as -1 in the original IDR documentation) to use the peak widths reported in the peak files.
[overlap.ratio]: The fractional bp overlap (from 0 to 1) required between peaks in the two replicates for them to be considered overlapping. IMPORTANT: This parameter has not been fully tested. It is recommended to set it to 0.
[is.broadpeak]: Whether the peak files are in broadPeak format. Set to F for narrowPeak/regionPeak or T for broadPeak.
[sig.value]: The ranking measure to use. It can take only one of the following values: signal.value, p.value or q.value.
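The constraints described above can be checked before any data are processed. This is a minimal sketch that is not part of the ENCODE script; the checks simply restate the parameter descriptions, and the requirement that a non-NULL half.width be positive is an assumption.

```r
half.width <- NULL
overlap.ratio <- 0
is.broadpeak <- FALSE
sig.value <- "p.value"

# half.width is either NULL (use reported widths) or a positive number (assumed).
stopifnot(is.null(half.width) || (is.numeric(half.width) && half.width > 0))

# overlap.ratio is a fraction between 0 and 1.
stopifnot(is.numeric(overlap.ratio), overlap.ratio >= 0, overlap.ratio <= 1)

# is.broadpeak is a single logical value.
stopifnot(is.logical(is.broadpeak), length(is.broadpeak) == 1)

# sig.value must be one of the three supported ranking measures;
# match.arg() stops with an error for any other value.
sig.value <- match.arg(sig.value, c("signal.value", "p.value", "q.value"))
```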

Process the data and generate the IDR output:

# Read and clean the peaks of each replicate
rep1 <- process.narrowpeak(peakfile1, chr.size,
    half.width=half.width, summit="offset", broadpeak=is.broadpeak)
rep2 <- process.narrowpeak(peakfile2, chr.size,
    half.width=half.width, summit="offset", broadpeak=is.broadpeak)
# Match the peaks between the two replicates
uri.output <- compute.pair.uri(rep1$data.cleaned, rep2$data.cleaned,
    sig.value1=sig.value, sig.value2=sig.value, overlap.ratio=overlap.ratio)
# Fit the IDR model with the EM algorithm
em.output <- fit.em(uri.output$data12.enrich, fix.rho2=T)
# Local idr of each matched peak pair
idr.local <- 1-em.output$em.fit$e.z
# Global IDR: running mean of the local idr values, taken in increasing order
IDR <- numeric(length(idr.local))
o <- order(idr.local)
IDR[o] <- cumsum(idr.local[o])/seq_along(o)
idr_output <- data.frame(chr1=em.output$data.pruned$sample1[, "chr"],
                    start1=em.output$data.pruned$sample1[, "start.ori"],
                    stop1=em.output$data.pruned$sample1[, "stop.ori"],
                    sig.value1=em.output$data.pruned$sample1[, "sig.value"],   
                    chr2=em.output$data.pruned$sample2[, "chr"],
                    start2=em.output$data.pruned$sample2[, "start.ori"],
                    stop2=em.output$data.pruned$sample2[, "stop.ori"],
                    sig.value2=em.output$data.pruned$sample2[, "sig.value"],
                    idr.local=1-em.output$em.fit$e.z, IDR=IDR)

write.table(idr_output, "idr_overlapped_peaks.txt", sep="\t", quote=F)
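The step from the local idr values to the reported IDR is easier to follow on a small example. The sketch below uses five made-up local idr values (not real output) and applies the same ordering and running-mean calculation as the code above.

```r
# Hypothetical local idr values for five matched peak pairs.
idr.local <- c(0.20, 0.01, 0.50, 0.05, 0.02)

# Order the pairs from most to least reproducible.
o <- order(idr.local)

# The IDR of a pair is the mean local idr of all pairs ranked at or above it.
IDR <- numeric(length(idr.local))
IDR[o] <- cumsum(idr.local[o]) / seq_along(o)

print(data.frame(idr.local = idr.local, IDR = round(IDR, 4)))
```

In this toy case the most reproducible pair (local idr 0.01) keeps an IDR of 0.01, the next one (0.02) gets (0.01 + 0.02) / 2 = 0.015, and the least reproducible pair (0.50) gets the mean of all five values, 0.156. Because the values are averaged in increasing order, the IDR never decreases as less reproducible pairs are added.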

Get the peaks that pass the IDR threshold:

filtered_peaks <- idr_output[idr_output$IDR <= 0.01, ]
nrow(filtered_peaks) # get the number of peaks
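Selecting the IDR column by name rather than by position keeps the filter correct if columns are later added or reordered. A minimal sketch on a made-up table follows; the values and the 0.05 threshold are illustrative assumptions, not recommendations.

```r
# Toy stand-in for idr_output with only the columns needed here.
toy.output <- data.frame(chr1   = c("chr1", "chr1", "chr2"),
                         start1 = c(100, 900, 500),
                         stop1  = c(300, 1200, 800),
                         IDR    = c(0.004, 0.250, 0.010))

# Keep the peak pairs at or below the threshold.
idr.threshold <- 0.01
toy.filtered <- toy.output[toy.output$IDR <= idr.threshold, ]
n.passing <- nrow(toy.filtered)

# Count the passing peak pairs at several thresholds at once.
thresholds <- c(0.01, 0.05)
n.by.threshold <- sapply(thresholds, function(t) sum(toy.output$IDR <= t))
names(n.by.threshold) <- thresholds
print(n.by.threshold)
```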

Plot the IDR results:

ez.list <- get.ez.tt.all(em.output, uri.output$data12.enrich$merge1, uri.output$data12.enrich$merge2)
par(mar=c(5,5,0,0.5), mfrow = c(1,3), oma=c(5,0,2,0))
# Colour each peak pair by whether it passes the IDR threshold
idr_output$col <- ifelse(idr_output$IDR <= 0.01, "black", "red")
plot(log(idr_output$sig.value1), log(idr_output$sig.value2), col=idr_output$col, pch=19,
    xlab="log(signal) Rep1", ylab="log(signal) Rep2")
legend("topleft", c("IDR>0.01","IDR<=0.01"), col=c("red","black"), pch=19,
    bty="n")
plot(rank(-idr_output$sig.value1), rank(-idr_output$sig.value2), col=idr_output$col, pch=19,
    xlab="Peak rank Rep1", ylab="Peak rank Rep2")
legend("topleft", c("IDR>0.01","IDR<=0.01"), col=c("red","black"), pch=19,
    bty="n")
plot(ez.list$IDR, ylab="IDR", xlab="num of significant peaks")
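Two details of the plotting code are worth illustrating on made-up values: the colour classes should be mutually exclusive, so that a peak pair exactly at the threshold gets a single colour, and rank(-x) assigns rank 1 to the strongest peak. The vectors below are hypothetical.

```r
# Hypothetical IDR values; the second one sits exactly on the threshold.
toy.idr <- c(0.002, 0.010, 0.300)

# One colour per peak pair: black if it passes the threshold, red otherwise.
toy.col <- ifelse(toy.idr <= 0.01, "black", "red")

# Hypothetical significance values; negating them makes the largest rank first.
toy.signal <- c(5, 50, 20)
toy.rank <- rank(-toy.signal)

print(data.frame(IDR = toy.idr, col = toy.col, signal = toy.signal, rank = toy.rank))
```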

ChIP-seq data analysis: from quality check to motif discovery and more

Data reproduction exercise: Consistency of ChIP-seq replicates - analysis using IDR
