ChIP-seq data analysis: from quality check to motif discovery and more

Lausanne, 27 April - 1 May 2015

ChIP-partitioning tool: Alignment of MNase tags around NF-kB sites

Sunil Kumar and Philipp Bucher


Introduction

This exercise is based on the following paper:

A. Heatmaps of MNase midpoints (columns 1–2) and DNase I cuts (column 3) surrounding 1000 randomly sampled ChIP-seq peaks for CTCF, NF-kB, Irf4, GABP and C-fos. Heatmap rows are ordered from top to bottom by the nucleosome array log likelihood ratio (LLR). B. Aggregation plot for MNase midpoint and DNase I cutsite depths across all regions and for the subset of regions with LLR>500.

We will focus on only part of the figure and explore the MNase pattern around NF-kB sites before and after 'alignment', which in our case is performed with the ChIP-partitioning algorithm.

ChIP partitioning method

In this exercise we will use probabilistic partitioning methods developed by our group to discover significant patterns in ChIP-seq data [Nair et al., 2014]. These methods take into account signal magnitude, shape, strand orientation and shifts. We compared them with existing methods and demonstrated significant improvements, especially with sparse data. Besides pattern discovery and classification, probabilistic partitioning can serve other purposes in ChIP-seq data analysis.
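To make the idea concrete, here is a minimal sketch of EM partitioning with shifts and strand flips. It is written in Python rather than the R used in this exercise, and it is a simplified stand-in, not the code from Nair et al. The sketch makes three assumptions. First, each class is a Poisson rate profile of length `width`. Second, bins outside the class window follow a single fixed background rate. Third, the function name `em_poisson_shift` and its parameters are hypothetical. The E-step scores each region under every (class, shift, flip) state. The M-step then re-estimates the state priors and class profiles from the responsibility-weighted windows.

```python
import math
import random


def em_poisson_shift(data, k, width, n_iter=20, seed=0):
    """Toy EM partitioning of count profiles with shifts and strand flips.

    data  : list of equal-length non-negative integer count vectors
    k     : number of classes
    width : length of each class profile; shifts run over 0..n - width
    Returns (profiles, prior, post): the class rate profiles, the prior over
    (class, shift, flip) states and one posterior dict per region.
    """
    n = len(data[0])
    n_shift = n - width + 1
    eps = 1e-6  # pseudocount keeping priors and rates strictly positive
    # Fixed background rate (mean count per bin) for bins outside the window.
    bg = max(sum(map(sum, data)) / (len(data) * n), eps)
    rng = random.Random(seed)
    # Random positive starting profiles scattered around the background rate.
    profiles = [[bg * (0.5 + rng.random()) for _ in range(width)]
                for _ in range(k)]
    states = [(c, s, f) for c in range(k) for s in range(n_shift) for f in (0, 1)]
    prior = {st: 1.0 / len(states) for st in states}

    def window(x, s, f):
        win = list(x[s:s + width])
        return win[::-1] if f else win  # f == 1 means reverse orientation

    post = []
    for _ in range(n_iter):
        # E-step: Poisson log-likelihood ratio of the class profile versus the
        # background over the window (the log(x!) terms cancel in the ratio).
        post = []
        for x in data:
            scores = {}
            for (c, s, f) in states:
                llr = sum(v * math.log(lam / bg) - (lam - bg)
                          for v, lam in zip(window(x, s, f), profiles[c]))
                scores[(c, s, f)] = math.log(prior[(c, s, f)]) + llr
            top = max(scores.values())
            z = sum(math.exp(v - top) for v in scores.values())
            post.append({st: math.exp(v - top) / z for st, v in scores.items()})
        # M-step: state priors from the summed responsibilities ...
        for st in states:
            prior[st] = ((sum(p[st] for p in post) + eps)
                         / (len(data) + eps * len(states)))
        # ... and each class profile as the responsibility-weighted mean
        # window (approximately the Poisson maximum-likelihood estimate).
        for c in range(k):
            acc = [0.0] * width
            wsum = 0.0
            for x, p in zip(data, post):
                for s in range(n_shift):
                    for f in (0, 1):
                        w = p[(c, s, f)]
                        wsum += w
                        for j, v in enumerate(window(x, s, f)):
                            acc[j] += w * v
            profiles[c] = [(a + eps) / (wsum + eps) for a in acc]
    return profiles, prior, post
```

All states share the background term for the bins of a region, so only the log-likelihood ratio over the window affects the posterior. This keeps the E-step cheap and lets regions be compared at different shifts without bias toward windows with fewer tags.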

In this exercise we will illustrate its merits in the context of peak finding and the partitioning of MNase patterns around the human transcription factor NF-kB.

Hints and recipes

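To identify patterns in an MNase dataset around a specific transcription factor, we need two datasets: the transcription factor binding sites (here NF-kB sites, used as the reference) and the MNase tags (the target).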

We will use the ChIP-Extract analysis module to generate a tag count matrix in defined bins around NF-kB sites. Select the parameters as shown in the picture below and then click Submit. No centering is used here because the MNase data are paired-end: the fragment positions are already known from the read pairs, so the tags do not need to be shifted toward the fragment center.
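The following sketch shows how such a tag count matrix is built. It bins tag positions relative to each reference site and mirrors the coordinates for sites on the minus strand. It illustrates the idea only and is not ChIP-Extract's implementation. The function name `count_matrix`, the tuple layouts and the `start`/`end`/`bin_size` parameters are assumptions. The tags are assumed to be MNase fragment midpoints already parsed (for example from an SGA file) into `(chrom, position, count)` tuples.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_matrix(sites, tags, start, end, bin_size):
    """Bin tag counts around reference sites (toy stand-in for a count table).

    sites    : list of (chrom, pos, strand) reference positions, e.g. NF-kB sites
    tags     : list of (chrom, pos, count) tag positions, e.g. MNase midpoints
    start/end: window relative to each site, covering [start, end) in bp
    bin_size : bin width in bp
    Coordinates are mirrored for '-' strand sites, so every row is oriented
    relative to the strand of its site.
    """
    n_bins = -(-(end - start) // bin_size)  # ceiling division
    # Index tags per chromosome, sorted by position, for fast range queries.
    by_chrom = defaultdict(list)
    for chrom, pos, count in tags:
        by_chrom[chrom].append((pos, count))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([p for p, _ in items], [c for _, c in items])

    matrix = []
    for chrom, pos, strand in sites:
        row = [0] * n_bins
        positions, counts = index.get(chrom, ([], []))
        if strand == "-":
            # relative = pos - tag, so keep tags in (pos - end, pos - start]
            lo = bisect_right(positions, pos - end)
            hi = bisect_right(positions, pos - start)
        else:
            # relative = tag - pos, so keep tags in [pos + start, pos + end)
            lo = bisect_left(positions, pos + start)
            hi = bisect_left(positions, pos + end)
        for i in range(lo, hi):
            rel = pos - positions[i] if strand == "-" else positions[i] - pos
            row[(rel - start) // bin_size] += counts[i]
        matrix.append(row)
    return matrix
```

For example, with a window from -10 to +10 bp and 5 bp bins, a tag 5 bp downstream of a '+' site lands in the last bin. The same tag relative to a '-' site at the same position lands in the second bin, because the coordinates are mirrored.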

Download the Ref SGA file and the table (TEXT format), and save the table as mnase_data.txt.
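To inspect the downloaded table outside R, you can use a sketch like the following. It reads the table into a list of rows and computes the column means, which is what an aggregation plot displays. It assumes the table holds one whitespace-separated row of counts per site, with no header or label columns. The helper names are hypothetical, so adapt the parsing if your export differs.

```python
def parse_count_table(lines):
    """Parse whitespace-separated rows of counts (assumed: one row per site)."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        rows.append([float(v) for v in fields])
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows have different numbers of bins")
    return rows


def read_count_table(path):
    """Read a count table such as mnase_data.txt from disk."""
    with open(path) as fh:
        return parse_count_table(fh)


def aggregate_profile(rows):
    """Mean count per bin across all sites (the curve of an aggregation plot)."""
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Usage: `aggregate_profile(read_count_table("mnase_data.txt"))` gives the unaligned average MNase profile around the NF-kB sites.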

Performing ChIP-partitioning

The code is taken from the supplementary material of Nair et al., 2014, Probabilistic partitioning methods to find significant patterns in ChIP-Seq data, Bioinformatics, 30, 2406-2413, PMID 24812341.

Navigate to the directory containing all the data and launch R.

Load the EM function with shifting.
Read the data, define the input parameters and perform the partitioning.
Define the classes and shifts (shape-based EM partitioning with shifting).
Shift the tag data according to the inferred shifts.
Plot the results.
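The shifting and plotting steps come down to three operations. For each region, cut it at its most probable shift, reverse it if it was assigned the opposite orientation, and then average the aligned windows per class. The sketch below assumes that the posteriors are stored as dictionaries keyed by `(class, shift, flip)` tuples, as in a toy EM implementation. `map_states`, `align_regions` and `class_profiles` are hypothetical helpers, not functions from the exercise's R scripts.

```python
def map_states(posteriors):
    """Most probable (class, shift, flip) state for each region."""
    return [max(p, key=p.get) for p in posteriors]


def align_regions(data, states, width):
    """Cut each region at its shift, reverse it if flipped, group by class."""
    by_class = {}
    for x, (c, s, f) in zip(data, states):
        win = list(x[s:s + width])
        if f:
            win.reverse()  # bring reverse-oriented regions into one orientation
        by_class.setdefault(c, []).append(win)
    return by_class


def class_profiles(by_class):
    """Average aligned window per class, i.e. one aggregation curve per class."""
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in by_class.items()}
```

To get the "post-alignment" view, stack the aligned rows class by class into a heatmap and plot each class profile. The column means of the unshifted matrix give the "pre-alignment" view for comparison.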