ChIP-Part implements a partitioning algorithm for large-scale ChIP-Seq data sets. It is specifically designed to find very large signal-enriched regions, such as those in histone modification maps where the mark spreads over broad domains (e.g. H3K36me3).
The input of ChIP-Part is a set of positions of ChIP-Seq tags mapped to a reference genome. The working format is a simplified GFF format called SGA, which is sorted by sequence name and position.
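As a rough illustration of this sort order, the sketch below sorts tag records by sequence name and then by numeric position. It assumes tab-separated lines with five fields (sequence name, feature, position, strand, count). That field layout, the sample values, and the helper name `sort_sga_lines` are assumptions made for illustration and are not stated in the description above.

```python
def sort_sga_lines(lines):
    """Sort SGA-like lines by sequence name, then by numeric position."""
    # Skip blank lines and split each record into its tab-separated fields.
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # Assumed layout: name, feature, position, strand, count.
    records.sort(key=lambda fields: (fields[0], int(fields[2])))
    return ["\t".join(fields) for fields in records]


# Hypothetical tag records, deliberately out of order.
example = [
    "chr2\tH3K36me3\t500\t+\t1",
    "chr1\tH3K36me3\t1200\t-\t2",
    "chr1\tH3K36me3\t300\t+\t1",
]
sorted_lines = sort_sga_lines(example)
```

Sorting the position as an integer matters here: a plain string sort would place position 1200 before position 300.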
In addition to SGA, ChIP-Part supports other input data formats such as BED, GFF, BAM, and FPS. Compressed input data in gzip or zip format is also accepted.
ChIP-Part returns a list of signal-enriched regions defined by start and end positions.
The default output format of the tool is a two-line-oriented SGA file, in which each edge of a signal-enriched region is represented by its own SGA line: '+' marks the start and '-' marks the end.
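The two-line-oriented output can be turned back into regions by pairing each '+' line with the next '-' line on the same sequence. The following is a minimal sketch of that pairing, not the tool's own code. It assumes tab-separated SGA lines with the sequence name in the first field, the position in the third, and the '+'/'-' marker in the fourth. The function name `pair_region_edges` and the sample coordinates are hypothetical.

```python
def pair_region_edges(lines):
    """Pair '+' (start) and '-' (end) lines into (name, start, end) tuples."""
    regions = []
    open_start = None  # (sequence name, start position) awaiting its end line
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: name, feature, position, strand marker, count.
        name, position, marker = fields[0], int(fields[2]), fields[3]
        if marker == "+":
            open_start = (name, position)
        elif marker == "-" and open_start is not None and open_start[0] == name:
            regions.append((name, open_start[1], position))
            open_start = None
    return regions


# Hypothetical output describing two enriched regions on chr1.
output_lines = [
    "chr1\tH3K36me3\t1000\t+\t1",
    "chr1\tH3K36me3\t25000\t-\t1",
    "chr1\tH3K36me3\t40000\t+\t1",
    "chr1\tH3K36me3\t90000\t-\t1",
]
regions = pair_region_edges(output_lines)
```

An end line without a matching start on the same sequence is silently skipped in this sketch; a stricter reader might raise an error instead.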
In addition to SGA format, single-line-oriented GFF, BED, and FPS output formats are also provided.
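To show what "single-line-oriented" means in practice, the sketch below writes one region as one BED line. BED uses 0-based, half-open coordinates, so the start is shifted down by one. The sketch assumes that region positions are 1-based and inclusive, which is not stated above. The function name `region_to_bed` and the default label are hypothetical, and the tool's own BED output may carry different columns.

```python
def region_to_bed(name, start, end, label="region"):
    """Format a 1-based inclusive region as a 4-column BED line."""
    # BED start is 0-based; BED end is exclusive, so it equals the
    # 1-based inclusive end position unchanged.
    return f"{name}\t{start - 1}\t{end}\t{label}"


# Hypothetical region spanning positions 1000 to 25000 on chr1.
bed_line = region_to_bed("chr1", 1000, 25000)
```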
For supported genome assemblies, a direct link to the UCSC genome browser is further provided for rapid comparison with genome annotations.
As a further option, sequences around signal-enriched region bounds can be extracted to a file in FASTA format. Sequence extraction is carried out using the FPS-formatted output.