Why is there a valley at position zero in an autocorrelation plot?
In autocorrelation mode (i.e. when the reference feature is the same as the target feature), target features located at exactly the same position as a reference feature are disregarded, which can produce a valley at position zero. This happens, for instance, with peak files: in such cases, the width of the valley corresponds roughly to the peak width.
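As an illustration only, the exclusion described above can be sketched in Python. This is not the tool's implementation: the function name `autocorrelation_counts`, the `window` parameter, and the toy positions are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def autocorrelation_counts(positions, window):
    """Count target features at each relative distance from every reference.

    Hypothetical helper: reference and target are the same feature set,
    and targets at exactly the reference position (distance 0) are skipped.
    """
    pos = sorted(positions)
    hist = Counter()
    for ref in pos:
        # Restrict the search to targets within +/- window of the reference.
        lo = bisect_left(pos, ref - window)
        hi = bisect_right(pos, ref + window)
        for target in pos[lo:hi]:
            distance = target - ref
            if distance == 0:
                continue  # same position as the reference: disregarded
            hist[distance] += 1
    return hist


# Toy data (made up): two features share position 100.
hist = autocorrelation_counts([100, 100, 105, 110, 300], window=20)
for distance in sorted(hist):
    print(distance, hist[distance])
```

Because every pair at distance zero is skipped, including distinct features that happen to share a position, the count at position zero is always zero in this sketch, while neighbouring distances keep their counts.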
Should one always see a valley in autocorrelation mode?
The shape of the distribution depends on how the reads are distributed along the genome. If reads tend to accumulate at given genomic loci, as is the case for most ChIP-seq read distributions, we will observe a peak centered around the average distance between reads. In autocorrelation mode, we simply disregard the correlation of each read with itself.
What does the value on the y-axis of the correlation plot represent?
The value on the y-axis represents tag counts in most cases. In some cases, depending on the type of genomic feature, it may represent conservation scores. We call this variable 'counts'; it is an integer variable.
The value on the y-axis can be normalized to take into account the coverage of a ChIP-seq experiment. We can therefore plot count-density values or global coverage fold-change instead of raw counts.
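The two normalizations can be illustrated with a minimal Python sketch. The exact formulas are not stated here, so the definitions below are assumptions: count density is taken as target counts per base pair per reference feature, and global fold-change as that density divided by the genome-wide average target density. All function names, parameter names, and numbers are hypothetical.

```python
def count_density(counts, n_reference, bin_width):
    """Assumed definition: target counts per bp per reference feature."""
    return [c / (n_reference * bin_width) for c in counts]


def global_fold_change(counts, n_reference, bin_width,
                       total_target_counts, genome_length):
    """Assumed definition: count density relative to the genome average."""
    background = total_target_counts / genome_length  # mean counts per bp
    densities = count_density(counts, n_reference, bin_width)
    return [d / background for d in densities]


# Toy example (made-up numbers): three histogram bins of width 2 bp.
raw_counts = [10, 40, 10]
density = count_density(raw_counts, n_reference=10, bin_width=2)
fold = global_fold_change(raw_counts, n_reference=10, bin_width=2,
                          total_target_counts=1000, genome_length=2000)
print(density)
print(fold)
```

Under these assumed definitions, a fold-change of 1 means the bin holds as many target counts as expected from the genome-wide average, which makes plots from experiments with different coverage easier to compare.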