# Strand correlation plots and motif enrichment plots for ChIP-seq data
#-------------------------------------------------------------------------------

# Path to source data files (needs adjustment to the local data environment)

series_dir=/export/data/mga/hg19/ghahhari22
series_txt=$series_dir/ghahhari22.txt
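
# Optional sanity check for the paths above. This is a minimal sketch, not part
# of the original workflow; the helper name require_path is hypothetical.
require_path() {
  # $1: file or directory that must exist; warn on stderr and fail otherwise
  [ -e "$1" ] || { echo "Missing input: $1" >&2; return 1; }
}
# Example: require_path "$series_dir" && require_path "$series_txt"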

# Strand correlation plots for ChIP-seq data

 ./chipseq_strand_cor.pl $series_dir < $series_txt | sh
 ./chipseq_strand_cor_plot.pl ghahhari22 < $series_txt | Rscript --vanilla -
# or alternatively: sh ghahhari22_strand_cor.sh 

# Motif enrichment plots for selected ChIP-seq data: 

 ./motif_enr.pl $series_dir ghahhari22_motif.inp tf2mat.tab < $series_txt  | sh
 ./motif_enr_plot.pl ghahhari22 ghahhari22_motif.inp < $series_txt \
  | Rscript --vanilla -
# or alternatively: sh ghahhari22_motif_enr.sh 

# Optional cleanup/installation steps:

# mv ghahhari22_strand_cor.pdf ../
# mv ghahhari22_motif_enr.pdf ../
# rm *sga
# rm results/*

# Notes:
#-------------------------------------------------------------------------------
# - $series_dir is the path to the directory where the sample SGA files are
#   located
# - $series_txt is the sample description file. Relevant fields are:
#   (1) Filename, (3) Feature, (4) Data-type, (5) Oriented 
# - ./chipseq_strand_cor.pl and ./motif_enr.pl are generic scripts, reusable for
#   other data series. They generate numerical data for X-Y plots, which are
#   stored as individual files in the ./results subdirectory.
# - ghahhari22_motif.inp is a series-specific driver file listing the
#   feature/motif combinations for which enrichment plots should be produced.
#   Note that multiple motif enrichment plots may be produced for the same 
#   feature type.
#   Fields: (1) feature (3rd field in $series_txt), (2) motif.
#   "motif" is a generic motif name that links the feature in the sample SGA
#   files to precomputed motif hitlists, and is further used as part of the
#   horizontal axis label in the motif enrichment plots.
# - tf2mat.tab is a file that maps motifs to precomputed motif hitlists. 
#   Fields: (1) motif, (2) feature name in hitlist, (3) full path to hitlist.
#   Note that the feature name in the hitlist may be different from the feature
#   in the sample SGA file. The directory path to the precomputed hitlists may
#   need adjustment to the local data environment. Note further that this
#   file is shared with other series and thus may contain lines that are not
#   used for this series.
# - ./chipseq_strand_cor_plot.pl and ./motif_enr_plot.pl are generic Perl
#   scripts that generate R code for producing multiple plots in a single
#   PDF file.
#   The series name is passed to the script as an argument and used as part
#   of the output file names. 
#   
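# To illustrate the field layout of the sample description file described
# above, the helper below prints its four relevant fields. This is a sketch
# only: it ASSUMES tab-separated fields (not confirmed in these notes), and the
# function name show_fields is hypothetical.
show_fields() {
  # $1: path to a sample description file
  # Prints (1) Filename, (3) Feature, (4) Data-type, (5) Oriented
  awk -F '\t' '{ print $1, $3, $4, $5 }' OFS='\t' "$1"
}
# Example: show_fields "$series_txt"
#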
# URL access to individual source files: 
#
# Replace /export/data/mga with https://ccg.epfl.ch/mga. For instance,
# the sample description file of this series is accessible at
#
#   https://ccg.epfl.ch/mga/hg19/ghahhari22/ghahhari22.txt
#
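# The path-to-URL rule above can be sketched as a small helper. This is an
# illustration, not part of the original workflow; the function name mga_url
# is hypothetical.
mga_url() {
  # $1: local path; the leading /export/data/mga is replaced by the URL prefix
  printf '%s\n' "$1" | sed 's|^/export/data/mga|https://ccg.epfl.ch/mga|'
}
# Example: mga_url "$series_txt"
#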
# Last updated: Philipp Bucher Oct 2022.
