/***********************************************
************************************************
** **
** MADAP **
** fits mixtures of normal distributions **
** and performs postprocessing steps for EPD **
** 2007 Mauro C. Delorenzi **
** **
************************************************
************************************************/
Version 2.0 last update: 13/10/2010
** Add ShowUsage routine G. Ambrosini
Version 1.0 last update: 28/01/2007
MADAP fits a mixture of normal distributions
to one-dimensional data, with postprocessing developed for
automated processing of EPD data.
It is made available without any guarantee
and distributed under the GNU GPL licence
(see http://www.gnu.org/licenses/gpl.txt),
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ..
FITNESS FOR A PARTICULAR PURPOSE.
Specifically, it is used entirely at the user's risk.
Compiler: g++
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
USAGE EXAMPLE:
madapv2.0 -n5 -s1 -C2 -c8 -e0.001 -d20 -u5 -w50 -p30 -M50 -m1 -f datapoints.txt
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
OPTIONS
a) model and postprocessing options
[ -n minnbdatapoints] integer >= 0, minimal number of points in a component to retain it in the model
[ -s number_subtracted] integer >= 0, number of points at each position subtracted when searching for peak candidates
[ -C minimal_count] integer >= 0, minimal count required to accept a position for modelling and likelihood
[ -c fusionsdistance] integer >= 0, explained elsewhere, see kdefaultfusionsdist
[ -e errorfraction] double in [0,1], prior of the background (error) model: estimated probability of a data point not being a valid measurement
[ -d standard_deviation] double >= 0.5, user-defined constant standard deviation (not the variance)
[ -u refdist1] integer >= 0, reference distance: the output shows the % of points of a cluster closer to the peak than this
[ -w refdist2] second reference distance, used as -u
[ -p minimalpeakdist] integer >= 0, minimal distance between peaks for components to be retained in the model
[ -M maximal_number_of_components] integer > 0, maximal number of components used in models
[ -m minimal_number_of_components] integer > 0, minimal number of components used in models
b) algorithmic options
[-i] set minimal number of iterations
[-j] set maximal number of iterations
[-k] set the difference in log-likelihood below which iterations stop
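The percentage reported for the reference distances (options -u and -w) can be illustrated with a minimal sketch. The function name percentWithin is hypothetical and not taken from the MADAP sources, and the strict "closer than" comparison is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not taken from the MADAP sources.
// Returns the percentage of `points` lying closer to `peak` than `refDist`.
// Assumption: "closer than" is a strict comparison.
double percentWithin(const std::vector<double>& points, double peak, double refDist) {
    assert(refDist >= 0.0);
    if (points.empty()) return 0.0;  // nothing to count in an empty cluster
    int inside = 0;
    for (double x : points) {
        if (std::fabs(x - peak) < refDist) ++inside;
    }
    return 100.0 * inside / points.size();
}
```

Called once with refdist1 and once with refdist2 for the points of a cluster, this would give the two percentages shown per component.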
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
PRINCIPLE
The program tries to explain one-dimensional data as being generated by
a mixture of Gaussian distributions. First the distribution parameters
are determined by EM (expectation-maximization). Then each data point is
assigned to one of the Gaussian components
or to an additional "error" model. The latter has a user-defined prior that
represents the estimated probability of a data point not being a valid measurement.
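The principle can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the MADAP implementation: the names Mixture, normalPdf, emStep and assign are hypothetical, all components share one fixed standard deviation (cf. option -d), and the error model is assumed to be a uniform density over a range supplied by the caller, with a fixed prior (cf. option -e).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types and names; none of them are taken from the MADAP sources.
struct Mixture {
    std::vector<double> mean;   // one mean per Gaussian component
    std::vector<double> prior;  // component priors, summing to 1 - errorFraction
};

// Density of a normal distribution with mean mu and standard deviation sd.
double normalPdf(double x, double mu, double sd) {
    const double kPi = 3.14159265358979323846;
    const double z = (x - mu) / sd;
    return std::exp(-0.5 * z * z) / (sd * std::sqrt(2.0 * kPi));
}

// One EM iteration with a fixed, shared standard deviation and a fixed-prior
// error model. Assumption: the error model is uniform over `range`.
// Returns the log-likelihood of the data under the model before the update.
double emStep(const std::vector<double>& data, Mixture& m,
              double sd, double errorFraction, double range) {
    assert(!data.empty() && sd > 0.0 && range > 0.0);
    assert(errorFraction > 0.0 && errorFraction < 1.0);
    const std::size_t k = m.mean.size();
    std::vector<double> weight(k, 0.0), weightedSum(k, 0.0), resp(k, 0.0);
    double logLik = 0.0;
    for (double x : data) {
        // E-step: unnormalized posterior of each component for x.
        double total = errorFraction / range;  // error model contribution
        for (std::size_t j = 0; j < k; ++j) {
            resp[j] = m.prior[j] * normalPdf(x, m.mean[j], sd);
            total += resp[j];
        }
        logLik += std::log(total);
        for (std::size_t j = 0; j < k; ++j) {
            weight[j] += resp[j] / total;
            weightedSum[j] += x * resp[j] / total;
        }
    }
    // M-step: update means and priors; the error prior stays fixed.
    double totalWeight = 0.0;
    for (double w : weight) totalWeight += w;
    for (std::size_t j = 0; j < k; ++j) {
        if (weight[j] > 0.0) m.mean[j] = weightedSum[j] / weight[j];
        if (totalWeight > 0.0) m.prior[j] = (1.0 - errorFraction) * weight[j] / totalWeight;
    }
    return logLik;
}

// Assigns x to the component with the largest posterior, or returns -1 when
// the error model is more probable than every Gaussian component.
int assign(double x, const Mixture& m, double sd, double errorFraction, double range) {
    int best = -1;
    double bestScore = errorFraction / range;
    for (std::size_t j = 0; j < m.mean.size(); ++j) {
        const double score = m.prior[j] * normalPdf(x, m.mean[j], sd);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(j);
        }
    }
    return best;
}
```

A driver would repeat emStep until the returned log-likelihood changes by less than a threshold (cf. option -k), within the iteration limits (cf. options -i and -j), and then call assign for every data point.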
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
DESCRIPTION
The main parameters to be chosen by the user according to their data and scope are:
A. FITTING ALGORITHM:
1. the variance
user defined: option -d (standard deviation!)
2. the estimated probability of a data point not being a valid measurement
user defined: option -e
B. POSTPROCESSING ALGORITHM:
3. components with peaks nearer than minimalpeakdist are not tolerated; the component with
the lower prior probability is eliminated.
user defined: option -p
4. components with fewer than a given number of points are eliminated
user defined: option -n
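The two postprocessing rules can be sketched as follows. This is an illustration under assumptions, not the MADAP code: the Component type and the prune function are hypothetical, the rules are applied in the order listed above, "nearer than" is taken as a strict comparison, and close peaks are resolved greedily by visiting components in order of decreasing prior. Whether the program refits the model after an elimination is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of a fitted component; not a type from the MADAP sources.
struct Component {
    double peak;    // position of the component mean
    double prior;   // prior probability of the component
    int    points;  // number of data points assigned to the component
};

// Applies rule 3 (minimal peak distance) and then rule 4 (minimal number of
// points). Returns the surviving components ordered by peak position.
std::vector<Component> prune(std::vector<Component> comps,
                             double minimalPeakDist, int minPoints) {
    assert(minimalPeakDist >= 0.0 && minPoints >= 0);
    // Visit components by decreasing prior, so that of two close peaks
    // the one with the lower prior is the one dropped.
    std::sort(comps.begin(), comps.end(),
              [](const Component& a, const Component& b) { return a.prior > b.prior; });
    std::vector<Component> kept;
    for (const Component& c : comps) {
        bool tooClose = false;
        for (const Component& k : kept) {
            if (std::fabs(c.peak - k.peak) < minimalPeakDist) tooClose = true;
        }
        if (!tooClose) kept.push_back(c);
    }
    // Drop components supported by too few data points.
    kept.erase(std::remove_if(kept.begin(), kept.end(),
                              [minPoints](const Component& c) { return c.points < minPoints; }),
               kept.end());
    // Restore the order by peak position for reporting.
    std::sort(kept.begin(), kept.end(),
              [](const Component& a, const Component& b) { return a.peak < b.peak; });
    return kept;
}
```

With peaks at 100, 110, 200 and 300, a minimal peak distance of 30 and a minimum of 5 points, the peak at 110 is dropped if its prior is lower than that of the peak at 100, and any remaining component holding fewer than 5 points is dropped afterwards.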
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
PARAMETERS
Defaults are defined in the global.h file.
The program uses static memory; the amount of memory allocated can be changed in
the same file.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
OUTPUT
Essential results are written to stdout; further results are written to files:
"summary"
contains the basic data of all models that were fitted and three "best models"
"components"
contains information on the components of the fitted model after convergence
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _