/***********************************************
************************************************
**                                            **
**                   MADAP                    **
**   fits mixtures of normal distributions    **
**  and performs postprocessing steps for EPD **
**          2007 Mauro C. Delorenzi           **
**                                            **
************************************************
************************************************/

Version 2.0   last update: 13/10/2010
   ** Add ShowUsage routine   G. Ambrosini
Version 1.0   last update: 28/01/2007

MADAP : fits a mixture of normal distributions to one-dimensional data,
        with postprocessing developed for automated processing of EPD data

It is made available without any guarantee and distributed under the
GNU GPL licence (see http://www.gnu.org/licenses/gpl.txt),
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
BUT NOT LIMITED TO, .. FITNESS FOR A PARTICULAR PURPOSE.
Specifically, it is used entirely at the user's risk.

Compiler: g++
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

USAGE EXAMPLE:

madapv2.0 -n5 -s1 -C2 -c8 -e0.001 -d20 -u5 -w50 -p30 -M50 -m1 -f datapoints.txt
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

OPTIONS

[ -n minnbdatapoints]
      integer >= 0, minimal number of points in a component to retain it
      in the model
[ -s number_subtracted]
      integer >= 0, number of points subtracted at each position when
      searching for peak candidates
[ -C minimal_count]
      integer >= 0, minimal count required to accept a position for
      modelling and likelihood
[ -c fusionsdistance]
      integer >= 0, explained elsewhere, see kdefaultfusionsdist
[ -e errorfraction]
      double in [0,1], background model density
[ -d standard_deviation]
      double >= 0.5, user-defined constant standard deviation of the
      components (this fixes their variance)
[ -u refdist1]
      integer >= 0, the output shows the % of points of each cluster that
      lie closer to the peak than this distance
[ -w refdist2]
      as -u
[ -p minimalpeakdist]
      integer >= 0, minimal distance between peaks to retain components
      in the model
[ -M maximal_number_of_components]
      integer > 0, maximal number of components used in models
[ -m minimal_number_of_components]
      integer > 0, minimal number of components used in models

b) algorithmic options

[-i]  set minimal number of iterations
[-j]  set maximal number of iterations
[-k]  set difference of log-likelihood at which iterations stop
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

PRINCIPLE

The program tries to explain one-dimensional data as being generated by
a mixture of Gaussian distributions.
First the distribution parameters are determined by EM.
Then each data point is assigned to one of the Gaussian components or to
an additional "error" model. The latter has a user-defined prior that
represents the estimated probability of a data point not being a valid
measurement.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

DESCRIPTION

The main parameters to be chosen by the user according to their data and
scope are:

A. FITTING ALGORITHM:
   1. the variance of the components
      user defined: option -d (given as a standard deviation!)
   2. the estimated probability of a data point not being a valid
      measurement
      user defined: option -e

B. POSTPROCESSING ALGORITHM:
   3. components with peaks nearer than minimalpeakdist are not tolerated;
      the component with the lower prior probability is eliminated.
      user defined: option -p
   4. components with fewer than a given number of points are eliminated
      user defined: option -n
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

PARAMETERS

Defaults are defined in the global.h file.
The program uses static memory; the amount of memory allocated can be
changed in the same file.
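
ILLUSTRATION (fitting)

The fitting principle described above can be sketched in a few lines.
This is only a minimal sketch and not the MADAP source. It assumes that
the error model is a uniform density over the data range, that the
starting means are supplied by the caller, and that the error prior is
held fixed during EM. The names MixtureFit, normalPdf and
fitFixedSigmaMixture are hypothetical; sigma and eps stand for the values
given with -d and -e, maxIter and tol for those given with -j and -k.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical result container; not a MADAP type.
struct MixtureFit {
    std::vector<double> mean;    // component means (peak positions)
    std::vector<double> weight;  // mixing weights of the Gaussians, sum to 1
    double logLik = 0.0;         // log-likelihood evaluated in the last E-step
    int iterations = 0;          // number of EM iterations performed
};

// Density of a normal distribution with mean mu and standard deviation sigma.
inline double normalPdf(double x, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

// EM for Gaussians sharing one fixed sigma, plus a uniform "error" component
// whose prior eps is set by the user and never re-estimated (assumption).
MixtureFit fitFixedSigmaMixture(const std::vector<double>& data,
                                std::vector<double> initMean,
                                double sigma, double eps,
                                int maxIter = 200, double tol = 1e-8) {
    assert(!data.empty() && !initMean.empty());
    assert(sigma > 0.0 && eps > 0.0 && eps < 1.0);
    const std::size_t k = initMean.size();
    const auto mm = std::minmax_element(data.begin(), data.end());
    // Assumption: background is uniform over the data range (at least 1 wide).
    const double range = std::max(*mm.second - *mm.first, 1.0);
    const double errDensity = eps / range;

    MixtureFit fit;
    fit.mean = std::move(initMean);
    fit.weight.assign(k, 1.0 / static_cast<double>(k));
    double prevLogLik = -std::numeric_limits<double>::infinity();
    std::vector<double> resp(k), sumResp(k), sumRespX(k);

    for (int it = 1; it <= maxIter; ++it) {
        std::fill(sumResp.begin(), sumResp.end(), 0.0);
        std::fill(sumRespX.begin(), sumRespX.end(), 0.0);
        double logLik = 0.0;
        // E-step: responsibility of each Gaussian for each data point.
        for (double x : data) {
            double total = errDensity;
            for (std::size_t c = 0; c < k; ++c) {
                resp[c] = (1.0 - eps) * fit.weight[c] *
                          normalPdf(x, fit.mean[c], sigma);
                total += resp[c];
            }
            logLik += std::log(total);
            for (std::size_t c = 0; c < k; ++c) {
                const double r = resp[c] / total;
                sumResp[c] += r;
                sumRespX[c] += r * x;
            }
        }
        // M-step: update means and weights; sigma and eps stay fixed.
        double gaussMass = 0.0;
        for (double s : sumResp) gaussMass += s;
        for (std::size_t c = 0; c < k; ++c) {
            if (sumResp[c] > 0.0) fit.mean[c] = sumRespX[c] / sumResp[c];
            if (gaussMass > 0.0) fit.weight[c] = sumResp[c] / gaussMass;
        }
        fit.logLik = logLik;
        fit.iterations = it;
        // Stop when the log-likelihood gain drops below tol.
        if (it > 1 && logLik - prevLogLik < tol) break;
        prevLogLik = logLik;
    }
    return fit;
}
```

Because the standard deviation is fixed, the M-step only re-estimates the
means and the mixing weights. A point far from every peak keeps a nonzero
density through the error term, so it does not pull a component towards
itself.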
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

OUTPUT

Essential results are written to stdout and more results to files:

"summary"     contains the basic data of all models that were fitted
              and three "best models"
"components"  contains information on the fitted model, components
              after convergence
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
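
ILLUSTRATION (postprocessing)

The two postprocessing rules listed under DESCRIPTION (options -n and -p)
can be sketched as follows. This is a hedged illustration, not the MADAP
source: the Component struct and pruneComponents are hypothetical names,
and the order of the two rules (count filter first, then peak distance)
is an assumption, as is leaving the priors of the survivors untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one fitted component; not a MADAP type.
struct Component {
    double peak;   // position of the component mean
    double prior;  // prior (mixing) probability of the component
    int nPoints;   // number of data points assigned to the component
};

// Drops components with fewer than minPoints points, then resolves pairs of
// peaks nearer than minPeakDist by removing the one with the lower prior.
std::vector<Component> pruneComponents(std::vector<Component> comps,
                                       int minPoints, double minPeakDist) {
    // Rule for -n: too few points assigned to the component.
    comps.erase(std::remove_if(comps.begin(), comps.end(),
                               [minPoints](const Component& c) {
                                   return c.nPoints < minPoints;
                               }),
                comps.end());

    // Sort by position so that only neighbouring peaks need comparing.
    std::sort(comps.begin(), comps.end(),
              [](const Component& a, const Component& b) {
                  return a.peak < b.peak;
              });

    // Rule for -p: of two peaks that are too close, keep the stronger one.
    std::size_t i = 0;
    while (i + 1 < comps.size()) {
        if (comps[i + 1].peak - comps[i].peak < minPeakDist) {
            const std::size_t weaker =
                comps[i].prior < comps[i + 1].prior ? i : i + 1;
            comps.erase(comps.begin() + static_cast<std::ptrdiff_t>(weaker));
            // Do not advance: the survivor must be checked against its
            // new right-hand neighbour.
        } else {
            ++i;
        }
    }
    return comps;
}
```

With minPoints = 5 and minPeakDist = 30, the components (peak, prior,
points) (100, 0.5, 40), (120, 0.2, 10), (300, 0.25, 20), (500, 0.05, 3)
reduce to the peaks at 100 and 300: the last one has too few points and
the one at 120 is too close to a stronger neighbour.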