Solber is a simple IDL optimization routine loosely based on genetic algorithms
Download. Quick description
Multiple minima. A simple example
Period search. Termination and inbreeding
Mixture of Gaussians
email vasily at ast.cam.ac.uk
Solber is inspired by PIKAIA, a FORTRAN optimizer based on genetic algorithms. GAs and PIKAIA are described in Genetic Algorithms in Astronomy and Astrophysics by Paul Charbonneau.
Solber is originally written for chi2 fitting so minimum of fitness function is searched for (rather than maximum).
Solber produces generations of models using arithmetical crossover (i.e. real-coded, see Genetic Algorithms + Data Structures = Evolution Programs, Michalewicz 1992) and random uniform mutation. Roulette wheel is used to select individuals for breeding based on the fitness ranking and the parents are replaced with the fitter children. Elitism is enforced, i.e. the fittest individual always survives. Three simple tricks are implemented to guard against inbreeding: varying mutation rate, limit on fraction of population replaced and the mutation of the entire population if inbreeding persists for long enough.
Fitting a sine wave to a poorly/irregularly sampled dataset is a good example of how applying a conventional gradient descent optimization is not always going to return the right answer. Panel 1 of Figure 1 shows the data (black) generated from the model (grey).
From the second panel of Figure 1 it is clear that the chi2 landscape (black curve) has a number of local minima where the solution will get trapped. Initial conditions must be varied to restart the optimizer by hand in the vicinity of each minima.
Solber can sample the parameter space slightly more efficiently than the brute force chi2 minimization. Grey lines in Panel 2 show the locations of intermediate generations of solber models. This picture is typical of GA behavior: the landscape is surveyed in a Monte-Carlo like fashion, but closer attention is paid to the minima encountered.
Of course, there is already a powerful and fast solution to this kind of problem (see e.g. Fast algorithm for spectral analysis of unevenly sampled data). Panel 3 shows Lomb-Scargle periodogram of the data. The location of the highest peak in the spectrum corresponds to the underlying period (grey). This is not very different from the solution found by solber (black).
However, highest peak in a periodogram does not always point to the correct period as shown in Figure 2. Here, a valley lies at the location of the underlying period rather than a peak (grey). Solber identifies the period correctly.
Second example is more realistic as the model now includes sine signal with unknown period, amplitude and phase on top of unknown constant background: 4 free parameters in total. The actual values of background, amplitude, period and phase are [37.2, 7.15, 0.333, 0.5*!pi]. Below are three examples where highest peak of the periodogram fails to identify the correct period.
Solber calling sequence includes dimensionality of the model ndim, population size npop together with limits lim on each parameter. Fitness function arguments are passed in the structure funa:
sol = solber('_sin2_chi2', ndim, funa = funa, npop = 300, lim = [[1e-6, 100], [1e-6, 10], [0, 1], [0, 2*!pi]], plot_flag = 2, ngen_max = 500, term_fit = 11, status = status, ngen_tot = ngen_tot, gfit_best = gfit_best)
Solber terminates either when desirable fitness term_fit is reached or when maximum number of generations ngen_max is produced. Figures below show fitness evolution (top panels) for each of the three examples above, together with changes in mutation rate and fraction replaced (bottom panels).
In all cases solber quickly converges to a reasonable solution, which, however, is not good enough to terminate breeding. The difference between models with the first and the 50th percentile in fitness becomes too small and the mutation rate is increased to stop the onset of inbreeding. In Example 1, this quickly helps to identify the model with desirable chi2. In Example 2, the increased mutation rate in children does not help solber to converge fast enough and inbreeding is detected after approximately 50 generations. Then, all (but the best) individuals in the population are mutated - this generates healthy variety as shown by the rise in dotted curve (top panel) and results in quick termination with the desired chi2. Finally, in Example 3, required chi2 can not be reached after several cycles of increasing mutation and solber carries on until maximum number of generations is produced. The solution, however, has only marginally worse chi2.
Logarithm of likelihood is maximised to obtain centers, dispersions and weights for Gaussians in a mixture. Solid curves - individual Gaussian components (panel 1 for model, panel 2 for solber), thick grey - total model pdf, histogram - model realisation, thick black - solber solution. Dotted line in panel 4 is at -2 ln (Li/Li+1) = 3, i.e. the difference in degrees of freedom between consecutive models.