**Solber**

**SOL**ution **B**reed**ER**

*Solber is a simple IDL optimization routine loosely based on genetic algorithms*

**Contents**

Download. Quick description

Multiple minima. A simple example

Period search. Termination and inbreeding

Mixture of Gaussians

**email** vasily at ast.cam.ac.uk

Download solber.pro

Solber is inspired
by PIKAIA,
a FORTRAN optimizer based on genetic algorithms. GAs and PIKAIA are
described
in *Genetic
Algorithms in Astronomy and Astrophysics* by Paul Charbonneau.

Solber is originally written for chi2 fitting so minimum of fitness function is searched for (rather than maximum).

Solber produces generations of models using arithmetical crossover
(i.e. real-coded, see *Genetic Algorithms + Data Structures =
Evolution Programs*, Michalewicz 1992) and random uniform
mutation. Roulette wheel is used to select individuals for breeding
based on the fitness ranking and the parents are replaced with the
fitter children. Elitism is enforced, i.e. the fittest individual
always survives. Three simple tricks are implemented to guard against
inbreeding: varying mutation rate, limit on fraction of population
replaced and the mutation of the entire population if inbreeding
persists for long enough.

Figure 1 |
Fitting a sine wave to a poorly/irregularly sampled dataset is a good example of how applying a conventional gradient descent optimization is not always going to return the right answer. Panel 1 of Figure 1 shows the data (black) generated from the model (grey). From the second panel of Figure 1 it is clear that the chi2 landscape (black curve) has a number of local minima where the solution will get trapped. Initial conditions must be varied to restart the optimizer by hand in the vicinity of each minima. Solber can sample the parameter space slightly more efficiently than the brute force chi2 minimization. Grey lines in Panel 2 show the locations of intermediate generations of solber models. This picture is typical of GA behavior: the landscape is surveyed in a Monte-Carlo like fashion, but closer attention is paid to the minima encountered. Of course, there is already a powerful and fast solution to this
kind of problem (see
e.g. However, highest peak in a periodogram does not always point to the correct period as shown in Figure 2. Here, a valley lies at the location of the underlying period rather than a peak (grey). Solber identifies the period correctly. |
Figure 2 |

Second example is more realistic as the model now includes sine
signal with unknown period, amplitude and phase on top of unknown
constant background: 4 free parameters in total. The actual values of
background, amplitude, period and phase are `[37.2, 7.15, 0.333,
0.5*!pi]`. Below are three examples where highest peak of the
periodogram fails to identify the correct period.

Solber calling sequence includes
Solber terminates either when In all cases solber quickly converges to a reasonable solution, which, however, is not good enough to terminate breeding. The difference between models with the first and the 50th percentile in fitness becomes too small and the mutation rate is increased to stop the onset of inbreeding. In Example 1, this quickly helps to identify the model with desirable chi2. In Example 2, the increased mutation rate in children does not help solber to converge fast enough and inbreeding is detected after approximately 50 generations. Then, all (but the best) individuals in the population are mutated - this generates healthy variety as shown by the rise in dotted curve (top panel) and results in quick termination with the desired chi2. Finally, in Example 3, required chi2 can not be reached after several cycles of increasing mutation and solber carries on until maximum number of generations is produced. The solution, however, has only marginally worse chi2. | ||

Logarithm of likelihood is maximised to obtain centers, dispersions
and weights for Gaussians in a mixture. Solid curves - individual
Gaussian components (panel 1 for model, panel 2 for solber), thick
grey - total model pdf, histogram - model realisation, thick black -
solber solution. Dotted line in panel 4 is at -2 ln (L_{i}/L_{i+1}) = 3,
i.e. the difference in degrees of freedom between consecutive models.