Manual

Genetic Algorithm Control Module

In the Genetic Algorithm Control Module you can define the input parameters for a genetic algorithm optimization. This approach is generally carried out on a parallel computer cluster, where individual solute "genes" are simulated and their solutions are compared with the experimental data in the least-squares sense to evaluate their "fitness". Using paradigms from Darwinian evolution such as mutation, crossover, deletions and insertions, plague, and elitism, the solute combinations of the genes evolve over many "generations" into better-fitting solute combinations. To avoid lengthy evolution periods, it helps to initialize the system with known parameters that have been derived from other, perhaps model-independent, approaches, such as the van Holde - Weischet analysis or other distribution methods implemented in UltraScan.
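The evolution operators named above can be made concrete with the following Python sketch, which evolves a single deme against a least-squares fitness. It is a simplified illustration under stated assumptions, not the UltraScan implementation. Each gene is assumed to be a flat list of bounded parameters. The model function is a placeholder for the sedimentation simulation. Elitism is modeled as copying the best few genes unchanged, plague as replacing a gene with a random one, and mutation as a Gaussian step at one parameter position. Deletions, insertions, and exchange between demes are omitted. A zero seed is assumed to mean a non-repeatable run. The helper names `make_rng`, `misfit`, and `evolve_deme` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Gene = List[float]  # hypothetical encoding: one value per model parameter


def make_rng(seed: int) -> random.Random:
    """Non-zero seed: reproducible run. Zero (assumed): seeded from system randomness."""
    return random.Random(seed) if seed != 0 else random.Random()


def misfit(model: Callable[[Gene, float], float], gene: Gene,
           xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares fitness: sum of squared residuals (lower is fitter)."""
    return sum((model(gene, x) - y) ** 2 for x, y in zip(xs, ys))


def evolve_deme(fitness: Callable[[Gene], float], bounds: Sequence[Tuple[float, float]],
                population_size: int, generations: int, rng: random.Random,
                crossover_rate: float = 0.5, mutation_rate: float = 0.8,
                plague_rate: float = 0.05, n_elite: int = 2) -> Gene:
    """Evolve one deme and return its fittest gene."""
    def random_gene() -> Gene:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    population = [random_gene() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        # Elitism (simplified): the best genes survive unchanged
        next_gen = [list(g) for g in population[:n_elite]]
        parents = population[:max(2, population_size // 2)]  # fitter half breeds
        while len(next_gen) < population_size:
            if rng.random() < plague_rate:
                # Plague: this gene is lost and replaced by a random newcomer
                next_gen.append(random_gene())
                continue
            a, b = rng.sample(parents, 2)
            child = list(a)
            if rng.random() < crossover_rate:
                # Uniform crossover: each parameter is taken from either parent
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            if rng.random() < mutation_rate:
                # Mutation: Gaussian step at one parameter location, clipped to bounds
                i = rng.randrange(len(child))
                lo, hi = bounds[i]
                child[i] = min(hi, max(lo, child[i] + rng.gauss(0.0, 0.1 * (hi - lo))))
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)


if __name__ == "__main__":
    # Toy problem: recover y = 2x + 1 instead of a real sedimentation model
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]
    line = lambda g, x: g[0] * x + g[1]
    best = evolve_deme(lambda g: misfit(line, g, xs, ys), [(-5.0, 5.0)] * 2,
                       population_size=40, generations=300, rng=make_rng(42))
    print(best, misfit(line, best, xs, ys))
```

Because every gene is evaluated once per generation, the work in this sketch grows linearly with the number of generations and with the population size. Running it twice with the same non-zero seed reproduces the same result.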

The genetic algorithm approach works best for heterogeneous mixtures of non-interacting solutes whose frictional properties differ to a degree that cannot be satisfactorily fit with a single frictional ratio. This method permits determination of distributions of sedimentation, diffusion, and frictional coefficients, as well as molecular weight and partial concentration. More information about this method can be found in: Emre Brookes and Borries Demeler, "Genetic Algorithm Optimization for obtaining accurate Molecular Weight Distributions from Sedimentation Velocity Experiments", Analytical Ultracentrifugation VIII, Progr. Colloid Polym. Sci., C. Wandrey (Editor), Springer.
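The sedimentation coefficient and the frictional ratio of a solute together determine its molecular weight and diffusion coefficient. The sketch below solves the standard hydrodynamic relations for M:

- f0 = 6*pi*eta*R0, with R0 = (3*M*vbar / (4*pi*N_A))^(1/3)
- s = M*(1 - vbar*rho) / (N_A*f)
- D = R*T / (N_A*f)

It is a textbook illustration and not necessarily how UltraScan computes these values internally. The function name `mw_and_d_from_s_ff0` and the default solvent values (water at 20 °C, vbar = 0.72 mL/g) are assumptions.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
GAS_CONSTANT = 8.314462618    # J/(mol K)


def mw_and_d_from_s_ff0(s_svedberg: float, ff0: float, vbar: float = 0.72,
                        density: float = 0.998234, viscosity: float = 1.002,
                        temperature: float = 293.15) -> tuple:
    """Return (molecular weight in g/mol, diffusion coefficient in cm^2/s).

    Units: vbar in mL/g, density in g/mL, viscosity in cP, temperature in K.
    """
    s = s_svedberg * 1e-13       # Svedberg -> seconds
    vbar_si = vbar * 1e-3        # mL/g -> m^3/kg
    rho = density * 1e3          # g/mL -> kg/m^3
    eta = viscosity * 1e-3       # cP -> Pa*s
    buoyancy = 1.0 - vbar_si * rho
    if buoyancy <= 0.0:
        raise ValueError("solute must be denser than the solvent (vbar * rho < 1)")
    # R0 = r_factor * M**(1/3): radius of an anhydrous sphere of molar mass M (kg/mol)
    r_factor = (3.0 * vbar_si / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    # s = M * buoyancy / (N_A * ff0 * 6*pi*eta*R0), solved for M**(2/3)
    m_two_thirds = s * AVOGADRO * ff0 * 6.0 * math.pi * eta * r_factor / buoyancy
    molar_mass = m_two_thirds ** 1.5                                       # kg/mol
    f = ff0 * 6.0 * math.pi * eta * r_factor * molar_mass ** (1.0 / 3.0)   # kg/s per molecule
    diffusion = GAS_CONSTANT * temperature / (AVOGADRO * f)                # m^2/s
    return molar_mass * 1e3, diffusion * 1e4
```

For example, a 4.4 S solute with f/f0 = 1.3, vbar = 0.7339 mL/g and a density of 0.9982 g/mL comes out at roughly 65 kDa, with D of about 6.2e-7 cm^2/s. This shows why narrowing the f/f0 limits of a bin also narrows the molecular weight range the algorithm can explore.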

This module assists with the definition of the input parameters for the genetic algorithm when preparing a run on a parallel cluster. The initialization steps are:

Loading an s-value distribution for initialization
Selecting the maximum number of solutes present in the solution
Assigning individual solute bins from the distribution, either manually or automatically, to define the regions where the genetic algorithm should search for a solute
Adjusting the frictional ratio limits for each solute bin to restrict the search space in the frictional domain to a reasonable range. If 2-dimensional spectrum analysis distributions are used for initialization, they can also help define the frictional space; otherwise, the frictional ranges must be initialized by the user.
Adjusting the genetic algorithm parameters for the evolution operators
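The bins defined in the third and fourth steps can be pictured as one small record per solute, holding an s-value range and an f/f0 range. The Python sketch below is illustrative only. `SoluteBin`, `set_ff0_limits`, and the default f/f0 limits of 1.0 and 4.0 are hypothetical choices, not UltraScan's data structures. The sketch checks that each range is ordered and that f/f0 is not below 1.0, the value for a perfect sphere. It also mimics updating a single solute's f/f0 limits, as done in the Solute Selection Listbox.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SoluteBin:
    """Hypothetical container for one search region of the genetic algorithm."""
    s_min: float           # lower s-value limit (Svedberg)
    s_max: float           # upper s-value limit (Svedberg)
    ff0_min: float = 1.0   # most globular shape allowed (hypothetical default)
    ff0_max: float = 4.0   # least globular shape allowed (hypothetical default)

    def __post_init__(self) -> None:
        if not self.s_min < self.s_max:
            raise ValueError("s_min must be smaller than s_max")
        # f/f0 = 1 corresponds to a perfect sphere, so smaller values are rejected
        if not 1.0 <= self.ff0_min <= self.ff0_max:
            raise ValueError("need 1.0 <= ff0_min <= ff0_max")


def set_ff0_limits(bins: List[SoluteBin], index: int,
                   ff0_min: float, ff0_max: float) -> List[SoluteBin]:
    """Return a copy of bins with new (validated) f/f0 limits for one bin."""
    updated = list(bins)
    updated[index] = replace(bins[index], ff0_min=ff0_min, ff0_max=ff0_max)
    return updated
```

Freezing the records means an edit produces a new, re-validated bin rather than silently changing an existing one.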

Explanation of fields and buttons:

Population Size: The number of "genes" competing inside one deme. Each gene contains all the solute parameters needed to define one model.

Number of Demes: The number of populations that evolve largely independently of each other.

Crossover Rate: The rate at which random crossover events occur between genes, exchanging parameters between them.

Mutation Rate: The rate at which random mutation events occur at a single parameter location.

Plague: The rate at which entire genes are randomly lost.

Elitism: The rate at which a favorable parameter is maintained.

Random Seed: An integer value used to initialize the random number generator. Use a non-zero value for repeatability.

Regularization: Regularization applied to the distribution function.

Number of Generations: The number of iterations each deme is allowed to evolve. The larger this number, the better the final chi-square (up to a limit); however, the computational effort grows linearly with the number of generations.

Number of initial Solutes: Select the maximum number of solute bins in the solution. This number has to be set before solute bins are auto-assigned from an s-value distribution. If you select bins manually, the counter increases automatically as you add new solute bins graphically with the mouse. Adding too many solute bins significantly slows down convergence, since more solutes need to be calculated, but it improves the chance of a better final solution once the optimization has converged.

f/f0 minimum: The minimum of the frictional coefficient ratio, defining the most globular shape allowed for the selected solute bin.

f/f0 maximum: The maximum of the frictional coefficient ratio, defining the least globular shape allowed for the selected solute bin.

Help: Call up this help page.

Load Distribution: Retrieve a van Holde - Weischet, C(s), or 2-dimensional spectrum analysis distribution to initialize the genetic algorithm s-value range.

Autoassign Solute Bins: This function attempts to define solute bins based on the integral of the distribution. The algorithm extends the s-value range of the distribution by 5% at both the bottom and the top, and then splits this range into the number of initial solutes defined above. Instead of splitting the s-value range into intervals of equal width, it is divided into intervals of equal integral of the distribution function, so each bin contains approximately the same relative signal.

Reset Solute Bins: Clicking on this button will erase the existing bin distribution and allow you to start over.

Cancel: Cancel out of the genetic algorithm control module.

Accept: Accept the currently defined solute bins and genetic algorithm convergence parameters and exit the control module.

Solute Selection Listbox: Here you can modify the f/f0 ratios for each solute individually. When each bin is assigned, its f/f0 minimum and maximum are taken from the current settings of the f/f0 min/max counters. To change them, select the desired values in the respective counters and double-click on the solute in this listbox to update its f/f0 values with those shown in the counters.

Distribution Plot: Once a distribution is loaded into the plot window, you can assign the bins in which the genetic algorithm will search for appropriate parameter values, either manually or automatically. Using the sedimentation coefficient distribution from the van Holde - Weischet analysis, the C(s) analysis, or the 2-dimensional spectrum analysis as a guide, you define a bin by clicking first on its left (minimum s-value) limit and then on its right (maximum s-value) limit. Each bin should bracket a peak to define the s-value range of a solute. You can extend a bin below the lower limit or above the upper limit of the distribution to allow for s-values that may not have been found by the other analysis methods. As with the Number of initial Solutes, adding more bins slows convergence because more solutes must be calculated, but it improves the chance of a better final solution once the optimization has converged.
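The equal-signal splitting performed by Autoassign Solute Bins can be sketched in Python as follows. This is an illustration under stated assumptions, not UltraScan's code. The function name `autoassign_bins` is hypothetical. The 5% margin is assumed to be 5% of the total s-value span, added at each end. The distribution is assumed to be sampled as (s, amplitude) points and integrated with the trapezoid rule. Bin edges are placed by linear interpolation of the cumulative integral.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def autoassign_bins(s_values: Sequence[float], amplitudes: Sequence[float],
                    n_solutes: int, margin: float = 0.05) -> List[Tuple[float, float]]:
    """Hypothetical sketch: split an s-value distribution into n_solutes bins
    of approximately equal integrated signal."""
    if n_solutes < 1 or len(s_values) < 2 or len(s_values) != len(amplitudes):
        raise ValueError("need n_solutes >= 1 and at least two (s, amplitude) points")
    points = sorted(zip(s_values, amplitudes))
    s = [p[0] for p in points]
    c = [max(p[1], 0.0) for p in points]  # negative amplitudes carry no signal here

    # Assumption: extend the range by 5% of the total span at the bottom and the top
    span = s[-1] - s[0]
    lower, upper = s[0] - margin * span, s[-1] + margin * span

    # Cumulative integral of the distribution (trapezoid rule)
    cumulative = [0.0]
    for i in range(1, len(s)):
        cumulative.append(cumulative[-1] + 0.5 * (c[i] + c[i - 1]) * (s[i] - s[i - 1]))
    total = cumulative[-1]
    if total <= 0.0:
        raise ValueError("distribution has no positive signal")

    # Interior edges at equal fractions of the total integral
    edges = [lower]
    for k in range(1, n_solutes):
        target = total * k / n_solutes
        i = bisect_left(cumulative, target)  # first point where the integral reaches target
        frac = (target - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
        edges.append(s[i - 1] + frac * (s[i] - s[i - 1]))
    edges.append(upper)
    return list(zip(edges[:-1], edges[1:]))
```

For a flat distribution the bins come out roughly equal in width. Where the signal is concentrated, the bins become narrower, so each solute bin covers a similar share of the total signal.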

www contact: Borries Demeler

The latest version of this document can always be found at:

http://www.ultrascan.uthscsa.edu

Last modified on November 25, 2005.
