Tutorial Proposal for GP98

Cezary Z. Janikow
Department of Mathematics and Computer Science
University of Missouri - St. Louis
ph: (314) 516-6352
fax: (314) 516-5400
URL: http://www.cs.umsl.edu/Faculty/janikow

with CGP v2.1

Genetic Programming (GP) relies on the closure property, which states that every function can call any other function and that any terminal can provide values for any function argument. Unfortunately, this property admits many invalid (violating domain-specific syntax and semantics) or simply unwanted (according to heuristics) structures (e.g., programs). This enlargement of the search space (aggravated by sufficiency) can slow down the simulated evolution.

In GP, this is dealt with either by penalizing such structures (e.g., by setting their evaluation to 0) to ensure their extinction, or by extending interpretations (e.g., protected division) to validate otherwise invalid structures - but these choices are in general arbitrary. Koza introduced structure-preserving crossover - an ad-hoc, domain-specific approach aimed at preventing crossover from generating undesired structures. The objective of Constrained Genetic Programming (CGP) is to provide general, domain-independent means for dealing with these problems in crossover, as well as in initialization and mutation. Arising from this need, CGP has since evolved into a more elaborate means of imposing strong constraints, and weak heuristic preferences, on the solution space. In fact, we are currently working on using similar mechanisms to evolve the GP representation in the course of the same evolution.
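As a concrete illustration of the "extended interpretation" approach mentioned above (a generic sketch, not code from CGP itself), Koza-style protected division simply substitutes a fixed value when the divisor is zero, so that closure holds for any subtree feeding the second argument:

```python
def protected_div(a, b):
    # Koza-style protected division: return 1 when the divisor is 0,
    # so any subtree may safely supply the second argument.
    # The choice of 1 as the fallback value is exactly the kind of
    # arbitrary decision the text refers to.
    return 1.0 if b == 0 else a / b
```

The arbitrariness of the fallback value (why 1 and not 0?) is one motivation for handling such cases with explicit constraints instead.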

CGP was developed in 1995 at NASA/JSC and UMSL, simultaneously with and independently of Montana's Strongly Typed GP (STGP). CGP v1 in fact implemented the same ideas as STGP, except that 1) rather than using type names to induce constraints, it expected explicit constraints, which allows CGP to specify semantic constraints, 2) CGP's crossover was shown to be stronger (i.e., able to generate more valid programs), and 3) STGP practically merged type constraints with tree depths. But the biggest difference lies in CGP's formal methodology. Constraints can be entered in multiple forms, reflecting the different forms in which they naturally arise. These constraints are transformed into a normal form, which is shown to be a minimal description of the same constraints. Constrained initialization, mutation, and crossover are shown to satisfy the normal constraints, and they are shown to have the same asymptotic complexity as the unconstrained operators. CGP v2.1 extends v1 by also introducing data typing and overloading (as in programming languages). Overloading allows a function, such as plus, to compute different types according to the types of its instance's arguments. CGP v2.1 also allows weak constraints: weighted heuristic preferences for some evolved forms over others.
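The overloading idea can be sketched as a type-resolution table (a hypothetical illustration only; CGP's actual constraint language and data structures differ): the type computed by an instance of plus is looked up from the types of its arguments, and combinations absent from the table are invalid under the type constraints.

```python
# Hypothetical sketch of function overloading: the type computed by an
# instance of 'plus' is resolved from its argument types, in the spirit
# of programming-language overloading as adopted in CGP v2.1.
PLUS_SIGNATURES = {
    ("int", "int"): "int",
    ("int", "float"): "float",
    ("float", "int"): "float",
    ("float", "float"): "float",
}

def resolve_plus_type(left_type, right_type):
    # Return the type this instance of plus computes, or None if the
    # argument-type combination violates the type constraints.
    return PLUS_SIGNATURES.get((left_type, right_type))
```

Constrained initialization, mutation, and crossover would then only produce trees whose every node resolves to a valid type under such tables.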

Another advantage of CGP v2.1 is its extensive implementation: it has been built into lil-gp (a public-domain tool for developing GP applications). Therefore, all of lil-gp's capabilities remain available (currently lil-gp 1.02 is used, but the plug-in approach allows easy merging of CGP with future lil-gp releases). The only exception is that, even though CGP provides for propagating constraints between modules, the implementation has not yet been extended to ADFs.

In the tutorial we will present the constraint language, a formal analysis of the constraints and the constrained operators, and a comparison of CGP with STGP. We will illustrate CGP v2.1 with application examples. We will discuss the theoretical and practical implications of being able to control the search space. We will also discuss current and future developments: ADFs, recursive functions, merging with depth limitations, and evolving the GP representation.

Biographical Sketch

Cezary Z. Janikow received B.A. (1986) and M.S. (1987) from the University of North Carolina at Charlotte, and Ph.D. (1991) from the University of North Carolina at Chapel Hill, all in computer science. Currently, he is an Associate Professor of CS at UMSL.
His research interests include evolutionary computation, representation and constraints in genetic algorithms and genetic programming, machine learning, and, most recently, fuzzy representation for machine learning. He co-authored GENOCOP (linear constraints in GAs), and he authored and developed GenET (a generic GA tool), CGP (Constrained GP), GIL (a GA for inductive learning), and FID (a fuzzy decision tree system).
His other interests include software design and development (both structured and object-oriented), C/C++ programming, and UNIX productivity tools.

Presentation Details

Proposed length: 1 Segment (tight, but possible)

I anticipate spending roughly 10% on the context, 50% on the CGP methodology, 20% on demonstrations, 10% on practical results and discussion, and 10% on future directions.

The context introduction would discuss the implications of closure, sufficiency, and tree limits for the mapping between the solution and representation spaces, along with problems that may arise given certain properties of that mapping.

The CGP methodology part would present the formalism and compare CGP with STGP. This would include recent additions such as heuristic constraints and function overloading.

The demonstrations would illustrate how domain-specific constraints can be expressed in the constraint language and how they affect the effective GP search space. Practical results would follow, with experiments demonstrating that huge gains are possible - but only if the right spaces are allowed to be explored. This leads to the concluding discussion of future work: evolving the GP representation so that "the right" spaces are themselves evolved, plus ADFs, recursive functions, and so on.