print print


Getting Started with Optimization

There are many books on optimization algorithms such as EAs [1, 2, 3, d30] and here, I do not intend to provide any discussions on the various optimization methods or their foundations. Instead, I assume that you already vaguely know what optimization means and what Evolutionary Computation is (otherwise, you may want to check my summary regarding this topic.

Instead, our goal is to gain something that none of the available literature can provide you with: hands-on experience. Indeed, much knowledge can be obtained from books or papers, but — especially in the area of optimization — it can never replace personal experience. This has various reasons, some of which are:

  1. Every optimization problem is different. Each problem comes with characteristic difficulties such as ruggedness, neutrality, or dynamics in the fitness landscape, epistasis, multi or unimodality, instance size (scale), the time it takes to evaluate one solution candidate, feasibility constraints, the number of objective functions, and so on. What is a good approach for one problem may not work at all when applied to another. Often, only experience can guide you to good results, the feeling that a certain technique may work good for the problem at hand.
  2. Only few problems of practical importance are actually tangible with maths. Mathematics are important, do not get this wrong, but until now they can only offer us limited indicators — usually in form of complexity analysis or upper and lower boundaries of the runtime — on whether a certain problem can be tackled efficiently with a particular technique. While large advance has been made in this area, the problem is that adding a constraint or modifying the objective function even of a well-analyzed problem can render the available analytical estimates useless. Therefore, experience (mixed with the knowledge of theoretical results) is often most valuable guide for the practitioner in optimization.
  3. Papers, articles, and books are written by people. People who personally spent much time in advancing "their" field of optimization. Comparisons to other methods are therefore often rare and also — without any malicious intends — the authors are always biased in one way or the other. Hence, personal experience should always complement literature study.

Therefore, I here provide some toy problems which should allow you to experience some facets and different areas of optimization by yourself.

  1. the traveling salesman problem (TSP)
  2. the NK fitness landscapes

The most important thing when tackling these problems is not to use any approach from literature. Instead, you should tackle these problems solely with your own fantasy, create your own solution representation, search operators, or even algorithms. It is my strong opinion that such an approach will help you best to develop a feeling for the optimization problems and will contribute much, much more to your experience than just reproducing existing solutions.

To each of the following problems, short explanations are given along with problem instances. Also, I'd like to encourage my students as well as everyone who tackled the problem to submit their approaches (program source files, descriptive document, best solutions found etc.) to me via email to tweise@gmx.de. In return, you will receive the password to the corresponding solution repository. Please notice that this is no competition or something — we just aim at building experience and collecting views on the problem from as many different angles as possible.

1. The Traveling Salesman Problem (TSP)

The traveling salesman problem is a class of combinatorial optimization problems from which we can assume that the worst-case runtime for finding the correct solution grows exponentially with the number of involved cities. This makes this problem interesting for the application of randomized optimization methods such as EAs, Simulated Annealing, or ACO.

The problem is defined as follows: Given is a graph G defining n cities and the connections between them. These connections are weighted with the distance and may either be bidirectional or unidirectional. The goal is to find the shortest cyclic path connecting all n cities.

Here, we use a simplified version of this problem: The n cities are given by their xy-coordinates on a two-dimensional plane. Every city i can be reached from any other city j and their distance is the Euclidean distance, i.e., (xi-xj)2 + (yi-yj)2.

We provide three instances of this problem (extracted from [4]), each specified in a text file. Each line of such a file defines exactly the name and location of one city and contains three numbers: The id of the city, its real-valued x-coordinate, and its real-valued y-coordinate separated by spaces.

  1. tsp100.txt (3 kiB, version: 2009-09-25) a n=100 problem instance
  2. tsp280.txt (4 kiB, version: 2009-09-25) a n=280 problem instance
  3. tsp2392.txt (69 kiB, version: 2009-09-25) a n=2392 problem instance

When approaching this problem, the following questions should be asked:

  1. How can we represent a cyclic tour of n cities in a way that an optimization algorithm can work with most efficiently?
  2. How can we most intelligently create the initial solution candidates with which our search should start?
  3. How can we change an existing solution candidate in order to create new ones? Will we do this in a way that focuses on exploration or exploitation or both?
  4. Will we use an operator which combines two or more existing solution candidates? And if so, how should this operator work best?
  5. If we choose to apply an evolutionary approach, parameters such as
    1. the population size,
    2. the selection scheme, and
    3. the population treatment (generational versus steady-state, with or without elitism)
    have to be determined.

Of course, the chances that we find a good algorithm configuration with the first experiment are very slim. Instead, multiple operators and settings will have to be tested — this is the experience-building aspect of this exercise. As for finding out which operator or configuration is efficient, we recommend to perform the same number of search steps / objective function evaluations for multiple configurations and pick the one producing the shortest results. The objective function, i.e., the optimization criterion, is obviously the total length of the path (including the backwards connection of the last city in the sequence with the first one).

The solutions to the TSP problems which my students and me generated can be found inside the password protected area. Again, remember that we do not perform any competition or such and such, but the goal is to test a variety of ideas and to gain experience. Therefore, we do not aim at comparing these approaches to such developed in research and found in literature — we are just "playing around" with the problem ☺. (Oh, by the way, if TSPs are only the "start", vehicle routing problems are problems which involve optimizing the routes of many vehicles in order to perform a set of transportation tasks. Check, for instance, the in.west project.

2. The NK Fitness Landscapes

The NK Fitness Landscapes [5] are a special optimization problem inspired by the biological phenomenon of epistasis. Epistasis in biology basically mean that the contribution of one gene to the phenotypical traits (and hence, the fitness) depends on the allelic state of other genes. Interestingly, such aspects can be observed in different variations in many real-world optimization problems and are one of the reasons why optimization is difficult. Although you may feel the urge to read the listed reference and to find out what epistasis is — don't do it. Instead, use this exercise to find it out by yourself.

The search space of this problem are bit strings of the length N and the objective function F(x) maps this space to the real numbers (to the interval [0,1), to be more precise). Each single bit xi contributes to the overall fitness. Its contribution depends on K neighbors xi1 to xiK.

N
F(x)=1/N*Ti[xi, xi1, xi2,...,, xiK];   F:{0,1}N ↦ ℝ;   Ti:0..2K+1-1 ↦ [0,1); ∀i∊1..N   (Formula NK)
i=1

The overall fitness of a bit string is computed to the (formula NK) given above. F(x) is the sum of the contribution of the N bits. The individual contribution of the tth bit (xi) is taken from the table Ti. Each table Ti has the length 2K+1, and the tth bit (xi) along with its K neighbors are interpreted as the binary representation of a natural number y ∊ 2K+1-1 and used as index into this table. Each entry in each of the (N tables is initialized to a random number uniformly distributed in the real interval [0,1). The neighbor sets xi1, xi2,...,, xiK for each gene xi (i ∊ 1..N) can be either

  1. fixed neighbor sets: the K genes spatially closest to xi, i.e., {xi-K/2, xi-K/2+1, ..., xi-1, xi+1, ..., xi+K/2} or
  2. random neighbor sets: K randomly selected genes, i.e., a different, fixed neighbor set is created for each gene before the optimization starts.

In this exercise, the following approach should be followed

  1. implement the NK Fitness landscape for
    1. N ∊ {8, 16, 32, 64}
    2. K ∊ {0, 4, 8, 16}
    3. for both, fixed and random neighbor sets
    Ensure that for all experiments with the same N and K, the same tables Ti are used. Ensure that for all experiments with the same N, K, and neighbor set type, the same neighbor sets are used. Both can be achieved by initializing the random number generator used for the neighbor set and contribution table computation with a value computed directly from N, K, and 0/1 depending on the neighbor set type.
  2. implement a simple Genetic Algorithm for solving the NK problem. Assume maximization of the objective function F.
  3. With a small series of experiments, determine the influence of N, K, and the neighbor set configuration on the problem hardness by checking
    1. how fast the Genetic Algorithms converge, i.e., after how many objective value evaluations no improvements can be found anymore and
    2. by comparing the best values of F found for a given configuration.
  4. You may also want to check the influence of GA parameters such as population size and mutation or crossover rate on the results.
  5. The questions to be answered are:
    1. Why are the results like this? Why do N and K have the influence they have?
    2. What does this mean for optimization problems in general?
    3. Is epistasis good or bad?
    4. What does it mean when epistasis is minimal, i.e., K=0? Name one optimization method which will clearly outperform Genetic Algorithms in such a situation.
    5. We now have seen epistasis in bit-string based search spaces ... How could epistasis in search spaces based on (real) vectors from the n look like?
    6. Another interesting question is: For this problem, we may test many different parameter configurations. How do we do this best? Should we somehow automate our implementation and the test series running mechanisms?
    7. If our GA has converged, does this mean that the global optimum has been found? If not, how could we determine the optimum then?
[1]Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz: "Evolutionary Computation: Basic algorithms and operators," CRC Press, 2000, ISBN: 0750306645, 9780750306645; Google ID: 4HMYCq9US78C
[2]Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz: "Handbook of Evolutionary Computation," CRC Press, 1997, ISBN: 0750308958, 9780750308953; Google ID: n5nuiIZvmpAC
[3]Kenneth Alan DeJong: "Evolutionary Computation: A Unified Approach," MIT Press, 2006, ISBN: 0262041944, 9780262041942; Google ID: OIRQAAAAMAAJ
[4]Research Group Combinatorial Optimization of the Ruprecht-Karls Universität Heidelberg: "TSPLIB," online available at http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/
[5]Lee Altenberg: "NK Fitness Landscapes," chapter B.2.7.2 of "Handbook of Evolutionary Computation" edited by Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz, 1997 Oxford University Press, Inc., New York, NY, USA / Institute of Physics Publishing Ltd. (IOP), Dirac House, Temple Back, Bristol, UK / CRC Press, Inc., Boca Raton, FL, USA, src 1, src 2, src 3. Book Google id: n5nuiIZvmpAC