Tuesday, June 4, 2013

A Small Study on Genetic Algorithm

Recently, I have a chance to visit a new technique in problem solving algorithm called Genetic Algorithm. I'm sure it's been around for quite some time, but it never occurred to me before. The problem with this algorithm is it doesn't seem to have a mathematical proof that it can solve a problem, or at least that what I thought. However, after a few days struggling  and coding, I finally understand how it works and why it will work. This is a co-joint work with Martin Nelly for her undergraduate thesis.

So here is the problem that we tried to solve in the last few days:  arranging a valid timetable for a university/faculty. Timetable's validity depends on the rules provided. There are a few ways to solve this. My first attempt to solve this problem is to use a tree and try to find all timetable possibility. The problem when using a tree is there are way too many possibility to explore, especially using a breadth first search. You might get lucky when using a depth first search but only if the program picked the right branch to explore first. There might be other kind of tree that might help to speed up finding the solution, but I'll figure out about that later on.

Our focus here is to find  the solution using Genetic Algorithm. Genetic Algorithm is a method to find the solution by simulating the theory of evolution in biology, or genetic theory to be precise. Basically it populates a certain amount of solutions (which will later be called individuals), pick the best individuals with the best gene, mixed them to generate another solution, and repeat those process until the desired solution is reached.

To define a good/bad timetables for our solutions, we defined two category of constraints: hard and soft. The hard constraints must be satisfied before an individual enters a population. The soft constraints is not obligatory. However, it defines the quality of an individual. We define the quality of an individual by finding it's cost: higher cost means lower quality.

Hard constraints contains three rules: no lectures booked twice in a week, only one lecture can exist at one room and one time slot, and no lectures with the same students group may appear at the same time slot. The soft constraints are no lecturer booked twice at one time slot and the lecturer is available.

The flowchart below shows how genetic algorithm works in our program.

Timetable is basically a schedule list that shows lectures based on room, day, and hour. We define a timetable by a 3 dimension array of integer, where the integer value holds an (program generated) id of each lecture. Let say a lecture "Intro. to Java" with id value 6 is held in room 10 on tuesday (index = 1) at hour 5; hence the value of timetable[10][1][5] is 6. It is that simple.

The first step of the algorithm is to set an initial population of timetables. The amount of timetables created per generation is predefined. In our system, we put 100 as the default value, which means the maximum timetable is 100 each generation. The creation of the timetables is done by picking a random room and time, and if it's still empty, book a lecture at that slot. In the end of the initialisation process, we will have 100 timetables which follows the hard constraints. To decide which (group of) timetables are the best genes, we use the cost calculation using the soft constraints. Keep in mind that it is easier to sort the group as early as possible.

The theory of evolution states that good parents have higher chance to create good offsprings. We are going to simulate this by merging best solutions on the population. To avoid overpopulation, we simply remove the solutions with bad fitness value and refill the population by combining two population members. It is possible to merge more than two parents, but in our case and to make the program simpler, we only deal with two parents per breeding. Combining two timetables is done simply by picking two random parents  to fill each slot of the new breed. Notice that there is no validity check in this process, so it is possible the new timetable conflicts the hard constraints. To fix this problem, we need to repair the new timetable by checking all slots and make sure that each lecture is done once and only once each week. We also mutate the new timetables by randomly switching a schedule. After the new timetable is created, we calculate the error value by checking it against the soft constraints. The new timetable is then sorted based on this value.

The steps mentioned above are repeated until we found a timetable that satisfy both hard and soft rules. Since we define the quality of a timetable based on soft constraints, the solution that we are trying to find is the one that has no cost.

I was skeptical about this method due to it dependant to random generators (in our program, the random value is generated by Java's Math.random()), but it seems like most of the time it works faster than the tree solution that I did before. However, multiple sampling shows that the behaviour can change significantly on each run. This is caused by the massive random values generated during the whole process. But as far as I notice, the result is quite satisfying.  Both average and best result convergence to the solution we want as shown by the chart below.
It took 826 generations to find a proper timetable. Each generation contains 100 timetables. Both average and best values seems to converge to the desired solution.

There are a lot of modification to be made so this program is usable (eg. different lecture time, room-dependant lecture such as labs), but for now it is enough and more than entertaining. It's been a fun week creating this program and I wonder how usable this technique is in real time system.

No comments: