Growing a Forest: Genetic Decision Tree Induction
In decision tree learning, the traditional top-down, divide-and-conquer approach searches only a limited part of the hypothesis space, which often leads to sub-optimal solutions. Inducing decision trees with an evolutionary algorithm allows the hypothesis space to be searched globally, leading to stronger solutions while maintaining the inherent comprehensibility that decision trees offer. We have developed EMTI, the Evolutionary Multi-class Tree Inductor, a genetic programming method for inducing axis-parallel, poly-ary decision trees for multi-class classification problems. It focuses on creating accurate decision trees with a high degree of human readability. EMTI uses a genetic programming encoding scheme that represents individuals directly as decision trees, and it implements tree-specific crossover and mutation operators. The initial population is generated as minimal trees with a single decision node, which grow rapidly in size as the number of evolution cycles increases. The multi-objective fitness function rewards classification accuracy while favoring smaller trees over larger ones. Traditional decision tree pruning methods and early stopping methods are shown to be viable ways of avoiding overfitting in the algorithm. EMTI scores favorably in terms of classification accuracy compared to C4.5 and shows a strong ability to ignore data noise and irrelevant attributes.
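To make the fitness idea concrete, the following is a minimal Python sketch of a fitness function that rewards accuracy and penalizes tree size. It is not EMTI's actual implementation: the abstract does not specify how the two objectives are combined, so the weighted difference used here, the `size_penalty` parameter, and the `Leaf`/`Node` classes are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    attribute: str               # axis-parallel: the node tests a single attribute
    children: Dict[str, "Tree"]  # poly-ary: one branch per attribute value
    default: str                 # fallback label for values without a branch


Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, Optional[str]]) -> str:
    """Follow the branches matching the example until a leaf is reached."""
    while isinstance(tree, Node):
        child = tree.children.get(example.get(tree.attribute))
        if child is None:
            return tree.default
        tree = child
    return tree.label


def size(tree: Tree) -> int:
    """Count all nodes (decision nodes and leaves) in the tree."""
    if isinstance(tree, Leaf):
        return 1
    return 1 + sum(size(child) for child in tree.children.values())


def fitness(tree: Tree, examples: List[Dict[str, Optional[str]]],
            labels: List[str], size_penalty: float = 0.01) -> float:
    """Hypothetical fitness: accuracy minus a small penalty per tree node."""
    correct = sum(classify(tree, x) == y for x, y in zip(examples, labels))
    accuracy = correct / len(examples)
    return accuracy - size_penalty * size(tree)


# A minimal one-decision-node tree, like those in the initial population.
stump = Node("outlook", {"sunny": Leaf("no"), "rainy": Leaf("yes")}, default="yes")

# A larger tree that makes exactly the same predictions.
bloated = Node(
    "outlook",
    {
        "sunny": Node("windy", {"true": Leaf("no"), "false": Leaf("no")}, default="no"),
        "rainy": Leaf("yes"),
    },
    default="yes",
)

examples = [
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "rainy", "windy": "false"},
    {"outlook": "overcast", "windy": "false"},
]
labels = ["no", "yes", "no"]

print(fitness(stump, examples, labels) > fitness(bloated, examples, labels))
```

Under these assumptions, two trees with identical accuracy are ranked by size, so the smaller and more readable tree receives the higher fitness. Other ways of combining the objectives, such as Pareto ranking, would be equally consistent with the description above.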