On cascading small decision trees

  1. Minguillón, Julià
Supervised by:
  1. Jaume Pujol Capdevila Director

Defence university: Universitat Autònoma de Barcelona

Fecha de defensa: 09 December 2002

  1. Josep Rifà Coma Chair
  2. Jordi Herrera Joancomartí Secretary
  3. Gábor Lugosi Committee member
  4. Manuel Graña Romay Committee member
  5. Francisco Javier Torrealdea Folgado Committee member

Type: Thesis

Teseo: 90457 DIALNET lock_openTDX editor


This thesis is about using small decision trees for classification and data mining. The intuitive idea behind this thesis is that a sequence of small decision trees may perform better than a large decision tree, reducing both training and exploitation costs. Our first goal was to develop a system capable to recognize several kinds of elements present in a document such as background, text, horizontal and vertical lines, line drawings and images. Then, each element would be treated accordingly to its characteristics. For example, background regions would be removed and not processed at all, while the other regions would be compressed using an appropriate algorithm, the lossy JPEG standard operation mode for images and a lossless method for the rest, for instance. Our first experiments using decision trees showed that the decision trees we built were too large and they suffered from overfitting. Then, we tried to take advantage of spatial redundancy present in images, using a multi-resolution approach: if a large block cannot be correctly classified, split it in four subblocks and repeat the process recursively for each subblock, using all previous computed knowledge about such block. Blocks that could not be processed at a given block size were labeled as mixed, so the word progressive came up: a first low resolution version of the classified image is obtained with the first classifier, and it is refined by the second one, the third one, etc, until a final version is obtained with the last classifier in the ensemble. Furthermore, the use of the progressive scheme yield to the use of smaller decision trees, as we no longer need a complex classifier. Instead of building a large and complex classifier for classifying the whole input training set, we only try to solve the easiest part of the classification problem, delaying the rest for a second classifier, and so. The basic idea in this thesis is, therefore, a trade-off between cost and accuracy under a confidence constraint. A first classification is performed at a low cost; if an element is classified with a high confidence, it is accepted, and if not, it is rejected and a second classification is performed, and so. It is, basically, a variation of the cascading paradigm, where a first classifier is used to compute additional information from each input sample, information that will be used to improve classification accuracy by a second classifier, and so on. What we present in this thesis, basically, is an extension of the cascading paradigm and an exhaustive empirical evaluation of the parameters involved in the creation of progressive decision trees. Some basic theoretical issues related to progressive decision trees such as system complexity, for example, are also addressed.