figshare
Browse

Restricted Access

Reason: Access restricted by the author. A copy can be requested for private research and study by contacting your institution's library service. This copy cannot be republished

Multivariate alternating decision tree

thesis
posted on 2017-02-21, 02:30 authored by Sok, Hong Kuan
Decision trees are characterized by fast induction time and comprehensible classification rules. However, their classification accuracies are relatively lower in comparison to other black-box classifiers such as support vector machines. It is often possible to improve decision tree accuracies by combining them via boosting or bagging to form an ensemble of trees (i.e., forests). Unfortunately, ensemble approaches will cause the decision trees to lose their comprehensibility and significantly lengthen their induction time. The invention of the alternating decision tree (ADTree) allows the incorporation of boosting within a single decision tree to retain comprehensibility. However, the existing ADTree is univariate in nature which limits its potential to further improve the accuracy and induction time. This thesis presents the multivariate alternating decision tree, whereby multivariate decision nodes are incorporated into the ADTree learning algorithm. It can be considered as a generalization of the existing univariate ADTree. Three different variants of multivariate ADTrees are presented in this thesis, namely the Fisher’s ADTree, Sparse ADTree and regularized LogitBoost ADTree (rLADTree). These algorithms were benchmarked against other existing univariate, multivariate and ensemble-based decision trees using real-world datasets from the University of California, Irvine Machine Learning Repository and University of Eastern Finland Spectral Database. It is shown that the Fisher’s ADTree is capable of improving the accuracy of multivariate decision trees through boosting. At the same time it remains to be significantly smaller than boosted multivariate decision trees. It is also shown that the Sparse ADTree is a non-parametric extension of the Sparse Linear Discriminant Analysis (SLDA). It is therefore able to linearly partition the data when it is beneficial to do so, or to grow a tree to improve the classification accuracy when necessary. The most notable multivariate ADTree variant is the regularized LADTree, which is characterized by having no statistically significant differences in all performance metrics and offering comprehensibility when compared with the univariate, unboosted decision trees like C4.5 and CART for general datasets. For datasets with highly correlated features, the regularized LADTree outperforms the existing decision trees in terms of accuracy and comprehensibility, making it a top choice among decision tree classifiers.

History

Campus location

Australia

Principal supervisor

Melanie Po-Leen Ooi

Year of Award

2015

Department, School or Centre

School of Engineering (Monash University Malaysia)

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Engineering

Usage metrics

    Faculty of Engineering Theses

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC