Data mining with incremental learning inspired dynamic tree structures

2017-01-31T04:10:51Z (GMT) by Ahmad, Norashikin
Artificial learning models such as artificial neural networks have been used to discover patterns hidden in data and to solve problems in many application areas. Despite their successful implementation, these models differ from human learning in several key aspects. One significant property of human learning is that it is incremental: knowledge is acquired gradually and/or in stages, and learning is a lifelong process. Some aspects of learning may also be the result of genome modification over generations. Nevertheless, many existing artificial models do not take these incremental and dynamic aspects of human learning into account. Traditional artificial neural network models treat a data set as static and, once learning is completed, cannot acquire new knowledge or adapt. In many models the structure is not flexible enough to adapt, which limits the models' capability to work with changing data or dynamic situations. This thesis addresses the problem of incorporating incremental learning into artificial learning models and highlights its value and utility. In particular, this research attempts to develop a data mining model inspired by the human ability to learn incrementally. The research has resulted in a new model based on a dynamic self-organizing map called the Growing Self-organizing Map (GSOM), which has been shown to be effective in mining large volumes of data. The GSOM and its Attribute Cluster Relationship model (ACR model) have been used as a potential technique for achieving incremental learning. The new model is based on building a concept tree that facilitates knowledge discovery through visualization and analysis of hierarchical structures. In addition to learning incrementally, the model allows the granularity of the concept tree to be controlled using the spread factor parameter, as in the GSOM algorithm.
This feature allows an analyst to choose the desired depth of cluster analysis. In addition, the conceptual profiles obtained from the dynamic concept tree provide a means to acquire knowledge and to monitor incremental changes in the data over time. The experimental results demonstrate that the model is able to acquire knowledge incrementally while preserving previous knowledge. During the development of the model, an automatic cluster identification technique for the GSOM was proposed and demonstrated. This method enables faster and more accurate analysis of the clusters and also allows possible or previously unknown clusters to be discovered. The effect of the spread factor value on cluster separation in the GSOM was also investigated. This contributes quantitative results to the study of the GSOM algorithm, specifically on the effect of the spread factor on the formation and separation of clusters in GSOM maps.
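To make the role of the spread factor concrete, the following minimal sketch shows how it is commonly turned into a node-growth threshold in the published GSOM algorithm, GT = -D × ln(SF), where D is the data dimensionality and SF lies strictly between 0 and 1. This formula comes from the general GSOM literature and is not stated in the abstract; the function name `growth_threshold` and the example values are assumptions made purely for illustration, and the thesis's concept tree model may apply the parameter differently.

```python
import math


def growth_threshold(dimensions: int, spread_factor: float) -> float:
    """Return the GSOM growth threshold GT = -D * ln(SF).

    Illustrative helper (hypothetical name). A node is allowed to grow
    new neighbours once its accumulated error exceeds this threshold.
    """
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    if not 0.0 < spread_factor < 1.0:
        raise ValueError("spread_factor must lie strictly between 0 and 1")
    return -dimensions * math.log(spread_factor)


if __name__ == "__main__":
    # Example values chosen for illustration only.
    for sf in (0.1, 0.5, 0.9):
        gt = growth_threshold(dimensions=4, spread_factor=sf)
        print(f"SF={sf:.1f} -> GT={gt:.3f}")
```

Under this formula a low spread factor gives a high threshold, so the map grows less and yields a coarser grouping, while a spread factor close to 1 lowers the threshold and lets the map spread into finer clusters.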