figshare
monash_63582.pdf (9.83 MB)

Recognising patterns in large data sets: a distributed approach

Thesis posted on 2017-05-26, 07:39, authored by Muhamad Amin, Anang Hudaya
Advancements in computer architecture, high-speed networks, and sensor/data capture technologies have the potential to generate vast amounts of information and to bring in new forms of data processing. Unlike early computations, which worked with small chunks of data, contemporary computing infrastructure is able to generate and store large volumes of data, on the order of petabytes, for day-to-day operations. These data range from high-dimensional images used in medical diagnosis to millions of multi-sensor readings collected for the detection of natural events, and such large-scale, complex data are increasingly becoming commonplace. This raises the question of whether our ability to recognise and process these data matches our ability to generate them. This question is addressed by examining the capability of existing recognition schemes to scale up with this growth in data. A different perspective is needed to meet the challenges posed by the so-called data deluge. This thesis therefore takes a view somewhat outside conventional approaches, such as statistical computations and deterministic learning schemes: it brings together the strengths of high-performance and parallel computing and those of artificial intelligence and machine learning, and proposes a distributed processing approach for scalable pattern recognition. The research has identified two important issues related to scalability in pattern recognition: the complexity of learning algorithms and the dependency on a single-processor (CPU-centric) scheme. Scalability in pattern recognition can be defined as the growth in the capability of pattern recognition algorithms to process large-scale data sets rapidly and with an acceptable level of accuracy. To scale up the recognition process, a pattern recognition system should adopt simple learning mechanisms and be able to parallelise and distribute its processes for the analysis of increasingly large and complex patterns.
This thesis describes a new form of pattern recognition that enables the recognition procedure to be synthesised into a large number of loosely coupled processes, using a fast, single-cycle learning associative memory algorithm. This algorithm applies a divide-and-distribute approach to patterns, thereby reducing the processing load per compute node. Using this algorithm, patterns arising from diverse sources, e.g. high-resolution images and sensor readings, may be distributed across parallel computational networks for recognition within a generic framework. Furthermore, the approach enables the recognition process to be scaled up for patterns of increasing size and dimension, given sufficient processing capacity. In addition, the single-cycle learning mechanism applied in this scheme allows recognition to be performed quickly and responsively without affecting the accuracy of the recogniser. The learning mechanism memorises a pattern within a single pass; therefore, adding more patterns to the scheme does not affect its performance and accuracy. A series of tests on recognition accuracy and computational complexity has been performed using different types of patterns, ranging from facial images to sensor readings, to study the accuracy and scalability of the distributed pattern recognition scheme. The results of these analyses indicate that the proposed scheme is highly scalable, enables fast/online learning, and achieves accuracy comparable to that of well-known machine learning techniques. After addressing the scalability and performance aspects, this thesis deals with pattern complexity by including pattern recognition applications with multiple features. Because the recognition process is implemented in a distributed manner, more features can be added to the analysis.
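The divide-and-distribute idea can be illustrated with a small Python sketch. It is not the thesis's algorithm. It assumes that a pattern is a flat sequence split into contiguous segments, one per node; that each node memorises its segment with a single write (one pass, no iterative training); and that recall picks the nearest stored segment by Hamming distance, followed by a majority vote across nodes. The names `split`, `SubpatternNode` and `DistributedRecogniser` are hypothetical. The nodes run in a plain loop rather than on separate machines, and each node's recall uses a simple linear scan.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple


def split(pattern: Sequence, n_parts: int) -> List[Tuple]:
    """Divide a pattern into n_parts contiguous, near-equal segments."""
    if not 0 < n_parts <= len(pattern):
        raise ValueError("n_parts must be between 1 and len(pattern)")
    q, r = divmod(len(pattern), n_parts)
    segments, start = [], 0
    for i in range(n_parts):
        end = start + q + (1 if i < r else 0)  # first r segments get one extra element
        segments.append(tuple(pattern[start:end]))
        start = end
    return segments


class SubpatternNode:
    """Hypothetical compute node that only ever sees one segment of each pattern."""

    def __init__(self) -> None:
        self.memory: Dict[Tuple, Set[str]] = defaultdict(set)

    def learn(self, segment: Tuple, label: str) -> None:
        # Single-cycle memorisation: one write, no iterative weight updates.
        self.memory[segment].add(label)

    def recall(self, segment: Tuple) -> Set[str]:
        # Return the labels of the closest stored segment(s) by Hamming distance.
        best: Optional[int] = None
        labels: Set[str] = set()
        for stored, stored_labels in self.memory.items():
            d = sum(a != b for a, b in zip(stored, segment))
            if best is None or d < best:
                best, labels = d, set(stored_labels)
            elif d == best:
                labels |= stored_labels
        return labels


class DistributedRecogniser:
    """Hypothetical divide-and-distribute recogniser; nodes are visited sequentially."""

    def __init__(self, n_nodes: int) -> None:
        self.nodes = [SubpatternNode() for _ in range(n_nodes)]

    def learn(self, pattern: Sequence, label: str) -> None:
        # Each node stores only its own segment of the pattern.
        for node, segment in zip(self.nodes, split(pattern, len(self.nodes))):
            node.learn(segment, label)

    def recall(self, pattern: Sequence) -> Optional[str]:
        # Each node votes for the labels it recalls; the label with most votes wins.
        votes: Counter = Counter()
        for node, segment in zip(self.nodes, split(pattern, len(self.nodes))):
            votes.update(node.recall(segment))
        return votes.most_common(1)[0][0] if votes else None


recogniser = DistributedRecogniser(n_nodes=3)
recogniser.learn("XXXOOOXXX", "A")
recogniser.learn("OOOXXXOOO", "B")
print(recogniser.recall("XXXOOOXXO"))  # noisy copy of "A" -> A
```

Each node handles only its own segment, so adding nodes shrinks the per-node work. A real deployment would dispatch `learn` and `recall` to the nodes concurrently instead of looping over them.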
The proposed multi-feature approach provides an effective scheme capable of accommodating multiple pattern features within the analysis process. This is essential in data mining applications that involve complex data, such as biomedical images containing numerous features. The distributed multi-feature approach using the single-cycle learning algorithm demonstrates high recall accuracy in recognition simulations involving complex images. Finally, this thesis investigates the scheme's adaptability to different levels of network granularity and identifies important factors for the scalability of the pattern recognition scheme. This allows the recognition scheme to be deployed under different network conditions, ranging from coarse-grained networks such as computational grids to fine-grained systems such as wireless sensor networks (WSNs). By being resource-aware, the proposed distributed pattern recogniser can be deployed in different kinds of applications on different network platforms, creating a generic scheme for pattern recognition. A further analysis of the adaptive network granularity feature of the distributed single-cycle learning pattern recognition scheme was conducted as a case study to examine the effectiveness and efficiency of the proposed approach for distributed event detection within fine-grained WSNs. The outcomes of the study indicate that the distributed pattern recognition approach is well suited to event detection, combining the divide-and-distribute approach with in-network parallel processing in a resource-constrained environment. Furthermore, the ability to perform recognition with a simple learning mechanism enables each sensor node to take part in complex tasks such as event detection. As a result, this research may offer new insight into applications involving large-scale event detection, including forest-fire detection and structural health monitoring (SHM) of mega-structures.
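Adapting to network granularity can be pictured as choosing how finely to partition a pattern, given what each node can hold. The Python sketch below is a hypothetical illustration, not the thesis's resource-aware mechanism. The function name `plan_partition` and the parameter `node_capacity` are assumed. A node's resources are reduced to a single number: the count of pattern elements it can store.

```python
from typing import List, Tuple


def plan_partition(pattern_length: int, node_capacity: int) -> List[Tuple[int, int]]:
    """Return one (start, end) index range per node, each at most node_capacity long.

    Hypothetical helper: weaker nodes (smaller capacity) lead to more, smaller
    segments, i.e. a finer-grained deployment of the same pattern.
    """
    if pattern_length <= 0 or node_capacity <= 0:
        raise ValueError("pattern_length and node_capacity must be positive")
    n_nodes = (pattern_length + node_capacity - 1) // node_capacity  # integer ceiling
    q, r = divmod(pattern_length, n_nodes)
    ranges, start = [], 0
    for i in range(n_nodes):
        end = start + q + (1 if i < r else 0)  # spread the remainder over the first r nodes
        ranges.append((start, end))
        start = end
    return ranges


pixels = 64 * 64  # a 4096-element pattern, e.g. a small greyscale image
grid_plan = plan_partition(pixels, node_capacity=2048)  # coarse-grained: few powerful nodes
wsn_plan = plan_partition(pixels, node_capacity=16)     # fine-grained: many small sensor nodes
print(len(grid_plan), len(wsn_plan))  # 2 256
```

The same pattern maps onto 2 grid nodes or 256 sensor nodes depending only on the assumed per-node capacity, which mirrors the coarse-grained versus fine-grained range described above.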

Campus location: Australia
Principal supervisor: Asad Iqbal Khan
Year of Award: 2010
Department, School or Centre: Information Technology (Monash University Clayton)
Course: Doctor of Philosophy
Degree Type: DOCTORATE
Faculty: Faculty of Information Technology