Distributed Associative Memory Approach for Cloud Computing Environments

2017-04-09T23:25:02Z (GMT) by Amir Hossein Basirat
With growing interest in leveraging the massive amounts of data available in open sources such as the Web to solve long-standing information retrieval problems, the question of how to process immense datasets effectively is becoming increasingly relevant. This raises the further question of whether our capability to recognise and process such data keeps pace with our ability to generate it. This thesis addresses that question by first examining the capability of existing large-scale data-processing schemes to scale with this growth in data. To address some of their highlighted limitations, particularly regarding computational complexity and scalability, this research proposes a novel associative-memory-based scheme for big data processing that is scalable, distributable and lightweight, and that overcomes some of the issues encountered in traditional data access mechanisms for data storage and retrieval. To achieve this goal, a distributed data access scheme that enables data storage and retrieval by association is first developed to circumvent the partitioning issue experienced within referential data access mechanisms. In our model, data records are treated as patterns. As a result, data storage and retrieval are performed using a distributed pattern recognition approach that is implemented through the integration of loosely coupled computational networks, followed by a divide-and-distribute approach that facilitates the dynamic distribution of these networks within the cloud.

To date, all implementations of MapReduce, including the Hadoop version, have interpreted data in a relational model, which limits their functionality when dealing with complex and unstructured data such as images.
To address this, an associative-memory-based MapReduce is introduced to elevate the MapReduce key-value scheme to a higher level of functionality by replacing the purely quantitative key-value pairs with scalable associative-memory-based data structures that improve parallel processing of data with complex relations. With an associative key-value model, we can deal with data in any form and representation simply by using a pattern-matching model that treats data records as patterns and provides a distributed data access scheme that enables data storage and retrieval by association, thereby circumventing the scaling issue experienced within referential data access mechanisms. The principle of associative-memory-based learning is implemented through layers connected in a hierarchical fashion, with local feature learning happening at the lowest layer while features are combined to form higher-level representations at the upper layers.

In addition, this thesis investigates the extension of the proposed distributed data management scheme to different data-intensive scenarios by improving upon existing cloud data management models for fault tolerance and scalability, and by reducing MapReduce communication overheads through data locality. In particular, three data-intensive scenarios are considered in detail: dealing with large datasets, handling large training volumes, and managing a neural network with an excessive number of processing neurons. Moreover, the application of our associative-memory-based approach is examined as a case study in a cloud of wireless sensor networks (Cloud-WSNs) to investigate the capabilities of the scheme in performing large-scale pattern recognition operations in resource-constrained WSNs.