Anomaly Classification Through Automated Shape Grammar Representation
Statistical learning offers a trove of opportunities for problems where a large amount of data is available, but falls short when data are limited. For example, in medicine, statistical learning has been used to outperform dermatologists in diagnosing melanoma visually from millions of photos of skin lesions. However, many other medical applications of this kind of learning are impossible because sufficient training data are lacking, for example, performing a similar diagnosis of soft tissue tumors within the body based on radiological imagery of blood vessel development. A key challenge underlying this situation is that many statistical learning approaches use unstructured data representations, such as strings of text or raw images, which do not intrinsically incorporate structural information. Shape grammar, pioneered by the design community, is a way of using visual rules to define the underlying structure of geometric data. Shape grammar rules are replacement rules: the left side of a rule is a search pattern, and the right side is a replacement pattern that can be substituted for the left side wherever it is found. Traditionally, shape grammars have been assembled by hand through observation, which makes them slow to produce and limits their use with complex data. This work introduces a way to automate the generation of shape grammars and a technique for using grammars for classification in situations with limited data. We introduce a method for automatically inducing grammars from graph-based data using a simple recursive algorithm that produces non-probabilistic rulesets. The algorithm uses iterative data segmentation to establish multiscale shape rules, and can do so with a single dataset. Additionally, this automatic grammar induction algorithm has been extended to apply to high-dimensional data in nonvisual domains, for example, graphs such as social networks.
We validated the induction method by comparing its results to human-made grammars of historic buildings and products, and found that it performed comparably to grammars made by humans. The induction method was then extended with a classification approach based on mapping grammar rule occurrences to dimensions in a high-dimensional vector space. With this representation, data samples can be analyzed and quickly classified without the need for data-intensive statistical learning. We validated the classification method by performing sensitivity tests on key graph augmentations, and found that it was comparably sensitive to, and significantly faster at learning than, related existing methods for detecting graph differences across cases. The automated grammar technique and the grammar-based classification technique were used together to classify magnetic resonance imaging (MRI) scans of the brains of 17 individuals, showing that our methods could detect a variety of vasculature-borne condition indicators with short- and long-term health implications. Through this study we demonstrate that automated grammar-based representations can be used for efficient classification of anomalies in abstract domains such as design and biological tissue analysis.
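As a rough illustration of mapping grammar rule occurrences to dimensions in a vector space, the following minimal sketch counts how often each rule fires in a sample and labels a new sample by its most similar labelled example. It is not the method used in this work: the rule names, the example data, the helper functions, and the use of cosine similarity with a nearest-example decision are all assumptions chosen for brevity.

```python
from collections import Counter
import math


def rule_vector(rule_occurrences, vocabulary):
    """Map a list of rule identifiers to a count vector, one dimension per rule."""
    counts = Counter(rule_occurrences)
    return [counts[rule] for rule in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(sample, labelled, vocabulary):
    """Return the label of the labelled example whose rule vector is most similar."""
    sample_vec = rule_vector(sample, vocabulary)
    best_label, best_score = None, -1.0
    for label, occurrences in labelled:
        score = cosine_similarity(sample_vec, rule_vector(occurrences, vocabulary))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical rule names and samples, invented purely for illustration.
vocabulary = ["branch", "loop", "taper", "tangle"]
labelled = [
    ("typical", ["branch", "branch", "taper", "branch"]),
    ("anomalous", ["tangle", "loop", "tangle", "branch"]),
]
sample = ["tangle", "tangle", "loop"]
predicted = classify(sample, labelled, vocabulary)
```

Because each dimension corresponds to a named rule, a representation of this kind can in principle be built from very few examples, and the dimensions that drive a decision remain interpretable.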