A NOVEL PATTERN LEARNING AND RECOGNITION PROCEDURE APPLIED TO THE LEARNING OF VOWELS

The ability of a set of simple predicates to capture characteristic patterns in a parametric representation of vowels in continuous speech was investigated with the aid of an efficient conjunctive pattern recognition and classification system. The results compare favourably with those produced by a cluster-based minimal Euclidean distance technique, run over the identical training and test samples. The predicates used are similar to auditory receptive fields.


SLIM
In this section we first give a summary of SLIM's operation and then describe it in sufficient detail to understand the remainder of this paper. We believe, however, that the general but simple conjunctive pattern learning and classification technique and feature-description methodology described below can deal effectively with such difficult task domains. The phone recognition task is a hard problem which provides a good opportunity to develop the method in a domain subject to a high degree of variability. Moreover, the solution will be of immediate practical use.
The approach to be described is one of finding pattern templates which are as discriminating as possible and as general as necessary to characterize each class. The remainder of this paper is organized as follows. First we describe SLIM and the feature-encoding method. We then describe the data and their feature representation and the classification experiment. The results obtained are then compared with those from a labelling system in current use which employs a Euclidean distance technique applied to the same training and test examples.
We are able to conclude that the method works better than the one in current use for the vowels upon which the experiments were conducted. Moreover, an examination of the form of the abstractions suggests that formant peaks may not be as good indicators of vowel class as their "shoulders". We shall now be more explicit. SLIM first prepares a description of each exemplar in terms of user-defined boolean features. It then attempts to find characteristic conjunctive abstractions which distinguish specified classes from each other. The basic operation by which this is done is interference matching. The effect of interference matching applied to the descriptions of two exemplars is to produce a schema (or abstraction) which comprises all and only the features common to both.
Because a schema is a set of features, just as each encoded exemplar is, exactly the same matching process may be applied to a schema and an exemplar.
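Since descriptions are sets of boolean features, interference matching can be sketched as set intersection; the feature names below are invented for illustration, not taken from the experiment:

```python
# Interference matching: the schema (abstraction) of two descriptions
# comprises all and only the features common to both. Because a schema
# is itself a feature set, the same operation applies unchanged to a
# schema and an exemplar.

def interference_match(a: frozenset, b: frozenset) -> frozenset:
    """Return the conjunction of the features shared by a and b."""
    return a & b

exemplar_1 = frozenset({"F3", "F4", "F5", "F6", "F7"})
exemplar_2 = frozenset({"F4", "F5", "F6", "F7", "F9"})
schema = interference_match(exemplar_1, exemplar_2)
# schema == frozenset({"F4", "F5", "F6", "F7"})
```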
The procedure of abstracting new diagnostic pattern templates for a class is as follows. Each positive exemplar in turn is matched against every schema already on the dynamic decomposition list, each match yielding a candidate schema. In addition, the exemplar is itself entered as a schema. The schemata so formed are then evaluated for placement in the list, and then the next exemplar is processed. The decomposition is complete when all the positive exemplars have been processed.
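The decomposition loop can be sketched as follows; `evaluate` and `accept` are placeholders standing in for the performance measure and acceptance thresholds discussed in the text:

```python
def decompose(positives, evaluate, accept):
    """Sketch of one class's decomposition. Each positive exemplar is
    matched (by set intersection) against every schema already on the
    dynamic decomposition list; the exemplar itself is also entered as
    a schema. Candidates passing evaluation are kept, and the list is
    ordered by performance."""
    ddl = []  # dynamic decomposition list
    for exemplar in positives:
        candidates = [schema & exemplar for schema in ddl] + [exemplar]
        for cand in candidates:
            if cand and accept(evaluate(cand)) and cand not in ddl:
                ddl.append(cand)
    ddl.sort(key=evaluate, reverse=True)
    return ddl
```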
The evaluation mentioned above is a calculation of the schema's performance as a diagnostic indicator of the positive against the negative class. The performance measure in current use computes a weighted difference between the a posteriori expected hit rate (i.e. the frequency of matches within the positive class) and the a posteriori expected false alarm rate (i.e. the frequency of matches by that schema within the negative class). The hit rate and false alarm rate are weighted by a gain and a loss factor, respectively.
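This performance measure can be sketched as follows; the default gain and loss weights are placeholders, since the paper's actual factors are not given in this excerpt:

```python
def performance(schema, positives, negatives, gain=1.0, loss=1.0):
    """Weighted difference between the expected hit rate (frequency of
    matches within the positive class) and the expected false-alarm
    rate (frequency of matches within the negative class). A schema
    matches an exemplar when all its features are present."""
    def matches(exemplar):
        return schema <= exemplar  # schema is a subset of the exemplar
    hit_rate = sum(map(matches, positives)) / len(positives)
    false_alarm_rate = sum(map(matches, negatives)) / len(negatives)
    return gain * hit_rate - loss * false_alarm_rate
```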
The number of schemata which may be generated in this fashion rises exponentially with the number of training exemplars, and so some techniques are used to limit the size of the decomposition list. The power of SLIM derives from its heuristic methods for preventing a combinatorial explosion without seriously compromising the discriminative power of the templates induced. The basis of SLIM's approach lies in the performance measure. Performance limits may be set which act as thresholds for a new schema's acceptance into the dynamic decomposition list. The schemata are ordered in the dynamic decomposition list according to their performance. In addition, a limit on the length of the dynamic decomposition list may be set. This will have the effect of constantly pruning off the more poorly performing schemata. Other heuristic constraints may be applied to speed the process with minimal discriminative loss. The ones mentioned here are those referred to in the following sections.
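The two heuristic constraints named above (a performance threshold for acceptance and a limit on the list's length) might be sketched as follows; the threshold and cap values here are purely illustrative:

```python
def prune(ddl, evaluate, min_performance=0.0, max_length=50):
    """Keep only schemata at or above the acceptance threshold, order
    the dynamic decomposition list by performance, and trim it so that
    the poorest-performing schemata are constantly pruned off."""
    kept = [s for s in ddl if evaluate(s) >= min_performance]
    kept.sort(key=evaluate, reverse=True)
    return kept[:max_length]
```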
This process is usually repeated for each of a number of classes. At the completion of the decomposition for each class, its dynamic decomposition list is merged into the final decomposition list. Once a final decomposition list has been formed, it is possible to classify test exemplars. Each exemplar to be classified is encoded into the feature representation and then matched against each schema in the decomposition list in turn. If a match occurs, i.e. if the exemplar contains all the features in one of the schemata, it is assigned to the class from which the schema was derived. The process is self-terminating, so that it is the highest-performing match which determines the classification.
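Classification against the final decomposition list then reduces to a first-match scan in performance order; this sketch assumes the list stores (schema, class label) pairs:

```python
def classify(exemplar, decomposition_list):
    """Match the encoded exemplar against each schema in turn. The list
    is ordered by performance, so the scan is self-terminating at the
    highest-performing match, whose class label is returned."""
    for schema, label in decomposition_list:
        if schema <= exemplar:  # exemplar contains all the schema's features
            return label
    return None  # no general schema matched this exemplar
```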

The Overlapping Receptive-Field Feature Representation
Learning is considered here to be a process of inducing pattern templates which are as discriminating (i.e. precise) as possible and at the same time as general (inclusive) as necessary to characterize each class. These two goals are in conflict in that precision is necessary to take advantage of fine differences, but it is equally necessary to generalize across unimportant variation. Interference matching of the overlapping-interval descriptions of two events gives the conjunction F4∧F5∧F6∧F7 of common features. It should be noted that this schema defines an interval from 35 to 49 which, within the framework adopted, is at once the most precise and the most general observation which can be made from the two events. This method of overlapping intervals thus provides a solution to the problem of simultaneously discriminating and generalizing within a conjunctive abstraction framework.
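The overlapping-interval code can be sketched as below. The generalization interval G = 35 and shift D = 7 are assumptions, chosen only so that two values yield a four-feature schema defining an interval of 35 to 49 like the example above; the actual parameter values and feature indices are not recoverable from this excerpt:

```python
def encode(value, G=35, D=7, n_features=32):
    """Overlapping receptive-field code: feature i covers the interval
    [i*D, i*D + G], so adjacent features overlap and each value
    activates a run of consecutive features."""
    return frozenset(i for i in range(n_features)
                     if i * D <= value <= i * D + G)

# Interference matching of two encoded values keeps exactly the features
# whose intervals contain both, i.e. the tightest interval consistent
# with both observations:
common = encode(35) & encode(49)
lower = max(i * 7 for i in common)        # greatest lower bound: 35
upper = min(i * 7 + 35 for i in common)   # least upper bound: 49
```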
This descriptive methodology can be considered as a uniform coding technique with four parameters. They are: (1) The maximum generalization interval, G, which is the distance between the upper and lower bound of each feature interval.
(2) The maximum discrimination interval, D, which is the distance by which adjacent features are shifted with respect to each other and is equivalent to the JND of a learning procedure based on these features. As the length of the scale (L) increases, for any G, the relative efficiency of the preferred method, which is quadratically related to these parameters, becomes increasingly significant. Thus, if learning is to be based on interval discrimination and generalization, the proposed code is a highly efficient one.
We are encouraged in our use of this approach by several physiological observations. Firstly, the receptive fields of auditory perceptual system neurones are apparently distributed in an overlapping manner. Secondly, the shapes of auditory tuning curves, which define the frequency characteristics of auditory receptive fields, are often wide, which suggests that the square window nature of our features may be appropriate. Thirdly, the proposed method is a very general method (not at all language-specific, as a formant-extraction approach might be) and may help to explain how animals can be trained to discriminate between speech sounds. Lastly, the proposed code is a redundant one which would produce well-controlled generalizations if features were to be lost for some reason; although we are not able to go into the possibility or significance of discarding features from schemata here, we wish to point out the method's potential for graceful degradation under such loss.
The method, which we may call the overlapping receptive field representation, is applicable to any ordinal scale and may be used for more than one dimension at once, as is the case in the current work.

The Parametric Representation and its Preparation
The parametric representation employed here provides, for each centisecond, an amplitude for each of the 128 frequencies which may be sampled at 39.0625 Hz intervals between 39.0625 and 5000 Hz. They are derived from an original 10 kHz digitization via the discrete fast Fourier transform of a 14-pole Linear Predictive Coefficient filter. This provides a smoothed, amplitude-normalized spectrum. Although the signal energy for each centisecond is available along with the spectral data, only the latter were used in the current experiment (Figure 2).

In this way, a final decomposition list was produced, consisting of the dynamic decomposition list resulting from decomposition of each of the four phone classes against all the others. On classification of the test items, it was found in 23 of the 80 cases (i.e. 28.75%) that the data were so variable that none of the general schemata produced could classify an exemplar. In that case an alternative classification method within SLIM was employed. Here the decomposition process is repeated, but with the additional constraint that when each stored exemplar is converted into a schema for possible addition to the decomposition list, only those features which are also present in the example to be classified are retained. Hence only those features which are relevant to the item to be classified will enter into the decomposition. This procedure is called filtered decomposition to distinguish it from the unfiltered abstraction method described in Section 2 above. Those test exemplars remaining unclassified by the unfiltered technique were classified according to which filtered decomposition list contained the highest-performing schema.

Conclusions
Firstly, we can conclude that SLIM may be used effectively to find good characterizations in a very difficult area, at the acoustic level of description of continuous speech.
Not only are the recognition rates generally good, but in most cases they show that SLIM outperforms another method in current use when applied to precisely the same data. The phone /I/ gave SLIM more trouble than it did INTRAC. We feel that this is a sign that our features are still insufficiently general, and we are continuing to refine our feature representation.
Secondly, this is accomplished without recourse to sophisticated techniques of description, such as formant extraction. Indeed, formants are somewhat less in evidence in the schemata, as Figures 2 and 3 exemplify, even though vowels were the training data.
Thirdly, we have given an example of how SLIM may be used to explore the validity of theories about relevant features of speech by testing their ability to discriminate between phones. Our work to date has investigated only some of the simplest forms of description which might be used. We are continuing this study by working with other phones and additional speakers, and by trying different simple feature representations.