The Tonal Comparative Method: Leveraging lexical tone in historical linguistics
The Comparative Method (CM) is one of the primary tools of historical linguists for determining phylogenetic relationships between languages and reconstructing ancestor proto-languages. Key to its scientific validity is having generally reproducible principles for distinguishing retentions and innovations from chance resemblance (Weiss 2015). The CM has focused almost exclusively on segmental reconstruction, with lexical tone sometimes used to exemplify areas where the CM is not applicable at all (e.g. Meillet 1948: 90, Campbell 2003). Indeed, early overexuberance about the relatedness of tone systems contributed to now-defunct proposals such as Sapir’s Sino-Dene Hypothesis (Golla 1984: 374-382, Bengtson 1994). It is understandable that principles for tone system reconstruction have not emerged, given the relatively recent understanding of tonogenesis and the more limited access to diverse tonal data. It is now well understood that lexical tone compensates for the loss of segmental contrasts (Krauss 1973: 963). This has enabled some tonal reconstruction, but largely on a family-by-family basis, because it is difficult to establish the comparability of tones across family and regional boundaries.
Recent increases in the quantity of available tone data provide an opportunity to extend our reach by applying the logic of the CM to lexical tone, an approach I propose as the Tonal Comparative Method (TCM). The rich and relatively recent tonal history of the Tai languages makes them an ideal testing ground for the TCM. The present study uses insights from computational phylogenetics to inform the application of the CM to tone systems. Using a corpus of tone boxes (Gedney 1972) from 362 Tai doculects, many of them endangered, and a family tree from Glottolog (Hammarström et al. 2018), I compare every cell of each lect's tone box pairwise against every other cell. Each resulting trait is binary (the two cells either share a tone or do not), so the trait set encodes possible retentions and innovations. Using the D statistic (Fritz & Purvis 2010), I identify which traits have strong phylogenetic signal and categorize them accordingly. The strongest phylogenetic signal occurs in three areas: cells that seldom split (e.g. B1=B3, 98% of lects), implying shared retention; cells that rarely merge (e.g. A3=B4, 6% of lects), implying shared innovation; and cells that frequently merge (e.g. A3=A4, 46% of lects), indicating a need to carefully distinguish shared innovation from parallel innovation and chance similarity.
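The pairwise cell comparison described above can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the function name merger_traits, the reduced set of cells, and the tone labels in the example lect are all hypothetical, chosen only to show how a tone box might be turned into binary traits of the form "A3=A4".

```python
from itertools import combinations


def merger_traits(tone_box):
    """Return one binary trait per unordered pair of tone-box cells.

    tone_box maps a cell label (e.g. "A3") to an arbitrary label for the
    tone that cell carries in one lect. A trait is 1 if the two cells
    share a tone in that lect and 0 if they are kept apart.
    """
    cells = sorted(tone_box)
    return {
        f"{a}={b}": int(tone_box[a] == tone_box[b])
        for a, b in combinations(cells, 2)
    }


# Hypothetical lect with only seven cells; the tone labels are arbitrary.
lect = {"A1": 1, "A2": 1, "A3": 2, "A4": 2, "B1": 3, "B3": 3, "B4": 4}
traits = merger_traits(lect)

print(traits["A3=A4"], traits["B1=B3"], traits["A3=B4"])
```

Stacking such trait dictionaries across lects would give a lect-by-trait binary matrix, which is the kind of input a phylogenetic signal test operates on; the test itself is not reproduced here.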
Using these insights, the TCM reconstructs the system as a whole, rather than extrapolating the phonetic value of individual tones. A tone system can be conceived of as a mathematical partition of the lexicon into a set of categories. In this framework, a tone change (split or merger) is a change in the internal structure of the partition. Since each cell in this partition represents a segmentally derived conditioning environment for a tonogenetic event, the method infers changes in the conditioning environments of tones, rather than in the tone values themselves.
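To make the partition view concrete, the following Python sketch treats a tone system as a set of blocks of cells and classifies the difference between two stages as a split or a merger. It is an illustration under stated assumptions, not the study's implementation: the function names, the four-cell example, and the stage labels are hypothetical, and both stages are assumed to cover the same set of cells.

```python
def to_partition(tone_box):
    """Group cells by tone label, discarding the labels themselves."""
    blocks = {}
    for cell, tone in tone_box.items():
        blocks.setdefault(tone, set()).add(cell)
    return frozenset(frozenset(block) for block in blocks.values())


def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)


def classify_change(older, newer):
    """Describe the structural change from `older` to `newer`."""
    if older == newer:
        return "no change"
    if refines(older, newer):
        return "merger"  # blocks were only joined together
    if refines(newer, older):
        return "split"   # blocks were only divided
    return "mixed"       # both splits and mergers occurred


# Hypothetical stages over four cells; tone labels are arbitrary.
older = to_partition({"A1": "x", "A2": "x", "A3": "y", "A4": "y"})
newer = to_partition({"A1": "p", "A2": "p", "A3": "q", "A4": "r"})

print(classify_change(older, newer))
print(classify_change(newer, older))
```

Because the tone labels are discarded, two stages with different phonetic values but the same grouping of cells compare as equal, which mirrors the point that the inference concerns conditioning environments rather than tone values.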
This big-picture view prevents the mishandling of tonal evidence that has sometimes occurred in past classifications (e.g. Chamberlain 1975). Ultimately, the TCM must be used in conjunction with the segmental evidence in order to make the strongest case for both segmental and tonal reconstructions. The TCM represents an increase in scientific rigor in the study of tonal development in Tai languages, and serves as a model for how, given sufficient data across a typologically diverse set of languages, the same logic might be applied to sound change in lexical tone systems more generally.