Grammar-based compression and its use in symbolic music analysis

We apply Context-Free Grammars (CFGs) to measure the structural information content of a symbolic music string. CFGs are appropriate to this domain because they highlight hierarchical patterns, and their dictionary of rules can be used for compression. We adapt this approach to estimate the conditional Kolmogorov complexity of a string given a concise CFG of another string: a related string may be compressed with the production rules generated for the first string. We then define an information distance between two symbolic music strings and show that this measure can separate genres, composers and musical styles. Next, we adapt our approach to a model-selection problem, expressing the model as a CFG of restricted size generated from a set of representative strings. We show that a well-generated CFG for a composer identifies characteristic patterns that can significantly compress other pieces by the same composer while being of little use on pieces by different composers. We identify further opportunities for this approach, including using CFGs to generate new music in the style of a composer.


Introduction
We look at music information retrieval (MIR) through the lens of algorithmic information theory. One of the theory's principal concepts, Solomonoff-Kolmogorov-Chaitin complexity (Kolmogorov 1963; Solomonoff 1964; Chaitin 1966), more commonly known as Kolmogorov complexity, denotes the length of the shortest description of a string with respect to a Universal Turing Machine and is an absolute measure of the string's information. A more useful concept is its conditional version, which measures the information content relative to another string with which the Universal Turing Machine is furnished. In the domain of MIR, such a measure is of particular interest because it implies an absolute informational or cognitive distance between two strings: if they are similar, then the conditional K complexity of one relative to the other will be smaller than its own K complexity. Since music can be expressed symbolically and abstractly as a sequence of high-level musical events (such as absolute pitch, melodic interval and pitch class) akin to a finite string, it is ideal for K complexity and information distance-based analysis.

Describing music with context-free grammars
The universal Turing model of K complexity is, however, incomputable and too powerful to be exploited in practical applications. For MIR, we need a class of description methods in which the mapping from a music string to its description is effectively executable and its K complexity feasibly approximable. This article explores and demonstrates one such description method for symbolic music: Context-Free Grammars (CFGs) that generate only the target music, with the grammar serving as our approximation of the K complexity. The hierarchical assembly of patterns and their repetitions that comprise music is suitably represented by a CFG's production rules and their occurrences. Moreover, the class of CFGs has a desirable property as an admissible description syntax: it has an element of unique minimal size, the smallest CFG. Although finding it is an NP-hard problem (Charikar et al. 2005), by focusing on the optimal specification length attainable within this class, we reduce the problem from undecidability to intractability.

Information distance for music
We use a computable approximation of the smallest CFG by repeatedly selecting one of three types of patterns to compress the grammar: the most-recurring pattern, the longest repeated pattern, or the pattern that yields the shortest grammar at each step. Then, we demonstrate a novel way to estimate the conditional K complexity: since a CFG describes a music sequence in terms of context-free patterns, if another string contains these patterns one can replace them with the corresponding non-terminals and thus conditionally compress it. This enables us to define an Information Distance (ID) between the two strings (Bennett et al. 1998), since the more information is shared between them, the more one will be compressed with the other's dictionary elements. We normalize the distance (NID), with 0 for maximal similarity and 1 for maximal dissimilarity. Our NID assigns a smaller distance between genotypically similar (same genre and composer) music strings than between dissimilar ones, alluding to the exciting prospect that CFG descriptions preserve musical information. By projecting the distances onto a 2D plane using principal component analysis, we see that the model can indeed group music by genre, composer and even composition style.

Composer classifier
Taking advantage of the powerful concept of conditional information, we also introduce a novel way to address the Model Selection problem posed by Kolmogorov (1974). It aims to select the best-fitting model for a given string by maximizing the related information conveyed by the candidate models. For music, this problem can be conceptualized as finding the most appropriate musical category, in our case a composer, to classify an unknown sequence. For each of the candidate composers, we select a number of their composed music strings and construct a model CFG from these by capturing only their characteristic patterns (patterns that are most common among the sequences and indicative of the composer's style), which prevents the model from growing too complex or large. In Section 5, we show that such a concise CFG model can compress a related (same composer) string significantly compared to a non-related string and can thus correctly deduce the string's composer.

Music dataset and representation
We focus on music belonging to three genres: classical, rock and jazz, and represent music as a sequence of MIDI pitch numbers (MIDI onset events without pitch duration information). We obtained 106 classical MIDI files composed by Bach, Chopin, Beethoven, Haydn and Mozart from online repositories such as Midiworld (1995) and Classical Archives (1994). The 60 jazz MIDI pieces in our project were collected from the Weimar Jazz Database (Pfleiderer et al. 2017) and included composers like John Coltrane, Miles Davis, Chet Baker and Charlie Parker.
The MIDI pieces were converted to the symbolic format with the "mftext" software written by Tim Thompson and later modified by Sleator and Temperley (2003). For the genre of rock, we utilized a list of 200 songs from the Rolling Stone 200 project of de Clercq and Temperley (2011). In addition to using the vocal melodic transcriptions of these songs available from the project, we manually downloaded their MIDI versions from Midiworld (1995) and converted them to MIDI pitch sequences, like the classical and jazz music, to ensure consistency of the information (instrumental vs. vocal) contained in the sequences when they are compared with each other. The unequal amounts of music across the genres arose because, for classical and jazz, we could not find all the pieces of our selected composers in MIDI format. However, in our studies, we control for this by randomly selecting a fixed number of artefacts and by taking the average for all statistics.
To demonstrate the "mftext" transcription, consider "Hey Jude" by the Beatles; the pitch sequence for its first few bars is shown in Figure 1. It is worth mentioning that "mftext" is not entirely perfect: depending on the MIDI channel it thinks a note is coming from, the arrangement of the pitch numbers can differ. This results in a smaller compression rate than we would like, but the output is still predominantly consistent in forming musical patterns and allows for feature-free information extraction, as we shall see in Sections 4 and 5.
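As an illustration of this representation, a list of MIDI note-on events can be reduced to a pitch-only sequence by sorting on onset time and discarding durations. A minimal sketch in Python; the event data here is hypothetical and not taken from our corpus:

```python
def to_pitch_sequence(note_on_events):
    """Reduce MIDI note-on events to a duration-free pitch sequence.

    note_on_events: iterable of (onset_time, midi_pitch) pairs.
    Returns the pitches ordered by onset time, which is the symbolic
    string format used throughout this article.
    """
    return [pitch for onset, pitch in sorted(note_on_events)]

# Hypothetical events: (onset in ticks, MIDI pitch number)
events = [(0, 72), (480, 69), (960, 69), (1200, 72), (1440, 74), (1920, 67)]
print(to_pitch_sequence(events))  # [72, 69, 69, 72, 74, 67]
```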
A complete list of the music pieces used in this article can be found in Mondol (2020). The chosen music may seem too elementary or insufficiently challenging for testing performance in MIR tasks. But since the CFG-based description method is abstract and not music-specific, we thought it best to verify our model's effectiveness in high-level MIR tasks, for which the corpora described above are sufficient.
The remainder of this article is organized as follows. Related work and its contributions are discussed in Section 2. The theoretical foundation of approximating K complexity with CFGs comprises Section 3. Sections 4 and 5 present two novel and exciting applications of CFG-based music description and its effects in music clustering and classification. Finally, concluding remarks and perspectives are presented in Section 6.

Related work
Symbolic music information retrieval has traditionally involved extracting key features (describing rhythm or pitch) from MIDI and **kern to be used by various classification techniques (Gjerdingen 1990; Grimaldi, Kokaram, and Cunningham 2002; Tzanetakis and Cook 2002). However, the application of a feature-free approach, like K-complexity, to MIR is also interesting because of its broad-range applicability (Li and Vitányi 2019) and proven efficiency in several other information retrieval domains (Chen et al. 2004; Zhang et al. 2008). The seminal work of using data compression algorithms like GZIP, BZIP2 or LZ78 as practical lossless compressors of symbolic music was done by Li and Ronan Sleep (2004) and Cilibrasi and Vitanyi (2005), who approximated K-complexity with the compressed length. Notable extensions of this work include CAEMSI (Ens and Pasquier 2018), where the authors compress music with GZIP and compare two corpora for similarity using statistical significance tests. One important distinction between these studies and ours is that the former use a definition of NID which does not include the notion of conditional K complexity, as there is no feasible way to use a GZIP-compressed string to compress another string. Moreover, compressors like GZIP were designed to achieve a high compression rate on strings across all entropy levels, which is quite different from highlighting absolute information content in low-entropy strings like music (Kosaraju and Manzini 1999). In an Online Supplemental document, we take a more detailed look at general-purpose compressors for specifying music and compare them with the CFG-based approach we present in this article.
On the other end of the spectrum, algorithms like COSIATEC (Meredith 2016) are specialized compressors for polyphonic music represented geometrically on a two-dimensional plane of onset time and morphetic pitch. COSIATEC looks for repeated patterns at each step but, unlike a CFG, removes their occurrences from the input after compression. This prevents hierarchical structuring, and its complex geometric representation results in a worst-case runtime of O(n^4), where n is the length of the music sequence.
The use of CFGs in music analysis avoids the aforementioned problems and is intuitive as well as simple. Sidorov, Jones, and Marshall (2014) demonstrated the use of grammars in music analysis and in locating transcription errors, but in this article we make the first attempt at an in-depth theoretical analysis of CFG-based music description and its potential for approximating an objective information content in music sequences.

Kolmogorov complexity
Formally, the Kolmogorov complexity K(x) of a string x is the length of a shortest binary program x* to compute x on a Universal Turing Machine (UTM) U. The UTM acts as a meta-parser and optimally simulates all other Turing machines computing partial recursive functions, which are defined only for some inputs and are Turing-computable (Turing 1937). A Turing machine, upon receiving an input, computes the function on that input and halts with output x; otherwise, it computes forever or halts in a non-accepting state if the input is undefined. The program x* thus encodes the Turing machine along with the input that computes x and has the shortest length among such programs (Li and Vitányi 2019, 201-209):

K(x) = min{ |p| : U(p) = x }.

To avoid ambiguity, x* must also be uniquely decodable, such that it is not a prefix of another program. This setup is known as prefix-free coding; it prevents x* and x*q, for q ∈ {0, 1}* \ {ε}, from both being valid programs, and causes the UTM to halt with output x upon reading exactly x* (Li and Vitányi 2019, 73-76). The Kolmogorov complexity K(x) is uncomputable in general, but still immensely useful in the sense that, unlike Shannon's entropy, it refers to an object individually and not as a member of a set of objects with a probability distribution given on it.
The combined Kolmogorov complexity K(x, y) is defined as the length of the shortest program that outputs both x and y along with a way to tell them apart. Note that K(x, y) ≤ K(x) + K(y) + O(1), since the smallest program p that outputs x and y in a distinguishable way will take advantage of any shared information between them (Li and Vitányi 2019, 207).
Finally, the conditional Kolmogorov Complexity K(x|y) of x relative to y is defined similarly as the length of the shortest program to compute x when y is provided as an auxiliary input to the computation. When the input y is replaced with its shortest program, y * , we have K(x|y * ). It is an alternative and a more compact definition of conditional information, as y * conveys more information about y than y itself, given that y * can be seen as an analysis of y (Li and Vitányi 2019, 249-253). Since our objective is to approximate the conditional information amount in x when a compact CFG (a short program that generates only y) of y is provided as an auxiliary input, we use K(x|y * ) throughout our article, approximating y * by a CFG generating y.

Context-free grammars
A Context-Free Grammar is a 4-tuple G = {Σ, Γ, S, Δ}, in which Σ is a finite alphabet containing all the terminals and Γ is a set containing all the non-terminals, such that Σ ∩ Γ = ∅. All the elements of Σ ∪ Γ are called symbols. S ∈ Γ is a non-terminal reserved for representing the start symbol.
Δ is a set of rules of the form T → α, where T ∈ Γ is a non-terminal and α ∈ (Σ ∪ Γ)* is a string of symbols referred to as the definition of T (Sipser 1996).
Definition 3.1 The size |G| of a grammar G, generating a single finite-length string x, is the total number of symbols in all definitions:

|G| = ∑_{T→α ∈ Δ} |α|,

where |α| denotes the number of symbols in the string α in the traditional sense. Charikar et al. (2005) showed that the smallest CFG G*(x) for a string x of length n has size |G*(x)| = Ω(log n). To compare |G*(x)| with the size of a grammar G_A(x) generated by any algorithm A, we introduce the following definition.
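Definition 3.1 translates directly into code. A minimal sketch, assuming a grammar is stored as a mapping from each non-terminal to its definition (a list of symbols):

```python
def grammar_size(rules):
    """|G|: the total number of symbols over all definitions alpha (Definition 3.1)."""
    return sum(len(alpha) for alpha in rules.values())

# Example grammar: S -> T1 T1 T1 T1 a b b, T1 -> a b c
g = {"S": ["T1", "T1", "T1", "T1", "a", "b", "b"], "T1": ["a", "b", "c"]}
print(grammar_size(g))  # 10
```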
Definition 3.2 The approximation ratio a(n) of an algorithm A is defined by

a(n) = max_{x : |x| = n} |G_A(x)| / |G*(x)|.

Thus we consider the worst-case scenario to analyse A's performance. Note that no polynomial-time algorithm has an approximation ratio less than 8569/8568 unless P = NP, and the decision version of this problem is NP-complete (Charikar et al. 2005). Hence, we focus on a very natural but little-studied class of offline algorithms to generate approximations of G*(x). We refer to these as Global Algorithms, and to the generated grammars as Global CFGs, following Charikar et al. (2005).

Global context-free grammars
Contrary to conventional dictionary-based compressors, which are online and treat sequences sequentially, the global algorithms are offline. That is, for a given string x, they start with the grammar S → x and compress it by selecting a maximal string γ at each step, adding a rule T → γ for a fresh non-terminal T and replacing each occurrence of γ in the right-hand sides of all other rules with T (Charikar et al. 2005). A maximal string γ has a length of at least 2 and occurs at least twice, without overlap, as a substring in the definitions α of the non-terminals. If γ occurs Occ(γ) non-overlapping times in the entire grammar, then replacing it with a non-terminal T reduces the grammar size by (|γ| − 1) × Occ(γ), while adding the rule T → γ adds |γ| + 1 to the grammar size. This means that the selection loop in the algorithm below runs as long as r = (|γ| − 1)(Occ(γ) − 1) − 2 is strictly positive.

Algorithm 1 Algorithm for Global CFG
Require: a string x
1: G ← {S → x}
2: while G contains a maximal string γ with r > 0 do
3:   select a maximal string γ
4:   add the rule T → γ for a fresh non-terminal T
5:   replace each non-overlapping occurrence of γ in the right-hand sides of all other rules with T
6: return G
Algorithms for global grammars differ only in the way they select the maximal string γ at each step.
For a string x of length n, Charikar et al. (2005) reported that these algorithms have approximation ratio O((n / log n)^(2/3)). We use suffix trees to find the maximal patterns efficiently (Ukkonen 1995), and the worst-case time complexity for generating a compact CFG is O(n^3) (Mondol 2020). However, since the grammar size strictly decreases at each iteration, the lower bound on the runtime remains Ω(n^2 / log n). For the smallest effective CFG for a string x, we first generate the CFGs produced by all three selection strategies (Most Frequent, Greedy and Longest-First) and pick the one with the smallest size. The next few sections detail the process of using this CFG to approximate K(x).
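As a concrete sketch of a global algorithm, the following Python implements the Greedy selection rule by brute-force pattern enumeration rather than suffix trees, so it is far slower than the implementation described above; it is meant only to make the stopping condition r > 0 tangible:

```python
def nonoverlap_count(body, pat):
    """Count non-overlapping occurrences of pat (a tuple) in body (a list)."""
    i, c, k = 0, 0, len(pat)
    while i <= len(body) - k:
        if tuple(body[i:i + k]) == pat:
            c += 1
            i += k
        else:
            i += 1
    return c

def substitute(body, pat, nt):
    """Replace each non-overlapping occurrence of pat in body with nt."""
    out, i, k = [], 0, len(pat)
    while i < len(body):
        if tuple(body[i:i + k]) == pat:
            out.append(nt)
            i += k
        else:
            out.append(body[i])
            i += 1
    return out

def greedy_global_cfg(x):
    """Greedy global algorithm: start with S -> x and repeatedly factor out
    the pattern gamma maximizing r = (|gamma| - 1)(Occ(gamma) - 1) - 2."""
    rules = {"S": list(x)}
    fresh = 0
    while True:
        best, best_r = None, 0
        for body in rules.values():
            for i in range(len(body) - 1):
                for j in range(i + 2, len(body) + 1):
                    pat = tuple(body[i:j])
                    occ = sum(nonoverlap_count(b, pat) for b in rules.values())
                    r = (len(pat) - 1) * (occ - 1) - 2
                    if r > best_r:
                        best, best_r = pat, r
        if best is None:          # no replacement with r > 0 remains
            return rules
        fresh += 1
        nt = "T%d" % fresh        # fresh non-terminal
        rules = {t: substitute(b, best, nt) for t, b in rules.items()}
        rules[nt] = list(best)

print(greedy_global_cfg("abcabcabcabcabb"))
# {'S': ['T1', 'T1', 'T1', 'T1', 'a', 'b', 'b'], 'T1': ['a', 'b', 'c']}
```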

K(x)
To approximate the K-complexity of x it is necessary to introduce a partial recursive decoder φ, in lieu of a Universal Turing Machine U, which outputs x (denoting a symbolic music sequence in our case) on input p.
The input p, in our case, encodes a compact context-free grammar (the smallest of Most Frequent, Greedy and Longest-First) that describes x in a certain way. Thus, we estimate K(x) by the encoded length of p, as made precise in Equation (1) below. To be precise, consider the Greedy grammar G(x) = {Σ, Γ, S, Δ} of the string x = abcabcabcabcabb:

S → T1 T1 T1 T1 a b b
T1 → a b c

Then p is laid out in the following way, where we use the symbol | as a separator between rules and − as a separator between the non-terminals and their definitions:

p = T1 T1 T1 T1 a b b | T1 − a b c

The encoded CFG p contains the compressed x as its start rule, followed by all the necessary definitions for the non-terminals. The decoder φ takes this start rule and quite straightforwardly replaces all the non-terminals in it with the appropriate definitions to reproduce x, simulating the exploration of the parse tree of x's CFG. Along with p's production rule set, if the decoder is furnished with conditional information y*, consisting of rules from another grammar, then the definitions of the non-terminals are also searched in y*'s dictionary (Figure 2).

Algorithm 2 φ-General Decoder
Require: p = EncodedCFG(x), y* = empty string or EncodedCFG(y)
1: Dictionary ← [ ]
2: S ← Start_Rule(p)
3: export all rules from p and y* to Dictionary
4: while S contains a non-terminal T do
5:   find the definition α of T in Dictionary and replace T with α
6: return S

Hence, we approximate K(x) as

K(x) ≈ (|Δ| + ∑_{T→α ∈ Δ} |α|) · log|Σ ∪ Γ|,   (1)

where |Δ| + ∑_{T→α ∈ Δ} |α| is the total number of symbols (ignoring the separators) in the grammar expressed linearly as above, and log|Σ ∪ Γ| is the number of bits allocated to encode each symbol in a binary representation. Figure 3 shows a generated CFG for the vocal melody of the song "Eleanor Rigby" by the Beatles, where the symbols Ti denote the non-terminals in the grammar and the numbers are MIDI pitch numbers.
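A sketch of the decoder φ and of the size estimate of Equation (1), assuming grammars are stored as mappings from non-terminal names (strings such as "T1") to definitions:

```python
from math import ceil, log2

def decode(rules, aux=None):
    """phi: expand the start rule S, looking up definitions in the grammar's
    own dictionary and, if given, in the auxiliary grammar y* (aux).
    The grammar's own rules take precedence over aux on a name clash."""
    dictionary = dict(aux or {})
    dictionary.update(rules)
    s = list(rules["S"])
    i = 0
    while i < len(s):
        if s[i] != "S" and s[i] in dictionary:
            s[i:i + 1] = dictionary[s[i]]  # replace T with its definition alpha
        else:
            i += 1
    return "".join(s)

def k_estimate(rules):
    """(|Delta| + sum of |alpha|) * ceil(log2 |Sigma union Gamma|) bits,
    a whole-bit version of Equation (1)."""
    symbols = set()
    total = len(rules)            # one head symbol per rule
    for t, alpha in rules.items():
        symbols.add(t)
        symbols.update(alpha)
        total += len(alpha)
    return total * ceil(log2(len(symbols)))

g = {"S": ["T1", "T1", "T1", "T1", "a", "b", "b"], "T1": ["a", "b", "c"]}
print(decode(g))      # abcabcabcabcabb
print(k_estimate(g))  # (2 + 10) * ceil(log2(5)) = 36
```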

K(x, y)
K(x, y) is approximated with K(x#y), where x#y is a single string with # as a placeholder for a separator that does not occur anywhere else in x and y. Thus K(x, y) is approximated, as in Equation (1), with the global CFG of x#y, which retains the ability to tell the two strings apart.

K(x|y*)
Here y* encodes an efficient CFG, G(y) = {Σ_y, Γ_y, S_y, Δ_y}, for y. We try to compress x as much as possible with the rules in y*: first, we use the parse tree of the grammar G(y) for the string y to create an ordering {T_1, T_2, . . . , T_k} of the non-terminals Γ_y such that each non-terminal T_i succeeds all the non-terminals in its definition. This is always possible, as G(y) is acyclic (Charikar et al. 2005) and there is exactly one rule T → α in Δ_y for each T ∈ Γ_y.
Following {T_1, T_2, . . . , T_k}, we look for the pattern α_i in x, where T_i → α_i, and replace all its non-overlapping occurrences with T_i, for i = 1, . . . , k. Once the list is exhausted, for every definition α_i of T_i ∈ Γ_y, α_i appears nowhere else in the compressed x. We denote this processed x as x|y*. Finally, we approximate the smallest CFG, G(x|y*) = {Σ′, Γ′, S′, Δ′}, of x|y* following the method described in the previous section. Thus K(x|y*) is approximated as

K(x|y*) ≈ (|Δ′| + ∑_{T→α ∈ Δ′} |α|) · log|Σ′ ∪ Γ′|.   (2)

Figure 4 illustrates the conditional compression of "Eleanor Rigby" with the vocal melody of another Beatles song, "I Want to Hold Your Hand".
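The conditional compression step can be sketched as follows: the bottom-up ordering is obtained with a depth-first traversal of G(y)'s rules (assumed acyclic), and each definition is then substituted into x. All helper names here are ours:

```python
def substitute(body, pat, nt):
    """Replace each non-overlapping occurrence of pat (a tuple) in body with nt."""
    out, i, k = [], 0, len(pat)
    while i < len(body):
        if tuple(body[i:i + k]) == pat:
            out.append(nt)
            i += k
        else:
            out.append(body[i])
            i += 1
    return out

def definition_order(yrules):
    """Order non-terminals so that each comes after those in its definition."""
    order, seen = [], set()
    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for s in yrules[t]:
            if s in yrules:       # s is itself a non-terminal of y
                visit(s)
        order.append(t)
    for t in yrules:
        visit(t)
    return order

def conditional_compress(x, yrules):
    """Compute x|y*: replace occurrences of y's definitions in x
    with the corresponding non-terminals of y."""
    body = list(x)
    for t in definition_order(yrules):
        body = substitute(body, tuple(yrules[t]), t)
    return body

g_y = {"S": ["T1", "T1"], "T1": ["a", "b", "c"]}
print(conditional_compress("abcabdabc", g_y))  # ['T1', 'a', 'b', 'd', 'T1']
```

The compressed body would then be handed to the global-CFG construction of the previous section, and its encoded length is the estimate of K(x|y*).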

Method
With the above settings, we now explore the question of whether conditional information can distinguish similar from dissimilar pieces of music via an estimation of the Normalized Information Distance between them.
Definition 4.1 The Normalized Information Distance (NID) between two sequences x and y (Li et al. 2000; Li and Vitányi 2019, 672-674) is defined as

d(x, y) = (K(x|y*) + K(y|x*)) / K(x, y).   (3)

It is important to note here that, because of the asymmetry of the information contained in y* about x and vice versa, K(x|y*) and K(y|x*) are not necessarily equal. So we add these conditional complexities and normalize the sum with the combined K complexity of x and y. The numerator K(x|y*) + K(y|x*) also denotes the total amount of information needed to convert x to y and y to x, both ways: as y* is used to generate x, data might be lost for reverting back to y, but with an additional K(y|x*) bits of information y can be recovered, enabling us to losslessly switch between the two objects (Zurek 1989). We prove that Equation (3) is an admissible distance metric up to negligible errors (it satisfies the identity, symmetry and triangle inequalities, along with an appropriate density condition) in an Online Supplemental document available with this article. Equipped with such a distance notion, we next apply the CFG-based NID estimate in MIR tasks.
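Given the three size estimates, the distance itself is a one-liner. The numbers below are illustrative only, not measured values from our corpus; they merely show how shared patterns pull the distance toward 0:

```python
def nid(k_x_given_ystar, k_y_given_xstar, k_xy):
    """Equation (3): the two conditional complexities summed,
    normalized by the combined complexity K(x, y)."""
    return (k_x_given_ystar + k_y_given_xstar) / k_xy

# Illustrative values: two related strings share most of their patterns,
# so each compresses well given the other's grammar.
print(nid(30, 36, 400))    # 0.165  -> close to 0: highly similar
print(nid(190, 205, 400))  # 0.9875 -> close to 1: little shared information
```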

Experiments
We first demonstrate that the CFG-based model assigns, on average, larger inter-genre distances than intra-genre distances (Tables 1 and 2). We achieve this by computing pairwise d(x, y) between 20 randomly selected inter- and intra-genre members, repeating the process 5 times and averaging the computations. The result follows from the underlying mechanism of reusing CFG rules in conditional compression: about 82% of the intra-genre music comparisons make use of the dictionary rules of the given CFG of another music sequence when fetching the conditional information K(x|y*), compared to only 18% of the inter-genre comparisons. Moreover, of these comparisons, the intra-genre ones reuse more rules on average (3.99 rules) than the inter-genre ones (1.23 rules), resulting in larger distances among the less-related music sequences.

This result implies that CFG-based NID estimates can be used for distance-based music clustering, placing objects belonging to the same corpus closer together than objects belonging to different corpora. We demonstrate this by multidimensionally scaling an n × n matrix created from the pairwise distances d(x, y) between n total music pieces from N corpora. We map the distance matrix onto a 2D abstract plane and show that the CFG-based NID can detect all N clusters of music when the corpora are divided by genre, composer and even composition style.

Genre specific
First, as a sanity check, we perform the experiment on music from all three genres by selecting two representative composers from each. Figure 5 demonstrates that music from the same genre is placed closer together than music from different genres, and shows the presence of three genre-specific clusters.

Composer (same genre) specific
Adding more granularity to the categories, we next select composers from the same genre and test whether the model can recognize different composers. We conduct two experiments in Figure 6, one with three classical composers and the other with two rock composers; in each case the model was able to identify and separate the clusters. Note that, unlike the genre clustering, the separation here is less prominent, more so for the rock composers, perhaps resulting from some pattern-based kinship. We averaged the number of dictionary rules reused for conditional compression and found that, on average, 79% of the inter-corpora comparisons reused 2.43 rules for the classical composers, while 82% of the inter-corpora comparisons reused 4.11 rules for the rock composers.

Composition (same composer) specific
The most noteworthy result, however, is that the CFG-based model can differentiate between composition styles of the same composer. Taking Chopin as our reference composer, we conduct two experiments in Figure 7, where the model separates his Études from his Préludes, and his Préludes from his Nocturnes. This is a particularly interesting result, indicating that the underlying CFG descriptions were able to detect composition style-specific patterns which were used for highlighting similarity.

Method
Extending the given information y* in K(x|y*) from a single string to a set of strings, we present another elegant application of conditional information, K(x|G), where G denotes a CFG model generated from a set of similar strings {x_1, x_2, . . . , x_n}. The complexity K(x|G) can be interpreted as the amount of irregular or incompressible information in x after the useful or compressible information consistent with the set {x_1, x_2, . . . , x_n} has been squeezed out using G (Vitanyi 2006). The complexity of G, however, has to be constrained for the model to be useful, as otherwise G can be as large as x itself. Thus, given a model size parameter α and a set of models {G_1, G_2, . . .}, the goal is to find the G that best fits or describes x while K(G) ≤ α. This notion can be formalized with Kolmogorov's structure function (Vereshchagin and Vitanyi 2004):

h_x(α) = min_G { K(x|G) : K(G) ≤ α }.   (4)

The G that witnesses h_x(α) minimizes the left-out irregularities K(x|G) in x and is thus the best descriptor of the meaningful information in x. In the domain of music information retrieval, such a concept has a direct analogue: given a set of composers and a music sequence, is it possible to find the composer that best fits the unknown sequence? We show that, taking advantage of the dictionary-like structure, we can create a CFG model G of a composer from the sequences of their composed music and examine its fitness for x by approximating K(x|G), that is, how much G compresses x. Since G = {Σ, Γ, S, Δ} is compact and irreducible (Kieffer and Yang 2000), it can be treated as its own shortest description, and we approximate K(G) similarly to Equation (1):

K(G) ≈ (|Δ| + ∑_{T→α ∈ Δ} |α|) · log|Σ ∪ Γ|.

Our estimate of K(x|G), then, is the length of the encoded description of an efficient CFG of x after it has been compressed as much as possible by the rules of G. From the construction in Charikar et al.
(2005) that proves the existence of a CFG of size |G_{x_1}| + |G_{x_2}| + 2 generating the string x_1 x_2, we see that there exists a CFG of size n + ∑_{i=1}^{n} |G_{x_i}|, where G_{x_1} = {Σ_1, Γ_1, S_1, Δ_1}, G_{x_2} = {Σ_2, Γ_2, S_2, Δ_2}, . . . , G_{x_n} = {Σ_n, Γ_n, S_n, Δ_n}, that produces the concatenation of the strings in the set above. We could take G to be {∪_{i=1}^{n} Σ_i, ∪_{i=1}^{n} Γ_i, S → S_1 S_2 . . . S_n, ∪_{i=1}^{n} Δ_i}; then G will have a size of n + ∑_{i=1}^{n} |G_{x_i}|. In this setup, K(G) can be quite large according to our approximation above, and the provided α can be much less than K(G). Then G needs to be much smaller while still capturing the essence of the strings.
For this, we allow G = {Σ, Γ, S, Δ} to contain only those rules T → γ such that the pattern γ is present in at least α% of the strings in the set. As well as adjusting Δ to meet this restriction, we also restrict Σ and Γ to only those terminals and non-terminals that are present in the new rule set Δ. Consequently, we have a model G that preserves the characteristic common patterns in the composed music sequences of a composer.
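This rule filtering can be sketched as follows, under the simplifying assumption that each kept rule's definition is stored fully expanded into terminals (so no kept rule can reference a dropped non-terminal); the helper names are ours, not from any library:

```python
def expand(sym, rules):
    """Fully expand a symbol of a grammar into its terminal string."""
    if sym not in rules:          # terminal: already a character
        return sym
    return "".join(expand(s, rules) for s in rules[sym])

def model_cfg(grammars, training_strings, alpha):
    """Keep a rule T -> gamma only if its expanded pattern occurs in at
    least a fraction alpha of the training strings; deduplicate patterns
    that several grammars discovered independently."""
    kept = {}
    for g in grammars:
        for t, body in g.items():
            if t == "S":          # the start rule describes one whole piece
                continue
            pattern = "".join(expand(s, g) for s in body)
            share = sum(pattern in s for s in training_strings) / len(training_strings)
            already = {"".join(v) for v in kept.values()}
            if share >= alpha and pattern not in already:
                kept["M%d" % (len(kept) + 1)] = list(pattern)
    return kept

songs = ["abcabcabcx", "ababcabcy"]
g1 = {"S": ["T1", "T1", "T1", "x"], "T1": ["a", "b", "c"]}
g2 = {"S": ["a", "b", "T1", "T1", "y"], "T1": ["a", "b", "c"]}
print(model_cfg([g1, g2], songs, alpha=1.0))  # {'M1': ['a', 'b', 'c']}
```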

Experiments
To demonstrate the model generation and the subsequent compression of x with the generated model, we present an example. We first generate a model grammar G with α = 20% for 12 music pieces (vocal melodic transcriptions) by The Beatles. Notice that, as expected, the definitions of the non-terminals in G are short and Δ contains a limited number of rules; some hierarchical structure is found nonetheless.
We now choose two music sequences (vocal melodic transcriptions) x and y, both composed of about 350 MIDI notes (for consistency and a fair comparison): x by The Beatles ("Let It Be") and y by Elton John ("Your Song"), to show that K(x|G) ≪ K(y|G). The conditional CFG of x with respect to G makes use of rules from the Δ of G, while the conditional CFG of y with respect to G does not use any rule from the Δ of G.
To apply the above idea to best-fitting composer selection for an uncategorized string x, we selected seven composers belonging to the three genres: Bach, Chopin, Haydn, The Beatles, The Rolling Stones, Miles Davis and John Coltrane. For each composer, m music pieces were randomly selected from their works, with m = 8, 16, 20. We then randomly selected nine music pieces from those of the seven composers and computed the fraction of these pieces that were correctly classified by our best-fitting composer selection model. We repeated this process 5 times and averaged the success rate for each fixed α and m (Table 3). The model did reasonably well in detecting the correct composer for an unknown musical piece x. As m or α increases, the system is more successful in model selection. This happens because the corresponding model G_i becomes, in a sense, more informed about rarer patterns in a composer's oeuvre and is able to extract the meaningful information in x more accurately. Understandably, as we restrict K(G_i) to be smaller and smaller, the success rate decreases. However, it is useful to have the complexity restriction so that G_i does not overfit with too many redundant rules corresponding to the randomness of individual strings. To compensate, we can have G_i represent more strings, that is, increase m while keeping α fixed, to increase the model-selection success rate. This simple and intuitive generation of models thus reveals the underlying patterns and regularities that permeate the corresponding composer's music.
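The best-fitting composer selection then reduces to picking the model that leaves the least incompressible material in x. A compact, self-contained sketch in which residual_size is a simplified stand-in for the full estimate of K(x|G) (it skips rebuilding a CFG of the residual); all names are ours:

```python
from math import ceil, log2

def substitute(body, pat, nt):
    """Replace each non-overlapping occurrence of pat (a tuple) in body with nt."""
    out, i, k = [], 0, len(pat)
    while i < len(body):
        if tuple(body[i:i + k]) == pat:
            out.append(nt)
            i += k
        else:
            out.append(body[i])
            i += 1
    return out

def residual_size(x, model):
    """Rough stand-in for K(x|G): symbols left in x after applying G's
    rules, times the bits needed per symbol."""
    body = list(x)
    for nt, alpha in model.items():
        body = substitute(body, tuple(alpha), nt)
    alphabet = set(body) | set(model)
    return len(body) * ceil(log2(max(2, len(alphabet))))

def best_fitting_composer(x, models):
    """Pick the composer whose model grammar compresses x the most."""
    return min(models, key=lambda name: residual_size(x, models[name]))

# Toy models with expanded-pattern rules (hypothetical, for illustration)
models = {
    "composer_A": {"M1": ["a", "b", "c"], "M2": ["c", "a"]},
    "composer_B": {"M1": ["x", "y", "z"]},
}
print(best_fitting_composer("abcabcca", models))  # composer_A
```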

Conclusion
We presented a Context-Free Grammar-based method for describing symbolic music and demonstrated new applications of such processing in music compression, genre-, composer- and composition-style recognition, and composer classification. Results obtained on our dataset suggest that compact CFGs can identify and highlight important motifs and their recurrences in a simple symbolic representation and thus play a non-trivial role in approximating the information amount inherent in individual music sequences. Another unexpected but desirable outcome of our study is the CFG's sophisticated ability in conditional compression, owing to which we could model a composer and use the model to detect the regularities in an unknown music sequence and classify it. Such a composer model also amounts to a lossy compression of a music sequence created by the composer, where the essence of the music is captured by the model without the individual irregularities. Future endeavours should include an extension of this idea to automated music generation, where a music generator for a particular composer is designed from the composer model. Another exciting approach to distance measurement between symbolic music, Edit Grammars (Charikar et al. 2002), where a compact CFG for a sequence is produced by applying edit operations to another compact CFG, remains a focal point of future work.