Learning of construction of finite automata from examples using hill-climbing : RR : Regular set Recognizer

Abstract: The problem addressed in this paper is heuristically-guided learning of finite automata from examples. Given positive sample strings and negative sample strings, a finite automaton is generated and incrementally refined to accept all positive samples but no negative samples. This paper describes some experiments in applying hill-climbing to modify finite automata to accept a desired regular language. We show that many problems can be solved by this simple method. We then describe a method to 're-construct' a finite automaton if the positive and/or negative samples are slightly altered, without starting from the beginning. Finally, we describe an actual system, RR (Regular set Recognizer), that learns to recognize a regular set from samples that are given by a human teacher one by one.


Introduction
Consider the following problem: Describe the property that all strings in the right-list have but no string in the wrong-list has. Does a string (1 1 0 1) have this property?
You may answer the question by using any of the following: English, a regular expression, or a finite automaton.
(1001000) (000) (1 1 1 1 1 000) (1 001 00) (01 1 1001 101) (1 1000 001 1 1 00001) (1 1 01 1 1001 10) (1 1 1 101 10001001 1100)
It might be possible to construct the finite automaton by a "typical" schema-filling method (i.e., finding a rough property in the samples first, then comparing these strings carefully). However, in this paper, we try to construct the finite automaton directly by searching in the problem space (i.e., the set of all finite automata) using hill-climbing, rather than by analyzing the samples carefully. One of the biggest advantages of hill-climbing is its simplicity; that is, we do not have to know our problem space well, while a "typical" schema-filling method requires us to provide all possible schemas, and therefore to know everything about our problem space. We shall see that hill-climbing works much better than expected in our problem space, and in fact solved most of the problems.

The finite automata used in this paper
We restrict our problem domain to be only over {1,0}*. Furthermore, since every non-deterministic finite automaton has an equivalent deterministic finite automaton (see [Hopcroft 79]), we deal only with deterministic finite automata, that is, there is at most one 1-arrow and one 0-arrow from each state. Thus, in this paper, the terms "finite automaton", "automaton" or "machine" all mean "deterministic finite automaton". Given a string s, if following the arrows labeled by the digits of s leads from the initial state to one of the final states, then s is accepted by the machine; otherwise s is rejected.
For example, the machine of the sample problem is shown in figure 1-1.
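The acceptance rule can be made concrete with a short sketch. It assumes the list notation that appears later in the paper, such as ((0 0 0)): each state is a triple (A B F), where A is the destination of the 0-arrow, B the destination of the 1-arrow (0 meaning that the arrow is missing), and F is 1 for a final state. Taking state 1 as the initial state and the function name accepts are assumptions made for illustration.

```python
def accepts(machine, string):
    """Run a deterministic machine on a sequence of 0/1 digits.

    machine: list of (A, B, F) triples, one per state, states numbered from 1.
    State 1 is assumed to be the initial state.
    """
    state = 1
    for digit in string:
        a, b, _ = machine[state - 1]
        state = a if digit == 0 else b   # A is the 0-arrow, B is the 1-arrow
        if state == 0:                   # missing arrow: the string is rejected
            return False
    return machine[state - 1][2] == 1    # accepted iff we stop in a final state


# The machine listed for P2 later in the text.
P2 = [(0, 4, 1), (3, 0, 0), (6, 0, 1), (1, 5, 0), (2, 2, 0), (5, 6, 1)]
print(accepts(P2, [1, 0, 1, 0]))
print(accepts(P2, [1]))
```

Under this reading, the machine ((0 0 0)) has a single non-final state with no arrows, so it rejects every string, including the null string.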

The problem
We now are ready to describe the problem precisely. Given a right-list (a set of positive sample strings) and a wrong-list (a set of negative sample strings), we can think of the following three tasks:
1. To find a machine that accepts all strings in the right-list but none in the wrong-list.
2. To find a machine with n states that accepts all strings in the right-list but none in the wrong-list.
3. To find the machine with fewest states (simplest machine) that accepts all strings in the right-list but none in the wrong-list.
The first task is trivial because one can easily construct a trivial machine that accepts exactly all strings in the right-list but nothing else. The second task and the third task are shown to be NP-complete problems by [Gold 74]. We call the second task construction of finite automata, and the third task simplification of finite automata.

Past Work
(1 am a r ra 1 amar) (1 a nr 1 a id a r r i a) (1 1 a m a r m a r m a) (ami a n 1 a i a r r) (ail 1 a m a r at a r)

S <-S1mS1
Their system first constructs a "trivial" grammar, and then simplifies it.
In the rest of this chapter, we present the 7 sample problems that we will consider throughout this paper (7 sample problems and their inverses).
In chapter 2, we present the result of an experiment in constructing finite automata with n states using hill-climbing; in particular, we let n = 8. We shall see that all 14 sample problems can be solved by this method.
In chapter 3, we present the result of an experiment in simplifying the finite automata which we have found in chapter 2, also using hill-climbing. We shall see that we can find the simplest machine for most of the problems by this method.
In chapter 4, we discuss re-construction of finite automata, that is, how to re-construct a finite automaton if the right-list and the wrong-list are slightly altered. We might not want to construct it from the beginning. Rather, we want to construct the new machine by modifying the previous machine.

Finally, we have an actual system called Regular set Recognizer (RR), using the techniques above. RR learns to recognize a regular set, given examples by a human "teacher". We present several sample runs as well as a user's manual in chapter 5.

Sample Problems
5. any string of even length which, making pairs, has an odd number of (0 1)'s or (1 0)'s.
6. any string such that the difference between the numbers of 1's and 0's is 3n.
7. 0*1*0*1*.

Finite Automata of Solutions
The machines corresponding to these solutions are as follows. We also consider the inverse problems of these sample problems. The inverse problems are created by exchanging the right-list and wrong-list. We use these 14 problems in our experiments and refer to the inverse problem of problem 1 as problem 1-, the inverse problem of problem 2 as problem 2-, and so on.

Construction of Finite Automata
In this chapter, we describe an experiment in constructing a finite automaton with n states from a given right-list and a wrong-list using hill-climbing. In particular, we let n equal 8. We shall see that each of the 14 problems can be solved in at most a few thousand steps.

Algorithm
The hill-climbing algorithm of this experiment is shown in figure 2-1. We first construct a random machine with 8 states. We next make a copy of this machine, where the copy is slightly altered from the original by an operator mutate.
We compare the new machine with the original by an evaluation function E. The better machine is called current generation and we make a copy of this machine, and so forth. The worse machine is simply discarded. The operator mutate and the evaluation function E are defined more precisely in the following.
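The loop can be sketched as follows, under stated assumptions. A machine is a list of (A B F) triples with state 1 as the initial state; mutate replaces one randomly chosen digit by another digit, and E returns r - w clipped at 0, as the paper describes them. How ties are broken is not specified in the text, so keeping the copy when it scores at least as well as the original is an assumption, as are the function names, the step limit, and the seed.

```python
import random


def accepts(machine, string):
    """Run the machine on a sequence of 0/1 digits, starting in state 1."""
    state = 1
    for digit in string:
        a, b, _ = machine[state - 1]
        state = a if digit == 0 else b   # A is the 0-arrow, B is the 1-arrow
        if state == 0:                   # 0 means the arrow is missing
            return False
    return machine[state - 1][2] == 1


def evaluate(machine, right_list, wrong_list):
    """E(M) = r - w, where negative values are replaced by 0."""
    r = sum(accepts(machine, s) for s in right_list)
    w = sum(accepts(machine, s) for s in wrong_list)
    return max(r - w, 0)


def mutate(machine, rng):
    """Copy the machine and replace one randomly chosen digit by another digit."""
    n = len(machine)
    states = [list(t) for t in machine]
    i, k = rng.randrange(n), rng.randrange(3)
    allowed = [0, 1] if k == 2 else list(range(n + 1))
    allowed.remove(states[i][k])         # the new digit must differ from the old one
    states[i][k] = rng.choice(allowed)
    return [tuple(t) for t in states]


def hill_climb(right_list, wrong_list, n=8, max_steps=20000, seed=0):
    """Return (machine, steps), or (None, max_steps) if no solution was found."""
    rng = random.Random(seed)
    current = [(rng.randrange(n + 1), rng.randrange(n + 1), rng.randrange(2))
               for _ in range(n)]
    score = evaluate(current, right_list, wrong_list)
    for step in range(max_steps + 1):
        solved = (score == len(right_list)
                  and not any(accepts(current, s) for s in wrong_list))
        if solved:
            return current, step
        candidate = mutate(current, rng)
        candidate_score = evaluate(candidate, right_list, wrong_list)
        if candidate_score >= score:     # assumption: ties go to the new copy
            current, score = candidate, candidate_score
    return None, max_steps


# A small hypothetical problem whose samples are consistent with 1*.
right = [[], [1], [1, 1]]
wrong = [[0], [1, 0], [0, 1]]
machine, steps = hill_climb(right, wrong)
print(machine, steps)
```

The sample lists above are illustrative only and are not one of the paper's 14 problems.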

Hill-Climbing vs. Exhaustive Search
To see how effectively our hill-climbing algorithm has performed, we compare our method with an exhaustive search. There are (9 x 9 x 2)^8, or about 5 x 10^17, machines in our problem space. We now want to know the number of the desired machines in our problem space, so that we can calculate the expected number of steps until the exhaustive algorithm finds the first desired machine. This can be done by the following "sampling" method: take one machine in the problem space randomly, and test if this machine is the desired machine; repeat this procedure 100,000 times.
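The size of the problem space and the sampling estimate can be sketched as below. The sketch assumes the (A B F) triple representation with state 1 as the initial state, generalizes the count (9 x 9 x 2)^8 to n states as (2(n+1)^2)^n, and takes the expected number of steps to be the reciprocal of the sampled fraction of desired machines, which treats the exhaustive search as drawing machines in random order. These are assumptions for illustration, as are the function names.

```python
import random


def accepts(machine, string):
    """Run the machine on a sequence of 0/1 digits, starting in state 1."""
    state = 1
    for digit in string:
        a, b, _ = machine[state - 1]
        state = a if digit == 0 else b
        if state == 0:
            return False
    return machine[state - 1][2] == 1


def space_size(n):
    """Number of n-state machines: each state has (n+1) x (n+1) x 2 choices."""
    return ((n + 1) * (n + 1) * 2) ** n


def random_machine(n, rng):
    """Draw one machine uniformly from the problem space."""
    return [(rng.randrange(n + 1), rng.randrange(n + 1), rng.randrange(2))
            for _ in range(n)]


def estimate_expected_steps(right_list, wrong_list, n=8, samples=100000, seed=0):
    """Estimate 1/p, where p is the sampled fraction of desired machines."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        m = random_machine(n, rng)
        if (all(accepts(m, s) for s in right_list)
                and not any(accepts(m, s) for s in wrong_list)):
            hits += 1
    return samples / hits if hits else None   # None: no desired machine sampled


print(space_size(8))
```

When no sampled machine is a desired one, the estimate is undefined, and the sketch returns None rather than guessing a value.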
We show the expected number of steps using the exhaustive search calculated by this procedure in figure 2-5. Although the exhaustive search works better on "easy" problems, it is clear that in general our hill-climbing works much better than the exhaustive search.
So far, we fixed the number of states to be 8. In this section, we shall try the same experiment with different numbers of states (4 - 10). Figure 2-6 shows the result of this experiment. Some entries in the table indicate "it could not solve within the given time". This can happen when the hill-climbing algorithm climbs a "local hill". This table implies that the number of states n should be reasonably large to avoid climbing a local hill, and we can hardly get the simplest machine by this method. We shall, however, see that we can simplify the machine with 8 states that we have gotten in this chapter, so that it becomes the simplest machine.
In the previous chapter, we saw that our hill-climbing method successfully produced a machine that accepts all strings in the right-list but no string in the wrong-list. However, the final machine of the result of problem 2, for example, does not accept exactly our desired regular set (1 0)*. For instance, it does accept a string (1 1 0 0), which is not in (1 0)*. We therefore want the machine to be "generalized" so that it accepts exactly (1 0)*. In fact, the final machines of all problems except problems 1, 3 and 7 need to be generalized.

Discussion
We define the generality of a machine in terms of its simplicity.The simplicity of a machine is determined by the number of states the machine has, and if two machines have the same number of states, a machine with fewer arrows and final states is simpler.
Our task is to simplify the machines we have obtained in the previous chapter, so that the machines become the simplest or the most general. We call this task simplification of finite automata, and it can also be done by using a hill-climbing method.
P2 ((0 4 1)(3 0 0)(6 0 1)(1 5 0)(2 2 0)(5 6 1))

Simplification Algorithm
The algorithm for simplification is similar to the algorithm described in the previous chapter. The major differences are as follows: (1) the evaluation function E(M) returns a higher value if the machine M is simpler; (2) if M does not accept some strings in the right-list, or does accept some strings in the wrong-list, E(M) returns minus infinity; (3) the algorithm starts with the minimized final machine of the previous experiment instead of a random machine; (4) whenever a "useless state" (i.e., a non-final state with neither 0-arrow nor 1-arrow) is found, delete it.
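Differences (1), (2) and (4) can be sketched as follows, assuming the (A B F) triple representation with state 1 as the initial state. The paper orders machines by number of states first and then by arrows and final states; encoding that order as a single number with a weight of 100 per state is an assumption that only holds for small machines, and never deleting state 1 is also an assumption. The function names are hypothetical.

```python
import math


def accepts(machine, string):
    """Run the machine on a sequence of 0/1 digits, starting in state 1."""
    state = 1
    for digit in string:
        a, b, _ = machine[state - 1]
        state = a if digit == 0 else b
        if state == 0:
            return False
    return machine[state - 1][2] == 1


def simplification_score(machine, right_list, wrong_list, state_weight=100):
    """Higher is simpler; minus infinity if the samples are not respected."""
    ok = (all(accepts(machine, s) for s in right_list)
          and not any(accepts(machine, s) for s in wrong_list))
    if not ok:
        return -math.inf
    arrows = sum((a != 0) + (b != 0) for a, b, _ in machine)
    finals = sum(f for _, _, f in machine)
    # state_weight must exceed the largest possible arrows + finals (3 per state).
    return -(state_weight * len(machine) + arrows + finals)


def delete_useless_states(machine):
    """Remove non-final states with no outgoing arrows, renumbering the rest."""
    machine = list(machine)
    while True:
        useless = next((i for i, (a, b, f) in enumerate(machine, start=1)
                        if i != 1 and a == 0 and b == 0 and f == 0), None)
        if useless is None:
            return machine

        def fix(dest):
            if dest == useless:
                return 0                 # arrows into the deleted state are cut
            return dest - 1 if dest > useless else dest

        machine = [(fix(a), fix(b), f)
                   for i, (a, b, f) in enumerate(machine, start=1)
                   if i != useless]


P2 = [(0, 4, 1), (3, 0, 0), (6, 0, 1), (1, 5, 0), (2, 2, 0), (5, 6, 1)]
small = [(0, 2, 1), (1, 0, 0)]           # a two-state machine for (1 0)*
print(simplification_score(P2, [[1, 0]], [[1]]))
print(simplification_score(small, [[1, 0]], [[1]]))
```

Cutting the arrows into a deleted state does not change the accepted language, because any string that reached a useless state was rejected anyway.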

Results
A sample trace of problem 2- is shown in figure 3-2. Each line corresponds to current generation M, and the right-most number is the cumulative number of steps. The final machine of this trace is the desired simplest machine.
The final machines of all 14 problems are shown in figure 3-3. We see that some problems could not be simplified completely within the given time, probably because the search was climbing a local hill.

Hill-Climbing vs. Exhaustive Search
We compare our method with an exhaustive search. The exhaustive search enumerates all machines in the order of simplicity, and the first machine that accepts all strings in the right-list but none in the wrong-list is considered the simplest machine. Thus we can calculate the expected number of steps until the exhaustive search finds the desired machine. The result is shown in figure 3-4. Let n be the number of states of the simplest machine. The expected number of steps S is then calculated from N_j, the number of all possible machines with j states.
We have seen that our hill-climbing works rather successfully, although some problems could not be simplified completely. Our method consists of 2 parts, the construction process (chapter 2) and the simplification process (chapter 3). That is, we first construct a machine with 8 states and then simplify it. One might suppose that we could get the simplest machine using only the construction process, by choosing the number of states sufficiently small. Unfortunately, in the previous chapter, we showed that the number of states should be reasonably large, and we cannot do that.
One might also notice that we would not need any construction process, because we can easily construct a trivial machine, which accepts exactly all strings in the right-list but nothing else. Figure 3-6 is an example of the trivial machine. In this section, we describe some experiments to try to simplify from the trivial machine. We shall see that to simplify from the trivial machine is much less effective than our construction-simplification method. The result of the experiments is shown in figure 3-7. When we compare figure 3-3 and figure 3-7, it is clear that our construction-simplification method is more effective than the second method.

Re-construction of Finite Automata
So far, we have described a method for constructing the simplest finite automaton from given examples. Suppose we have solved one problem, and are given another problem whose examples are very close to the previous one. To solve this new problem starting from the beginning is rather tedious because we already have some information about the solution. In this chapter, we describe how to re-construct a finite automaton if the right-list and/or wrong-list is slightly altered.
After the sample lists are altered, if the machine still accepts all strings in the right-list but no strings in the wrong-list, the previous solution is the new solution. If the machine does not accept some strings in the right-list, and/or does accept some strings in the wrong-list, we refer to such strings as inconsistent strings. Whenever we find an inconsistent string in the right-list, we call a procedure, add-trivially, which revises the machine, so that it accepts all strings in the right-list. On the other hand, whenever we find an inconsistent string in the wrong-list, we call a procedure, cut-wrong-arrow, which revises the machine, so that it accepts no string in the wrong-list. Although after calling add-trivially there is no inconsistent string in the right-list, there may now be other inconsistent strings in the wrong-list. In this case, we call cut-wrong-arrow. Similarly, although after calling cut-wrong-arrow there is no inconsistent string in the wrong-list, there may now be other inconsistent strings in the right-list. In this case, we call add-trivially. Thus, we call add-trivially and cut-wrong-arrow again and again.
We first define add-trivially and cut-wrong-arrow, and then we show that our process always terminates, producing the desired machine that accepts all strings in the right-list but no string in the wrong-list, although the machine is not the simplest.

Add-trivially
The purpose of this add-trivially routine is to accept an inconsistent string in the right-list, no matter how many strings in the wrong-list the machine comes to accept.We first define trivial state and trivial path, then finally we define add-trivially.
Definition: In each machine, we consider that there is a special arrow named starting arrow, which always points to the initial state $q_1$.
Definition: If more than one arrow (including the starting arrow and the one from q itself) point to a state q, then q is called a non-trivial state. If only one arrow points to q, then q is called a trivial state.
Definition: A sequence of states $q_{i(1)}, q_{i(2)}, \ldots, q_{i(k)}$ is called a path of a string $a_1 a_2 \ldots a_{k-1}$, where each $a_j$ is in {1,0}, iff for all j such that $1 \le j \le k-1$, if $a_j = 0$ then $A_{i(j)} = i(j+1)$ else $B_{i(j)} = i(j+1)$.
Definition: A sequence of states $q_{i(1)}, q_{i(2)}, \ldots, q_{i(k)}$ is called a trivial path, iff this sequence is a path, and for all j such that $2 \le j \le k$, $q_{i(j)}$ is a trivial state, and for all j such that $2 \le j \le k-1$, $q_{i(j)}$ is a non-final state, and $q_{i(k)}$ is a final state. This path accepts only one string.
That the machine M does not accept a string $a_1 a_2 \ldots a_k$ means either of the following:
1. There is a path of $a_1 a_2 \ldots a_k$, but the last state is a non-final state.
2. There exists an integer j such that there is a path of $a_1 \ldots a_{j-1}$, but the last state of this path does not have an $a_j$-arrow.
For each inconsistent string in the right-list, add-trivially works as follows: in case 1, let the last non-final state be a final state; in case 2, create a trivial path from the last state so that the machine accepts the whole string.
It is easy to show that after calling add-trivially the machine accepts all strings in the right-list.
However, it also may come to accept some strings in the wrong-list, as we mentioned before. In this case, we call cut-wrong-arrow defined below.
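The two cases of add-trivially can be sketched as follows, assuming a machine is a list of (A B F) triples with state 1 as the initial state, A the 0-arrow, B the 1-arrow, and 0 meaning a missing arrow. The function name and the choice to append the new states at the end of the list are assumptions for illustration.

```python
def add_trivially(machine, string):
    """Return a revised machine that accepts the given right-list string."""
    states = [list(t) for t in machine]
    state = 1
    for pos, digit in enumerate(string):
        nxt = states[state - 1][0 if digit == 0 else 1]
        if nxt == 0:
            # Case 2: the arrow is missing, so grow a trivial path that
            # spells out the rest of the string with fresh states.
            for d in string[pos:]:
                states.append([0, 0, 0])
                new_state = len(states)
                states[state - 1][0 if d == 0 else 1] = new_state
                state = new_state
            break
        state = nxt
    # Case 1 (or the end of the new trivial path): make the last state final.
    states[state - 1][2] = 1
    return [tuple(t) for t in states]


null_machine = [(0, 0, 0)]               # accepts nothing
print(add_trivially(null_machine, [1, 1]))
```

Each fresh state receives exactly one incoming arrow and only the last one is final, so the added states form a trivial path in the sense defined above.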

Cut-wrong-arrow
If there are some inconsistent strings in the wrong-list (i.e. the machine does accept the strings), we call cut-wrong-arrow so that the machine comes to accept none of these strings, no matter how many strings in the right-list the machine comes to reject.
For each inconsistent string in the wrong-list, cut-wrong-arrow works as follows: Let $q_{i(1)}, q_{i(2)}, \ldots, q_{i(k)}$ be a path of the string w that should not be accepted. To reject w, one of the arrows of the path must be cut. Let $q_{i(n)}$ be one of the non-trivial states in the path. Cut the arrow from $q_{i(n-1)}$ to $q_{i(n)}$. If $q_1$ (the initial state) is the only non-trivial state, then let the machine M be ((0 0 0)), which does not accept anything.
It is easy to show that after calling cut-wrong-arrow all strings in the wrong-list are rejected, although the machine may come to reject some strings in the right-list. In this case, we call add-trivially.
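A minimal sketch of cut-wrong-arrow follows, assuming the (A B F) triple representation with state 1 as the initial state. The text allows any non-trivial state on the path to be chosen; picking the first one after the start of the path is an assumption, as is falling back to ((0 0 0)) whenever no such state exists. The function names are hypothetical.

```python
def incoming_counts(machine):
    """Count the arrows pointing to each state, including the starting arrow."""
    counts = [0] * (len(machine) + 1)
    counts[1] = 1                        # the starting arrow points to state 1
    for a, b, _ in machine:
        for dest in (a, b):
            if dest:
                counts[dest] += 1
    return counts


def cut_wrong_arrow(machine, string):
    """Return a revised machine that rejects the given wrong-list string."""
    path = [1]
    for digit in string:
        nxt = machine[path[-1] - 1][0 if digit == 0 else 1]
        if nxt == 0:
            return list(machine)         # already rejected: nothing to cut
        path.append(nxt)
    if machine[path[-1] - 1][2] == 0:
        return list(machine)             # already rejected: nothing to cut
    counts = incoming_counts(machine)
    for n in range(1, len(path)):
        if counts[path[n]] > 1:          # path[n] is a non-trivial state
            states = [list(t) for t in machine]
            source = path[n - 1]
            states[source - 1][0 if string[n - 1] == 0 else 1] = 0
            return [tuple(t) for t in states]
    return [(0, 0, 0)]                   # fallback: the machine that accepts nothing


P2 = [(0, 4, 1), (3, 0, 0), (6, 0, 1), (1, 5, 0), (2, 2, 0), (5, 6, 1)]
print(cut_wrong_arrow(P2, [1, 1, 0, 0]))
```

Because the machine is deterministic, removing any one arrow on the path of w is enough to make w rejected; the non-trivial state condition matters for the termination argument rather than for correctness of the single cut.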

Termination
In this section, we show that the algorithm above always terminates.
Theorem: The algorithm above always terminates.
Proof: Consider the following partial ordering:
non-triviality of state: the number of arrows which point to the state.
non-triviality of machine: total of non-triviality of all non-trivial states.
We denote this by nt(M), where M is a machine. Note that nt(M) = 0 iff M is a trivial machine.
Let M' be the result of add-trivially applied to M; then nt(M') = nt(M), because add-trivially adds only a trivial path. Next, let M' be a result of cut-wrong-arrow over M; then nt(M') < nt(M), because we always cut the arrow that points to a non-trivial state q, so the non-triviality of the state q decreases, and therefore the non-triviality of the machine also decreases. Thus, we cannot have an infinite loop, add-trivially, cut-wrong-arrow, add-trivially, cut-wrong-arrow, add-trivially, ..., because nt(M) decreases with every cut-wrong-arrow but nt(M) >= 0. <end of proof>
Finally, we describe an actual system, RR, that learns to construct finite automata. RR is running in MACLISP either on CMU-20C or CMU-10A.
RR has a machine (finite automaton), and each time RR is given a string in (1 + 0)* as its input, RR runs the machine with the string given. If the machine accepts the string, RR answers ACCEPT, otherwise it answers REJECT. At the very beginning, RR has a null machine, which accepts nothing, and therefore RR does not accept any string at all.
Now, consider some regular set R that we want to teach to RR. When we input a string s to RR, it should accept s if and only if s is in R. If s is not in R, RR should reject it. Whenever RR answers incorrectly, we scold it. When RR answers correctly and we think this example is important, we encourage it. When RR is scolded or encouraged, it memorizes the fact that the string must be accepted or rejected; that is, if the string must be accepted, RR puts it into the right-list, which is a set of strings that must be accepted, and similarly, if the string must be rejected, RR puts it into the wrong-list. After memorizing, RR re-constructs the machine in the way described in chapter 4, so that it accepts all strings in the right-list and none in the wrong-list. After each re-construction, RR simplifies the machine in the way described in chapter 3.
In this section, we describe how to execute the RR system, and in the following section, we show several sample runs.
Then you get prompt "»>" and are in the RR system.

How to teach
• Giving example: The format for giving an example to RR is shown by this typical input:
»>(1 0 1 0 1 0 1 0)
RR then outputs the answer, either ACCEPTED or REJECTED.
• Scolding: To scold for a wrong answer, input n right after the wrong answer.

»>n
• Encouraging: To encourage RR, input y right after the answer.

Other Commands
• r: show present right-list.
• w: show present wrong-list.
• m: show present machine.
• o: show order of memorized strings.
• t: show runtime of each step and total runtime.

Sample Run 1:
As the simplest example, let us teach the regular set 1* to RR. The desired machine is: The underlined strings are user's inputs, and the italic strings are comments.
We taught it in this order. + means "in right-list".

Time spent to teach (+) and ( + 1).
Total time in seconds to learn 1*. No, null should be accepted.

Sample
Show the present machine.
This machine accepts nothing but a null string.
No, this should be accepted.
This machine is 1*.
No, this should be rejected.

This machine is (1 1 1)*.
All right, it should be accepted. The total time is much shorter.

Sample Run 4:
We next try problem 3, which is very hard. This regular set is: Any strings without odd number of consecutive 0's AFTER odd number of consecutive 1's.

Sample Run 5:
We now try the previous run again with a more effective ordering.

Discussion
We saw in the previous section that the run-time of sample run 3 is much shorter than the run-time of sample run 2, and also sample run 5 is much faster than sample run 4. Thus, RR is very sensitive to what is given as examples, and how these are ordered.In this section, we are interested in how to teach RR effectively.
First, we consider the worst case and the best case of re-construction. In the worst case, RR calls add-trivially and cut-wrong-arrow again and again, and eventually its machine becomes the trivial machine (footnote 10). We know that a trivial machine can be constructed easily without such a special technique as re-construction.
On the other hand, the best case is that RR calls add-trivially once but no further cut-wrong-arrow. Thus, in order to "teach" the RR system effectively, we have to choose the examples nicely so that RR can re-construct its machine only by add-trivially. For instance, the example inputs of sample run 3 and sample run 5 are so chosen, and their run-time is in fact very short. Also, to avoid calling cut-wrong-arrow, we had better give the negative examples earlier.

Concluding Remark
Our new approach to construction of finite automata from given examples has been shown to work very nicely, despite the fact that its algorithm is quite simple. In chapter 2, we saw that construction of finite automata with n states can be nicely done using hill-climbing if n is a reasonable number. In chapter 3, we saw that we could often simplify the resulting machine of chapter 2 also using hill-climbing, although some problems could not be solved. In chapter 4, we discussed how to utilize past work, if a given problem is very close to the past problem. The RR system, which uses these techniques, was introduced and described in chapter 5. Finally, we enumerate several extensions of this work.
• Our hill-climbing algorithm sometimes climbs a local hill, and therefore fails to find a correct solution. There are several ways to avoid climbing a local hill, and one of them is adaptive search [Cavicchio 70], [Holland 75]. Adaptive search can be considered as a powerful version of hill-climbing. There is not only one "current generation", but usually a population of 20-30. The best five or so are chosen as winners (the others are discarded) and 15-25 slightly-altered copies of them are made as the new population. The alterations include not only mutation, but also cross-over (mix two and produce one), inversion (invert a certain part of one) (footnote 11), and so on. This approach becomes really powerful if parallel computation is available.
• Our finite automata have been deterministic, that is, arrows either exist or do not exist. The operator create-arrow or delete-arrow often makes too much difference to climb a hill smoothly. The idea is to let our finite automata be probabilistic, that is, an arrow exists partially with a real number between 0.0 and 1.0, which indicates a probability of existence of the arrow. (See [Rabin 63].) In this case, we increase or decrease the real numbers, rather than create or delete an arrow. This method might help to climb hills smoothly.
• Our mutation function might be modified so that the mutation does not take place completely randomly, but somewhat "cleverly". For instance, if the machine accepts a string in the wrong-list, then delete-arrow or decrease-prob-of-arrow should take place more often on this wrong path than on others. Our idea becomes more concrete if we deal with the probabilistic automata described in the previous paragraph. If the machine somehow accepts a string in the wrong-list, then we should decrease all probabilities of the arrows on this path. If the machine accepts a string in the right-list, we increase the probabilities on this path, etc.
• Our problem domain in this paper has been regular sets. It might be possible to extend it to context-free sets by constructing Push-Down Automata (finite automata with a stack, see [Hopcroft 79]). Since construction of Push-Down Automata must be much harder than that of finite automata, we would definitely need the techniques just listed.
• A finite automaton can be viewed as a program that takes a string as its argument and outputs TRUE or FALSE. Therefore we might be able somehow to apply our method to automatic programming from specification by examples.

Figure 1-1: The machine of the sample problem

Figure 1-2: Sample Strings and BNF grammar produced by Feldman's system
Their system requires us to provide nicely-chosen examples, and it cannot solve from poorly organized examples such as the problem we introduced at the beginning. Biermann and Feldman then built a system that constructs a finite automaton from given examples. Although it takes only positive examples, they showed an application to the case where both positive and negative examples are given. Their algorithm also requires nicely-chosen examples, and they showed a method to choose the examples from a regular set "nicely", so that it always produces the simplest machine. However, if the examples are not nicely-chosen, as in the problem we introduced at the beginning, their system hardly ever produces the simplest machine. Apart from grammatical inference, there has been a good deal of work on discovery of a regularity or a common pattern in given examples that are not necessarily nicely-chosen ([Langley 81a] [Langley 81b] [Buchanan 76] [Hayes-Roth 77] [Michalski 73] [Vere 75] [Winston 70]).

Figure 2-1: Flowchart of the Hill-Climbing
Operator mutate: Taking a machine as its argument, the operator mutate chooses one digit randomly, and replaces it by another digit, where $0 \le A_i \le 8$, $0 \le B_i \le 8$, and $0 \le F_i \le 1$. That is, the mutation in our algorithm is randomly one of the following: delete an arrow, insert an arrow, change the destination of an arrow to another destination, make a non-final state into a final state, and make a final state into a non-final state.
Evaluation Function E: The evaluation function E takes a machine as its argument and returns r - w, where r is the number of strings in the right-list accepted by the machine, and w is the number of strings in the wrong-list accepted by the machine. If r - w < 0 then it returns 0.
Figure 2-2 shows the trace of the experiment of problem 3, to see how our algorithm gradually refines a random machine into the desired machine. Each line corresponds to the current generation M. The column E indicates E(M), and G indicates the cumulative number of generations. The final machine of this trace accepts all strings in the right-list but none in the wrong-list of problem 3 (figure 2-3).

Figure 2-5: The Number of Steps to get the desired machine

Figure 2-6: The Number of States and Runtime [sec]
The number of steps using hill-climbing in this figure is the sum of the number of steps to construct the 8 state machine and the number of steps to simplify it into the simplest machine.

Figure 5-1 shows a flow chart of the RR system.
right, no scolding. Next try this. Accepted, all right, it should be accepted. Maybe we've got 1*, let us look inside the machine.
Run 2: Let us try to teach a harder automaton, problem 4. This regular set is: The difference between the number of 0's and the number of 1's ... Let us try null, which should be accepted.
, it should be rejected. Particularly, encourage it.
10. A trivial machine is a machine that accepts exactly all strings in the right-list and nothing else. See chapter 3.
11. The cross-over operator acts on a pair of strings by breaking each string at some point and rejoining the subsegments from different strings. The inversion operator makes two breaks, inverts the inner segment and then rejoins the string.
Figure 3-4: The Number of Steps to obtain the simplest machine