A Pattern Classification Approach to Evaluation Function Learning

We present a new approach to evaluation function learning using classical pattern-classification methods. Unlike other approaches to game-playing, where ad hoc methods are used to generate the evaluation function, our approach is a disciplined one based on Bayesian Learning. This technique can be applied to any domain where a goal can be defined and an evaluation function can be applied. Such an approach has several advantages: (1) automatic and optimal combination of the features, or terms, of the evaluation function; (2) understanding of inter-feature correlations; (3) capability for recovering from erroneous features; (4) direct estimation of the probability of winning by the evaluation function. We implemented this algorithm for the game of Othello, where it produced dramatic improvements over a linear evaluation function that had performed at world-championship level.


List of Figures
… shows a sample board with legal moves (for Black) to C6, D6, D2, E6, and G2; (c) shows the board after Black plays to E6.

List of Tables
Table 4-2: Percentage of agreement between two versions of BILL and the move that guarantees the largest winning margin.

Most successful game-playing programs employ full-width search that applies a heuristic evaluation function at terminal nodes [14] [1] [11] [7]. A typical evaluation function has the form:

$$\mathrm{Eval} = C_1F_1 + C_2F_2 + \cdots + C_nF_n$$

where Eval is the static evaluation of a board configuration, and linearly combines a number of features ($F_1, F_2, \ldots, F_n$) weighted by coefficients ($C_1, C_2, \ldots, C_n$). Each feature is a well-defined measure of the "goodness" of the board position. In chess, reasonable features might be piece-count advantage, center control, and pawn structure. In Othello, reasonable features might be mobility, edge position, and disc centrality.
The above formulation suggests three ways to improve a game-playing program:
1. Finding a superior search strategy.
2. Selecting better features.
3. Combining the features appropriately.
We believe that the first two ways are already well understood. Researchers have proposed several new full-width search strategies, such as Scout [10] and the zero-window search [1]. Unfortunately, these techniques provide only constant-factor improvements in an exponential search space [7]. Selecting good features is extremely important. However, good features are usually not very difficult to derive from expert knowledge¹.
At present, the strongest game-playing programs rely on fast hardware rather than new search algorithms [3] [1], and on efficient feature analysis rather than the discovery of new features [1] [7]. Therefore, we suggest that research in full-width search and feature selection has reached a saturation point, and that the success of future game-playing programs will depend crucially on how well the features are combined. In this paper, we introduce an algorithm based on Bayesian Learning that combines features automatically.
Unlike feature selection, feature combination is a very unintuitive process. On the one hand, one must establish a precarious balance among diversified strategies (such as choosing weights for positional advantage or piece advantage in chess). On the other hand, one must attend to interaction between related features (such as pawn structure and king safety in chess). Furthermore, one always faces the dilemma of having either too few features, or many features that include correlated or redundant ones.
Samuel [12] was the first to propose algorithms that automatically combine features. He introduced a linear evaluation function learning algorithm, and subsequently devised a nonlinear learning algorithm based […]

¹ Finding good features automatically is a very difficult problem, but it is beyond the scope of this paper.

[…] it was defeated from nearly even initial positions. The average final score from this experiment was 37 to 27. We show that this gain is equivalent to two extra plies of search. In another experiment involving Othello problem solving, BILL 3.0 solved 11% more problems than BILL 2.0 using eight plies of search.
In Chapter 2, we first discuss conventional ways of constructing evaluation functions and their shortcomings, with emphasis on Samuel's work. In Chapter 3, Bayesian Learning and its application to evaluation function learning are described in detail. In Chapter 4, the results from Othello are presented.
Chapter 5 contains analyses and discussion of the Bayesian Learning of evaluation functions. Finally, Chapter 6 contains some concluding remarks.

The Role of the Evaluation Function in Search
The fundamental paradigm in game-playing programs has changed very little since Newell, Simon, and Shaw's discovery of the alpha-beta relationship [9]. Almost all programs still rely on full-width alpha-beta search, and all programs still use static evaluation at terminal nodes.
Since most programs employ similar search strategies, the evaluation function plays the most crucial part in game-playing programs. The evaluation function embodies the knowledge of the program, and is responsible for differentiating good moves and positions from poor ones. Furthermore, since most programs rely on the evaluation function for move ordering, a good evaluation function leads to a more efficient search.
The static evaluation includes two stages: (1) evaluating certain features of a board position, and (2) combining these feature scores into an evaluation. Selecting the features is a domain-dependent task, and cannot be systematically studied. In this study, we will focus on the combination of the feature scores into an evaluation. In particular, we will later present an algorithm that accomplishes this task automatically.
Traditionally, a static evaluation is a linear combination of the features, as shown in Equation (2.1):

$$\mathrm{Eval} = \sum_{i=1}^{n} C_i F_i \qquad (2.1)$$

where Eval is the static evaluation of a board configuration, $F_1, \ldots, F_n$ are the features, and $C_1, \ldots, C_n$ are their coefficients. There are two problems with this representation. First, it assumes that the features are independent and that they can be combined linearly. This is clearly a false assumption; in fact, we will later show that every pair of features is correlated to some degree. Second, the coefficients are usually derived by ad hoc methods. In many cases, the implementor guesses these coefficients from his domain knowledge. Even when the implementor is knowledgeable, it is difficult to derive these coefficients because humans do not think in terms of alpha-beta search and static evaluation. When the implementor is not knowledgeable, he will be clueless. This was the initial motivation for Arthur Samuel, a novice checkers player, to write a checkers learning program.

Samuel's Evaluation Function Learning Experiments
One of the earliest and most intensive studies of machine learning was conducted by Arthur Samuel in the domain of checkers from 1947 to 1967 [12] [13]. His objective was very similar to that of this study, namely, given a set of feature values of a board position, to assign a score that measures the goodness of the position. Although he performed many experiments, we will focus on the two most important ones: (1) polynomial evaluation learning through self-play, and (2) nonlinear signature table evaluation learning through book moves. In the two subsequent sections, these two procedures will be described and evaluated.

Polynomial Evaluation Learning through Self-play
In polynomial evaluation learning [12], Samuel arranged to have two copies of the checkers program play against each other and learn the weight for each feature in a linear evaluation function. One copy of the program, Beta, uses a fixed function throughout a game. The other copy, Alpha, continuously improves its evaluation function. Alpha learns by comparing its evaluation to a more accurate evaluation derived by minimax search. If the search returns a sufficiently higher value than the static evaluation, it is assumed that the static evaluation is in error, and each negative feature in the static evaluation is penalized by lowering its weight. Alpha and Beta are originally identical, and Alpha continuously improves its weights. After each game, if Alpha defeats Beta, it is assumed to be better, and Beta adopts Alpha's evaluation function. Conversely, if Alpha loses three games to Beta, it is assumed to be on the wrong track, and the coefficient of its leading term is set to 0 in an attempt to put it back on the right track. Furthermore, manual intervention was used to restore previous states if "it becomes apparent that the learning process is not functioning properly." The resulting evaluation function seemed to stabilize after a number of games in which Alpha consistently defeated Beta. The final program was able to play a "better-than-average" game of checkers. This learning procedure was one of the first examples of machine learning. However, its validity is predicated upon several incorrect assumptions.
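To make the critique that follows concrete, here is a deliberately simplified Python reading of Alpha's correction step. It is not Samuel's program: the threshold `delta`, the shrink factor, the reading of "lowering its weight" as shrinking a coefficient toward zero, and the symmetric rule for overly optimistic evaluations are all assumptions made for illustration.

```python
def samuel_style_update(weights, features, searched_value, delta=1.0, shrink=0.9):
    """One sign-based correction step, loosely modeled on the description above.

    Hypothetical helper: `delta` and `shrink` are illustrative values only.
    """
    static_value = sum(w * f for w, f in zip(weights, features))
    error = searched_value - static_value
    if abs(error) <= delta:
        return list(weights)  # search and static evaluation roughly agree
    updated = []
    for w, f in zip(weights, features):
        term = w * f
        # Too pessimistic (search higher): blame terms that pulled the value down.
        # Too optimistic (search lower): blame terms that pushed the value up.
        blamed = term < 0 if error > 0 else term > 0
        updated.append(w * shrink if blamed else w)
    return updated


# Static value is 2.0 - 2.0 + 1.5 = 1.5; the search says 5.0, so only the
# negative second term is penalized.
print(samuel_style_update([2.0, 1.0, 3.0], [1.0, -2.0, 0.5], searched_value=5.0))
# prints [2.0, 0.9, 3.0]
```

The rule inspects only the sign of each term, never its size relative to the error, which is the weakness taken up in the third assumption below.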
The first of these assumptions is that a good evaluation function can be defined as a linear combination of independent features. This assumption is false in general, and is particularly wrong with Samuel's learning procedure because he intentionally collected redundant features. If two identical features were considered by Samuel's procedure, both would be assigned the same weight, resulting in over-estimation of the feature's value. Furthermore, a linear evaluation function is unable to capture the relationship between features. By repeating Samuel's experiments, Griffith [5] showed that better performance was achieved by an extremely simple heuristic move ordering procedure.
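The over-estimation caused by a duplicated feature is easy to see numerically. In this sketch the feature values and coefficients are hypothetical; the only point is that a linear evaluation which weighs a redundant copy like an independent feature counts the same information twice.

```python
def linear_eval(coefficients, features):
    """Eval = C1*F1 + C2*F2 + ... + Cn*Fn, as in Equation (2.1)."""
    return sum(c * f for c, f in zip(coefficients, features))


# Hypothetical position: feature A = 3.0, feature B = -2.0.
original = linear_eval([1.0, 0.5], [3.0, -2.0])
# The same position with an exact copy of feature A, given A's weight.
with_copy = linear_eval([1.0, 1.0, 0.5], [3.0, 3.0, -2.0])

print(original, with_copy)  # 2.0 5.0
```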
The second assumption is that when search and static evaluation disagree, the static evaluation must be in error. While deep searches are more accurate than shallow searches in general, there are several ways this assumption can fail. It is possible that a problem with a position can only be discovered by looking several plies ahead. In this case the static evaluation should be allowed to remain in error, as it is usually impossible to program such look-ahead knowledge into it. Furthermore, searches could suffer from the horizon effect [2], resulting in inaccurate evaluations.
The third assumption is that if the evaluation function is found to be overly optimistic, any positive component is assumed to be in error. This is clearly incorrect because in most positions, each player is ahead in some features and behind in others. The erroneous evaluation may be due to components that are negative, but not negative enough. Just checking the sign is simply not adequate. Samuel recognized these problems, and subsequently devised another learning procedure that remedied most of them [13]. In order to handle nonlinear interactions among the features, he introduced signature tables, and in order to cope with the incorrect assumptions in self-play, he used book moves.
Signature tables are multi-dimensional tables that combine features together nonlinearly. Each […]

However, there are a number of new problems with the signature table approach. First, this approach makes two assumptions: (1) the book move is always the best move, and (2) there are no equally good alternative moves. Since these were expert games, and games from the losing side were not used, these are reasonable assumptions, but certainly not infallible ones. These problems could probably be minimized by using a sufficiently large training database.

A major problem is that considerable accuracy is lost during the quantization process. Extreme quantization is very risky. For example, if one were to quantize material difference in chess into 3 or 5 values, how could one expect to make the right moves when the loss of a bishop may be equal to the loss of a queen?
Moreover, extreme quantization violates Berliner's smoothness principle [2], thereby introducing the blemish effect. In Berliner's words, a very small change in the value of some feature could produce a substantial change in the value of the function. When the program has the ability to manipulate such a feature, it will frequently do so to its own detriment. The arbitrary quantization into so few levels has precisely this problem.
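A small sketch of how coarse quantization produces this kind of discontinuity. The three-level quantizer, the feature ranges, and the table entries are all hypothetical, chosen only to show a signature-table lookup in which a change of 0.2 in one feature moves the evaluation from 0 to 6.

```python
def quantize(value, lo, hi, levels):
    """Map a value in [lo, hi] to one of `levels` cells, clamping at the ends."""
    if value <= lo:
        return 0
    if value >= hi:
        return levels - 1
    step = (hi - lo) / levels
    return min(int((value - lo) / step), levels - 1)


# Hypothetical 3x3 signature table indexed by (material cell, mobility cell).
TABLE = [
    [-9, -6, -3],
    [-2, 0, 2],
    [3, 6, 9],
]


def signature_eval(material, mobility):
    """Look up the table cell selected by the two quantized features."""
    i = quantize(material, -9.0, 9.0, 3)
    j = quantize(mobility, -9.0, 9.0, 3)
    return TABLE[i][j]


# Material 2.9 and 3.1 fall on opposite sides of a cell boundary at 3.0.
print(signature_eval(2.9, 0.0), signature_eval(3.1, 0.0))  # 0 6
```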
In spite of its sacrifice of smoothness, signature table learning still does not provide a general solution to the linearity problem. Consider the case where two features are identical (or highly correlated). If these features, F1 and F2, were organized as in Figure 2-2a, their redundancy would be successfully eliminated in Table 1. However, if they were organized as in Figure 2-2b, F1 would contribute to Table 1, and F2 would contribute to Table 2. Furthermore, Table 3 could not eliminate this redundancy because it could not know whether its two inputs are affected by F1 and F2 or by the other features. This results in over-estimation of the utility of this feature. For the same reason, we believe the higher-level signature tables to be of little utility, as they cannot identify the contributors to their inputs, and consequently cannot handle inter-table correlations. As a result, it is important to carefully arrange the structure of the signature tables so that covariances are captured. Furthermore, it is necessary to determine the range and levels for the quantization of each level, the initial values in the cells, the frequency with which they must be updated, and many other parameters. This is what we consider to be the greatest flaw of both procedures, namely, that excessive human initialization and intervention are required. Each parameter adds the possibility of human errors affecting the learning process, and increases the amount of time needed to derive and test these values. The result is either poor learning caused by human errors, or acceptable learning as a result of excessive human intervention.
All of this heuristic tuning made Samuel's procedures domain-dependent. If one wanted to implement his procedures in another domain, considerable additional domain knowledge would have to be built into the learning algorithm, and considerable effort would have to be invested in the trial-and-error modification of the parameters. This is clearly undesirable.
Finally, the concepts learned by Samuel's algorithms are suboptimal. His self-play algorithm learns to distinguish good features from bad features based on search. His book-move algorithm learns to distinguish moves chosen by experts from moves not chosen by experts. However, in a game-playing domain, the optimal concept to learn is that which distinguishes winning positions from losing positions.
In summary, while Samuel's studies were a milestone in the early years of machine learning, we believe that the amount of supervision they require makes them impractical and domain-dependent. These major flaws, coupled with Samuel's problematic assumptions, smoothness problems, and suboptimal learning, severely limited the practicality and applicability of his procedures.

Other Work on Automatic Feature Combination
Griffith, who originally conceived of the idea of signature tables, reported a number of evaluation function learning results in checkers [5]. He compared linear evaluation function learning, two variants of signature table learning, and a heuristic move ordering algorithm. He showed that the heuristic move ordering algorithm, which has only extremely rudimentary checkers knowledge, outperformed the linear evaluation function, but was outperformed by the signature table algorithms.
Another study was undertaken by Mitchell [8], who used regression analysis to create a linear evaluation function in Othello. The resulting program did not play as well as was hoped. We conjecture that this is due to the lack of nonlinearity in his program.
Both of these results indicate the inadequacy of linear evaluation functions. However, Griffith and Samuel's signature table approach is also not adequate because of problems with smoothness and the excessive tuning required.

Bayesian Learning of Evaluation Function
In this chapter, an algorithm that automatically combines the features is presented. First, Bayesian Learning is introduced for readers unfamiliar with the concept. Next, the game of Othello and the Othello program BILL are briefly described. Finally, we present the evaluation learning algorithm, which is based on Bayesian Learning, and applied to the domain of Othello.

Bayesian Learning
Bayesian Learning of discriminant functions is a standard technique used in pattern recognition.
Typically, it is applied to the recognition and classification of concrete objects, such as characters, images, speech, and seismic waves. Assuming a multivariate normal distribution, the discriminant function defines a decision boundary between classes. In a two-class problem, all points on this boundary are equally likely to belong to either class. This boundary is automatically computed from the feature vectors of the training data, and takes into consideration the variance and covariance of the features. Bayesian Learning is composed of two stages, namely, training and recognition.

The Training Stage
The training stage is a straightforward parameter estimation stage. A database of labeled training data is required. Each training sample consists of a feature vector and a label indicating which class the feature vector belongs to. The task of the training stage is to estimate the mean feature vector and the covariance matrix for each label (or class) from the training data.
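The mean vector for a class c, $\mu_c$, is simply the arithmetic average of each feature value over the training samples labeled as this class, and is computed as follows:

$$\mu_c = \frac{1}{N_c} \sum_{i=1}^{N_c} x_i$$

where $N_c$ is the number of times class c was observed, and $x_i$ is the i-th feature vector for class c. The covariance matrix for class c is estimated in the same way:

$$\Sigma_c = \frac{1}{N_c} \sum_{i=1}^{N_c} (x_i - \mu_c)(x_i - \mu_c)^T$$

where $^T$ is the matrix transpose operation.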
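A minimal sketch of the training stage in plain Python (no numerical library assumed). The input format, a list of (feature vector, label) pairs, and the function names are our own choices for illustration:

```python
def estimate_class_parameters(samples):
    """Return (mean vector, covariance matrix) for the samples of one class.

    The covariance divides by N_c, i.e. the maximum-likelihood estimate.
    """
    n = len(samples)
    d = len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
            for j in range(d)]
           for i in range(d)]
    return mean, cov


def train(labeled_samples):
    """Group (feature_vector, label) pairs by label and estimate each class."""
    by_class = {}
    for x, label in labeled_samples:
        by_class.setdefault(label, []).append(x)
    return {label: estimate_class_parameters(xs) for label, xs in by_class.items()}


data = [([0.0, 0.0], "win"), ([2.0, 2.0], "win"),
        ([1.0, 3.0], "loss"), ([3.0, 1.0], "loss")]
print(train(data)["win"])  # ([1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]])
```

Here the two "win" features move together (positive covariance) while the two "loss" features move in opposite directions, which is exactly the kind of structure a per-class covariance matrix records.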
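
The Recognition Stage
Assuming that the feature vectors of each class are multivariate normal, the discriminant function for class c is

$$g_c(x) = -\frac{1}{2}(x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) - \frac{1}{2}\log|\Sigma_c| - \frac{d}{2}\log 2\pi + \log P(c)$$

where d is the number of features and P(c) is the a priori probability of class c. To determine which class a new feature vector belongs to, $g_c(x)$ is computed for each class, and the new input is assigned to the class with the greatest $g_c(x)$. Furthermore, if probabilities are preferred to hard decisions, they can be derived by normalizing over the classes; since $g_c(x)$ is a logarithm, it is exponentiated first:

$$P(c \mid x) = \frac{e^{g_c(x)}}{\sum_k e^{g_k(x)}}$$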
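A sketch of the recognition stage in plain Python, assuming each class is described by a (mean vector, covariance matrix) pair and a prior, and using the standard multivariate-normal discriminant g_c(x) = -1/2 (x - mu_c)^T Sigma_c^-1 (x - mu_c) - 1/2 log|Sigma_c| - (d/2) log 2pi + log P(c). The small Gaussian-elimination helper is our own stand-in for a linear-algebra library; a numerical package with a Cholesky factorization would be the usual choice in practice.

```python
import math


def _solve_and_logdet(a, b):
    """Solve a y = b by Gaussian elimination with partial pivoting.

    Returns (y, log|det a|). Assumes `a` is non-singular, as a valid
    covariance matrix is.
    """
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (m[r][n] - tail) / m[r][r]
    return y, logdet


def discriminant(x, mean, cov, prior):
    """g_c(x) for a multivariate-normal class with prior probability P(c)."""
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sol, logdet = _solve_and_logdet(cov, diff)  # sol = Sigma^-1 (x - mu)
    mahalanobis = sum(a * b for a, b in zip(diff, sol))
    return (-0.5 * mahalanobis - 0.5 * logdet
            - 0.5 * d * math.log(2 * math.pi) + math.log(prior))


def classify(x, params, priors):
    """Hard decision: the class with the greatest g_c(x)."""
    return max(params, key=lambda c: discriminant(x, *params[c], priors[c]))


def class_probabilities(x, params, priors):
    """Soft decision: exponentiate g_c(x) and normalize over the classes."""
    g = {c: discriminant(x, mu, cov, priors[c]) for c, (mu, cov) in params.items()}
    top = max(g.values())  # subtract the max to avoid overflow in exp
    weights = {c: math.exp(v - top) for c, v in g.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


identity = [[1.0, 0.0], [0.0, 1.0]]
params = {"win": ([1.0, 0.0], identity), "loss": ([-1.0, 0.0], identity)}
priors = {"win": 0.5, "loss": 0.5}
print(classify([1.0, 0.0], params, priors))             # win
print(class_probabilities([1.0, 0.0], params, priors))  # P(win) is about 0.881
```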

The Game of Othello and Bill
Before explaining how Bayesian Learning can be applied to evaluation function learning, we first briefly describe the domain of our experiment, Othello.
Othello is a game played on an 8-by-8 board between two players, Black and White. The board is initially set up as in Figure 3-1. Black starts the game by placing a black disc (a playing piece with the black side up) on an empty square such that one or more of White's discs are flipped. Each white disc lying in an unbroken straight line (horizontal, vertical, or diagonal) between this black disc and another black disc is flipped to black. The players alternately place discs on the board (a player with no legal move passes) until neither player can make another move. The player with the most discs is declared the winner.
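The capture rule can be sketched in a few lines of Python. The board encoding (a list of 8 rows holding "B", "W", or ".", with row 0 at the top) is our own choice, and passing and game-end detection are left out.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def initial_board():
    """Standard start: two discs of each color on the four central squares."""
    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    return board


def flips(board, row, col, player):
    """Discs that `player` would flip by playing at (row, col); [] if illegal."""
    if board[row][col] != EMPTY:
        return []
    opponent = WHITE if player == BLACK else BLACK
    flipped = []
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        run = []
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            run.append((r, c))
            r, c = r + dr, c + dc
        # A run is captured only if it is closed off by one of the player's discs.
        if run and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            flipped.extend(run)
    return flipped


def legal_moves(board, player):
    return [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, player)]


def play(board, row, col, player):
    """Place a disc and flip the captured discs in place."""
    captured = flips(board, row, col, player)
    if not captured:
        raise ValueError("illegal move")
    board[row][col] = player
    for r, c in captured:
        board[r][c] = player
    return captured


board = initial_board()
print(legal_moves(board, BLACK))  # [(2, 3), (3, 2), (4, 5), (5, 4)]
```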
Othello has been a very popular game for computer implementation because of its relatively small branching factor and the relative ease of writing programs that play reasonably well. This ease is due to the […]

3. Weighted Squares: Measures the goodness of each square occupied by each disc of the player. Both static measures (central squares are better than squares next to the corner) and dynamic measures (surrounded discs are better than peripheral discs) are used.
4. Edge Position: Measures the player's edge position using a table that contains every combination on each edge. This table is generated by a probabilistic minimax procedure [7].
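As an illustration of the static half of the Weighted Squares feature, here is a sketch that sums per-square weights for each side. The weight table is invented for the example (it gives corners a large bonus and the squares diagonally next to them a penalty), and the dynamic measures that BILL also uses are omitted.

```python
# Hypothetical static weights for one 4x4 quadrant (row 0 and col 0 are edges),
# mirrored to the other three quadrants. Illustrative only; not BILL's values.
QUADRANT = [
    [100, -20, 10, 5],
    [-20, -50, -2, -2],
    [10, -2, 1, 1],
    [5, -2, 1, 0],
]


def square_weight(row, col):
    return QUADRANT[min(row, 7 - row)][min(col, 7 - col)]


def weighted_squares(board, player, opponent):
    """Static part only: the player's square weights minus the opponent's."""
    score = 0
    for r in range(8):
        for c in range(8):
            if board[r][c] == player:
                score += square_weight(r, c)
            elif board[r][c] == opponent:
                score -= square_weight(r, c)
    return score


board = [["."] * 8 for _ in range(8)]
board[0][0] = "B"  # corner
board[1][1] = "W"  # square diagonally next to the corner
print(weighted_squares(board, "B", "W"))  # 150
```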
The interested reader is directed to [11] or [7] for analyses of Othello strategies. BILL 2.0 combines these features linearly. The weights in the linear combination were determined by creating ten different versions and holding a tournament among them. The version that won the tournament was chosen as the final set of coefficients. We will apply Bayesian Learning to the same four features, and measure the strength of the resulting program by comparing it against BILL 2.0.

Evaluation Function Learning
We will now present an evaluation function learning algorithm that uses Bayesian Learning as described in Section 3.1. The key difference between our application and typical applications is that instead of trying to recognize concrete objects, we are trying to recognize and classify board positions as winning positions or losing positions.
The basic approach of our algorithm is shown in Figure 3-2. Like Bayesian Learning, this algorithm consists of two stages, namely, training (learning) and recognition (evaluation), which will be described in the next two sections.

The Training Stage
In order to train a Bayesian discriminant function, a database of positions, where each position is labeled as a winning position or a losing position, is required. There are many ways to obtain such a database.
In this study, the training data were taken from actual games between two experts, and each position of the winning player is marked as a winning position, and each position of the losing player as a losing position.
While this is a simple and consistent method, there is a serious problem: a winning position could be lost by a poor subsequent move, and would then be mislabeled as a losing position. We deal with this problem in two ways. First, reliable experts are needed. Since BILL 2.0 is a world-championship level player, we simply used it to generate the training data by self-play from initial positions. Second, the first 20 moves of each game were generated randomly, and training positions were collected only after these 20 random moves (that is, once there were 24 discs on the board). It was hoped that one of the players would be ahead after the 20 moves, and that this player would go on to win the game. In cases with truly even positions, sufficient training data should result in equal numbers of winning and losing labels.
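A minimal sketch of this labeling rule. Attributing a position to the player whose turn it is, and discarding drawn games, are our assumptions; the text does not say how either case was handled. The final disc counts would come from the endgame search described next.

```python
def label_game(positions, side_to_move, black_discs, white_discs, start=20):
    """Turn one finished self-play game into (position, label) training pairs.

    positions[i] is the board before half-move i, and side_to_move[i] is
    "B" or "W". The first `start` half-moves (the random opening) are skipped.
    """
    if black_discs == white_discs:
        return []  # assumption: drawn games are not used for training
    winner = "B" if black_discs > white_discs else "W"
    return [(pos, "win" if side == winner else "loss")
            for pos, side in zip(positions[start:], side_to_move[start:])]


# Placeholder "positions" 0..23; Black wins 40 to 24.
pairs = label_game(list(range(24)), ["B", "W"] * 12, 40, 24)
print(pairs)  # [(20, 'win'), (21, 'loss'), (22, 'win'), (23, 'loss')]
```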
To generate the training data, BILL 2.0 was set up to play itself under the following conditions: the first 20 half-moves are made randomly, then BILL 2.0 is given 15 minutes per side to play the remaining 40 half-moves². When there are 15 half-moves left, an endgame search is performed to find the winner assuming perfect play on both sides; the game is then terminated and recorded as training data. A total of 3000 games were played to estimate the parameters.
² There are usually exactly 60 half-moves in a game. The only exception is when neither side has a legal move.

It is well known that different strategies are needed for different stages of the game [2]. Therefore, we generated a discriminant function for each stage, where a stage is defined by the number of discs on the board. The discriminant function for a stage with N discs is generated from training positions with N-2, N-1, N, N+1, and N+2 discs³. Since there are almost always 60 moves per game in Othello, the disc count provides a reliable estimate of the stage of the game. By coalescing adjacent data, the discriminant function varies slowly from stage to stage, similar to the application coefficients proposed by Berliner [2]. Having generated the training data, the four features are extracted from each position in the database.
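A sketch of the stage windowing just described, assuming samples are stored as (disc count, feature vector, label) triples. Clipping the window to the 4 to 64 discs that can actually appear on a board is our assumption, as are the function names.

```python
def stage_window(n_discs, half_width=2):
    """Disc counts pooled to train the discriminant for a stage with n_discs.

    Clipping to the 4..64 discs possible on a board is an assumption.
    """
    return [n for n in range(n_discs - half_width, n_discs + half_width + 1)
            if 4 <= n <= 64]


def samples_for_stage(samples, n_discs):
    """samples: (disc_count, feature_vector, label) triples from all games."""
    window = set(stage_window(n_discs))
    return [(x, label) for count, x, label in samples if count in window]


print(stage_window(40))  # [38, 39, 40, 41, 42]
```

Each stage's pooled samples would then be fed to the per-class mean and covariance estimation, giving one set of parameters per disc count.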

Then, the mean feature vector and the covariance matrix between features are estimated for the two classes, winning positions and losing positions. Table 3-1 shows the mean vector, the covariance matrix, and the correlation matrix for the classes of winning and losing positions at Discs = 40. They clearly support our earlier claims that every pair of features is correlated to some degree, and that nonlinearity is crucial to the success of an evaluation function.
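Table 3-1 lists both a covariance matrix and a correlation matrix; the latter follows from the former by the standard normalization rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj). A short sketch, with made-up numbers rather than the values in Table 3-1:

```python
import math


def correlation_matrix(cov):
    """rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    d = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(d)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]


# Made-up 2x2 covariance; the off-diagonal correlation is 2 / (2 * 3) = 1/3.
print(correlation_matrix([[4.0, 2.0], [2.0, 9.0]]))
```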

The Evaluation Stage
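The evaluation of a board position involves the computation of the features, and then the combination of this feature vector, x, into a final evaluation. From Section 3.1.2, we know that the two discriminant functions for win and loss are:

$$g_{\mathrm{win}}(x) = -\frac{1}{2}(x - \mu_{\mathrm{win}})^T \Sigma_{\mathrm{win}}^{-1} (x - \mu_{\mathrm{win}}) - \frac{1}{2}\log|\Sigma_{\mathrm{win}}| - \frac{d}{2}\log 2\pi + \log P(\mathrm{win}) \qquad (3.7)$$

$$g_{\mathrm{loss}}(x) = -\frac{1}{2}(x - \mu_{\mathrm{loss}})^T \Sigma_{\mathrm{loss}}^{-1} (x - \mu_{\mathrm{loss}}) - \frac{1}{2}\log|\Sigma_{\mathrm{loss}}| - \frac{d}{2}\log 2\pi + \log P(\mathrm{loss}) \qquad (3.8)$$

The evaluation function should measure the likelihood that the board position belongs to the class win, or:

$$g(x) = \log\frac{P(\mathrm{win} \mid x)}{P(\mathrm{loss} \mid x)} = g_{\mathrm{win}}(x) - g_{\mathrm{loss}}(x) \qquad (3.9)$$

We now substitute (3.7) and (3.8) into (3.9). The constant terms $-\frac{d}{2}\log 2\pi$ cancel. Also, we will assume that the a priori probabilities of winning and losing are equal, and eliminate $\log P(\mathrm{win})$ and $\log P(\mathrm{loss})$. This results in our final evaluation function:

$$g(x) = -\frac{1}{2}(x - \mu_{\mathrm{win}})^T \Sigma_{\mathrm{win}}^{-1} (x - \mu_{\mathrm{win}}) + \frac{1}{2}(x - \mu_{\mathrm{loss}})^T \Sigma_{\mathrm{loss}}^{-1} (x - \mu_{\mathrm{loss}}) - \frac{1}{2}\left(\log|\Sigma_{\mathrm{win}}| - \log|\Sigma_{\mathrm{loss}}|\right)$$

The term $\log|\Sigma_{\mathrm{win}}| - \log|\Sigma_{\mathrm{loss}}|$ is a constant used to normalize the quantity, and is not strictly necessary if all evaluations use the same constant. But, as stated above, a different set of parameters is estimated for each stage of the game; therefore, eliminating this term would result in different evaluation ranges on different levels of the search tree. That would be inconvenient because some thresholds in our search require a consistent range [7]. Thus, the term is retained. Furthermore, by retaining this term, when the program reports its evaluation, it is possible to compute the probability of winning directly from g(x):

$$P(\mathrm{win} \mid x) = \frac{e^{g(x)}}{1 + e^{g(x)}}$$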
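A sketch of the final evaluation and its conversion to a winning probability, assuming equal priors and per-class (mean, covariance) pairs for the current stage. The helper names and the example parameters are hypothetical, and the small solver again stands in for a linear-algebra library.

```python
import math


def _solve_and_logdet(a, b):
    """Solve a y = b by Gaussian elimination with partial pivoting.

    Returns (y, log|det a|); assumes `a` is non-singular.
    """
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (m[r][n] - tail) / m[r][r]
    return y, logdet


def _quad_and_logdet(x, mean, cov):
    """Return (x - mu)^T Sigma^-1 (x - mu) and log|Sigma|."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sol, logdet = _solve_and_logdet(cov, diff)
    return sum(a * b for a, b in zip(diff, sol)), logdet


def evaluate(x, win_params, loss_params):
    """g(x) = g_win(x) - g_loss(x) with equal priors; log-determinants kept."""
    q_win, ld_win = _quad_and_logdet(x, *win_params)
    q_loss, ld_loss = _quad_and_logdet(x, *loss_params)
    return -0.5 * q_win + 0.5 * q_loss - 0.5 * (ld_win - ld_loss)


def probability_of_winning(g):
    """P(win | x) = e^g / (1 + e^g), written to avoid overflow for large |g|."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


win = ([0.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])  # hypothetical parameters
loss = ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
g = evaluate([0.0, 0.0], win, loss)
print(g, probability_of_winning(g))  # g = -log 2, so P(win) = 1/3
```

In this toy case both class means coincide, so only the retained log-determinant term separates the classes: the wider "win" distribution assigns less density to the point, and the reported probability reflects it.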

Results
In this chapter, we describe two experiments with two versions of the Othello program BILL. The first version is BILL 2.0, which combines the four features linearly and is known to play at the world-championship level. The other version, BILL 3.0, uses the same four features and combines them using the Bayesian Learning described in the previous chapter.
There are many ways to evaluate and compare game-playing programs, including (1) playing the programs against each other, (2) giving the programs problems with known solutions, and (3) their actual playing records or ratings. The first two methods are used here; the third was not possible due to the scarcity of opponents who play at BILL's level.

Actual Games
The most obvious measure of two programs is simply arranging them to play each other. We arranged […]

In order to find out exactly how much is gained from Bayesian Learning, different versions of BILL 2.0 that searched to different depths played each other from the same initial positions. The results are shown in […]

Endgame Problems
Since Othello endgames can be solved many moves from the end of the game, it is possible to assess the strength of a program by the frequency with which it selects (without searching to the end) the move that leads to the optimal result. A problem with this scheme is that the closer the game is to its end, the less applicable the known Othello strategies become, and counter-intuitive moves sometimes have to be made. In order to minimize this problem, we acquired a database of 63 winning positions with 20 to 24 moves left, each with the move that leads to the win with the largest margin. This database was generated by a hardware endgame searcher built by Clarence Hewlett [6].
The linear version and the Bayesian Learning version of BILL were given these problems, and each suggested a best move using 3 to 8 plies of search. The frequency with which they agreed with the optimal move is shown in Table 4-2. In order to evaluate these figures, one should be aware that some of the optimal moves may still be counter-intuitive. More importantly, there are often many moves that still preserve a winning position.
Unfortunately, we were not provided with sufficient information to evaluate the frequency that wins were lost due to a poor move. But the above statistics show that Bayesian Learning is indisputably stronger.
Finally, it is possible to compare these figures against expert human play. Nineteen of these positions were taken from six games between the top players in the world. The human experts made 9 correct moves and 10 incorrect ones, for an accuracy of about 47.4%. From this, it is clear that both versions of BILL played significantly better than the human experts. Furthermore, BILL 3.0 is far better than both BILL 2.0 and the human experts.

Advantages of Bayesian Learning
In this section, we will discuss the advantages of the Bayesian Learning algorithm by comparing it against Samuel's algorithms. […] The Bayesian Learning approach, on the other hand, captures nonlinear relationships between the features by considering the covariances between every pair of features. It is always possible to detect redundant features and to account for all the overlap among the features.
In the previous chapter, we saw the dramatic improvement produced by using a nonlinear evaluation function. To illustrate why a linear evaluation is inadequate, the correlations between each pair of features are plotted in Figure 5-1. Although the four features were extracted from different characteristics of the board, they were found to be highly correlated, particularly mobility, potential mobility, and weighted squares. It would be detrimental to combine these highly correlated features linearly.
Another difference is smoothness. Samuel's polynomial learning is smooth, as it uses natural features. But the extreme quantization in signature table learning […]

Another serious problem with both of Samuel's procedures is that they require additional tuning and supervision. Bayesian Learning, on the other hand, is completely automatic. Since tuning of feature combination is very unintuitive, automation is a very desirable property. Furthermore, Bayesian Learning provides the optimal quadratic combination assuming a multivariate normal distribution⁴.
One other problem with Samuel's procedures is that they do not adequately account for the stages of the game. We measured the utility of each feature in Figure 5[…]

Finally, the method of training is different for all three algorithms. Samuel's polynomial learning algorithm used self-play to generate the training data. Since this is an incremental hill-climbing procedure, it is likely to converge to a local maximum. The signature table learning is more global; however, training from book moves has its own limitations. First, while expert moves usually provide good positive exemplars, using all moves not chosen by the expert as negative exemplars is misleading. Second, by learning to imitate expert moves, it is theoretically impossible for the evaluation function to play better (without searching) than the experts. In this study, winning and losing positions provide good positive and negative exemplars. Furthermore, by modeling "moves that lead to a win" rather than "moves chosen by experts", it is theoretically possible for our evaluation to be superior to the experts who played the training games.

Multivariate Normal Assumption
The simplicity and elegance of Bayesian Learning are largely due to its assumption about the underlying distribution of the data. In order for our learning algorithm to function properly, the distributions of the features must be multivariate normal. To verify this assumption, the distributions of the four features from all 3000 training games were plotted in Figure 5-3. The thick curve is the distribution for winning positions, and the thin one is the distribution for losing positions. The positions were taken from positions with 24 empty squares on the board. It is clear from these figures that this assumption is quite reasonable.

Accuracy of Labeling
One point that can be raised is that the win/loss labeling procedure may not be very accurate, and any mistake in the labeling is likely to adversely affect the performance of Bayesian Learning.
Although the labels of positions with 15 empty squares are always correct because of BILL's endgame-solving capability, the labels of earlier positions could be in error. We feel, however, that our labeling method is reasonable because:
1. Many pattern classification procedures use hand-labeled training data, which are not always perfect.
2. Since BILL 2.0 probably played better than any expert, this is the best labeling mechanism available.
3. The first 20 random moves should create many positions that are not very close, and the side that is ahead will almost always win because BILL plays extremely well.
4. Nearly even positions are difficult to label; however, given sufficient training data, these positions will simply form a boundary where win and loss are difficult to differentiate.

Efficiency of Bayesian Learning Evaluation
Perhaps the greatest problem with Bayesian Learning of evaluation functions is that of efficiency. Each […]

We could deal with this problem by reducing the dimensions of the feature space using principal components analysis [4], which rotates the feature space into one with uncorrelated features, so that unimportant features (those with small variance) can be discarded. Another similar approach is to use Fisher's linear discriminant [4], which uses labeled training data to maximize the ratio of inter-class variance to intra-class variance.
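As a sketch of the second option: for two classes, Fisher's criterion is maximized by a direction w proportional to S_W^-1 (mu_win - mu_loss). Here the within-class scatter S_W is taken as Sigma_win + Sigma_loss, which matches the usual definition up to scale when both classes have the same number of samples. The function names are ours. Projecting onto w replaces the two quadratic forms of the final evaluation (on the order of d squared multiply-adds each for d features) with one d-term dot product; how much playing strength that would cost is not measured in this study.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting; assumes `a` is non-singular."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (m[r][n] - tail) / m[r][r]
    return y


def fisher_direction(mean_win, cov_win, mean_loss, cov_loss):
    """w = (Sigma_win + Sigma_loss)^-1 (mu_win - mu_loss)."""
    d = len(mean_win)
    within = [[cov_win[i][j] + cov_loss[i][j] for j in range(d)] for i in range(d)]
    diff = [a - b for a, b in zip(mean_win, mean_loss)]
    return solve(within, diff)


def project(w, x):
    """One-dimensional score: d multiply-adds instead of a quadratic form."""
    return sum(wi * xi for wi, xi in zip(w, x))


identity = [[1.0, 0.0], [0.0, 1.0]]
w = fisher_direction([2.0, 0.0], identity, [0.0, 0.0], identity)
print(w, project(w, [3.0, 5.0]))  # [1.0, 0.0] 3.0
```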

Applicability to Other Domains
Bayesian Learning has already been applied to speech, vision, character recognition, and many other domains. This study is the first that uses Bayesian Learning to learn feature combination in an evaluation function. The key concept that enabled this application is our use of Bayesian Learning to maximally separate the classes of winning and losing positions. This algorithm is applicable to any other game or any other search-based application where static evaluation is used. In order to obtain superior results, the following conditions are necessary: (1) good features must be used, (2) it must be possible to define the goal in terms of classes, and (3) the multivariate normal distribution must provide a reasonable fit. The first condition is needed for any program to be successful. The second condition is easy to satisfy in game-playing programs, because the classes of winning and losing positions are the ideal concepts for learning.
It may be more difficult in other domains. The third condition is satisfiable in most domains.
Although we used a different version of BILL to generate the training data, it is not always desirable to do so. In games difficult for computers such as Go, self-generation will lead to many mislabeled positions.
In such cases, training on games between superior players should lead to even better performance. This was not possible here, because it is questionable whether a player superior to BILL 2.0 existed in Othello. But in Go, where humans are far superior to programs, training with expert games should improve the level of play even more drastically.

Conclusion
In this paper we presented a new algorithm for combining terms, or features, of an evaluation function.
This algorithm is based on Bayesian Learning. First, a training database is obtained, and each position is labeled as winning or losing. These positions are used to train a discriminant function that evaluates positions by estimating the probability that a position is a winning one.
While machine learning of evaluation functions has been studied, algorithms such as Samuel's suffer from lack of smoothness, excessive human tuning, and lack of generality. The Bayesian Learning algorithm eliminates these problems, and has a number of desirable properties:
1. Completely automatic learning from training data.
[…]
4. Capability of recovering from erroneous features.
5. Evaluation directly estimating the probability of winning.
We demonstrated that Bayesian Learning significantly improved the playing ability of an Othello program that already played at the world-championship level. We believe that it can be applied to any domain where a static evaluation is needed, and will not only drastically reduce the tuning time, but also dramatically improve the performance of the program.