Lesson 4.4 Choosing a Function Approximation Algorithm - Designing a Learning System

The Premise

The checkers problem (Problem X) is converted to another problem (Problem Y), and we claim that solving Y helps us solve X.

Here, Problem Y is a linear function of the x's and w's, and the machine (the learner) needs to learn the unknown w's.

From Lesson 4.3:

Our Choice

Board features:

        x1(b) — number of black pieces on board b

        x2(b) — number of white pieces on b

        x3(b) — number of black kings on b

        x4(b) — number of white kings on b

        x5(b) — number of white pieces threatened by black (i.e., can be captured on black's next move)

        x6(b) — number of black pieces threatened by white

Linear Combination

V'(b) = w0 + w1 · x1(b) + w2 · x2(b) + w3 · x3(b) + w4 · x4(b) + w5 · x5(b) + w6 · x6(b)

V'

- the learner's current approximation of V

It represents the AI's method for evaluating a board state. It's like the AI's strategy or set of rules to determine how good a position is.
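
As a quick illustration, here is a minimal Python sketch of this linear evaluation function. The feature values and weights below are made-up placeholders, not learned values.

```python
# Minimal sketch of the linear evaluation function V'(b).
# The board b is represented only by its six feature values x1(b)..x6(b);
# the weights below are arbitrary placeholders, not learned values.

def v_hat(features, weights):
    """Return V'(b) = w0 + w1*x1(b) + ... + w6*x6(b)."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))

x = [4, 0, 1, 0, 0, 0]                        # x1(b)..x6(b) for some board b
w = [0.5, 1.0, -1.0, 2.0, -2.0, 1.5, -1.5]    # w0..w6 (placeholders)
print(v_hat(x, w))                            # the learner's current estimate of V(b)
```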

Structure of training samples

To learn the target function V', we need a set of training examples. Each example describes a specific board state, which we'll refer to as 'b', along with its training value, denoted Vtrain(b). In other words, each training example is an ordered pair ⟨b, Vtrain(b)⟩.

Summary

b

- Specific arrangement of pieces on the board

Each arrangement of checkers on the board at any moment is a "board state" or 'b'. 

E.g., x1=4, x2=0, x3=1, x4=0, x5=0, x6=0

Vtrain(b)

- Training value of b

The "training value" of this board state 'b' is how the AI evaluates it. For example, if the AI has more pieces than the opponent in state 'b', it might assign a high value, indicating a favorable position.

Let's consider an example to illustrate this. Suppose a training example describes a situation in which black has won the game. This is indicated by x2 = 0, meaning all the white pieces have been captured and are no longer on the board. In such a scenario, the training value Vtrain(b) assigned to this board state 'b' would be +100, signifying a win for black.

Sample: ⟨[x1=4, x2=0, x3=1, x4=0, x5=0, x6=0], +100⟩
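
In code, such a training example can be stored as a simple pair; the list-plus-tuple representation below is just one illustrative choice.

```python
# One training example <b, Vtrain(b)>: the board state b is described by its
# feature values x1(b)..x6(b); the training value is +100 because x2 = 0
# (no white pieces remain, so black has won).

b_features = [4, 0, 1, 0, 0, 0]     # x1..x6
v_train = +100.0
training_example = (b_features, v_train)

# A training set is simply a list of such pairs.
training_examples = [training_example]
```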

Step 1: ESTIMATING TRAINING VALUES

Issues in Estimating Training Values for Intermediate States

- Only the final outcome of the game (win or loss) is known

- The outcome does not tell us whether the intermediate moves were the best

- The players could have made bad moves and still won

- They could have made good moves and still lost

Successor(b)

- The next board state following b for which it is again the AI's turn to move, i.e., the state reached after the AI's move and the opponent's reply. The AI evaluates this state with V' to estimate the training value of b.

Estimate of Training Values

Vtrain(b) = V'(Successor(b))

Let's illustrate this with an example in checkers:

- Imagine the AI is in the middle of a game. The board is in state 'b', where it has a mix of regular pieces and kings, and so does the opponent.

- The AI uses its current evaluation function V' to decide its move. Based on this, it moves a piece forward.

- The opponent responds by jumping over one of the AI's pieces, leading to a new board state. This is the Successor of 'b'.

- The AI evaluates this new state using V'. Perhaps it realizes that its piece was vulnerable, something it hadn't accounted for. This outcome is what teaches the AI to adjust V'.

Over time, by evaluating these states and learning from the outcomes, the AI's approximation V' gets refined. The goal is for V' to become as accurate as possible at predicting which moves lead to winning the game, getting closer to the ideal target function V.
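
Here is a minimal sketch of the estimation rule Vtrain(b) = V'(Successor(b)); the `evaluate` and `successor` functions below are stand-ins for the learner's current V' and for real game logic.

```python
# Sketch of the estimate Vtrain(b) = V'(Successor(b)).
# `evaluate` stands in for the learner's current V' (e.g. the v_hat sketch
# above); `successor` is a hypothetical helper that returns the board features
# once it is the AI's turn to move again.

def estimate_v_train(board_features, evaluate, successor):
    next_features = successor(board_features)    # Successor(b)
    return evaluate(next_features)               # V'(Successor(b))

# Toy usage with stand-in functions: a trivial evaluation, and a "successor"
# that (for illustration only) removes one threatened white piece.
print(estimate_v_train(
    [4, 4, 1, 0, 1, 0],
    evaluate=lambda feats: sum(feats),                          # placeholder for V'
    successor=lambda f: [f[0], f[1] - 1, f[2], f[3], 0, f[5]],
))
```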

Simplifying 

Resulting Successor States: 

The board states reached after the opponent's reply (that is, when it is the AI's turn to move again) are the Successor(b) states.

In a more advanced implementation, you might evaluate the desirability of these states using a heuristic function, considering factors like piece count, king count, control of the center, and vulnerability to capture.

Let's simplify this with an example in the context of a checkers game. A heuristic function is like a set of rules or guidelines that help the AI evaluate how good a particular board state is. Here are some common factors that might be considered in a checkers game heuristic:

1. **Piece Count**: The number of pieces each player has. Generally, having more pieces than your opponent is advantageous.

2. **King Count**: The number of kings (promoted pieces) each player has. Kings are more valuable because they have more movement options.

3. **Control of the Center**: Occupying the center of the board is often strategically beneficial.

4. **Vulnerability to Capture**: Positions where a player's pieces are at risk of being captured.

### Example Heuristic Function

Imagine a simple heuristic function that calculates a score for a board state based on these factors:

- **Piece Score**: +1 point for each regular piece, +2 points for each king.

- **Center Control**: +0.5 points for each piece in the center of the board.

- **Vulnerability**: -1 point for each piece that can be captured by the opponent in their next move.

Now, let's evaluate a hypothetical board state:

- **Your Pieces**: 6 regular pieces, 1 king.

- **Opponent's Pieces**: 7 regular pieces, no kings.

- **Center Control**: You have 2 pieces in the center.

- **Vulnerable Pieces**: You have 1 piece that can be captured.

Using our heuristic, we calculate your score as follows:

- **Your Piece Score**: 6 * 1 (regular pieces) + 1 * 2 (king) = 8

- **Center Control**: 2 * 0.5 = 1

- **Vulnerability**: 1 * -1 = -1

- **Total Score**: 8 (pieces) + 1 (center) - 1 (vulnerability) = 8

If we perform the same calculation for the opponent and find they have a lower score, the AI might consider this board state favorable.
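
The following Python sketch implements this example heuristic and reproduces the calculation above; the piece representation (a dictionary with `king`, `in_center`, and `capturable` flags) is purely hypothetical.

```python
# Sketch of the example heuristic above: +1 per regular piece, +2 per king,
# +0.5 per piece in the centre, -1 per piece that can be captured next move.
# Each piece is a hypothetical dict with 'king', 'in_center', 'capturable' flags.

def heuristic_score(pieces):
    score = 0.0
    for p in pieces:
        score += 2.0 if p["king"] else 1.0   # piece score / king score
        if p["in_center"]:
            score += 0.5                     # centre-control bonus
        if p["capturable"]:
            score -= 1.0                     # vulnerability penalty
    return score

# The board from the worked example: 6 regular pieces + 1 king,
# 2 pieces in the centre, 1 piece vulnerable to capture -> score 8.0.
my_pieces = (
    [{"king": False, "in_center": True,  "capturable": False}] * 2
    + [{"king": False, "in_center": False, "capturable": True}]
    + [{"king": False, "in_center": False, "capturable": False}] * 3
    + [{"king": True,  "in_center": False, "capturable": False}]
)
print(heuristic_score(my_pieces))   # 8.0
```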

### Putting It Into Action

The AI uses this heuristic to evaluate all possible successor states. It looks ahead to possible moves (and counter-moves by the opponent) and applies the heuristic to each resulting board state. The state with the best score (from the AI's perspective) guides the AI's choice of move.
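
As a rough sketch of this move-selection step, the function below performs a one-ply lookahead: it scores every successor state and returns the move whose resulting state scores highest. The `legal_moves`, `apply_move`, and `score` arguments are hypothetical stand-ins for real checkers logic and a heuristic.

```python
# One-ply lookahead sketch: score every legal successor state and pick the
# move whose resulting state scores highest. `legal_moves`, `apply_move`, and
# `score` are hypothetical stand-ins for real checkers logic and a heuristic.

def choose_move(board, legal_moves, apply_move, score):
    return max(legal_moves(board), key=lambda m: score(apply_move(board, m)))

# Toy usage: the "board" is a number, moves add a delta, and the score is the
# resulting number itself. The chosen move is +2.
print(choose_move(
    board=0,
    legal_moves=lambda b: [-1, 0, +2],
    apply_move=lambda b, m: b + m,
    score=lambda b: b,
))
```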

Remember, this is a simplified example. Real-world implementations can be much more complex, considering additional strategic elements and using more sophisticated algorithms to predict future states of the game. The goal of the heuristic is to provide a quick, rule-of-thumb evaluation to guide the AI's decision-making.

Step 2: ADJUSTING THE WEIGHTS

Goal

- Find values of w0 to w6 that best fit the training examples ⟨b, Vtrain(b)⟩

That is, fit the training samples, each of which pairs the feature values x1 to x6 with a training value.

Which algorithm should we use to find the values of w0 to w6?

The goal can also be stated as minimizing the squared error.

Squared Error

Find the V' (equivalently, the weights) that minimizes

Squared Error E = Σ (Vtrain(b) - V'(b))^2, where the sum runs over all training examples ⟨b, Vtrain(b)⟩
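
A minimal sketch of this criterion in Python, assuming `evaluate` stands in for the learner's current V':

```python
# Sketch of the squared-error criterion: sum over all training examples
# <b, Vtrain(b)> of (Vtrain(b) - V'(b))^2. `evaluate` stands in for the
# learner's current V'.

def squared_error(training_examples, evaluate):
    return sum((v_train - evaluate(features)) ** 2
               for features, v_train in training_examples)

examples = [([4, 0, 1, 0, 0, 0], 100.0)]
print(squared_error(examples, evaluate=lambda feats: sum(feats)))   # (100 - 5)^2 = 9025
```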

Many algorithms are known for finding the weights of a linear function.

Goal of Algorithm

- Refine the weights incrementally as each training sample is seen

- Be robust to errors in the estimated training values, so that it can still compute reasonable wi's

LMS (Least Mean Squares) training rule

The LMS (Least Mean Squares) training rule is a fundamental algorithm in the field of machine learning and signal processing. It's used for adjusting the weights of a learning model, typically a linear predictor, to minimize the mean square error between the desired output and the actual output produced by the model. Here's an overview of the LMS algorithm:

1. Principle:

   - The LMS algorithm is based on the method of steepest descent: at each step, the weights of the predictor are adjusted in the direction that reduces the squared error on the current example, lowering the mean squared error (MSE) over time.

2. Operation:

   - The algorithm takes a set of input-output pairs as training data.

   - For each input, the current set of weights is used to predict an output.

   - The predicted output is then compared with the actual (desired) output, and the difference (error) is measured.

3. Weight Update Rule:

   - The weights are updated by moving in the opposite direction of the gradient of the error with respect to the weights.

   - The update is proportional to the product of the error and the input. This proportionality factor is known as the learning rate.

   - Mathematically, the weight update rule can be expressed as: 

W_new = W_old + n · (d - y) · x

where:

     - W_new and W_old are the new and old weights, respectively.

     - n is the learning rate.

     - d is the desired output. In the checkers problem it is Vtrain(b).

     - y is the predicted output. In the checkers problem it is V'(b).

     - x is the input.

Effect of (d - y) on the weights

If (d - y) = 0, the weights do not change.

If (d - y) is positive (the prediction is too low), each weight wi is increased in proportion to its input xi, raising the predicted value; if (d - y) is negative, the weights are adjusted in the opposite direction, lowering the prediction.
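
Here is a minimal sketch of one LMS update step for the checkers weights; the learning rate value used below is an arbitrary choice for illustration.

```python
# One LMS update step for the linear checkers evaluator:
#   wi <- wi + n * (Vtrain(b) - V'(b)) * xi(b),  with x0 = 1 for the bias w0.
# The learning rate n = 0.01 is an arbitrary small value for illustration.

def lms_update(weights, features, v_train, n=0.01):
    inputs = [1.0] + list(features)                              # x0 = 1, x1..x6
    prediction = sum(w * x for w, x in zip(weights, inputs))     # y = V'(b)
    error = v_train - prediction                                 # (d - y)
    return [w + n * error * x for w, x in zip(weights, inputs)]

# Example: one update on the winning sample from earlier.
w = [0.0] * 7                                  # w0..w6, start at zero
w = lms_update(w, [4, 0, 1, 0, 0, 0], v_train=100.0)
print(w)   # weights move so that V'(b) predicts a higher value for this state
```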

4. Applications:

   - The LMS algorithm is widely used in adaptive filters for noise cancellation, adaptive equalization in data communications, and other areas where systems need to adapt to changing environments.

   - It's also a foundational concept in neural networks and other machine learning models, especially in the context of online learning or real-time data processing.

5. Advantages:

   - Simplicity and ease of implementation.

   - Effective in environments where the properties of the signal or the noise are not fully known or change over time.

6. Limitations:

   - The choice of the learning rate is crucial. A small learning rate makes the algorithm converge slowly, while a large learning rate can lead to instability or oscillation.

   - LMS is typically used for linear models and might not be effective for complex, non-linear relationships.

In summary, the LMS training rule is a key algorithm in adaptive signal processing and machine learning, particularly useful for applications that require continuous adjustment of parameters in response to an evolving data stream.

Summary 

To derive training examples from indirect training experience in the context of board games like chess or checkers for a learning AI, you can follow a procedure involving game simulation, evaluation, and feedback. This process allows the AI to learn from experiences without direct or explicit instruction. Here's a step-by-step guide:

1. Game Simulation:

   - Play Games: Have the AI play numerous games against itself or against other opponents (either AI or human). During these games, every move leads to a new board state.

   - Record States: Systematically record each board state 'b' encountered during these games. This includes the arrangement of pieces on the board at every stage of the game.

2. Outcome Evaluation:

   - Determine Outcomes: After a game concludes, identify the outcome (win, lose, draw) for each side. This outcome is especially clear in games like chess or checkers where there's a definitive win/loss condition.

   - Assign Values to Final States: Assign a training value to the final board state based on the outcome. For example, +100 for a win, -100 for a loss, 0 for a draw.

   - Backtrack Values: Work backwards from the final state to earlier states in the game. The idea is to assign a value to these earlier states that reflects their contribution to the final outcome. This can be done using algorithms like temporal difference learning, where the value of a state is adjusted based on the value of the state that follows it.

3. Intermediate State Valuation:

   - Heuristic Evaluation: For states that are not directly linked to the end of a game (like mid-game states), use heuristic methods to assign a preliminary value. This could be based on the number of pieces each player has, the positioning of pieces, control of the board, etc.

   - Refinement Over Time: As the AI encounters similar states in different games, it refines the values of these states based on new outcomes and experiences. The AI learns which states tend to lead to wins and which to losses.

4. Creating Training Pairs:

   - Pair States with Values: Pair each recorded board state 'b' with its corresponding value Vtrain(b). This creates a dataset of training examples in the form (b, Vtrain(b)).

   - Use in Training: Use these training pairs to train the AI. The AI learns to recognize patterns in the board states and understand how different states correlate with winning or losing.

5. Iterative Learning and Adjustment:

   - Continuous Improvement: As the AI plays more games and encounters more states, it continuously adjusts its evaluation of board states based on new experiences.

   - Feedback Loop: The performance of the AI in subsequent games serves as feedback, indicating how well it has learned from the training examples.

By following this procedure, the AI can develop a nuanced understanding of the game, learning not just from direct wins or losses but also from the intricacies of each board state it encounters throughout the game. This method is particularly effective for complex games where direct training examples are not readily available or are too numerous to manually categorize.
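
To tie these steps together, here is a compressed, hypothetical sketch of the whole loop. The `random_game` function merely fabricates feature vectors and an outcome; real checkers game logic would replace it.

```python
import random

# Compressed sketch of the whole procedure: simulate games, record board
# states, estimate training values with Vtrain(b) = V'(Successor(b)) (the
# final state gets the game outcome), and adjust the weights with LMS.

def v_hat(weights, features):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, n=0.001):
    inputs = [1.0] + list(features)
    error = v_train - sum(w * x for w, x in zip(weights, inputs))
    return [w + n * error * x for w, x in zip(weights, inputs)]

def random_game(num_states=5):
    # Stand-in for real game play: fabricate feature vectors and an outcome.
    states = [[random.randint(0, 12) for _ in range(6)] for _ in range(num_states)]
    outcome = random.choice([+100.0, -100.0, 0.0])   # win / loss / draw
    return states, outcome

weights = [0.0] * 7
for _ in range(100):                                 # play many simulated games
    states, outcome = random_game()
    for i, b in enumerate(states):
        if i + 1 < len(states):
            v_train = v_hat(weights, states[i + 1])  # V'(Successor(b))
        else:
            v_train = outcome                        # final state gets the outcome
        weights = lms_update(weights, b, v_train)

print(weights)   # weights refined over the simulated games
```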

