Lesson 4.2 - Choosing the Target Function for a Checkers AI - Designing a Learning System

 Choosing the Target Function for a Checkers AI

When designing an AI to play checkers, one critical decision is selecting the target function. This function defines what the AI will learn. There are two primary approaches to consider.

  Choice 1: The `ChooseMove` Function

 Function Type: `ChooseMove: B → M`

 Description: This function takes any board state from the set of legal board states `B` and produces a move from the set of legal moves `M`.

 Goal: The aim is to improve the AI's performance `P` in the task `T` (playing checkers) by learning the `ChooseMove` function. This function effectively decides the best move in any given board state.

 Implication: The choice of `ChooseMove` as the target function is pivotal, as it directly dictates the AI's move at each turn, focusing learning on immediate decision-making.
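
To make the shape of this function concrete, here is a minimal Python sketch; the `Board` and `Move` types and the `choose_move` stub are hypothetical placeholders, since the actual move-selection logic is precisely what the learner must acquire.

```python
from typing import Callable

Board = dict   # placeholder for a legal board state from the set B (hypothetical representation)
Move = tuple   # placeholder for a legal move from the set M (hypothetical representation)

# ChooseMove: B -> M -- maps a board state directly to the move to play.
ChooseMove = Callable[[Board], Move]

def choose_move(board: Board) -> Move:
    """Stub with the ChooseMove signature; the body is what learning must supply."""
    raise NotImplementedError("learned move-selection logic goes here")
```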

  Choice 2: The `V` Function

 Function Type: `V: B → R`

 Description: This function maps any legal board state from the set `B` to a real number. 

The "training value" of this board state 'b' is a numerical value that the AI uses to evaluate how good or bad this arrangement is for it. A higher value might mean the AI is in a strong position, while a lower value could indicate a weaker position. 

For example, if the AI has more pieces than the opponent in state `b`, it might assign a high value, indicating a favorable position. The function aims to assign higher scores to better board states, and the AI's goal is to move toward high-valued states.

 Rules:

Case 1: The board state is final

  1. If `b` is a final board state that is won, then `V(b) = 100`.

  2. If `b` is a final board state that is lost, then `V(b) = -100`.

  3. If `b` is a final board state that is drawn, then `V(b) = 0`.

Case 2: The board state is intermediate

  4. If `b` is not a final state, then `V(b) = V(b')`, where `b'` is the best final board state achievable from `b` when both players play optimally (a code sketch of this recursion follows the rules).
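
Read literally, these rules give a recursive definition of `V`. Below is a minimal Python sketch of that recursion; the helpers `is_final`, `outcome`, and `successors` are hypothetical stand-ins for the rules of checkers, and the alternating max/min captures the assumption that both players play optimally.

```python
def ideal_v(board, our_turn, is_final, outcome, successors):
    """Non-operational definition of V: B -> R, read as a recursion.

    is_final(b)   -> True when b ends the game (hypothetical helper)
    outcome(b)    -> 'won', 'lost', or 'drawn' for a final state b (hypothetical helper)
    successors(b) -> board states reachable from b by one legal move (hypothetical helper)
    """
    if is_final(board):
        # Rules 1-3: fixed values for final board states.
        return {'won': 100, 'lost': -100, 'drawn': 0}[outcome(board)]
    # Rule 4: V(b) = V(b'), the value of the best final state reachable
    # under optimal play. We maximise on our turns; the opponent minimises.
    values = [ideal_v(s, not our_turn, is_final, outcome, successors)
              for s in successors(board)]
    return max(values) if our_turn else min(values)
```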

Non-Operational Description

Imagine you're playing checkers and have reached a specific board state, call it `b`. You want to know how good this state is for you, i.e., its value. The value of `b`, written `V(b)`, is defined as follows:

If `b` is not a final state (not the end of the game):

`V(b)` is equal to the value of the best final state `b'` you can reach from `b` by playing optimally until the end of the game.

This assumes your opponent also plays optimally, so you're considering the best possible outcome for you given perfect play from both sides.

Here's an example to illustrate:

Board state `b`: You have three checkers remaining, and your opponent has two. You have a good position, but the game is not over yet.

Finding `b'`: You look at all possible moves you can make from `b`. For each move, you imagine playing optimally until the end of the game, considering your opponent's best responses. You then find the move that leads to the best final state for you, `b'`.

Calculating `V(b)`: Let's say playing optimally from `b` with your best move leads to a final state where you have all three checkers left and your opponent has none (a win). This final state, `b'`, would have a value of 100 (a maximum win). Therefore, `V(b)` would also be 100, indicating that the current state `b` is very good for you because it leads to a win under optimal play.
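
As a toy usage of the `ideal_v` sketch above, here is a hand-built two-move game tree that mirrors this example; the states and outcomes are invented purely for illustration.

```python
# Hypothetical two-ply game tree: from b, one line of play ends in a win,
# the other ends in a draw.
tree = {
    'b':  ['b1', 'b2'],
    'b1': ['win'],
    'b2': ['draw'],
}
finals = {'win': 'won', 'draw': 'drawn'}

value = ideal_v('b', True,
                is_final=lambda s: s in finals,
                outcome=lambda s: finals[s],
                successors=lambda s: tree[s])
print(value)  # -> 100: the best final state reachable from b is a win
```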

Key points:

`V(b)` depends on the value of the best achievable final state. This involves considering all possible moves and their consequences under optimal play.

The example shows how `V(b)` reflects the potential outcome of the game from the current state. However, finding the value of the best reachable final state `b'` requires searching ahead to the end of the game, which is computationally expensive; this is why the definition is termed non-operational.

Remember, this is a simplified example. In real checkers, evaluating all possible moves and their consequences is computationally very expensive. That's why practical checkers-playing programs use approximations and heuristics to estimate `V(b)` efficiently.

  Operational Description:

The task is to discover an operational (ready-to-use) description of the ideal target function `V`, that is, one whose values can be computed in a reasonable amount of time.

Function Approximation

It is often very difficult to learn a perfect operational form of `V`, so learning algorithms usually settle for an approximation; this process is known as function approximation.

 Learned Function vs. Ideal Function:

The function actually learned by the AI, denoted `V̂`, is an approximation of the ideal target function `V`.
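
As an illustration of what such an approximation might look like, here is a minimal sketch of `V̂` as a linear combination of numeric board features (e.g. piece counts); the specific features and weights are hypothetical, and how the weights are learned is a separate question.

```python
def v_hat(features, weights):
    """V-hat: an operational approximation of V.

    features: numbers extracted from a board state b (hypothetical choice,
              e.g. own piece count and opponent piece count).
    weights:  coefficients, one bias term followed by one weight per feature.
    """
    value = weights[0]                       # bias term w0
    for w, x in zip(weights[1:], features):
        value += w * x                       # add w_i * x_i
    return value

# Illustrative call: 3 of our pieces vs. 2 opponent pieces, made-up weights.
print(v_hat((3, 2), (0.0, 5.0, -5.0)))      # -> 5.0, a mildly favorable state
```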

 Implications of the Target Function Choice

 `ChooseMove` Approach: 

Focuses on immediate, move-by-move decision-making. It is more about selecting the best action at each step.

 `V` Function Approach: 

Aims at evaluating board states, providing a more strategic overview. It supports understanding the overall game strategy and long-term planning.

 Conclusion

The choice between `ChooseMove` and `V` represents a fundamental design decision in AI development for games like checkers. `ChooseMove` zeroes in on immediate actions, while `V` offers a broader evaluation of the game's progress. This decision shapes the learning focus of the AI, whether it's about perfecting individual moves (`ChooseMove`) or understanding and evaluating game states (`V`) for strategic depth. The optimal choice often depends on the specific goals and constraints of the AI development project.

For the remainder of our discussion, let us assume `V` as the target function.
