Lesson 4.2 - Choosing the Target Function for a Checkers AI - Designing a Learning System
Choosing the Target Function for a Checkers AI
When designing an AI to play checkers, one critical decision is selecting the target function. This function defines what the AI will learn. There are two primary approaches to consider.
Choice 1: The `ChooseMove` Function
Function Type: `ChooseMove: B → M`
Description: This function takes any board state from the set of legal board states `B` and produces a move from the set of legal moves `M`.
Goal: The aim is to improve the AI's performance `P` in the task `T` (playing checkers) by learning the `ChooseMove` function. This function effectively decides the best move in any given board state.
Implication: The choice of the `ChooseMove` function is pivotal, as it directly dictates the AI's move on each turn, focusing on immediate decision-making.
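To make the type `ChooseMove: B → M` concrete, here is a minimal interface sketch in Python. The `Board`, `Move`, and `legal_moves` names are illustrative assumptions, not part of the lesson:

```python
from typing import List

class Board: ...  # a legal board state drawn from B (placeholder)
class Move: ...   # a legal move drawn from M (placeholder)

def legal_moves(b: Board) -> List[Move]:
    """Hypothetical helper: enumerate the legal moves available in b."""
    raise NotImplementedError

def choose_move(b: Board) -> Move:
    """ChooseMove: B -> M. A learned version would rank the legal moves
    and return the one it judges best; here only the interface is fixed,
    since learning this mapping directly is the hard part."""
    return legal_moves(b)[0]  # placeholder choice; a learner replaces this
```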
Choice 2: The `V` Function
Function Type: `V: B → R`
Description: This function maps any legal board state from the set `B` to a real number.
The value of a board state `b`, written `V(b)`, is a numerical score the AI uses to evaluate how good or bad the arrangement is for it. A higher value means the AI is in a strong position, while a lower value indicates a weaker position.
Rules:
Case 1: The board state is final
1. If `b` is a final board state that is won, then `V(b) = 100`.
2. If `b` is a final board state that is lost, then `V(b) = -100`.
3. If `b` is a final board state that is drawn, then `V(b) = 0`.
Case 2: The board state is intermediate
4. If `b` is not a final state, then `V(b) = V(b')`, where `b'` is the best final board state achievable from `b` by playing optimally (see the code sketch after this list).
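The four rules translate almost directly into code. The sketch below reuses the `Board` type from the earlier snippet; `is_final`, `is_won`, `is_lost`, and `successors` are hypothetical helpers, and the `our_turn` flag is an assumption used to implement optimal play by both sides. The recursive branch is exactly the full game-tree search that makes this definition non-operational:

```python
# Hypothetical helpers, assumed rather than taken from the lesson:
def is_final(b: Board) -> bool: raise NotImplementedError   # game over?
def is_won(b: Board) -> bool: raise NotImplementedError     # did we win?
def is_lost(b: Board) -> bool: raise NotImplementedError    # did we lose?
def successors(b: Board) -> List[Board]: raise NotImplementedError

def V(b: Board, our_turn: bool = True) -> float:
    """Ideal target function V: B -> R, written out from the four rules."""
    if is_final(b):
        if is_won(b):
            return 100.0   # rule 1: final and won
        if is_lost(b):
            return -100.0  # rule 2: final and lost
        return 0.0         # rule 3: final and drawn
    # Rule 4: the value of the best final state reachable from b under
    # optimal play by both sides: we maximize on our turn, the opponent
    # minimizes on theirs (a full minimax search of the game tree).
    values = [V(b2, not our_turn) for b2 in successors(b)]
    return max(values) if our_turn else min(values)
```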
Non-Operational Description
Imagine you're playing checkers and have reached a specific board state, let's call it `b`. You want to know how good this state is for you, that is, its value. The definition above gives the value of `b`, `V(b)`, as follows:
If `b` is not a final state (not the end of the game), then `V(b)` is equal to the value of the best final state `b'` you can reach from `b` by playing optimally until the end.
This assumes your opponent also plays optimally, so you're considering the best possible outcome for you given perfect play from both sides.
Here's an example to illustrate:
Board state `b`: You have three checkers remaining, and your opponent has two. You have a good position, but the game is not over yet.
Finding `b'`: You look at all possible moves you can make from `b`. For each move, you imagine playing optimally until the end of the game, considering your opponent's best responses. You then find the move that leads to the best final state for you (`b'`).
Calculating `V(b)`: Let's say playing optimally from `b` with your best move leads to a final state where you have all three checkers left and your opponent has none (a win). This final state, `b'`, would have a value of 100 (a won game). Therefore, `V(b)` would also be 100, indicating that the current state `b` is very good for you because it leads to a potential win.
Key points:
`V(b)` depends on the value of the best achievable final state. This involves considering all possible moves and their consequences under optimal play.
The example shows how `V(b)` reflects the potential outcome of the game based on the current state. But computing `V(b')` this way requires searching the game tree all the way to its end, which is computationally expensive. That is why this is termed a non-operational description.
Remember, this is a simplified example. In real checkers, evaluating all possible moves and their consequences can be computationally expensive. That's why practical checkers-playing programs use approximations and heuristics to estimate `V(b)` efficiently.
Operational Description:
The task is to discover an operational (ready-to-use) description of the ideal target function `V`; that is, to find values that are computable in a reasonable amount of time.
Function Approximation
It's often challenging to learn a perfect operational form of `V`, so learning algorithms usually aim for an approximation; this is known as function approximation.
Learned Function vs. Ideal Function:
The function actually learned by the AI, denoted `V̂`, is an approximation of the ideal target function `V`.
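As a concrete illustration of what an operational `V̂` might look like, here is a minimal sketch that scores a board as a weighted linear combination of hand-crafted features. The particular features (piece and king counts) and the `count_*` helpers are assumptions for illustration; a learning algorithm would fit `w0` and the weights so `V̂` tracks `V`:

```python
# Hypothetical feature extractors (stubs):
def count_our_pieces(b: Board) -> float: raise NotImplementedError
def count_opponent_pieces(b: Board) -> float: raise NotImplementedError
def count_our_kings(b: Board) -> float: raise NotImplementedError
def count_opponent_kings(b: Board) -> float: raise NotImplementedError

def features(b: Board) -> List[float]:
    """Hand-crafted board features; these choices are illustrative."""
    return [
        count_our_pieces(b),
        count_opponent_pieces(b),
        count_our_kings(b),
        count_opponent_kings(b),
    ]

def V_hat(b: Board, w0: float, w: List[float]) -> float:
    """Learned approximation of V: a linear combination of features.
    Training adjusts w0 and the weights w so V_hat approximates V,
    while remaining cheap to evaluate (no game-tree search)."""
    return w0 + sum(wi * xi for wi, xi in zip(w, features(b)))
```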
Implications of the Target Function Choice
`ChooseMove` Approach:
Focuses on immediate, move-by-move decision-making. It's more about selecting the best action at each step.
`V` Function Approach:
Aims at evaluating board states, providing a more strategic overview. It helps in understanding the overall game strategy and long-term planning.
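The two approaches are also connected: given a learned evaluator `V̂`, a `ChooseMove`-style policy falls out by looking one ply ahead and keeping the move whose resulting board scores highest. A minimal sketch, with `apply_move` as another hypothetical helper:

```python
def apply_move(b: Board, m: Move) -> Board:
    """Hypothetical helper: the board that results from playing m in b."""
    raise NotImplementedError

def choose_move_via_v(b: Board, v_hat) -> Move:
    """Derive ChooseMove from a board evaluator: score each successor
    state with v_hat and return the move leading to the best one."""
    return max(legal_moves(b), key=lambda m: v_hat(apply_move(b, m)))
```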
Conclusion
The choice between `ChooseMove` and `V` represents a fundamental design decision in AI development for games like checkers. `ChooseMove` zeroes in on immediate actions, while `V` offers a broader evaluation of the game's progress. This decision shapes the learning focus of the AI: whether it's about perfecting individual moves (`ChooseMove`) or understanding and evaluating game states (`V`) for strategic depth. The optimal choice often depends on the specific goals and constraints of the AI development project.
For the remainder of our discussion, let us assume the target function `V`.