Lesson 4.5 The Final Design - Designing a Learning System

Central Components of the Learning System

1. **Performance System**:

Input : New Problem

Output : Trace of the solution

- Plays a game against itself to produce the sequence of moves (the trace of the solution).

- Uses the learned evaluation function, denoted as V', to select its next move.

- As the evaluation function becomes more accurate, the quality of the AI’s move selection should improve.
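The move selection described above can be sketched as follows. This is a minimal illustration, assuming that each board state is already summarized as a list of numeric features and that V' is a linear function of those features; the names `v_hat` and `choose_move` are hypothetical and not part of the lesson.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Assumed linear evaluation function V'(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def choose_move(weights: Sequence[float],
                successors: Sequence[Sequence[float]]) -> int:
    """Return the index of the legal successor board with the highest V' score."""
    return max(range(len(successors)),
               key=lambda i: v_hat(weights, successors[i]))


# Three candidate successor boards, each described by two hypothetical features.
weights = [0.0, 1.0, -1.0]
successors = [[3, 3], [4, 1], [2, 2]]
best = choose_move(weights, successors)
```

Here the second candidate is chosen because it has the largest score under the current weights. A better set of weights changes only the scores, not the selection procedure.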

2. **Critic**:

Input : Trace of game

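Output : Training examples of the form (b, Vtrain(b))

That is, for each board state b in the trace, the Critic estimates the training value as Vtrain(b) = V'(successor(b)), where successor(b) is the next board state in the trace at which it is again the program's turn to move.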
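A rough sketch of the Critic under the same assumptions (boards as feature lists, a linear V'). The function name `critic` and the `final_value` argument are hypothetical; the sketch assumes the last board in the trace is scored directly from the game outcome, for example +100 for a win, -100 for a loss and 0 for a draw.

```python
from typing import List, Sequence, Tuple


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Assumed linear evaluation function V'(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def critic(trace: Sequence[Sequence[float]],
           weights: Sequence[float],
           final_value: float) -> List[Tuple[Sequence[float], float]]:
    """Turn a game trace into training examples (b, Vtrain(b)).

    `trace` holds the boards at which the learner was to move, in order,
    so trace[i + 1] plays the role of successor(trace[i]).
    """
    examples = []
    for i, board in enumerate(trace):
        if i + 1 < len(trace):
            # Intermediate board: Vtrain(b) = V'(successor(b)).
            target = v_hat(weights, trace[i + 1])
        else:
            # Final board: use the known game outcome (an assumed convention).
            target = final_value
        examples.append((board, target))
    return examples


examples = critic([[1, 0], [2, 0], [3, 0]], [0.0, 1.0, -1.0], 100.0)
```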

3. **Generalizer**:

Input : Training examples or dataset

Output : Estimate of weights

- This component is the machine learning model itself.

   - Takes the training examples provided by the Critic and creates a generalized hypothesis for the target function.

   - Applies an algorithm, like the LMS (Least Mean Squares) algorithm, to generalize from specific examples to a broader understanding that can be applied to unseen game states.
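The LMS step can be sketched as below. This is an illustrative version, assuming a linear V' and the usual LMS weight update w_i ← w_i + η (Vtrain(b) − V'(b)) x_i; the function name `lms_update` and the step size `eta` are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def lms_update(weights: Sequence[float],
               examples: Sequence[Tuple[Sequence[float], float]],
               eta: float = 0.1) -> List[float]:
    """Apply one LMS pass over the training examples and return new weights."""
    w = list(weights)
    for features, target in examples:
        x = [1.0] + list(features)            # x0 = 1 pairs with the bias w0
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        error = target - prediction           # Vtrain(b) - V'(b)
        # Nudge each weight in proportion to its feature value and the error.
        w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    return w


new_weights = lms_update([0.0, 0.0], [([2.0], 10.0)], eta=0.1)
```

With a single example whose target is 10 and initial weights of zero, the error is 10, so the bias moves by 0.1 × 10 × 1 and the feature weight by 0.1 × 10 × 2.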

4. **Experiment Generator**:

Input : The linear function produced by the Generalizer

Output : A new board state for the Performance System to play

   - Uses the current hypothesis (the current state of the learned function V') to generate new problems or initial board states for the Performance System to solve.

   - Its goal is to select problems that maximize how quickly the overall system learns, thereby enhancing its playing ability.

   - In the given example, the strategy of the Experiment Generator is basic: it repeatedly proposes the same starting board configuration to begin a new game. However, more complex strategies could be implemented to present a variety of starting positions or scenarios, forcing the AI to adapt and learn from a broader range of situations.
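The two strategies mentioned above might look like this in code. This is only a sketch: the six-number board summary (own pieces, opponent pieces, kings on each side, threatened pieces on each side) and the names `fixed_start_generator` and `varied_start_generator` are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence, Tuple

Board = Tuple[int, ...]

# Assumed feature summary of the standard opening position:
# 12 pieces per side, no kings, no threatened pieces.
INITIAL_BOARD: Board = (12, 12, 0, 0, 0, 0)


def fixed_start_generator(weights: Sequence[float]) -> Board:
    """Basic strategy: ignore the hypothesis and always propose the same start."""
    return INITIAL_BOARD


def varied_start_generator(weights: Sequence[float],
                           starts: Sequence[Board],
                           rng: Optional[random.Random] = None) -> Board:
    """Slightly richer strategy: pick a starting position at random from a pool."""
    rng = rng or random.Random()
    return rng.choice(list(starts))


pool = [INITIAL_BOARD, (11, 12, 0, 0, 1, 0)]
board = varied_start_generator([0.0] * 7, pool, random.Random(0))
```

Neither generator in this sketch actually inspects the weights. A more ambitious version could use them, for instance to favor positions that the current V' evaluates poorly.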

Overall Strategy

The overall strategy of the learning system is cyclical and iterative. 

1. The Performance System plays the game based on its current knowledge. 

2. The Critic reviews the performance and generates training data. 

3. The Generalizer updates the AI’s knowledge based on that data. 

4. Finally, the Experiment Generator provides new challenges, propelling the cycle to start anew, each iteration aimed at refining the AI's ability to evaluate and select moves. 

This process is designed to incrementally improve the AI's performance in playing checkers.
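The four steps of the cycle can be wired together as in the sketch below. The function `run_learning_loop` and the signatures of the four callables are hypothetical; any concrete implementation of the components that matches these shapes could be plugged in.

```python
from typing import Any, Callable, List, Sequence, Tuple

Weights = List[float]
Example = Tuple[Any, float]


def run_learning_loop(
    performance_system: Callable[[Any, Weights], Tuple[Sequence[Any], float]],
    critic: Callable[[Sequence[Any], Weights, float], List[Example]],
    generalizer: Callable[[Weights, List[Example]], Weights],
    experiment_generator: Callable[[Weights], Any],
    weights: Weights,
    n_games: int,
) -> Weights:
    """Repeat the generate, play, criticize, generalize cycle n_games times."""
    for _ in range(n_games):
        start = experiment_generator(weights)                # propose a new problem
        trace, outcome = performance_system(start, weights)  # play using current V'
        examples = critic(trace, weights, outcome)           # build (b, Vtrain(b)) pairs
        weights = generalizer(weights, examples)             # update the weights
    return weights
```

Each pass feeds the updated weights back into the Experiment Generator and the Performance System, which is what makes the process iterative rather than a single training run.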

Note that this iteration is required because the training values themselves are heuristic estimates produced by the current V', not true labels. For problems that come with a labelled dataset, this repetition might not be required.
