Lesson 4.1: Designing a Learning System for the Checkers Problem - The Training Experience

Goal :

Design a program that learns to play checkers well enough to compete in the world checkers tournament.
There are several steps in designing a system that learns to play checkers. This post focuses on the first step: choosing the training experience.

Step 1 : Choosing the Training Experience

The effectiveness of a learner's training hinges greatly on the type of training experience it receives. Different training experiences can lead to drastically different outcomes, with some facilitating success and others paving the way for failure. 

Type of training data used:

The challenge of designing a learning system for playing checkers can be well-understood through the lens of the type of training data used: direct training examples versus indirect information. Both methods have their unique challenges and advantages, and they contribute differently to the system's learning process.

Direct Training Examples

In the context of checkers, direct training examples would consist of specific board states paired with the best possible move for each state. This is akin to a more traditional, supervised learning approach.

Example of Direct Training

Imagine a database containing a variety of checkers board configurations. Each configuration is accompanied by an expert's move, considered the best move for that particular situation. The learning system analyzes these examples and learns to associate specific board configurations with these optimal moves.
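To make this concrete, here is a minimal sketch of what such direct training data might look like, assuming a simple 32-square list encoding of the board and purely hypothetical expert moves (neither comes from a real checkers database):

```python
# Minimal sketch of direct (supervised) training data for checkers.
# The board encoding and the expert moves below are hypothetical,
# not taken from any real database.

# A board state is a 32-element list over the playable squares:
#  0 = empty, 1 = our man, 2 = our king, -1 = opponent man, -2 = opponent king.

# Each direct training example pairs a board state with the expert's move,
# given as (from_square, to_square).
direct_examples = [
    ([1, 0, 0, 1, 0, -1, 0, 0] + [0] * 24, (0, 5)),   # hypothetical position and move
    ([0, 1, 0, 0, -1, 0, 0, 1] + [0] * 24, (1, 4)),   # hypothetical position and move
]

def train_supervised(examples):
    """Learn a lookup from board states to expert moves (a deliberately trivial 'learner')."""
    return {tuple(board): move for board, move in examples}

def choose_move(model, board):
    """Return the expert's move if this exact position was seen during training."""
    return model.get(tuple(board))

model = train_supervised(direct_examples)
print(choose_move(model, direct_examples[0][0]))  # -> (0, 5)
```

A real learner would of course generalize to unseen positions rather than memorize them, but the shape of the data is the same: one board state, one labelled best move.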

Indirect Information

Indirect information involves learning from the sequences of moves and the final outcomes of games without explicit guidance on the best move for each board state. This approach is more aligned with reinforcement learning, where the system learns from the consequences of its actions over time.

Example of Learning from Indirect Information

Game Simulation:

   - Play Games: Have the AI play numerous games against itself or against other opponents (either AI or human). During these games, every move leads to a new board state.

   - Record States: Systematically record each board state 'b' encountered during these games. This includes the arrangement of pieces on the board at every stage of the game.

In this scenario, the system might analyze a large number of completed checkers games. Each game is a sequence of moves that led to either a win or a loss. The system must infer which moves were good and which were not, based on the game's outcome. This inference is challenging due to the credit assignment problem.
For example, consider a game where the system made several strong moves in the beginning but made a critical error in the middle game, leading to a loss. The system must figure out that the loss can be attributed more to the mid-game error rather than the initial moves. The complexity arises because an optimal move at one point in the game can be rendered ineffective or even detrimental by a subsequent poor move.
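The sketch below illustrates how such indirect data could be recorded from self-play. The move generator and end-of-game logic are crude placeholders (pseudo-random moves, a random outcome), used only to show what gets recorded: every board state b visited during the game, plus the final result.

```python
import random

# Sketch of collecting indirect training data from self-play.
# The game logic is stubbed out (pseudo-random moves, a fixed ply limit,
# a random outcome) purely to illustrate which data gets recorded.

def legal_moves(board):
    # Placeholder: a real implementation would generate legal checkers moves.
    return [(i, i + 4) for i, piece in enumerate(board[:28]) if piece == 1]

def apply_move(board, move):
    src, dst = move
    new_board = list(board)
    new_board[dst], new_board[src] = new_board[src], 0
    return new_board

def play_one_game(initial_board, max_plies=40):
    """Play one game, recording every board state b encountered, plus the outcome."""
    board, history = initial_board, []
    for _ in range(max_plies):
        moves = legal_moves(board)
        if not moves:
            break
        board = apply_move(board, random.choice(moves))
        history.append(board)
    outcome = random.choice([+1, -1])  # placeholder for win (+1) or loss (-1)
    return history, outcome

initial = [1] * 12 + [0] * 8 + [-1] * 12   # rough starting arrangement on 32 squares
states, result = play_one_game(initial)
print(len(states), "recorded states, outcome:", result)
```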


Credit Assignment Challenge

The credit assignment problem is a significant challenge in learning from indirect information. It's difficult to determine how much each move contributes to the final outcome of the game, especially since the impact of a move can be influenced by subsequent moves.

Example of Credit Assignment

Let's say the system executed a strategy that gained a material advantage early in the game but failed to capitalize on this advantage due to weak subsequent plays, eventually leading to a loss. The system needs to recognize that the early moves were effective but the later moves were not. This recognition is complex because it's not always clear-cut which moves were responsible for the loss, especially in a game like checkers where the strategic landscape can shift significantly with each move.
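One crude way to picture credit assignment is to spread the final outcome back over the moves of the game, discounting it for moves further from the end, so that late mistakes absorb most of the blame. The discounting scheme below is an illustrative assumption, not a method prescribed by this lesson:

```python
# Illustrative credit-assignment heuristic: propagate the final outcome back
# through the game's moves, discounting it for moves further from the end.
# The discount factor and the scheme itself are assumptions made for illustration.

def assign_credit(num_moves, outcome, discount=0.9):
    """Return a credit (or blame) value for each move of a finished game.

    outcome: +1 for a win, -1 for a loss.
    Moves nearer the end are discounted less, so they absorb more of the result.
    """
    return [outcome * discount ** (num_moves - 1 - i) for i in range(num_moves)]

# A lost game of 10 moves: the late moves absorb most of the blame,
# while the strong opening moves are penalised only slightly.
for i, c in enumerate(assign_credit(10, outcome=-1), start=1):
    print(f"move {i:2d}: credit {c:+.3f}")
```

Such a fixed scheme is obviously imperfect: a game can be lost by a single early blunder, which is exactly why credit assignment is hard.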

Learner control in the training experience:

We discuss the importance of learner control over the training experience and present scenarios with different levels of control:
1. Teacher-controlled:
The teacher selects everything - board states, moves, and training classifications. The learner passively receives information.
2. Learner-initiated:
The learner asks for specific board states and receives the correct moves from the teacher. The learner has some control over the training content but not the sequence.
3. Autonomous:
The learner has complete control over both board states and training classifications. It can experiment with novel situations or refine its skills on existing lines of play.
This suggests that having some agency in their training helps learners explore their environment, experiment with novel situations, and focus on areas they find challenging. This can lead to deeper understanding and more effective learning compared to a purely passive approach.
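The three levels of control can also be pictured as three different ways a training example gets produced. The helper functions below (random_board, teacher_best_move) are hypothetical stand-ins; the point is only to show who chooses the board state and who supplies the move:

```python
import random

# Hypothetical illustration of who controls the training experience.
# random_board and teacher_best_move are stand-ins for real components.

def random_board():
    return [random.choice([-1, 0, 1]) for _ in range(32)]

def teacher_best_move(board):
    return (0, 4)  # placeholder for the move an expert teacher would choose

def teacher_controlled():
    """Teacher picks the board state and supplies the move; the learner is passive."""
    board = random_board()                 # chosen by the teacher
    return board, teacher_best_move(board)

def learner_initiated(confusing_board):
    """Learner proposes a board state it finds confusing; the teacher supplies the move."""
    return confusing_board, teacher_best_move(confusing_board)

def autonomous(chosen_board, learner_policy):
    """Learner picks both the state and the move; no teacher is involved."""
    return chosen_board, learner_policy(chosen_board)
```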

Different settings for learning:

Random process:

The learner receives training examples randomly, without any control.

Expert queries:

The learner can pose queries to a teacher to gain specific knowledge. This is akin to RLHF (Reinforcement Learning from Human Feedback).

Autonomous exploration:

The learner actively explores its environment and gathers its own training data.

These different settings offer varying degrees of control and influence the way the learner interacts with the learning environment.

Data Distribution

In practical applications of machine learning, such as designing a system to play checkers at a competitive level, there's often a disparity between the distribution of examples used for training and the scenarios in which the final system is evaluated. This situation arises due to various constraints and practicalities, one of which might be the unavailability of training data from top-tier players like the world checkers champion.

Training Without Champion-Level Data

Imagine you're developing a checkers-playing AI. Ideally, you'd want this AI to learn from games played by the world's best players, as these games represent the highest level of strategic play. However, such data might not be readily accessible for several reasons:
1. Privacy or Proprietary Issues: Top players might not want to share their strategies or game data.
2. Limited Availability: There might be a limited number of high-level games available for analysis.
3. Resource Constraints: Obtaining and processing high-level game data might be resource-intensive.

Learning from an Alternative Distribution

Given these constraints, your AI might have to learn from a different distribution of examples, such as:
1. Games Played by Amateur or Semi-professional Players: These games are more readily available but might contain a different style or level of strategy compared to champion-level play.
2. Simulated Games: The AI could play games against itself or pre-programmed strategies, which may not fully replicate the depth of human world-class play.
3. Historical Data: Using older games, which might not reflect current top-tier strategies.

Impact and Adaptation

Learning from these alternative sources can still be beneficial, but it has certain implications:
1. Strategy Gaps: The AI might not learn certain advanced strategies used by top players, leading to a potential performance gap in high-stakes scenarios.
2. Overfitting to a Specific Style: If the AI mostly learns from a particular level or style of play, it might struggle against strategies outside of that scope.
3. Adaptability: The system needs to be adaptable, capable of learning and adjusting even during actual competition, to mitigate the differences between its training data and the games it encounters in tournaments.

Decision

Allow the system to play games against itself and learn from the outcomes. This way, the size of the training data can be made as large as we need.
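Putting the earlier sketches together, a self-play data-generation loop might look like the following. It is parameterised on the illustrative play_one_game and assign_credit helpers sketched above, both of which are placeholders rather than a real checkers engine:

```python
# Sketch of the chosen approach: generate a large body of indirect training data
# from self-play. play_one_game and assign_credit refer to the illustrative
# placeholders sketched earlier in this post, not to a real checkers engine.

def generate_training_set(num_games, play_one_game, assign_credit, initial_board):
    """Play num_games self-play games and label every visited state with the
    (heuristically credited) outcome of the game it appeared in."""
    dataset = []
    for _ in range(num_games):
        states, outcome = play_one_game(initial_board)
        credits = assign_credit(len(states), outcome)
        dataset.extend(zip(states, credits))
    return dataset

# With self-play the data volume is limited only by compute, e.g.
# 100,000 games x ~40 recorded states per game -> several million labelled states.
```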

