Lesson 5.8. The List-Then-Eliminate Algorithm
Introduction
The List-Then-Eliminate algorithm is a concept learning strategy used in machine learning and artificial intelligence to systematically search a hypothesis space for hypotheses that are consistent with a set of training examples. The algorithm operates under the principle of representational completeness, meaning it considers every possible hypothesis that can be constructed from the attributes in the training data.
At its core, the List-Then-Eliminate algorithm begins with the assumption that any hypothesis could be the true one, and it incrementally refines this set by eliminating every hypothesis that fails to correctly classify the training examples. It uses a version space approach, where the version space is defined as the subset of the hypothesis space that is consistent with all the observed training examples.
The algorithm can be summarized as follows:
1. Initialize the Version Space:
Begin with a version space that contains every possible hypothesis in the hypothesis space H, where H is the set of all hypotheses that can be formed from the given attributes and their values.
2. Process Each Training Example:
For each training example (x, c(x)), where x is the instance and c(x) is the classification (or concept) of that instance, do the following:
- Eliminate Inconsistent Hypotheses:
Remove from the version space any hypothesis h for which h(x) ≠ c(x). In other words, if a hypothesis does not correctly predict the classification of the training example, it is eliminated. This step gradually narrows down the version space to include only those hypotheses that are consistent with all the observed training examples.
3. Output the Remaining Hypotheses:
After processing all the training examples, output the list of hypotheses remaining in the version space. These are the hypotheses that are consistent with all the training examples provided.
Advantages and Disadvantages
The strength of the List-Then-Eliminate algorithm lies in its exhaustive and systematic approach, which ensures that no potential hypothesis is overlooked. However, this strength is also its weakness: the algorithm is computationally intensive and often impractical for all but the smallest datasets with a very limited number of attributes. Despite this, it serves as a valuable educational tool, offering clear insights into how machines can learn from data and the challenges involved in developing efficient learning algorithms. It also lays the groundwork for more sophisticated learning algorithms that seek to balance the trade-off between representational completeness and computational practicality.
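To get a feel for why the approach is computationally intensive, consider how quickly the hypothesis space grows. The rough calculation below assumes each attribute has k specific values plus the special values "?" and "0" (the representation used in the example that follows), giving (k + 2)^n syntactically distinct hypotheses for n attributes.

```python
# Rough size of the syntactically distinct hypothesis space:
# each of n attributes can take k specific values plus '?' and '0'.
def hypothesis_space_size(n_attributes, k_values):
    return (k_values + 2) ** n_attributes

for n in (2, 5, 10, 20):
    print(n, hypothesis_space_size(n, k_values=3))
# 2 -> 25; 5 -> 3,125; 10 -> 9,765,625; 20 -> ~9.5e13
```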
Example
For the List-Then-Eliminate algorithm, we start with a hypothesis space that contains all possible combinations of attribute values, and we then eliminate hypotheses based on the training instances.
Consider the following hypothetical dataset with two features; with more features, the hypothesis space quickly becomes impractical to enumerate by hand.
Climate | Type  | Label
--------|-------|------
Warm    | Urban | Yes
Warm    | Rural | Yes
Cold    | Urban | No
Each attribute in a hypothesis can take one of three kinds of values:
- "?" (any value of the attribute is acceptable)
- A specific value (e.g., "Urban", "Rural")
- "0" (no value is acceptable)
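These semantics are easy to pin down in code. The sketch below is one possible encoding, assuming hypotheses and instances are tuples of strings; the helper name `classify` is an assumption carried over from the earlier sketch.

```python
def classify(h, x):
    """Predicted label for instance x under hypothesis h.

    For each attribute: '?' accepts any value, '0' accepts no value,
    and a specific value must match the instance exactly.
    """
    for h_val, x_val in zip(h, x):
        if h_val == '0' or (h_val != '?' and h_val != x_val):
            return 'No'
    return 'Yes'

# classify(('warm', '?'), ('warm', 'urban'))  -> 'Yes'
# classify(('warm', '0'), ('warm', 'urban'))  -> 'No'
```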
Let's apply the List-Then-Eliminate algorithm to the training dataset:
Initial Hypothesis Space:
Initially, the version space contains all possible hypotheses. Since each attribute can take four possible values (two specific values, "?", and "0"), the initial hypothesis space for two attributes contains 4 × 4 = 16 hypotheses:
H = VersionSpace =
{<0, 0>, <0, ?>, <?, 0>,
<warm, urban>, <warm, rural>, <warm, ?>, <warm, 0>,
<cold, urban>, <cold, rural>, <cold, ?>, <cold, 0>,
<?, urban>, <?, rural>, <0, urban>, <0, rural>,
<?, ?>}
Explanation of the hypotheses:
Most Specific Hypothesis:
<0, 0> (no value is acceptable for either feature, so this hypothesis classifies every instance as negative).
Hypotheses Containing "0":
These assign "0" to at least one feature: <warm, 0>, <cold, 0>, <0, urban>, <0, rural>, <?, 0>, <0, ?>. Because "0" accepts no value, each of these also classifies every instance as negative and is semantically equivalent to <0, 0>.
Hypotheses Specifying Both Features:
<warm, rural>, <warm, urban>, <cold, rural>, <cold, urban>
More General Hypotheses:
These replace one of the specific values with "?", which represents a more general state than a specific value or "0": <warm, ?>, <cold, ?>, <?, rural>, <?, urban>
Most General Hypothesis:
<?, ?> (both features can take any value).
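As a quick sanity check, the 16 hypotheses can be enumerated mechanically. This small sketch assumes the tuple representation used above.

```python
from itertools import product

CLIMATE_VALUES = ['warm', 'cold', '?', '0']
TYPE_VALUES = ['urban', 'rural', '?', '0']

# All 4 x 4 = 16 syntactically distinct hypotheses.
hypothesis_space = list(product(CLIMATE_VALUES, TYPE_VALUES))
print(len(hypothesis_space))  # 16
```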
1. Process Instance 1 (Positive Instance): <warm, urban, yes>
When this instance is checked, every hypothesis that predicts "No" for it is eliminated.
E.g., when we pass <warm, urban> through the hypothesis <warm, rural>, it sees a mismatch in the second attribute and returns 'No', so the hypothesis is eliminated.
When we pass <warm, urban> through the hypothesis <warm, ?>, the "?" accepts both rural and urban in the second attribute, so it returns 'Yes' and the hypothesis is retained.
Every hypothesis containing "0" (including <0, 0>) returns 'No' for all instances, so all of these are eliminated here as well.
VersionSpace = {<warm, urban>, <warm, ?>, <?, urban>, <?, ?>}
2. Process Instance 2 (Positive Instance): <warm, rural, yes>
<warm, urban> and <?, urban> see a mismatch in the second attribute and return 'No', so they are eliminated.
VersionSpace = {<warm, ?>, <?, ?>}
3. Process Instance 3 (Negative Instance): <cold, urban, no>
When we pass <cold, urban> through the hypothesis <warm, ?>, it sees a mismatch in the first attribute and returns 'No'. Since this is a negative instance, the hypothesis is consistent and is retained.
When we pass <cold, urban> through the hypothesis <?, ?>, it sees no mismatch and returns 'Yes'. But this is a negative instance, so the hypothesis is eliminated.
VersionSpace = {<warm, ?>}
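Putting the pieces together, the self-contained sketch below reproduces this trace end to end, using the assumed `classify` semantics and tuple representation from the sketches above.

```python
from itertools import product

def classify(h, x):
    """'?' accepts any value, '0' accepts none, otherwise exact match."""
    for h_val, x_val in zip(h, x):
        if h_val == '0' or (h_val != '?' and h_val != x_val):
            return 'No'
    return 'Yes'

hypothesis_space = list(product(['warm', 'cold', '?', '0'],
                                ['urban', 'rural', '?', '0']))

training_examples = [
    (('warm', 'urban'), 'Yes'),
    (('warm', 'rural'), 'Yes'),
    (('cold', 'urban'), 'No'),
]

version_space = list(hypothesis_space)
for x, label in training_examples:
    # Keep only hypotheses whose prediction matches the observed label.
    version_space = [h for h in version_space if classify(h, x) == label]
    print(x, label, '->', version_space)

# Final version space: [('warm', '?')]
```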
Conclusion:
After applying the List-Then-Eliminate algorithm, the version space contains exactly those hypotheses that are consistent with the positive examples of the training dataset and have not been eliminated by the negative examples. For this dataset, a single hypothesis survives: <warm, ?>, which uses "?" to allow for the variability seen in the positive instances while still rejecting the negative instance.