Lesson 5.8. The List-Then-Eliminate Algorithm
Introduction
The List-Then-Eliminate algorithm is a concept learning strategy used in machine learning and artificial intelligence to systematically search a hypothesis space for hypotheses that are consistent with a set of training examples. The algorithm operates under the principle of representational completeness, meaning it considers every possible hypothesis that can be constructed from the attributes in the training data.
At its core, the List-Then-Eliminate algorithm begins with the assumption that any hypothesis could be the true one, and it incrementally refines this set by eliminating every hypothesis that fails to correctly classify the training examples. It uses a version space approach, where the version space is defined as the subset of the hypothesis space that is consistent with all the observed training examples.
The algorithm can be summarized as follows:
1. Initialize the Version Space:
Begin with a version space that contains every possible hypothesis in the hypothesis space H, where H is the set of all hypotheses that can be formed from the given attributes and their values.
2. Process Each Training Example:
For each training example (x, c(x)), where x is the instance and c(x) is the classification (or concept) of that instance, do the following:
- Eliminate Inconsistent Hypotheses:
Remove from the version space any hypothesis h for which h(x) ≠ c(x). In other words, if a hypothesis does not correctly predict the classification of the training example, it is eliminated. This step gradually narrows down the version space to include only those hypotheses that are consistent with all the observed training examples.
3. Output the Remaining Hypotheses:
After processing all the training examples, output the list of hypotheses remaining in the version space. These are the hypotheses that are consistent with all the training examples provided.
Advantages and Disadvantages
The strength of the List-Then-Eliminate algorithm lies in its exhaustive and systematic approach, which ensures that no potential hypothesis is overlooked. However, this strength is also its weakness: the algorithm is computationally intensive and often impractical for all but the smallest datasets with a very limited number of attributes. Despite this, it serves as a valuable educational tool, offering clear insights into how machines can learn from data and the challenges involved in developing efficient learning algorithms. It also lays the groundwork for more sophisticated learning algorithms that seek to balance the trade-off between representational completeness and computational practicality.
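To get a feel for why the approach is computationally intensive, consider how quickly the hypothesis space grows. The rough calculation below assumes each attribute has k specific values plus the special values "?" and "0" (the representation used in the example that follows), giving (k + 2)^n syntactically distinct hypotheses for n attributes.

```python
# Rough size of the syntactically distinct hypothesis space:
# each of n attributes can take k specific values plus '?' and '0'.
def hypothesis_space_size(n_attributes, k_values):
    return (k_values + 2) ** n_attributes

for n in (2, 5, 10, 20):
    print(n, hypothesis_space_size(n, k_values=3))
# 2 -> 25; 5 -> 3,125; 10 -> 9,765,625; 20 -> ~9.5e13
```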
Example
For the List-Then-Eliminate algorithm, we start with a hypothesis space that contains all possible combinations of attribute values, and we then eliminate hypotheses based on the training instances.
Consider the following hypothetical dataset with two features; with more features, the hypothesis space quickly becomes impractical to enumerate by hand.
Climate | Type  | Label
--------|-------|------
Warm    | Urban | Yes
Warm    | Rural | Yes
Cold    | Urban | No
Each attribute in a hypothesis can take one of three kinds of values:
- "?" (any value of the attribute is acceptable)
- A specific value (e.g., "Urban", "Rural")
- "0" (no value is acceptable)
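These semantics are easy to pin down in code. The sketch below is one possible encoding, assuming hypotheses and instances are tuples of strings; the helper name `classify` is an assumption carried over from the earlier sketch.

```python
def classify(h, x):
    """Predicted label for instance x under hypothesis h.

    For each attribute: '?' accepts any value, '0' accepts no value,
    and a specific value must match the instance exactly.
    """
    for h_val, x_val in zip(h, x):
        if h_val == '0' or (h_val != '?' and h_val != x_val):
            return 'No'
    return 'Yes'

# classify(('warm', '?'), ('warm', 'urban'))  -> 'Yes'
# classify(('warm', '0'), ('warm', 'urban'))  -> 'No'
```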
Let's apply the List-Then-Eliminate algorithm to the training dataset:
Initial Hypothesis Space:
Initially, the version space contains all possible hypotheses. Since each attribute can take four possible values (two specific values, "?", and "0"), the initial hypothesis space for two attributes contains 4 × 4 = 16 hypotheses:
H = VersionSpace =
{<0, 0>, <0, ?>, <?, 0>,
<warm, urban>, <warm, rural>, <warm, ?>, <warm, 0>,
<cold, urban>, <cold, rural>, <cold, ?>, <cold, 0>,
<?, urban>, <?, rural>, <0, urban>, <0, rural>,
<?, ?>}
Explanation of the hypotheses:
Most Specific Hypothesis:
<0, 0> (no value is acceptable for either feature, so this hypothesis classifies every instance as negative).
Hypotheses Containing "0":
These assign "0" to at least one feature: <warm, 0>, <cold, 0>, <0, urban>, <0, rural>, <?, 0>, <0, ?>. Because "0" accepts no value, each of these also classifies every instance as negative and is semantically equivalent to <0, 0>.
Hypotheses Specifying Both Features:
<warm, rural>, <warm, urban>, <cold, rural>, <cold, urban>
More General Hypotheses:
These replace one of the specific values with "?", which represents a more general state than a specific value or "0": <warm, ?>, <cold, ?>, <?, rural>, <?, urban>
Most General Hypothesis:
<?, ?> (both features can take any value).
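As a quick sanity check, the 16 hypotheses can be enumerated mechanically. This small sketch assumes the tuple representation used above.

```python
from itertools import product

CLIMATE_VALUES = ['warm', 'cold', '?', '0']
TYPE_VALUES = ['urban', 'rural', '?', '0']

# All 4 x 4 = 16 syntactically distinct hypotheses.
hypothesis_space = list(product(CLIMATE_VALUES, TYPE_VALUES))
print(len(hypothesis_space))  # 16
```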
1. Process Instance 1 (Positive Instance): <warm, urban, yes>
When this instance is checked, every hypothesis that predicts "No" for it is eliminated.
E.g., when we pass <warm, urban> through the hypothesis <warm, rural>, it sees a mismatch in the second attribute and returns 'No', so the hypothesis is eliminated.
When we pass <warm, urban> through the hypothesis <warm, ?>, the "?" accepts both rural and urban in the second attribute, so it returns 'Yes' and the hypothesis is retained.
Every hypothesis containing "0" (including <0, 0>) returns 'No' for all instances, so all of these are eliminated here as well.
VersionSpace = {<warm, urban>, <warm, ?>, <?, urban>, <?, ?>}
2. Process Instance 2 (Positive Instance): <warm, rural, yes>
<warm, urban> and <?, urban> see a mismatch in the second attribute and return 'No', so they are eliminated.
VersionSpace = {<warm, ?>, <?, ?>}
3. Process Instance 3 (Negative Instance): <cold, urban, no>
When we pass <cold, urban> through the hypothesis <warm, ?>, it sees a mismatch in the first attribute and returns 'No'. Since this is a negative instance, the hypothesis is consistent and is retained.
When we pass <cold, urban> through the hypothesis <?, ?>, it sees no mismatch and returns 'Yes'. But this is a negative instance, so the hypothesis is eliminated.
VersionSpace = {<warm, ?>}
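Putting the pieces together, the self-contained sketch below reproduces this trace end to end, using the assumed `classify` semantics and tuple representation from the sketches above.

```python
from itertools import product

def classify(h, x):
    """'?' accepts any value, '0' accepts none, otherwise exact match."""
    for h_val, x_val in zip(h, x):
        if h_val == '0' or (h_val != '?' and h_val != x_val):
            return 'No'
    return 'Yes'

hypothesis_space = list(product(['warm', 'cold', '?', '0'],
                                ['urban', 'rural', '?', '0']))

training_examples = [
    (('warm', 'urban'), 'Yes'),
    (('warm', 'rural'), 'Yes'),
    (('cold', 'urban'), 'No'),
]

version_space = list(hypothesis_space)
for x, label in training_examples:
    # Keep only hypotheses whose prediction matches the observed label.
    version_space = [h for h in version_space if classify(h, x) == label]
    print(x, label, '->', version_space)

# Final version space: [('warm', '?')]
```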
Conclusion:
After applying the List-Then-Eliminate algorithm, the version space contains exactly those hypotheses that are consistent with the positive examples of the training dataset and have not been eliminated by the negative examples. For this dataset, a single hypothesis survives: <warm, ?>, which uses "?" to allow for the variability seen in the positive instances while still rejecting the negative instance.