Lesson5.5. Find-S
The Find-S algorithm is a simple concept learning algorithm used in machine learning to find the maximally specific hypothesis that fits all the positive examples in the training data. The algorithm starts with the most specific hypothesis and generalizes it step by step as it encounters positive training examples.
Characteristics of Find-S Algorithm:
1. Initial Hypothesis:
The initial hypothesis is the most
specific one - it assumes no value is acceptable for any feature.
2. Positive Examples Only:
It considers only positive
examples for generalization. Negative examples are ignored in the hypothesis
formation.
3. Generalization:
When a positive example doesn't fit the current hypothesis, the algorithm generalizes the hypothesis just enough to include the example.
Steps of the Find-S Algorithm:
1. Initialize `h`:
Start with the most specific hypothesis,
where all features are set to none or '0'.
2. For each Positive Training Example:
- For each
attribute in the hypothesis:
- If the
attribute value in the example matches the hypothesis, do nothing.
- If the
attribute value in the hypothesis is '0', replace it with the value in the
example.
- If the attribute value in the hypothesis is different from the example, replace it with '?' (a wildcard that accepts any value).
3. Ignore Negative Examples:
Negative examples don't cause any change in the hypothesis.
4. Result:
The final hypothesis covers all positive examples and is as specific as possible under these constraints.
Example with House Buying Scenario:
Example 1:
Training Dataset(D) - With Instances(X)
Instance | Location | Budget | Size | Garden | Schools Nearby | Public Transport | Will Buy House (Label) |
1 | Urban | No | Small | No | Yes | No | No |
2 | Rural | Yes | Large | No | No | No | No |
3 | Rural | Yes | Small | Yes | Yes | Yes | Yes |
4 | Urban | Yes | Medium | Yes | No | Yes | Yes |
Example 2
Let's apply the Find-S algorithm to your house-buying scenario. Suppose Jordan has four features to consider (Location, Size, Garden, Schools Nearby) and the following positive examples of houses he likes:
1. (Urban, Large, Yes, Yes)
2. (Suburban, Medium, Yes, Yes)
3. (Urban, Medium, Yes, Yes)
Applying Find-S:
1. Initialize `h`:
Start with `h = (0, 0, 0, 0)`.
2. First Positive Example
(Urban, Large, Yes, Yes)`:
- `h` becomes
`(Urban, Large, Yes, Yes)`.
3. Second Positive Example
(Suburban, Medium, Yes, Yes)`:
- `h` changes to
`(?, ?, Yes, Yes)` because 'Urban' changes to '?' due to 'Suburban', and
'Large' changes to '?' due to 'Medium'.
4. Third Positive Example
(Urban, Medium, Yes, Yes):
- `h` remains `(?, ?, Yes, Yes)` as it already covers this example.
Result:
The final hypothesis `h = (?, ?, Yes, Yes)` indicates that Jordan prefers houses with a garden and good schools nearby, regardless of their location and size.
Limitations:
- The Find-S algorithm cannot handle negative examples. If a
negative example fits the hypothesis, the algorithm has no mechanism to correct
it.
- It finds only the maximally specific hypothesis, which
might not be the most accurate or useful representation of the target concept.
- It assumes that the training data is noise-free and
consistent.
Comments
Post a Comment