Lesson5.5. Find-S

 The Find-S algorithm is a simple concept learning algorithm used in machine learning to find the maximally specific hypothesis that fits all the positive examples in the training data. The algorithm starts with the most specific hypothesis and generalizes it step by step as it encounters positive training examples.

 Characteristics of Find-S Algorithm:

1. Initial Hypothesis: 

The initial hypothesis is the most specific one - it assumes no value is acceptable for any feature.

2. Positive Examples Only: 

It considers only positive examples for generalization. Negative examples are ignored in the hypothesis formation.

3. Generalization: 

When a positive example doesn't fit the current hypothesis, the algorithm generalizes the hypothesis just enough to include the example.

 Steps of the Find-S Algorithm:

1. Initialize `h`: 

Start with the most specific hypothesis, where all features are set to none or '0'.

2. For each Positive Training Example:

   - For each attribute in the hypothesis:

     - If the attribute value in the example matches the hypothesis, do nothing.

     - If the attribute value in the hypothesis is '0', replace it with the value in the example.

     - If the attribute value in the hypothesis is different from the example, replace it with '?' (a wildcard that accepts any value).

3. Ignore Negative Examples: 

Negative examples don't cause any change in the hypothesis.

4. Result: 

The final hypothesis covers all positive examples and is as specific as possible under these constraints.

 Example with House Buying Scenario:

Example 1:

Training Dataset(D) - With Instances(X)

Instance

Location

Budget

Size

Garden

Schools Nearby

Public Transport

Will Buy House (Label)

1

Urban

No

Small

No

Yes

No

No

2

Rural

Yes

Large

No

No

No

No

3

Rural

Yes

Small

Yes

Yes

Yes

Yes

4

Urban

Yes

Medium

Yes

No

Yes

Yes

Remember that Find-S starts with the most specific hypothesis and generalizes it as positive instances are encountered. Here, the most specific hypothesis would be the one that has "0" for all attributes, meaning no values are acceptable initially.

For the Find-S algorithm, we only consider positive instances where "Will Buy House" is "Yes".

Find-S Algorithm

Start with the most specific hypothesis:
`h = ['0', '0', '0', '0', '0', '0']`

For each positive instance, update the hypothesis `h` as follows:

After Instance 3:
`h = ['C', 'Yes', 'Small', 'Yes', 'Yes', 'Yes']`

After Instance 4, generalize the hypothesis `h` based on the differing attributes from Instance 3. Since Instance 4 has 'Urban' for 'Location' and 'Medium' for 'Size' which are different from 'Rural' and 'Small' in Instance 3, we will put a "?" in those places to indicate that any value is acceptable:

`h = ['?', 'Yes', '?', 'Yes', '?', 'Yes']`

Final hypothesis from Find-S:
`h = ['?', 'Yes', '?', 'Yes', '?', 'Yes']`

This hypothesis states that for someone to want to buy a house, according to the training data, it must have a 'Yes' for 'Budget', a 'Yes' for 'Garden', and a 'Yes' for 'Public Transport'. The 'Location', 'Size', and 'Schools Nearby' attributes can be any value, as they are not consistent factors in determining the outcome in the positive instances.

Example 2

Let's apply the Find-S algorithm to your house-buying scenario. Suppose Jordan has four features to consider (Location, Size, Garden, Schools Nearby) and the following positive examples of houses he likes:

1. (Urban, Large, Yes, Yes)

2. (Suburban, Medium, Yes, Yes)

3. (Urban, Medium, Yes, Yes)

 Applying Find-S:

1. Initialize `h`: 

Start with `h = (0, 0, 0, 0)`.

2. First Positive Example 

(Urban, Large, Yes, Yes)`:

   - `h` becomes `(Urban, Large, Yes, Yes)`.

3. Second Positive Example 

(Suburban, Medium, Yes, Yes)`:

   - `h` changes to `(?, ?, Yes, Yes)` because 'Urban' changes to '?' due to 'Suburban', and 'Large' changes to '?' due to 'Medium'.

4. Third Positive Example 

(Urban, Medium, Yes, Yes):

   - `h` remains `(?, ?, Yes, Yes)` as it already covers this example.

 Result:

The final hypothesis `h = (?, ?, Yes, Yes)` indicates that Jordan prefers houses with a garden and good schools nearby, regardless of their location and size.

 Limitations:

- The Find-S algorithm cannot handle negative examples. If a negative example fits the hypothesis, the algorithm has no mechanism to correct it.

- It finds only the maximally specific hypothesis, which might not be the most accurate or useful representation of the target concept.

- It assumes that the training data is noise-free and consistent.


Comments

Popular posts from this blog

ANN Series - 10 - Backpropagation of Errors

Naive Bayesian Classifiers - Multinomial, Bernoulli and Gaussian with Solved Examples and Laplace Smoothing

Clustering - K means Clustering