Lesson 5.1. Concept Learning
What is Concept Learning?
Input (training dataset): Positive and negative samples, e.g. images of cats and images of things that are not cats.
Output: The category of an instance, e.g. a binary classification of cat or not cat.
Representation of output: Boolean.
{1: cat, 0: otherwise}
From specific examples of cat and not-cat images, the learner infers the general concept of "cat".
Dataset
Consider a dataset for a real-world scenario: predicting whether a person, let's call them "Jordan", will buy a house based on various factors (features).
Scenario:
Will Jordan Buy a House?
Features Explanation:
Feature 1 (Location): The type of location (A - Urban, B - Suburban, C - Rural).
Feature 2 (Budget): Is the house within Jordan's budget? (Yes, No).
Feature 3 (Size): What is the size of the house? (Small, Medium, Large).
Feature 4 (Garden): Does the house have a garden? (Yes, No).
Feature 5 (Schools Nearby): Are there good schools nearby? (Yes, No).
Feature 6 (Public Transport): Is there convenient public transport? (Yes, No).
Outcomes:
"Yes" means Jordan will buy the house.
"No" means Jordan will not buy the house.
Training Dataset (D) - With Instances (X)
| Instance | Location | Budget | Size | Garden | Schools Nearby | Public Transport | Will Buy House (Label) |
|---|---|---|---|---|---|---|---|
| 1 | Urban | No | Small | No | Yes | No | No |
| 2 | Rural | Yes | Large | No | No | No | No |
| 3 | Rural | Yes | Small | Yes | Yes | Yes | Yes |
| 4 | Urban | Yes | Medium | Yes | No | Yes | Yes |
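As a rough illustration, the training dataset can be held in code as a list of (instance, label) pairs. The sketch below is only one possible layout: the names `House`, `D`, and `encode_label` are assumptions made for this example and do not come from the lesson.

```python
from typing import NamedTuple


class House(NamedTuple):
    """One instance x, with the six features in table order (hypothetical type)."""
    location: str
    budget: str
    size: str
    garden: str
    schools_nearby: str
    public_transport: str


# Training set D: (instance, label) pairs copied from the four table rows.
D = [
    (House("Urban", "No", "Small", "No", "Yes", "No"), "No"),
    (House("Rural", "Yes", "Large", "No", "No", "No"), "No"),
    (House("Rural", "Yes", "Small", "Yes", "Yes", "Yes"), "Yes"),
    (House("Urban", "Yes", "Medium", "Yes", "No", "Yes"), "Yes"),
]


def encode_label(label: str) -> int:
    """Map the "Yes"/"No" outcome to the Boolean encoding {1, 0}."""
    return 1 if label == "Yes" else 0


labels = [encode_label(y) for _, y in D]
print(labels)  # [0, 0, 1, 1]
```

Instances 3 and 4 are the positive examples (label 1); instances 1 and 2 are the negative examples (label 0).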
Target Concept(C) :
Task :
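Hypothesis (H)
In general, a hypothesis is a statement that a researcher sets out to test through experimentation. In concept learning, a hypothesis h is a candidate Boolean function over the instances, chosen from a hypothesis space H.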
Representing Hypothesis as Vector
Jordan is making a decision on whether to buy a house based on a set of six features. Each feature has a set of possible values. The decision ("Yes" to buy or "No" not to buy) depends on how these features align with Jordan's preferences or requirements.
A hypothesis in this scenario could look like a vector with six elements, each corresponding to one of the features.
Three kinds of values can be assigned to each attribute in the hypothesis:
"?" (Question Mark): Any value of the attribute is acceptable. This attribute is not a deciding factor in the hypothesis.
A specific value (e.g., Urban, Suburban, or Rural for Location): The attribute must have exactly this value for the hypothesis to hold true.
"0" (Zero): No value is acceptable for the attribute. Any value for this attribute invalidates the hypothesis, so a hypothesis containing "0" rejects every instance.
For example:
`("A", "Yes", "large", "Yes", "Yes", "No")`
could represent a preference for an Urban (A) house that is within budget ("Yes"), large in size, with a garden ("Yes"), with good schools nearby ("Yes"), and without convenient public transport ("No"). Note that "No" is a specific required value; if public transport did not matter, the last element would be "?".
`("?", "Yes", "?", "No", "Yes", "?")`
means the location and size don't matter, but the house must be within budget, must not have a garden, must have good schools nearby, and public transport is not a concern.
Decision Making:
Jordan's decision to buy a house ("Yes") or not ("No") is based on whether a particular house satisfies the hypothesis.
If a house satisfies all the constraints of the hypothesis, it is classified as a positive example ("Yes"), and Jordan will decide to buy it. If it violates even one constraint, it is classified as a negative example ("No"), and Jordan will decide not to buy it.
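The decision rule can be sketched in a few lines of Python. This is a minimal sketch, assuming the three kinds of constraint values listed earlier ("?", a specific value, "0"). The function names `satisfies` and `classify` and the two sample houses are hypothetical and were chosen for this example.

```python
from typing import Sequence

ANY = "?"   # any value of the attribute is acceptable
NONE = "0"  # no value of the attribute is acceptable


def satisfies(hypothesis: Sequence[str], house: Sequence[str]) -> bool:
    """Return True if the house meets every constraint in the hypothesis."""
    if len(hypothesis) != len(house):
        raise ValueError("hypothesis and house must have the same length")
    for constraint, value in zip(hypothesis, house):
        if constraint == NONE:
            return False  # "0" can never be satisfied
        if constraint != ANY and constraint != value:
            return False  # a specific value was required and not met
    return True


def classify(hypothesis: Sequence[str], house: Sequence[str]) -> str:
    """Return "Yes" (positive example) or "No" (negative example)."""
    return "Yes" if satisfies(hypothesis, house) else "No"


h = ("?", "Yes", "?", "No", "Yes", "?")

# Two made-up houses that differ only in the Garden attribute.
house_a = ("Suburban", "Yes", "Large", "No", "Yes", "No")
house_b = ("Suburban", "Yes", "Large", "Yes", "Yes", "No")

print(classify(h, house_a))  # Yes
print(classify(h, house_b))  # No
```

The second house is rejected only because it has a garden, which shows that a single violated constraint is enough for a negative classification.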
Hypothesis with "0":
Under the definition above, "0" means that no value is acceptable, so a hypothesis with "0" in any position classifies every house as "No". The example below uses a different convention, in which "0" marks an attribute that has one predefined disqualifying value; the rule stating which value is disqualified must accompany the hypothesis. Let's say Jordan absolutely does not want a house in a rural area, does not want a small house, and insists on having a garden. With the rules "0 in Location disqualifies Rural" and "0 in Size disqualifies Small", the hypothesis could be represented as:
`("0", "?", "0", "Yes", "?", "?")`
In this example:
1. Location: "0" means a rural location is absolutely unacceptable. Jordan will not consider rural houses, regardless of other attributes.
2. Budget: "?" indicates that Jordan is flexible regarding the budget.
3. Size: "0" indicates that a small house is a deal-breaker. Jordan only wants medium or large-sized houses.
4. Garden: "Yes" means the house must have a garden. This is a mandatory requirement.
5. Schools Nearby: "?" suggests that the presence of good schools nearby is not a deciding factor for Jordan.
6. Public Transport: "?" indicates that the availability of convenient public transport is not a crucial factor.
Interpretation:
- If a house is in a rural area (regardless of other features), Jordan will not buy it.
- If a house is small (regardless of other features), Jordan will not buy it.
- Jordan will only consider houses with a garden.
- Other factors like budget, schools nearby, and public transport are flexible and not deal-breakers in this hypothesis.
This hypothesis demonstrates how Jordan can use specific criteria (and deal-breakers) to decide whether to buy a house. The "0" values play a crucial role in immediately disqualifying certain options.
To use "0" effectively in such a hypothesis, there should be a clear understanding or predefined rule of what "0" stands for in the context of each attribute. For instance, in the Location attribute "0" disqualifies Rural, and in the Size attribute "0" disqualifies Small.
This approach ensures that the hypothesis is clear and can be accurately used for decision-making.
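The deal-breaker reading of "0" can be sketched as follows. This sketch assumes a per-attribute rule table saying which value "0" disqualifies. The table `EXCLUDED_BY_ZERO` and the function `satisfies_with_dealbreakers` are hypothetical names introduced here, and this is not the standard reading in which "0" accepts no value at all.

```python
from typing import Mapping, Sequence

# Hypothetical rule table: attribute index -> value that "0" disqualifies.
# Index 0 is Location, index 2 is Size.
EXCLUDED_BY_ZERO = {0: "Rural", 2: "Small"}


def satisfies_with_dealbreakers(
    hypothesis: Sequence[str],
    house: Sequence[str],
    excluded: Mapping[int, str] = EXCLUDED_BY_ZERO,
) -> bool:
    """Check a house against a hypothesis where "0" means a predefined deal-breaker."""
    for i, (constraint, value) in enumerate(zip(hypothesis, house)):
        if constraint == "?":
            continue  # attribute does not matter
        if constraint == "0":
            if i not in excluded:
                raise KeyError(f"no rule defines what '0' excludes for attribute {i}")
            if value == excluded[i]:
                return False  # deal-breaker value present
        elif constraint != value:
            return False  # required specific value not met
    return True


h = ("0", "?", "0", "Yes", "?", "?")

# The four training instances from the table (features only).
houses = [
    ("Urban", "No", "Small", "No", "Yes", "No"),
    ("Rural", "Yes", "Large", "No", "No", "No"),
    ("Rural", "Yes", "Small", "Yes", "Yes", "Yes"),
    ("Urban", "Yes", "Medium", "Yes", "No", "Yes"),
]

print([satisfies_with_dealbreakers(h, x) for x in houses])
# [False, False, False, True]
```

Under these assumed rules the hypothesis rejects instance 3 (rural and small), although the training table labels instance 3 as "Yes". The example hypothesis therefore illustrates the notation rather than a rule that fits the training data.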
Determine
A hypothesis h in H such that h(x) = c(x) for all x in X
That is, if the target concept c maps an instance (f1, f2, ..., f6) to 1, then the hypothesis applied to the same instance should also give 1, and if c maps it to 0, the hypothesis should give 0.
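The condition h(x) = c(x) can only be checked on the instances whose labels are known, which here means the four training examples. The sketch below makes that check under the "?" / specific value / "0" semantics described earlier. The helper names `h_of` and `is_consistent` are assumptions for this example, and the two candidate hypotheses were picked by hand rather than produced by a learning algorithm.

```python
from typing import Sequence, Tuple

Example = Tuple[Sequence[str], int]  # (instance x, label c(x))


def h_of(hypothesis: Sequence[str], x: Sequence[str]) -> int:
    """Evaluate h(x): 1 if x satisfies every constraint, else 0."""
    for constraint, value in zip(hypothesis, x):
        if constraint == "0":
            return 0  # "0" accepts no value
        if constraint != "?" and constraint != value:
            return 0
    return 1


def is_consistent(hypothesis: Sequence[str], examples: Sequence[Example]) -> bool:
    """True if h(x) == c(x) for every labelled example."""
    return all(h_of(hypothesis, x) == c for x, c in examples)


# Training set from the table, with labels encoded as {1: Yes, 0: No}.
D = [
    (("Urban", "No", "Small", "No", "Yes", "No"), 0),
    (("Rural", "Yes", "Large", "No", "No", "No"), 0),
    (("Rural", "Yes", "Small", "Yes", "Yes", "Yes"), 1),
    (("Urban", "Yes", "Medium", "Yes", "No", "Yes"), 1),
]

print(is_consistent(("?", "Yes", "?", "Yes", "?", "Yes"), D))  # True
print(is_consistent(("?", "Yes", "?", "No", "Yes", "?"), D))   # False
```

The first hypothesis (within budget, with a garden, with public transport) agrees with all four labels. The second fails on instance 3, which has a garden but is labelled "Yes". Agreement on D does not guarantee agreement on every x in X; it is only the part of the condition that the training data lets us verify.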