Lesson 5.1. Concept Learning

What is Concept Learning?

Input: a training dataset of positive and negative samples, e.g., images of cats and images of things that are not cats.

Output: the category of a sample, e.g., binary classification as cat or not cat.

Output representation: Boolean.

{1: cat, 0: otherwise}

From specific examples of cat and non-cat images, the learner infers the general concept of "cat", i.e., a Boolean-valued function that returns 1 for cats and 0 otherwise.

Dataset

Consider a dataset for a real-world scenario: predicting whether a person, let's call them "Jordan", will buy a house based on various factors (features).

Scenario: 

Will Jordan Buy a House?

Features Explanation:

Feature 1 (Location): The type of location (A - Urban, B - Suburban, C - Rural).

Feature 2 (Budget): Is the house within Jordan's budget? (Yes, No).

Feature 3 (Size): What size is the house? (Small, Medium, Large).

Feature 4 (Garden): Does the house have a garden? (Yes, No).

Feature 5 (Schools Nearby): Are there good schools nearby? (Yes, No).

Feature 6 (Public Transport): Is there convenient public transport? (Yes, No).

Outcomes:

"Yes" means Jordan will buy the house.

"No" means Jordan will not buy the house.

Training Dataset (D) with Instances (X)

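| Instance | Location | Budget | Size | Garden | Schools Nearby | Public Transport | Will Buy House (Label) |
|---|---|---|---|---|---|---|---|
| 1 | Urban | No | Small | No | Yes | No | No |
| 2 | Rural | Yes | Large | No | No | No | No |
| 3 | Rural | Yes | Small | Yes | Yes | Yes | Yes |
| 4 | Urban | Yes | Medium | Yes | No | Yes | Yes |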

Target Concept (c):

c (Buy House): X -> {0, 1}

c(x) = 1 if Jordan will buy house x, and c(x) = 0 otherwise.

Task:

Learn the target concept c from the training examples in D.
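To make the training set and the target concept concrete, here is a minimal Python sketch. It assumes each instance is a tuple of the six feature values in table order and that labels are encoded as 1 ("Yes") and 0 ("No"); the names `FEATURES`, `VALUES`, `D` and `c_on_D` are illustrative, not from the original. Because the true concept c is unknown, the learner only knows its values on the training instances.

```python
from math import prod

# Feature order follows the table (illustrative encoding, not from the original)
FEATURES = ("Location", "Budget", "Size", "Garden", "Schools Nearby", "Public Transport")

# Possible values of each feature, as listed in the feature explanations
VALUES = {
    "Location": ("Urban", "Suburban", "Rural"),
    "Budget": ("Yes", "No"),
    "Size": ("Small", "Medium", "Large"),
    "Garden": ("Yes", "No"),
    "Schools Nearby": ("Yes", "No"),
    "Public Transport": ("Yes", "No"),
}

# Training dataset D: (instance x, label c(x)), with 1 = "Yes" (buys) and 0 = "No"
D = [
    (("Urban", "No", "Small", "No", "Yes", "No"), 0),
    (("Rural", "Yes", "Large", "No", "No", "No"), 0),
    (("Rural", "Yes", "Small", "Yes", "Yes", "Yes"), 1),
    (("Urban", "Yes", "Medium", "Yes", "No", "Yes"), 1),
]

# c is unknown in general; only its values on the training instances are available
c_on_D = {x: label for x, label in D}

# Size of the instance space X: every combination of feature values
instance_space_size = prod(len(VALUES[f]) for f in FEATURES)

print("Positive examples:", sum(c_on_D.values()))  # 2
print("|X| =", instance_space_size)                 # 144
```

With three Location values, three Size values and two values for each of the other four features, X contains 3 * 2 * 3 * 2 * 2 * 2 = 144 possible houses, of which D covers only four.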

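Hypothesis (h) and Hypothesis Space (H)

In concept learning, a hypothesis h is a candidate approximation of the target concept: a Boolean-valued function h: X -> {0, 1}. The hypothesis space H is the set of all hypotheses the learner considers.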

Representing a Hypothesis as a Vector

Jordan is making a decision on whether to buy a house based on a set of six features. Each feature has a set of possible values. The decision ("Yes" to buy or "No" not to buy) depends on how these features align with Jordan's preferences or requirements. 

A hypothesis in this scenario can be written as a vector of six constraints, one for each feature.

Three kinds of values can be assigned to each attribute in the hypothesis:

"?" (Question Mark): 

This symbol represents that any value of the attribute is acceptable. It indicates that this particular attribute is not a deciding factor in the hypothesis.

A specific value (e.g., A - Urban, B - Suburban, C - Rural):

The attribute must have exactly this value for the hypothesis to hold, i.e., for the hypothesis to classify the instance as positive. This is a strict requirement.

"0" (Zero):

No value is acceptable for the attribute. Because no attribute value can satisfy this constraint, a hypothesis containing "0" in any position classifies every instance as negative.
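The three kinds of constraint values can be sketched as a small matching function. This assumes instances and hypotheses are tuples of six strings in the feature order of the table, with values spelled as in the table (e.g., "Urban" rather than the letter code "A"); the names `satisfies` and `h` are illustrative.

```python
from typing import Sequence

ANY = "?"   # any value of the attribute is acceptable
NONE = "0"  # no value of the attribute is acceptable


def satisfies(constraint: str, value: str) -> bool:
    """Check one attribute value against one hypothesis constraint."""
    if constraint == ANY:
        return True
    if constraint == NONE:
        return False
    # Specific value: the attribute must equal it exactly
    return constraint == value


def h(hypothesis: Sequence[str], x: Sequence[str]) -> int:
    """Return 1 if instance x meets every constraint of the hypothesis, else 0."""
    return int(all(satisfies(con, val) for con, val in zip(hypothesis, x)))


x = ("Urban", "Yes", "Medium", "Yes", "No", "Yes")
print(h(("?", "?", "?", "?", "?", "?"), x))        # 1: every value accepted
print(h(("Urban", "?", "?", "Yes", "?", "?"), x))  # 1
print(h(("?", "?", "?", "?", "Yes", "?"), x))      # 0: Schools Nearby is "No"
print(h(("0", "?", "?", "?", "?", "?"), x))        # 0: "0" rejects every value
```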

For example:

`("A", "Yes", "large", "Yes", "Yes", "No")` 

could represent a preference for an Urban (A) house that is within budget ("Yes"), large, with a garden ("Yes"), with good schools nearby ("Yes"), and without convenient public transport ("No"). Because "No" is a specific value, it is a requirement; if public transport did not matter, that constraint would be "?".

 `("?", "Yes", "?", "No", "Yes", "?")` 

means the location and size don't matter, but the house must be within budget, must not have a garden, must have good schools nearby, and public transport is not a concern.

 Decision Making:

Jordan's decision to buy a house ("Yes") or not ("No") depends on whether a particular house satisfies his hypothesis.

If a house satisfies every constraint in the hypothesis, it is classified as a positive example ("Yes") and he will buy it. If it violates any constraint, it is classified as a negative example ("No") and he will not buy it.
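The decision rule can be sketched as follows, applying the second example hypothesis to two training houses and to one made-up house. The tuple encoding (values as in the table, in feature order), the function name `decide`, and the "new house" are illustrative assumptions, not part of the original.

```python
def satisfies(constraint, value):
    # "?" accepts anything, "0" accepts nothing, otherwise an exact match is required
    return constraint == "?" or (constraint != "0" and constraint == value)


def decide(hypothesis, house):
    """'Yes' (buy) if the house meets every constraint, otherwise 'No'."""
    return "Yes" if all(map(satisfies, hypothesis, house)) else "No"


# Second example hypothesis from the text: within budget, no garden, good schools nearby
hypothesis = ("?", "Yes", "?", "No", "Yes", "?")

houses = {
    "training instance 1": ("Urban", "No", "Small", "No", "Yes", "No"),
    "training instance 4": ("Urban", "Yes", "Medium", "Yes", "No", "Yes"),
    "new house (hypothetical)": ("Suburban", "Yes", "Large", "No", "Yes", "No"),
}

for name, house in houses.items():
    print(name, "->", decide(hypothesis, house))
# training instance 1 -> No
# training instance 4 -> No
# new house (hypothetical) -> Yes
```

Training instance 4 is labeled "Yes" in D, yet this hypothesis says "No", so it does not reproduce Jordan's actual decisions.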

 Hypothesis with "0":

Here, "0" is used to mark an attribute value that is absolutely unacceptable or disqualifying. This is an extended, attribute-specific use of "0": under the strict definition above, "0" means no value is acceptable, so any hypothesis containing it would reject every house. In this example, each "0" instead stands for one predefined deal-breaker value, which must be stated in the rules or documentation accompanying the hypothesis. Let's say Jordan absolutely does not want a house in a rural area, does not want a small house, and insists on having a garden. Here's how the hypothesis could be represented:

`("0", "?", "0", "Yes", "?", "?")`

In this example:

1. Location: "0" means a rural location is absolutely unacceptable. Jordan will not consider rural houses, regardless of other attributes.

2. Budget: "?" indicates that Jordan is flexible regarding the budget.

3. Size: "0" indicates that a small house is a deal-breaker. Jordan only wants medium or large-sized houses.

4. Garden: "Yes" means the house must have a garden. This is a mandatory requirement.

5. Schools Nearby: "?" suggests that the presence of good schools nearby is not a deciding factor for Jordan.

6. Public Transport: "?" indicates that the availability of convenient public transport is not a crucial factor.

 Interpretation:

- If a house is in a rural area (regardless of other features), Jordan will not buy it.

- If a house is small (regardless of other features), Jordan will not buy it.

- Jordan will only consider houses with a garden.

- Other factors like budget, schools nearby, and public transport are flexible and not deal-breakers in this hypothesis.

This hypothesis demonstrates how Jordan can use specific criteria (and deal-breakers) to decide whether to buy a house. The "0" values play a crucial role in immediately disqualifying certain options.

To use "0" effectively in such a hypothesis, there must be a clear, predefined rule for what "0" stands for in each attribute. For instance, in this hypothesis "0" under Location stands for "Rural", and "0" under Size stands for "Small".

This approach ensures that the hypothesis is unambiguous and can be used accurately for decision-making.
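The extended reading of "0" in this example can be sketched as below, assuming a hypothetical per-attribute rule table `DEAL_BREAKERS` that records which value "0" stands for (Rural for Location, Small for Size). The strict reading, in which "0" accepts no value at all, is included for comparison; all names are illustrative.

```python
# Feature order: Location, Budget, Size, Garden, Schools Nearby, Public Transport
FEATURES = ("Location", "Budget", "Size", "Garden", "Schools Nearby", "Public Transport")

# Hypothetical predefined rule: what "0" stands for in each attribute of this hypothesis
DEAL_BREAKERS = {"Location": "Rural", "Size": "Small"}


def satisfies_extended(feature, constraint, value):
    """'0' rejects only the attribute's predefined deal-breaker value."""
    if constraint == "?":
        return True
    if constraint == "0":
        # Raises KeyError if no deal-breaker was defined for this attribute
        return value != DEAL_BREAKERS[feature]
    return constraint == value


def satisfies_strict(constraint, value):
    """'0' rejects every value, as in the strict definition."""
    return constraint == "?" or (constraint != "0" and constraint == value)


def h_extended(hypothesis, x):
    return int(all(satisfies_extended(f, con, val)
                   for f, con, val in zip(FEATURES, hypothesis, x)))


def h_strict(hypothesis, x):
    return int(all(satisfies_strict(con, val) for con, val in zip(hypothesis, x)))


hypothesis = ("0", "?", "0", "Yes", "?", "?")

# The four training houses from the table
X_train = [
    ("Urban", "No", "Small", "No", "Yes", "No"),
    ("Rural", "Yes", "Large", "No", "No", "No"),
    ("Rural", "Yes", "Small", "Yes", "Yes", "Yes"),
    ("Urban", "Yes", "Medium", "Yes", "No", "Yes"),
]

print([h_extended(hypothesis, x) for x in X_train])  # [0, 0, 0, 1]
print([h_strict(hypothesis, x) for x in X_train])    # [0, 0, 0, 0]
```

Under the extended reading only the urban, medium-sized house with a garden passes; under the strict reading the same vector rejects every house.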

Determine:

A hypothesis h in H such that h(x) = c(x) for all x in X.

That is, for every instance x = (f1, f2, ..., f6): if c(x) = 1, the hypothesis must give h(x) = 1, and if c(x) = 0, it must give h(x) = 0.
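Since c is known only on the training examples, in practice this condition can only be checked against D. A minimal consistency check, assuming each instance is a tuple of feature values in table order and labels are 1/0; `is_consistent` and the candidate `("?", "Yes", "?", "Yes", "?", "Yes")` are illustrative choices, not from the original.

```python
def satisfies(constraint, value):
    # "?" accepts anything, "0" accepts nothing, otherwise an exact match is required
    return constraint == "?" or (constraint != "0" and constraint == value)


def h(hypothesis, x):
    return int(all(map(satisfies, hypothesis, x)))


def is_consistent(hypothesis, examples):
    """True if h(x) == c(x) for every (x, c(x)) pair in the examples."""
    return all(h(hypothesis, x) == label for x, label in examples)


# Training dataset D from the table: (instance, label), 1 = buys, 0 = does not buy
D = [
    (("Urban", "No", "Small", "No", "Yes", "No"), 0),
    (("Rural", "Yes", "Large", "No", "No", "No"), 0),
    (("Rural", "Yes", "Small", "Yes", "Yes", "Yes"), 1),
    (("Urban", "Yes", "Medium", "Yes", "No", "Yes"), 1),
]

print(is_consistent(("?", "Yes", "?", "Yes", "?", "Yes"), D))  # True
print(is_consistent(("?", "Yes", "?", "No", "Yes", "?"), D))   # False
```

The first candidate (within budget, with a garden, with convenient public transport) agrees with all four labels; the second example hypothesis from the text misclassifies instances 3 and 4.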



