

ISSN: 2277-4629 (Online) | ISSN: 2250-1827 (Print) CPMR-IJT Vol. 2, No.1, June 2012

www.cpmr.org.in CPMR-IJT: International Journal of Technology 1

Identifying Buying Preferences of Customers in Real Estate Industry Using Data Mining Techniques

Aman Gupta*

Gaurav Dubey**

*M.Tech. Scholar, Amity University, Noida

**Asst. Professor, Amity University, Noida

ABSTRACT

With an enormous amount of data stored in databases and data warehouses, it is increasingly important to develop powerful tools for analysis of such data and for mining interesting knowledge from it. Data mining is a process of inferring knowledge from such huge data. The main problem related to the retrieval of information from the real estate industry is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. In this paper, a survey of the research in the area of real estate is presented with its findings. Applying data mining techniques to the real estate industry can be very useful in extracting customer preferences at any given time.

Keywords: Data Mining, Real Estate, Customer preferences.

I. INTRODUCTION

In the real estate business today, companies are working fast to gain a valuable competitive advantage over their competitors. A fast-growing and popular technology that can help to gain this advantage is data mining.

Buying preferences of a customer in the real estate industry may depend on multiple factors such as income, age, profession, family size etc. Further, a combination of these factors eventually decides the size of the flat purchased, whether a loan would be required etc. In this paper we consider various attributes of a customer buying a flat, such as:

• Age

• Occupation

• Family Income

• Family size

• No. of Bedrooms in the flat

• Area of the flat

• Cost of the flat

• Whether Loan is required

The customer of the flat is primarily identified by his/her occupation. In this paper we segregate the customer’s occupation into three broad categories, which are:

• Professional: This may include teachers, doctors, lawyers, IT professionals etc.

• Govt. Employees: This may include Bureaucrats, PSU Employees, Nationalized Bank Employees, Ex Defense Personnel etc.


• Businessman: Everybody who is self-employed or has his/her own business is included in this category.

The data was collected from 300 customers of real estate projects under way along the Noida-Greater Noida Expressway. This belt has seen massive and rapid development of housing projects during the last three years.

Finding correlations between the above-mentioned attributes can be very useful for real estate companies in planning future sales and tailoring their projects to the needs of contemporary buyers, who may have different needs and aspirations than earlier generations.

This is where data mining can be very handy. Data mining technology allows a company to use the mass quantities of data that it has compiled, and to develop correlations and relationships among this data that help the business improve efficiency, learn more about its customers, make better decisions, and plan ahead.

Data mining has three major components: Clustering or Classification, Association Rules and Sequence Analysis. This technology can develop these analyses on its own, using a blend of statistics, artificial intelligence, machine learning algorithms, and data stores.

II. DATA MINING

Data mining is a data-driven tool that can extract predictive information from large quantities of data. It uses mathematical and statistical calculations to uncover trends and correlations among the large quantities of data stored in a database. It is a blend of artificial intelligence technology, statistics, data warehousing, and machine learning.

Data mining started with statistics. Statistical functions such as standard deviation, regression analysis, and variance are all valuable tools that allow people to study the reliability of data and the relationships within it. Much of what data mining does is rooted in statistics, making it one of the cornerstones of data mining technology.
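As a rough illustration of the statistical functions named above, the following sketch computes a sample variance, a standard deviation and a simple least-squares regression line using only the Python standard library. The area and cost values are invented for the example and are not taken from the survey; the variable names are likewise assumptions.

```python
import statistics

# Hypothetical flat areas (sq.ft.) and costs (lakh Rs.); NOT survey data.
areas = [950, 1000, 1050, 1200, 1400]
costs = [38.0, 40.0, 42.0, 48.0, 56.0]

area_variance = statistics.variance(areas)  # sample variance
area_stdev = statistics.stdev(areas)        # sample standard deviation

# Ordinary least-squares fit of: cost = slope * area + intercept
mean_area = statistics.mean(areas)
mean_cost = statistics.mean(costs)
sxy = sum((a - mean_area) * (c - mean_cost) for a, c in zip(areas, costs))
sxx = sum((a - mean_area) ** 2 for a in areas)
slope = sxy / sxx
intercept = mean_cost - slope * mean_area

print(f"variance={area_variance:.1f}, stdev={area_stdev:.1f}")
print(f"cost is approximately {slope:.3f} * area + {intercept:.2f}")
```

The regression is written out by hand so that the sketch does not depend on a particular Python version.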

In the 1970s data was stored using large mainframe systems and COBOL programming techniques. These simplistic beginnings gave way to very large databases called “data warehouses”, which store data in one standard format. The dictionary definition of a data warehouse is “a generic term for storing, retrieving, and managing large amounts of data” [1]. These data warehouses “can now store and query terabytes and megabytes of data in sophisticated database management systems” [2]. These data stores are an essential part of data mining, because a cornerstone of the technology is that it needs very large amounts of organized data to manipulate. In addition to basic statistics and large data warehouses, a major part of data mining technology is artificial intelligence (AI).

Artificial intelligence entered the picture in the 1980s with a set of algorithms that were designed to teach a computer how to “learn” by itself. As they developed, these algorithms became valuable data manipulation tools and were applied to large sets of data. Instead of requiring a set of pre-defined hypotheses to be entered, the data mining software, combined with AI technology, was able to generate its own relationships between the data. It was even able to analyze data and discover correlations on its own, and to develop models to help the developers interpret the relationships that were found.

AI gave way to machine learning. Machine learning is defined as the ability of a machine to improve its performance based on previous results. Machine learning is the next step in artificial intelligence technology because it blends trial-and-error learning by the system with statistical analysis. This lets the software learn on its own and allows it to make decisions regarding the data it is trying to analyze.

Later, in the 1990s, data mining became wildly popular. Many companies began to use data mining technology and found that it was much easier than having people work manually with such large amounts of data and attributes. This technology allows the systems to “think” for themselves and run analyses that provide trend and correlation information for the data in the tables. In 2001, the use of data warehouses grew by over a third to 77% [3].


Data mining is a very important tool for business. As time goes on, business is becoming more and more competitive, and everyone is scrambling for a competitive edge. Businesses can gain that edge from the increased awareness offered by the data mining software available on the market right now [4].

Data mining approaches can handle high-dimensional heterogeneous data with a high degree of sparseness and multicollinearity, and with a significant percentage of outliers/leverage points and missing values, and are able to discover uncharacterizable non-linearities among differently scaled variables in high-dimensional space [5].

2.1 Association Rule Algorithms

An association rule is a rule which implies certain association relationships among a set of objects (such as “occur together” or “one implies the other”) in a database. Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X ⇒ Y, where X and Y are sets of items. The intuitive meaning of such a rule is that transactions of the database which contain X tend to contain Y.
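How strongly transactions that contain X "tend to contain" Y is commonly measured with support and confidence. The sketch below is a minimal illustration of those two standard measures; the transactions and item labels are hypothetical, and the function names are assumptions made for the example rather than anything used in the survey.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate of P(Y | X): support(X union Y) / support(X)."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, x | y) / support_x


# Hypothetical transactions, one per customer; NOT the survey data.
transactions = [
    frozenset({"Professional", "2BHK", "Loan"}),
    frozenset({"Professional", "2BHK", "Loan"}),
    frozenset({"Professional", "3BHK", "Loan"}),
    frozenset({"Businessman", "3BHK"}),
    frozenset({"Govt. Employee", "2BHK", "Loan"}),
]

x = frozenset({"Professional"})
y = frozenset({"Loan"})
rule_support = support(transactions, x | y)
rule_confidence = confidence(transactions, x, y)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data the rule {Professional} ⇒ {Loan} holds in three of five transactions, and every transaction containing X also contains Y.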

2.2 Classification Algorithms

In data classification one develops a description or model for each class in a database, based on the features present in a set of class-labeled training data. Many data classification methods have been studied, including decision-tree methods, statistical methods, neural networks, rough sets, database-oriented methods etc.
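To make the idea of learning a model from class-labeled training data concrete, here is a minimal sketch of the simplest decision-tree method, a one-level tree (a decision stump) on a single numeric feature. The training pairs (age, loan taken) are hypothetical, and `fit_stump` and `predict` are illustrative names, not part of any library.

```python
def fit_stump(samples):
    """Fit a one-level decision tree on (value, label) pairs.

    Tries every midpoint between consecutive distinct values as a threshold
    and keeps the split with the fewest training errors.
    Returns (threshold, label_if_below_or_equal, label_if_above).
    """
    values = sorted({value for value, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below, above in ((True, False), (False, True)):
            errors = sum(
                1 for value, label in samples
                if (below if value <= threshold else above) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(stump, value):
    threshold, below, above = stump
    return below if value <= threshold else above


# Hypothetical class-labeled training data: (age, loan_taken). NOT survey data.
training = [(28, True), (31, True), (35, True), (48, False), (52, False), (60, False)]
stump = fit_stump(training)
print(stump, predict(stump, 30), predict(stump, 55))
```

A full decision-tree learner would apply the same split search recursively over several attributes; the stump only shows the core step.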

2.3 Sequential Analysis

Here we look for sequential patterns in a set of sequences called data-sequences. Each data-sequence is an ordered list of transactions (or item sets), where each transaction is a set of items (literals). Typically there is a transaction-time associated with each transaction. A sequential pattern also consists of a list of sets of items. The problem is to find all sequential patterns with a user-specified minimum support, where the support of a sequential pattern is the percentage of data-sequences that contain the pattern.
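The support definition above can be sketched as follows. A data-sequence is taken to contain a pattern when each item set of the pattern is a subset of a distinct transaction and the order is preserved. The event names and the minimum support value are hypothetical, chosen only to illustrate the computation.

```python
def contains(sequence, pattern):
    """True if `pattern` (a list of item sets) occurs in order in `sequence`."""
    pos = 0
    for itemset in pattern:
        # Advance to the next transaction that includes this item set.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # later item sets must match later transactions
    return True


def pattern_support_of(sequences, pattern):
    """Fraction of data-sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Hypothetical data-sequences (ordered transactions per customer).
sequences = [
    [{"site_visit"}, {"loan_approval"}, {"booking"}],
    [{"site_visit", "brochure"}, {"booking"}],
    [{"booking"}, {"site_visit"}],
    [{"brochure"}, {"site_visit"}, {"booking", "loan_approval"}],
]
pattern = [{"site_visit"}, {"booking"}]

min_support = 0.5  # user-specified threshold (assumed value)
pattern_support = pattern_support_of(sequences, pattern)
is_frequent = pattern_support >= min_support
print(pattern_support, is_frequent)
```

The third sequence does not count because the booking precedes the site visit, which is what distinguishes sequential patterns from plain association rules.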

III. REAL ESTATE SURVEY & ITS FINDINGS

A survey was conducted among 300 customers buying flats in and around the Noida-Greater Noida Expressway. The following questions were asked:

• Your Name

• Your Age

• Your Occupation

• Your Annual family Income (in Rs.)

• Your Family Size

• No. of Bedrooms in the flat

• Area of the flat (in sq.ft.)

• Cost of the flat (in Rs.)

• Whether loan is taken or not

The survey data and its subsequent mining have thrown light on several interesting points. From Figure 1 it is evident that the majority of people buying flats in this area are Professionals. This is mainly due to the fact that there are many private sector companies in this area, like HCL, NIIT, Accenture, MetLife and Moser Baer, employing thousands of professionals who need accommodation near their workplaces.

Figure 1: Occupation Distribution
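The per-occupation summaries reported in the figures amount to a group-by over the survey responses. The sketch below shows one way such a computation could look; the record layout mirrors the survey questions, but the field names and all values are invented for illustration and are not the authors' data or tooling.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class SurveyRecord:
    """One survey response; field names are illustrative assumptions."""
    age: int
    occupation: str
    family_income: int  # annual, in Rs.
    family_size: int
    bedrooms: int
    area_sqft: int
    cost: int           # in Rs.
    loan_taken: bool


# Hypothetical responses, NOT the survey data from the paper.
records = [
    SurveyRecord(28, "Professional", 900_000, 3, 2, 1000, 3_500_000, True),
    SurveyRecord(30, "Professional", 1_100_000, 2, 2, 1050, 3_800_000, True),
    SurveyRecord(32, "Professional", 1_000_000, 3, 2, 950, 3_600_000, True),
    SurveyRecord(45, "Govt. Employee", 800_000, 5, 3, 1100, 3_200_000, False),
    SurveyRecord(48, "Businessman", 2_500_000, 5, 3, 1400, 5_500_000, False),
]


def occupation_share(rows):
    """Share of respondents in each occupation category."""
    counts = Counter(r.occupation for r in rows)
    return {occ: n / len(rows) for occ, n in counts.items()}


def mean_by_occupation(rows, field):
    """Average of one numeric field within each occupation category."""
    groups = defaultdict(list)
    for r in rows:
        groups[r.occupation].append(getattr(r, field))
    return {occ: mean(vals) for occ, vals in groups.items()}


share = occupation_share(records)
avg_age = mean_by_occupation(records, "age")
print(share)
print(avg_age)
```

Calling `mean_by_occupation` with "family_size", "bedrooms", "area_sqft" or "cost" would give the same kind of per-category average for the other attributes.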


Figure 2: Average age of customers

In Figure 2 we see that Professionals decide to buy a flat at a much earlier age than Govt. Employees or Businessmen. This points to the aspirational and non-conservative nature of Professionals and the younger generation. They start investing in real estate at a much earlier stage than Govt. Employees.

Figure 3: Avg. annual income (in Rs.)

Average annual income, as we see in Figure 3, is expectedly highest among Businessmen because of the nature of their work.

Figure 4: Avg. family size

In Figure 4 we see that the average family size for professionals is considerably lower than for the other two categories. There are two prominent reasons for this. First, the professionals buying the flats are younger than the other two categories, as we see in Figure 2. Secondly, young professionals are consciously choosing to have smaller families because of the constraints of urban lifestyle.

Figure 5: Avg. bedrooms in the flat purchased

In Figure 5 we see that businessmen buy larger flats in terms of no. of bedrooms. This is due to the fact that they have bigger incomes.

Figure 6: Avg. area of the flat (in sq.ft.)

Comparing Figure 5 and Figure 6 gives an important finding. While the gap in avg. no. of bedrooms between professionals and govt. employees is about 15% (1.87 vs 2.18), the gap in terms of area is only 5% (1000 sq.ft. vs 1045 sq.ft.). This indicates that the emphasis for professionals is on a bigger flat with fewer bedrooms, while for govt. employees the preference is for more bedrooms. This can be attributed to the comparatively smaller family size of professionals and larger family size of govt. employees, as indicated in Figure 4.
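The comparison above can be checked with a few lines of arithmetic. The sketch uses the averages quoted in the text; which value the authors used as the base of each percentage is not stated, so taking the larger (govt. employee) figure as the base is an assumption.

```python
# Averages quoted in the text (professionals vs govt. employees).
bedrooms = {"Professional": 1.87, "Govt. Employee": 2.18}
area_sqft = {"Professional": 1000, "Govt. Employee": 1045}


def relative_gap(smaller, larger):
    """Gap as a fraction of the larger value (choice of base is an assumption)."""
    return (larger - smaller) / larger


bedroom_gap = relative_gap(bedrooms["Professional"], bedrooms["Govt. Employee"])
area_gap = relative_gap(area_sqft["Professional"], area_sqft["Govt. Employee"])
area_per_bedroom = {occ: area_sqft[occ] / bedrooms[occ] for occ in bedrooms}

print(f"bedroom gap: {bedroom_gap:.1%}, area gap: {area_gap:.1%}")
for occ, value in area_per_bedroom.items():
    print(f"{occ}: {value:.0f} sq.ft. per bedroom")
```

With this base the gaps come out near 14% and 4%, close to the rounded 15% and 5% in the text, and the area per bedroom is higher for professionals (roughly 535 vs 479 sq.ft.), which is consistent with the stated preference for larger flats with fewer bedrooms.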

Figure 7: Avg. cost of flat


Although professionals purchase flats with fewer bedrooms (Figure 5) and less area (Figure 6) than govt. employees, the average cost of the flat is higher for professionals than for govt. employees, as indicated in Figure 7. This is mainly due to the fact that professionals have a greater tendency to choose premium specifications and better-located projects than govt. employees.

Figure 8: Percentage of people taking loan

Another important aspect of the real estate industry is the loan facility. Most people rely on long-term loans to purchase a flat. This is confirmed in the survey, where the percentage of professionals taking a loan is as high as 96.6% (Figure 8). This is expected, as professionals are a younger demographic than the other two occupations, so their need for and reliance on loans is much higher. The percentage of businessmen taking a loan is much lower, owing to the fact that they have higher overall incomes.

Figure 9: Avg. age of people taking loan/not taking loan

Older people tend not to take loans, as indicated by Figure 9. This is true across all occupations, whether professionals, businessmen, or govt. employees. This is mainly due to two reasons: first, loan companies prefer giving loans to younger people, so it is difficult for older people to get a loan; secondly, older people have much more savings, so they can afford to buy the flat outright.
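The two loan-related summaries (share of buyers taking a loan per occupation, and average age by loan status) can be sketched as below. The tuples are hypothetical stand-ins for survey responses and do not reproduce the reported 96.6% figure; the variable names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (occupation, age, loan_taken) responses; NOT the survey data.
responses = [
    ("Professional", 29, True),
    ("Professional", 31, True),
    ("Professional", 34, True),
    ("Govt. Employee", 45, True),
    ("Govt. Employee", 55, False),
    ("Businessman", 42, True),
    ("Businessman", 50, False),
    ("Businessman", 58, False),
]

loans_by_occupation = defaultdict(list)
ages_by_loan_status = defaultdict(list)
for occupation, age, loan_taken in responses:
    loans_by_occupation[occupation].append(loan_taken)
    ages_by_loan_status[loan_taken].append(age)

# Share of buyers taking a loan, per occupation (True counts as 1).
loan_rate = {occ: sum(flags) / len(flags) for occ, flags in loans_by_occupation.items()}
# Average age of buyers with (True) and without (False) a loan.
mean_age = {status: mean(ages) for status, ages in ages_by_loan_status.items()}

print(loan_rate)
print(mean_age)
```

In this toy data the loan takers are younger on average than those buying outright, which is the same direction as the pattern described for Figure 9.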

IV. CONCLUSION

The real estate sector is witnessing a significant change in its customer profile from earlier times, which necessitates a change in the design of new projects. The majority of the customers are young professionals with smaller families who do not mind paying a bit extra for premium facilities. The rest of the market is divided between Govt. Employees and Businessmen. Govt. Employees are conservative in their buying preferences, whereas businessmen tend to buy flats outright with the best available specifications a developer can offer. Further research in this area can shed light on even more customer preferences and their relative importance to the customer. This may include features like distance to school, hospital, market, railway station etc., the relative reputation of the developer, the importance of a recommendation from a peer etc. Since such research would involve even larger and more complex data than that used in this paper, data mining techniques are of paramount importance in making sense of the raw data compiled.

V. REFERENCES

[1] “Data Mining” Def. www.Dictionary.com. Date of retrieval: 01/06/2012.

[2] Carbone, P. (August, 2000). What is the Origin of Data Mining? www.mitre.org/pubs/edge august_00/carbone.htm Date of retrieval: 01/06/2012.

[3] Hardison. (2002). Data Mining: The New Gold Rush. Pharmaceutical Executive. March, 26, 28, 30.

[4] Montana, J. (2001). Data Mining: A Slippery Slope. Information Management Journal. October, 50-54.

[5] Brusilovskiy, P. (2007). Data Mining in Pharmaceutical Marketing and Sales Analysis. 2007 ICSA Applied Statistics Symposium.