7/31/2019 Identifying Buying Preferences of Customers...
http://slidepdf.com/reader/full/identifying-buying-preferences-of-customers 1/5
ISSN: 2277-4629 (Online) | ISSN: 2250-1827 (Print) CPMR-IJT Vol. 2, No.1, June 2012
www.cpmr.org.in CPMR-IJT: International Journal of Technology 1
Identifying Buying Preferences of Customers in Real
Estate Industry Using Data Mining Techniques
Aman Gupta*
Gaurav Dubey**
*M.Tech. Scholar, Amity University, Noida
**Asst. Professor, Amity University, Noida
ABSTRACT
With an enormous amount of data stored in
databases and data warehouses, it is increasingly
important to develop powerful tools for analyzing
such data and mining interesting knowledge from
it. Data mining is the process of inferring knowledge
from such large volumes of data. The main problem in
retrieving information from the real estate
industry is the enormous number of unstructured
documents and resources, i.e., the difficulty of
locating and tracking appropriate sources. In this
paper, a survey of the research in the area of real
estate is presented with its findings. Applying data
mining techniques to the real estate industry can be
very useful in extracting customer preferences at
any given time.
Keywords: Data Mining, Real Estate, Customer
preferences.
I. INTRODUCTION
In the real estate business today, companies are working
fast to gain a competitive advantage over their
competitors. A fast-growing and popular technology
that can help to gain this advantage is data mining.
Buying preferences of a customer in the real estate industry
may depend on multiple factors such as income, age,
profession, family size etc. Further, a combination of these
factors eventually decides the size of the flat purchased,
whether a loan would be required, etc. In this paper
we consider various attributes of a customer
buying a flat, such as:
• Age
• Occupation
• Family Income
• Family size
• No. of Bedrooms in the flat
• Area of the flat
• Cost of the flat
• Whether Loan is required
The customer of the flat is primarily identified by
his/her occupation. In this paper we segregate
the customer’s occupation into three broad categories,
which are:
• Professional: This may include teachers, doctors,
lawyers, IT professionals etc.
• Govt. Employees: This may include bureaucrats,
PSU employees, nationalized bank employees,
ex-defense personnel etc.
• Businessman: Everybody who is self-employed
or has his/her own business is included in this
category.
The data was collected from 300 customers of real
estate projects along the Noida-Greater Noida
Expressway. This belt has seen massive and rapid
development of housing projects during the last three years.
Finding correlations between the above-mentioned
attributes can be very useful for real estate companies
in planning future sales and tailoring their projects to
the needs of contemporary buyers, who may have
different needs and aspirations than earlier generations.
This is where data mining can be very handy. Data
mining technology allows a company to use the mass
quantities of data that it has compiled, and to develop
correlations and relationships among this data to help
the business improve efficiency, learn more about its
customers, make better decisions, and plan.
Data mining has three major components:
clustering or classification, association rules, and
sequence analysis. This technology can develop these
analyses on its own, using a blend of statistics, artificial
intelligence, machine learning algorithms, and data stores.
II. DATA MINING
Data mining is a data-driven tool that can extract
predictive information from large quantities of data.
It uses mathematical and statistical calculations
to uncover trends and correlations among the large
quantities of data stored in a database. It is a blend of
artificial intelligence technology, statistics, data
warehousing, and machine learning.
Data mining started with statistics. Statistical
functions such as standard deviation, regression analysis,
and variance are all valuable tools that allow people to
study the reliability of data and the relationships within
it. Much of what data mining does is rooted in statistics,
making statistics one of the cornerstones of data mining
technology.
In the 1970s, data was stored using large mainframe
systems and COBOL programming techniques. These
simple beginnings gave way to very large databases
called “data warehouses”, which store data in one
standard format. The dictionary definition of a data
warehouse is “a generic term for storing, retrieving, and
managing large amounts of data” [1]. These data
warehouses “can now store and query terabytes and
megabytes of data in sophisticated database
management systems” [2]. These data stores are an
essential part of data mining, because the technology
needs very large amounts of organized data to
manipulate. In addition to basic
statistics and large data warehouses, a major part of
data mining technology is artificial intelligence (AI).
Artificial intelligence became part of this toolkit in the
1980s, with a set of algorithms that were designed to
teach a computer how to “learn” by itself. As they
developed, these algorithms became valuable data
manipulation tools and were applied to large sets of
data. Instead of requiring a set of pre-defined hypotheses
to be entered, data mining software combined with AI
technology was able to generate its own relationships
between the data. It was even able to analyze data,
discover correlations on its own, and develop models to
help the developers interpret the relationships that were
found.
AI gave way to machine learning. Machine learning
is defined as the ability of a machine to improve its
performance based on previous results. Machine
learning is the next step in artificial intelligence technology
because it blends trial-and-error learning by the system
with statistical analysis. This lets the software learn
on its own and make decisions regarding
the data it is trying to analyze.
Later, in the 1990s, data mining became wildly
popular. Many companies began to use data mining
technology and found that it was much easier than having
people work manually with such large amounts of data
and attributes. This technology allows systems to
“think” for themselves and run analyses that
provide trend and correlation information for the data
in the tables. In 2001, the use of data warehouses grew
by over a third to 77% [3].
Data mining is a very important tool for business.
As time goes on, business is becoming more and
more competitive, and everyone is scrambling for an
edge. Businesses can gain that edge from the increased
awareness provided by data mining software that is
available on the market right now [4].
Data mining approaches can handle high-dimensional
heterogeneous data with a high degree of
sparseness and multicollinearity, and with a significant
percentage of outliers/leverage points and missing
values, and are able to discover uncharacterizable
non-linearities among differently scaled variables in
high-dimensional space [5].
2.1 Association Rule Algorithms
An association rule is a rule that implies certain
association relationships among a set of objects (such
as “occur together” or “one implies the other”) in a
database. Given a set of transactions, where each
transaction is a set of literals (called items), an association
rule is an expression of the form X → Y, where X and Y
are sets of items. The intuitive meaning of such a rule is
that transactions of the database which contain X tend
to contain Y.
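The statement that transactions containing X "tend to contain" Y is usually quantified with two standard measures, support and confidence, which the text above does not define. The following is a minimal sketch of both, assuming each customer is encoded as a set of attribute items; the item names and the five transactions are hypothetical and are not taken from the survey data.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Share of the transactions containing X that also contain Y (rule X -> Y)."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0
    return support(frozenset(x) | frozenset(y), transactions) / support_x


# Hypothetical transactions: one set of attribute "items" per customer.
transactions = [
    frozenset({"professional", "loan", "2_bedrooms"}),
    frozenset({"professional", "loan", "2_bedrooms"}),
    frozenset({"professional", "loan", "3_bedrooms"}),
    frozenset({"businessman", "no_loan", "3_bedrooms"}),
    frozenset({"govt_employee", "loan", "2_bedrooms"}),
]

# Rule {professional} -> {loan}
rule_support = support({"professional", "loan"}, transactions)
rule_confidence = confidence({"professional"}, {"loan"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, three of the five transactions contain both items, and every transaction containing "professional" also contains "loan", so the rule has a support of 0.6 and a confidence of 1.0.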
2.2 Classification Algorithms
In data classification one develops a description or
model for each class in a database, based on the features
present in a set of class-labeled training data. Many
data classification methods have been studied,
including decision-tree methods, statistical
methods, neural networks, rough sets,
database-oriented methods etc.
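To make the idea of learning a model from class-labeled training data concrete, here is a minimal sketch of a decision stump, i.e. a decision tree with a single split. It is chosen only because it is the simplest decision-tree method; it is not the method used in this paper. The training rows (age, occupation class) and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical class-labeled training data: (age, occupation class).
training: List[Tuple[int, str]] = [
    (27, "professional"), (29, "professional"),
    (31, "professional"), (33, "professional"),
    (41, "govt_employee"), (44, "govt_employee"),
    (46, "govt_employee"), (52, "govt_employee"),
]


def majority(labels: List[str]) -> str:
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(data: List[Tuple[int, str]]) -> Tuple[float, str, str]:
    """Pick the age threshold that misclassifies the fewest training rows."""
    ages = sorted({age for age, _ in data})
    if len(ages) < 2:
        raise ValueError("need at least two distinct ages to place a split")
    best = None
    for lower, upper in zip(ages, ages[1:]):
        threshold = (lower + upper) / 2  # candidate split between neighbours
        left = [label for age, label in data if age <= threshold]
        right = [label for age, label in data if age > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label


def predict(stump: Tuple[float, str, str], age: int) -> str:
    """Classify a new customer by comparing age with the learned threshold."""
    threshold, left_label, right_label = stump
    return left_label if age <= threshold else right_label


stump = fit_stump(training)
print(stump, predict(stump, 30), predict(stump, 50))
```

A full decision tree would apply the same split search recursively to each side and consider every attribute rather than age alone.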
2.3 Sequential Analysis
Here we look for sequential patterns in a set of input
sequences, called data-sequences. Each data-sequence is
an ordered list of transactions (or item sets), where each
transaction is a set of items (literals). Typically there is a
transaction-time associated with each transaction. A
sequential pattern also consists of a list of sets of items.
The problem is to find all sequential patterns with a
user-specified minimum support, where the support of
a sequential pattern is the percentage of data-sequences
that contain the pattern.
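The support definition above can be sketched in a few lines. The sketch assumes the usual containment rule: a data-sequence contains a pattern if each item set of the pattern is a subset of a distinct transaction, with the matched transactions in the same order as the pattern. The event names and the four data-sequences are hypothetical.

```python
from typing import FrozenSet, List, Sequence

ItemSet = FrozenSet[str]


def contains(data_sequence: Sequence[ItemSet], pattern: Sequence[ItemSet]) -> bool:
    """True if the pattern's item sets match, in order, distinct transactions
    of the data-sequence, each as a subset of the matched transaction."""
    position = 0
    for wanted in pattern:
        # Advance to the earliest remaining transaction that covers `wanted`.
        while position < len(data_sequence) and not wanted <= data_sequence[position]:
            position += 1
        if position == len(data_sequence):
            return False
        position += 1  # the next pattern element must match a later transaction
    return True


def support_percent(pattern: Sequence[ItemSet],
                    data_sequences: List[Sequence[ItemSet]]) -> float:
    """Percentage of data-sequences that contain the pattern."""
    if not data_sequences:
        return 0.0
    hits = sum(1 for seq in data_sequences if contains(seq, pattern))
    return 100.0 * hits / len(data_sequences)


# Hypothetical data-sequences: ordered transactions per customer.
data_sequences = [
    [frozenset({"enquiry"}), frozenset({"site_visit"}), frozenset({"booking", "loan"})],
    [frozenset({"enquiry"}), frozenset({"booking"})],
    [frozenset({"site_visit"}), frozenset({"enquiry"})],
    [frozenset({"enquiry", "site_visit"}), frozenset({"booking"})],
]

pattern = [frozenset({"enquiry"}), frozenset({"booking"})]
pattern_support = support_percent(pattern, data_sequences)
min_support = 50.0  # user-specified threshold, in percent (assumed value)
print(pattern_support, pattern_support >= min_support)
```

Three of the four hypothetical sequences contain an enquiry followed later by a booking, so the pattern has 75% support and passes the assumed 50% threshold.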
III. REAL ESTATE SURVEY & ITS
FINDINGS
A survey was conducted among 300 customers
buying flats in and around the Noida-Greater Noida
Expressway. The following questions were asked:
• Your Name
• Your Age
• Your Occupation
• Your Annual family Income (in Rs.)
• Your Family Size
• No. of Bedrooms in the flat
• Area of the flat (in sq.ft.)
• Cost of the flat (in Rs.)
• Whether loan is taken or not
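The questionnaire above maps naturally onto one record per respondent, and the per-occupation averages reported in the findings are group-wise means over such records. The following is a minimal sketch under that assumption; the `SurveyResponse` type, the function name, and the four sample rows are hypothetical and do not reproduce the actual survey data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SurveyResponse:
    """One answered questionnaire (the name field is left out on purpose)."""
    age: int
    occupation: str
    annual_family_income: float  # in Rs.
    family_size: int
    bedrooms: int
    area_sqft: float
    cost: float  # in Rs.
    loan_taken: bool


def average_by_occupation(
    responses: List[SurveyResponse],
    value: Callable[[SurveyResponse], float],
) -> Dict[str, float]:
    """Mean of `value(response)` within each occupation category."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for response in responses:
        groups[response.occupation].append(value(response))
    return {occupation: sum(vals) / len(vals) for occupation, vals in groups.items()}


# Hypothetical sample rows, for illustration only.
responses = [
    SurveyResponse(28, "Professional", 900_000, 3, 2, 1000, 3_500_000, True),
    SurveyResponse(32, "Professional", 1_100_000, 2, 2, 1050, 3_800_000, True),
    SurveyResponse(45, "Govt. Employee", 800_000, 5, 2, 1040, 3_200_000, True),
    SurveyResponse(51, "Businessman", 2_000_000, 5, 3, 1500, 6_000_000, False),
]

average_age = average_by_occupation(responses, lambda r: r.age)
# Percentage taking a loan is the mean of a 0/100 indicator per group.
loan_share = average_by_occupation(responses, lambda r: 100.0 if r.loan_taken else 0.0)
print(average_age)
print(loan_share)
```

Passing a different accessor (for example `lambda r: r.area_sqft`) yields the corresponding average for any other numeric question.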
The survey data and its subsequent mining have
thrown light on several interesting points. Figure 1
shows clearly that the majority of people buying
flats in this area are Professionals. This is mainly
because there are many private sector companies in
this area, such as HCL, NIIT, Accenture, MetLife and
Moser Baer, employing thousands of professionals who need
accommodation near their workplaces.
Figure 1: Occupation Distribution
Figure 2: Average age of customers
In Figure 2 we see that Professionals decide to
buy a flat at a much earlier age than Govt.
Employees or Businessmen. This points to the
aspirational and non-conservative nature of
Professionals and the younger generation. They start
investing in real estate at a much earlier stage than Govt.
employees.
Figure 3: Avg. annual income (in Rs.)
Average annual income, as we see in Figure 3, is
expectedly highest among Businessmen because
of the nature of their work.
Figure 4: Avg. family size
In Figure 4 we see that the average family size for
professionals is considerably lower than for the other two
categories. There are two prominent reasons. First,
the professionals buying the flats are younger
than the other two categories, as we see in Figure 2. Second,
young professionals are consciously choosing to have
smaller families because of urban lifestyle constraints.
Figure 5: Avg. bedrooms in the flat purchased
In Figure 5 we see that businessmen buy larger
flats in terms of no. of bedrooms. This is because
they have bigger incomes.
Figure 6: Avg. area of the flat (in sq.ft.)
Comparing Figure 5 and Figure 6 gives an important
finding. The gap in avg. no. of bedrooms
between professionals and govt. employees is about
15% (1.87 vs 2.18), but the gap in terms of
area is only about 5% (1000 sq.ft. vs 1045 sq.ft.).
This indicates that the emphasis for professionals is on
a bigger flat with fewer bedrooms, while govt.
employees prefer a larger no. of bedrooms. This
can be attributed to the comparatively smaller family
size of professionals and larger family size of govt.
employees, as indicated in Figure 4.
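The comparison above can be checked with a short calculation using only the four averages quoted in the text. The sketch assumes the gap is measured relative to the larger (govt. employee) value; the text does not state which base was used, and the area-per-bedroom figure is an added illustration rather than a number reported in the survey.

```python
def relative_gap_percent(smaller: float, larger: float) -> float:
    """Gap between two averages as a percentage of the larger one (assumed base)."""
    return (larger - smaller) / larger * 100.0


# Averages quoted in the text (professionals vs govt. employees).
bedrooms = {"professional": 1.87, "govt_employee": 2.18}
area_sqft = {"professional": 1000.0, "govt_employee": 1045.0}

bedroom_gap = relative_gap_percent(bedrooms["professional"], bedrooms["govt_employee"])
area_gap = relative_gap_percent(area_sqft["professional"], area_sqft["govt_employee"])

# Average area available per bedroom in each group.
area_per_bedroom = {
    group: area_sqft[group] / bedrooms[group] for group in bedrooms
}

print(f"bedroom gap: {bedroom_gap:.1f}%, area gap: {area_gap:.1f}%")
for group, value in area_per_bedroom.items():
    print(f"{group}: {value:.0f} sq.ft. per bedroom")
```

With the larger value as the base, the gaps come to roughly 14% for bedrooms and 4% for area; with the smaller value as the base they are roughly 17% and 4.5%. Either way the bedroom gap is several times the area gap, and professionals end up with about 535 sq.ft. per bedroom against about 479 sq.ft. for govt. employees, which is consistent with the interpretation given above.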
Figure 7: Avg. cost of flat
Although professionals purchase flats with fewer
bedrooms (Figure 5) and less area (Figure
6) than govt. employees, the average cost
of the flat is higher for professionals than for
govt. employees, as indicated in Figure 7. This is mainly
because professionals have a greater tendency
to choose premium specifications and better-located
projects than govt. employees.
Figure 8: Percentage of people taking loan
Another important aspect of the real estate industry
is the loan facility. Most people rely on long-term loans
to purchase a flat. This is confirmed by the survey, in which
the percentage of professionals taking a loan is as high as
96.6% (Figure 8). This is expected, as professionals
are a younger demographic than the other two
occupations, so their need for and reliance on loans is
much higher. The percentage of businessmen taking
a loan is much lower, owing to their higher overall
incomes.
Figure 9: Avg. age of people taking loan/not taking loan
Older people tend not to take a loan, as indicated by
Figure 9. This is true across all occupations, whether
professionals, businessmen, or govt. employees. There
are two main reasons. First, loan
companies prefer giving loans to younger people, so it is
difficult for older people to get a loan. Second, older
people have much more savings, so they can afford to
buy the flat outright.
IV. CONCLUSION
The real estate sector is witnessing a significant change in
its customer profile from earlier times, which necessitates
a change in the design of new projects. The majority of the
customers are young professionals with smaller families
who do not mind paying a bit extra for premium facilities.
The rest of the market is divided between Govt. Employees
and Businessmen. Govt. Employees are conservative in
their buying preferences, whereas businessmen tend to
buy flats outright with the best available specifications
a developer can offer. Further research in this area can
shed light on even more customer preferences and their
relative importance to the customer. This may include
features like distance to school, hospital, market, railway
station etc., the relative reputation of the developer,
the importance of a recommendation from a peer, etc. Since
this research would involve even larger and more
complex data than that used in this paper, data
mining techniques are of paramount importance in
making sense of the raw data compiled.
V. REFERENCES
[1] “Data Mining” Def. www.dictionary.com. Date
of retrieval: 01/06/2012.
[2] Carbone, P. (August, 2000). What is the Origin
of Data Mining? www.mitre.org/pubs/edge
august_00/carbone.htm Date of retrieval:
01/06/2012.
[3] Hardison. (2002). Data Mining: The New Gold
Rush. Pharmaceutical Executive. March, 26
28, 30.
[4] Montana, J. (2001). Data Mining: A Slippery
Slope. Information Management Journal.
October, 50-54.
[5] Brusilovskiy, P. (2007). Data Mining in
Pharmaceutical Marketing and Sales Analysis.
2007 ICSA Applied Statistics Symposium.