International Journal of Computer Trends and Technology (IJCTT) – volume 4 Issue10 – Oct 2013
ISSN: 2231-2803 http://www.ijcttjournal.org
Getting Accurate Results on Various Aspects with the Help of a Decision Making System
Varakanth Reddy, (M.Tech), CSE Dept, CMRCET, Hyderabad
A. Vivekanand, Ph.D., Professor, CMRCET, Hyderabad
Abstract — Data mining algorithms are useful for finding frequent items in large datasets. They can identify popular or preferable product features, help maximize the profitability and popularity of a product, and can be applied across domains, for example to the same product offered by different vendors. Permutations of attributes help to enumerate all possible combinations that yield a profitable product feature, and many customers are interested in finding such potential products (e.g., on priceline.com). The need for dominance and skyline analysis has grown in multi-criteria decision-making applications. Earlier research focused on helping users select a set of "efficient" products from a stream of available products. In this paper, we explore a different problem, retrieving the top-k preferable products, which has not been addressed in previous work. Given a set of products in the existing market, we want to find a set of k "best" possible new products such that these new products are not dominated by the products already in the market. We study two instances of finding top-k preferable products. In the first instance, we set the prices of the new products so that the total profit is maximized; such products are called top-k profitable products. In the second instance, we select k products so that they attract the greatest number of customers; such products are called top-k popular products. For both problems, a straightforward solution is to enumerate all possible subsets of size k and choose the subset that yields the greatest profit (for the first problem) or attracts the greatest number of customers (for the second). However, there is an exponential number of possible subsets. In this paper, we propose solutions that find the top-k profitable products exactly and the top-k popular products efficiently. An extensive performance study using both synthetic and real data sets verifies the accuracy and efficiency of the proposed algorithms.
I. INTRODUCTION
Dominance analysis lies at the core of many multi-criteria decision-making applications. Take branded items as an example. In a real-life situation, a customer who wants to buy a new product such as a mobile phone filters by brand and obtains the top two branded companies, A and B. He then compares the features of the two products, such as battery life, camera resolution, model, and technology used. In general, if every feature of a product p is no better than the corresponding feature of a product q, and q is strictly better in at least one feature, then p is dominated by q. Let us take four different brands: A, B, C, and D. From the customer's point of view, a lower price and a longer battery life are preferable.
Item    Battery Life (Quality Rating)    Price (Rate)
A       7                                3000
B       5                                4000
C       3                                5000
D       4                                6000
TABLE 1
Here D has a longer battery life than C but a higher price, so neither dominates the other. Both, however, are dominated by B, which has a longer battery life than C and D and a lower price than either of them. Over the whole table, the most preferable (top-one) item is A, because it has a longer battery life and a lower price than all the others; hence A dominates B, C, and D.
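As a concrete illustration, the dominance rule above can be written as a small check. The following sketch (with hypothetical field names) reproduces the Table 1 example, where a higher battery-life rating and a lower price are better:

```python
# A small dominance check mirroring Table 1; field names are hypothetical.
# Here a higher battery-life rating is better and a lower price is better.

def dominates(p, q):
    """Return True if product p dominates product q."""
    no_worse = p["battery"] >= q["battery"] and p["price"] <= q["price"]
    strictly_better = p["battery"] > q["battery"] or p["price"] < q["price"]
    return no_worse and strictly_better

def skyline(products):
    """Keep only the products not dominated by any other product."""
    return [p for p in products
            if not any(dominates(q, p) for q in products if q is not p)]

items = [
    {"name": "A", "battery": 7, "price": 3000},
    {"name": "B", "battery": 5, "price": 4000},
    {"name": "C", "battery": 3, "price": 5000},
    {"name": "D", "battery": 4, "price": 6000},
]
print([p["name"] for p in skyline(items)])  # ['A']: A dominates B, C and D
```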
Getting Top-k Profitable Products
The first instance is called finding top-k profitable products. A naive way to solve this problem is to enumerate all possible subsets of size k from Q, calculate the sum of the profits of each subset, and choose the subset with the greatest sum. However, this approach is not scalable because there is an exponential number of possible subsets. This motivates us to propose efficient algorithms for problem TPP.
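As a minimal sketch of this naive baseline, the code below enumerates size-k subsets directly; it assumes each candidate's unit profit is a fixed (hypothetical) number, which sidesteps the price interactions the full TPP problem has to handle:

```python
# A sketch of the naive baseline: enumerate every size-k subset of Q and keep
# the one with the greatest total profit.
from itertools import combinations

def naive_top_k_profitable(profits, k):
    """profits: dict mapping product id -> unit profit."""
    best_subset, best_profit = None, float("-inf")
    for subset in combinations(profits, k):   # C(|Q|, k) subsets: exponential
        total = sum(profits[p] for p in subset)
        if total > best_profit:
            best_subset, best_profit = subset, total
    return best_subset, best_profit

Q = {"p1": 120, "p2": 95, "p3": 140, "p4": 60}   # hypothetical unit profits
print(naive_top_k_profitable(Q, 2))              # (('p1', 'p3'), 260)
```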
Although how we set the price of one new package may affect how we set the price of another new package, and although there is an exponential number of possible subsets, we can, interestingly, propose a dynamic programming approach that finds an optimal solution when there are two attributes to be considered. However, we show that this problem is NP-hard when there are more than two attributes. Thus, we propose two greedy algorithms for this problem: one has a theoretical guarantee on the profit returned, while the other performs well empirically.
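The paper's exact greedy procedures are not reproduced here, but the generic shape of such a greedy selection can be sketched as follows; best_profit_fn is a hypothetical callback that re-prices a candidate given the packages already chosen:

```python
# A generic greedy-selection skeleton; best_profit_fn is a hypothetical
# callback returning the best achievable profit for a candidate given the
# packages already chosen, since pricing one package can constrain another.

def greedy_top_k(candidates, k, best_profit_fn):
    chosen, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        # pick the candidate with the largest marginal profit right now
        best = max(remaining, key=lambda p: best_profit_fn(p, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With a callback that ignores the already-chosen set, this degenerates into picking the k individually most profitable products; the interaction between prices enters only through the callback.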
Finding top-k profitable products is common in many real-
life applications. Other applications include finding profitable
laptops in a new laptop company, finding profitable delivery
services in a new cargo delivery company and finding
profitable e-advertisements in a webpage. In some cases, data
sets are dynamic and change from time to time. In this paper,
we also study how to find top-k profitable products when
data sets change. For example, some new products are
launched in the existing market while some products which
were present in the existing market become unavailable.
Besides, the prices of existing products in the market may
change due to various reasons, such as inflation and cost
increase.
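One simple piece of such an incremental approach can be sketched: if the skyline (non-dominated set) of the market is maintained, a newly launched product only needs to be compared against skyline points rather than the whole market. This is an illustrative simplification with a hypothetical two-attribute dominance rule, not the paper's algorithm:

```python
# One simplified incremental step, assuming the market skyline is maintained:
# a newly launched product is compared against skyline points only.
# The two-attribute dominance rule and field names are hypothetical.

def dominates(p, q):
    return (p["battery"] >= q["battery"] and p["price"] <= q["price"]
            and (p["battery"] > q["battery"] or p["price"] < q["price"]))

def launch_product(skyline_points, new_p):
    """Update the market skyline when a new product is launched."""
    if any(dominates(q, new_p) for q in skyline_points):
        return skyline_points            # newcomer is dominated: no change
    kept = [q for q in skyline_points if not dominates(new_p, q)]
    kept.append(new_p)                   # drop points it dominates, then add it
    return kept
```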
Getting Top-k Popular Products
The second instance is called finding top-k popular products. If we know how many customers are interested in certain potential products, we can find potential products more effectively. One well-known application that allows customers to state their preferences is the "Name Your Own Price" service developed by Priceline.com: if customers indicate their preferences for some hotels, the service returns some potential hotels to them. As before, a naive method is to enumerate all possible subsets of size k from Q, calculate the total number of customers interested in some package in each subset, and choose the subset with the greatest number of customers; again, this is not scalable. We show that this problem is NP-hard but, interestingly, propose a 0.63-approximate algorithm that runs in polynomial time.
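The 0.63 factor is the classic 1 - 1/e ≈ 0.632 guarantee of greedy maximum coverage. A minimal sketch of that greedy idea follows, assuming (hypothetically) that we are given, for each candidate product, the set of customers interested in it:

```python
# A minimal greedy maximum-coverage sketch: repeatedly pick the product that
# covers the most not-yet-covered customers. This greedy rule achieves the
# classic 1 - 1/e approximation guarantee. Data below are hypothetical.

def greedy_top_k_popular(interest, k):
    covered, chosen = set(), []
    for _ in range(min(k, len(interest))):
        best = max((p for p in interest if p not in chosen),
                   key=lambda p: len(interest[p] - covered))
        chosen.append(best)
        covered |= interest[best]
    return chosen, len(covered)

interest = {"h1": {1, 2, 3}, "h2": {3, 4}, "h3": {5}, "h4": {1, 2}}
print(greedy_top_k_popular(interest, 2))   # (['h1', 'h2'], 4)
```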
Our contributions are summarized as follows. First, to the best of our knowledge, we are the first to study how to find top-k preferable products; doing so helps companies find a subset of products together with their corresponding profitable prices, which cannot be addressed by existing methods. Second, for the first problem of finding top-k profitable products, we propose a dynamic programming approach that finds an optimal solution when two attributes are considered; since the problem is NP-hard with more than two attributes, we also propose two greedy approaches that solve it efficiently. Third, we propose an incremental approach for the first problem when data sets change over time. Fourth, we show that the second problem of finding top-k popular products is NP-hard and propose a 0.63-approximate algorithm for it. Fifth, we present a systematic performance study using both real and synthetic data sets to verify the effectiveness and efficiency of our proposed approaches. The experimental results demonstrate that finding top-k profitable products and top-k popular products is worthwhile in practice.
II. PROBLEM STATEMENT
The problem is to find popular and profitable products in a large dataset. Existing systems cannot find profitable products, and existing approaches cannot be applied to dynamic datasets; they also take too much execution time, too much programming time, and too much memory.
III. IMPLEMENTATION
Step 1:
The product dataset is formed from the existing products already offered in the market by different vendors. Taking cars as an example, the existing cars, their features, costs, etc. have to be gathered from the different vendors.
Step 2:
Frequent feature sets are identified in the dataset of popular products. High-utility mining refers to extracting feature sets that satisfy a given condition; in this work, the condition is high profitability. Considering high profitability, we identify the feature sets that occur most frequently across the other vendors.
Step 3:
The frequent-set computation yields sets at several levels, as sketched below.
Ex: level 1: single features
level 2: combinations of two features
level 3: combinations of three features
From these levels, the smallest sets needed to build a product are selected.
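A brute-force level-wise sketch of Steps 2 and 3 follows (real Apriori would prune each level using the previous one); the car features and support threshold are hypothetical:

```python
# Level-wise frequent feature sets: level 1 = single features, level 2 =
# feature pairs, and so on, keeping sets that appear in at least min_support
# popular products.
from itertools import combinations

def frequent_feature_levels(products, min_support, max_level):
    """products: one feature set per popular product."""
    all_features = sorted(set().union(*products))
    levels = {}
    for size in range(1, max_level + 1):
        levels[size] = [
            set(combo) for combo in combinations(all_features, size)
            if sum(1 for prod in products if set(combo) <= prod) >= min_support
        ]
    return levels

cars = [{"abs", "gps", "sunroof"}, {"abs", "gps"}, {"abs", "sunroof"}]
print(frequent_feature_levels(cars, min_support=2, max_level=2))
# level 1: {abs}, {gps}, {sunroof}; level 2: {abs, gps}, {abs, sunroof}
```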
Step 4:
Set the optimal price, so as to dominate the existing market, using dynamic programming. The outcome of the dynamic programming is the set of size t that provides the greatest profit.
Step 5:
The prices are then correlated between the optimal feature set t and the existing features and their costs. Since this step is NP-hard, a greedy approach is applied; a simplified sketch follows.
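The paper's dynamic program is not reproduced here, but the pricing intuition behind Steps 4 and 5 can be sketched: a new product remains undominated only if it undercuts every existing product whose feature set is at least as good, so its best price sits just below the cheapest such competitor. All names, data, and the undercut step below are hypothetical:

```python
# A simplified pricing rule, not the paper's dynamic program.

EPSILON = 1  # price units by which we undercut a rival

def best_price(new_features, cost, existing):
    """existing: list of (feature_set, price) pairs already on the market."""
    rivals = [price for feats, price in existing if feats >= new_features]
    if not rivals:
        return None                         # no comparable rival constrains us
    price = min(rivals) - EPSILON           # undercut the cheapest dominating rival
    return price if price > cost else None  # None: cannot be sold profitably

existing = [({"abs", "gps"}, 900), ({"abs", "gps", "sunroof"}, 1200)]
print(best_price({"abs", "gps"}, cost=700, existing=existing))  # 899
```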
Algorithm:
With Algorithm 1, intra-dominance checking steps are removed if the scenario has the at-most-one merging attribute characteristic; thus, |T_Q|^2 checks are avoided. Since we focus on processing T'_Q instead of T_Q (where T'_Q is a subset of T_Q), the total number of inter-dominance checking steps is reduced from |T_E| x |T_Q| to |T_E| x |T'_Q|. The total number of checks in this algorithm is at most |T_E| x |T'_Q|. Similarly, we can find T'_E = SKY(T_E) so that the number can be further reduced to |T'_E| x |T'_Q|. In the following, when we write T_E, we mean T'_E. Although Algorithm 1 helps us derive an efficient algorithm, a naive implementation still materializes all possible products generated from T'_1, T'_2, ..., T'_k and obtains a set T'_Q, which is computationally expensive. As described before, if s is the size of each table T'_i, the total number of tuples in T'_Q is s^k. In the following, we propose techniques to avoid materializing T'_Q.
IV. RELATED WORK
Prior research has proposed techniques that decompose reviews into segments evaluating the individual characteristics of a product, as well as a hybrid technique combining text mining and econometrics that models consumer product reviews as elements in a tensor product of feature and evaluation spaces. This allows an economics-aware analysis of consumer reviews, identifying the weight that customers place on individual product features, and it has demonstrated the value of using economic data and econometric modeling for a quantitative interpretation of consumer reviews on the Internet. This line of work uses the hedonic regression concept to estimate: the weight that customers place on each individual product feature; the implicit evaluation score that customers assign to each feature; and how these evaluations affect the revenue of a given product.
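As an illustrative sketch of this hedonic-style estimation (not the cited authors' exact model), an ordinary least-squares fit over hypothetical binary feature indicators recovers a weight per feature plus an intercept:

```python
# Hedonic-style OLS: each product row marks which features it has; the fitted
# coefficients play the role of per-feature weights. Data are hypothetical.
import numpy as np

# rows: products; columns: [battery_boost, camera_upgrade, metal_body]
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [1, 1, 1]], dtype=float)
X = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
y = np.array([7.2, 7.9, 6.8, 5.9, 8.4])        # hypothetical log-revenue

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, w in zip(["intercept", "battery_boost", "camera_upgrade", "metal_body"], coef):
    print(f"{name}: {w:+.2f}")
```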
Our paper adds to a growing literature on sentiment analysis.
Similar to almost any other sentiment mining technique, the
first step involves selecting the set of features to use. Our
approach to feature selection is very close to the one
presented by Hu and Liu. Hu and Liu used a POS-tagger and
association miner to find frequent item-sets, which are then
treated as candidate features. For each instance of a candidate
feature in a sentence, the nearby adjective (if any)
was treated as the effective opinion. Note that effective
opinions used by Hu and Liu are direct counterparts of
evaluation words in our study. A major difference of our
approach is that whereas Hu and Liu were interested in the
effectiveness of feature extraction and high recall values, we
are concerned about gathering a small set of major features
and evaluation words. In order to ensure that gathered
features reflect hedonic characteristics of the products, we
performed manual post-processing of frequent itemsets.
Contrary to Hu and Liu, who performed additional research
to identify infrequent features, we intentionally discarded
such features to obtain robust estimates of model coefficients.
It should also be noted that while Hu and Liu used WordNet to identify the polarity of opinion words, our model evaluates the polarity and strength of each evaluation directly from the
regression on product demand. Our research was inspired by
previous studies about opinion strength analysis. Popescu and
Etzioni presented OPINE, an unsupervised information
extraction system capable of identifying product features,
identifying user opinions regarding product features,
determining polarity of the opinions and ranking the opinions
based on their strength. Unfortunately, no paper has been published so far explaining how OPINE solves the last task (opinion ranking). Evaluating the strength of a sentiment quantitatively is an even more challenging task than simple ranking, and only a few papers on the topic have been published thus far. Wilson, Wiebe, and Hwa presented a supervised learning approach able to distinguish between weak, medium, and strong subjectivity of opinions. Unfortunately, this technique requires a manually annotated corpus of opinions, and human annotators might not be able to distinguish the strength of opinions on a finer-grained scale or estimate the economic impact of each particular opinion. In a closely related stream of research, Pang and Lee [20] studied the rating-inference problem: they augmented an SVM classifier with a metric labeling concept and applied the altered classifier to infer the author's numerical rating for Rotten Tomatoes movie reviews.
Experimental Results for Finding Top-k Profitable Products
i) Results over Synthetic Data Sets
In this section, we conducted experiments over both small and large synthetic data sets to study the scalability of GR1 and GR2. We varied |P|, |Q|, d, l, k, _, and h in our experiments. The values of each parameter used over large synthetic data sets are given in Table 4, where the default values are in bold. For the sake of space, we show the results when we varied |Q| only, in Fig. 1. Other experimental results over small synthetic data sets and large synthetic data sets can be found in [20].
Execution time. Fig. 1a shows the effect on the execution
times of all algorithms. GR2 runs slower than GR1. As we
discussed, the time complexity of GR2 is higher than that of
GR1.
Total time. Fig. 1b shows the effect on the total time of each greedy algorithm, which corresponds to the sum of the execution time of the greedy algorithm and its preprocessing time. We denote the total times of GR1 and GR2 by "GR1 + PREP" and "GR2 + PREP," respectively. It is clear that when |Q| increases, the total times of the algorithms increase.
Memory cost. Fig. 1c shows the effect on the memory cost of the algorithms. Since the memory cost of both GR1 and GR2 is the memory occupied by the R*-tree on the data sets P ∪ Q, when |Q| increases, the memory cost increases, as shown in Fig. 1.
Profit. Fig. 1d shows the effect on the profit returned by the algorithms. In most cases, GR1 and GR2 give similar profits.
ii) Results over Real Data Sets
We also conducted experiments on real data sets. We varied four factors, namely h, k, d, and _. For the sake of space, we only show the results for the two factors h and k, in Figs. 3 and 4, respectively. The default configuration is k = 150, h = 20, d = 0.6, and _ = 50. The results for real data sets are similar to those for synthetic data sets. Note that there is a big difference in execution times between GR1 and GR2 on the real data sets which is not observed on the synthetic data sets. As we described, the time complexity of GR2 is quadratic with respect to k while that of GR1 is not; thus, when k is larger, the difference in execution times between GR1 and GR2 is larger. Compared with the synthetic data sets, where k is set to 10, 20, 50, or 100, k is set to larger values in the real data sets (e.g., 100, 150, 200, and 250), so the difference in execution times between GR1 and GR2 is larger. In order to see whether the prices of the top-k profitable packages we found are consistent with the original prices of the packages listed in Priceline.com, we conducted experiments by regarding some of the existing products in Priceline.com as new packages in our problem and comparing the prices of the top-k profitable packages we found with their original prices listed in Priceline.com. We conducted experiments over 100 trials. For each trial, we randomly selected 30 percent of the 149 existing packages in Priceline.com to form a new package set Q. In this experiment, we set k = 5, h = 0, d = 0.5, and _ = 50. The experimental results show that the prices of the packages we found are consistent with their original prices listed in Priceline.com.
V. CONCLUSION
In this paper, we identified and solved the problem of finding top-k preferable products, which had not been studied previously. We investigated two instances of preferable products, namely profitable products and popular products, and explored methods to find the top-k profitable products exactly and the top-k popular products efficiently. An extensive performance study using both synthetic and real data sets verified the accuracy and efficiency of our methods. In future work, we will study other aspects of the problem of finding top-k preferable products by extending the utility function to other related objective functions. One promising utility function is the function that returns the sum of the unit profits of the chosen products, each multiplied by the number of users who prefer those products.
REFERENCES
[2] S. Börzsönyi, D. Kossmann, and K. Stocker, "The Skyline Operator," Proc. Int'l Conf. Data Eng. (ICDE), 2001.
[3] J.L. Bentley, H.T. Kung, M. Schkolnick, and C.D. Thompson, "On the Average Number of Maxima in a Set of Vectors and Applications," J. ACM, vol. 25, no. 4, pp. 536-543, 1978.
[4] J.L. Bentley, K.L. Clarkson, and D.B. Levine, "Fast Linear Expected-Time Algorithms for Computing Maxima and Convex Hulls," Proc. First Ann. ACM-SIAM Symp. Discrete Algorithms (SODA), 1990.
[5] O. Barndorff-Nielsen and M. Sobel, "On the Distribution of the Number of Admissible Points in a Vector Random Sample," Theory of Probability and Its Applications, vol. 11, no. 2, pp. 249-269, 1966.
[6] D.S. Hochbaum, "Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems," Approximation Algorithms for NP-Hard Problems, PWS Publishing Company, 1997.
[7] B. Jiang, J. Pei, X. Lin, D.W.-L. Cheung, and J. Han,
“Mining Preferences from Superior and Inferior Examples,”
Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and
Data Mining, 2008.
[8] J.M. Kang, M.F. Mokbel, S. Shekhar, T. Xia, and D.
Zhang, “Continuous Evaluation of Monochromatic and
Bichromatic Reverse Nearest Neighbors,” Proc. Int’l Conf.
Data Eng. (ICDE), 2007.
[9] F. Korn and S. Muthukrishnan, “Influence Sets Based on
Reverse Nearest Neighbor Queries,” Proc. ACM SIGMOD
Int’l Conf. Management of Data, 2000.
[10] D. Kossmann, F. Ramsak, and S. Rost, “Shooting Stars
in the Sky: An Online Algorithm for Skyline Queries,” Proc.
28th Int’l Conf. Very Large Data Bases (VLDB), 2002.
[11] B. Li, A. Ghose, and P.G. Ipeirotis, “Towards a Theory
Model for Product Search,” Proc. 20th Int’l Conf. World
Wide Web (WWW ’11), pp. 327-336, 2011.
[12] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang, “Selecting
Stars: The k Most Representative Skyline Operator,” Proc.
Int’l Conf. Data Eng. (ICDE), 2007.
[13] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive
Skyline Computation in Database Systems,” ACM Trans.
Database Systems, vol. 30, no. 1, pp. 41-82, 2005.
[14] D. Sacharidis, S. Papadopoulos, and D. Papadias,
“Topologically-Sorted Skylines for Partially-Ordered
Domains,” Proc. Int’l Conf. Data Eng. (ICDE), 2009.
First Author: S. Varakanth Reddy received his B.Tech degree in Computer Science and Engineering from PRRM Engineering College (JNTUH) in 2011. He is currently an M.Tech student in Computer Science Engineering at Jawaharlal Nehru Technological University (JNTUH), Hyderabad, and is interested in the field of Data Mining.
Second Author: A. Vivekanand is working as an Associate Professor at CMR College of Engineering & Technology, Hyderabad. He has 15 years of teaching experience, has published 5 international/national research papers, is a life-time member of ISTE, and has guided many B.Tech and M.Tech students.