International Journal of Computer Trends and Technology (IJCTT) – volume 4 Issue10 – Oct 2013
ISSN: 2231-2803 http://www.ijcttjournal.org
Getting Accurate Results on Various Aspects with the Help of a Decision Making System
Varakanth Reddy, (M.Tech), CSE Dept, CMRCET, Hyderabad
A. Vivekanand, Ph.D., Professor, CMRCET, Hyderabad
Abstract — Data mining algorithms are useful for finding frequent items in large datasets. They can identify popular or preferable product features, help maximize the profitability and popularity of a product, and can be applied across domains, for example to the same product offered by different vendors. Permutations of attributes help to enumerate all possible combinations that yield a profitable product feature, and many customers are interested in finding such potential products (e.g., on priceline.com). The need for dominance and skyline analysis has grown in multi-criteria decision-making applications. Earlier research focused on helping users select a set of "efficient" products from a stream of available products. In this paper, we explore a different problem, retrieving the top-k preferable products, which has not been addressed in previous work. Given a set of products in the existing market, we want to find a set of k "best" possible new products such that these new products are not dominated by the products already in the market. We study two instances of finding top-k preferable products. In the first instance, we set the prices of the new products so that the total profit is maximized; such products are called top-k profitable products. In the second instance, we select k products so that they attract the greatest number of customers; such products are called top-k popular products. For both problems, a straightforward solution is to enumerate all possible subsets of size k and choose the subset that yields the greatest profit (for the first problem) or attracts the greatest number of customers (for the second). However, there is an exponential number of possible subsets. In this paper, we propose solutions that find the top-k profitable products exactly and the top-k popular products efficiently. An extensive performance study using both synthetic and real data sets verifies the accuracy and efficiency of the proposed algorithms.
I. INTRODUCTION
Dominance analysis lies at the core of many multi-criteria decision-making applications. Take branded items as an example. In a real-life situation, a customer who wants to buy a new product such as a mobile phone filters by brand and obtains the top two branded companies, A and B. He then compares the features of the two products, such as battery life, camera resolution, model, and technology used. In general, if every feature of a product p is no better than the corresponding feature of a product q, and q is strictly better in at least one feature, then p is dominated by q. Let us take four different brands: A, B, C, and D. From the customer's point of view, a lower price and a longer battery life are preferable.
Item    Battery Life (Quality Rating)    Price (Rate)
A       7                                3000
B       5                                4000
C       3                                5000
D       4                                6000
TABLE 1
Here D has a longer battery life than C but a higher price, so neither dominates the other. Both, however, are dominated by B, which has a longer battery life than C and D and a lower price than either of them. Over the whole table, the most preferable (top-one) item is A, because it has a longer battery life and a lower price than all the others; hence A dominates B, C, and D.
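As a concrete illustration, the dominance rule above can be written as a small check. The following sketch (with hypothetical field names) reproduces the Table 1 example, where a higher battery-life rating and a lower price are better:

```python
# A small dominance check mirroring Table 1; field names are hypothetical.
# Here a higher battery-life rating is better and a lower price is better.

def dominates(p, q):
    """Return True if product p dominates product q."""
    no_worse = p["battery"] >= q["battery"] and p["price"] <= q["price"]
    strictly_better = p["battery"] > q["battery"] or p["price"] < q["price"]
    return no_worse and strictly_better

def skyline(products):
    """Keep only the products not dominated by any other product."""
    return [p for p in products
            if not any(dominates(q, p) for q in products if q is not p)]

items = [
    {"name": "A", "battery": 7, "price": 3000},
    {"name": "B", "battery": 5, "price": 4000},
    {"name": "C", "battery": 3, "price": 5000},
    {"name": "D", "battery": 4, "price": 6000},
]
print([p["name"] for p in skyline(items)])  # ['A']: A dominates B, C and D
```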
Getting Top-k Profitable Products
The first instance is called finding top-k profitable products. A naive way to solve this problem is to enumerate all possible subsets of size k from Q, calculate the sum of the profits of each subset, and choose the subset with the greatest sum. However, this approach is not scalable because there is an exponential number of possible subsets. This motivates us to propose efficient algorithms for problem TPP.
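As a minimal sketch of this naive baseline, the code below enumerates size-k subsets directly; it assumes each candidate's unit profit is a fixed (hypothetical) number, which sidesteps the price interactions the full TPP problem has to handle:

```python
# A sketch of the naive baseline: enumerate every size-k subset of Q and keep
# the one with the greatest total profit.
from itertools import combinations

def naive_top_k_profitable(profits, k):
    """profits: dict mapping product id -> unit profit."""
    best_subset, best_profit = None, float("-inf")
    for subset in combinations(profits, k):   # C(|Q|, k) subsets: exponential
        total = sum(profits[p] for p in subset)
        if total > best_profit:
            best_subset, best_profit = subset, total
    return best_subset, best_profit

Q = {"p1": 120, "p2": 95, "p3": 140, "p4": 60}   # hypothetical unit profits
print(naive_top_k_profitable(Q, 2))              # (('p1', 'p3'), 260)
```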
Although how we set the price of one new package may affect how we set the price of another new package, and although there is an exponential number of possible subsets, we can, interestingly, propose a dynamic programming approach that finds an optimal solution when there are two attributes to be considered. However, we show that this problem is NP-hard when there are more than two attributes. Thus, we propose two greedy algorithms for this problem: one has a theoretical guarantee on the profit returned, while the other performs well empirically.
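The paper's exact greedy procedures are not reproduced here, but the generic shape of such a greedy selection can be sketched as follows; best_profit_fn is a hypothetical callback that re-prices a candidate given the packages already chosen:

```python
# A generic greedy-selection skeleton; best_profit_fn is a hypothetical
# callback returning the best achievable profit for a candidate given the
# packages already chosen, since pricing one package can constrain another.

def greedy_top_k(candidates, k, best_profit_fn):
    chosen, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        # pick the candidate with the largest marginal profit right now
        best = max(remaining, key=lambda p: best_profit_fn(p, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With a callback that ignores the already-chosen set, this degenerates into picking the k individually most profitable products; the interaction between prices enters only through the callback.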
Finding top-k profitable products is common in many real-
life applications. Other applications include finding profitable
laptops in a new laptop company, finding profitable delivery
services in a new cargo delivery company and finding
profitable e-advertisements in a webpage. In some cases, data
sets are dynamic and change from time to time. In this paper,
we also study how to find top-k profitable products when
data sets change. For example, some new products are
launched in the existing market while some products which
were present in the existing market become unavailable.
Besides, the prices of existing products in the market may
change due to various reasons, such as inflation and cost
increase.
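One simple piece of such an incremental approach can be sketched: if the skyline (non-dominated set) of the market is maintained, a newly launched product only needs to be compared against skyline points rather than the whole market. This is an illustrative simplification with a hypothetical two-attribute dominance rule, not the paper's algorithm:

```python
# One simplified incremental step, assuming the market skyline is maintained:
# a newly launched product is compared against skyline points only.
# The two-attribute dominance rule and field names are hypothetical.

def dominates(p, q):
    return (p["battery"] >= q["battery"] and p["price"] <= q["price"]
            and (p["battery"] > q["battery"] or p["price"] < q["price"]))

def launch_product(skyline_points, new_p):
    """Update the market skyline when a new product is launched."""
    if any(dominates(q, new_p) for q in skyline_points):
        return skyline_points            # newcomer is dominated: no change
    kept = [q for q in skyline_points if not dominates(new_p, q)]
    kept.append(new_p)                   # drop points it dominates, then add it
    return kept
```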
Getting Top-k Popular Products
The second instance is called finding top-k popular products. If we know how many customers are interested in certain potential products, we can find potential products more effectively. One well-known application that allows customers to state their preferences is the "Name Your Own Price" service developed by Priceline.com: if customers indicate their preferences for some hotels, the service returns some potential hotels to them. As before, a naive method is to enumerate all possible subsets of size k from Q, calculate the total number of customers interested in some package in each subset, and choose the subset with the greatest number of customers; again, this is not scalable. We show that this problem is NP-hard but, interestingly, propose a 0.63-approximate algorithm that runs in polynomial time.
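The 0.63 factor is the classic 1 - 1/e ≈ 0.632 guarantee of greedy maximum coverage. A minimal sketch of that greedy idea follows, assuming (hypothetically) that we are given, for each candidate product, the set of customers interested in it:

```python
# A minimal greedy maximum-coverage sketch: repeatedly pick the product that
# covers the most not-yet-covered customers. This greedy rule achieves the
# classic 1 - 1/e approximation guarantee. Data below are hypothetical.

def greedy_top_k_popular(interest, k):
    covered, chosen = set(), []
    for _ in range(min(k, len(interest))):
        best = max((p for p in interest if p not in chosen),
                   key=lambda p: len(interest[p] - covered))
        chosen.append(best)
        covered |= interest[best]
    return chosen, len(covered)

interest = {"h1": {1, 2, 3}, "h2": {3, 4}, "h3": {5}, "h4": {1, 2}}
print(greedy_top_k_popular(interest, 2))   # (['h1', 'h2'], 4)
```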
Our contributions are summarized as follows. First, to the best of our knowledge, we are the first to study how to find top-k preferable products; doing so helps companies find a subset of products together with their corresponding profitable prices, which cannot be addressed by existing methods. Second, for the first problem of finding top-k profitable products, we propose a dynamic programming approach that finds an optimal solution when two attributes are considered; since the problem is NP-hard with more than two attributes, we also propose two greedy approaches that solve it efficiently. Third, we propose an incremental approach for the first problem when data sets change over time. Fourth, we show that the second problem of finding top-k popular products is NP-hard and propose a 0.63-approximate algorithm for it. Fifth, we present a systematic performance study using both real and synthetic data sets to verify the effectiveness and efficiency of our proposed approaches. The experimental results demonstrate that finding top-k profitable products and top-k popular products is worthwhile in practice.
II. PROBLEM STATEMENT
The problem is to find popular and profitable products in a large dataset. Existing systems cannot find profitable products, and existing approaches cannot be applied to dynamic datasets; they also take too much execution time, too much programming time, and too much memory.
III. IMPLEMENTATION
Step 1:
The product dataset is formed from the existing products already offered in the market by different vendors. Taking cars as an example, the existing cars, their features, costs, etc. have to be gathered from the different vendors.
Step 2:
Frequent feature sets are identified in the dataset of popular products. High-utility mining refers to extracting feature sets that satisfy a given condition; in this work, the condition is high profitability. Considering high profitability, we identify the feature sets that occur most frequently across the other vendors.
Step 3:
The frequent-set computation yields sets at several levels, as sketched below.
Ex: level 1: single features
level 2: combinations of two features
level 3: combinations of three features
From these levels, the smallest sets needed to build a product are selected.
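A brute-force level-wise sketch of Steps 2 and 3 follows (real Apriori would prune each level using the previous one); the car features and support threshold are hypothetical:

```python
# Level-wise frequent feature sets: level 1 = single features, level 2 =
# feature pairs, and so on, keeping sets that appear in at least min_support
# popular products.
from itertools import combinations

def frequent_feature_levels(products, min_support, max_level):
    """products: one feature set per popular product."""
    all_features = sorted(set().union(*products))
    levels = {}
    for size in range(1, max_level + 1):
        levels[size] = [
            set(combo) for combo in combinations(all_features, size)
            if sum(1 for prod in products if set(combo) <= prod) >= min_support
        ]
    return levels

cars = [{"abs", "gps", "sunroof"}, {"abs", "gps"}, {"abs", "sunroof"}]
print(frequent_feature_levels(cars, min_support=2, max_level=2))
# level 1: {abs}, {gps}, {sunroof}; level 2: {abs, gps}, {abs, sunroof}
```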
Step 4:
Set the optimal price, so as to dominate the existing market, using dynamic programming. The outcome of the dynamic programming is the set of size t that provides the greatest profit.
Step 5:
The prices are then correlated between the optimal feature set t and the existing features and their costs. Since this step is NP-hard, a greedy approach is applied; a simplified sketch follows.
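The paper's dynamic program is not reproduced here, but the pricing intuition behind Steps 4 and 5 can be sketched: a new product remains undominated only if it undercuts every existing product whose feature set is at least as good, so its best price sits just below the cheapest such competitor. All names, data, and the undercut step below are hypothetical:

```python
# A simplified pricing rule, not the paper's dynamic program.

EPSILON = 1  # price units by which we undercut a rival

def best_price(new_features, cost, existing):
    """existing: list of (feature_set, price) pairs already on the market."""
    rivals = [price for feats, price in existing if feats >= new_features]
    if not rivals:
        return None                         # no comparable rival constrains us
    price = min(rivals) - EPSILON           # undercut the cheapest dominating rival
    return price if price > cost else None  # None: cannot be sold profitably

existing = [({"abs", "gps"}, 900), ({"abs", "gps", "sunroof"}, 1200)]
print(best_price({"abs", "gps"}, cost=700, existing=existing))  # 899
```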
Algorithm:
With Algorithm 1, intra-dominance checking steps are removed if the scenario has the at-most-one merging attribute characteristic; thus, |T_Q|^2 checks are avoided. Since we focus on processing T'_Q instead of T_Q (where T'_Q is a subset of T_Q), the total number of inter-dominance checking steps is reduced from |T_E| x |T_Q| to |T_E| x |T'_Q|. The total number of checks in this algorithm is at most |T_E| x |T'_Q|. Similarly, we can find T'_E = SKY(T_E) so that the number can be further reduced to |T'_E| x |T'_Q|. In the following, when we write T_E, we mean T'_E. Although Algorithm 1 helps us derive an efficient algorithm, a naive implementation still materializes all possible products generated from T'_1, T'_2, ..., T'_k and obtains a set T'_Q, which is computationally expensive. As described before, if s is the size of each table T'_i, the total number of tuples in T'_Q is s^k. In the following, we propose techniques to avoid materializing T'_Q.
IV. RELATED WORK
Prior research has proposed techniques that decompose reviews into segments evaluating the individual characteristics of a product, as well as a hybrid technique combining text mining and econometrics that models consumer product reviews as elements in a tensor product of feature and evaluation spaces. This allows an economics-aware analysis of consumer reviews, identifying the weight that customers place on individual product features, and it has demonstrated the value of using economic data and econometric modeling for a quantitative interpretation of consumer reviews on the Internet. This line of work uses the hedonic regression concept to estimate: the weight that customers place on each individual product feature; the implicit evaluation score that customers assign to each feature; and how these evaluations affect the revenue of a given product.
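As an illustrative sketch of this hedonic-style estimation (not the cited authors' exact model), an ordinary least-squares fit over hypothetical binary feature indicators recovers a weight per feature plus an intercept:

```python
# Hedonic-style OLS: each product row marks which features it has; the fitted
# coefficients play the role of per-feature weights. Data are hypothetical.
import numpy as np

# rows: products; columns: [battery_boost, camera_upgrade, metal_body]
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [1, 1, 1]], dtype=float)
X = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
y = np.array([7.2, 7.9, 6.8, 5.9, 8.4])        # hypothetical log-revenue

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, w in zip(["intercept", "battery_boost", "camera_upgrade", "metal_body"], coef):
    print(f"{name}: {w:+.2f}")
```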
Our paper adds to a growing literature on sentiment analysis.
Similar to almost any other sentiment mining technique, the
first step involves selecting the set of features to use. Our
approach to feature selection is very close to the one
presented by Hu and Liu. Hu and Liu used a POS-tagger and
association miner to find frequent item-sets, which are then
treated as candidate features. For each instance of a candidate
feature in a sentence, the nearby adjective (if any)
was treated as the effective opinion. Note that effective
opinions used by Hu and Liu are direct counterparts of
evaluation words in our study. A major difference of our
approach is that whereas Hu and Liu were interested in the
effectiveness of feature extraction and high recall values, we
are concerned about gathering a small set of major features
and evaluation words. In order to ensure that gathered
features reflect hedonic characteristics of the products, we
performed manual post-processing of frequent itemsets.
Contrary to Hu and Liu, who performed additional research
to identify infrequent features, we intentionally discarded
such features to obtain robust estimates of model coefficients.
It should also be noted that while Hu and Liu used WordNet to identify the polarity of opinion words, our model evaluates the polarity and strength of each evaluation directly from the
regression on product demand. Our research was inspired by
previous studies about opinion strength analysis. Popescu and
Etzioni presented OPINE, an unsupervised information
extraction system capable of identifying product features,
identifying user opinions regarding product features,
determining polarity of the opinions and ranking the opinions
based on their strength. Unfortunately, no paper has been published so far explaining how OPINE solves the last task (opinion ranking). Evaluating the strength of a sentiment quantitatively is an even more challenging task than simple ranking, and only a few papers on the topic have been published thus far. Wilson, Wiebe, and Hwa presented a supervised learning approach able to distinguish between weak, medium, and strong subjectivity of opinions. Unfortunately, this technique requires a manually annotated corpus of opinions, and human annotators might not be able to distinguish the strength of opinions on a finer-grained scale or estimate the economic impact of each particular opinion. In a closely related stream of research, Pang and Lee [20] studied the rating-inference problem: they augmented an SVM classifier with a metric labeling concept and applied the altered classifier to infer the author's numerical rating for Rotten Tomatoes movie reviews.
Experimental Results for Finding Top-k Profitable Products
i) Results over Synthetic Data Sets
In this section, we conducted experiments over both small and large synthetic data sets to study the scalability of GR1 and GR2. We varied |P|, |Q|, d, l, k, _, and h in our experiments. The values of each parameter used over large synthetic data sets are given in Table 4, where the default values are in bold. For the sake of space, we show the results when we varied |Q| only, in Fig. 1. Other experimental results over small synthetic data sets and large synthetic data sets can be found in [20].
Execution time. Fig. 1a shows the effect on the execution
times of all algorithms. GR2 runs slower than GR1. As we
discussed, the time complexity of GR2 is higher than that of
GR1.
Total time. Fig. 1b shows the effect on the total time of each greedy algorithm, which corresponds to the sum of the execution time of the greedy algorithm and its preprocessing time. We denote the total times of GR1 and GR2 by "GR1 + PREP" and "GR2 + PREP," respectively. It is clear that when |Q| increases, the total times of the algorithms increase.
Memory cost. Fig. 1c shows the effect on the memory cost of the algorithms. Since the memory cost of both GR1 and GR2 is the memory occupied by the R*-tree on the data sets P ∪ Q, when |Q| increases, the memory cost increases, as shown in Fig. 1.
Profit. Fig. 1d shows the effect on the profit returned by the algorithms. In most cases, GR1 and GR2 give similar profits.
ii) Results over Real Data Sets
We also conducted experiments on real data sets. We varied four factors, namely h, k, d, and _. For the sake of space, we only show the results for the two factors h and k, in Figs. 3 and 4, respectively. The default configuration is k = 150, h = 20, d = 0.6, and _ = 50. The results for real data sets are similar to those for synthetic data sets. Note that there is a big difference in execution times between GR1 and GR2 on the real data sets which is not observed on the synthetic data sets. As we described, the time complexity of GR2 is quadratic with respect to k while that of GR1 is not; thus, when k is larger, the difference in execution times between GR1 and GR2 is larger. Compared with the synthetic data sets, where k is set to 10, 20, 50, or 100, k is set to larger values in the real data sets (e.g., 100, 150, 200, and 250), so the difference in execution times between GR1 and GR2 is larger. In order to see whether the prices of the top-k profitable packages we found are consistent with the original prices of the packages listed in Priceline.com, we conducted experiments by regarding some of the existing products in Priceline.com as new packages in our problem and comparing the prices of the top-k profitable packages we found with their original prices listed in Priceline.com. We conducted experiments over 100 trials. For each trial, we randomly selected 30 percent of the 149 existing packages in Priceline.com to form a new package set Q. In this experiment, we set k = 5, h = 0, d = 0.5, and _ = 50. The experimental results show that the prices of the packages we found are consistent with their original prices listed in Priceline.com.
V. CONCLUSION
In this paper, we identified and solved the problem of finding top-k preferable products, which had not been studied previously. We investigated two instances of preferable products, namely profitable products and popular products, and explored methods to find the top-k profitable products exactly and the top-k popular products efficiently. An extensive performance study using both synthetic and real data sets verified the accuracy and efficiency of our methods. In future work, we will study other aspects of the problem of finding top-k preferable products by extending the utility function to other related objective functions. One promising utility function is the function that returns the sum of the unit profits of the chosen products, each multiplied by the number of users who prefer those products.
REFERENCES
[2] S. Börzsönyi, D. Kossmann, and K. Stocker, "The Skyline Operator," Proc. Int'l Conf. Data Eng. (ICDE), 2001.
[3] J.L. Bentley, H.T. Kung, M. Schkolnick, and C.D. Thompson, "On the Average Number of Maxima in a Set of Vectors and Applications," J. ACM, vol. 25, no. 4, pp. 536-543, 1978.
[4] J.L. Bentley, K.L. Clarkson, and D.B. Levine, "Fast Linear Expected-Time Algorithms for Computing Maxima and Convex Hulls," Proc. First Ann. ACM-SIAM Symp. Discrete Algorithms (SODA), 1990.
[5] O. Barndorff-Nielsen and M. Sobel, "On the Distribution of the Number of Admissible Points in a Vector Random Sample," Theory of Probability and Its Applications, vol. 11, no. 2, pp. 249-269, 1966.
[6] D.S. Hochbaum, "Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems," Approximation Algorithms for NP-Hard Problems, PWS Publishing Company, 1997.
[7] B. Jiang, J. Pei, X. Lin, D.W.-L. Cheung, and J. Han,
“Mining Preferences from Superior and Inferior Examples,”
Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and
Data Mining, 2008.
[8] J.M. Kang, M.F. Mokbel, S. Shekhar, T. Xia, and D.
Zhang, “Continuous Evaluation of Monochromatic and
Bichromatic Reverse Nearest Neighbors,” Proc. Int’l Conf.
Data Eng. (ICDE), 2007.
[9] F. Korn and S. Muthukrishnan, “Influence Sets Based on
Reverse Nearest Neighbor Queries,” Proc. ACM SIGMOD
Int’l Conf. Management of Data, 2000.
[10] D. Kossmann, F. Ramsak, and S. Rost, “Shooting Stars
in the Sky: An Online Algorithm for Skyline Queries,” Proc.
28th Int’l Conf. Very Large Data Bases (VLDB), 2002.
[11] B. Li, A. Ghose, and P.G. Ipeirotis, “Towards a Theory
Model for Product Search,” Proc. 20th Int’l Conf. World
Wide Web (WWW ’11), pp. 327-336, 2011.
[12] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang, “Selecting
Stars: The k Most Representative Skyline Operator,” Proc.
Int’l Conf. Data Eng. (ICDE), 2007.
[13] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive
Skyline Computation in Database Systems,” ACM Trans.
Database Systems, vol. 30, no. 1, pp. 41-82, 2005.
[14] D. Sacharidis, S. Papadopoulos, and D. Papadias,
“Topologically-Sorted Skylines for Partially-Ordered
Domains,” Proc. Int’l Conf. Data Eng. (ICDE), 2009.
First Author: S. Varakanth Reddy received his B.Tech degree in Computer Science and Engineering from PRRM Engineering College (JNTUH) in 2011. He is currently an M.Tech student in Computer Science Engineering at Jawaharlal Nehru Technological University (JNTUH), Hyderabad, and is interested in the field of Data Mining.
Second Author: A. Vivekanand is working as an Associate Professor at CMR College of Engineering & Technology, Hyderabad. He has 15 years of teaching experience, has published 5 international/national research papers, is a life-time member of ISTE, and has guided many B.Tech and M.Tech students.