starring role of data mining in cloud computing paradigm international journal of scientific...

4

Click here to load reader

Upload: buithu

Post on 24-Jul-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Starring role of Data Mining in Cloud Computing Paradigm International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882 984 Volume 4, Issue 9, September

www.ijsret.org

982 International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882

Volume 4, Issue 9, September 2015

Starring role of Data Mining in Cloud Computing Paradigm

C.Edward Jaya Singh

Assistant Professor, Nesamony Memorial Christian College, Marthandam, Tamilnadu, 629161, India

Dr.E.Baburaj

Professor, Narayanaguru Siddhartha College of Engg. & Technology, Padanthalumoodu, Tamilnadu, India

Abstract

The ideal support of Information and communication

technology leads to the enlargement of Big Data

processing mechanisms like Data Mining. It is an

exercise of extracting concealed as well as valuable

information from raw Data. Today, with the rapid

growth of the Information Technology the size of the

data has been increased from KB level PB level. The

objective of data mining process is also additional and

more problematical, so the data mining algorithms are

needed to be more competent. Cloud computing

paradigm can provide the infrastructure to gigantic and

multifaceted data of data mining, as well as innovative

challenging issues for data mining. The cloud computing

researches are materialized. This Script deals with the

study of how data mining key features are used in cloud

computing and also converse about the basic concept of

cloud computing services and the role of data mining

algorithms for the effectiveness and sketches out how

data mining is recycled in cloud computing paradigm.

1. Introduction

With the rapid growth of processing and storage

technologies and the accomplishment of the Internet,

computing resources have become cost effective, more

authoritative and more collectively available than ever

before. This technological propensity has enabled the

awareness of a new and innovative computing model

called Cloud Computing. NIST definition of cloud

computing: Cloud computing is a model for enabling

convenient, on-demand network access to a shared pool

of configurable computing resources (e.g. networks,

servers, storage, applications, and services) that can be

rapidly provisioned and released with minimal

management effort or service provider interaction.

Today, the cloud computing plays a vital role and

undertaking broad changes in the way IT services are

designed, delivered, consumed, and managed. The boom

in cloud computing over the past few years has led to a

situation that is common to many innovations and new

technologies: “Cloud computing” was coined for what

happens when applications and facilities are moved into

the “cloud” World. Popularity of cloud computing is

increasing day by day in distributed computing

environment. There is a growing trend of using cloud

environments for storage and data processing needs. To

use the full potential of cloud computing, data is

transferred, processed, retrieved and stored by external

cloud providers. Data owners are very skeptical to place

their data outside their own control sphere. Their main

concerns are the confidentiality, integrity, security and

methods of mining the data from the cloud.

2. Data Mining Services

Data Mining refers to extracting or “Mining” knowledge

from huge volumes of scattered data. It is a dynamic

process where intelligent methods are applied in order to

extract Data Patterns. Many other terms carry a similar

or slightly different meaning to data mining, such as

knowledge mining from data, knowledge extraction,

data/pattern analysis, data archaeology, and data

dredging. The KDD as a process consists of an iterative

sequence of the following steps:

Data cleaning (to remove noise and inconsistent

data)

Data integration (where multiple data sources

may be combined)1

Data selection (where data relevant to the

analysis task are retrieved from the database)

Data transformation (where data are transformed

or consolidated into forms appropriate

for mining by performing summary or

aggregation operations, for instance)

Data mining (intelligent methods are applied in

order to extract data patterns)

Pattern evaluation (to identify the truly

interesting patterns representing knowledge)

Page 2: Starring role of Data Mining in Cloud Computing Paradigm International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882 984 Volume 4, Issue 9, September

www.ijsret.org

983 International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882

Volume 4, Issue 9, September 2015

CLOUD

NAME

KEY FEATURES

Clustering

Useful for exploring data and finding natural groupings. Members of a cluster are more

like each other than they are like members of a different cluster. Common examples

include finding new customer segments and life sciences discovery.

Classification Most commonly used technique for predicting a specific outcome such as response / no-

response, high / medium / low value customer, likely to buy / not buy.

Association

Find the rules associated with frequently co-occurring items, used for market basket

analysis. It can be used to determine the level of generalization and ensure that a pattern

covers a sufficient number of cases.

Regression Technique for predicting a continuous numerical outcome such a customer lifetime value,

house value, process yield rates.

Attribute

Importance

Ranks attributes according to strength of relationship with target attribute. Use cases

include finding factors most associated with customers who respond to an offer, factors

most associated with healthy patients.

Anomaly

Detection

Identifies unusual or suspicious cases based on deviation from the norm. Common

examples include health care fraud, expense report fraud, and tax compliance.

Feature

Extraction

Produces new attributes as linear combination of existing attributes. Applicable for text

data, latent semantic analysis, data compression, data decomposition and projection, and

pattern recognition.

Table.1.Cloud Name and its Key Features

3. Few Aspects Regarding Data Mining

Data mining characterizes finding useful hidden patterns

or trends through bulky amounts of data. Data mining is

defined as a type of distributed, relational or object

oriented database analysis that attempts to discover

useful patterns or relationships in a group of data.Data

mining in Cloud Computing: Data mining techniques

and applications are very much required in the cloud

computing paradigm. The data mining role in Cloud

Computing allows the organizations are to centralize the

management of software and data storage, with

declaration of resourceful, reliable and more protected.

4. Primary Cloud Model

A layered model of cloud computing environment

can be divided into 4 layers such as Hardware or

Datacenter layer, Infrastructure layer, Platform layer and

Application layer.

Fig. 4.1 Layered Cloud computing architecture

There are different types of clouds, each with its own

benefits and drawbacks. Public clouds: A cloud in which

service providers offer their resources as services to the

general public. It offers several key benefits to service

providers, including no initial capital investment on

infrastructure and shifting of risks to infrastructure

providers. However, public clouds lack fine-grained

control over data, network and security settings, which

hampers their effectiveness in many business scenarios.

Private clouds: Private clouds are designed for exclusive

use by a single organization. A private cloud offers the

highest degree of control over performance, reliability

and security. Hybrid clouds: A hybrid cloud is a

combination of public and private cloud models that tries

to address the limitations of each approach. In a hybrid

cloud, part of the service infrastructure runs in private

clouds while the remaining part runs in public clouds.

Hybrid clouds offer more flexibility than both public and

private clouds. Virtual Private Cloud: An alternative

solution to addressing the limitations of both public and

private clouds is called Virtual Private Cloud. A VPC is

essentially a platform running on top of public clouds.

The main difference is that a VPC leverages virtual

private network (VPN) technology that allows service

providers to design their own topology and security

settings such as firewall rules. VPC provides seamless

transition from a proprietary service infrastructure to a

cloud-based infrastructure, owing to the virtualized

network layer.

Page 3: Starring role of Data Mining in Cloud Computing Paradigm International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882 984 Volume 4, Issue 9, September

www.ijsret.org

984 International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882

Volume 4, Issue 9, September 2015

5. The Responsibility of Data Mining In Cloud

Data mining techniques and applications are very much

needed in the cloud computing paradigm. As cloud

computing is penetrating more and more in all ranges of

business and scientific computing, it becomes a great

area to be focused by data mining. “Cloud computing

denotes the new trend in Internet services that rely on

clouds of servers to handle tasks. Data mining in cloud

computing is the process of extracting structured

information from unstructured or semi-structured web

data sources. The data mining in Cloud Computing

allows organizations to centralize the management of

software and data storage, with assurance of efficient,

reliable and secure services for their users. As Cloud

computing refers to software and hardware delivered as

services over the Internet, in Cloud computing data

mining software is also provided in this way.

The main effects of data mining tools being delivered by

the Cloud are the customer only pays for the data mining

tools that he needs – that reduces his costs since he

doesn’t have to pay for complex data mining suites that

he is not using exhaustive. The customer doesn’t have to

maintain a hardware infrastructure, as he can apply data

mining through a browser – this means that he has to pay

only the costs that are generated by using Cloud

computing. Using data mining through Cloud computing

reduces the barriers that keep small companies from

benefiting of the data mining instruments. These data

mining tasks include: Analyze Key Influencers, Detect

Categories, Fill From example, Forecast, Highlight

Exceptions, Scenario Analysis, Prediction Calculator,

and Shopping Basket Analysis.

The implementation of data mining techniques through

Cloud computing will allow the users to retrieve carrying

great weight in order to virtually integrated data

warehouse that reduces the costs of infrastructure and

storage.

6. Current Research Challenges In Cloud

Countless obtainable issues have not been entirely

addressed, while new challenges keep emerging from

various applications. This section describes some of the

challenging research issues in cloud computing for

researchers who are much interested in this field.

6.1 Automated service provisioning

One of the primary key features of cloud computing

is the capability of acquiring and releasing resources on-

demand. The objective of a service provider in this case

is to allocate and de-allocate resources from the cloud to

satisfy its service level objectives while minimizing its

operational cost.

6.2 Virtual machine migration

Virtualization can provide significant benefits in

cloud computing by enabling virtual machine migration

to balance load across the data center. In addition, virtual

machine migration enables robust and highly responsive

provisioning in data centers. Now, detecting workload

hotspots and initiating a migration lacks the agility to

respond to rapid workload changes.

6.3 Server consolidation

Server consolidation is an effective innovative

approach to take full advantage of resource utilization

while minimizing energy consumption in a cloud

computing environment. Existing virtual machine

migration technology residing on multiple servers onto a

single server at that time the remaining servers can be set

to an energy-saving state. The problem of optimally

consolidating servers in a data center is often formulated

as a variant of the vector bin-packing problem which is

an NP-hard optimization problem.

6.4. Energy management

Improving energy efficiency is another major issue

in cloud computing. Infrastructure providers are under

enormous pressure to reduce energy consumption.

Designing energy-efficient data centers has recently

received considerable attention. This problem can be

from several directions such as Energy efficient

hardware architecture, Energy-aware job scheduling and

server consolidation. In this admiration, the minority

researchers have recently started to investigate

coordinated solutions for performance and power

management in a active cloud environment.

6.5. Traffic management and analysis

Examination of data traffic is significant for today’s

data centers in cloud environment. Network operators

also need to know how traffic flows through the network

in order to make many of the management and planning

decisions. Currently, there is not much work on

measurement and analysis of data center traffic.

6.6. Data security

Data security is another important research topic in

cloud computing. Since service providers typically do

not have access to the physical security system of data

centers, they must rely on the infrastructure provider to

achieve full data security. The hardware layer must be

trusted using hardware TPM. Secondly, the virtualization

platform must be trusted using secure virtual machine

monitors. VM migration should only be allowed if both

source and destination servers are trusted. Recent work

has been devoted to designing efficient protocols for

trust establishment and management.

Page 4: Starring role of Data Mining in Cloud Computing Paradigm International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882 984 Volume 4, Issue 9, September

www.ijsret.org

985 International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882

Volume 4, Issue 9, September 2015

6.7. Storage technologies and data management

These file systems are different from traditional

distributed file systems in their storage structure, access

pattern and application programming interface. In

particular, they do not implement the standard POSIX

interface, and therefore introduce compatibility issues

with legacy file systems and applications. Several

research efforts have studied this problem but not yet

optimum.

6.8. Novel cloud architectures

Now, most of the commercial clouds are

implemented in large data centers and operated in a

centralized fashion. Although this design achieves

economy-of-scale and high manageability, it also comes

with its limitations such high energy expense and high

initial investment for constructing data centers. Recent

researchers suggests that small size data centers can be

more advantageous than big data centers in many cases:

a small data center does not consume so much power,

hence it does not require a powerful and yet expensive

cooling system; small data centers are cheaper to build

and better geographically distributed than large data

centers. The researchers think to build Nano-Data centers

with full pledged manner.

The Data mining technologies provided throughout

Cloud computing is an extremely essential trait for

today’s businesses to make proactive, knowledge driven

decisions, as it helps them have future trends and

behaviors predicted. This chapter provides an overview

of the necessity and utility of data mining in cloud

computing. As the need for data mining tools is growing

every day, the aptitude of integrating them in cloud

computing becomes progressively stringent. The current

technologies are not matured enough to realize its full

potential. In future the researchers highly deliberate the

data mining technologies to implement in fog computing.

7. Conclusion

Popularity of cloud computing is increasing day by day

in distributed computing environment. There is a

growing trend of using cloud environments for storage

and data processing needs. To use the full potential of

cloud computing, data is transferred, processed, retrieved

and stored by external cloud providers. Data owners are

very skeptical to place their data outside their own

control sphere. Their main concerns are the

confidentiality, integrity, security and methods of mining

the data from the cloud.

References [1] Ananthanarayanan R, Gupta K et al (2009) Cloud

analytics: do we really need to reinvent the storage

stack? In: Proc of HotCloud.

[2] Armbrust M et al (2009) Above the clouds: a

Berkeley view of cloud computing. UC Berkeley

Technical Report

[3] Berners-Lee T, Fielding R, Masinter L (2005) RFC

3986: uniform resource identifier (URI): generic syntax,

January 2005

[4] Bodik P et al (2009) Statistical machine learning

makes automatic control practical for Internet

datacenters. In: Proc HotCloud

[5] Jiawei Han and Micheline Kamber, “Data Mining

Concepts and Techniques” Second Edition, Elsevier,

Reprinted 2008.

[6] Rimmy Chuchra, Mahak Jindal, Bharti Mehta, “Role

of Component Based Systems in Data Mining & Cloud

Computing”, International Journal of Emerging

Technology and Advanced Engineering (IJETAE), Vol.3,

Issue 5, May 2013, pp.513-517.

[7] T.V. Mahendra, N. Deepika, N. Keasava Rao, “Data

Mining for High Performance Data Cloud using

Association Rule Mining”, International Journal of

Advanced Research in Computer Science and Software

Engineering, Vol. 2, Issue 1, January 2012.

[8] Naskar Ankita, Mishra Monika R., “Using Cloud

Computing to Provide Data Mining Analysis”,

International Journal of Engineering and Computer

Science, Vol.2, Issue 3, March 2013, pp.545-550.

[9] Ruxandra-Stefania PETRE, “Data Mining in Cloud

Computing”, Database SystemsJournal, Vol.3, No.3,

2012, pp.67-71.

[10] Shu-Chuan Chen1 and Chi-Ming Tsou, “A study on

Time series Data mining based on the concepts and

principles of Chinese IChing”, African Journal of

MarketingManagement, Vol. 4(1), January 2012, pp.1-

16.

[11] Qi Zhang, Lu Cheng, Raouf Boutaba, “Cloud

computing: state-of-the-art and research challenges”, J

Internet Serv Appl (2010) 1: 7–18, DOI 10.1007/s13174-

010-0007-6

[12] Alawode A. Olaide,” On Modeling Confidentiality

Archetype and Data Mining in Cloud Computing”,

African Journal of Computing & ICT, Vol. 6. No. 1, pp-

79-86, March 2013