cloud-based big data analytics


Upload: sateeshreddy-n

Post on 13-Apr-2017

Category:

Engineering



Page 1: Cloud-Based Big Data Analytics

CLOUD-BASED BIG DATA ANALYTICS

Page 2: Cloud-Based Big Data Analytics

INTRODUCTION:

• With the advent of the digital age, the amount of data being generated, stored and shared has been on the rise. Data warehouses, social media, webpages and blogs, and audio/video streams are all sources of massive amounts of data.
• This data has huge potential, but it also brings ever-increasing complexity, insecurity and risks, and irrelevance.

Page 3: Cloud-Based Big Data Analytics

INTRODUCTION: CNTD…
• Big data, by definition, is a term used to describe a variety of data (structured, semi-structured and unstructured), which makes it a complex data infrastructure.
• Big data is characterized by variety, volume, velocity and veracity.
• The different types of data available in a dataset determine Variety, while the rate at which data is produced determines Velocity.
• The size of the data is called Volume.
• Veracity indicates data reliability.

Page 4: Cloud-Based Big Data Analytics

INTRODUCTION: CNTD…
• The cloud computing environment offers development, installation and implementation of software and data applications ‘as a service’.
• Software as a service (SaaS)
• Platform as a service (PaaS)
• Infrastructure as a service (IaaS)
• Infrastructure-as-a-service is a model that provides computing and storage resources as a service.
• In the case of PaaS and SaaS, the cloud service provides a software platform or the software itself as a service to its clients.

Page 5: Cloud-Based Big Data Analytics

LITERATURE SURVEY:
• Traditional data management tools and data processing or data mining techniques cannot be used for Big Data Analytics because of the large volume and complexity of the datasets involved.
• Conventional business intelligence applications rely on traditional analytics methods and techniques, making use of OLAP, BPM, mining and database systems like RDBMS.
• One of the most popular models used for data processing on a cluster of computers is MapReduce.
• Hadoop is an open-source implementation of the MapReduce framework, combined with a distributed file system (HDFS).
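The MapReduce model mentioned above can be illustrated with a minimal single-machine sketch. This is not Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and only mimic the three stages that a real cluster would distribute across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cloud"]))
```

In a real deployment the map and reduce calls would run in parallel on different nodes, and the shuffle would move data over the network; the sketch only shows the data flow.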

Page 6: Cloud-Based Big Data Analytics

PROBLEM STATEMENT:
• In order to move beyond the existing techniques and strategies used for machine learning and data analytics, some challenges need to be overcome. NESSI identifies the following requirements as critical.
• In order to select an adequate method or design, a solid scientific foundation needs to be developed.
• New efficient and scalable algorithms need to be developed.
• For proper implementation of devised solutions, appropriate development skills and technological platforms must be identified and developed.
• Lastly, the business value of the solutions must be explored just as much as the data structure and its usability.

Page 7: Cloud-Based Big Data Analytics

PROBLEM STATEMENT: CNTD…
• This section describes two example applications where large-scale data management over the cloud is used. These are specific use-case examples in telecom and finance.
• In the telecom domain, massive amounts of call detail records can be processed to generate near real-time network usage information.
• In the finance domain, the example is a fraud detection application.

Page 8: Cloud-Based Big Data Analytics

DESIGN, IMPLEMENTATION AND RESULT ANALYSIS DETAILS:
1. Dashboard for CDR Processing:
• Telecom operators are interested in building a dashboard that would allow analysts and architects to understand the traffic flowing through the network along various dimensions of interest.
• The traffic is captured using Call Detail Records (CDRs), whose volume runs into a terabyte per day.
• A CDR is a structured stream generated by the telecom switches to summarize various aspects of individual services like voice, SMS, MMS, etc.
• The dashboard's tasks include determining the cell site used most by each customer, identifying whether users are mostly making calls within a cell site, and, for cell sites in rural areas, identifying the source of traffic, i.e. local versus routed calls.
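Two of the dashboard questions above (the most-used cell site per customer, and how much traffic stays within a cell site) can be sketched as simple aggregations. The CDR fields used here (customer_id, origin_site, dest_site) are assumptions for illustration; real CDR layouts are switch-specific and much richer.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CDR:
    """Hypothetical, heavily simplified call detail record."""
    customer_id: str
    origin_site: str  # cell site where the call started
    dest_site: str    # cell site where the call terminated


def most_used_site(cdrs: list[CDR]) -> dict[str, str]:
    """Return, per customer, the originating cell site used most often."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for cdr in cdrs:
        counts[cdr.customer_id][cdr.origin_site] += 1
    return {cust: sites.most_common(1)[0][0] for cust, sites in counts.items()}


def local_call_share(cdrs: list[CDR]) -> dict[str, float]:
    """Return, per originating site, the fraction of calls ending at the same site."""
    total: Counter = Counter()
    local: Counter = Counter()
    for cdr in cdrs:
        total[cdr.origin_site] += 1
        if cdr.origin_site == cdr.dest_site:
            local[cdr.origin_site] += 1
    return {site: local[site] / total[site] for site in total}
```

At a terabyte per day these aggregations would be computed incrementally or on a cluster rather than over an in-memory list; the sketch only fixes what is being counted.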

Page 9: Cloud-Based Big Data Analytics

DESIGN, IMPLEMENTATION AND RESULT ANALYSIS DETAILS:
1. Dashboard for CDR Processing: CNTD…
• Given the huge and ever-growing customer base and large call volumes, solutions using a traditional warehouse will not be able to keep up with the rates required for effective operation.
• The need is to process the CDRs in near real-time, mediate them (i.e., collect CDRs from individual switches, stitch, validate, filter, and normalize them), and create various indices which can be exploited by the dashboard, among other applications.
• A system based on IBM Stream Processing Language (SPL) can mediate 6 billion CDRs per day.
• CDRs can be loaded periodically into a cloud data management solution. Because the cloud provides flexible storage, the storage required can be adjusted to the traffic.
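The mediation steps listed above can be sketched as a chain of generators. This is an illustrative Python stand-in, not SPL; the record fields (caller, callee, duration_s) and the rules applied at each step are assumptions. Stitching of partial records is left out because it depends on switch-specific record formats.

```python
from itertools import chain
from typing import Iterable, Iterator

# Assumed minimal schema; real CDRs carry many more fields.
REQUIRED_FIELDS = ("caller", "callee", "duration_s")


def collect(*switch_feeds: Iterable[dict]) -> Iterator[dict]:
    """Collect: merge the record feeds of individual switches into one stream."""
    return chain.from_iterable(switch_feeds)


def validate(records: Iterable[dict]) -> Iterator[dict]:
    """Validate: keep only records in which every required field is present."""
    for rec in records:
        if all(rec.get(field) not in (None, "") for field in REQUIRED_FIELDS):
            yield rec


def filter_records(records: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop zero-length calls (an assumed business rule)."""
    for rec in records:
        if rec["duration_s"] > 0:
            yield rec


def digits_only(number: str) -> str:
    """Strip everything except digits from a phone number string."""
    return "".join(ch for ch in number if ch.isdigit())


def normalize(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize: bring phone numbers into one digits-only format."""
    for rec in records:
        yield {
            **rec,
            "caller": digits_only(rec["caller"]),
            "callee": digits_only(rec["callee"]),
        }


def mediate(*switch_feeds: Iterable[dict]) -> list[dict]:
    """Run collect, validate, filter and normalize in sequence."""
    return list(normalize(filter_records(validate(collect(*switch_feeds)))))
```

For scale, 6 billion CDRs per day averages roughly 69,000 records per second (6,000,000,000 / 86,400), which is why a streaming design that never materializes the full data set is attractive here.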

Page 10: Cloud-Based Big Data Analytics

DESIGN, IMPLEMENTATION AND RESULT ANALYSIS DETAILS:
2. Credit Card Fraud Detection:
• More than one-tenth of the world’s population is shopping online. Credit cards are the most popular mode of online payment. As the number of credit card transactions rises, the opportunities for attackers to steal credit card details and commit fraud are also increasing.
• As the attacker only needs to know some details about the card (card number, expiration date, etc.), the only way to detect online credit card fraud is to analyze the spending patterns and detect any inconsistency with respect to usual spending patterns.
• The companies keep tabs on the geographical locations where the credit card transactions are made. If the area is far from the card holder’s area of residence, or if two transactions from the same credit card are made in two very distant areas within a relatively short timeframe, then the transactions are potentially fraudulent.
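The "two distant areas within a short timeframe" check can be sketched as an implied-speed test between consecutive transactions on the same card. The Transaction fields and the 900 km/h threshold (roughly airliner speed) are assumptions chosen for illustration, not values from the slides.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction tuple with a timestamp and a location."""
    card_id: str
    timestamp_s: float  # seconds since some epoch
    lat: float          # degrees
    lon: float          # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_suspicious(prev: Transaction, curr: Transaction,
                  max_speed_kmh: float = 900.0) -> bool:
    """Flag the pair if the card would have had to travel implausibly fast."""
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    hours = (curr.timestamp_s - prev.timestamp_s) / 3600.0
    if hours <= 0:
        # Simultaneous (or out-of-order) use at two different places.
        return distance > 0
    return distance / hours > max_speed_kmh
```

A flag from this check is only a signal that the transaction is potentially fraudulent; online purchases, for instance, may legitimately report a merchant location far from the card holder.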

Page 11: Cloud-Based Big Data Analytics

DESIGN, IMPLEMENTATION AND RESULT ANALYSIS DETAILS:
2. Credit Card Fraud Detection: CNTD…
• Various data mining algorithms are used to detect patterns within the transaction data. Detecting these patterns requires the analysis of large amounts of data.
• Using tuples of the transactions, one can find the distance between the geographic locations of two consecutive transactions, the amounts of these transactions, etc. From these parameters, one can identify potentially fraudulent transactions. Further data mining, based on a particular user’s spending profile, can be used to increase confidence that a transaction is indeed fraudulent.
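One simple way to compare a transaction against a user's spending profile is a z-score of the amount relative to that user's past amounts. The slides do not name a specific algorithm, so this is a swapped-in illustration; the function names and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def amount_zscore(history: list[float], amount: float) -> float:
    """Number of sample standard deviations `amount` lies from the user's mean."""
    if len(history) < 2:
        return 0.0  # too little history to judge
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts identical: any deviation is infinitely unusual.
        return 0.0 if amount == mean(history) else float("inf")
    return (amount - mean(history)) / sigma


def is_amount_outlier(history: list[float], amount: float,
                      threshold: float = 3.0) -> bool:
    """Flag amounts far outside the user's usual spending range."""
    return abs(amount_zscore(history, amount)) > threshold
```

In practice such a score would be combined with other signals, such as the distance between consecutive transactions, rather than used alone.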

Page 12: Cloud-Based Big Data Analytics

DESIGN, IMPLEMENTATION AND RESULT ANALYSIS DETAILS:
2. Credit Card Fraud Detection: CNTD…
• As the number of credit card transactions is huge and the kind of processing required is not typical relational processing (hence, warehouses are not optimized for it), one can use a Hadoop-based solution for this purpose, as depicted.
• Using Hadoop, one can create customer profiles as well as matrices of consecutive transactions to decide whether a particular transaction is fraudulent. As the fraud needs to be found within some specified time, stream processing can help.
• By employing massive resources for analyzing potentially fraudulent transactions, one can meet the response time guarantees.

Page 13: Cloud-Based Big Data Analytics

DESIGN, IMPLEMENTATION AND RESULT ANALYSIS DETAILS:
3. Result Analysis:
• Several open-source data mining techniques, resources and tools exist. These include R, GATE, RapidMiner and Weka, in addition to many others.
• Cloud-based big data analytics solutions must make these affordable data analytics tools available on the cloud so that cost-effective and efficient services can be provided.
• The fundamental reasons why cloud-based analytics has gained such prominence are its easy accessibility, cost-effectiveness and ease of setting up and testing.

Page 14: Cloud-Based Big Data Analytics
Page 15: Cloud-Based Big Data Analytics

CONCLUSION AND FUTURE RESEARCH DIRECTION:
• This is an age of big data, and the emergence of this field of study has attracted the attention of many practitioners and researchers.
• Considering the rate at which data is being created in the digital world, big data analytics has become all the more relevant.
• The cloud infrastructure satisfies the storage and computing requirements of data analytics algorithms. On the other hand, open issues like security, privacy and the lack of ownership and control exist.
• Research studies in the area of cloud-based big data analytics aim to create an effective and efficient system that addresses the identified risks and concerns.

Page 16: Cloud-Based Big Data Analytics

THANK YOU