Big Data and Use Cases


Posted on 18-Jul-2015








BIG DATA AND USE CASES
Bhaskara Reddy Sannapureddy, Senior Project Manager @ Infosys, +91-7702577769

Agenda
The objectives of this presentation are to discuss the benefits of big data and to present use cases and case studies that demonstrate the value of advanced analytics. It also explains how the existing data warehousing environment can be extended to support big data solutions. Topics covered include:
- Review the history and evolution of big data and advanced analytics
- Explain the role of the data scientist in developing advanced analytics
- Look at the technologies that support big data
- Discuss big data use cases and the benefits they bring to the business

The Evolution of Digital Data

Data Growth

Data Growth: Multi-Structured Data
Definition: data that has unknown, ill-defined, or overlapping schemas.
- Machine-generated data, e.g., sensor data, system logs
- Internal/external web content, including social computing data
- Text, document, and XML data
- Graph, map, and multimedia data
- Volume increasing faster than structured data
- Usually not integrated into a data warehouse
- Increasing number of analytical techniques to extract useful information from this data
- This information can be used to extend traditional predictive models and analytics

Changing World: BI Analytics

Advanced Analytics
- Improved analytic tools and techniques for statistical and predictive analytics
- New tools for exploring and visualizing new varieties of data
- Operational intelligence with embedded BI services and BI automation

Data Management
- Analytic relational database systems that offer improved price/performance and libraries of analytic functions
- In-memory computing for high performance
- Non-relational systems such as Hadoop for handling new types of data
- Stream processing/CEP systems for analyzing in-motion data
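To make the last point concrete: a stream-processing/CEP engine watches data in motion and reacts to patterns as events arrive, rather than querying data at rest. The deck contains no code, so here is a minimal Python sketch of that idea, using a rolling window over sensor readings; the window size, threshold, and data are illustrative, not from the presentation.

```python
from collections import deque

def rolling_alerts(readings, window=5, threshold=2.0):
    """Flag readings that exceed `threshold` times the rolling-window
    average -- a toy version of the pattern detection a stream
    processing/CEP engine performs on in-motion data."""
    recent = deque(maxlen=window)   # only the most recent `window` values
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window and value > threshold * (sum(recent) / window):
            alerts.append((i, value))
        recent.append(value)
    return alerts

# A spike at index 6 stands out against the steady baseline.
print(rolling_alerts([10, 11, 10, 12, 11, 10, 55, 11]))  # -> [(6, 55)]
```

A production CEP system would evaluate such windows continuously over many event streams at once; the single pass above only shows the shape of the computation.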

Advanced Analytics
1. Descriptive analytics: using historical data to describe the business. This is usually associated with Business Intelligence (BI) or visibility systems. In the supply chain, you use descriptive analytics to better understand your historical demand patterns, to understand how product flows through your supply chain, and to understand when a shipment might be late.
2. Predictive analytics: using data to predict trends and patterns. This is commonly associated with statistics. In the supply chain, you use predictive analytics to forecast future demand or to forecast the price of fuel.
3. Prescriptive analytics: using data to suggest the optimal solution. This is commonly associated with optimization. In the supply chain, you use prescriptive analytics to set your inventory levels, schedule your plants, or route your trucks.

Data Science Skill Requirements
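The three analytics levels above can be sketched end to end on one toy supply-chain example. The demand figures, trend projection, and 20% safety-stock rule below are all illustrative assumptions (the deck names no numbers); Python is used since the presentation contains no code of its own.

```python
import statistics

# Illustrative monthly demand history (units shipped).
monthly_demand = [100, 120, 130, 125, 140, 150]

# 1. Descriptive analytics: summarize what has happened.
average = statistics.mean(monthly_demand)

# 2. Predictive analytics: naive least-squares trend, projected one month ahead.
n = len(monthly_demand)
mean_x = (n - 1) / 2
slope = sum((i - mean_x) * (d - average) for i, d in enumerate(monthly_demand)) \
        / sum((i - mean_x) ** 2 for i in range(n))
forecast = average + slope * (n - mean_x)

# 3. Prescriptive analytics: turn the forecast into a decision,
#    here a reorder quantity with an assumed 20% safety stock.
reorder_qty = round(forecast * 1.2)

print(average, round(forecast, 1), reorder_qty)  # -> 127.5 158.0 190
```

Real systems replace each step with far richer models (BI dashboards, statistical forecasting, optimization solvers), but the describe/predict/decide progression is the same.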

BIG DATA Application Examples

In Memory DATA & Analytics

In Memory Data


Hadoop Components

Stream Processing/CEP

BIG DATA Benefits

Use Cases And Application Examples

Use Cases Vs Technologies

RDBMS + Hadoop

Hybrid Option 1: Techniques

Hybrid Option 1: Product Examples

R

R is an integrated suite of software facilities for data manipulation, calculation, and graphical display. This open source scripting language has become an important part of the IT arsenal for analysts and data scientists conducting statistical analysis on big data. There are now about 2 million R users worldwide who utilize thousands of open source packages within the R ecosystem to enhance productivity in such domains as bioinformatics, spatial statistics, financial market analysis, and linear/non-linear modeling.

While many developers and analysts run R programs on their personal computers, sometimes they need to do advanced computations on large amounts of data in a short period of time. To assist these developers, Oracle has created a broad set of options for conducting statistical and graphical analyses on data stored in Hadoop or Oracle Database, bringing enterprise-caliber capabilities to projects that require high levels of security, scalability, and performance.

Oracle Advanced Analytics and Oracle R Connector for Hadoop combine the advantages of R with the power and scalability of Oracle Database and Hadoop. R programs and libraries can be used in conjunction with these database assets to process large amounts of data in a secure environment. Customers can build statistical models and execute them against local data stores, as well as run R commands and scripts against data stored in a secure corporate database.

Oracle Advanced Analytics links R to Oracle Database. An option of Oracle Database Enterprise Edition, it empowers data and business analysts to extract knowledge, discover new insights, and make predictions, working directly with large data volumes in the Oracle Database.
Oracle Advanced Analytics offers a combination of powerful in-database algorithms and open source R algorithms, accessible via the SQL and R languages, and provides a range of GUI and IDE options targeting the spectrum from business users to data scientists.

In addition, Oracle R Connector for Hadoop, one of the Oracle Big Data Connectors, links R to Hadoop. If you have data spread out in Hadoop Distributed File System (HDFS) clusters, you can use tools like Hive to lend structure to the data and create a table-like environment. Analysts can then run queries against the HDFS files and execute the latest R packages inside the database.

Oracle Advanced Analytics can be used in conjunction with a multi-threaded version of open source R with enhanced mathematical libraries. Oracle R Distribution can be readily enhanced with high-performance libraries such as Intel's MKL for enhanced linear algebra and matrix processing. Oracle R Distribution is supported by Oracle. It uses in-database statistical techniques to optimize data transport and can leverage data-parallelism constructs within the database for enhanced scalability and performance.

Run R with Hadoop
Oracle R Connector for Hadoop (ORCH) is an R package that provides transparent access to Hadoop and data stored in HDFS. This Oracle Big Data Connector enables users to run R models efficiently against large volumes of data, as well as leverage MapReduce processes without having to leave the R environment. They can use R to analyze data stored in HDFS with Oracle-supplied analytics, as well as utilize over 3,500 open source R packages. ORCH enables R scripts to run on data in Hive tables and files in HDFS by leveraging the transparency layer as defined in Oracle R Enterprise. Hadoop-based R programs can be deployed on a Hadoop cluster.
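ORCH itself is an R package with an Oracle-specific API, so rather than guess at its calls, here is a language-neutral Python sketch of the map/reduce pattern it wraps: each input split is processed independently (the map phase, which Hadoop parallelizes across the cluster), and the partial results are then merged (the reduce phase). The splits and word-count task below are illustrative.

```python
from collections import Counter
from functools import reduce

# Each "split" stands in for one block of an HDFS file.
splits = [
    "credit card payment late",
    "payment on time credit ok",
    "late payment flagged",
]

# Map phase: count words in each split independently.
# In Hadoop, these mapper tasks run in parallel on separate nodes.
mapped = [Counter(split.split()) for split in splits]

# Reduce phase: merge the partial counts into one final result.
totals = reduce(lambda a, b: a + b, mapped)

print(totals["payment"])  # -> 3
```

The value of a connector like ORCH is that an analyst expresses only the per-split logic and the merge, while the framework handles distribution, scheduling, and fault tolerance.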
Users don't need to know about Hadoop internals, command-line interfaces, or the IT infrastructure to create, run, and store these R scripts.

Big Data Use Cases with Oracle Database and Hadoop

Use Case 1: Analyzing Credit Risk

Banks continually offer new services to their customers, but the terms of these offers vary based on each customer's credit status. Do they pay the minimum amount due on credit balances, or more? Are their payments ever late? How much of their credit lines do they use, and how many other credit lines do they have? What is the overall debt-to-income ratio?

All of these variables influence policies about how much credit to award to each customer, and what type of terms to offer them. A bureau like Equifax or TransUnion examines an individual's overall credit history. But banks can examine a much more detailed set of records about their customers, down to the level of every discrete transaction. They need big data analytics to get down to this level of precision with this volume of data.

For example, one Oracle customer in Brazil is running multiple neural network algorithms against hundreds of millions of records to examine thousands of attributes about each of its customers. Previously the bank had trouble crunching this massive volume of data to generate meaningful statistics. They solved this problem by running a specialized algorithm using Oracle R Connector for Hadoop to analyze this data in parallel on the same cluster that is running the Hadoop file system, Hive, and other tools. The Oracle R Connector for Hadoop enables analysts to execute R analysis, statistics, and models on tables stored in the bank's large Hadoop file systems. They can now run complex statistical algorithms against these file systems and Hive tables. They can also use specialized Mahout algorithms to perform unique analyses.

The Oracle algorithms are transparent to R users and accessed with a couple of lines of code. Behind the scenes, ORCH provides the interface for executing MapReduce jobs in parallel on multiple processors throughout the bank's cluster.
Analysts can create these MapReduce algorithms in R and store them in Hadoop, as well as easily surface these models for review, plotting, and analysis, and then push them to the database, without having to utilize Java.

Big Data Use Cases with Oracle Database and Hadoop

Use Case 2: Detecting Fraud

Another popular use case involves detecting fraud by analyzing financial transactions. Banks, retailers, credit card companies, telecommunications firms, and many other large organizations wrestle with this issue. When scoring fraud you typically study transactions as they occur within customer accounts. (Scoring refers to predicting outcomes using a data-mining model.)

Once you understand the modus operandi, the normal mode of operations, you can then recognize unusual patterns and suspicious transactions. For example, if you normally shop in Los Angeles and there is a sudden series of transactions in Rome, this would indicate a high likelihood of fraud. Or would it? If you are somebody who travels a lot, is a surge of activity in Rome an anomaly or a regular pattern? By capturing all previous transactions and studying these patterns you can develop a model that reflects normal behavior. R has the ideal algorithms and environment f
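The scoring idea described above, flag a transaction when it deviates sharply from a customer's normal pattern, can be sketched with a simple z-score. The transaction amounts, the 3-sigma cutoff, and the single-feature model below are illustrative assumptions; real fraud models score many attributes (location, merchant, timing) at once, and the deck's examples use R rather than the Python shown here.

```python
import statistics

def anomaly_score(history, amount):
    """Score a new transaction against a customer's history: how many
    standard deviations it sits from the historical mean (a z-score)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (amount - mean) / stdev

history = [42, 38, 55, 47, 51, 44, 40, 49]   # typical purchase amounts
score = anomaly_score(history, 900)           # sudden large foreign charge

# Illustrative rule: anything beyond 3 standard deviations is suspicious.
print(score > 3)  # -> True
```

A frequent-traveler profile would be handled by enriching the history (e.g., per-region baselines) so that a surge of activity abroad scores as normal for that customer, exactly the Rome question raised above.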