Data Mining—Why is it Important?
Data mining starts with the client. Clients naturally collect data simply by doing business, so that is where the entire process begins. But Customer Relationship Management (CRM) data is only one part of the puzzle. The other part is competitive data, industry survey data, blogs, and social media conversations. By themselves, CRM data and survey data can provide very good information, but combined with the other available data they are far more powerful. Data Mining is the process of analyzing and exploring that data to discover patterns and trends.
The term Data Mining is one that is used frequently in the research world, but it is often misunderstood by many people. Sometimes people misuse the term to mean any kind of extraction of data or data processing. However, data mining is so much more than simple data analysis. According to Doug Alexander at the University of Texas, data mining is, “the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.”
Data mining consists of five major elements:
1) Extract, transform, and load transaction data onto the data warehouse system.
2) Store and manage the data in a multidimensional database system.
3) Provide data access to business analysts and information technology professionals.
4) Analyze the data by application software.
5) Present the data in a useful format, such as a graph or table.
This technique is a game changer in the world of statistical analysis and business. It is important in this realm because it can make predictions that older analysis techniques were simply not capable of making. This table from thearling.com may help explain how data analysis has evolved and differed through the years:
Evolutionary Step: Data Collection (1960s)
Business Question: “What was my total revenue in the last five years?”
Enabling Technologies: Computers, tapes, disks
Product Providers: IBM, CDC
Characteristics: Retrospective, static data delivery

Evolutionary Step: Data Access (1980s)
Business Question: “What were unit sales in New England last March?”
Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
Product Providers: Oracle, Sybase, Informix, IBM, Microsoft
Characteristics: Retrospective, dynamic data delivery at record level

Evolutionary Step: Data Warehousing & Decision Support (1990s)
Business Question: “What were unit sales in New England last March? Drill down to Boston.”
Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
Product Providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
Characteristics: Retrospective, dynamic data delivery at multiple levels

Evolutionary Step: Data Mining (Emerging Today)
Business Question: “What’s likely to happen to Boston unit sales next month? Why?”
Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
Product Providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
Characteristics: Prospective, proactive information delivery
Table 1. Steps in the Evolution of Data Mining.
Data Mining can be used in many different sectors of business to both predict and discover trends. It is a proactive solution for businesses looking to gain a competitive edge. In the past, we were only able to analyze what a company’s customers or clients HAD DONE, but now, with the help of Data Mining, we can predict what clientele WILL DO.
With Data Mining, companies can make better and more effective business decisions – marketing, advertising, etc – decisions that will help these companies grow.
For more information about how Data Mining can help discover trends and patterns in your market, contact the market research specialists at The Research Group by calling 410-332-0400.
Qualitative market research utilizes the disciplines of psychology and sociology to garner emotive insights that drive behavior, and importantly influence decisions. The Research Group’s team of seasoned researchers will assist you in turning those insights into opportunities.
3 Reasons Why Data Mining is (almost) Dead
Data Mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and
summarizing it into useful information. As the term suggests, the data is mined or queried for insight. For example, retailers use data mining
techniques to do basket analysis (customers who bought this also
bought that) and to further understand what other factors influence a purchase.
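The basket analysis mentioned above ("customers who bought this also bought that") can be sketched in a few lines. This is only a minimal illustration: the baskets are invented for the example, and the function name `co_purchase_counts` is hypothetical rather than taken from any retailer's system.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives every pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions, made up for illustration only.
baskets = [
    ["hawaiian shirt", "sunglasses", "sandals"],
    ["hawaiian shirt", "sunglasses"],
    ["sandals", "sunscreen"],
    ["hawaiian shirt", "sunscreen", "sunglasses"],
]

pair_counts = co_purchase_counts(baskets)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Real basket analysis usually goes further (support, confidence, lift), but pair counts like these are the raw material those measures are built on.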
Traditionally, data mining has consisted of analysts generating questions to feed to a database in the hope of finding an answer. This
could be something like asking the data belonging to a clothing retailer, “Are customers buying Hawaiian shirts in Atlanta?” Sounds
very applicable, especially when it comes to the hype around Big Data, doesn’t it?
Applicable, yes. Effective? Not so much.
Given today’s explosion of “Big Data,” companies need more
advanced methods for leveraging their data – methods that don’t rely solely on tribal knowledge, personal experience or best guesses.
What’s needed are new technologies and purpose-built solutions that reveal answers to questions no one even knew to ask.
That leads me to the three main reasons why traditional data mining methods are going the way of the dodo:
1. The current volume of data is unprecedented. In fact, 15 of 17 sectors in the U.S. have more data stored per company than the entire U.S. Library of Congress. According to IDC, in 2015, an estimated 7.9 zettabytes of data will be produced and replicated – the equivalent of 18 million Libraries of Congress. With these massive data sets, it’s close to impossible to figure out what to query. The number of possible queries grows exponentially with the number of data elements. Should I query about customers buying shirts in Atlanta? Or in summer? Or in summer with a Coke? Or with a hot dog? The list is endless. As one of my customers said – “I do not know what questions to ask. Therein is the limitation!” The breadth and depth of this “big” data makes querying seem like trying to strike oil while digging with a toothpick.
2. Added to volume is velocity of the data. The data is piling up faster
and faster. A company encounters a continuous stream of real-time
data – social media updates, customer feedback, sales figures,
financial data, supply chain data, product quality data, product
monitoring data and on and on and on. There’s simply not enough
time to manually query the data – it’s like a physician trying to
diagnose thousands of patients at the same time. The data must
constantly inform the end-user – i.e., diagnose itself and recommend a
treatment – for it to be of any strategic value.
3. As I’ve already discussed, conventional data mining techniques are
driven by the analyst – or group of people – tasked with coming up
with a hypothesis, which is subjective and vulnerable to personal bias
and human error. Given the amount of information that’s out there,
asking the right question every time is becoming more and more of a
challenge because even the smartest, most experienced analysts
“don’t know what they don’t know.” Querying methods are seriously
biased by what the analyst thinks to ask. Again, going back to the
striking oil analogy, if the analyst thinks there is oil under a certain
rock, that is the only place he will dig. He could be sitting on a gold
mine 50 feet away, but he’d completely miss it.
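The first reason's claim about the number of queries can be made concrete with a little arithmetic. As a simplifying assumption (not the author's own model), treat each data element as a yes/no filter that a query either includes or leaves out; the number of distinct non-empty filter combinations over n elements is then 2^n - 1. The function name `possible_queries` is hypothetical.

```python
def possible_queries(num_attributes):
    """Number of non-empty combinations of yes/no filters over the attributes."""
    if num_attributes < 0:
        raise ValueError("num_attributes must be non-negative")
    return 2 ** num_attributes - 1


for n in (3, 10, 20, 40):
    print(f"{n:>2} attributes -> {possible_queries(n):,} candidate queries")
```

Even under this crude model, 40 yes/no attributes already give over a trillion candidate questions, which is the sense in which an analyst cannot enumerate them by hand.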
Data mining is limited to manual endeavors – why limit company
success to antiquated methods that by design fail to leverage the data for all it’s worth? It’s time to usher in new methods – new technologies
– for transforming the enterprise from reactive – based on guesstimates, hunches, and flawed insight – to proactive – based on
data-driven, actionable insight.
CMMI
Maturity Level 1, called "Initial", is characterized by "Heroic Efforts". The CMMI identifies no Process Areas at this level. You automatically achieve this level if you can design, develop, integrate, and test. Organizations at Maturity Level 1 are sometimes successful, and sometimes not.
Maturity Level 2, called "Managed", is characterized by "Basic Project Management". The seven Process Areas at Maturity Level 2 all deal with management, rather than technical issues:
Maturity Level 3, called "Defined", is characterized by "Process Standardization". This is where the bulk of the Process Areas reside in the CMMI. We find that these Process Areas fall into three main categories:
Technical – The first five Process Areas (Requirements Development, Technical Solution, Product Integration, Verification, and Validation) deal with the technical engineering work.
Process Management – The next three Process Areas (Organizational Process Focus, Organizational Process Definition, and Organizational Training) provide the infrastructure for maintaining and improving the organization's processes.
Management – The last six Process Areas (Integrated Product Management, Risk Management, Integrated Teaming, Integrated Supplier Management, Decision Analysis & Resolution, and Organizational Environment for Integration) all build more management discipline on top of the basic management Process Areas established at Maturity Level 2.
Maturity Level 4, called "Quantitatively Managed", is characterized by "Quantitative Management". With the disciplined processes established at Maturity Levels 2 and 3, the organization is now in the position to be able to gain a statistical, numbers-based understanding of its performance, and use that understanding to "manage by fact". The two Process Areas at Maturity Level 4 (Organizational Process Performance and Quantitative Project Management) apply this capability for statistical management to understand the quality of both the processes the organization uses and the products it produces.
Maturity Level 5, called "Optimizing", is characterized by "Continuous Process Improvement". Built on the disciplined processes of Maturity Levels 2 and 3, and the quantitative understanding of Maturity Level 4, the two Process Areas at Maturity Level 5 (Organizational Innovation & Deployment and Causal Analysis & Resolution) put the organization on the path of ever-improving performance by understanding and correcting the root causes of problems, and by fostering an environment of innovation and creativity.
Why Do People Believe the CMMI Has Little Value?
The CMM and CMMI have received a lot of bad press over the years. Most of that bad press can be traced to one of two things: misunderstandings and abuses.
Misunderstandings. Many people who open the CMMI book are immediately overwhelmed by the volume of information: five Maturity Levels, two Generic Goals, 12 Generic Practices, 25 Process Areas, 55 Specific Goals, 185 Specific Practices, hundreds of Sub-Practices—nearly a thousand pages in all! It is hard to blame them for feeling that this model must be way too restrictive to be applicable to a real-life organization.
Naturally, if your organization is not under a mandate to achieve a Maturity Level rating, then the Practices, and even the Goals in the CMMI take on more of a suggestive flavor. Of course, any organization would do well to take them as exceedingly strong suggestions, given the CMMI’s solid research basis!
Abuses. As we said at the beginning of this paper, the SEI designed the CMMI to be a roadmap for process improvement. But what we have seen in practice is organizations requiring their suppliers to achieve specific Maturity Level ratings. This in turn causes those suppliers to turn to the CMMI simply to achieve a rating, even if they have little or no interest in process improvement. When the CMMI is used by an organization that has no interest in process improvement, its use can (and often does) become abuse. Processes are written solely to satisfy a CMMI Appraiser, but with little or no thought for how they will affect the organization's work. Paperwork grows seemingly without bounds, and people feel that they are drowning in "process for process' sake".
Those five steps seem easy enough. But organizational change actually involves much more work than the simple mechanics of deciding to make a change. The key players in the organization must all agree on the need for change, as well as the strategy to be employed. Garnering the necessary agreement and establishing momentum are major challenges in and of themselves. But those are topics for another white paper.
How can CMMI help?
• CMMI provides a way to focus and manage hardware and software development from product inception through deployment and maintenance.
– ISO/TL9000 are still required. CMMI interfaces well with them. CMMI and TL are complementary - both are needed since they address different aspects.
• ISO/TL9000 is a process compliance standard
• CMMI is a process improvement model
• Behavioral changes are needed at both management and staff levels. Examples:
– Increased personal accountability
– Tighter links between Product Management, Development, SCN, etc.
• Initially a lot of investment required – but, if properly managed, we will be more efficient and productive while turning out products with consistently higher quality.
CMMI Models within the Framework
• Models:
– Systems Engineering + Software Engineering (SE/SW)
– Systems Engineering + Software Engineering + Integrated Product and Process Development (IPPD)
– Systems Engineering + Software Engineering + Integrated Product and Process Development + Supplier Sourcing (SS)
– Software Engineering only
• Representation options:
– Staged
– Continuous
• The CMMI definition of “Systems Engineering” - “The interdisciplinary approach governing the total technical and managerial effort required to transform a set of customer needs, expectations and constraints into a product solution and to support that solution throughout the product’s life.” This includes both hardware and software.
Maturity Level 1: Initial
• Maturity Level 1 deals with performed processes.
• Processes are unpredictable, poorly controlled, reactive.
• The process performance may not be stable and may not meet specific objectives such as quality, cost, and schedule, but useful work can be done.
Maturity Level 2: Managed at the Project Level
• Maturity Level 2 deals with managed processes.
• A managed process is a performed process that is also:
– Planned and executed in accordance with policy
– Employs skilled people
– Adequate resources are available
– Controlled outputs are produced
– Stakeholders are involved
– The process is reviewed and evaluated for adherence to requirements
• Processes are planned, documented, performed, monitored, and controlled at the project level. Often reactive.
• The managed process comes closer to achieving the specific objectives such as quality, cost, and schedule.

[Figure: the five maturity levels]
Level 1, Initial: Processes are unpredictable, poorly controlled, reactive.
Level 2, Managed: Processes are planned, documented, performed, monitored, and controlled at the project level. Often reactive.
Level 3, Defined: Processes are well characterized and understood. Processes, standards, procedures, tools, etc. are defined at the organizational (Organization X) level. Proactive.
Level 4, Quantitatively Managed: Processes are controlled using statistical and other quantitative techniques.
Level 5, Optimizing: Process performance continually improved through incremental and innovative technological improvements.
Maturity Level 3: Defined at the Organization Level
• Maturity Level 3 deals with defined processes.
• A defined process is a managed process with these properties:
– Well defined, understood, deployed and executed across the entire organization. Proactive.
– Processes, standards, procedures, tools, etc. are defined at the organizational (Organization X) level. Project or local tailoring is allowed, however it must be based on the organization’s set of standard processes and defined per the organization’s tailoring guidelines.
• Major portions of the organization cannot “opt out.”
Behaviors at the Five Levels
Maturity Level | Process Characteristics | Behaviors
Initial | Process is unpredictable, poorly controlled, and reactive | Focus on "fire fighting"; effectiveness low – frustration high.
Managed | Process is characterized for projects and is often reactive | Over reliance on experience of good people – when they go, the process goes. “Heroics.”
Defined | Process is characterized for the organization and is proactive | Reliance on defined process. People understand, support and follow the process.
Quantitatively Managed | Process is measured and controlled | Greater sense of teamwork and inter-dependencies
Optimizing | Focus is on continuous quantitative improvement | Focus on "fire prevention"; improvement anticipated and desired, and impacts assessed.
CMMI Components
• Within each of the 5 Maturity Levels, there are basic functions that need to be performed – these are called Process Areas (PAs).
• For Maturity Level 2 there are 7 Process Areas that must be completely satisfied.
• For Maturity Level 3 there are 11 Process Areas that must be completely satisfied.
• Given the interactions and overlap, it becomes more efficient to work the Maturity Level 2 and 3 issues concurrently.
• Within each PA there are Goals to be achieved and within each Goal there are Practices, work products, etc. to be followed that will support each of the Goals.
CMMI Process Areas
Example
For the Requirements Management Process Area:
An example Goal (required): “Manage Requirements”
An example Practice to support the Goal (required): “Maintain bi-directional traceability of requirements”
Examples (suggested, but not required) of typical Work Products might be: Requirements traceability matrix or Requirements tracking system

[Figure: CMMI model structure. Maturity Levels (1-5) contain Process Area 1, Process Area 2, ... Process Area n. Each Process Area has Specific Goals and Specific Practices (required; specific for each process area) and Generic Goals and Generic Practices (required; common across all process areas). Generic Practices are grouped by Common Features: Commitment to Perform, Ability to Perform, Directing Implementation, Verifying Implementation. Supporting material for both: sub practices, typical work products, discipline amplifications, generic practice elaborations, goal and practice titles, goal and practice notes, and references.]
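The "bi-directional traceability" practice and the requirements traceability matrix named above can be pictured as a pair of lookups kept in sync. The following is a minimal sketch only: the class name `TraceabilityMatrix`, the requirement IDs and the file names are all hypothetical, and a real requirements tracking system would hold far more detail.

```python
from collections import defaultdict


class TraceabilityMatrix:
    """Keeps requirement <-> work product links navigable in both directions."""

    def __init__(self):
        self._forward = defaultdict(set)   # requirement -> work products
        self._backward = defaultdict(set)  # work product -> requirements

    def link(self, requirement, work_product):
        # Record the link on both sides so neither direction can drift.
        self._forward[requirement].add(work_product)
        self._backward[work_product].add(requirement)

    def work_products_for(self, requirement):
        return sorted(self._forward.get(requirement, ()))

    def requirements_for(self, work_product):
        return sorted(self._backward.get(work_product, ()))

    def untraced(self, requirements):
        """Requirements that no work product is linked to yet."""
        return [r for r in requirements if not self._forward.get(r)]


# Hypothetical IDs and paths, for illustration only.
matrix = TraceabilityMatrix()
matrix.link("REQ-1", "design/login.md")
matrix.link("REQ-1", "tests/test_login.py")
matrix.link("REQ-2", "tests/test_login.py")

print(matrix.work_products_for("REQ-1"))
print(matrix.requirements_for("tests/test_login.py"))
print(matrix.untraced(["REQ-1", "REQ-2", "REQ-3"]))
```

The backward lookup is what makes the traceability bi-directional: given a changed work product, it answers which requirements may be affected.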
Yet another CMMI term: Institutionalization
• This is the most difficult part of CMMI implementation and the portion where managers play the biggest role and have the biggest impact
• Building and reinforcement of corporate culture that supports methods, practices and procedures so they are the ongoing way of business……..
– Must be able to demonstrate institutionalization of all CMMI process areas for all organizations, technologies, etc.
• Required for all Process Areas
Scenario 1
ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
Solution 1: ABC Pvt Ltd.
Extract sales information from each database.
Store the information in a common repository at a single site.
[Figure: the Mumbai, Delhi, Chennai and Bangalore systems feed a common repository used by the Sales Manager. Report: sales per item type per branch for first quarter.]
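Solution 1 (extract from each branch database, store in one common repository, report per item type per branch) can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the in-memory databases stand in for the branches' separate operational systems, and the table layout, sample rows and function names are hypothetical.

```python
import sqlite3

# Hypothetical branch data: (item_type, sale_date, amount). Invented for illustration.
BRANCH_SALES = {
    "Mumbai": [("dairy", "2024-01-15", 120.0), ("bread", "2024-02-03", 80.0)],
    "Delhi": [("dairy", "2024-03-20", 200.0), ("dairy", "2024-04-02", 50.0)],
}


def make_branch_db(rows):
    """Stand-in for one branch's separate operational system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def consolidate(branch_dbs):
    """Extract sales from every branch database into one common repository."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        repo.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return repo


def first_quarter_report(repo, year):
    """Total sales per item type per branch for January through March."""
    return repo.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
repository = consolidate(branch_dbs)
report = first_quarter_report(repository, 2024)
for row in report:
    print(row)
```

Dates are stored as ISO strings so that plain string comparison selects the quarter; the April sale in Delhi is therefore left out of the report.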
Scenario 2
One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait for some time.
[Figure: the data entry operator and management both use the operational database; the operator waits while a report is produced.]
Solution 2
Extract data needed for analysis from the operational database.
Store it in the warehouse.
Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
Warehouse will contain data with historical perspective.
[Figure: data from Mumbai, Delhi, Chennai and Bangalore feeds the Data Warehouse; the Sales Manager gets reports through Query & Analysis tools.]
Scenario 3
Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.
Solution 3
Improve the quality of data before loading it into the warehouse.
Perform data cleaning and transformation before loading the data.
Use query analysis tools to support ad hoc queries.
[Figure: data entry operators run transactions against the operational database; data is extracted into the sales data warehouse, which the manager uses through a query and analysis tool to get reports.]
What is a Data Warehouse?
Inmon’s definition: A data warehouse is a
- subject-oriented,
- integrated,
- time-variant,
- nonvolatile
collection of data in support of management’s decision making process.
Subject-oriented
Data warehouse is organized around subjects such as sales, product, customer.
It focuses on modeling and analysis of data for decision makers.
Excludes data not useful in decision support process.
Integration
Data Warehouse is constructed by integrating multiple heterogeneous sources.
Data preprocessing is applied to ensure consistency in terms of data:
– encoding structures
– measurement of attributes
– physical attributes of data
– naming conventions
– data type format
[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]
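The consistency issues listed under Integration (encoding structures, measurement of attributes, naming conventions, data type format) can be illustrated with two sources that describe the same facts differently. This is a sketch under stated assumptions: the field names, the gender codes and their mapping, and the choice of centimetres as the warehouse unit are all invented for the example.

```python
# Hypothetical source conventions, assumed for illustration.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
CM_PER_INCH = 2.54


def integrate_record(record, source):
    """Map one source record onto the warehouse's single convention."""
    if source == "rdbms":
        # Assumed RDBMS layout: 'gender' is m/f, 'height_cm' is already in centimetres.
        gender, height_cm = record["gender"], float(record["height_cm"])
    elif source == "legacy":
        # Assumed legacy layout: 'sex' is 1/0, 'height_in' is in inches.
        gender, height_cm = record["sex"], float(record["height_in"]) * CM_PER_INCH
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "gender": GENDER_CODES[str(gender).strip().lower()],  # one encoding
        "height_cm": round(height_cm, 1),                     # one unit, one type
    }


from_rdbms = integrate_record({"gender": "m", "height_cm": "180"}, "rdbms")
from_legacy = integrate_record({"sex": 0, "height_in": 65}, "legacy")
print(from_rdbms)
print(from_legacy)
```

After integration both records share one field naming, one gender encoding, one unit of measurement and one numeric type, whichever system they came from.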
Time-variant
Provides information from historical perspective, e.g. past 5-10 years.
Every key structure contains either implicitly or explicitly an element of time.
Nonvolatile
Data once recorded cannot be updated.
Data warehouse requires two operations in data accessing:
– Initial loading of data
– Access of data
Operational v/s Information System
Features | Operational | Information
Characteristics | Operational processing | Informational processing
Orientation | Transaction | Analysis
User | Clerk, DBA, database professional | Knowledge workers
Function | Day to day operation | Decision support
Data | Current | Historical
View | Detailed, flat relational | Summarized, multidimensional
DB design | Application oriented | Subject oriented
Unit of work | Short, simple transaction | Complex query
Access | Read/write | Mostly read
Focus | Data in | Information out
No. of records accessed | tens | millions
Number of users | thousands | hundreds
DB size | 100 MB to GB | 100 GB to TB
Priority | High performance, high availability | High flexibility, end-user autonomy
Metric | Transaction throughput | Query throughput
[Figure: data warehouse architecture. External sources are extracted, transformed, loaded and refreshed into reconciled data, with monitoring & administration and a metadata repository; OLAP servers serve analysis and query/reporting.]
Data Warehouse Architecture
Data Warehouse server
– almost always a relational DBMS, rarely flat files
OLAP servers
– to support and operate on multi-dimensional data structures
Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
Data Warehouse Schema
Star Schema
Fact Constellation Schema
Snowflake Schema
Star Schema
A single, large and central fact table and one table for each dimension.
Every fact points to one tuple in each of the dimensions and has additional attributes.
Does not capture hierarchies directly.
SnowFlake Schema
Variant of star schema model.
A single, large and central fact table and one or more tables for each dimension.
Dimension tables are normalized, i.e. dimension table data is split into additional tables.
Fact Constellation
Multiple fact tables share dimension tables.
This schema is viewed as a collection of stars, hence called galaxy schema or fact constellation.
Sophisticated applications require such a schema.
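A star schema as described above can be sketched with `sqlite3`: one central fact table whose rows each point to one tuple per dimension and carry the numeric attributes. The table names, columns and sample rows below are hypothetical, chosen only to keep the example small.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- One table per dimension.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);

    -- Central fact table: each row points to one tuple in every dimension
    -- and carries the additional numeric attributes (measures).
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        units_sold INTEGER,
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'shirt'), (2, 'hat');
    INSERT INTO dim_region  VALUES (1, 'North'), (2, 'West');
    INSERT INTO fact_sales  VALUES (1, 1, 10, 250.0), (1, 2, 4, 100.0), (2, 1, 3, 45.0);
    """
)

# A typical star join: facts joined to a dimension, then aggregated.
units_by_product = db.execute(
    """
    SELECT p.name, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(units_by_product)
```

A snowflake variant would split a dimension further, for example moving a product category out of `dim_product` into its own table; a fact constellation would add a second fact table that reuses the same dimension tables.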
Building Data Warehouse
Data Selection
Data Preprocessing
– Fill missing values
– Remove inconsistency
Data Transformation & Integration
Data Loading
Data in warehouse is stored in form of fact tables and dimension tables.
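The preprocessing step above (fill missing values, remove inconsistency) might look like the following before the data is loaded. This is a minimal sketch: the rows are invented, and filling a missing amount with the mean of the known amounts is just one assumed policy among many; the function name `preprocess` is hypothetical.

```python
from statistics import mean

# Hypothetical raw rows, invented for illustration; None marks a missing amount.
raw_rows = [
    {"city": "Mumbai", "amount": 100.0},
    {"city": "mumbai ", "amount": None},
    {"city": "Pune", "amount": 300.0},
]


def preprocess(rows):
    """Fill missing amounts with the mean and normalise inconsistent city spellings."""
    known = [r["amount"] for r in rows if r["amount"] is not None]
    fill_value = mean(known) if known else 0.0
    cleaned = []
    for r in rows:
        cleaned.append(
            {
                # Remove inconsistency: trim whitespace, use one capitalisation.
                "city": r["city"].strip().title(),
                # Fill missing values with the assumed default.
                "amount": r["amount"] if r["amount"] is not None else fill_value,
            }
        )
    return cleaned


clean_rows = preprocess(raw_rows)
for row in clean_rows:
    print(row)
```

Only after a pass like this would the rows be transformed into the fact and dimension tables that the warehouse stores.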
Case Study
Afco Foods & Beverages is a new company which produces dairy, bread and meat products with its production unit located at Baroda.
Their products are sold in the North, North West and Western regions of India.
They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda.
The President of the company wants sales information.
Sales Information
Sales Measures & Dimensions
Measures – Units sold, Amount.
Dimensions – Product, Time, Region.
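With those two measures and three dimensions, the sales information the President wants amounts to summing the measures at a chosen level of detail. The sketch below uses plain Python rather than a real OLAP server; the fact rows are invented and the function name `roll_up` is hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, month, region, units_sold, amount).
FACTS = [
    ("dairy", "2024-01", "West", 10, 500.0),
    ("dairy", "2024-01", "North", 4, 200.0),
    ("bread", "2024-01", "West", 6, 120.0),
    ("dairy", "2024-02", "West", 8, 400.0),
]
DIMENSIONS = ("product", "month", "region")


def roll_up(facts, by):
    """Sum both measures over every dimension not named in `by`."""
    indexes = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(lambda: [0, 0.0])  # key -> [units_sold, amount]
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key][0] += row[3]
        totals[key][1] += row[4]
    return {key: tuple(values) for key, values in totals.items()}


by_region = roll_up(FACTS, by=("region",))
by_region_product = roll_up(FACTS, by=("region", "product"))
print(by_region)
print(by_region_product)
```

Going from `by_region` to `by_region_product` is a drill-down: the same measures, seen at a finer level of the dimensions.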
Sales Data Warehouse Model
Online Analytical Processing (OLAP)
It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
OLAP Server
An OLAP Server is a high capacity, multi user data manipulation engine specifically designed to support and operate on multi-dimensional data structures.
OLAP servers available are:
– MOLAP server
– ROLAP server
– HOLAP server
Data Warehousing includes
Build Data Warehouse
Online analysis processing (OLAP)
Presentation
Need for Data Warehousing
Industry has huge amount of operational data.
Knowledge workers want to turn this data into useful information.
This information is used by them to support strategic decision making.
It is a platform for consolidated historical data for analysis.
It stores data of good quality so that knowledge workers can make correct decisions.
From a business perspective:
- it is the latest marketing weapon
- helps to keep customers by learning more about their needs
- valuable tool in today’s competitive, fast evolving world.
Data Warehousing Tools
Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
Reporting tools
– MS Excel Pivot Chart
– VB Applications
• What is Crowdsourcing?
• How Crowdsourcing works?
• Types of Crowdsourcing
• Applications of Crowdsourcing
• Benefits & Problems of Crowdsourcing
• Video
WHAT IS CROWDSOURCING?
• Crowdsourcing is the process of getting work or funding, usually online, from a crowd of people.
• The word Crowdsourcing is a combination of Crowd & Outsourcing
• Definitions:
• Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a "crowd"), through an open call.
• Crowdsourcing is an online, distributed problem solving and production model.
• The term crowdsourcing was first used by Jeff Howe in 2006 in an article for Wired magazine.
The Crowdsourcing Process in Eight Steps
1- Company has a problem
2- Company broadcasts problem online
3- Online “crowd” is asked to give solutions
4- Crowd submits solutions
5- Crowd vets solutions
6- Company rewards winning solvers
7- Company owns winning solutions
8- Company profits
TYPES OF CROWDSOURCING
• Crowd funding
• The wisdom of the crowd
• Crowdsourcing creative work
• Microwork
CROWD FUNDING
• Crowd funding describes the collective effort of individuals who network and pool their money, usually via the Internet, to support efforts initiated by other people or organizations. This includes disaster relief, startup company funding, free software development, scientific research and many more.
THE WISDOM OF THE CROWD
• The wisdom of the crowd is the process of taking into account the collective opinion of a group of individuals rather than a single expert to answer a question.
CROWDSOURCING CREATIVE WORK
• Creative crowdsourcing spans sourcing creative projects such as graphic design, architecture, apparel design, writing, illustration etc.
MICROWORK
• Microwork is a series of small tasks which together comprise a large unified project, and are completed by many people over the Internet. Microwork is considered the smallest unit of work in a virtual assembly line. It is often used where human intelligence is required to complete the task efficiently.
APPLICATIONS OF CROWDSOURCING
• Testing & Refining a Product: Netflix, SellaBand
• Market Research: Threadless
• Knowledge Management: Accenture, Wikipedia
• Customer Service: My Starbucks ideas
• R & D: InnoCentive, P&G Connect & Develop
• Polling and Voting: InTrade
• Building a new city
The History / Genesis of Crowd sourcing
1714 - Marine Pocket Clock invented
1936 - Toyota Holds a Logo Contest
1955 - Sydney Opera House Architecture Contest
2001 - Wikipedia Launched
2002 - American Idol Season 1
2005 - YouTube Launched
2006 - Crowdsourcing term coined
BENEFITS OF CROWDSOURCING
• Problems can be explored at comparatively little cost.
• Payment is by results.
• The organization can tap a wider range of talent than might be present in its own organization
• Turn customers into designers
• Turn customers into marketers
PROBLEMS WITH CROWDSOURCING
• Quality
• Intellectual property leakage
• No time constraint
• Not much control over development or ultimate product
• Ill-will with own employees
• Choosing what to crowd source & what to keep in-house
Benefits of Refactoring
The Summary: Refactoring is a huge aid in untangling production code without breaking it, and in improving its long-term maintainability.
Refactoring helps you achieve:
1. self-documenting code, for better readability and maintainability, which is pretty much the only kind of code documentation that ever seems to stay current (Extract Method and Introduce Local allow you to create function and variable names that are descriptive enough to rarely need comments). Until you experience readable, self-describing code, you don't know what you're missing.
2. fine-grained encapsulation, for easier debugging and code reuse: Extract Method automatically determines the parameters needed in order to create a method from the current selection, and handles them correctly. You then know exactly what external information the selected block requires in order to operate. This can be a great aid in untangling complex code during code reviews or debugging.
3. the generalization of existing code, to make it easier to apply existing code to a broader range of problems - as you Extract Method, you can easily replace things like hard-coded constants (perhaps a connection string, or a table name) with parameters, thus allowing the application of proven code to new contexts.
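The three benefits above can be seen in a small before-and-after sketch of Extract Method. The example is hypothetical: the functions, the input lines and the 1.2 tax multiplier are invented, and the extraction is done by hand here rather than by a refactoring tool.

```python
# Before: one function mixes parsing, a hard-coded constant, and formatting.
def report_before(lines):
    total = 0.0
    for line in lines:
        name, price = line.split(",")
        total += float(price) * 1.2  # hard-coded tax multiplier
    return f"Total: {total:.2f}"


# After Extract Method: each step has a descriptive name, and the former
# constant is a parameter, so the same logic applies in new contexts.
def parse_price(line):
    _name, price = line.split(",")
    return float(price)


def total_with_tax(prices, tax_multiplier):
    return sum(price * tax_multiplier for price in prices)


def report_after(lines, tax_multiplier=1.2):
    prices = [parse_price(line) for line in lines]
    return f"Total: {total_with_tax(prices, tax_multiplier):.2f}"


lines = ["pen,10", "book,20"]
print(report_before(lines))
print(report_after(lines))
```

The refactored version behaves the same by default, but `parse_price` and `total_with_tax` can now be read, tested and reused on their own, and the multiplier can be changed without editing the code.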
Understandability
More straightforward and well organized (factored) code is easier to understand.
Correctness
It's easier to identify defects by inspection in code that's easier to understand. Overly complex, poorly structured, Rube Goldberg style code is much more difficult to inspect for defects. Additionally, well componentized code with high coherency of components and loose coupling between components is vastly easier to put under test. Moreover, smaller, well-formed bits under test makes for less overlap in code coverage between test cases which makes for faster and more trustworthy tests (which becomes a self-reinforcing cycle driving toward better and better tests). As well, more straightforward code tends to be more predictable and reliable.
Ease of Maintenance and Evolution
Well-factored, high quality, easy to understand common components are easier to use, extend, and maintain. Many changes to the system are now easier to make because they have smaller impact and it's more obvious how to make the appropriate changes.
Refactoring code does have merit on its own just in terms of code quality and correctness issues, but where refactoring pays off the most is in maintenance and evolution of the design of the software. Often a good tactic when adding new features to old, poorly factored code is to refactor the target code then add the new feature. This often will take less development effort than trying to add the new feature without refactoring and it's a handy way to improve the quality of the code base without undertaking a lot of "pie in the sky" hypothetical advantage refactoring / redesign work that's hard to justify to management.
Cloud computing
Definitions of Cloud computing
Architecture of Cloud computing
Benefits of Cloud computing
Opportunities of Cloud Computing
Cloud computing – Google Apps
Grid computing vs Cloud computing
Definitions
Cloud computing is using the internet to access someone else's software running on someone else's hardware in someone else's data center. Lewis Cunningham[2]
A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet. Ian Foster[9]
A Cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers. Rajkumar Buyya[10]
Architecture of Cloud computing
Essential Characteristics [7]
On-demand self-service. A consumer can unilaterally provision computing capabilities such as server time and network storage as needed automatically, without requiring human interaction with a service provider.
Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs) as well as other traditional or cloud-based software services.
Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
Rapid elasticity. Capabilities can be rapidly and elastically provisioned - in some cases automatically - to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
Measured service. Cloud systems automatically control and optimize resource usage by leveraging a metering capability at some level of abstraction appropriate to the type of service. Resource usage can be monitored, controlled, and reported - providing transparency for both the provider and consumer of the service.
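One way to picture the "measured service" characteristic is a small meter that accumulates usage per consumer and per resource and can report it back. This is only a minimal sketch: the class name UsageMeter, its methods, and the resource labels are assumptions made for illustration, not part of any provider's API.

```python
from collections import defaultdict


class UsageMeter:
    """Hypothetical in-memory meter: accumulates usage per consumer and resource."""

    def __init__(self):
        # consumer -> resource -> accumulated amount
        self._usage = defaultdict(lambda: defaultdict(float))

    def record(self, consumer, resource, amount):
        """Add a usage sample, e.g. server hours or gigabytes stored."""
        if amount < 0:
            raise ValueError("usage amounts must be non-negative")
        self._usage[consumer][resource] += amount

    def report(self, consumer):
        """Return a plain dict of totals, visible to provider and consumer alike."""
        if consumer not in self._usage:
            return {}
        return dict(self._usage[consumer])
```

For example, recording 2.0 and then 1.5 "server_hours" for one consumer yields a report total of 3.5 for that resource, while a consumer with no recorded usage gets an empty report.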
Cloud Service Models (SPI Model)
Cloud Software as a Service (SaaS)
Cloud Platform as a Service (PaaS)
Cloud Infrastructure as a Service (IaaS)
Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources. The consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Software as a Service (SaaS). The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
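The split of control described for IaaS and SaaS can be written down as a small lookup table. The sketch below encodes only what the two descriptions above state; the dictionary names, the component labels, and the three return values are illustrative assumptions, and PaaS is left out because it is not described here.

```python
# Components the consumer controls outright, per service model (from the text above).
CONSUMER_CONTROLS = {
    "IaaS": {"operating systems", "storage", "deployed applications"},
    "SaaS": set(),
}

# Components over which the consumer may have only limited control.
CONSUMER_LIMITED = {
    "IaaS": {"select networking components"},
    "SaaS": {"user-specific application configuration"},
}


def control_level(model, component):
    """Return 'controls', 'limited', or 'none' for a consumer under a service model."""
    if model not in CONSUMER_CONTROLS:
        raise KeyError(f"service model not covered in this sketch: {model}")
    if component in CONSUMER_CONTROLS[model]:
        return "controls"
    if component in CONSUMER_LIMITED[model]:
        return "limited"
    # Everything else, including the underlying cloud infrastructure,
    # stays with the provider.
    return "none"
```

Under this table an IaaS consumer controls "operating systems", while a SaaS consumer has no control over them and only limited control over "user-specific application configuration".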
Cloud Deployment Models
Public Cloud. The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Private Cloud. The cloud infrastructure is operated solely for a single organization. It may be managed by the organization or a third party, and may exist on-premises or off-premises.
Community Cloud. The cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, or compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
Hybrid Cloud. The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
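Cloud bursting, mentioned in the hybrid cloud description, can be sketched as a placement rule: fill the private cloud first and send only the overflow to a public cloud. The function name and the idea of counting load in abstract "requests" are assumptions for illustration; a real setup would also have to deal with data and application portability.

```python
def place_workload(requests, private_capacity):
    """Split a load between private and public clouds, bursting only on overflow."""
    if requests < 0 or private_capacity < 0:
        raise ValueError("requests and capacity must be non-negative")
    to_private = min(requests, private_capacity)
    # Whatever the private cloud cannot absorb bursts to the public cloud.
    return {"private": to_private, "public": requests - to_private}
```

With a private capacity of 100, a load of 80 stays entirely private, while a load of 120 sends 20 requests to the public cloud.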
Benefits of Cloud Computing
Business Benefits: almost zero upfront infrastructure investment; just-in-time infrastructure; more efficient resource utilization; usage-based costing; reduced time to market.
Technical Benefits: automation ("scriptable infrastructure"); auto-scaling; proactive scaling; more efficient development lifecycle; improved testability; disaster recovery and business continuity.
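To make "auto-scaling" concrete, here is a minimal sketch of one common rule: pick the instance count that would bring average utilization back to a target level, within fixed bounds. The function name, the 60% target, and the 1 to 10 instance limits are hypothetical defaults, not values taken from any provider. Percentages are integers to keep the arithmetic exact.

```python
def desired_instances(current, utilization_pct, target_pct=60,
                      min_instances=1, max_instances=10):
    """Return how many instances would bring utilization near target_pct."""
    if current < 1 or utilization_pct < 0 or target_pct <= 0:
        raise ValueError("invalid scaling inputs")
    # Total load is current * utilization; divide by the target, rounding up.
    needed = (current * utilization_pct + target_pct - 1) // target_pct
    # Clamp to the allowed range so the pool scales neither to zero nor unbounded.
    return max(min_instances, min(max_instances, needed))
```

With 4 instances at 90% utilization the rule scales out to 6; at 30% it scales in to 2; and 8 instances at 95% would need 13 but is capped at the maximum of 10.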
Opportunities of Cloud Computing: end consumers; business customers; developers and Independent Software Vendors (ISVs).
Google App Engine
Google App Engine enables you to build web applications on the same scalable systems that power Google applications. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow.
Cost? Pay only for what you actually use once you exceed the free quota of 500 MB of storage and around 5M pageviews per month.
Trial?
How to create applications for Cloud computing?
1) Build an App Engine application using standard Java web technologies, such as servlets and JSP.
2) Create an App Engine Java project with Eclipse, using the Google Plugin for Eclipse for App Engine development.
3) Use the App Engine datastore with the Java Data Objects (JDO) standard interface.
4) Upload your app to App Engine.
Grid computing vs Cloud computing
Collective: interactions across collections of resources, directory services.
Platform: collection of specialized tools, middleware and services on top of the unified resources to provide a development and/or deployment platform.
Unified Resources: resources that have been abstracted/encapsulated.
Resource: discovery, negotiation, monitoring, accounting and payment of sharing operations on individual resources.
Connectivity: communication and authentication protocols.
Application. Grid Computing emerged in eScience to solve scientific problems requiring HPC. Cloud Computing is rather oriented towards applications that run permanently and have varying demand for physical resources while running, e.g., the well-known CRM SaaS Salesforce.com.
Cloud: increase computing; increase storage; consumption basis; IBM, Google, Microsoft; hour, storage, view…
Grid: increase computing; increase storage; project-oriented; academia or gov. labs; number of service units