
ESD.341J – Software Systems Architecture

ESD.937 Architecting (Large-Scale) Software Systems

Lecture 2

Architecture Survey & Evolution

David Hartzband, Sc.D.
Lecturer, ESD

ESD 937 – Large-Scale Software


Organizing Principle: Co-evolution of Hardware & Software Architectures

[Timeline diagram, 1963 to 2008 (years marked: 1963, 1984, 1989, 1992, 1998, 2001, 2008), running from less abstract to more abstract, with locational transparency as a recurring theme. Labels recovered from the diagram:]

Hardware
• Mainframe: centralized processing, storage, control …
• Clusters (VAXcluster, Wolfpack, etc.): distributed load balance, fail-over, storage …
• Client-Server (PC client, PC server): thick & thin clients, distributed processing …
• Remote server farms: >10K servers
• Cloud-based: Google, Amazon, Microsoft …

Software
• Distributed DB (R*, Rdb/Star): distributed query processing & storage …
• RPC-based: remote procedures
• Distributed object (CORBA …): remote objects, distributed ODB
• Component-based: EJB, .NET
• Distributed middleware: BEA (Oracle), IBM, MS, TIBCO, Sun, JBoss, Iona, WebMethods, many others
• SOA: UDDI, WSDL
• 1 PB+ apps, X-million-user apps
• Social networks

9/15/09 2


History? Why is this Important? (or even interesting?)

• Several reasons:
  – Many current problems have been addressed previously; knowing how this was done may provide solutions more quickly & effectively
    • N.B. I said addressed, not solved; solutions need to be (re)developed in the current context
  – Looking at previous inflection points allows current ones to be recognized, which tells us when to stand back, look at trends & make decisions about new technology usage


Coevolution?

• A process of reciprocal changes involving two or more entities, caused by external forces or by the interaction itself
• Changes in technology landscape drive both hardware & software evolution, &
• Changes in hardware structure & capabilities lead to changes in software structure & capabilities & vice versa
• Each leads change at different times


Architecture?

• Software architecture is the high level design of a software system, typically defined in terms of the major functional components (both user & program facing), & the interfaces that connect them (this is not generally product architecture, unless your product is a distributed system)

• Many of my colleagues have taken this term literally (“building software is like building bridges”), but I am not a fan of this approach. There are aspects of software architecture that are engineering-based & aspects that are more akin to pure design (IMHO)


Design Work Model*

• Research has been done recently to describe how designers & other creative people work.* Characteristics of this description include:
  – Alternation between individual contribution & highly collaborative work modes
  – Episodic & iterative; work content/context must be preserved between episodes
  – Knowledge based – deep historical, technical & contextual knowledge is necessary
  – Underconstrained – incomplete or poorly specified conditions & constraints
  – Eclectic – uses a wide range of problem solving techniques
  – Intuitive – even in highly constrained situations
    • Nontextual – must be able to reason about nontextual information
    • “Nonlogical” – inductive, analogic
    • Must be able to maintain contradictions & still reason (partitioning)

*Rowe, P.G. 1987. Design Thinking. MIT Press.

*Hartzband, D.J. 2001. What will future Knowledge workers be like? Warburg-Pincus Investment Forum. San Francisco, CA. 4/02


Mainframes – the Beginning

• 4/1964 - Mainframes (almost entirely IBM System 360/370…390 running OS360/370…390) had central control of everything: processing, storage, applications etc. http://www.mainframes.com/index.htm, http://publib.boulder.ibm.com/infocenter/zoslnctr/v1r7/index.jsp?topic=/com.ibm.zconcepts.doc/zconc_evolvarch.html

• Mainframes evolved (like everything else) to utilize clustering & virtualization techniques, but the core attributes of mainframe computing are still size & central control

• An example of software in this style was the General Motors CAD system (circa 1991) that was 1.1M lines of PL/1 with only two (2) entry points & no transparency with respect to processing, algorithms or storage management


Minis & Accelerated Coevolution

• 10/1977 – Digital Equipment Corp. VAX hardware & VAX/VMS O/S: a 32-bit architecture with its own machine language & demand-paged virtual addressing (virtual memory); many other models followed over the next 10 years

• 1984 – VAXclusters: multiple processing nodes, each running a version of VMS, making up a single logical node using a 70 Mbit/s serial interconnect, a message-oriented communications architecture (SCA) & a distributed lock manager to provide shared processing (load balancing & fail-over) & shared storage

• 1984/5 – Rdb/VMS, Rdb/ELN: relational database product immediately used by Boeing for 767 Bill-of-Materials

• 1986 – work on VAXClusters leads to Rdb/Star, distributed Rdb with distributed query optimization (DQO)

• DQO leads to a new stage in coevolution


Influence of DQO

• DQO came in 2 flavors:
  – Digital (now Oracle; Oracle bought Rdb in 1994): cost-based optimizer that rewrites (fragments of) SQL statements so that initial processing is done at the remote DB node & intermediate results are shipped to the local (user) node for completion
  – IBM (System R*): content-based optimizer; records are aggregated at an intermediate node where the query is completed & results are shipped to the user; aggregates are saved for common queries (& need to be kept up to date)
  – Later (mid-90s), hybrid approaches were implemented that worked well
  – Current implementations have moved away from DQO to federated databases

• The DQO work led to new thinking on locational & functional transparency
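The Digital-style strategy above (do the initial processing at the remote DB node, ship only intermediate results to the local node) can be sketched roughly as follows. This is a toy illustration, not Rdb/Star's optimizer: the `Node` class, the "rows shipped" cost model, the selectivity estimate & the parts data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Node:
    """Hypothetical database node holding one table fragment."""
    name: str
    rows: List[Row] = field(default_factory=list)

    def scan(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Filtering happens where the data lives.
        return [r for r in self.rows if predicate(r)]


def ship_everything(remote: Node, predicate):
    """Naive plan: ship every remote row, then filter locally."""
    shipped = list(remote.rows)
    return [r for r in shipped if predicate(r)], len(shipped)


def push_down(remote: Node, predicate):
    """Rewritten plan: filter at the remote node, ship only the matches."""
    shipped = remote.scan(predicate)
    return shipped, len(shipped)


def choose_plan(remote: Node, predicate, estimated_selectivity: float):
    """Toy cost-based choice: cost = estimated number of rows shipped."""
    cost_naive = len(remote.rows)
    cost_pushdown = estimated_selectivity * len(remote.rows)
    return push_down if cost_pushdown < cost_naive else ship_everything


remote = Node("remote", [{"part": i, "qty": i % 10} for i in range(1000)])
wanted = lambda r: r["qty"] == 0

plan = choose_plan(remote, wanted, estimated_selectivity=0.1)
result, rows_shipped = plan(remote, wanted)
print(plan.__name__, len(result), rows_shipped)
```

Both plans return the same rows; the only difference is how many rows cross the (imagined) network, which is the quantity a cost-based optimizer tries to estimate before it picks a plan.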


Transparency Concepts

• Locational – users &/or programs can make requests for processing without knowing the location of the data or the processing node

• Functional – users &/or programs can make requests for processing without knowing the algorithm(s) for performing or optimizing the processing

• Much of the past 25+ years' work on distributed architectures has aimed at providing these forms of transparency


RPC-Based Architectures

• Remote Procedure Calls are inter-process communication mechanisms called by a program that cause a procedure (subroutine) to be executed in a separate address space

• The programmer doesn’t need to know the location or the details of the remote function. In fact, the call is the same regardless of whether it is local or remote.

• There were two main versions: Sun’s ONC RPC & OSF’s DCE/RPC. They varied in most details & were not compatible. This resulted in ‘RPC wars’.

• Most RPCs were written to known end-points because we didn’t have good algorithms for finding remote location (except for look-up tables)

• Microsoft used DCE/RPC as the basis for DCOM

• RPC is tightly linked to Client-Server hardware architectures: clients sent RPCs to known servers & received results from the RPC return. RPC was initially a synchronous function (the application waited for the return); it eventually became asynchronous (the application continues running) & virtually idempotent


RPC Details

• DCE RPC was the flavor of RPC that I helped to design & implement at Digital Equipment

• There were ~110 Remote Procedure Calls in the DCE model that covered all aspects of remote function definition & invocation from name service to interface definition to error handling

• All RPCs used IDL (Interface Definition Language) to generate language specific client & server interface stubs. Not all IDLs were similar or compatible.

• Application programmers made interface calls to the local stub & the RPC mechanism managed execution regardless of where the procedure code was.
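The stub idea above can be illustrated with a minimal sketch. It is not DCE RPC or its IDL: the `INTERFACE` dictionary stands in for an interface definition, JSON stands in for the wire format, & the "transport" is just a function call. All of these names are assumptions made for the example.

```python
import json
from typing import Callable, Dict

# Stand-in for an IDL file: procedure name -> parameter names (hypothetical).
INTERFACE = {"add": ["a", "b"], "greet": ["name"]}

# Server-side procedure code.
PROCEDURES: Dict[str, Callable] = {
    "add": lambda a, b: a + b,
    "greet": lambda name: f"hello, {name}",
}


def server_stub(request: bytes) -> bytes:
    """Unmarshal the request, run the procedure, marshal the reply."""
    call = json.loads(request)
    try:
        result = PROCEDURES[call["proc"]](*call["args"])
        reply = {"ok": True, "result": result}
    except Exception as exc:  # errors travel back as data, not as a crash
        reply = {"ok": False, "error": str(exc)}
    return json.dumps(reply).encode()


def make_client_stubs(interface, transport):
    """Generate one client stub per procedure named in the interface."""
    def make_stub(proc, params):
        def stub(*args):
            if len(args) != len(params):
                raise TypeError(f"{proc} expects {params}")
            request = json.dumps({"proc": proc, "args": list(args)}).encode()
            reply = json.loads(transport(request))
            if not reply["ok"]:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return stub
    return {proc: make_stub(proc, params) for proc, params in interface.items()}


# The transport here is a direct function call. Over a network it would send
# the same bytes to another address space, and the calling code would not change.
remote = make_client_stubs(INTERFACE, transport=server_stub)
print(remote["add"](2, 3))
print(remote["greet"]("VMS"))
```

The application only ever calls the local stub; swapping `server_stub` for a function that sends the bytes over a socket would leave the two calls at the bottom untouched.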


RPC Schematic

[Diagram: a client application & a server, each with a stub (endpoint). The server runs in a single address space, different from the client's, & contains a listener thread & the server procedure code. An RPC thread passes through the numbered steps below.]

1. RPC thread started, RPC calls client stub
2. RPC thread extends to server endpoint (stub)
3. Remote code executes
4. RPC retracts to client endpoint (stub)
5. Results or message returned to client application

Server supports as many concurrent calls as it has threads. Server queues calls when it runs out of threads.
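The last point (a fixed number of server threads, with extra calls queued) can be sketched with a thread pool. This is only an analogy, assuming a pool of 2 threads & a made-up `remote_procedure`; it is not how any particular RPC runtime was implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

SERVER_THREADS = 2  # assumed pool size for the illustration

active = 0
peak = 0
lock = threading.Lock()


def remote_procedure(x):
    """Server-side code; tracks how many calls run at the same time."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulate work
    with lock:
        active -= 1
    return x * x


# The executor plays the server: calls beyond SERVER_THREADS wait in its queue.
with ThreadPoolExecutor(max_workers=SERVER_THREADS) as server:
    futures = [server.submit(remote_procedure, n) for n in range(6)]
    results = [f.result() for f in futures]

print(results)
print("peak concurrent calls:", peak)
```

Six calls are submitted at once, but the peak number running concurrently never exceeds the pool size; the rest wait their turn, as the slide describes.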


Distributed Objects

• RPC-based systems where remote invocation is of an object method

• CORBA (Common Object Request Broker Architecture) is the primary example
  – CORBA is a series of standards developed & approved by the OMG (Object Management Group), a consortium of vendors, users, academic & interested parties
  – Reference implementations are required in order for standards to be approved


CORBA Worked (but had some problems)

CORBA allows Object Request Brokers to manage remote method invocations among heterogeneous (languages, ORB implementations) systems

As Digital Equipment’s OMG representative (& co-author of IIOP), I remember the meeting where the Sun representative told me his system was IIOP-compliant if my ORB (DCE-RPC based) made an invocation call to his ORB (ONC-RPC based) & his ORB returned an error message!


More CORBA

• In CORBA, the Object Request Broker or ORB takes care of all of the details involved in routing a request from client to object, and routing the response to its destination.
• There's more than just that, of course. The ORB is also the custodian of the Interface Repository (abbreviated variously IR or IFR), an OMG-standardized distributed database containing IDL interface definitions. (The IFR is defined in Chapter 10 of the OMG CORBA specification.)
• On the client side, then, the ORB offers a number of services: It provides interface definitions from the IFR, and constructs invocations for use with the Dynamic Invocation Interface (DII). It also converts Object References between session and stringified format, and (for CORBA 2.4 and later ORBs) converts URL-format corbaloc and corbaname object references to session references.
• On the server side, the ORB has even more to do. Although CORBA allows (in fact, requires) the client to assume that every valid object reference corresponds to a running instance, it is likely that the code of the instance is not running. That's because, in order to conserve server resources, the ORB de-activates inactive objects, and re-activates them whenever a request comes in. CORBA supports a number of activation patterns, so that different object or component types can activate and de-activate in the way that uses resources best.
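The server-side behaviour described above (a reference stays valid while the instance behind it is activated & de-activated on demand) can be sketched as follows. `MiniOrb`, the `toyref:` string format & the idle policy are all hypothetical; this is not the CORBA API or its object reference format.

```python
import time
from typing import Callable, Dict


class MiniOrb:
    """Toy broker: activates objects on demand, evicts idle ones."""

    def __init__(self, idle_seconds: float):
        self.idle_seconds = idle_seconds
        self._factories: Dict[str, Callable[[], object]] = {}
        self._active: Dict[str, object] = {}
        self._last_used: Dict[str, float] = {}
        self.activations = 0

    def register(self, object_id: str, factory: Callable[[], object]) -> str:
        """Record how to build the object; return a stringified reference."""
        self._factories[object_id] = factory
        return f"toyref:{object_id}"  # hypothetical format, not a CORBA IOR

    def invoke(self, reference: str, method: str, *args):
        object_id = reference.split(":", 1)[1]
        if object_id not in self._active:  # reference valid, instance absent
            self._active[object_id] = self._factories[object_id]()
            self.activations += 1
        self._last_used[object_id] = time.monotonic()
        return getattr(self._active[object_id], method)(*args)

    def deactivate_idle(self):
        """Drop instances that have not been used recently."""
        now = time.monotonic()
        for object_id in list(self._active):
            if now - self._last_used[object_id] >= self.idle_seconds:
                del self._active[object_id]

    def is_active(self, reference: str) -> bool:
        return reference.split(":", 1)[1] in self._active


class Counter:
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n


orb = MiniOrb(idle_seconds=0.0)
ref = orb.register("counter-1", Counter)
print(orb.is_active(ref))            # not running yet
print(orb.invoke(ref, "increment"))  # activated by the first request
orb.deactivate_idle()
print(orb.is_active(ref))            # evicted to save resources
print(orb.invoke(ref, "increment"))  # the same reference still works
```

Note that this toy loses the counter's state on de-activation (the second call starts again from a fresh `Counter`); a real system would need to persist & restore state, which the sketch leaves out.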


The Rest is Detail…

• Middleware architectures are primarily based on middleware products (unless you want to write everything yourself)

• Everything =:
  – Server load balancing
  – Server failover
  – Web provisioning (service)
  – Management of functional ‘items’ (objects, components, services)
  – Location of functional ‘items’
  – Interface generation & language transparency
  – Data connectivity, retrieval & modification
  – User interface generation
  – Lots of other stuff

• You do not want to do this! So you (we) have been constrained by middleware products that have architecture associated with them


Middleware Wars

• This is also about the time that we began partitioning architectures into User, Business Logic & Data Access tiers (see WebSphere diagram)

• This was a good thing in that it made programming models much cleaner & the programs in each tier easier to debug

• Along with this came the development (by major vendors) of component-based systems & the middleware wars were on

• Eventually this shook out to COM+ & then .NET for Microsoft & JxEE/EJB for many other vendors (Sun first, but now mainly IBM & open source vendors)

• I implemented systems in both & found (IMHO) that there was very little difference (until the .NET version of Visual Studio at which point MS had better tools)

• Granularity of function was always a problem (how big should components be?) & this motivated SOA as a simplification (abstraction)
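The User / Business Logic / Data Access partitioning mentioned above can be shown in miniature. The class names & the order data below are invented for the illustration; the point is only that each tier talks to the one beneath it & nothing else.

```python
class DataAccess:
    """Data access tier: the only code that touches storage."""

    def __init__(self):
        self._orders = {1: {"item": "widget", "qty": 3, "unit_price": 2.50}}

    def get_order(self, order_id):
        return self._orders.get(order_id)


class OrderLogic:
    """Business logic tier: rules only, no storage or formatting."""

    def __init__(self, data: DataAccess):
        self._data = data

    def order_total(self, order_id):
        order = self._data.get_order(order_id)
        if order is None:
            raise KeyError(f"no such order: {order_id}")
        return order["qty"] * order["unit_price"]


class OrderView:
    """User tier: presentation only."""

    def __init__(self, logic: OrderLogic):
        self._logic = logic

    def render_total(self, order_id):
        return f"Order {order_id} total: ${self._logic.order_total(order_id):.2f}"


view = OrderView(OrderLogic(DataAccess()))
print(view.render_total(1))
```

Because each tier can be exercised on its own (for example, `OrderLogic` with a fake `DataAccess`), a bug can usually be pinned to one tier, which is the debugging benefit the slide refers to.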


[Figure: IBM WebSphere V6 Architecture]


2 Slight Divergences…

• UML
  – Personal disclaimer: I was CTO of a company (Riverton Software) that produced an application modeling environment (not a language) that generated usable applications in VB & Java (pre-EJB) & competed with early UML
  – UML is an application specification language that developed from the work of James Rumbaugh, Grady Booch & Ivar Jacobson. It was adopted as a standard by the OMG in 1997 & published as an ISO standard in 2005

• Patterns
  – Personal disclaimer: My doctoral dissertation was in model theory. I have always looked on patterns as incompletely defined models.
  – Templates for general reusable solutions to commonly occurring problems in software design
  – Based very loosely on the actual architectural work of Christopher Alexander & first described at OOPSLA 1987 by Kent Beck & Ward Cunningham


UML

• In 1995-6 UML V1.1 consisted of 5 diagrams: class, object, use case, activity & sequence

• Today, UML 2.1.2 has 14 diagrams
• In my experience (both using UML for development & advising development groups), most projects still use the original 5 diagrams & rarely use any others
• Also in my experience, UML suffers from some substantial problems:
  – The notation is extremely large & difficult to learn (so most programmers use only a very small part)
  – Very few projects update the model as the code changes: ‘my code is my model’
  – Most UML tools use proprietary or idiosyncratic interpretations, which both lock users into one tool & make exchange of models between tools difficult to impossible

• I find UML useful at the beginning of a design project, but not throughout the entire project lifecycle


Design Patterns

• Design patterns have been defined in 4 functional areas: creational, structural, behavioral & concurrent as well as informally in architectural

• There are ~40 current design patterns defined & ~10 architecture patterns. Each type has different scope.

• Patterns are defined in terms of usage (intent & general use), structure (graphic), terminology (definitions of critical terms) & examples (short programs in specific languages)

• Patterns are general solutions & so must be adapted (& reprogrammed) for each specific implementation

– Some critics have pointed out that this makes reuse of code impossible

– Others believe that reuse of design is the ‘real’ goal

• My 3 issues with patterns:
  – For all the ‘architectural’ trimmings, the code examples are what most people understand & use
  – The code examples are not reusable without rewriting
  – Most pattern ‘gurus’ I have worked with are dogmatic & inflexible (always good characteristics). The exception is Martin Fowler.


Service-Oriented Architectures

• An abstraction of previous architectures where the managed items are large grained services (as opposed to objects or components which are smaller grained)

• Services are generally linked to business processes & business workflow, so the architecture, like the granularity of reuse, is matched more closely to business concepts

• Services Protocol Stack
  – Service Transport Protocol: mainly HTTP/HTTPS but also FTP, SMTP & BEEP
  – XML Messaging Protocol: XML-RPC, SOAP
  – Service Description Protocol: WSDL
  – Service Discovery Protocol: UDDI

• WSDL & UDDI functions often hardwired in current implementations to provide efficiency & higher performance
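Two layers of the stack above can be sketched with Python's standard `xmlrpc.client` module: the XML messaging layer (encoding a call as an XML-RPC document) & a dictionary standing in for a discovery registry. The registry contents, service name & URL are hypothetical, nothing is sent over the network, & this is not a UDDI or WSDL implementation.

```python
import xmlrpc.client

# Stand-in for a UDDI-style registry: service name -> endpoint and operations.
# The names and URL are hypothetical.
REGISTRY = {
    "quote-service": {
        "endpoint": "https://services.example.com/quote",
        "operations": ["get_quote"],
    }
}


def discover(service_name):
    """Discovery layer: look the service up instead of hard-wiring it."""
    return REGISTRY[service_name]


def build_request(operation, *params):
    """Messaging layer: encode the call as an XML-RPC <methodCall> document."""
    return xmlrpc.client.dumps(params, methodname=operation)


service = discover("quote-service")
message = build_request("get_quote", "IBM", 100)

# What a receiving service would do with the XML payload.
params, method = xmlrpc.client.loads(message)
print(service["endpoint"])
print(method, params)
```

Hard-wiring, as described in the last bullet, would amount to replacing `discover(...)` with a constant endpoint: faster, but the caller then depends on that one location.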


The ‘Promise’ of SOA

• Services are:
  – Loosely coupled
  – Encapsulated
  – Autonomous

• Goal is to be able to discover & evaluate independent services & coordinate them to act as an ad hoc application

• I haven’t seen this yet (& don’t expect to for some time)


Another Inflection Point

• In October, 2004, my content management team (EMC/Documentum) met with Merck to discuss their next several FDA submissions

• They told us that their next submissions would each be in the 1PB range of managed content, & they asked us if our product would support that

• New types of applications will produce VERY large amounts of content to be managed, organized, searched, reported on – even just retrieving a single item in 1PB is challenging

• Many of these new applications also have VERY large numbers of users – in the range of millions!

• These new applications provide at least part of the motivation for cloud-based architectures – think Salesforce.com, Google Earth or Google Docs, MS Live Office etc. etc. etc…
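A back-of-the-envelope calculation shows why retrieving a single item from 1PB is challenging if it has to be found by brute force. The scan rate of 100 MB/s per server is an assumption chosen for the arithmetic, not a figure from the Merck discussion, & the sketch ignores indexing entirely (which is exactly what a real system would add).

```python
PETABYTE = 10**15            # bytes (decimal petabyte)
SCAN_RATE = 100 * 10**6      # bytes/second per server (assumed)
SECONDS_PER_DAY = 86_400


def scan_days(total_bytes, servers):
    """Days to read total_bytes once, split evenly across servers."""
    return total_bytes / (SCAN_RATE * servers) / SECONDS_PER_DAY


for servers in (1, 1_000):
    print(servers, round(scan_days(PETABYTE, servers), 2))
```

Under these assumptions one server needs roughly 116 days to read the whole collection once, & even 1,000 servers working in parallel need close to 3 hours, so content at this scale has to be organized & indexed across many machines rather than scanned.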


Cloud-Based Architectures

• An architecture that is based on the on-demand provision of application & infrastructure services utilizing an existing hardware & storage infrastructure & a web-based programming model

• Infrastructure management & provisioning tasks provided ‘in the cloud’, cloud vendors generally have substantial infrastructure (multiple server farms) & specific software to aid in application development & deployment

• Salesforce.com, Amazon, Google, IBM, MS & others now competing for cloud based business; fees are based on amount of resources actually used; easy to build up or tear down to accommodate usage

• Innovative information management required, example is Google Big Table (http://labs.google.com/papers/bigtable.html ), designed for Petabytes of data across thousands of servers
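The Bigtable paper linked above describes its data model as a sparse, sorted map indexed by row key, column key & timestamp. A minimal in-memory sketch of that shape follows; `TinyTable` & the sample keys are invented for the illustration, & nothing here addresses distribution across servers, which is the hard part.

```python
from collections import defaultdict


class TinyTable:
    """Toy sparse map: (row key, column key, timestamp) -> value."""

    def __init__(self):
        # Only cells that were written exist, so sparse rows cost nothing.
        self._cells = defaultdict(dict)  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (default: latest)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row, end_row):
        """Yield (row, column) keys in sorted row order, start inclusive."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column


table = TinyTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 2, "<html>v2</html>")
table.put("com.example/about", "anchor:home", 1, "About us")

print(table.get("com.example/index", "contents:html"))
print(table.get("com.example/index", "contents:html", timestamp=1))
print(list(table.scan("com.example/a", "com.example/b")))
```

Keeping rows in sorted key order is what makes range scans cheap & gives a natural way to split the table into contiguous row ranges that can live on different servers.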


Google App Engine Architecture

[Diagram components: Browser (HTML), Web Based Infrastructure, Python Runtime, Big Table, Admin Console, GAE SDK]

• Web-based infrastructure: 35-40 datacenters worldwide, several thousand 1-processor, 4GB servers in each
• Python is the first language runtime offered
• Infrastructure distributes application as needed across web & application servers & datastore
• SDK provides build-run-debug cycle locally with emulated APIs


Social Network Application Proposed Architecture

[Figure: reference architecture diagram. Source: Mike Gotta, 2008, http://mikeg.typepad.com/perceptions/2008/07/reference-archi.html]

• Platform allows users to:
  – define online profile
  – list connections
  – communicate with connections
  – participate in group activities
  – control permissions, privacy &c.

Mike Gotta is a Principal Analyst at the Burton Group specializing in collaboration & social network technology


Thematic Summation

• Several large themes:– Coevolution: changes in hardware & software are driven by both

external factors & by their respective reciprocal evolution– Distribution: processing & data no longer located or controlled

centrally– Transparency: users (end-users & programs) do not need to

know the location or specific functional content of distributed elements, only the mechanism for addressing them

– Abstraction: users (end-users & programs) can address intermediate elements (APIs, Objects, Service Agents…) that abstract the details of application execution

– Simplification: the goal of the last 25 years of large-scale architecture development (IMHO) is to greatly simplify development, deployment & maintenance of applications


Remember – entropy requires no maintenance

$$S = -k \sum_{q=1}^{N} p_q \ln p_q$$

David Hartzband
