cse 5306 distributed systems - rangerranger.uta.edu/~dliu/courses/ds/2-architecture.pdf · –...
TRANSCRIPT
CSE 5306Distributed Systems
Architectures
1
Architecture
• Software architecture– How software components are organized,– How software components interact
• System architecture– Instantiation and placement of software
components on real machines• Centralized architecture, client-server system• decentralized architecture, peer-to-peer system
• Hybrid architecture2
Software Architecture
• Layered architecture– widely adopted by the networking community
• Object-based architecture– E.g., client-server style (ftp)
• Data-centered architecture– Communicate through a common repository
• Event-based architecture– Communicate through the propagation of events,
e.g., publish/subscribe systems3
Layered Architecture
4
Object-Based Architecture
5
Event-Based Architecture
6
• Decoupled in space (referentially decoupled)– Processes are loosely coupled, need not explicitly refer to each other
• Communication via propagation of events– Mostly publish/subscribe, e.g., clients register in market info.
Shared Data-Space Architecture
7
• Not only decoupled in space but also decoupled in time– Processes need not both be active when communication takes place
• Example of shared data-space architecture– Shared distributed file systems, web-based distributed systems
System Architecture• Centralized architectures
– Application layering (logical software layering)– Multi-tiered architectures (system architecture)
• Decentralized architectures– Structured P2P (peer-to-peer) architectures– Unstructured P2P architectures– Topology management of overlay networks– Superpeers
• Hybrid architectures– Edge-server systems– Collaborative distributed systems
8
Centralized Architecture
9
General interaction between a client and a server.
Client-Server Communication• Connectionless protocol
– It is hard for a sender to detect if the message is successfully received• Retransmission may cause problems
– Usually ok for idempotent operations• Operations can be repeated many times without harm, e.g.,
get a quotes on stock, search on web
• Connection-oriented protocols– Often used for non-idempotent operations
• E.g., buying stock
– Problem: low performance in local-area networks
10
Application Layering• Many client-server system can be divided into three levels
– The user-interface level– The processing level– The data level
• Example: the internet search engine
11
Two-tiered Architectures• The simplest way to place a client-server application is
– A client machine that only implements (part of) the user-interface level
– A server machine implementing the rest, i.e, the processing and data levels
– This is so called the two-tiered architecture
• Thin-client model and fat-client model
12
Thin Client
Fat Client
Three-Tiered Architecture• The server tier in two-tiered architecture becomes more and
more distributed– A single server is no longer adequate for modern information systems
• This leads to three-tiered architecture– Server may acting as a client
13
Decentralized Architecture• Multi-tiered architectures can be considered as vertical
distribution– Placing logically different components on different machines
• An alternative is horizontal distribution (peer-to-peer systems)– A collection of logically equivalent parts– Each part operates on its own share of the complete data
set, balancing the load
• The main question for peer-to-peer system is – How to organize the processes in an overlay network– Two types: structured and unstructured
14
Structured P2P Architectures
• Structured: the overlay network is constructed in a deterministic procedure – Most popular: distributed hash table (DHT)
• Key questions – How to map data item to nodes– How to find the network address of the node
responsible for the needed data item
• Two examples – Chord and Content Addressable Network (CAN)
15
Chord
16
Content Addressable Network (1)
17
•2-dim space [0,1] x [0,1] is divided among 6 nodes
•Each node has an associated region
•Every data item in CAN will be assigned a unique point in space
•That node is responsible for that data element.
Content Addressable Network (2)
18
•To add a new region, split the region
•To remove an existing region, neighbor will take over
Unstructured P2P Architectures• Largely relying on randomized algorithm to construct
the overlay network– Each node has a list of neighbors, which is more or less
constructed in a random way
• One challenge is how to efficiently locate a needed data item– Flood the network?
• Many systems try to construct an overly network that resembles a random graph– Each node maintains a partial view, i.e., a set of live nodes
randomly chosen from the current set of nodes
19
Random Graph• Given N nodes, we build
a random graph by – having an edge
between any two nodes at a probability p
• The random graph theory is to answer– what value should p
have such that it is “almost certainly true” that the graph is fully connected
20
to mount attacks against other sensors of the DSN. For ex-ample, an adversary can launch a “sleep-deprivation attack”[13] that may exhaust the batteries of the sensor nodes withwhom the captured node shares keys by excessive commu-nication. Handling sensor-node capture typically requiresthat tamper-detection technologies [7, 13, 14] be used toshield sensors in such a way that physical sensor manipu-lation would cause the erasure of the sensor’s key ring andthe disabling of the sensor’s operation3. For some sensordesigns, it may be practical to encrypt a node’s key ring ina key-encrypting key whose erasure can be very fast.
Although we assume tamper-detection via sensor-node shield-ing that erases the keys of captured nodes, we note that ourkey-distribution scheme is more robust than those based ona single mission key or on pair-wise private sharing of keyseven in the face of physical attacks against captured un-shielded sensor nodes. In the single mission key scheme, allcommunication links are compromised, whereas in the pair-wise private key sharing, all n-1 links to the captured un-shielded node are compromised. In contrast, in our schemeonly the k ! n keys of a single ring are obtained, whichmeans that the attacker has a probability of approximatelykP to attack successfully any DSN link (viz., simulation re-sults of Section 4). The node’s shared keys with controllerscould also be re-created by the adversary, but this does nota!ect any other sensor nodes.
3. ANALYSIS
3.1 DSN Connectivity with Random GraphsThe limits of the wireless communication ranges of sen-
sor nodes, not just the security considerations, preclude useof DSNs that are fully connected by shared-key links be-tween all sensor nodes. For example, two nodes that are notin wireless communication range cannot take advantage oftheir shared key in a fully connected network. Moreover, it isunnecessary for the shared-key discovery phase to guaranteefull connectivity for a sensor node with all its neighbors inwireless communication range, as long as multi-link paths ofshared keys exist among neighbors that can be used to setupa path key as needed. Further, extra shared-key provisioningis required for incremental network growth and, possibly, forpath-key establishment following revocation and re-keying.
Let p be the probability that a shared key exists betweentwo sensor nodes, n be the number of network nodes, andd = p " (n # 1) be the expected degree of a node (i.e., theaverage number of edges connecting that node with its graphneighbors). To establish DSN shared-key connectivity, weneed to answer the following two questions:
- what value should the expected degree of a node, d,have so that a DSN of n nodes is connected? and,
- given d and the neighborhood connectivity constraintsimposed by wireless communication (e.g., the number
3The problem of detecting and handling sensor-node captureis reminiscent of the somewhat similar concerns regardingthe fast destruction of cryptographic keys by British agentscaptured on enemy territory during World War II. For ex-ample, the Special Operations Executive equipped its agentsin the occupied Europe with cryptographic keys printed onsilk, which could be easily camouflaged in a coat’s linings,cut and burnt. Other examples of the fundamentally dif-ficult problem of detecting and handling agent capture areprovided by Leo Marks’ vivid account [10].
1000 2000 3000 4000 5000 6000 7000 8000 9000 1000010
12
14
16
18
20
22
24
n (number of nodes)
Pr=0.99
Pr=0.999
Pr=0.9999
Pr=0.99999
Pr=0.999999
d(e
xpec
ted
deg
ree
of
node)
Figure 1: Expected degree of node vs. number ofnodes, where Pc = Pr[G(n, p) is connected]
of nodes n! in a neighborhood), what values shouldthe key ring size, k, and pool, P , have for a networkof size n? In particular, if memory capacity of eachsensor limits the key ring size to a given value of k,what should the size of the key pool, P , be?
Random-graph theory helps answer the first question. Arandom graph G(n, p) is a graph of n nodes for which theprobability that a link exists between two nodes is p. Whenp is zero, the graph does not have any edge, whereas whenp is one, the graph is fully connected. The first question ofinterest is what value should p have such that it is “almostcertainly true” that the graph G(n, p) is connected.
Erdos and Renyi [12] showed that, for monotone proper-ties, there exists a value of p such that the property movesfrom “nonexistent” to “certainly true” in a very large ran-dom graph. The function defining p is called the thresholdfunction of a property. Given a desired probability Pc forgraph connectivity, the threshold function p is defined by:
Pc = limn"#
Pr[G(n,p) is connected] = ee!c
where
p =ln(n)
n+
cn
and c is any real constant.
Therefore, given n we can find p and d = p " (n # 1) forwhich the resulting graph is connected with desired proba-bility Pc.
Figure 1 illustrates the plot of the expected degree of anode, d, as a function of the network size, n, for various val-ues of Pc. This figure shows that, to increase the probabilitythat a random graph is connected by one order, the expecteddegree of a node increases only by 2. Moreover, the curves ofthis plot are almost flat when n is large, indicating that thesize of the network has insignificant impact on the expecteddegree of a node required to have a connected graph.
To answer the second question above, we note that thewireless connectivity constraints may limit neighborhoods ton! ! n nodes, which implies that the probability of sharing akey between any two nodes in a neighborhood becomes p! =
d(n"$1) $ p. Hence, we set the probability that two nodes
Partial View Construction
• A framework by Jelasity et al. in 2004• Nodes exchange entries from their partial view
regularly– Each entry is associated with an age tag
• Consists of an active thread and a passive thread– The active thread initiate the communication with a
selected peer for partial view propagation– The passive thread waits for response from another
peer and update its partial view accordingly– A node can be in PUSH mode or PULL mode
21
The Active Thread
22
The Passive Thread
23
Topology Management• Some specific topologies may benefit the applications
in a given P2P system– E.g., only including nearest peers in the partial view may
reduce the latency of data delivery
• Question: – How to constructing a specific topology from a
unstructured P2P systems
• Solution: two-layered approach– Lower layer: unstructured P2P outputs a random graph– Higher layer: carefully exchange and selecting entries to
build a desired topology24
Two-Layered Approach
25
Example of Two-Layer Approach
26
Generating a specific overlay network using a two-layered unstructured peer-to-peer system (Jelasity
and Babaoglu, 2005).
Converge toward more accuracy
Finding Data Items• This is quite challenging in unstructured P2P systems
– Assume a data item is randomly placed
• Solution 1: Flood the network with a search query• Solution 2: A randomized algorithm
– Let us first assume that • Each node knows the IDs of k other randomly selected nodes• The ID of the hosting node is kept at m randomly picked nodes
– The search is done as follows• Contact k direct neighbors for data items• Ask your neighbors to help if none of them knows
– What is the probability of finding the answer directly?
27
Superpeers
• Used to address the following question– How to find data items in unstructured P2P systems
– Flood the network with a search query?
• An alternative is using superpeers– Nodes such as those maintaining an index or acting
as a broker are generally referred to as superpeers– They hold index of info. from its associated peers
(i.e. selected representative of some of the peers)
• Remaining question: how to pick the superpeers
28
An Example of Superpeer Networks
29
A hierarchical organization of nodes into a superpeer network.
Hybrid Architectures
• Many real distributed systems combine architectural features–E.g., the superpeer networks -- combine client-server architecture (centralized) with peer-to-peer architecture (decentralized)
• Two examples of hybrid architectures–Edge-server systems–Collaborative distributed systems
30
Edge-Server Systems• Deployed on the Internet where servers are “at the
edge” of the network (i.e. first entry to network)• Each client connects to the Internet by means of an
edge server
31
Collaborative Distributed Systems• A hybrid distributed model that is based on
mutual collaboration of various systems– Client-server scheme is deployed at the beginning– Fully decentralized scheme is used for collaboration
after joining the system
• Examples of Collaborative Distributed System:– BitTorrent: is a P2P File downloading system. It allows
download of various chunks of a file from other users until the entire file is downloaded
– Globule: A Collaborative content distribution network. It allows replication of web pages by various web servers
32
BitTorrent
33
The principal working of BitTorrent (Pouwelse et al. 2004).
Information needed to download a specific file
Many trackers, one per file, tracker holds which node holds which chunk of the file
Globule• Collaborative content distribution network:
– Similar to edge-server systems– Enhanced web servers from various users that replicates
web pages
• Components– A component that can redirect client requests to other
servers.– A component for analyzing access patterns.– A component for managing the replication of Web pages.
• Has a centralized component for registering the servers and make these servers known to others
34
Benefits of Globule
• Example: – Alice has a web server; Bob has a web server– Alice’s server can have replicated contents of the
Bob’s server and vice versa
• Good if your server goes down• Good if too much traffic that your server can
not handle or gets too slow• Better Geographic diversity
– Allow users to get quick response from the nearest server with the replicated page
35
Architectures v.s. Middleware• A middleware layer between application and the
distributed platforms for distribution transparency• The question is:
– Given the software and system architecture, where the middleware fits in?
• Many middleware follows a specific architecture style– Object-based style, event-based style– Benefits: simpler to design application– Limitations: the solution may not be optimal
• Should be adaptable to application requirements– Separating policies from mechanism
36
Supporting Technology: Interceptors
• An Interceptor is a software that – breaks the usual flow of control and
– allows other (application specific) code to be executed
• It makes middleware more adaptable to– application requirements and changing environment
• Interceptors are good for – providing transparent replication and – improving performance
37
Interceptors
Using interceptors to handle remote-object invocations38
May want to send to many other B’s (i.e.
replicated)
May want to break a large message for
better performance
General Approaches for Adaptability• Separation of concerns:
– Modularizing the system and separate security from functionality
– However, the problem is that a lot of things you cannot easily separate, e.g., security
• Computational reflection– Ability to inspect itself, and if necessary, adapt its behavior– Reflective middleware has yet to proof itself as a powerful
tool to manage the complexity of distribute systems
• Component-based design (stand-alone) – However, components are less independent than one may think– Replacement of one component may have huge impact on others
39
Self-Management in Distributed Sys.• Distributed systems are often required to adapt to
environmental changes by – switching policies for allocation resources
• The algorithms to make the changes are often already in the components– But the challenge is how to make such change without human
intervention
• A strong interplay between software architectures and system architectures– Organize components in a way that monitoring and adjustment
can be done easily– Decide where the processes to be executed to do the adaption
40
The Feedback Control Systems• Allow automatic adaption to changes by means of one or more
feedback control loops– self-managing, self-healing, self-configuration, self-optimization, etc.
41
Logical organization of a feedback control system (the physical organization could be very different)
System Monitoring with Astrolabe• A tool for observing system behavior in a distributed system
– One component of the feedback control system
• Every host runs a Astrolabe process, called an agent – Agents are organized as a hierarchy
42
Replication Strategy in Globule
43
• When enough requests for a page is collected,– Globule does a “what-if analysis” to evaluate the replication
policies and select the best policy
• The evaluation is done using a trace-driven simulation
Replication Strategy in Globule
44
The dependency between prediction accuracy and trace length.
• How many requests (i.e., trace length) are needed for evaluation?
Automatic Component Repair in Jade• Jade: A Java implementation framework that allows
components to be added and removed at runtime• Steps in a simple auto-repair example
– Terminate every binding between a component on a non-faulty node, and a component on the node that just failed.
– Request the node manager to start and add a new node to the domain.
– Configure the new node with exactly the same components as those on the crashed node.
– Re-establish all the bindings (between client & server interfaces) that were previously terminated.
• Done via a repair management server (can be replicated)
45