"NEVER HAVE SO MANY PEOPLE WATCHED JUST A SINGLEKEYPRESS«"

SERVICE DISCOVERY AT NING
Posted by Brian McCallister on March 11, 2011

From the beginning, we have used dynamic, logical service discovery at Ning. This means that each instance of a server is responsible for advertising the availability of the services it provides.
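
As a rough illustration, an announcement might look something like the following sketch (Java; the field names here are hypothetical, not an actual wire format):

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical shape of a per-instance announcement: which logical service this
// instance provides, where to reach it, and a little free-form metadata.
public record ServiceAnnouncement(
        UUID instanceId,          // unique per running instance
        String serviceType,       // logical name, e.g. "binary-content"
        String host,
        int port,
        Map<String, String> meta  // free-form properties (version, pool, etc.)
) {
    // Convenience factory that stamps a fresh instance id.
    public static ServiceAnnouncement of(String serviceType, String host, int port) {
        return new ServiceAnnouncement(UUID.randomUUID(), serviceType, host, port, Map.of());
    }
}
```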

LOGICAL SERVICES

We advertise logical services rather than physical servers for several reasons, the primary one being to enable moving services to different types of servers. Let's look at a change we made a while back on how we handle static binary content (images, videos, etc.).

In the beginning was the Binary Objects service (BOBC). The BOBC relied on a shared understanding, between it and our application servers, of the available storage volumes and of the on-disk layouts on those volumes. While this was fast and easy to get off the ground, it didn't work so well as things started getting bigger, because we couldn't really change anything about how we stored binary content. The BOBC advertised itself as the service type binary-content.

Enter Canoe. We wanted to put up a new service which was solely responsible for how we store binary content. We would have to do some work to change how uploads worked, to post to the Canoe service rather than write directly to storage volumes, but by having Canoe implement the same read interfaces as BOBC, we could leave the read side unchanged. From a service discovery perspective, Canoe instances advertise two services, binary-content and binary-storage.

By advertising a logical service, instead of advertising just the server and relying on knowledge of which services live on that server, we were able to make a gradual transition and didn't even have to touch most of the systems making use of the binary content services.
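
To illustrate the client side, here is a rough sketch of looking up instances by logical service type; the DiscoveryClient interface and method names are assumptions for illustration, and it reuses the hypothetical ServiceAnnouncement type from the sketch above.

```java
import java.util.List;
import java.util.Random;

// Assumed interface over the discovery system: all live announcements of a type.
interface DiscoveryClient {
    List<ServiceAnnouncement> instancesOf(String serviceType);
}

public final class ServiceLocator {
    private final DiscoveryClient discovery;
    private final Random random = new Random();

    public ServiceLocator(DiscoveryClient discovery) {
        this.discovery = discovery;
    }

    // Pick one live instance of a logical service, e.g. "binary-content".
    // Callers never need to know whether BOBC or Canoe is actually answering.
    public ServiceAnnouncement findOne(String serviceType) {
        List<ServiceAnnouncement> live = discovery.instancesOf(serviceType);
        if (live.isEmpty()) {
            throw new IllegalStateException("no live instances of " + serviceType);
        }
        return live.get(random.nextInt(live.size()));
    }
}
```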

DYNAMIC DISCOVERY

Each server is responsible, itself, for advertising its services to the rest of the system. While sometimes we have to configure the location of a service (usually because a priori knowledge is required at start time; services such as ZooKeeper behave this way), we much prefer that everything be found dynamically.

Some major benefits of this include the ability to dynamically add and remove capacity without having to rehup anything else, getting an additional means of failure detection (if a server dies, it isn't heartbeating its announcements), and the ability to tune the visibility of systems for debugging and so forth.
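
As an illustration of that heartbeat-based failure detection, a discovery server might track liveness with something like the following sketch; the 30-second TTL and the class and method names are assumptions, not the actual parameters used at Ning.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Heartbeat bookkeeping: each re-announcement refreshes a timestamp, and
// anything that has not heartbeated within the TTL is treated as dead.
public final class LivenessTracker {
    private static final Duration TTL = Duration.ofSeconds(30); // assumed value

    private final Map<UUID, Instant> lastSeen = new ConcurrentHashMap<>();

    // Called whenever an instance (re)announces itself.
    public void heartbeat(UUID instanceId) {
        lastSeen.put(instanceId, Instant.now());
    }

    // An instance that stops heartbeating ages out and is considered failed.
    public boolean isAlive(UUID instanceId) {
        Instant seen = lastSeen.get(instanceId);
        return seen != null && Duration.between(seen, Instant.now()).compareTo(TTL) < 0;
    }
}
```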

The other common way of achieving this is to run all internal requests through a load balancer, but this has unfortunate side effects (depending on where in the network stack you proxy): it hides information about the client and the service from each other, it increases latency (a little), it introduces additional points of centralization (and hence wider-reaching failure), and finally, bandwidth through a load balancer is simply much more expensive than bandwidth through a switch.

HOW WE IMPLEMENTED IT

In the beginning, as it is for so many others, we started with IP multicast. Very quickly we moved off that, initially to a messaging/topic-based system which kept the same behavior as the multicast, just on a different transport.

Sadly, everyone-to-everyone communication gets pretty nasty as the number of instances grows, so very soon thereafter we switched to a very, er, "interesting" (I can say that, I implemented it) system based on writing announcements out to shared NFS mounts. We wrote a file per announcement to each mount, and each server would ls the directory on each mount, read any announcements it hadn't seen before, and maintain an in-memory representation of the union of the mounts. This scaled remote operations as O(number of servers), as compared to O(number of instances squared) for the multicast-style solutions. It had the nice side effect that scripts could easily look at a mount and use grep and friends to find services.
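
A rough sketch of that NFS-backed scheme might look like the following; the file naming and layout are assumptions for illustration, and error handling is minimal.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write one file per announcement to every mount, then periodically list each
// mount and union whatever we find into an in-memory map.
public final class NfsDiscovery {
    private final List<Path> mounts;                            // shared NFS mounts
    private final Map<String, String> known = new HashMap<>();  // filename -> announcement body

    public NfsDiscovery(List<Path> mounts) {
        this.mounts = mounts;
    }

    // Advertise by writing the same announcement file to every mount.
    public void announce(String instanceId, String body) throws IOException {
        for (Path mount : mounts) {
            Files.writeString(mount.resolve(instanceId + ".announcement"), body);
        }
    }

    // Scan every mount, pick up announcements we have not seen before,
    // and keep the union of all mounts in memory.
    public void refresh() throws IOException {
        for (Path mount : mounts) {
            try (DirectoryStream<Path> dir = Files.newDirectoryStream(mount, "*.announcement")) {
                for (Path file : dir) {
                    known.computeIfAbsent(file.getFileName().toString(), name -> readQuietly(file));
                }
            }
        }
    }

    private static String readQuietly(Path file) {
        try {
            return Files.readString(file);
        } catch (IOException e) {
            return null; // mount hiccup; skip this round
        }
    }
}
```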

NFS has its own quirks, though, and we finally settled on the same model (announce to multiple discovery servers, read from them, and union the results) but over HTTP. The main driver of this was, despite the NFS quirks, the same problem we had with the binary content: the NFS mounts represented a big chunk of shared remote state that everyone had to operate on in exactly the same way, or bad things ensued. This is very fragile. By encapsulating it in services, rather than shared state, we became free to change how that state is stored, and opened the door for higher-level tooling.
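
The HTTP version of the same announce-and-union model might be sketched like this; the /announcements path and the payload handling are placeholders, not the actual API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Post the announcement to several discovery servers, and union whatever each
// one returns when reading back.
public final class HttpDiscovery {
    private final HttpClient http = HttpClient.newHttpClient();
    private final List<URI> discoveryServers;

    public HttpDiscovery(List<URI> discoveryServers) {
        this.discoveryServers = discoveryServers;
    }

    // Send the announcement to every discovery server we know about.
    public void announce(String jsonBody) throws Exception {
        for (URI server : discoveryServers) {
            HttpRequest post = HttpRequest.newBuilder(server.resolve("/announcements"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                    .build();
            http.send(post, HttpResponse.BodyHandlers.discarding());
        }
    }

    // Read from every discovery server and union the results.
    public Set<String> fetchAll() throws Exception {
        Set<String> union = new LinkedHashSet<>();
        for (URI server : discoveryServers) {
            HttpRequest get = HttpRequest.newBuilder(server.resolve("/announcements")).GET().build();
            HttpResponse<String> resp = http.send(get, HttpResponse.BodyHandlers.ofString());
            union.add(resp.body()); // in practice you would parse and merge individual entries
        }
        return union;
    }
}
```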

All that said, if there had been a stable ZooKeeper available three or four years ago, we probably would have used that!

ABOUT NING

Ning is the leading online platform for the world's organizers, activists and influencers to create their own social network. Design a custom social experience in under 60 seconds, giving you the power to mobilize, organize and inspire. Based in Palo Alto, California, Ning makes it easy for brands of all shapes and sizes to build custom and powerful social websites. For more information, visit www.ning.com. To see our Ninja skills, visit http://code.ning.com.