Report: Distributed Computing

Upload: akanksha-bakshi

Post on 06-Apr-2018


  • 8/3/2019 Report Distributed Computing

    1/24

    DISTRIBUTED

    COMPUTING

vs. GRID COMPUTING

    Management of Software Projects

    SUBMITTED BY:

    MRIDUL SHARMA F-

    AALOK AWASTHI F-

    RAJEEV CHATURVEDI F-

    AVINASH MISHRA F- 44


    ABSTRACT

There are two similar trends moving in tandem: distributed computing and grid computing. Depending on how you look at the market, the two either overlap, or distributed computing is a subset of grid computing. Grid computing got its name because it strives for an ideal scenario in which the CPU cycles and storage of millions of systems across a worldwide network function as a flexible, readily accessible pool that can be harnessed by anyone who needs it.

Sun defines a computational grid as "a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to computational capabilities." Grid computing can encompass desktop PCs, but more often than not its focus is on more powerful workstations, servers, and even mainframes and supercomputers working on problems involving huge datasets that can run for days. Grid computing also leans more toward dedicated systems than toward systems primarily used for other tasks.

Large-scale distributed computing of the variety we are covering usually refers to a similar concept, but is more geared to pooling the resources of hundreds or thousands of networked end-user PCs, which individually are more limited in memory and processing power, and whose primary purpose is not distributed computing but serving their users.

When adopting a new technology, it is essential to assess competing technologies and compare them with the new trends. In this paper we discuss how distributed computing differs from grid computing, and the application characteristics that make distributed computing distinct from grid computing. We also discuss the areas where these computational technologies can be applied, along with the security considerations and the standards behind them.


    DISTRIBUTED COMPUTING

    Introduction

The number of real applications is still somewhat limited, and the challenges, particularly standardization, are still significant. But there is new energy in the market, as well as some actual paying customers. Increasing desktop CPU power and communications bandwidth have also helped make distributed computing a more practical idea. Various vendors have developed numerous initiatives and architectures to permit distributed processing of data and objects across a network of connected systems.

The area of distributed computing has received a lot of attention, and it will be a primary focus of this paper: an environment where you can harness the idle CPU cycles and storage space of hundreds or thousands of networked systems to work together on a processing-intensive problem. The growth of such processing models has been limited, however, by a lack of compelling applications and by bandwidth bottlenecks, combined with significant security, management, and standardization challenges.

    How It Works

A distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.

An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server, runs the software when it has spare CPU cycles, and sends the results back to the server. If the user of the client system needs to run his own applications at any time, control is immediately returned, and processing of the distributed application package ends.
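The agent loop just described can be sketched in a few lines of Python. This is an illustrative, in-memory model, not any vendor's actual agent software: the class names, the 300-second idle threshold, and the callable "package" are all assumptions for the example.

```python
IDLE_THRESHOLD = 300  # seconds without user input before the system counts as idle

class ManagementServer:
    """In-memory stand-in for a distributed computing management server."""
    def __init__(self, packages):
        self.packages = list(packages)  # queued application packages (callables)
        self.results = []               # results sent back by processing clients

    def request_package(self):
        return self.packages.pop(0) if self.packages else None

    def send_results(self, result):
        self.results.append(result)

class Agent:
    """The lightweight agent installed on a processing client."""
    def __init__(self, server):
        self.server = server
        self.package = None

    def tick(self, seconds_since_user_input):
        """One pass of the agent loop; returns the action taken."""
        if seconds_since_user_input < IDLE_THRESHOLD:
            self.package = None  # user is active: return control immediately
            return "yield-to-user"
        if self.package is None:
            self.package = self.server.request_package()
            if self.package is None:
                return "no-work"
        package, self.package = self.package, None
        self.server.send_results(package())  # run with spare cycles, report back
        return "completed-task"

# Demo: one queued package (a sum-of-squares job) and one idle client.
server = ManagementServer([lambda: sum(i * i for i in range(10))])
agent = Agent(server)
print(agent.tick(seconds_since_user_input=600))  # completed-task
print(server.results)                            # [285]
```

In a real platform the agent would poll continuously and the package would arrive over the network, but the control flow — yield to the user, request work when idle, report results — is the same.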


    A Typical Distributed System

    Distributed Computing Management Server

The servers have several roles. They take distributed computing requests and divide their large processing tasks into smaller tasks that can run on individual desktop systems (though sometimes this is done by a requesting system). They send application packages and some client management software to the idle client machines that request them. They monitor the status of the jobs being run by the clients. After the client machines run those packages, the servers assemble the results sent back by the clients and structure them for presentation, usually with the help of a database.

The server is also likely to manage any security, policy, or other management functions as necessary, including handling dialup users whose connections and IP addresses are inconsistent. Obviously, the complexity of a distributed computing architecture increases with the size and type of environment. A larger environment that includes multiple departments, partners, or participants across the Web requires complex resource identification, policy management, authentication, encryption, and secure sandboxing functionality.
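The server's split-and-assemble cycle can be illustrated with a toy example. The fixed-size chunking rule and the function names are assumptions, and summing stands in for whatever combine step a real job would need.

```python
def split_job(data, chunk_size):
    """Divide one large processing task into smaller tasks for idle clients."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def assemble(partial_results):
    """Combine the results sent back by the clients into one coherent answer."""
    return sum(partial_results)

# Demo: sum the numbers 1..100 by farming out chunks of 25 to four "clients".
chunks = split_job(list(range(1, 101)), chunk_size=25)
partials = [sum(chunk) for chunk in chunks]  # each client processes its own chunk
print(len(chunks), assemble(partials))       # 4 5050
```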


Administrators or others with rights can define which jobs and users get access to which systems, and who gets priority in various situations based on rank, deadlines, and the perceived importance of each project. Obviously, robust authentication, encryption, and sandboxing are necessary to prevent unauthorized access to systems and data that are meant to be inaccessible.

If you take the ideal of a distributed worldwide grid to the extreme, it requires standards and protocols for dynamic discovery and interaction of resources in diverse network environments and among different distributed computing architectures. Most distributed computing solutions also include toolkits, libraries, and APIs for porting third-party applications to work with their platform, or for creating distributed computing applications from scratch.


    Comparisons and Other Trends

    Distributed vs. Other Trends


Application Characteristics

Obviously, not all applications are suitable for distributed computing. The closer an application gets to running in real time, the less appropriate it is. Even processing tasks that normally take an hour or two may not derive much benefit if the communications among distributed systems and the constantly changing availability of processing clients become a bottleneck. Instead, you should think in terms of tasks that take hours, days, weeks, and months. Generally, the most appropriate applications consist of "loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio." A high compute-to-data ratio goes hand in hand with a high compute-to-communications ratio, as you don't want to bog down the network by sending large amounts of data to each client, though in some cases you can do so during off hours. Programs with large databases that can be easily parsed for distribution are very appropriate.

Clearly, any application with individual tasks that need access to huge data sets will be more appropriate for larger systems than for individual PCs. If terabytes of data are involved, a supercomputer makes sense, as communications can take place across the system's very high-speed backplane without bogging down the network. Server clusters and other dedicated-system clusters will be more appropriate for slightly less data-intensive applications. For a distributed application using numerous PCs, the required data should fit very comfortably in each PC's memory, with lots of room to spare.

Taking this further, it is recommended that the application have the capability to fully exploit "coarse-grained parallelism," meaning it should be possible to partition the application into independent tasks or processes that can be computed concurrently. For most solutions there should not be any need for communication between the tasks except at task boundaries, though DataSynapse allows some interprocess communication. The tasks and small blocks of data should be such that they can be processed effectively on a modern PC and report results that, when combined


with other PCs' results, produce coherent output. The individual tasks should also be small enough to produce a result on these systems within a few hours to a few days.
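The coarse-grained pattern above — independent tasks that only meet at task boundaries — can be sketched with Python's standard multiprocessing pool standing in for a pool of client PCs. The prime-counting kernel is just an illustrative compute-heavy task with a high compute-to-data ratio.

```python
from multiprocessing import Pool

def count_primes(bounds):
    """One independent task: count primes in [lo, hi). No communication with
    other tasks is needed until the results are combined at the end."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Partition [0, 20000) into four tasks; each could run on a different
    # client PC, and the per-task counts are only combined at task boundaries.
    tasks = [(i * 5000, (i + 1) * 5000) for i in range(4)]
    with Pool(4) as pool:
        per_task = pool.map(count_primes, tasks)
    print(sum(per_task))  # 2262 primes below 20000
```

Each task carries only two integers of input and returns one integer, so communication costs stay negligible relative to the computation.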

    Distributed Computing Applications

The following scenarios are examples of types of application tasks that can be set up to take advantage of distributed computing.

A query search against a huge database can be split across lots of desktops, with the submitted query running concurrently against each fragment on each desktop.

Complex modeling and simulation techniques that increase the accuracy of results by increasing the number of random trials are also appropriate, as trials can be run concurrently on many desktops and combined to achieve greater statistical significance (a common method in various types of financial risk analysis).

Exhaustive search techniques that require searching through a huge number of results to find solutions to a problem also make sense; drug screening is a prime example.

Complex financial modeling, weather forecasting, and geophysical exploration are on the vendors' radar screens, as well as car-crash and other complex simulations.

Many of today's vendors are aiming squarely at the life sciences market, which has a sudden need for massive computing power. Pharmaceutical firms have repositories of millions of different molecules and compounds, some of which may have characteristics that make them appropriate for inhibiting newly found proteins. Matching all these ligands to their appropriate targets is an ideal task for distributed computing, and the quicker it's done, the greater the benefits will be.
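The Monte Carlo pattern in the modeling example above can be sketched as independent batches of random trials whose counts are pooled at the end. Estimating pi by random sampling stands in for a real risk simulation, and all the names here are illustrative.

```python
import random

def run_batch(n_trials, seed):
    """One desktop's batch: count random points that land inside the unit circle."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return hits, n_trials

def combine(batches):
    """Pool every desktop's counts into a single, statistically tighter estimate."""
    hits = sum(h for h, _ in batches)
    trials = sum(t for _, t in batches)
    return 4.0 * hits / trials

# Demo: eight "desktops" each run 50,000 independent trials.
batches = [run_batch(50_000, seed=s) for s in range(8)]
print(combine(batches))  # approximately 3.14
```

Because each batch is independent, adding more desktops simply adds more trials, and the pooled estimate tightens with the square root of the total trial count.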


    Security and Standards Challenges

The major challenges come with increasing scale. As soon as you move outside of a corporate firewall, security and standardization challenges become quite significant. Most of today's vendors currently specialize in applications that stop at the corporate firewall, though Avaki, in particular, is staking out the global grid territory. Beyond spanning firewalls with a single platform lies the challenge of spanning multiple firewalls and platforms, which means standards.


Most of the current platforms offer high-level encryption such as Triple DES. The application packages sent to PCs are digitally signed to make sure a rogue application does not infiltrate a system. Avaki comes with its own PKI (public key infrastructure). Identical application packages are typically sent to multiple PCs and the results of each are compared; any set of results that differs from the rest becomes suspect. Even with encryption, data can still be snooped while the process is running in the client's memory, so most platforms create application data chunks so small that snooping them is unlikely to provide useful information. Avaki claims that it integrates easily with different existing security infrastructures and can facilitate communications among them, but this is obviously a challenge for global distributed computing.
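The redundancy check described above — running identical packages on several PCs and flagging any result that differs from the rest — can be sketched as a simple majority vote. The function and client names are illustrative, not from any particular platform.

```python
from collections import Counter

def verify(results_by_client):
    """results_by_client maps client id -> result for one package.
    Returns (accepted_result, suspect_clients); raises if there is no majority."""
    counts = Counter(results_by_client.values())
    accepted, votes = counts.most_common(1)[0]
    if votes <= len(results_by_client) // 2:
        raise ValueError("no majority result; rerun the package")
    suspects = [c for c, r in results_by_client.items() if r != accepted]
    return accepted, suspects

# Demo: the same package ran on three PCs; one returned a deviant result.
print(verify({"pc-a": 285, "pc-b": 285, "pc-c": 999}))  # (285, ['pc-c'])
```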

Working out standards for communications among platforms is part of the typical chaos that occurs early in any relatively new technology. In the generalized peer-to-peer realm lies the Peer-to-Peer Working Group, started by Intel, which is looking to devise standards for communications among many different types of peer-to-peer platforms, including those used for edge services and collaboration.

The Global Grid Forum is a collection of about 200 companies looking to devise grid computing standards. Then there are vendor-specific efforts such as Sun's open source JXTA platform, which provides a collection of protocols and services that allows peers to advertise themselves to and communicate with each other securely. JXTA has a lot in common with Jini, but is not Java-specific (though the first version is Java-based). Intel recently released its own peer-to-peer middleware, the Intel Peer-to-Peer Accelerator Kit for Microsoft .NET, also designed for discovery, and based on the Microsoft .NET platform.

http://www.peer-to-peerwg.org/
http://www.gridforum.org/

    GRID COMPUTING

    Introduction

Grid computing, emerging as a new paradigm for next-generation computing, enables the sharing, selection, and aggregation of geographically distributed heterogeneous resources for solving large-scale problems in science, engineering, and commerce. The resources in the Grid are heterogeneous and geographically distributed, and their availability, usage, and cost policies vary depending on the particular user, time, priorities, and goals. Grid computing enables the regulation of supply and demand for resources: it provides an incentive for resource owners to participate in the Grid, and it motivates users to trade off between deadline, budget, and the required level of quality of service. Economic-based systems have demonstrated this capability for wide-area parallel and distributed computing, through scheduling strategies, algorithms, and systems driven by users' quality-of-service requirements, with their effectiveness shown in scheduling experiments on the World-Wide Grid for parameter-sweep and data-parallel applications.

The Grid unites servers and storage into a single system that acts as a single computer, so that applications can tap into the combined computing power: hardware resources are fully utilized and spikes in demand are met with ease. Vendors such as Oracle actively promote resources for evaluating an organization's adoption of grid technologies.

    The Grid

The Grid is the computing and data management infrastructure that will provide the electronic underpinning for a global society in business, government, research, science, and entertainment. It integrates networking, communication, computation, and information to provide a virtual platform for computation and data management, in the same way that the Internet integrates resources to form a virtual platform for information. Grid infrastructure will provide us with the ability to dynamically link resources together as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications.

    Grid is a type of parallel and distributed system that enables the sharing,

    selection, and aggregation of geographically distributed "autonomous" resources

    dynamically at runtime depending on their availability, capability, performance, cost,

    and users' quality-of-service requirements.

    Beginning of the Grid

Parallel computing in the 1980s focused researchers' efforts on the development of algorithms, programs, and architectures that supported simultaneity. During the 1980s and 1990s, software for parallel computers focused on providing powerful mechanisms for managing communication between processors, and on development and execution environments for parallel machines. Successful application paradigms were developed to leverage the immense potential of shared and distributed memory architectures. Initially it was thought that the Grid would be most useful in extending parallel computing paradigms from tightly coupled clusters to geographically distributed systems. In practice, however, the Grid has been utilized more as a platform for integrating loosely coupled applications, some components of which might be running in parallel on a low-latency parallel machine, and for linking disparate resources (storage, computation, visualization, instruments). Coordination and distribution are two fundamental concepts in Grid computing.

The first modern Grid is generally considered to be the Information Wide Area Year (I-WAY). Developing infrastructure and applications for the I-WAY provided a seminal and powerful experience for the first generation of modern Grid researchers and projects. This was important, as Grid research requires a very different focus than distributed computing research: Grid research focuses on addressing


    the problems of integration and management of software. I-WAY opened the door for

    considerable activity in the development of Grid software.

    Characteristics

An enterprise-computing grid is characterized by three primary features: diversity, decentralization, and dynamism.

    Diversity:

    A typical computing grid consists of many hundreds of managed resources of

    various kinds including servers, storage, Database Servers, Application Servers,

    Enterprise Applications, and system services like Directory Services, Security and

    Identity Management Services, and others. Managing these resources and their life cycle

    is a complex challenge.

    Decentralization:

Traditional distributed systems have typically been managed from a central administration point. A computing grid compounds the management challenge, since its resources can be far more decentralized and may be geographically distributed across many different data centers within an enterprise.

    Dynamism:

Components of a traditional application typically run in a static environment without needing to address rapidly changing demands. In a computing grid, however, the systems and applications need to be able to flexibly adapt to changing demand. For instance, with the late-binding nature and cross-platform properties of web services, an application deployed on the grid may consist of a constantly changing set of components. At different points in time, these components can be hosted on different nodes in the


    network. Managing an application in such a dynamic environment can be a challenging

    undertaking.

    A Community of Grid Model

Over the last decade, the Grid community has begun to converge on a layered model that allows development of the complex system of services and software required to integrate Grid resources. The Community Grid Model (a layered abstraction of the grid) is being developed in a loosely coordinated manner throughout academia and the commercial sector.

    The bottom horizontal layer of the Community Grid Model consists of the

    hardware resources that underlie the Grid. Such resources include computers, networks,

    data archives, instruments, visualization devices and so on. Moreover, the resource pool

    represented by this layer is highly dynamic, both as a result of new resources being

    added to the mix and old resources being retired, and as a result of varying observable

    performance of the resources in the shared, multi-user environment of the Grid.

    Figure 1: Layered architecture of the Community Grid Model.

The next horizontal layer (common infrastructure) consists of the software services and systems that virtualize the Grid. The key concept at the common infrastructure layer is community agreement on software that will represent the Grid


    as a unified virtual platform and provide the target for more focused software and

    applications.

The next horizontal layer (user- and application-focused Grid middleware, tools, and services) contains software packages built atop the common infrastructure. This software enables applications to use Grid resources more productively by masking some of the complexity involved in system activities such as authentication and file transfer.

    Types Of Grid

Grid computing can be used in a variety of ways to address various kinds of application requirements. Often, grids are categorized by the type of solutions that they best address. The three primary types of grids are:

    Computational grid

A computational grid is focused on setting aside resources specifically for computing power. In this type of grid, most of the machines are high-performance servers.

    Scavenging grid

    A scavenging grid is most commonly used with large numbers of desktop machines.

    Machines are scavenged for available CPU cycles and other resources. Owners of the

    desktop machines are usually given control over when their resources are available to

    participate in the grid.

    Data grid

    A data grid is responsible for housing and providing access to data across multiple

    organizations. Users are not concerned with where this data is located as long as they

    have access to the data. For example, you may have two universities doing life science


    research, each with unique data. A data grid would allow them to share their data,

    manage the data, and manage security issues such as who has access to what data.

Another common distributed computing model that is often associated with or confused with Grid computing is peer-to-peer computing. In fact, some consider this to be another form of Grid computing.

    Grid Tools

Infrastructure components include file systems, schedulers and resource managers, messaging systems, security applications, certificate authorities, and file transfer mechanisms like GridFTP.

Directory services. Systems on a grid must be capable of discovering what services are available to them. In short, Grid systems must be able to define (and monitor) a grid's topology in order to share and collaborate. Many Grid directory service implementations are based on past successful models, such as LDAP, DNS, network management protocols, and indexing services.

Schedulers and load balancers. One of the main benefits of a grid is maximizing efficiency. Schedulers and load balancers provide this function and more. Schedulers ensure that jobs are completed in some order (priority, deadline, or urgency, for instance), and load balancers distribute tasks and data management across systems to decrease the chance of bottlenecks.

Developer tools. Every arena of computing endeavor requires tools that allow developers to succeed. Tools for grid developers focus on different niches (file transfer, communications, environment control), and range from utilities to full-blown APIs.

Security. Security in a grid environment can mean authentication and authorization -- in other words, controlling who or what can access a grid's resources -- but it can mean a lot more. For instance, message integrity and message confidentiality are crucial to financial and healthcare environments.
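The scheduler role above — completing jobs in some order such as priority — can be sketched with a heap. This is a minimal illustration, not a real grid scheduler, which would also weigh deadlines, users, and resource requirements.

```python
import heapq

class PriorityScheduler:
    """Jobs are completed in priority order (lower number = more urgent),
    with submission order breaking ties."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: equal-priority jobs run first-come-first-served

    def submit(self, job, priority):
        heapq.heappush(self._heap, (priority, self._seq, job))
        self._seq += 1

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

sched = PriorityScheduler()
sched.submit("nightly-batch", priority=5)
sched.submit("deadline-report", priority=1)
sched.submit("drug-screen", priority=5)
print([sched.next_job() for _ in range(3)])
# ['deadline-report', 'nightly-batch', 'drug-screen']
```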


    Grid Components

    Depending on the grid design and its expected use, some of these components may or

    may not be required, and in some cases they may be combined to form a hybrid

    component.

    Portal/user interface

Just as a consumer sees the power grid as a receptacle in the wall, a grid user should not see all of the complexities of the computing grid. Although the user interface can come in many forms and be application-specific, a grid portal provides the interface for a user to launch applications that will use the resources and services provided by the grid. From this perspective, the user sees the grid as a virtual computing resource, just as the consumer of power sees the receptacle as an interface to a virtual generator.


    Figure 2: Possible user view of a grid

    Security

A major requirement for Grid computing is security. At the base of any grid environment, there must be mechanisms to provide security, including authentication, authorization, data encryption, and so on. The Grid Security Infrastructure (GSI) component of the Globus Toolkit provides robust security mechanisms. The GSI includes an OpenSSL implementation. It also provides a single sign-on mechanism, so that once a user is authenticated, a proxy certificate is created and used when performing actions within the grid. When designing your grid environment, you may use the GSI sign-in to grant access to the portal, or you may have your own security for the portal. The portal will then be responsible for signing in to the grid, either using the user's credentials or using a generic set of credentials for all authorized users of the portal.


Once the resources have been identified, the next logical step is to schedule the individual jobs to run on them. If sets of stand-alone jobs are to be executed with no interdependencies, then a specialized scheduler may not be required. However, if you want to reserve a specific resource or ensure that different jobs within the application run concurrently (for instance, if they require inter-process communication), then a job scheduler should be used to coordinate the execution of the jobs. The Globus Toolkit does not include such a scheduler, but several schedulers are available that have been tested with, and can be used in, a Globus grid environment. It should also be noted that there can be different levels of schedulers within a grid environment. For instance, a cluster could be represented as a single resource; the cluster may have its own scheduler to help manage the nodes it contains. A higher-level scheduler (sometimes called a meta-scheduler) might be used to schedule work to be done on the cluster, while the cluster's scheduler would handle the actual scheduling of work on the cluster's individual nodes.

    Figure 5: Scheduler
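The two levels of scheduling just described can be sketched as follows. The least-loaded selection rule and the class names are assumptions for illustration; real meta-schedulers use far richer placement policies.

```python
class Cluster:
    """A cluster exposed to the grid as a single resource, with its own
    local scheduler for the nodes it contains."""
    def __init__(self, name, node_loads):
        self.name = name
        self.node_loads = node_loads  # node -> number of running jobs

    def total_load(self):
        return sum(self.node_loads.values())

    def schedule_locally(self, job):
        """The cluster's own scheduler places the job on its idlest node."""
        node = min(self.node_loads, key=self.node_loads.get)
        self.node_loads[node] += 1
        return node

def meta_schedule(job, clusters):
    """The meta-scheduler sees each cluster as one resource and picks the
    least-loaded cluster; that cluster then schedules its own nodes."""
    cluster = min(clusters, key=lambda c: c.total_load())
    return cluster.name, cluster.schedule_locally(job)

clusters = [Cluster("east", {"e1": 3, "e2": 1}),
            Cluster("west", {"w1": 0, "w2": 2})]
print(meta_schedule("render-job", clusters))  # ('west', 'w1')
```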

    Data management

If any data -- including application modules -- must be moved or made accessible to the nodes where an application's jobs will execute, then there needs to be a secure and reliable method for moving files and data to various nodes within the grid. The Globus


Toolkit contains a data management component that provides such services. This component, known as Grid Access to Secondary Storage (GASS), includes facilities such as GridFTP. GridFTP is built on top of the standard FTP protocol, but adds additional functions and utilizes the GSI for user authentication and authorization. Therefore, once a user has an authenticated proxy certificate, he can use the GridFTP facility to move files without having to go through a login process on every node involved. This facility provides third-party file transfer, so that one node can initiate a file transfer between two other nodes.

    Figure 6: Data management

    Job and resource management

    With all the other facilities we have just discussed in place, we now get to the

    core set of services that help perform actual work in a grid environment. The Grid

    Resource Allocation Manager (GRAM) provides the services to actually launch a job on

    a particular resource, check its status, and retrieve its results when it is complete.
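The launch/status/result cycle can be sketched with an in-memory stand-in. To be clear, this is not the Globus GRAM API: the JobManager below only mirrors the three operations named above, and it runs jobs eagerly instead of dispatching them to a remote resource.

```python
class JobManager:
    """In-memory stand-in mirroring the GRAM-style operations: launch a job,
    check its status, and retrieve its results when complete."""
    def __init__(self):
        self._jobs = {}
        self._next_id = 0

    def launch(self, fn, *args):
        """Launch a job on a resource; returns a job handle. A real manager
        would dispatch to a remote resource -- here the job runs eagerly."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = ("DONE", fn(*args))
        return job_id

    def status(self, job_id):
        return self._jobs[job_id][0]

    def result(self, job_id):
        state, value = self._jobs[job_id]
        if state != "DONE":
            raise RuntimeError("job still running")
        return value

mgr = JobManager()
jid = mgr.launch(sum, range(1, 11))
print(mgr.status(jid), mgr.result(jid))  # DONE 55
```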


Figure 7: GRAM

    Job flow in a grid environment

When enabling an application for a grid environment, it is important to keep in mind these components and how they relate to and interact with one another. Depending on your grid implementation and application requirements, there are many ways in which these pieces can be put together to create a solution.

    Advantages

Grid computing is about getting computers to work together. Almost every organization is sitting on top of enormous, unused computing capacity, widely distributed; mainframes, for instance, are idle 40% of the time. With Grid computing, businesses can optimize computing and data resources, pool them for large-capacity workloads, share them across networks, and enable collaboration. Many consider Grid computing the next logical step in the evolution of the Internet, and maturing standards and a drop in the cost of bandwidth are fueling the momentum we're experiencing today.

    Challenges of the Grid


A word of caution should be given to the overly enthusiastic. The grid is not a silver bullet that can take any application and run it 1,000 times faster without the need for buying any more machines or software. Not every application is suitable or enabled for running on a grid: some kinds of applications simply cannot be parallelized, and for others it can take a large amount of work to modify them to achieve faster throughput. The configuration of a grid can greatly affect the performance, reliability, and security of an organization's computing infrastructure. For all of these reasons, it is important to understand how far the grid has evolved today and which features are coming tomorrow or in the distant future.

    Conclusion

    Grid computing introduces a new concept to IT infrastructures because it

    supports distributed computing over a network of heterogeneous resources and is enabled

    by open standards. The advantages of distributed type of architecture for the right kind of

    applications are impressive. The most obvious is the ability to provide access to

    supercomputer level processing power or better for a fraction of the cost of a typical

    supercomputer.

Grid computing works to optimize underutilized resources, decrease capital expenditures, and reduce the total cost of ownership. The specific promise of distributed computing, however, lies mostly in harnessing the system resources that lie within the firewall; it will take years before the systems on the Net share compute resources as effortlessly as they share information.

Scalability is also a great advantage of distributed computing. Though they provide massive processing power, supercomputers are typically not very scalable once they're installed, whereas a distributed computing installation can keep growing: simply add more systems to the environment. In a corporate distributed computing setting, systems might


be added within or beyond the corporate firewall. Grid computing, for its part, pools an organization's enormous, widely distributed, and otherwise unused computing capacity; with maturing standards and falling bandwidth costs, many consider it the next logical step in the evolution of the Internet.

Bibliography

http://www.google.com/

http://www.extremetech.com/

B. W. Lampson, M. Paul (eds.), Distributed Systems: Architecture and Implementation.

Distributed Systems: Methods and Tools for Specification, An Advanced Course, Springer-Verlag.

A. S. Tanenbaum, Operating Systems.