web_engg

Upload: haftamuhailu

Post on 04-Apr-2018

  • 7/30/2019 Web_Engg

    1/14

    Web Engineering Lecture One

    On Web Engineering

    Software Engg vs Web Engg

    Web technologies: hypertext, hypermedia, client/server, etc

    Search engines: searching, indexing, crawlers, etc

    Search Engine Optimization

    Web metrics and quality

    Web engineering

    Systematic, scientific, engineering and management approach

    Develop, deploy and maintain qualitative Web applications

    Web engineering focuses on sound methodologies, techniques, and tools for developing Web apps.

    Web engineering is defined as "...the use of scientific, engineering, and management principles and systematic approaches with the aim of successfully developing, deploying and maintaining high quality Web-based systems and applications..."

    Web development has an important artistic side.

    Web apps vs. traditional software development / information systems / computer application development?

    Characteristics of Web apps

    Web apps constantly evolve. Unlike conventional software, which goes through planned and discrete revisions at specific times in its lifecycle, Web applications continuously evolve in terms of their requirements and functionality (instability of requirements). Managing the change and evolution of a Web application is a major technical, organizational and management challenge, much more demanding than in traditional software development.

    Web apps are inherently different from conventional software. The content, which may include text, graphics, images, audio, and/or video, is integrated with procedural processing. Also, the way in which the content is presented and organized has implications for the performance and response time of the system.

    Web applications are meant to be used by a vast, variable user community: a large number of anonymous users with varying requirements, expectations, and skill sets. Therefore, the user interface and usability features have to meet the needs of a diverse, anonymous user community to whom we cannot offer training sessions, which complicates human-Web interaction (HWI), user interface design, and information presentation.

    In general, many Web-based systems demand a good look and feel, favoring visual creativity and the incorporation of multimedia in presentation and interface. In these systems, more emphasis is placed on visual creativity and presentation.

    Technology instability: new tools, technologies, languages, and standards to cope with.

    Web app development uses cutting-edge, diverse technologies and standards and integrates numerous varied components, including traditional and non-traditional software, interpreted scripting languages, HTML files, databases, images, and other multimedia components such as video and audio, and complex user interfaces.

    Delivery medium is different from traditional software.

    Security and privacy needs of Web-based systems are more demanding than those of traditional software.

    Web Apps vs Conventional software

    With respect to their development process, technologies, quality factors, and measures

    Web Hypermedia, Web Software, or Web Application?

    Hypermedia: an extension of hypertext

    The Web is the best known example of a hypermedia system.

    The Web has been used as the delivery platform for three types of applications: Web hypermedia applications, Web software applications, and Web applications.

    Web hypermedia application

    A non-conventional application characterized by the authoring of information using nodes (chunks of information), links (relations between nodes), anchors, access structures (for navigation), and delivery over the Web.

    Technologies: HTML, XML, JavaScript, and multimedia.

    Web software application

    A conventional software application that relies on the Web or uses the Web's infrastructure for execution.

    Typical applications include legacy information systems such as databases, booking systems, e-commerce apps, etc.

    They employ development technologies (e.g. DCOM, ActiveX, etc.), database systems, and development solutions (e.g. J2EE).

    Web application

    An application delivered over the Web that combines characteristics of both Web hypermedia and Web software applications.

    Web Development vs. Software Development

    Areas of difference for Web development and maintenance: people involved, intrinsic characteristics of Web apps, and audience

    Differences between Web and software development divided into 12 areas

    application characteristics

    primary technologies used

    approach to quality delivered

    development process drivers

    availability of the application

    customers (users/stakeholders)

    update rate/maintenance cycles

    people involved in development

    architecture and network

    disciplines involved

    legal, ethical and social issues

    information structuring and design

    Application Characteristics

    Primary Technologies Used

    Web apps use technologies such as Java solutions (JavaBeans, JSP, etc.), HTML, XML, JavaScript, and databases.

    Conventional software development uses technologies such as object-oriented or procedural languages, databases, generators, and CASE tools.

    Approaches to quality delivered

    Web apps are expected to be high quality so that customers return to do repeat business.

    Usability, accessibility, graphic design become very important

    Competition is high over the users on the web

    popularity is important

    Development Process Drivers

    The dominant development process drivers for Web companies are composed of three quality criteria:

    Reliability

    Usability

    Security

    With regard to conventional software development, the development process driver is time to market, not quality criteria.

    Disciplines Involved

    A wide range of skills and expertise is required for Web apps.

    Distinct disciplines are involved, such as software engineering (development methodologies, project management, tools), hypermedia engineering (linking, navigation), requirements engineering, usability engineering, information engineering, graphics design, and network management (performance measurement and tuning).

    For conventional software, fewer disciplines are required, such as software engineering, requirements engineering, and usability engineering.

    Information Structuring and Design

    Web applications present structured and unstructured content, which may be distributed over multiple sites and use different systems (e.g. database systems, file systems, multimedia storage devices).

    The design of a Web application, unlike that of conventional software applications, includes the organisation of content into navigational structures by means of hyperlinks.

    Suitable navigational structures

    Technologies for Web Apps

    The choice of appropriate technologies is an important success factor in the development of Web applications.

    Markup/Hypertext/hypermedia/client-server/sockets

    Define the WHAT of a system: define the requirements of Web apps, identify the architecture, develop a design, etc.

    Define the HOW: [implementation phase] choice of appropriate technologies

    Separation of content and presentation is a central requirement for using technologies appropriately.

    The specifics of implementation technologies for Web applications versus conventional software systems stem from the use of Web standards.

    This concerns in particular the implementation within the three views: request (client), response (server), and the rules for the communication between these two (protocol).

    Protocol: HTTP, SMTP, FTP

    Client Technologies: HTML, Plug-ins, Java Applets, ActiveX Controls,

    Server Technologies:

    Markup

    Instructions for document formatting. For example, we could write *Hello* to output Hello in bold, or /Hello/ to output Hello in italics.

    Markup is text inserted in a document to add information as to how characters and contents should be represented in the document.
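
    The *Hello* and /Hello/ markers can be illustrated with a small converter. This is only a sketch: it assumes that asterisks mean bold and slashes mean italics, and the function name render_markup is made up for this example.

```python
import re

def render_markup(text: str) -> str:
    """Turn /word/ into <i>word</i> and *word* into <b>word</b> (assumed meanings)."""
    # Handle slashes first, so the slashes in generated closing tags are never re-scanned.
    text = re.sub(r"/(\w+)/", r"<i>\1</i>", text)
    text = re.sub(r"\*(\w+)\*", r"<b>\1</b>", text)
    return text

rendered = render_markup("*Hello* and /Hello/")
print(rendered)
```

    The markers carry no content of their own; they only tell the renderer how the enclosed characters should be presented.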

    SGML HTML/XML

    Hypertext and Hypermedia

    Hypertext is understood as the organization of the interconnection of single information units. Relationships between these units can be expressed by links.

    Hypermedia is commonly seen as a way to extend the hypertext principle to arbitrary multimedia objects, e.g., images or video.

    Client/Server Communication on the Web

    The client/server paradigm underlying all Web applications forms the backbone between a user (client or user agent) and the actual application (server).

    2-layer architecture

    SMTP, RTSP,

    SMTP: Simple Mail Transfer Protocol

    SMTP combined with POP3 and IMAP allows us to send and receive e-mails.

    In addition, SMTP is increasingly used as a transport protocol for asynchronous message exchange based on SOAP.

    RTSP

    Real Time Streaming Protocol

    A standard designed to support the delivery of multimedia data in real-time conditions.

    In contrast to HTTP, RTSP allows the transmission of resources to the client in a timely context rather than delivering them in their entirety (at once).

    This transmission form is commonly called streaming.

    Streaming allows us to manually shift the audiovisual time window by requesting the stream at a specific time, i.e., it lets us control the playback of continuous media.

    From Wikipedia: The transmission of streaming data itself is not a task of the RTSP protocol.

    Most RTSP servers use the Real-time Transport Protocol (RTP) for media stream delivery.

    While similar in some ways to HTTP, RTSP defines control sequences useful in controlling multimedia playback.

    HTTP

    HyperText Transfer Protocol

    Text-based, stateless protocol controlling how resources, e.g., HTML documents or images, are accessed.
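
    To make "text-based" concrete, the sketch below builds a GET request as plain text and reads the status line of a canned response. No network access is involved; the host, path, and response text are made-up examples, and the helper names are hypothetical.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

def parse_status_line(response: str) -> tuple[str, int, str]:
    """Split the first response line into (version, status code, reason)."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

request = build_get_request("www.example.com", "/index.html")
# A canned response standing in for what a server might send back.
canned = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(request)
print(parse_status_line(canned))
```

    Each request carries everything the server needs to answer it, which is why the protocol itself can remain stateless.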

    Session Tracking

    Interactive Web applications must be able to distinguish requests by multiple simultaneous users and identify related requests coming from the same user.

    A session defines a sequence of related HTTP requests between a specific user and server within a specific time window.

    Since HTTP is a stateless protocol, the Web server cannot automatically allocate incoming requests to a session.

    Two principal methods allow a Web server to allocate an incoming request to a session:

    - In each of its requests to a server, the client identifies itself with a unique identification. All data sent to the server are then allocated to the respective session.

    - All data exchanged between a client and a server are included in each request a client sends to the server, so that the server logic can be developed even though the communication is stateless.

    Session tracking is normally implemented by URL rewriting or cookies.
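
    A minimal sketch of cookie-based session tracking follows. It assumes a cookie named SESSIONID and an in-memory session store; both are illustrative choices, and a real server would also expire sessions after the time window ends.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical in-memory store: session id -> data kept for that user.
SESSIONS: dict[str, dict] = {}

def handle_request(cookie_header: str) -> tuple[str, dict]:
    """Return (session_id, session_data) for a request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SESSIONID")
    session_id = morsel.value if morsel else None
    if session_id not in SESSIONS:
        # Unknown or missing id: start a new session.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"hits": 0}
    SESSIONS[session_id]["hits"] += 1
    return session_id, SESSIONS[session_id]

sid, _ = handle_request("")                     # first visit, no cookie yet
_, data = handle_request(f"SESSIONID={sid}")    # browser sends the id back
print(data["hits"])
```

    With URL rewriting, the same identifier would travel in each link instead of in the Cookie header.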

    Client Technologies

    Helpers and Plug-ins: Adobe Reader, WinZip

    Java Applets

    ActiveX Controls

    Document Specific Technologies: HTML, XML, XSL/XSLT, SVG

    SVG - Scalable Vector Graphics
    - Allows describing two-dimensional graphics in XML
    - SVG recognizes three types of graphics objects: vector graphics consisting of straight lines and curves, images, and text
    - Supports event-based interaction, e.g., responses to buttons or mouse movements
    - This format is suitable for all types of interactive and animated vector graphics.
    - Application examples include the representation of CAD, maps, and routes.
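
    As a small illustration of SVG being plain XML, the sketch below builds a document containing one object of each of the three types (a line, an image, and text). The file name photo.png, the coordinates, and the label are made-up values.

```python
import xml.etree.ElementTree as ET

# Root element of an SVG document; sizes are arbitrary example values.
svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg", width="200", height="120")
# Vector graphics object: a straight line.
ET.SubElement(svg, "line", x1="10", y1="10", x2="190", y2="10", stroke="black")
# Image object: refers to an external (hypothetical) bitmap file.
ET.SubElement(svg, "image", href="photo.png", x="10", y="20", width="60", height="60")
# Text object.
label = ET.SubElement(svg, "text", x="80", y="60")
label.text = "Route A"

svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```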

    SMIL - Synchronized Multimedia Integration Language
    - Used to represent synchronized multimedia presentations.

    Server Side Technologies

    URI handlers to process HTTP requests

    Server Side Includes (SSI)

    CGI

    Server Side Scripting

    Servlets

    JSP

    ASP.NET

    Web Services

    Middleware Technologies

    Application Servers

    Messaging Systems/Brokers

    Web Application Architectures

    The quality of a Web application is considerably influenced by its underlying architecture.

    Components of a Generic Web Application Architecture

    Components based on the request-response paradigm

    Components

    Client

    browser or user agent

    Firewall

    A piece of software regulating the communication between insecure networks (e.g., the Internet) and secure networks (e.g., corporate LANs).

    This communication is filtered by access rules.

    Proxy

    A proxy is typically used to temporarily store Web pages in a cache.

    However, proxies can also assume other functionalities, e.g., adapting the contents for users (customization), or user tracking.

    A proxy is used as an intermediate server to forward client requests for URLs to the (actual) server.

    Proxies are also used to adapt and format links and contents for users.
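
    The caching role of a proxy can be sketched as follows. The class name CachingProxy and the stand-in origin function are hypothetical, and the sketch ignores cache expiry and HTTP cache headers.

```python
from typing import Callable

class CachingProxy:
    """Forward requests to the origin server, remembering earlier answers."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._cache: dict[str, str] = {}
        self.origin_hits = 0  # how often the actual server was contacted

    def get(self, url: str) -> str:
        if url not in self._cache:
            self.origin_hits += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

def fake_origin(url: str) -> str:
    # Stand-in for the actual server; no network access is involved.
    return f"<html>content of {url}</html>"

proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/")
second = proxy.get("http://example.com/")   # served from the cache
print(proxy.origin_hits)
```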

    Web Server

    A Web server is a piece of software that supports various Web protocols, such as HTTP and HTTPS, to process client requests.

    Database Server

    This server normally supplies an organization's production data in structured form, e.g., in tables.

    Media Server

    This component is primarily used for content streaming of non-structured bulk data (e.g., audio or video).

    Content Management Server

    Similar to a database server, a content management server holds contents to serve an application. These contents are normally available in the form of semi-structured data, e.g., XML documents.

    Application Server

    An application server holds the functionality required by several applications, e.g., workflow or customization.

    Legacy Application

    A legacy application is an older system that should be integrated as an internal or external component.

    Data Aspect Architectures

    Data can be grouped into one of three architectural categories: (1) structured data of the kind held in databases; (2) documents of the kind used in document management systems; and (3) multimedia data of the kind held in media servers.

    Architectures for Multimedia Data

    The ability to handle large data volumes plays a decisive role when designing systems that use multimedia contents.

    Basically, multimedia data, i.e., audio and video, can be transmitted over standard Internet protocols like HTTP or FTP, just like any other data used in Web applications.

    This approach is used by a large number of current Web applications, because it has the major benefit that no additional components are needed on the server.

    Its downside, however, is often felt by users in that the media downloads are very slow.

    We can use streaming technologies to minimize these waiting times for multimedia contents to play out.

    Streaming in this context means that a client can begin playout of the audio and/or video a few seconds after it begins receiving the file from a server.

    This technique avoids having to download the entire file (incurring a potentially long delay) before beginning playout.
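
    The idea of starting playout before the whole file has arrived can be sketched with a generator. This is a toy model with made-up chunk sizes and buffer threshold; it does not implement RTP or RTSP.

```python
from typing import Iterator

def stream(media: bytes, chunk_size: int) -> Iterator[bytes]:
    """Deliver the media piece by piece instead of all at once."""
    for start in range(0, len(media), chunk_size):
        yield media[start:start + chunk_size]

def play(chunks: Iterator[bytes], buffer_chunks: int) -> list[str]:
    """Record when playout could start and how much data arrived in total."""
    log, received = [], 0
    for _chunk in chunks:
        received += 1
        if received == buffer_chunks:
            # Enough data is buffered: playout begins while the rest still arrives.
            log.append(f"playout starts after {received} chunks")
    log.append(f"received {received} chunks in total")
    return log

events = play(stream(b"x" * 1000, chunk_size=100), buffer_chunks=2)
print(events)
```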

    Two protocols are generally used for the streaming of multimedia contents. One protocol handles the transmission of multimedia data on the network level, and the other protocol controls the presentation flow (e.g., starting and stopping a video) and the transmission of meta-data.

    RTP [Real-time Transport Protocol]: network protocol; RTSP [Real Time Streaming Protocol]: control protocol; MMS [Microsoft Media Server]

    Fig 2: Streaming media architecture using point-to-point connections.

    Search Engines

    Originally, the term search engine referred to some kind of search index, a huge database containing information from individual Web sites.

    Search engines help people find information on the Internet and on other sites.

    Large search-index companies own thousands of computers that use software known as spiders or robots (or just plain bots) to grab Web pages and read the information stored in them.

    These systems don't always grab all the information on each page or all the pages in a Web site, but they grab a significant amount of information and use complex algorithms (calculations based on complicated formulae) to index that information.

    General Operations of search engines: [Crawling, Indexing, Searching]

    Search/crawl the Internet

    Keep an index of the words they find, and where they find them

    Words: those occurring in the title, subtitles, meta tags, and other relevant positions.

    Allow users to look for words or combinations of words found in that index

    Search/Crawl the Internet

    A search engine employs special software robots, called spiders, to build lists of the words found on Web sites.

    The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.

    When a spider is building its lists, the process is called Web crawling.

    How does any spider start its travels over the Web?

    The usual starting points are lists of heavily used servers and very popular pages.

    The spider will begin with a popular site, indexing the words on its pages and following every link found within the site.

    The Google spider was built to index every significant word on a page, leaving out the articles"a," "an" and "the." Other spiders take different approaches.
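
    The crawl described above can be sketched as a breadth-first traversal. To stay runnable without network access, the sketch uses a made-up in-memory "Web" (the WEB dictionary and its URLs are purely illustrative) and drops the articles "a", "an" and "the" as the text describes.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("the spider crawls a page", ["b.example", "c.example"]),
    "b.example": ("an index of words", ["a.example"]),
    "c.example": ("a page about the web", []),
}
STOP_WORDS = {"a", "an", "the"}

def crawl(seed: str) -> dict[str, list[str]]:
    """Breadth-first crawl from the seed; returns URL -> significant words."""
    seen, queue, words_by_url = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        words_by_url[url] = [w for w in text.split() if w not in STOP_WORDS]
        for link in links:
            # Follow every link once, skipping pages already scheduled.
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return words_by_url

pages = crawl("a.example")
print(pages)
```

    The seed plays the role of the "popular site" starting point; the seen set keeps the spider from looping when pages link back to each other.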

    Robot exclusion protocol: used when a site's owner doesn't wish a spider to crawl its pages or follow its links.
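
    A well-behaved spider checks a site's robots.txt rules before fetching a page. The sketch below uses Python's standard urllib.robotparser module on an inline example rule set; the bot name and URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Example rules a site owner might publish at /robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/data.html")
print(allowed, blocked)
```

    The protocol is advisory: it only works because crawlers choose to honor it.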

    Search Directory

    A search directory is a categorized collection of information about Web sites, rather than information taken from Web pages.

    The most significant search directories are owned by Yahoo! (dir.yahoo.com) and the Open Directory Project (www.dmoz.org).

    Directory companies don't use spiders or bots to download and index pages on the Web sites in the directory; rather, for each Web site, the directory contains information, such as a title and description, submitted by the site owner.

    Directories are human-edited: people check your Web site, people index your Web site, etc.

    Google also has a directory, but the information comes from somebody else: the Open Directory Project.

    Building the Index

    Once the spiders have completed the task of finding information on web pages, the search engine must store it in a way that makes it useful.

    There are two key components involved in making the gathered data accessible to users:

    the information stored with the data

    the method by which the information is indexed.

    In the simplest case, a search engine could just store the word and the URL where it was found.

    Page rank/Ranking organic and paid search results

    Search engines store more information than simple word/URL combinations.

    An engine might store the number of times that the word appears on a page.

    The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
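
    A weighted word index of this kind can be sketched as below. The field names and the weight values are assumptions made for the example; the text does not say which weights any real engine uses.

```python
from collections import defaultdict

# Assumed weights: words in the title count more than words in the body.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def build_index(pages: dict[str, dict[str, str]]) -> dict[str, dict[str, float]]:
    """Map each word to {url: weighted number of occurrences}."""
    index: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[word][url] += FIELD_WEIGHTS[field]
    return index

# Made-up pages for illustration.
pages = {
    "a.example": {"title": "web engineering", "body": "web apps evolve"},
    "b.example": {"title": "search engines", "body": "engines index the web"},
}
index = build_index(pages)
print(dict(index["web"]))
```

    Here "web" scores higher for a.example because it appears in that page's title as well as its body.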

    Ranking list tries to present the most useful pages at the top.

    A search engine's organic ranking algorithm is one of the trickiest parts of designing a search engine, so let's start by examining the simplest kind of ranking algorithm.

    Ranking is just another word for sorting, the act of collating results into a certain order. Shopping search engines typically use simple ranking algorithms that the searcher can choose. When the searcher is looking for a product to buy, the shopping search engine might start by ordering the results by price (lowest to highest), but the searcher can decide to sort the list by other columns, such as availability (in stock, within one week, and so on), or any other features of the product.
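
    The "ranking is sorting" idea for a shopping search can be sketched in a few lines. The product list and the column names are invented for the example.

```python
# Made-up search results; days_to_ship == 0 means "in stock".
products = [
    {"name": "camera", "price": 250.0, "days_to_ship": 0},
    {"name": "tripod", "price": 40.0, "days_to_ship": 7},
    {"name": "lens", "price": 120.0, "days_to_ship": 2},
]

def rank(results: list[dict], column: str) -> list[str]:
    """Order results by the column the searcher chose, lowest value first."""
    return [r["name"] for r in sorted(results, key=lambda r: r[column])]

by_price = rank(products, "price")
by_availability = rank(products, "days_to_ship")
print(by_price)
print(by_availability)
```

    Organic Web ranking is harder precisely because no single column like price exists; the engine has to compute a relevance score first.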

    Term frequency, term placement, link popularity (link analysis)

    Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space.

    After the information is compacted, it's ready for indexing.

    An index has a single purpose: it allows information to be found as quickly as possible.

    There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table.

    In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness.

    The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently.
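
    A toy hash-table index is sketched below. The hashing formula (sum of character codes modulo the number of buckets) and the bucket count are chosen only for readability; they are assumptions, and production systems use formulas that distribute entries far more evenly.

```python
NUM_BUCKETS = 8  # the "predetermined number of divisions" (arbitrary here)

def bucket_for(word: str) -> int:
    """Toy hashing formula: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for ch in word) % NUM_BUCKETS

class HashIndex:
    def __init__(self) -> None:
        # Each bucket holds (word, list of URLs) pairs.
        self.buckets: list[list[tuple[str, list[str]]]] = [[] for _ in range(NUM_BUCKETS)]

    def add(self, word: str, url: str) -> None:
        bucket = self.buckets[bucket_for(word)]
        for stored_word, urls in bucket:
            if stored_word == word:
                urls.append(url)
                return
        bucket.append((word, [url]))

    def lookup(self, word: str) -> list[str]:
        # Only one bucket is searched, not the whole index.
        for stored_word, urls in self.buckets[bucket_for(word)]:
            if stored_word == word:
                return urls
        return []

idx = HashIndex()
idx.add("web", "a.example")
idx.add("web", "b.example")
idx.add("xylophone", "c.example")
print(idx.lookup("web"))
```

    Because the bucket depends on the computed number rather than the first letter, common initial letters do not crowd into one division.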

    The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.

    Search and Display Results

    Searching through an index involves a user building a query and submitting it through the search engine.

    Displaying the results is a lot simpler than some other parts of the process

    The display can contain organic or paid results.

    Organic results all use the title of the page followed by a snippet - a summary of the text from that page that contains the search terms.

    Paid results also use similar methods to display the pages
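
    Snippet generation can be sketched as taking a few words on either side of the search term. The function name make_snippet and the window size are assumptions; real engines choose and highlight snippets in more elaborate ways.

```python
def make_snippet(text: str, term: str, radius: int = 3) -> str:
    """Return the words around the first occurrence of the search term."""
    words = text.split()
    # Compare case-insensitively and ignore trailing punctuation.
    lowered = [w.lower().strip(".,") for w in words]
    if term.lower() not in lowered:
        # Term not found: fall back to the start of the page text.
        return " ".join(words[: 2 * radius + 1])
    pos = lowered.index(term.lower())
    start, end = max(0, pos - radius), pos + radius + 1
    return " ".join(words[start:end])

page = "Web engineering focuses on sound methodologies, techniques, and tools for developing Web apps."
snippet = make_snippet(page, "tools")
print(snippet)
```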

    Search Relationships

    Search engines compete with each other, but they also collaborate

    Many search engines use technology from their competitors to present results.

    Understanding how each engine delivers its results helps you target the most effective search marketing efforts.

    "Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.

    Search Engine Optimization

    SEO is the process of improving the visibility of a website or a web page in search engines via the "natural" or un-paid ("organic" or "algorithmic") search results.

    Search engine marketing, by contrast, works through paid listings.

    In general, the earlier (or higher on the page), and more frequently a site appears in the search results list, the more visitors it will receive from the search engine.

    The act of altering a web site so that it does well in the organic, crawler-based listings of search engines.

    The process of editing a web site's content and code in order to improve visibility within one or more search engines.

    White hat vs Black hat SEO

    SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and those techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing.

    White hats are those website designers that play nice and try to follow all of the search engine guidelines to optimize their site.

    An SEO tactic, technique or method is considered white hat if it conforms to the search engines' guidelines and involves no deception.

    White hat SEO is not just about following guidelines, but is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see.

    White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the spiders, rather than attempting to game the algorithm.

    Black hats are website designers who use backdoors, cloaking/hiding, and other tricks to optimize sites. [keyword stuffing, hidden/invisible/unrelated, metatag stuffing]

    Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or involve deception.

    One black hat technique uses text that is hidden, either as text colored similar to the background, in an invisible div, or positioned off screen.

    Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or eliminating their listings from their databases altogether.
