web_engg

7/30/2019 Web_Engg


Web Engineering: Lecture One

On Web Engineering

Software Engg vs Web Engg

Web technologies: hypertext, hypermedia, client/server, etc

Search engines: searching, indexing, crawlers, etc

Search Engine Optimization

Web metrics and quality

Web engineering

Systematic, scientific, engineering and management approach

Develop, deploy and maintain qualitative Web applications

focuses on sound methodologies, techniques, and tools for developing web apps

Web engineering focuses on methodologies, techniques or tools for developing web apps.

Web engineering is defined as "...the use of scientific, engineering, and management principles and systematic approaches with the aim of successfully developing, deploying and maintaining high quality Web-based systems and applications..."

Web development has an important artistic side.

Web apps vs. traditional software development, information systems (IS), or computer application development?

Characteristics of Web apps

Web apps constantly evolve. Unlike conventional software that goes through a planned and discrete revision at specific times in its lifecycle, Web applications continuously evolve in terms of their requirements and functionality (instability of requirements). Managing the change and evolution of a Web application is a major technical, organizational and management challenge, much more demanding than traditional software development.

Web apps are inherently different from software. The content, which may include text, graphics, images, audio, and/or video, is integrated with procedural processing. Also, the way in which the content is presented and organized has implications for the performance and response time of the system.

Web applications are meant to be used by a vast, variable user community: a large number of anonymous users with varying requirements, expectations, and skill sets. Therefore, the user interface and usability features have to meet the needs of a diverse, anonymous user community to whom we cannot offer training sessions, thus complicating human-Web interaction (HWI), user interface, and information presentation.

In general, many Web-based systems demand a good look and feel, favoring visual creativity and incorporation of multimedia in presentation and interface. In these systems, more emphasis is placed on visual creativity and presentation.

Technology instability: new tools, technologies, languages, and standards to cope with.

Web application development uses cutting-edge, diverse technologies and standards and integrates numerous varied components, including traditional and non-traditional software, interpreted scripting languages, HTML files, databases, images, and other multimedia components such as video and audio, and complex user interfaces.


Delivery medium is different from traditional software.

Security and privacy needs of Web-based systems are more demanding than those of traditional software.

Web Apps vs Conventional software

With respect to their development process, technologies, quality factors, and measures

Web Hypermedia, Web Software, or Web Application?

Hypermedia: an extension of hypertext

The Web is the best known example of a hypermedia system.

The Web has been used as the delivery platform for three types of applications: Web hypermedia applications, Web software applications, and Web applications.

Web hypermedia application

A non-conventional application characterized by the authoring of information using nodes (chunks of information), links (relations between nodes), anchors, access structures (for navigation), and delivery over the Web.

Technologies: HTML, XML, JavaScript, and multimedia.

Web software application

A conventional software application that relies on the Web or uses the Web's infrastructure for execution.

Typical applications include legacy information systems such as databases, booking systems, e-commerce apps, etc.

They employ development technologies (e.g. DCOM, ActiveX, etc.), database systems, and development solutions (e.g. J2EE).

Web application

An application delivered over the Web that combines characteristics of both Web hypermedia and Web software applications.

Web Development vs. Software Development

Areas of difference for Web development and maintenance: people involved, intrinsic characteristics of Web apps, and audience

Differences between Web and software development divided into 12 areas

application characteristics

primary technologies used

approach to quality delivered

development process drivers

availability of the application

customers (users/stakeholders)

update rate/maintenance cycles

people involved in development

architecture and network

disciplines involved


legal, ethical and social issues

information structuring and design

Application Characteristics

Primary Technologies Used

Web apps use technologies such as Java solutions (JavaBeans, JSP, etc.), HTML, XML, JavaScript, and databases.

Software development uses technologies such as object-oriented or procedural languages, databases, generators, and CASE tools.

Approaches to quality delivered

Web apps are expected to be high quality so that customers return to do repeat business.

Usability, accessibility, graphic design become very important

Competition for users on the Web is high.

Popularity is important.

Development Process Drivers

The dominant development process drivers for Web companies are composed of three quality criteria

Reliability

Usability

Security

With regard to conventional software development, the development process driver is time to market, not quality criteria.

Disciplines Involved

A wide range of skills and expertise is required for Web apps.

Distinct disciplines are involved, such as software engineering (development methodologies, project management, tools), hypermedia engineering (linking, navigation), requirements engineering, usability engineering, information engineering, graphics design, and network management (performance measurement and tuning).

For conventional software, a smaller set of disciplines is required, such as software engineering, requirements engineering, and usability engineering.

Information Structuring and Design

Web applications present structured and unstructured content, which may be distributed over multiple sites and use different systems (e.g. database systems, file systems, multimedia storage devices).

The design of a Web application, unlike that of conventional software applications, includes the organisation of content into navigational structures by means of hyperlinks.

Suitable navigational structures


Technologies for Web Apps

The choice of appropriate technologies is an important success factor in the development of Web applications.

Markup/Hypertext/hypermedia/client-server/sockets

Define the WHAT of a system: define the requirements of Web apps, identify the architecture, develop a design, etc.

Define HOW: [implementation phase] choice of appropriate technologies

Separation of content and presentation is a central requirement for using technologies appropriately.

The specifics of implementation technologies for Web applications versus conventional software systems stem from the use of Web standards.

This concerns in particular the implementation within the three views: request (client), response (server), and the rules for the communication between these two (protocol).

Protocol: HTTP, SMTP, FTP

Client Technologies: HTML, Plug-ins, Java Applets, ActiveX Controls

Server Technologies:

Markup

Instructions for document formatting. For example, we could write *Hello* to output Hello in bold, or /Hello/ to output Hello in italics.

Markup is text inserted in a document to add information as to how characters and contents should be represented in the document.

SGML HTML/XML
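
The *Hello* and /Hello/ example can be sketched in Python. This is a minimal illustration only, assuming the toy convention described above (asterisks for bold, slashes for italics) and mapping it to HTML tags; render_markup is a hypothetical helper, not part of any standard.

```python
import re

# One pattern with two alternatives: *word* (bold) or /word/ (italics).
_MARKUP = re.compile(r"\*(\w+)\*|/(\w+)/")

def render_markup(text: str) -> str:
    """Translate the toy markup into HTML tags (illustrative only)."""
    def replace(match):
        if match.group(1) is not None:
            return f"<b>{match.group(1)}</b>"
        return f"<i>{match.group(2)}</i>"
    return _MARKUP.sub(replace, text)

print(render_markup("*Hello* and /Hello/"))
```

The point is that the markers carry formatting information and are not themselves part of the displayed content.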

Hypertext and Hypermedia

Hypertext is understood as the organization of the interconnection of single information units. Relationships between these units can be expressed by links.

Hypermedia is commonly seen as a way to extend the hypertext principle to arbitrary multimedia objects, e.g., images or video.

Client/Server Communication on the Web

The client/server paradigm underlying all Web applications forms the backbone between a user (client or user agent) and the actual application (server).

2-layer architecture

SMTP, RTSP,

SMTP: Simple Mail Transfer Protocol

SMTP combined with POP3 and IMAP allows us to send and receive e-mails.

In addition, SMTP is increasingly used as a transport protocol for asynchronous message exchange based on SOAP.


RTSP

Real Time Streaming Protocol

A standard designed to support the delivery of multimedia data in real-time conditions.

In contrast to HTTP, RTSP allows the transmission of resources to the client in a timely context rather than delivering them in their entirety (at once).

This transmission form is commonly called streaming.

Streaming allows us to manually shift the audiovisual time window by requesting the stream at a specific time, i.e., it lets us control the playback of continuous media.

From Wikipedia: The transmission of streaming data itself is not a task of the RTSP protocol.

Most RTSP servers use the Real-time Transport Protocol (RTP) for media stream delivery.

While similar in some ways to HTTP, RTSP defines control sequences useful in controlling multimedia playback.

HTTP

HyperText Transfer Protocol

Text-based stateless protocol controlling how resources, e.g., HTML documents or images, are accessed.
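
To make "text-based" concrete, the following Python sketch builds the text of a simple HTTP/1.1 GET request and parses a response status line. It is a minimal illustration under stated assumptions: the host name and path are placeholders, the helper names are hypothetical, and nothing is actually sent over the network.

```python
def build_get_request(host, path="/"):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"   # request line: method, resource, version
        f"Host: {host}\r\n"          # the Host header is required in HTTP/1.1
        "Connection: close\r\n"
        "\r\n"                       # an empty line ends the header section
    )

def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

print(build_get_request("example.com", "/index.html"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```

Each request stands alone: nothing in this text links it to an earlier request, which is what "stateless" means here.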

Session Tracking

Interactive Web applications must be able to distinguish requests by multiple simultaneous users and identify related requests coming from the same user.

A session defines a sequence of related HTTP requests between a specific user and server within a specific time window.

Since HTTP is a stateless protocol, the Web server cannot automatically allocate incoming requests to a session.

Two principal methods can be distinguished that allow a Web server to automatically allocate an incoming request to a session:

In each of its requests to a server, the client identifies itself with a unique identification. This means that all data sent to the server are then allocated to the respective session.

All data exchanged between a client and a server are included in each request a client sends to a server, so that the server logic can be developed even though the communication is stateless.

Session tracking is normally implemented by URL rewriting or cookies.
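
The first method (a unique identification sent with every request) can be sketched with cookies using Python's standard library. This is a minimal, in-memory illustration: the cookie name SESSIONID, the SESSIONS dictionary, and the handle_request function are assumptions made for the example, not part of any particular server.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> per-user data (in-memory, illustrative)

def handle_request(cookie_header):
    """Return (session_id, session_data, set_cookie_header_or_None)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SESSIONID")
    if morsel is not None and morsel.value in SESSIONS:
        # Known client: allocate the request to its existing session.
        return morsel.value, SESSIONS[morsel.value], None
    # Unknown client: create a session and ask the client to store its id.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"requests": 0}
    reply = SimpleCookie()
    reply["SESSIONID"] = session_id
    return session_id, SESSIONS[session_id], reply.output(header="Set-Cookie:")

# First request carries no cookie; the second sends the id back.
sid, data, set_cookie = handle_request("")
data["requests"] += 1
same_sid, same_data, _ = handle_request(f"SESSIONID={sid}")
print(same_sid == sid, same_data["requests"])
```

URL rewriting follows the same idea, except that the identifier travels in each link instead of in a cookie header.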

Client Technologies

Helpers and Plug-ins: Adobe Reader, WinZip

Java Applets

ActiveX Controls

Document Specific Technologies

HTML

XML

XSL/XSLT

SVG - Scalable Vector Graphics
- Allows describing two-dimensional graphics in XML
- SVG recognizes three types of graphics objects: vector graphics consisting of straight lines and curves, images, and text
- Supports event-based interaction, e.g., responses to buttons or mouse movements
- This format is suitable for all types of interactive and animated vector graphics.
- Application examples include the representation of CAD, maps, and routes.
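
Because SVG is plain XML, a small graphic can be produced with any XML library. The sketch below uses Python's standard xml.etree.ElementTree to emit one straight line and one text object; the coordinates, sizes, and the label "Route A" are arbitrary values chosen for illustration.

```python
import xml.etree.ElementTree as ET

def make_svg():
    """Build a tiny SVG document containing a line and a text label."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": "120",
        "height": "60",
    })
    # Vector graphics object: a straight line.
    ET.SubElement(svg, "line", {
        "x1": "10", "y1": "50", "x2": "110", "y2": "50", "stroke": "black",
    })
    # Text object.
    label = ET.SubElement(svg, "text", {"x": "10", "y": "30"})
    label.text = "Route A"
    return ET.tostring(svg, encoding="unicode")

print(make_svg())
```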

SMIL - Synchronized Multimedia Integration Language
- Used to represent synchronized multimedia presentations.

Server Side Technologies

URI handlers to process HTTP requests

Server Side Includes (SSI)

CGI

Server Side Scripting

Servlets

JSP

ASP.NET

Web Services

Middleware Technologies

Application Servers

Messaging Systems/Brokers


Web Application Architectures

The quality of a Web application is considerably influenced by its underlying architecture.

Components of a Generic Web Application Architecture

Components based on the request-response paradigm

Components

Client

browser or user agent

Firewall

A piece of software regulating the communication between insecure networks (e.g., the Internet) and secure networks (e.g., corporate LANs).

This communication is filtered by access rules.

Proxy

A proxy is typically used to temporarily store Web pages in a cache

However, proxies can also assume other functionalities, e.g., adapting the contents for users (customization), or user tracking.

A proxy is used as an intermediate server to forward client requests for URLs to the (actual) server.

Proxies are used to adapt and format links and contents to users.
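
The caching and forwarding roles of a proxy can be sketched in a few lines of Python. This is a simplified model, not a real network proxy: the origin server is represented by a plain function, and the CachingProxy class and its names are hypothetical.

```python
class CachingProxy:
    """Forwards requests to an origin and keeps copies of the responses."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> content
        self._cache = {}                 # url -> cached content
        self.hits = 0

    def get(self, url):
        if url in self._cache:
            self.hits += 1               # served from the cache
            return self._cache[url]
        content = self._fetch(url)       # forward to the actual server
        self._cache[url] = content
        return content

origin_calls = []

def origin(url):
    origin_calls.append(url)
    return f"page for {url}"

proxy = CachingProxy(origin)
proxy.get("/index.html")
proxy.get("/index.html")                 # second request never reaches the origin
print(origin_calls, proxy.hits)
```

A real proxy would also need expiry rules so that cached pages do not become stale; that is left out here.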

Web Server

A Web server is a piece of software that supports various Web protocols, such as HTTP and HTTPS, to process client requests.

Database Server

This server normally supplies an organization's production data in structured form, e.g., in tables.

Media Server

This component is primarily used for content streaming of non-structured bulk data (e.g., audio or video).

Content Management Server

Similar to a database server, a content management server holds contents to serve an application. These contents are normally available in the form of semi-structured data, e.g., XML documents.

Application Server


An application server holds the functionality required by several applications, e.g., workflow or customization.

Legacy Application

A legacy application is an older system that should be integrated as an internal or external component.

Data Aspect Architectures

Data can be grouped into one of three architectural categories: (1) structured data of the kind held in databases; (2) documents of the kind used in document management systems; and (3) multimedia data of the kind held in media servers.

Architectures for Multimedia Data

The ability to handle large data volumes plays a decisive role when designing systems that use multimedia contents.

Basically, multimedia data, i.e., audio and video, can be transmitted over standard Internet protocols like HTTP or FTP, just like any other data used in Web applications.

This approach is used by a large number of current Web applications, because it has the major benefit that no additional components are needed on the server.

Its downside, however, is often felt by users in that the media downloads are very slow.

We can use streaming technologies to minimize these waiting times for multimedia contents to play out.

Streaming in this context means that a client can begin playout of the audio and/or video a few seconds after it begins receiving the file from a server.

This technique avoids having to download the entire file (incurring a potentially long delay) before beginning playout.
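
The difference between a full download and streaming can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the media file is a byte string, the network is a generator yielding fixed-size chunks, and buffer_needed (the amount that must arrive before playout starts) is a made-up parameter.

```python
def stream_chunks(data, chunk_size):
    """Yield the media data piece by piece, as a streaming server would."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

def bytes_received_at_playout(chunks, buffer_needed):
    """Return how many bytes had arrived when playout could begin."""
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received >= buffer_needed:
            return received      # start playing; the rest arrives during playback
    return received              # file was shorter than the buffer

media = bytes(1000)              # stand-in for a 1000-byte media file
started_at = bytes_received_at_playout(stream_chunks(media, 100), 250)
print(f"playout starts after {started_at} of {len(media)} bytes")
```

With a plain download the client would wait for all 1000 bytes; in this model playout begins once the third chunk has arrived.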

Two protocols are generally used for the streaming of multimedia contents. One protocol handles the transmission of multimedia data on the network level, and the other protocol controls the presentation flow (e.g., starting and stopping a video) and the transmission of meta-data.

RTP [Real-time Transport Protocol]: network protocol; RTSP [Real Time Streaming Protocol]: control protocol; MMS [Microsoft Media Server]


Fig 2: Streaming media architecture using point-to-point connections.


Search Engines

Originally, the term search engine referred to some kind of search index, a huge database containing information from individual Web sites.

Help people find information on the Internet/on other sites.

Large search-index companies own thousands of computers that use software known as spiders or robots (or just plain bots) to grab Web pages and read the information stored in them.

These systems don't always grab all the information on each page or all the pages in a Web site, but they grab a significant amount of information and use complex algorithms (calculations based on complicated formulae) to index that information.

General Operations of search engines: [Crawling, Indexing, Searching]

Search/crawl the Internet

Keep an index of the words they find, and where they find them

Words: those occurring in the title, subtitle, meta tags, and other relevant positions.

Allow users to look for words or combinations of words found in that index

Search/Crawl the Internet

A search engine employs special software robots, called spiders, to build lists of the words found on Web sites.

The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.

When a spider is building its lists, the process is called Web crawling

How does any spider start its travels over the Web?

The usual starting points are lists of heavily used servers and very popular pages.

The spider will begin with a popular site, indexing the words on its pages and following every link found within the site.

The Google spider was built to index every significant word on a page, leaving out the articles"a," "an" and "the." Other spiders take different approaches.
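
The crawl described above (start at a popular page, index its words, follow every link) can be sketched as a breadth-first traversal. The example below is a toy model: the "Web" is a small in-memory dictionary with made-up URLs, and skipping "a", "an" and "the" follows the description of the early Google spider. It is not how any production crawler is implemented.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links)
PAGES = {
    "a.example/": ("the quick spider", ["a.example/about", "b.example/"]),
    "a.example/about": ("a spider reads pages", ["a.example/"]),
    "b.example/": ("an index of words", []),
}
STOP_WORDS = {"a", "an", "the"}

def crawl(seeds):
    """Breadth-first crawl from the seed URLs; returns {word: set of URLs}."""
    index = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:         # visit each page only once
                seen.add(link)
                queue.append(link)
    return index

index = crawl(["a.example/"])
print(sorted(index["spider"]))
```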

Robot exclusion protocol: used when a site's owner doesn't wish a spider to crawl its pages or links.
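
A well-behaved spider checks a site's robots.txt rules before fetching a page. Python's standard library includes a parser for this; the sketch below feeds it a made-up robots.txt and a hypothetical crawler name ("ExampleSpider") rather than contacting a real site.

```python
from urllib.robotparser import RobotFileParser

# Example rules a site owner might publish (invented for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="ExampleSpider"):
    """Return True if the rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/report.html"))
```

The protocol is advisory: it tells cooperative spiders what to skip but does not technically prevent access.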


Search Directory

A search directory is a categorized collection of information about Web sites, rather than information taken from Web pages.

The most significant search directories are owned by Yahoo! (dir.yahoo.com) and the Open Directory Project (www.dmoz.org).

Directory companies don't use spiders or bots to download and index pages on the Web sites in the directory; rather, for each Web site, the directory contains information, such as a title and description, submitted by the site owner.

Directories are human-edited: people check your Web site, people index your Web site, etc.

Google also has a directory, but the information comes from somebody else: the Open Directory Project.

Building the Index

Once the spiders have completed the task of finding information on web pages, the search engine must store it in a way that makes it useful.

There are two key components involved in making the gathered data accessible to users:

the information stored with the data

the method by which the information is indexed.

In the simplest case, a search engine could just store the word and the URL where it was found.

Page rank/Ranking organic and paid search results

Search engines store more information than simple word/URL combinations.

An engine might store the number of times that the word appears on a page.

The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
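
A weighted index of this kind can be sketched as a mapping from each word to the pages containing it, with a score per page. The weights below (title 5, heading 3, body 1) are arbitrary values chosen for illustration; real engines do not publish theirs.

```python
# Hypothetical weights: words in more prominent positions count for more.
WEIGHTS = {"title": 5, "heading": 3, "body": 1}

def build_index(pages):
    """pages: {url: {field: text}}  ->  {word: {url: accumulated weight}}"""
    index = {}
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in text.lower().split():
                postings = index.setdefault(word, {})
                # Each occurrence adds the weight of the field it appears in.
                postings[url] = postings.get(url, 0) + WEIGHTS.get(field, 1)
    return index

pages = {
    "u1": {"title": "web engineering", "body": "web apps evolve"},
    "u2": {"body": "web"},
}
print(build_index(pages)["web"])
```

Here "web" scores 6 for u1 (once in the title, once in the body) and 1 for u2, so the score reflects both how often and where the word appears.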

Ranking list tries to present the most useful pages at the top.

A search engine's organic ranking algorithm is one of the trickiest parts of designing a search engine, so let's start by examining the simplest kind of ranking algorithm.

Ranking is just another word for sorting, the act of collating results into a certain order. Shopping search engines typically use simple ranking algorithms that the searcher can choose. When the searcher is looking for a product to buy, the shopping search engine might start by ordering the results by price (lowest to highest), but the searcher can decide to sort the list by other columns, such as availability (in stock, within one week, and so on), or any other features of the product.
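
The shopping example amounts to sorting a list of results by a column the searcher picks. A minimal Python sketch, with invented product data and a hypothetical days_to_ship column standing in for availability:

```python
from operator import itemgetter

# Made-up search results for illustration.
RESULTS = [
    {"name": "Camera A", "price": 199.0, "days_to_ship": 7},
    {"name": "Camera B", "price": 149.0, "days_to_ship": 0},
    {"name": "Camera C", "price": 249.0, "days_to_ship": 3},
]

def rank(results, column="price", descending=False):
    """Order results by the chosen column (lowest first by default)."""
    return sorted(results, key=itemgetter(column), reverse=descending)

print([r["name"] for r in rank(RESULTS)])                   # by price
print([r["name"] for r in rank(RESULTS, "days_to_ship")])   # by availability
```

Organic ranking is harder because there is no single obvious column to sort by; the engine has to compute a relevance score first.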

Term frequency, term placement, link popularity (link analysis)

Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space.


After the information is compacted, it's ready for indexing.

An index has a single purpose: it allows information to be found as quickly as possible.

There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table.

In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness.

The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently.

The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.
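
The hashing idea can be sketched as follows. This is a teaching model only: CRC-32 from Python's zlib module stands in for "the formula", eight buckets stand in for the predetermined number of divisions, and the HashIndex class is hypothetical. In practice Python's built-in dict already provides a hash table.

```python
import zlib

NUM_BUCKETS = 8  # the "predetermined number of divisions" (arbitrary here)

def bucket_for(word):
    """Apply a formula (CRC-32 here) that maps a word to a bucket number."""
    return zlib.crc32(word.encode("utf-8")) % NUM_BUCKETS

class HashIndex:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def add(self, word, url):
        bucket = self.buckets[bucket_for(word)]
        for entry_word, urls in bucket:
            if entry_word == word:
                urls.append(url)        # word already present: extend its list
                return
        bucket.append((word, [url]))    # first occurrence of this word

    def lookup(self, word):
        # Only one bucket is searched, however many words are indexed.
        for entry_word, urls in self.buckets[bucket_for(word)]:
            if entry_word == word:
                return urls
        return []

index = HashIndex()
index.add("web", "u1")
index.add("web", "u2")
index.add("engineering", "u1")
print(index.lookup("web"))
```

Because the bucket number depends on the hash rather than on the first letter, common initial letters do not overload one section of the index.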

Search and Display Results

Searching through an index involves a user building a query and submitting it through the search engine.
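
A query for a combination of words can be answered by intersecting the sets of pages stored for each word. The sketch below assumes a simple index of the form {word: set of URLs} and treats the query as an AND of its terms; both the index contents and the search function are made up for illustration.

```python
def search(index, query):
    """Return the URLs that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

index = {
    "web": {"u1", "u2"},
    "engineering": {"u1"},
    "search": {"u2"},
}
print(search(index, "web engineering"))
```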

Displaying the results is a lot simpler than some other parts of the process

The display can contain organic or paid results.

Organic results all use the title of the page followed by a snippet: a summary of the text from that page that contains the search terms.

Paid results also use similar methods to display the pages

Search Relationships

Search engines compete with each other, but they also collaborate

Many search engines use technology from their competitors to present results.

Understanding how each engine delivers its results helps you target the most effective search marketing efforts.

"Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.


Search Engine Optimization

SEO is the process of improving the visibility of a website or a web page in search engines via the "natural" or unpaid ("organic" or "algorithmic") search results.

Search engine marketing through paid listings

In general, the earlier (or higher on the page), and more frequently a site appears in the search results list, the more visitors it will receive from the search engine.

The act of altering a web site so that it does well in the organic, crawler-based listings of search engines.

The process of editing a web site's content and code in order to improve visibility within one or more search engines.

White hat vs Black hat SEO

SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and those techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing.

White hats are those website designers that play nice and try to follow all of the search engine guidelines to optimize their site.

An SEO tactic, technique or method is considered white hat if it conforms to the search engines' guidelines and involves no deception.

White hat SEO is not just about following guidelines, but is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see.

White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the spiders, rather than attempting to game the algorithm.

Black hats are website designers who use backdoors, cloaking/hiding, and other tricks to optimize sites (keyword stuffing, hidden/invisible/unrelated text, meta tag stuffing).

Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or involve deception.

One black hat technique uses text that is hidden, either as text colored similar to the background, in an invisible div, or positioned off screen.

Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or eliminating their listings from their databases altogether.
