web_engg
TRANSCRIPT
-
7/30/2019 Web_Engg
1/14
Web Engineering: Lecture One
On Web Engineering
Software Engg vs Web Engg
Web technologies: hypertext, hypermedia, client/server, etc.
Search engines: searching, indexing, crawlers, etc.
Search Engine Optimization
Web metrics and quality
Web engineering
Systematic, scientific, engineering and management approach
Develop, deploy and maintain high-quality Web applications
Focuses on sound methodologies, techniques, and tools for developing Web apps
Web engineering is defined as ...the use of scientific, engineering, and management principles and systematic approaches with the aim of successfully developing, deploying and maintaining high quality Web-based systems and applications...
Web development has an important artistic side.
Web apps Vs traditional software devt/IS/computer application devt?
Characteristics of Web apps
Web apps constantly evolve. Unlike conventional software, which goes through planned and discrete revisions at specific times in its lifecycle, Web applications continuously evolve in terms of their requirements and functionality (instability of requirements). Managing the change and evolution of a Web application is a major technical, organizational and management challenge, much more demanding than in traditional software development.
Web apps are inherently different from traditional software. The content, which may include text, graphics, images, audio, and/or video, is integrated with procedural processing. Also, the way in which the content is presented and organized affects the performance and response time of the system.
Web applications are meant to be used by a vast, variable user community: a large number of anonymous users with varying requirements, expectations, and skill sets. Therefore, the user interface and usability features have to meet the needs of a diverse, anonymous user community to whom we cannot offer training sessions, which complicates human-Web interaction (HWI), user interface design, and information presentation.
In general, many Web-based systems demand a good look and feel, favoring visual creativity and the incorporation of multimedia in presentation and interface. In these systems, more emphasis is placed on visual creativity and presentation.
Technology instability: new tools, technologies, languages, and standards to cope with.
Web app development uses cutting-edge, diverse technologies and standards and integrates numerous varied components, including traditional and non-traditional software, interpreted scripting languages, HTML files, databases, images, other multimedia components such as video and audio, and complex user interfaces.
Delivery medium is different from traditional software.
Security and privacy needs of Web-based systems are more demanding than those of traditional software.
Web Apps vs Conventional software
With respect to their development process, technologies, quality factors, and measures
Web Hypermedia, Web Software, or Web Application?
Hypermedia extension of hypertext
The Web is the best known example of a hypermedia system.
The Web has been used as the delivery platform for three types of applications: Web hypermedia applications, Web software applications, and Web applications.
Web hypermedia application
A non-conventional application characterized by the authoring of information using nodes (chunks of information), links (relations between nodes), anchors, access structures (for navigation), and delivery over the Web.
Technologies: HTML, XML, JavaScript, and multimedia.
Web software application
A conventional software application that relies on the Web or uses the Web's infrastructure for execution.
Typical applications include legacy information systems such as databases, booking systems, e-commerce apps, etc.
They employ development technologies (e.g., DCOM, ActiveX), database systems, and development solutions (e.g., J2EE).
Web application
An application delivered over the Web that combines characteristics of both Web hypermedia and Web software applications.
Web Development vs. Software Development
Areas of difference for Web development and maintenance: people involved, intrinsic characteristics of Web apps, and audience
Differences between Web and software development can be divided into 12 areas:
application characteristics
primary technologies used
approach to quality delivered
development process drivers
availability of the application
customers (users/stakeholders)
update rate/maintenance cycles
people involved in development
architecture and network
disciplines involved
legal, ethical and social issues
information structuring and design
Application Characteristics
Primary Technologies Used
Web apps use technologies such as Java solutions (JavaBeans, JSP, etc.), HTML, XML, JavaScript, and databases.
Software development uses technologies such as object-oriented or procedural languages, databases, generators, and CASE tools.
Approaches to quality delivered
Web apps are expected to be high quality so that customers return to do repeat business.
Usability, accessibility, graphic design become very important
Competition for users on the Web is high
popularity is important
Development Process Drivers
The dominant development process drivers for Web companies are composed of three quality criteria:
Reliability
Usability
Security
With regard to conventional software development, the dominant development process driver is time to market rather than quality criteria.
Disciplines Involved
A wide range of skills and expertise is required for Web apps. Distinct disciplines include software engineering (development methodologies, project management, tools), hypermedia engineering (linking, navigation), requirements engineering, usability engineering, information engineering, graphic design, and network management (performance measurement and tuning).
For conventional software, fewer disciplines are required, such as software engineering, requirements engineering, and usability engineering.
Information Structuring and Design
Web applications present structured and unstructured content, which may be distributed over multiple sites and use different systems (e.g., database systems, file systems, multimedia storage devices).
The design of a Web application, unlike that of conventional software applications, includes the organisation of content into navigational structures by means of hyperlinks.
Suitable navigational structures
Technologies for Web Apps
The choice of appropriate technologies is an important success factor in the development of Web applications.
Markup/Hypertext/hypermedia/client-server/sockets
Define the WHAT of a system: define the requirements of Web apps, identify the architecture, develop a design, etc.
Define HOW: [implementation phase] choice of appropriate technologies
Separation of content and presentation is a central requirement for using technologies appropriately.
The specifics of implementation technologies for Web applications versus conventional software systems stem from the use of Web standards.
This concerns in particular the implementation within three views: request (client), response (server), and the rules for the communication between these two (protocol).
Protocol: HTTP, SMTP, FTP
Client Technologies: HTML, Plug-ins, Java Applets, ActiveX Controls
Server Technologies:
Markup
Instructions for document formatting. For example, we could write *Hello* to output Hello in bold, or /Hello/ to output Hello in italics.
This is text inserted into a document to add information about how characters and contents should be represented in the document.
Examples: SGML, HTML/XML
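To make the markup idea concrete, here is a minimal Python sketch that turns the two notations above into HTML tags. It assumes *...* means bold and /.../ means italic (the Org-mode convention); `render_markup` is a hypothetical helper, not part of any standard.

```python
import re

# Assumed toy convention: *text* is bold, /text/ is italic.
# One combined pattern handles both in a single left-to-right pass,
# so generated tags such as "</b>" are never rescanned as italic markers.
TOKEN = re.compile(r"\*(.+?)\*|/(.+?)/")

def _to_html(match) -> str:
    bold, italic = match.group(1), match.group(2)
    if bold is not None:
        return f"<b>{bold}</b>"
    return f"<i>{italic}</i>"

def render_markup(text: str) -> str:
    """Translate the toy markup into HTML tags."""
    return TOKEN.sub(_to_html, text)

print(render_markup("*Hello* and /Hello/"))  # <b>Hello</b> and <i>Hello</i>
```

The HTML tags play the same role as the asterisks and slashes: both are markup, text that describes presentation rather than content.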
Hypertext and Hypermedia
Hypertext is understood as the organization of the interconnection of single information units. Relationships between these units can be expressed by links.
Hypermedia is commonly seen as a way to extend the hypertext principle to arbitrary multimedia objects, e.g., images or video.
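The node-and-link idea can be sketched as a small graph in Python. This is an illustration only: the `Node` class, the `follow` function, and the sample pages are hypothetical. A node whose media type is not text (here a video) is what turns plain hypertext into hypermedia.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One information unit; a non-text media_type makes the graph hypermedia."""
    node_id: str
    content: str
    media_type: str = "text/html"
    # Links: anchor label inside this node -> id of the target node.
    links: dict = field(default_factory=dict)

def follow(nodes: dict, current_id: str, anchor: str) -> Node:
    """Navigate from the current node along the link attached to `anchor`."""
    return nodes[nodes[current_id].links[anchor]]

nodes = {
    "home": Node("home", "Course home", links={"Notes": "notes", "Lecture video": "video1"}),
    "notes": Node("notes", "Lecture one notes", links={"Back": "home"}),
    "video1": Node("video1", "lecture1.mp4", media_type="video/mp4"),
}

print(follow(nodes, "home", "Lecture video").media_type)  # video/mp4
```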
Client/Server Communication on the Web
The client/server paradigm underlying all Web applications forms the backbone between a user (client or user agent) and the actual application (server).
2-layer architecture
SMTP, RTSP
SMTP: Simple Mail Transfer Protocol
SMTP combined with POP3 and IMAP allows us to send and receive e-mails
In addition, SMTP is increasingly used as a transport protocol for asynchronous message exchange based on SOAP.
RTSP
Real Time Streaming Protocol
A standard designed to support the delivery of multimedia data in real-time conditions.
In contrast to HTTP, RTSP allows resources to be transmitted to the client in a timely context rather than delivered in their entirety (at once).
This transmission form is commonly called streaming.
Streaming allows us to manually shift the audiovisual time window by requesting the stream at a specific time, i.e., it lets us control the playback of continuous media.
From Wikipedia: the transmission of streaming data itself is not a task of the RTSP protocol.
Most RTSP servers use the Real-time Transport Protocol (RTP) for media stream delivery.
While similar in some ways to HTTP, RTSP defines control sequences useful for controlling multimedia playback.
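RTSP control messages are plain text, much like HTTP requests. The sketch below builds a PLAY request that asks the server to start at a chosen point in the stream (the Range header uses normal play time, npt). The URL and session id are made-up values, and the function only composes the control message; the media itself would still travel over RTP.

```python
def rtsp_play_request(url: str, cseq: int, session_id: str, start_seconds: float) -> str:
    """Build an RTSP/1.0 PLAY request asking the server to play from start_seconds onward."""
    lines = [
        f"PLAY {url} RTSP/1.0",
        f"CSeq: {cseq}",                   # sequence number pairing request and response
        f"Session: {session_id}",          # identifier returned by an earlier SETUP
        f"Range: npt={start_seconds:g}-",  # open-ended range starting at start_seconds
    ]
    # Like HTTP: CRLF-terminated header lines followed by an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"

print(rtsp_play_request("rtsp://media.example.com/lecture1", 4, "12345678", 90))
```

Seeking to another point in the video is just another PLAY request with a different Range value, which is the "manual shift of the time window" described above.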
HTTP
HyperText Transfer Protocol
Text-based stateless protocol controlling how resources, e.g., HTML documents or images, are accessed.
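Because HTTP is text-based and stateless, a request can be written out as a plain string. This Python sketch (hypothetical helper `http_get`, placeholder host www.example.com) shows that a request carries no memory of earlier requests, which is why session tracking is needed.

```python
from typing import Dict, Optional

def http_get(host: str, path: str = "/", headers: Optional[Dict[str, str]] = None) -> str:
    """Serialize an HTTP/1.1 GET request as the text sent over the connection."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]  # Host header is required in HTTP/1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"  # an empty line ends the header section

# Two different users asking for the same page send identical requests:
alice = http_get("www.example.com", "/cart")
bob = http_get("www.example.com", "/cart")
print(alice == bob)  # True: nothing in the request says who sent it
```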
Session Tracking
Interactive Web applications must be able to distinguish requests from multiple simultaneous users and identify related requests coming from the same user.
A session is a sequence of related HTTP requests between a specific user and a server within a specific time window.
Since HTTP is a stateless protocol, the Web server cannot automatically allocate incoming requests to a session.
Two principal methods can be distinguished for allowing a Web server to allocate an incoming request to a session:
In each of its requests to a server, the client identifies itself with a unique identification. All data sent to the server can then be allocated to the respective session.
All data exchanged between a client and a server are included in each request the client sends, so that the server logic can be developed even though the communication is stateless.
Session tracking is normally implemented by URL rewriting or cookies.
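Both session-tracking techniques can be sketched in a few lines of Python. The cookie name `SESSIONID`, the in-memory `SESSIONS` dictionary, and both helper functions are hypothetical; a real server framework would also set cookie attributes and expire old sessions.

```python
import secrets
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

COOKIE_NAME = "SESSIONID"       # hypothetical cookie / query-parameter name
SESSIONS: Dict[str, dict] = {}  # server-side store: session id -> per-user state

def get_or_create_session(cookie_header: Optional[str]) -> Tuple[str, dict]:
    """Cookie approach: reuse the session named in the Cookie header, else issue a new id."""
    if cookie_header:
        for part in cookie_header.split(";"):
            name, _, value = part.strip().partition("=")
            if name == COOKIE_NAME and value in SESSIONS:
                return value, SESSIONS[value]
    session_id = secrets.token_hex(16)  # unguessable random identifier
    SESSIONS[session_id] = {}
    # A real server would now answer with:  Set-Cookie: SESSIONID=<session_id>
    return session_id, SESSIONS[session_id]

def rewrite_url(url: str, session_id: str) -> str:
    """URL-rewriting approach: append the session id to a link's query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append((COOKIE_NAME, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

sid, state = get_or_create_session(None)  # first request carries no cookie
state["cart"] = ["book"]
same_sid, same_state = get_or_create_session(f"{COOKIE_NAME}={sid}")
print(same_state["cart"])                         # ['book']
print(rewrite_url("/checkout?step=1", "abc123"))  # /checkout?step=1&SESSIONID=abc123
```

Cookies keep URLs clean but depend on the browser accepting them; URL rewriting works without cookies but must be applied to every generated link.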
Client Technologies
Helpers and Plug-ins: Adobe Reader, WinZip
Java Applets
ActiveX Controls
Document-Specific Technologies: HTML, XML, XSL/XSLT, SVG
SVG (Scalable Vector Graphics)
- Allows describing two-dimensional graphics in XML.
- SVG recognizes three types of graphics objects: vector graphics consisting of straight lines and curves, images, and text.
- Supports event-based interaction, e.g., responses to buttons or mouse movements.
- This format is suitable for all types of interactive and animated vector graphics.
- Application examples include the representation of CAD, maps, and routes.
SMIL (Synchronized Multimedia Integration Language)
- Used to represent synchronized multimedia presentations.
Server Side Technologies
URI handlers to process HTTP requests
Server Side Includes (SSI)
CGI
Server Side Scripting
Servlets
JSP
ASP.NET
Web Services
Middleware Technologies
Application Servers
Messaging Systems/Brokers
Web Application Architectures
The quality of a Web application is considerably influenced by its underlying architecture.
Components of a Generic Web Application Architecture
Components based on the request-response paradigm
Components
Client
browser or user agent
Firewall
A piece of software regulating the communication between insecure networks (e.g., the Internet) and secure networks (e.g., corporate LANs).
This communication is filtered by access rules.
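Filtering by access rules can be sketched as a first-match rule list with a default-deny policy, using Python's standard `ipaddress` module. The rules and addresses are invented for illustration (taken from documentation and private address ranges), and real firewalls inspect far more than source address and port.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    source: str  # source network in CIDR notation
    port: int    # destination port on the protected network

# Hypothetical rule set: checked top to bottom, the first matching rule wins.
RULES: List[Rule] = [
    Rule("deny", "198.51.100.0/24", 443),  # a blocked external range
    Rule("allow", "0.0.0.0/0", 443),       # HTTPS from anywhere else
    Rule("allow", "10.0.0.0/8", 22),       # SSH only from the internal LAN
]

def permitted(src_ip: str, port: int, rules: List[Rule] = RULES) -> bool:
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.port == port and addr in ipaddress.ip_network(rule.source):
            return rule.action == "allow"
    return False  # default deny: anything not explicitly allowed is dropped

print(permitted("203.0.113.7", 443))  # True
print(permitted("203.0.113.7", 22))   # False
```

Rule order matters: moving the "deny" rule below the catch-all "allow" rule would let the blocked range through.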
Proxy
A proxy is typically used to temporarily store Web pages in a cache
However, proxies can also assume other functionalities, e.g., adapting the contents for users (customization), or user tracking.
A proxy is used as an intermediate server to forward client requests for URLs to the (actual)server.
proxies are used to adapt and format links and contents to users
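The caching role of a proxy can be sketched as follows: a request is answered from the cache while the stored copy is fresh and forwarded to the actual server otherwise. The class name, the fixed time-to-live, and the `fetch_from_origin` callback are assumptions; real proxies follow HTTP caching headers rather than a single TTL.

```python
import time
from typing import Callable, Dict, Tuple

class CachingProxy:
    """Forwards requests to the origin server and keeps copies for ttl_seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float = 60.0):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (time stored, page body)
        self.origin_requests = 0

    def get(self, url: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # cache hit: the origin server is not contacted
        body = self._fetch(url)  # miss or stale copy: forward to the actual server
        self.origin_requests += 1
        self._cache[url] = (now, body)
        return body

proxy = CachingProxy(lambda url: f"<html>page at {url}</html>")
proxy.get("http://www.example.com/")
proxy.get("http://www.example.com/")
print(proxy.origin_requests)  # 1
```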
Web Server
A Web server is a piece of software that supports various Web protocols like HTTP, HTTPS, etc., to process client requests.
Database Server
This server normally supplies an organization's production data in structured form, e.g., in tables.
Media Server
This component is primarily used for content streaming of non-structured bulk data (e.g., audio or video).
Content Management Server
Similar to a database server, a content management server holds contents to serve an application. These contents are normally available in the form of semi-structured data, e.g., XML documents.
Application Server
An application server holds the functionality required by several applications, e.g., workflow or customization.
Legacy Application
A legacy application is an older system that should be integrated as an internal or external component.
Data Aspect Architectures
Data can be grouped into one of three architectural categories: (1) structured data of the kind held in databases; (2) documents of the kind used in document management systems; and (3) multimedia data of the kind held in media servers.
Architectures for Multimedia Data
The ability to handle large data volumes plays a decisive role when designing systems that use multimedia contents.
Basically, multimedia data, i.e., audio and video, can be transmitted over standard Internet protocols like HTTP or FTP, just like any other data used in Web applications.
This approach is used by a large number of current Web applications, because it has the major benefit that no additional components are needed on the server.
Its downside, however, is often felt by users in that the media downloads are very slow.
We can use streaming technologies to minimize these waiting times for multimedia contents to play out.
Streaming in this context means that a client can begin playout of the audio and/or video a few seconds after it begins receiving the file from a server.
This technique avoids having to download the entire file (incurring a potentially long delay)before beginning playout
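A rough back-of-the-envelope model shows why streaming shortens the wait. Assuming constant bandwidth and bitrate, and ignoring latency and protocol overhead, a download must transfer the whole file before playout, while streaming only has to fill a small playout buffer. The numbers in the example are hypothetical.

```python
def download_delay(file_bits: float, bandwidth_bps: float) -> float:
    """Seconds before playout can start when the whole file must arrive first."""
    return file_bits / bandwidth_bps

def streaming_delay(bitrate_bps: float, bandwidth_bps: float, buffer_seconds: float) -> float:
    """Seconds before playout can start when only buffer_seconds of media are prefetched."""
    if bandwidth_bps < bitrate_bps:
        # In this simple model the buffer would drain faster than it refills.
        raise ValueError("bandwidth below the media bitrate: playback would stall")
    return buffer_seconds * bitrate_bps / bandwidth_bps

# Hypothetical case: a 10-minute video encoded at 4 Mbit/s over an 8 Mbit/s link.
video_bits = 10 * 60 * 4_000_000
print(download_delay(video_bits, 8_000_000))       # 300.0 (seconds)
print(streaming_delay(4_000_000, 8_000_000, 2.0))  # 1.0 (seconds)
```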
Two protocols are generally used for streaming multimedia contents. One protocol handles the transmission of multimedia data on the network level, and the other controls the presentation flow (e.g., starting and stopping a video) and the transmission of metadata.
RTP (Real-time Transport Protocol): network protocol; RTSP (Real Time Streaming Protocol): control protocol; MMS (Microsoft Media Server)
Fig 2: Streaming media architecture using point-to-point connections.
Search Engines
Originally, the term search engine referred to some kind of search index, a huge database containing information from individual Web sites.
Help people find information on the Internet/on other sites.
Large search-index companies own thousands of computers that use software known as spiders or robots (or just plain bots) to grab Web pages and read the information stored in them. These systems don't always grab all the information on each page or all the pages in a Web site, but they grab a significant amount of information and use complex algorithms (calculations based on complicated formulae) to index that information.
General Operations of search engines: [Crawling, Indexing, Searching]
Search/crawl the Internet
Keep an index of the words they find, and where they find them
Words occurring in the title, subtitle, meta tags, and other relevant positions
Allow users to look for words or combinations of words found in that index
Search/Crawl the Internet
A search engine employs special software robots, called spiders, to build lists of the words found on Web sites.
The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.
When a spider is building its lists, the process is called Web crawling
How does any spider start its travels over the Web?
The usual starting points are lists of heavily used servers and very popular pages.
The spider will begin with a popular site, indexing the words on its pages and following every link found within the site.
The Google spider was built to index every significant word on a page, leaving out the articles"a," "an" and "the." Other spiders take different approaches.
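The crawl-and-index loop described above can be sketched as a breadth-first traversal. This Python version works on a hypothetical in-memory "Web" (a dictionary of page text and links) instead of fetching real pages, and skips only the three articles mentioned for the early Google spider.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

STOP_WORDS = {"a", "an", "the"}  # the articles the early Google spider left out

def crawl(web: Dict[str, Tuple[str, List[str]]], seeds: List[str],
          max_pages: int = 1000) -> Dict[str, Set[str]]:
    """Breadth-first crawl from popular seed pages; returns word -> URLs containing it."""
    index: Dict[str, Set[str]] = {}
    queue = deque(seeds)
    visited: Set[str] = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in web:
            continue  # already indexed, or a dead link
        visited.add(url)
        text, links = web[url]
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(url)
        queue.extend(links)  # follow every link found on the page
    return index

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
WEB = {
    "popular.example/": ("The best Web engineering notes", ["popular.example/search", "other.example/"]),
    "popular.example/search": ("An index of search engines", []),
    "other.example/": ("A crawler follows links", ["popular.example/"]),
}
index = crawl(WEB, ["popular.example/"])
print(sorted(index["engineering"]))  # ['popular.example/']
```

The `visited` set keeps the spider from looping forever on pages that link back to each other, as other.example/ does here.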
Robot exclusion protocol: used when a site's owner doesn't want spiders to crawl its pages or links.
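Python's standard library includes a parser for robots.txt files, `urllib.robotparser.RobotFileParser`, which a well-behaved spider can consult before fetching a page. The robots.txt content, the user-agent name `ExampleBot`, and the URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a site might serve it at https://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # a live crawler would call set_url(...) and read() instead

print(parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html"))  # False
print(parser.can_fetch("ExampleBot", "https://www.example.com/public/index.html"))    # True
```

Compliance is voluntary: robots.txt asks spiders to stay away, it does not block them.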
Search Directory
A search directory is a categorized collection of information about Web sites instead of information taken from Web pages.
The most significant search directories were owned by Yahoo! (dir.yahoo.com) and the Open Directory Project (www.dmoz.org); both have since been shut down.
Directory companies don't use spiders or bots to download and index pages on the Web sites in the directory; rather, for each Web site, the directory contains information, such as a title and description, submitted by the site owner.
Directories are human-edited: people review your Web site and people index it.
Google also had a directory, but its information came from the Open Directory Project.
Building the Index
Once the spiders have completed the task of finding information on Web pages, the search engine must store it in a way that makes it useful.
There are two key components involved in making the gathered data accessible to users:
the information stored with the data
the method by which the information is indexed.
In the simplest case, a search engine could just store the word and the URL where it was found.
Page rank/Ranking organic and paid search results
Search engines store more information than simple word/URL combinations.
An engine might store the number of times that the word appears on a page.
The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in subheadings, in links, in the meta tags or in the title of the page.
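Storing counts and placement weights can be sketched as an inverted index whose entries carry a score. The field names and weight values in `FIELD_WEIGHTS` are assumptions chosen for illustration, not any real search engine's values, and link popularity is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical placement weights: a title word counts three times as much as a body word.
FIELD_WEIGHTS = {"title": 3.0, "meta": 2.0, "heading": 2.0, "body": 1.0}

def build_index(pages: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """word -> {url: score}; each occurrence adds the weight of the field it appears in."""
    index: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for url, fields in pages.items():
        for field_name, text in fields.items():
            weight = FIELD_WEIGHTS.get(field_name, 1.0)
            for word in text.lower().split():
                index[word][url] += weight  # term frequency scaled by term placement
    return index

def rank(index: Dict[str, Dict[str, float]], word: str) -> List[Tuple[str, float]]:
    """Pages for a one-word query, highest score first."""
    postings = index.get(word.lower(), {})
    return sorted(postings.items(), key=lambda item: item[1], reverse=True)

pages = {
    "a.example": {"title": "Web engineering", "body": "notes on web apps"},
    "b.example": {"body": "engineering engineering basics"},
}
index = build_index(pages)
print(rank(index, "engineering"))  # [('a.example', 3.0), ('b.example', 2.0)]
```

One title occurrence outranks two body occurrences here, which is exactly the effect placement weighting is meant to have.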
Ranking list tries to present the most useful pages at the top.
A search engine's organic ranking algorithm is one of the trickiest parts of designing a search engine, so let's start by examining the simplest kind of ranking algorithm.
Ranking is just another word for sorting, the act of collating results into a certain order. Shopping search engines typically use simple ranking algorithms that the searcher can choose. When the searcher is looking for a product to buy, the shopping search engine might start by ordering the results by price (lowest to highest), but the searcher can decide to sort the list by other columns, such as availability (in stock, within one week, and so on), or any other features of the product.
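Such user-selectable ranking is just a sort on a chosen column. A minimal Python sketch, with a hypothetical `Product` record and made-up results:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Product:
    name: str
    price: float
    days_to_ship: int  # 0 means in stock

def rank_products(products: List[Product], sort_by: str = "price") -> List[Product]:
    """Order results by the column the searcher picks; lowest value first."""
    if sort_by not in ("price", "days_to_ship"):
        raise ValueError(f"cannot sort by {sort_by!r}")
    return sorted(products, key=lambda p: getattr(p, sort_by))

results = [Product("Laptop B", 950.0, 0), Product("Laptop A", 899.0, 7), Product("Laptop C", 1200.0, 2)]
print([p.name for p in rank_products(results)])                  # ['Laptop A', 'Laptop B', 'Laptop C']
print([p.name for p in rank_products(results, "days_to_ship")])  # ['Laptop B', 'Laptop C', 'Laptop A']
```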
Term frequency, term placement, link popularity (link analysis)
Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space.
After the information is compacted, it's ready for indexing.
An index has a single purpose: It allows information to be found as quickly as possible .
There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table.
In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness.
The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently.
The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.
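A minimal chained hash table shows the idea: a formula maps each word to one of a fixed number of buckets, and the bucket stores the word with a pointer to its data (here, a list of URLs). The polynomial hash with multiplier 31 is one common choice, used here as an assumption; in practice Python's built-in `dict`, itself a hash table, would be used.

```python
from typing import List, Tuple

class HashIndex:
    """Chained hash table: word -> postings list (URLs where the word occurs)."""

    def __init__(self, num_buckets: int = 64):
        self.num_buckets = num_buckets
        self.buckets: List[List[Tuple[str, List[str]]]] = [[] for _ in range(num_buckets)]

    def _hash(self, word: str) -> int:
        # Polynomial string hash reduced modulo the bucket count; it is meant to spread
        # words over the buckets, unrelated to their alphabetical order.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % self.num_buckets
        return h

    def add(self, word: str, url: str) -> None:
        bucket = self.buckets[self._hash(word)]
        for stored_word, postings in bucket:
            if stored_word == word:
                postings.append(url)
                return
        bucket.append((word, [url]))

    def lookup(self, word: str) -> List[str]:
        # Only one bucket is scanned, however many words the index holds.
        for stored_word, postings in self.buckets[self._hash(word)]:
            if stored_word == word:
                return postings
        return []

idx = HashIndex(num_buckets=8)
for word, url in [("web", "a.example"), ("search", "b.example"), ("web", "c.example")]:
    idx.add(word, url)
print(idx.lookup("web"))  # ['a.example', 'c.example']
```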
Search and Display Results
Searching through an index involves a user building a query and submitting it through the search engine.
Displaying the results is a lot simpler than some other parts of the process
The display can contain organic or paid results.
Organic results all use the title of the page followed by a snippet, a summary of the text from that page that contains the search terms.
Paid results also use similar methods to display the pages
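Snippet generation can be approximated by cutting a window of words around the first occurrence of a search term. This Python sketch is a simplification (real engines choose and highlight passages far more carefully); `make_snippet` and the sample page text are hypothetical.

```python
from typing import Set

def make_snippet(text: str, terms: Set[str], window: int = 5) -> str:
    """Excerpt of `text` centred on the first word that matches a search term."""
    wanted = {t.lower() for t in terms}
    words = text.split()
    for i, word in enumerate(words):
        if word.lower().strip(".,;:!?\"'()") in wanted:
            start, end = max(0, i - window), min(len(words), i + window + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    # No search term on the page: fall back to its opening words.
    head = words[: 2 * window + 1]
    return " ".join(head) + (" ..." if len(words) > len(head) else "")

title = "Web Engineering: Lecture One"
page = ("Web engineering uses systematic approaches to develop and maintain "
        "high quality Web applications.")
print(title)
print(make_snippet(page, {"systematic"}, window=2))  # ... engineering uses systematic approaches to ...
```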
Search Relationships
Search engines compete with each other, but they also collaborate
Many search engines use technology from their competitors to present results.
Understanding how each engine delivers its results helps you target the most effective search marketing efforts.
"Spiders" take a Web page's content and create key search words that enable online users tofind pages they're looking for.
Search Engine Optimization
SEO is the process of improving the visibility of a website or a web page in search engines via the "natural" or unpaid ("organic" or "algorithmic") search results.
Search engine marketing through paid listings
In general, the earlier (or higher on the page) and more frequently a site appears in the search results list, the more visitors it will receive from the search engine.
The act of altering a web site so that it does well in the organic, crawler-based listings of search engines.
The process of editing a web site's content and code in order to improve visibility within one or more search engines.
White hat vs Black hat SEO
SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing.
White hats are website designers who play nice and try to follow all of the search engine guidelines to optimize their sites.
An SEO tactic, technique or method is considered white hat if it conforms to the search engines' guidelines and involves no deception.
White hat SEO is not just about following guidelines; it is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see.
White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the spiders, rather than attempting to game the algorithm.
Black hats are website designers who use backdoors, cloaking/hiding, and other tricks to optimize sites (keyword stuffing, hidden/invisible/unrelated text, meta tag stuffing).
Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or involve deception.
One black hat technique uses hidden text, either colored similar to the background, placed in an invisible div, or positioned off screen.
Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or by eliminating their listings from their databases altogether.
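To make the hidden-text trick concrete, this Python sketch uses the standard `html.parser` module to flag text inside elements whose inline style hides it (display:none, text colored like its background, or a large negative offset). It is a crude heuristic written for illustration; these notes do not describe how search engines actually detect such text, and styles set in external CSS files are ignored.

```python
from html.parser import HTMLParser
from typing import List

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # elements with no end tag

def style_hides_text(style: str) -> bool:
    """Crude check of an inline style for the hiding tricks listed above."""
    decls = {}
    for decl in style.split(";"):
        name, _, value = decl.partition(":")
        if name.strip():
            decls[name.strip().lower()] = value.strip().lower()
    if decls.get("display") == "none" or decls.get("visibility") == "hidden":
        return True  # invisible element
    if "color" in decls and decls["color"] == decls.get("background-color"):
        return True  # text coloured like its background
    offset = decls.get("left") or decls.get("text-indent") or ""
    return offset.startswith("-9999")  # text pushed off screen

class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside elements whose inline style hides it."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden: List[bool] = []  # one flag per open element
        self.hidden_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        inherited = bool(self._hidden) and self._hidden[-1]
        self._hidden.append(inherited or style_hides_text(style))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._hidden:
            self._hidden.pop()

    def handle_data(self, data):
        if self._hidden and self._hidden[-1] and data.strip():
            self.hidden_text.append(data.strip())

finder = HiddenTextFinder()
finder.feed('<p>Cheap flights to Rome.</p>'
            '<div style="display:none">cheap flights cheap flights</div>'
            '<span style="color:#fff; background-color:#fff">cheap tickets</span>')
finder.close()
print(finder.hidden_text)  # ['cheap flights cheap flights', 'cheap tickets']
```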