mini project by pawan raj phuyal


Upload: phuyalpawan

Post on 13-Nov-2014


DESCRIPTION

This is my first mini project in my engineering career.

TRANSCRIPT

Page 1: Mini Project by PAWAN RAJ PHUYAL

Koneru Lakshmaiah College Of Engineering

(Autonomous)

Freshman Engineering Department

Green Fields, Vaddeswaram, Guntur-522502

ANDHRA PRADESH, INDIA

MINI PROJECT

INTERNET SEARCHING

BY

Mr. PAWAN RAJ PHUYAL (Y8IT284)

Mr. SANTOSH BHANDARI (Y8IT297)

Mr. Md. SHAHBAZ HAIDER (Y8CE240)

I/IV B. Tech.: Information Science & Technology and Civil Engineering

Lecturer In-charge(s)

Ms. Anila & Ms. Tripura


Koneru Lakshmaiah College Of Engineering

CERTIFICATE

This is to certify that the students of I/IV B. Tech., Mr. PAWAN RAJ PHUYAL (Y8IT284), Mr. SANTOSH BHANDARI (Y8IT297) and Mr. Md. SHAHBAZ HAIDER (Y8CE240), have done a mini project in the field of Internet Searching in the year 2009-06.

Head of the Department Lecturer In-charge


Acknowledgement

We express our heartfelt gratitude to all our supporters, guides and friends, who inspired us to prepare and collect the data and helped us to complete this project successfully despite many obstacles.

Our sincere thanks go to the senior lecturers and our guides, Ms. Anila & Ms. Tripura, who encouraged us and provided valuable ideas.

Finally, we want to give special thanks to one and all who, either directly or indirectly, co-operated in the successful completion of this mini project.

Mr. PAWAN RAJ PHUYAL (Y8IT284)

Mr. SANTOSH BHANDARI (Y8IT297)

Mr. Md. SHAHBAZ HAIDER (Y8CE240)

1/4 B. Tech. Information Science & Technology

1/4 B. Tech. Civil Engineering


CONTENTS

1: Introduction

Internet

What is WWW

2: History of Internet

3: Uses of Internet

Reasons why people use internet

Why do people put things on the Web

4: How Web Works

5: Searching the Internet

Finding things on the web

Search tips

Search engines

Search engine directories

Some most popular search engines

Limitation of search engines

6: Steps of Searching

7: Browsing the Internet

Web browsers

Useful web browsers

8: Demerits of Internet Searching

Theft of personal information

Spamming

Virus threat

Pornography

9: Conclusion


Chapter 1

Introduction

The Internet is a global system of interconnected computer networks that

use the standardized Internet Protocol Suite (TCP/IP). It is a network of networks

that consists of millions of private and public, academic, business, and government networks of local to global scope that are linked by copper wires, fiber-optic

cables, wireless connections, and other technologies.

The Internet carries a vast array of information resources and services, most notably, the inter-linked hypertext documents of the World Wide Web (WWW)

and the infrastructure to support electronic mail, in addition to popular services

such as online chat, file transfer and file sharing, online gaming, and Voice over Internet Protocol (VoIP) person-to-person communication via voice and video.

The origins of the Internet reach back to the 1960s when the United States funded

research projects of its military agencies to build robust, fault-tolerant and distributed computer networks. This research spawned world-wide participation in

the development of new networking technologies and led to the commercialization

of an international network and the popularization of countless applications in

virtually every aspect of modern human life. By 2009, an estimated quarter of Earth's population was using the services of the Internet.

There are several different ways to look at what the Internet actually is.

At the highest level, the Internet is the people who use it - the global community of users.

At another level, the Internet is a set of protocols that define the rules for how computers transfer information to one another.

At the lowest level, it is the hardware behind the computer networks - the computers, modems, phone lines and cables that link together to form a huge network.

Who Controls the Internet?

The Internet is a kind of anarchy. Everyone looks after their own little Internet 'patch', but no one is responsible for looking after it as a whole. It would be nearly impossible to control the Internet now - and trying to would certainly destroy it. However, the data available on the Internet is updated, refreshed and released by the sites and organizations responsible for it.

What is the World Wide Web?

The official definition of the WWW is "wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of

documents."

wide-area: The World Wide Web spans the whole globe.

hypermedia: It contains various types of media (text, pictures, sound, movies ...)

and hyperlinks that connect pages to one another.


Chapter 2

History of the Internet

The Internet was born in the late 1960s as a U.S. Defense Department network called the ARPANET. On it you can exchange information and comments with millions of people all over the world and get a fast answer to any question imaginable on a scientific, computing, technical, business, investment, or any other subject. You could join over 11,000 electronic conferences, anytime, on any subject, and you would be broadcasting your views, questions, and information to millions of other participants.

There has never been anything like it in the history of the world, and in this English class we've covered a lot of history. Growing at a rate of about 20% per month, the Internet is only getting bigger, and if people don't start utilizing its resources they could be road kill on this Information Superhighway. Hey, I'll bet in the middle of that last sentence another computer just got on-line to the Net.

There are three major features of the Internet: on-line discussion groups, universal electronic mail, and files and software. There are about 11,000 on-line discussion groups, called newsgroups, on almost any topic you can imagine. If you are on the Net, you can participate in any of the discussions in any of these newsgroups.

The next feature is universal electronic mail, or e-mail. E-mail is the biggest and cheapest system on the Net and is also one of its biggest attractions. Since all commercial on-line services have gateways for sending and receiving electronic mail messages on the Internet, you are able to exchange messages or files with anyone else who is on-line, anywhere in the world, in seconds.

The third feature I mentioned was files and software. This, in my opinion, is the most impressive one. All the thousands of individual computer facilities connected to the Internet are also vast storage repositories for hundreds of thousands of software programs, information text files, video and sound clips, and other computer-based resources. And they're all accessible in minutes from any personal computer on-line to the Internet.

So I could do all this stuff on the Internet; why should I take notice? Because of its sheer size, its volume of messages, and its incredible monthly growth. From the latest statistics I was able to get, there are currently 30 million people who use the Internet worldwide.

Before the widespread internetworking that led to the Internet, most communication networks were limited by their nature to only allow

communications between the stations on the local network and the prevalent

computer networking method was based on the central mainframe computer model. Several research programs began to explore and articulate principles of


networking between physically separate networks, leading to the development of

the packet switching model of digital networking. These research efforts included those of the laboratories of Donald Davies (NPL), Paul Baran (RAND

Corporation), and Leonard Kleinrock at MIT and at UCLA. The research led to the

development of several packet-switched networking solutions in the late 1960s and 1970s,[1] including ARPANET and the X.25 protocols. Additionally, public access and hobbyist networking systems grew in popularity, including Unix-to-Unix Copy (UUCP) and FidoNet. They were, however, still disjointed, separate networks,

served only by limited gateways between networks. This led to the application of packet switching to develop a protocol for internetworking, where multiple

different networks could be joined together into a super-framework of networks.

By defining a simple common network system, the Internet Protocol Suite, the

concept of the network could be separated from its physical implementation. This spread of internetworking began to form into the idea of a global network that

would be called the Internet, based on standardized protocols officially

implemented in 1982. Adoption and interconnection occurred quickly across the advanced telecommunication networks of the western world, and then began to

penetrate into the rest of the world as it became the de-facto international standard

for the global network. However, the disparity of growth between advanced

nations and the third-world countries led to a digital divide that is still a concern today.

Following commercialization and introduction of privately run Internet service

providers in the 1980s, and the Internet's expansion for popular use in the 1990s, the Internet has had a drastic impact on culture and commerce.


Chapter 3

Uses of Internet

The Internet is a computer-based global information system. It is composed of many interconnected computer networks, each of which may link thousands of computers and enable them to share information. The Internet has transformed many aspects of life and is one of the biggest contributors to making the world a global village. Use of the Internet has grown tremendously since it was introduced, mostly because of its flexibility. Nowadays one can access the Internet easily. Most people have computers in their homes, but even those who don't can always go to cyber cafes where this service is provided.

The Internet developed from a network called the ARPANET, which the U.S. military had developed. Access was restricted to military personnel and the people who developed it. Only after it was privatized was it allowed to be used commercially.

The Internet has developed to give many benefits to mankind, access to information being one of the most important. Students can now have access to libraries around the world. Some charge a fee, but most provide free services. Previously students had to spend hours and hours in libraries, but now, at the touch of a button, they have a huge database in front of them.

3.1 Reasons Why People use Internet

It’s the first month of a new year and at this time I’m itching to start new web

ventures both for fun and profit. I usually do up a list of possible startup and site

ideas and narrow them down into those with the highest potential. But success depends on execution and not just plans so I tend not to be too hung up about

having a complete vision of what I want.

A little vagueness won’t hurt. I can always muddle through and change things up in response to market conditions or personal interest. No need to be perfect from

the start.


I looked at many websites to study their methods, to learn what made them a success. I started

planning what specific niche I wanted to explore and suddenly realized that I was

thinking about the whole thing in a roundabout way.

There’s really no need to think hard about having the perfect idea. The foundations of popular and profitable websites/services are deeply related to the basic reasons

why people get online and use the internet. Let’s do some reverse engineering

from that perspective.

So, why do people worldwide use the internet?

To communicate and socialize

This is very much a fundamental human need. People like to meet and talk to other people through the internet. They use it to maintain new or existing

relationships. They want to communicate ideas and find solidarity with

others who share similar interests. So do something which facilitates

communication. Hyper-local or cross-border communities, social networks, virtual worlds, apps or services built on existing communication/social

protocols and services. Bring human social activities onto the internet grid.

Socialize existing web functions and emphasize connecting people.

To find information, learn new things and be entertained

The internet is a massive archive of new and old information. It is also a

source of pleasure, giving immediate gratification in the form of images, sound and interactivity. As an educational tool, the web is essential for

people who are seeking to learn.

People want to find things online. So help them. Create a system which provides information or filters existing content. Monetize the flow of data.

Blogs, training courses, social news, aggregated news, paid membership

sites, online journals, one-stop entertainment portals, video, image and game

hubs with a specific focus.

To do work, generate income and run a business

People use the internet to make a living. It is essential to many businesses that want to increase brand exposure or sell a product/service. They also use

the web to help them work better. There is a market of webmasters,

entrepreneurs and small/big businesses out there who are willing to pay to


boost their revenue. Consultancies, design firms, freelancers, enterprise

software, business-specific tools/apps and services. Think of ways to help people work smarter and more efficiently online.

To find general information about a subject

The Web is like a huge encyclopedia of information - in some ways it's even better. The volume of information you'll find on the Web is amazing. For

every topic that you've ever wondered about, there's bound to be someone

who's written a Web page about it. The Web offers many different

perspectives on a single topic; Genetic Engineering, for example, is covered by a wide selection of pages.

In fact you can even find online encyclopedias. Many of these are now

offering a subscription service which lets you search through the complete

text of the encyclopedia. There are also many free encyclopedias that may give you a cut-down version of what you would find in a complete

encyclopedia.

To access information not easily available elsewhere

One of the great things about the Web is that it puts information into your

hands that you might otherwise have to pay for or find out by less

convenient means.

To correspond with faraway friends

Email offers a cheap and easy alternative to traditional methods of correspondence. It's faster and easier than writing snail mail and cheaper

than using the telephone. Of course, there are disadvantages too. It's not as

personal as a handwritten letter - and not as reliable either. If you spell the

name of the street wrong in a conventional address, it's not too difficult for the post office to work out what you mean. However if you spell anything

wrong in an email address, your mail won't be delivered (you might get it

sent back to you or you might never realize).

To meet people

The Web is generally a very friendly place. People love getting email from

strangers, and friendships are quick to form from casual correspondence.

The "impersonal" aspect of email tends to encourage people to reveal surprisingly personal things about themselves. When you know you will

never have to meet someone face-to-face, you may find it easier to tell them

your darkest secrets. Cyber-friendships have often developed into real life

ones too. Many people have even found love on the Net, and have gone on


to marry their cyber-partner.

Did you think you were alone in your obsession with a singer, TV programme, or author?

To have fun

There's no doubt that the Internet is a fun place to be. There's plenty to keep you occupied on a rainy day.

To learn

Online distance education courses can give you an opportunity to gain a qualification over the Internet.

To read the news

To find software

The Internet contains a wealth of useful downloadable shareware. Some pieces of shareware are limited versions of the full piece of software; others are time-limited trials (you should pay once the time limit is up). Other shareware is free for educational institutes, or for non-commercial purposes.

To buy things

The security of on-line shopping is still questionable, but as long as you are dealing with a reputable company or Web site the risks are minimal.

3.2 Why do people put things on the Web?

To advertise a product

Most company Web sites start up as a big advertisement for their products and services. It may be hard to see why anyone would willingly visit a 10-page ad - but these advertisements are very useful to anyone genuinely interested in finding out about their products. Companies may also give away some information for free as an incentive for people to visit their pages.

To sell a product

Internet shopping (e-commerce) is still in its infancy - it takes a very good marketing strategy to actually make money out of selling items over the Web, but that doesn't stop lots of people from trying.

To make money

A popular way to make money out of the Web is from advertising revenue.

Popular sites have banners at the top of the page enticing people to click them and be taken to the advertiser's Web site. These banners are generally

animated and very appealing, with mysterious messages to make users


Chapter 4

How Web Works

Web documents can be linked together because they are created in a format known as hypertext. Hypertext systems provide an easy way to manage large collections of data, which can include text files, pictures, movies, and more. In a hypertext system, when you view a document on your screen, you can also access all the data that might be linked to it. So, if the document is a discussion of honey bees, you might be able to click a hypertext link and see photos of a beehive, or a movie of bees gathering pollen from flowers.

To support hypertext documents, the Web uses a special protocol called the Hypertext Transfer Protocol, or HTTP. A hypertext document is a specially encoded file that uses the Hypertext Markup Language, or HTML. This language allows the document's author to embed hypertext links (also called hyperlinks, or just links) in the web document. HTTP and hypertext links are the foundations of the World Wide Web.
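To make the idea of embedded links concrete, here is a minimal sketch in Python of how a program might pull the hypertext links out of an HTML document. It uses the standard library's `html.parser` module. The page text, the file names in it and the names `LinkCollector`, `PAGE` and `extract_links` are assumptions made up for illustration; they echo the honey bee example and are not taken from any real site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> hyperlink in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page about honey bees (illustrative only).
PAGE = """
<html><body>
<p>Honey bees live in a <a href="beehive-photos.html">beehive</a> and
gather <a href="http://example.com/pollen-movie.html">pollen</a>.</p>
</body></html>
"""


def extract_links(html_text):
    """Return the link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


print(extract_links(PAGE))
```

The first link is relative (it points to a file on the same server) and the second is a full address on another computer, which matches the two kinds of jump a reader can make from a page.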

As you read a hypertext document (more commonly called a web page) on screen, you can click a word or picture encoded as a hypertext link and immediately jump to another location within the same document or to a different web page. The second page may be located on the same computer as the original page or anywhere else on the Internet. Because you do not have to learn separate commands and addresses to jump to a new location, the World Wide Web organizes widely scattered resources into a seamless whole.

A collection of related web pages is called a web site. Web sites are housed on web servers, Internet host computers that often store thousands of individual pages. Copying a page onto a server is called publishing the page; the process is also called posting or uploading.

Web pages are used to distribute news, interactive educational services, product information, catalogs, highway traffic reports, live audio and video, and other kinds of information. Web pages permit readers to consult databases, order products and information, and submit payment with a credit card or an account number.


The Hypertext Transfer Protocol uses Internet addresses in a specific format, called a uniform resource locator, or URL. A URL has the form type://address/path. The type specifies the type of server on which the file is located, the address is the address of the server, and the path is a location within the file structure of the server. The path includes the list of folders where the desired file is located. One example is the URL of the page at the Library of Congress web site that includes information about the library's collection of permanent exhibits.
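The parts of a URL can be separated with a few lines of Python. The sketch below uses `urlparse` from the standard library's `urllib.parse` module. The address `http://www.example.org/exhibits/permanent/index.html` is a made-up placeholder, not the Library of Congress address, and the function name `split_url` and the dictionary keys are assumptions chosen to mirror the type, address and path described in the text.

```python
from urllib.parse import urlparse


def split_url(url):
    """Split a URL into the type, address and path parts described in the text."""
    parts = urlparse(url)
    # Break the path into its pieces, ignoring empty strings around slashes.
    pieces = [p for p in parts.path.split("/") if p]
    return {
        "type": parts.scheme,      # kind of server, e.g. "http"
        "address": parts.netloc,   # address of the server
        "path": parts.path,        # location within the server's file structure
        "folders": pieces[:-1],    # folders leading to the file
        "file": pieces[-1] if pieces else "",
    }


# Hypothetical URL used only for illustration.
info = split_url("http://www.example.org/exhibits/permanent/index.html")
for key, value in info.items():
    print(key, "=", value)
```

Here the folders list shows the chain of folders the server walks through to reach the file named at the end of the path.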

When we place the cursor in the browser's blinking input area and start to type, the browser starts to look across the Web for things related to our input. Only if our input is proper and correct will it find what we want to search for.


Chapter 5

Searching the Internet

5.1 Finding Things on the Web

The Web is a very big and very disorganized place. Just about any information you would ever want to know (and a whole lot more that you wouldn't) exists on the Web somewhere. But finding it is another story.

The reason for this is that the Web was never designed as a global information retrieval system, so there is no central place monitoring where or how information is stored. The added complication of hypertext makes it very easy to lose your focus and get lost.

Search Tools

Trailblazer pages are lists of links to other sites related to a particular subject. The most useful trailblazer pages have links divided into categories and descriptions of why each site is useful.

Trailblazer pages are often constructed by enthusiastic amateurs. Some librarians are creating trailblazer pages to help people find information, e.g. the Children's Literature Page at the University of Calgary.

Trailblazer pages can be very useful in your Web searching. You will often find

links to pages that don't show up in search engines or directories. However, it can be frustrating to jump from one trailblazer page to another without finding any

pages with actual content!

Portal Sites

These sites aim to be an Internet 'one-stop-shop', either for the whole Internet or for one particular broad subject (e.g. Education). As well as link directories and search engines, they might offer a range of other services such as discussion forums, online shopping malls and news reports. They can be quite useful, especially for new users to get oriented to what kinds of things the Internet can offer them. No portal can cover the entire Internet, though, so eventually you might find their range of subjects limiting and prefer to go on a wider hunt for the information you require.

5.2 Search Tips

Understand the search engine you are using. Read the 'search tips' or 'help' for the search engine - for instance AltaVista's Help.

Use a variety of key words, and use synonyms.

Search engines often match the first word first, so put the most important word or the broadest category at the beginning.

Use quotation marks to search for a phrase. Searching for rock and roll will return documents with any of the words rock, and, or roll. Searching for "rock and roll" will only return documents with the whole phrase (read the search engine 'help' to see if it supports phrases).

Try different arrangements of key words, e.g. if you are looking for indigenous women poets, try "women writers" AND indigenous. This is a combination of a phrase and a single word. Remember, though, that in North America they use native rather than indigenous - think of how your key words will be written in the document.

Some search engines are case sensitive. Most search engines will match both upper and lower case if lower case letters are entered, but only upper case if upper case letters are entered. For instance "margaret mahy" will match both "margaret mahy" and "Margaret Mahy", but "Margaret Mahy" will only match "Margaret Mahy".

Most search engines will match part or the whole of a word - e.g. sing will retrieve singer, single, singe etc.

Think of common misspellings, and take into account American spellings, e.g. theater, center.

Most search engines will search for documents with any of the words you enter - e.g. a search for Christmas carols will find documents with just the word Christmas, as well as documents with just the word carols. Documents with both of the words will appear earlier in the results. You can use operators to restrict your search further (check the 'help' to find out which operators the search engine uses).

Results are returned in order of relevance. If there is nothing useful in the first few pages, chances are there won't be anything useful in any of the others. Change your search query or use another search tool.
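Several of the tips above can be tried out on a toy collection of documents. The sketch below, in Python, is only an illustration under stated assumptions: the six short documents and the function names `any_word_search`, `phrase_search` and `case_rule_search` are invented for this example, and real search engines are far more elaborate. It matches whole words only, so it does not show the partial-word matching mentioned in the tips.

```python
# Hypothetical mini collection of documents (illustrative only).
DOCUMENTS = {
    "doc1": "Christmas carols for choirs",
    "doc2": "Christmas dinner recipes",
    "doc3": "rock and roll history",
    "doc4": "how to roll a rock up a hill",
    "doc5": "Books by Margaret Mahy",
    "doc6": "margaret mahy fan page",
}


def any_word_search(query, docs):
    """Return documents containing any query word, best matches first."""
    words = query.lower().split()
    scored = []
    for name, text in docs.items():
        text_words = text.lower().split()
        score = sum(1 for w in words if w in text_words)
        if score:
            scored.append((score, name))
    # More matching words means earlier in the results; ties sort by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


def phrase_search(phrase, docs):
    """Return only documents that contain the whole phrase."""
    needle = phrase.lower()
    return [name for name, text in docs.items() if needle in text.lower()]


def case_rule_search(query, docs):
    """Lower-case queries match any case; queries with capitals match exactly."""
    if query == query.lower():
        return [name for name, text in docs.items() if query in text.lower()]
    return [name for name, text in docs.items() if query in text]


print(any_word_search("Christmas carols", DOCUMENTS))
print(any_word_search("rock and roll", DOCUMENTS))
print(phrase_search("rock and roll", DOCUMENTS))
print(case_rule_search("margaret mahy", DOCUMENTS))
print(case_rule_search("Margaret Mahy", DOCUMENTS))
```

Comparing the second and third searches shows why quotation marks help: the plain search also returns the document about rolling a rock up a hill, while the phrase search keeps only the document that contains the exact phrase.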

A Web Searching Activity


Pick one of these to do your search on:

A general site about conservation for your students to use as a reference in a piece of their writing.

A page with information about Keas that you can adapt into a worksheet for your students.

A site containing facts on endangered species from around the world.

Information on why the Pohutukawa tree is dying out.

A page with lots of conservation links for your students to use as a starting point in a web search.

A site about New Zealand flora and fauna.

Pick one or more search tools to use for your search. Here are some to get you

started:

Yahoo

AltaVista

Google

Infoseek

Excite

Search NZ

What Search Tools did you use?

How long did it take to find what you were looking for?

What search words did you use to start with?

Did you have to revise your search words? Why?

What were the most successful search words?

What's the address of the page you found?

Describe the steps you took to find this page:

How satisfied were you with the page you found? How well did it fit what you were looking for?

How well do you think your chosen search tool/tools performed in your search?


This is a facility that you may "bookmark" or add to your "favorites". It is no longer regularly updated or maintained, as I personally use the Google toolbar for most of my searching. If I need something "special" I can try something from this list, or I may use Speciality Search Engines or perhaps the huge resource at Special Search Engines. There is also a search engine that searches for specialist search engines but, ironically, I cannot find it at the moment.

5.5 Some of the Most Popular Search Engines

The search engines below are all excellent choices to start with when searching for

information.

Google

http://www.google.com

Google Inc. is an American public corporation, earning revenue from advertising related to its Internet search, e-mail, online mapping, office productivity, social networking, and video sharing services as well as selling advertising-free versions of the same technologies. The Google headquarters, the Googleplex, is located in Mountain View, California. As of March 31, 2009, the company has 20,164 full-time employees.


Google was founded by Larry Page and Sergey Brin while they were students at

Stanford University and the company was first incorporated as a privately held company on September 4, 1998. The initial public offering took place on August

19, 2004, raising US$1.67 billion, implying a value for the entire corporation of

US$23 billion. Google has continued its growth through a series of new product

developments, acquisitions, and partnerships. Environmentalism, philanthropy and positive employee relations have been important tenets during the growth of

Google. The company has been identified multiple times as Fortune Magazine's #1 Best Place to Work,[4] and as the most powerful brand in the world[5] (according to the Millward Brown Group).

Yahoo!

http://www.yahoo.com

Launched in 1994, Yahoo is the web's oldest "directory," a place where human

editors organize web sites into categories. However, in October 2002, Yahoo made

a giant shift to crawler-based listings for its main results. These came from Google until February 2004. Now, Yahoo uses its own search technology.

In addition to excellent search results, you can use tabs above the search box on the Yahoo home page to seek images, Yellow Page listings or use Yahoo's excellent

shopping search engine. Or visit the Yahoo Search home page, where even more

specialized search options are offered.


The Yahoo Directory still survives. You'll notice "category" links below some of

the sites listed in response to a keyword search. When offered, these will take you to a list of web sites that have been reviewed and approved by a human editor.

AltaVista

http://www.altavista.com

AltaVista is a web search engine owned by Yahoo!. AltaVista was once one of the

most popular search engines but its popularity has waned due to the rise of Google.

AltaVista opened in December 1995 and for several years was the "Google" of its

day, in terms of providing relevant results and having a loyal group of users that loved the service.

Ask

http://www.ask.com

Ask Jeeves initially gained fame in 1998 and 1999 as being the "natural language"

search engine that let you search by asking questions and responded with what

seemed to be the right answer to everything. In reality, technology wasn't what made Ask Jeeves perform so well. Behind the scenes, the company at one point

had about 100 editors who monitored search logs. They then went out onto the web

and located what seemed to be the best sites to match the most popular queries.

In 1999, Ask acquired Direct Hit, which had developed the world's first "click

popularity" search technology. Then, in 2001, Ask acquired Teoma's unique index

and search relevancy technology. Teoma was based upon the clustering concept of

subject-specific popularity.

AOL Search

http://aolsearch.aol.com (internal)

http://search.aol.com/ (external)

AOL Search provides users with editorial listings that come from Google's crawler-based index. Indeed, the same search on Google and AOL Search will come up with very similar matches. So, why would you use AOL Search? Primarily because you are an AOL user. The "internal" version of AOL Search provides links to content only available within the AOL online service. In this way, you can search AOL and the entire web at the same time. The "external" version lacks these links. Why wouldn't you use AOL Search? If you like Google, many of Google's features such as "cached" pages are not offered by AOL Search.


Live Search

http://www.live.com/

Live Search is the name of Microsoft's web search engine, successor to MSN

Search, designed to compete with the industry leaders Google and Yahoo. The

search engine offers some innovative features, such as the ability to view additional search results on the same web page and the ability to adjust the amount

of information displayed for each search-result. It also allows the user to save

searches and see them updated automatically on Live.com.

LookSmart

http://www.looksmart.com

LookSmart is primarily a human-compiled directory of web sites. It gathers its

listings in two ways. Commercial sites pay to be listed in its commercial categories, making the service very much like an electronic "Yellow Pages."

However, volunteer editors at the LookSmart-owned Zeal directory also catalog

sites into non-commercial categories for free. Though Zeal is a separate web site, its listings are integrated into LookSmart's results.

Lycos

http://www.lycos.com

Lycos is one of the oldest search engines on the web, launched in 1994. It ceased crawling the web for its own listings in April 1999 and

instead provides access to human-powered results from LookSmart for popular

queries and crawler-based results from Yahoo for others.

Netscape Search

http://search.netscape.com

Owned by AOL Time Warner, Netscape Search uses Google for its main listings,

just as does AOL's other major search site, AOL Search. So why use Netscape Search rather than Google? Unlike with AOL Search, there's no compelling reason

to consider it. The main difference between Netscape Search and Google is that

Netscape Search will list some of Netscape's own content at the top of its results. Netscape also has a completely different look and feel from Google. If either of

these differences appeals to you, then try Netscape Search. Otherwise, you're probably better

off just searching at Google.

4.2 Limitations of Search Engines

• The ambiguities of language mean that the list of retrieved documents may contain a high percentage of irrelevant material.

• Some search only document titles and others search the entire document.

• Being electronic, they can't discriminate between valuable documents and ones of dubious quality.

• With millions of people using the Internet they sometimes become overloaded.

Chapter 6

Steps of Internet Searching

How is it that an Internet search engine can find the answers to a query so quickly?

It is a four-step process:

1. Crawling the Web: following links to find pages.

2. Indexing the pages: to create an index from every word to every place it occurs.

3. Ranking the pages: so the best ones show up first.

4. Displaying the results: in a way that is easy for the user to understand.

Crawling is conceptually quite simple: starting at some well-known sites on the

web, recursively follow every hypertext link, recording the pages encountered

along the way. In computer science this is called the transitive closure of the link relation. However, the conceptual simplicity hides a large number of practical

complications: sites may be busy or down at one point, and come back to life later;

pages may be duplicated at multiple sites (or with different URLs at the same site) and must be dealt with accordingly; many pages have text that does not conform to

the standards for HTML, HTTP redirection, robot exclusion, or other protocols;

some information is hard to access because it is hidden behind a form, a Flash

animation, or a JavaScript program. Finally, the necessity of crawling 100 million pages a day means that building a crawler is an exercise in distributed computing,

requiring many computers that must work together and schedule their actions so as

to get to all the pages without overwhelming any one site with too many requests at once.
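The crawling idea described above can be sketched in a few lines. This is only a minimal illustration, not how any real search engine is built: the "web" here is a small hypothetical dictionary (TOY_WEB), fetch_links is a stand-in for downloading a page, and a queue is used instead of literal recursion. A missing entry plays the role of a site that is down.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/"],
    # "c.example/" is deliberately missing: it stands in for a site that is down.
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url)  # None means the page could not be fetched


def crawl(seeds):
    """Breadth-first crawl; returns (pages fetched in order, pages that failed)."""
    seen = set(seeds)      # every URL ever queued, so no page is visited twice
    queue = deque(seeds)
    fetched, failed = [], []
    while queue:
        url = queue.popleft()
        links = fetch_links(url)
        if links is None:
            failed.append(url)  # a real crawler would retry later
            continue
        fetched.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched, failed


fetched, failed = crawl(["a.example/"])
print(fetched)
print(failed)
```

The set of seen URLs is what keeps the crawl from looping forever when pages link back to each other.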

A search engine’s index is similar to the index in the back of a book: it is used to

find the pages on which a word occurs. There are two main differences: the search engine’s index lists every occurrence of every word, not just the important

concepts, and the number of pages is in the billions, not hundreds. Various

techniques of compression and clever representation are used to keep the index

“small,” but it is still measured in terabytes (millions of megabytes), which again

means that distributed computing is required. Most modern search engines index

link data as well as word data. It is useful to know how many pages link to a given page, and what the quality of those pages is. This kind of analysis is similar to

citation analysis in bibliographic work, and helps establish which pages are

authoritative. Algorithms such as PageRank and HITS are used to assign a numeric

measure of authority to each page. For example, the PageRank algorithm says that the rank of a page is a function of the sum of the ranks of the pages that link to the

page. If we let PR(p) be the PageRank of page p, Out(p) be the number of outgoing

links from page p, Links(p) be the set of pages that link to page p and N be the total number of pages in the index, then we can define PageRank by

$$PR(p) = \frac{r}{N} + (1 - r) \sum_{i \in Links(p)} \frac{PR(i)}{Out(i)}$$

where r is a parameter that indicates the probability that a user will choose not to follow a link, but will instead restart at some other page. The r/N term means that

each of the N pages is equally likely to be the restart point, although it is also

possible to use a smaller subset of well-known pages as the restart candidates. Note

that the formula for PageRank is recursive – PR appears on both the right- and left-hand sides of the equation. The equation can be solved by iterating several times,

or by standard linear algebra techniques for computing the principal eigenvector of a

(3-billion-by-3-billion) matrix.
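The "iterating several times" approach can be sketched as follows. This is a small illustration under stated assumptions: the three-page graph is made up, the value r = 0.15 and the number of iterations are choices made for this example, and every page is assumed to have at least one outgoing link to a page in the graph.

```python
def pagerank(out_links, r=0.15, iterations=50):
    """Iterate PR(p) = r/N + (1 - r) * sum of PR(i)/Out(i) over pages i linking to p.

    out_links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(out_links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(iterations):
        new = {p: r / n for p in pages}  # the restart term r/N
        for i, targets in out_links.items():
            share = (1 - r) * pr[i] / len(targets)  # (1 - r) * PR(i)/Out(i)
            for p in targets:
                new[p] += share
        pr = new
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page C is linked from both A and B, so it ends up with the highest rank, while B, which receives only half of A's vote, ends up lowest.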

The two steps above are query independent—they do not depend on the user’s

query, and thus can be done before a query is issued with the cost shared among all

users. This is why a search takes a second or less, rather than the days it would take if a search engine had to crawl the web anew for each query. We now consider

what happens when a user types a query. Consider the query [“National

Academies” computer science], where the square brackets denote the beginning

and end of the query, and the quotation marks indicate that the enclosed words must be found as an exact phrase match. The first step in responding to this query

is to look in the index for the hit lists corresponding to each of the four words

“National,” “Academies,” “computer” and “science.” These four lists are then

intersected to yield the set of pages that mention all four words. Because “National Academies” was entered as a phrase, only hits where these two words appear

adjacent and in that order are counted. The result is a list of 19,000 or so pages.
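The lookup just described can be illustrated with a tiny index. This is a sketch only: the three documents in DOCS are invented, and the index simply records, for every word, the positions at which it occurs in each document. The hit lists are intersected first, and the phrase is then checked by looking for adjacent positions.

```python
from collections import defaultdict

# Hypothetical mini-collection of pages (id -> text).
DOCS = {
    1: "the national academies computer science board",
    2: "national science academies and computer labs",
    3: "computer science at the national level",
}


def build_index(docs):
    """Map every word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_occurs(index, phrase, doc_id):
    """True if the phrase words appear adjacent and in order in the document."""
    if not phrase:
        return True
    return any(
        all(start + k in index[phrase[k]][doc_id] for k in range(1, len(phrase)))
        for start in index[phrase[0]][doc_id]
    )


def search(index, phrase, words):
    """Return ids of documents containing all terms, with the phrase intact."""
    candidates = None
    for term in phrase + words:
        hits = set(index.get(term, {}))  # the hit list for this term
        candidates = hits if candidates is None else candidates & hits
    return [d for d in sorted(candidates or []) if phrase_occurs(index, phrase, d)]


index = build_index(DOCS)
print(search(index, ["national", "academies"], ["computer", "science"]))
```

Document 2 contains all four words but is rejected because "national" and "academies" are not adjacent there; document 3 is dropped earlier because it lacks one of the words.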

The next step is ranking these 19,000 pages to decide which ones are most relevant. In traditional information retrieval this is done by counting the number of

occurrences of each word, weighing rare words more heavily than frequent words,

and normalizing for the length of the page. A number of refinements on this

scheme have been developed, so it is common to give more credit for pages where

the words occur near each other, where the words are in bold or large font, or in a

title, or where the words occur in the anchor text of a link that points to the page. In addition, the query-independent authority of each page is factored in. The result is

a numeric score for each page that can be used to sort them best-first. For our four-

word query, most search engines agree that the Computer Science and

Telecommunications Board home page at www7.nationalacademies.org/cstb/ is the best result, although one preferred the National

Academies news page at www.nas.edu/topnews/ and one inexplicably chose a

year-old news story that mentioned the Academies. The final step is displaying the results. Traditionally this is done by listing a short description of each result in

rank-sorted order. The description will include the title of the page and may

include additional information such as a short abstract or excerpt from the page.

Some search engines generate query-independent abstracts while others customize each excerpt to show at least some of the words from the query. Displaying this

kind of query-dependent excerpt means that the search engine must keep a copy of

the full text of the pages (in addition to the index) at a cost of several more terabytes. Some search engines attempt to cluster the result pages into coherent

categories or folders, although this technology is not yet mature.
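The traditional scoring scheme mentioned above (count the occurrences of each word, weigh rare words more heavily, and normalize for page length) can be sketched as a simple TF-IDF score. The documents and the exact weighting formula are assumptions made for this illustration; real engines combine many more signals, such as the link-based authority described earlier.

```python
import math

# Hypothetical pages, already split into words.
DOCS = {
    "d1": "search engines rank pages and search engines index pages".split(),
    "d2": "pages about cooking and pages about travel".split(),
    "d3": "a search for rare words".split(),
}


def document_frequency(docs):
    """Count in how many documents each word appears."""
    df = {}
    for words in docs.values():
        for word in set(words):
            df[word] = df.get(word, 0) + 1
    return df


def rank(query, docs):
    """Return document ids sorted best-first by a simple TF-IDF score."""
    df = document_frequency(docs)
    n = len(docs)
    scores = {}
    for doc_id, words in docs.items():
        total = 0.0
        for term in query:
            if term in df:
                tf = words.count(term) / len(words)  # normalized for page length
                idf = math.log(n / df[term])         # rare words weigh more
                total += tf * idf
        scores[doc_id] = total
    return sorted(scores, key=scores.get, reverse=True)


print(rank(["search", "engines"], DOCS))
```

Here "engines" appears in only one document, so it contributes more to the score than "search", which appears in two.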

Studies have shown that the most popular uses of computers are email, word processing and Internet searching. Of the three, Internet searching is by far the

most sophisticated example of computer science technology. Building a high-

quality search engine requires extensive knowledge and experience in information

retrieval, data structure design, user interfaces, and distributed systems implementation.

Future advances in searching will increasingly depend on statistical natural

language processing [Lee] and machine learning [Mitchell] techniques. With so much data—billions of pages, tens of billions of links, and hundreds of millions of

queries per day—it makes sense to use data mining approaches to automatically

improve the system. For example, several search engines now do spelling

correction of user queries. It turns out that the vast amount of correctly and incorrectly spelled text available to a search engine makes it easier to create a good

spelling corrector than traditional techniques based on dictionaries. It is likely that

there will be other examples of text understanding that can be done better with a data-oriented approach; this is an area that search engines are just beginning to

explore.
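The data-oriented approach to spelling correction can be illustrated with a very small sketch. It assumes a tiny made-up corpus (CORPUS) in place of the vast amount of text a search engine has, and it only considers candidates one edit away from the typed word; the most frequent known candidate wins.

```python
from collections import Counter

# Hypothetical stand-in for the huge body of text a search engine has seen.
CORPUS = "the search engine ranks the pages and the search engine shows the results"
COUNTS = Counter(CORPUS.split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    return max(candidates, key=COUNTS.get) if candidates else word


for typed in ["serch", "engin", "teh", "pages"]:
    print(typed, "->", correct(typed))
```

No dictionary is consulted: the list of "correct" words and their relative likelihood both come from counting the text itself.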

Chapter 7

Browsing the Internet

5.1 What do we need to know?

A browser is a program on your computer that enables you to search ("surf")

and retrieve information on the World Wide Web (WWW), which is part of the

Internet. The Web is a vast collection of pages held on computers linked together in a global network; each page can be accessed using an address (URL, Uniform Resource

Locator), e.g. http://www.nepalnews.com.np for the news of Nepal.

URLs are often long and therefore easy to type incorrectly. Web addresses usually begin with

http:// (or https:// for secure sites), and many (but not all) continue with www. In many cases the first part (http://, or even http://www.) can be omitted, and you will still be able to access the

page.
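The way a browser can accept an address typed without the leading http:// can be sketched with Python's standard urllib.parse module. The normalize function below is a hypothetical helper written for this illustration; it only restores a missing scheme and does not try to guess a missing www.

```python
from urllib.parse import urlsplit


def normalize(address):
    """Add "http://" when no scheme was typed, then split the URL into parts."""
    address = address.strip()
    if "://" not in address:
        address = "http://" + address  # assume plain HTTP when nothing is given
    parts = urlsplit(address)
    return parts.scheme, parts.netloc, parts.path or "/"


print(normalize("www.nepalnews.com.np"))
print(normalize("http://www.yahoo.com/news"))
```

The three parts returned are the scheme, the host name and the path of the page on that host.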

5.2 Searching the Web

If you don't know the telephone number of the person you wish to ring, you need

a telephone directory. The Web provides two methods of searching for pages

providing information:

• sites presenting web pages sorted by category and subcategories, e.g. Yahoo

(several sites, including http://www.yahoo.com and http://www.yahoo.no)

• sites offering search engines that return lists of web pages containing text that

matches a search word or string, e.g. Google (http://www.google.com), AltaVista (http://www.altavista.com) and FAST Search (http://www.alltheweb.com).

Before you conduct a search, it is important to consider, among others, the

following points:

1. Is your choice of search term adequate, too restrictive or too general?

2. Is the search you have planned to undertake most suited for a search engine that

categorizes web sites, so that you can browse through appropriate subcategories when the first results are returned?

3. Are you more interested in using a search engine that merely returns all the

pages matching your search?

5.3 Web browser

A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is

identified by a Uniform Resource Identifier (URI) and may be a web page, image,

video, or other piece of content. Hyperlinks present in resources enable users to easily navigate their browsers to related resources. Although browsers are primarily

intended to access the World Wide Web, they can also be used to access

information provided by web servers in private networks or content in file systems.

The major web browsers in order of usage according to Net Applications are

Windows Internet Explorer, Mozilla Firefox, Apple Safari, Google Chrome, and

Opera.

5.4 Some useful Web Browsers

Internet Explorer

Microsoft's Internet Explorer (IE) is one of the most popular browsers today. IE was introduced in 1995 and passed Netscape in popularity in 1998.

Firefox

Firefox is a browser from Mozilla. It was released in 2004 and is one of the

most popular browsers today.

Netscape

Netscape was the first commercial Internet browser. Netscape was

introduced in 1994, but gradually lost its popularity to Internet Explorer.

The development of Netscape officially ended in February 2008.

Mozilla

The Mozilla Project has grown from the ashes of Netscape. Browsers based

on Mozilla code are the largest browser-family on the Internet today.

Chapter 8

Demerits of Internet Searching

Theft of Personal information

If you use the Internet, you may face a serious risk: your personal

information, such as your name, address and credit card number, can be accessed and misused by criminals.

Spamming

Spamming refers to sending unwanted e-mails in bulk, which serve no purpose and needlessly clog the entire system. Such illegal activities can be

very frustrating, so instead of just ignoring them, you should make an

effort to stop them so that using the Internet becomes that much safer.

Virus threat

A virus is a program that disrupts the normal functioning of your

computer system. Computers attached to the Internet are more prone to virus attacks, which can end up crashing your whole hard disk, causing you considerable

headache.

Pornography

This is perhaps the biggest threat to the healthy mental development of children,

and a very serious issue concerning the Internet.

Time wasting

If we are not sure what we are searching for, or if we do not choose suitable search

techniques, searching can take a long time.

Chapter 9

Conclusion

We now live in the 21st century. Most people in the world use the

Internet, and it has become an essential part of daily life. People use the Internet

even for small tasks at home. The modernization and rapid development seen in the world within a short period of time are due to the

evolution of the computer and the Internet. The features and uses

of the Internet have already been discussed above.

From our experience during this mini project, we have come to understand the

importance of Internet searching. We can find almost anything on the Internet if we

combine suitable search techniques with well-chosen search words. When we surf the Internet, we also need to use it in the proper way.

References:

http://www.google.com

http://www.wikipedia.com

http://www.ask.com

http://www.yahoo.com

Introduction to Computer by PETER NORTON
