
 

Lesson Four:
World Wide Web Search Engines


 

Introduction

The ever-changing nature of the Web provides access to vast numbers of information resources. Web sites and documents appear, are deleted, or are moved to a different location each day. In this dynamic environment, search engines can be the most efficient way of locating information on a specific topic, since they provide access to immense, continuously updated databases of Internet resources. There are hundreds of search engines designed to help you find information, whether you are looking for a topic of personal interest or material for a scholarly research project.

Using search engines effectively may seem intimidating, since new search engines appear frequently and existing engines constantly change their search interfaces and formats. Though there is at present no standard that governs search engines, they do share many basic features which allow the searcher to retrieve relevant information. This lesson will help you to use search engines effectively by:

  • Understanding how search engines work, including the role of META tags in search retrieval.
  • Understanding the basic search features available with most search engines.
  • Realizing that the HELP screens available for each search engine must be reviewed on a continuous basis.
  • Knowing what you are looking for prior to conducting a search. Refer back to Lesson 3 for help in analyzing your topic so that you can take advantage of the most advanced search engine features.

 

How Do Search Engines Work?

Most search engines use a computer program called a "spider" to collect information and index web resources. Sometimes called "webcrawlers" or "robots", these computer programs crawl through web sites on the Internet, gathering information from all the pages of a web site. The spider returns the information to a central database and then indexes the information it has gathered. When you perform a search in a search engine, you are searching the database compiled and indexed by the spider.
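The crawl-then-index process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-page "web" in PAGES, the example.org addresses, and the function names crawl and build_index are all made up for this example, and a real spider would fetch pages over the network rather than read them from a dictionary.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "http://example.org/": ("welcome to our college", ["http://example.org/health"]),
    "http://example.org/health": ("binge drinking among college students", []),
}

def crawl(start_url, pages):
    """Follow links from start_url and return {url: text} for every page reached."""
    seen, queue, collected = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in pages:
            continue
        seen.add(url)
        text, links = pages[url]
        collected[url] = text
        queue.extend(links)  # the spider "crawls" on to linked pages
    return collected

def build_index(collected):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index(crawl("http://example.org/", PAGES))
print(sorted(index["college"]))
# ['http://example.org/', 'http://example.org/health']
```

A search then consults the index, not the live Web, which is why a search engine can return pages that have since moved or been deleted.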

Spiders are part of a larger category of Internet computer programs called "agents." Agents are computer programs that perform specific functions for their users, usually gathering, comparing and organizing information. There are agents that will gather stock prices for you and shopping agents that will locate merchandise on the Internet. As agents become more sophisticated, they will be able to provide up-to-the-minute research results, gathering the latest information as soon as it is posted on the Internet.

While most search engines rely on spiders to collect and index information, each performs its tasks in a slightly different way. Each search engine has its own search interface and uses different criteria for matching searches with documents. Each may also differ in terms of search speed and how it ranks results in order of relevance.

Searching would be easier if the search engines used a common standard. However, since each search engine operates a little differently, it is a good idea to search more than one to be sure you have retrieved most of the relevant information available on your topic.

Meta Tags

An additional element that affects the way search results are retrieved is the META tag. A META tag is a hidden HTML tag used by web developers to help specify how their web pages are indexed by search engine "spiders" and "robots." The web page author or developer adds keywords describing the content of the site within a META HTML tag. Many search engines, including AltaVista, Go, and HotBot, index all the keywords in the META tag in addition to the words in the text of the document. (Excite, FAST, Google and Northern Light index page text but not META keywords.) In many cases, the keywords in a META tag accurately describe the web document, and search retrieval is enhanced. Where META tags are improperly formatted, poorly composed, or purposefully misleading, irrelevant information may be retrieved.
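To make the idea concrete, here is a minimal sketch of how an indexing program might pull the keywords out of a META tag, using the HTMLParser class from Python's standard library. The sample page and the class name MetaKeywordParser are invented for illustration; how any particular search engine actually reads META tags is not documented in this lesson.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect the keywords listed in <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]

# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>Campus Health</title>
<meta name="keywords" content="binge drinking, college students, alcohol">
</head><body>Information for students.</body></html>"""

parser = MetaKeywordParser()
parser.feed(SAMPLE_PAGE)
print(parser.keywords)
# ['binge drinking', 'college students', 'alcohol']
```

Note that none of these keywords has to appear in the visible text of the page, which is exactly why misleading META tags can produce irrelevant results.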

Location and Frequency

Search engines look at both the location and the frequency of occurrence of search terms to help determine relevancy. The higher up on a web site that a search term appears, the higher the ranking of that web site. A web site which contains a search term in the title or in the first few paragraphs of text will be determined to be more relevant than one in which the search term appears toward the end of the document.

Search engines also look at the number of times search terms appear in the text of the web site. Sites with a higher frequency of a search term are determined to be more relevant.
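The following Python sketch shows one way location and frequency could be combined into a single relevance score. The formula and the size of the title bonus are assumptions chosen for illustration; real search engines do not publish their exact weights.

```python
def relevance(term, title, body):
    """Toy relevance score built from term frequency and term location."""
    term = term.lower()
    title_words = title.lower().split()
    body_words = body.lower().split()

    score = float(body_words.count(term))   # frequency: one point per occurrence
    if term in title_words:
        score += 5.0                        # assumed bonus for appearing in the title
    if term in body_words:
        first = body_words.index(term)
        # Location: the earlier the first occurrence, the closer this bonus is to 1.
        score += 1.0 - first / len(body_words)
    return score

early = relevance("ozone", "Ozone facts", "ozone depletion is a concern for the ozone layer")
late = relevance("ozone", "Climate notes", "many topics are covered here including ozone")
print(early > late)
# True
```

The first document scores higher on every count: the term is in the title, it appears at the very start of the text, and it occurs twice.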

Ranking and Popularity

In addition to text-matching techniques, an increasing number of search engines are using popularity and link analysis as a means of ranking search results.

Direct Hit uses technology that measures what people are selecting from the results of a number of search engines. In addition to measuring the number of hits a site receives, Direct Hit also measures how much time people spend at a site. The longer a user stays at a site the higher that site is ranked. Through a combination of these two measurements Direct Hit can show what it considers to be the most relevant sites for a search topic. Direct Hit stands alone as a search engine, but it is also integrated into the search results of other search engines such as HotBot and Lycos.

Google uses link analysis to rank the usefulness of a web site. Google interprets a link from web site A to web site B as a "vote" by site A for site B. The more votes or links a site receives, the more relevant that site is considered to be. In addition to looking at the number of links a site receives, Google also analyzes the sites casting the votes. Votes cast by sites which are themselves major sites (i.e., sites receiving many votes themselves) are weighted more heavily than votes from less popular sites.
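The "weighted vote" idea can be illustrated with a short Python sketch. This is a simplified model written for this lesson, not Google's actual algorithm: the function name link_scores, the five-site link graph, the number of rounds, and the damping value are all assumptions.

```python
def link_scores(links, rounds=20, damping=0.85):
    """links maps each site to the list of sites it links to. Returns {site: score}."""
    sites = set(links) | {t for targets in links.values() for t in targets}
    score = {s: 1.0 / len(sites) for s in sites}
    for _ in range(rounds):
        # Every site starts each round with a small baseline score.
        new = {s: (1.0 - damping) / len(sites) for s in sites}
        for source, targets in links.items():
            for target in targets:
                # A site splits its own score evenly among the sites it votes for,
                # so a vote from a high-scoring site is worth more.
                new[target] += damping * score[source] / len(targets)
        score = new
    return score

# Hypothetical link graph: B and D each receive exactly two links.
LINKS = {"A": ["B"], "C": ["B"], "B": ["D"], "E": ["D"]}
scores = link_scores(LINKS)
print(scores["D"] > scores["B"] > scores["A"])
# True
```

Sites B and D receive the same number of links, yet D ends up ranked higher because one of its votes comes from B, which is itself a well-linked site.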


 

General Search Features

Most of the major search engines support the following search techniques, although each search engine operates a little differently. Be sure to read the specific instructions provided for each search engine in the HELP files. There is usually a link to a HELP screen near the search box or near the top of the search engine's home page. These HELP screens should be consulted on a regular basis as the searching features of the search engine may change.

Size and Coverage

  • When search engine producers refer to their size, they are usually counting unique URLs as opposed to unique sites, which may contain a number of URLs. There are 10 to 20 search engines which can be considered "large," with Google, FAST, AltaVista and Northern Light at the top of the scale. The search engine with the largest collection of sites is not necessarily the best search engine. However, the larger the search engine, the greater the chance that you will find something, especially if what you are looking for is obscure or unusual.
  • Most search engines search the entire text of web pages.
  • Most search engines allow Usenet searches.
  • Many search engines will also search for images, audio files, and video files. (Lesson 6 discusses "format searching".)
  • Some of the search engines provide both simple and advanced or custom search modes. Search techniques may vary between modes.

Boolean Searching

  • Most search engines support Boolean searching, allowing AND, OR, and NOT searches. Some engines only allow AND. In some search engines, the exclusionary NOT operator is expressed as AND NOT.
  • If a list of terms is entered and no Boolean operator is specified, many search engines use the OR operator as the default, while others use the AND operator.
  • Some search engines require that Boolean operators be capitalized; others do not, but they still accept capitalized operators. It is therefore a good idea to always capitalize Boolean operators.
  • Many search engines use a simplified form of Boolean operators, replacing the operator with a symbol:
    • the + sign for an AND search

Example: +drinking +driving searches for the words drinking AND driving, in no specific order in the text of the web page.

    • the - sign for a NOT search

Example: +dolphins -football will search for documents which contain the word dolphins but NOT the word football.

  • Search statements combining more than one type of Boolean operator must also use nesting, that is, parentheses around synonymous terms. The parentheses tell the search engine to perform that part of the search first.

Example: +suicide +(teen youth adolescent) will search for documents containing any or all of the terms within the parentheses before combining that result with the word suicide. This assumes that the default operator for the search engine is OR.
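The + and - notation, including a parenthesized group of synonyms, can be modeled with the short Python sketch below. It assumes the behavior described above (plain terms default to OR, and any one term inside parentheses satisfies the group); the function names parse and matches are invented for this example, and individual search engines may interpret the same query differently.

```python
import re

def parse(query):
    """Split a query into required groups, excluded terms, and optional terms."""
    required, excluded, optional = [], [], []
    pattern = r"([+-]?)(?:\(([^)]*)\)|(\S+))"
    for sign, group, word in re.findall(pattern, query.lower()):
        terms = group.split() if group else [word]
        if sign == "+":
            required.append(terms)   # at least one term of the group must appear
        elif sign == "-":
            excluded.extend(terms)   # none of these may appear
        else:
            optional.extend(terms)
    return required, excluded, optional

def matches(query, text):
    """True if the text satisfies the query."""
    words = set(text.lower().split())
    required, excluded, optional = parse(query)
    if any(term in words for term in excluded):
        return False
    if not all(any(term in words for term in group) for group in required):
        return False
    # With no required groups, fall back to the assumed OR default for plain terms.
    return bool(required) or any(term in words for term in optional)

print(matches("+dolphins -football", "dolphins swim in the ocean"))
# True
print(matches("+suicide +(teen youth adolescent)", "youth suicide prevention"))
# True
```

A page about the Miami Dolphins football team would be rejected by the first query, and a page mentioning suicide without any of the three age-related terms would be rejected by the second.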

Phrase Searching

  • Most search engines support the use of quotation marks around words, terms or names you want searched as a phrase, i.e., appearing in exactly the order you enter them:
    • Example: "ozone layer depletion" searches for the phrase, with the words in the order given.
    • Example: "Martin Luther King" searches for the name as a phrase.
    • Example: "Society for Creative Anachronism" searches for the organization.
  • In some search engines, if a phrase is not specified in the search statement, the default search is an OR Boolean search in which just one of the terms in the search need be present to retrieve a document. This can lead to thousands of irrelevant hits.
  • Some search engines use pull-down menus to allow the searcher to select "exact phrase" as the search option.
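The difference between a phrase search and a default OR search is easy to see in code. In this minimal Python sketch, the function names contains_phrase and contains_any are made up for the example, and words are compared after simple lower-casing and splitting on spaces.

```python
def contains_phrase(phrase, text):
    """True if the words of the phrase appear consecutively, in order, in the text."""
    target = phrase.lower().split()
    words = text.lower().split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

def contains_any(terms, text):
    """Default OR search: True if at least one of the terms appears in the text."""
    words = set(text.lower().split())
    return any(term in words for term in terms.lower().split())

doc = "depletion of the ozone layer"
print(contains_phrase("ozone layer depletion", doc))
# False
print(contains_any("ozone layer depletion", doc))
# True
```

The document contains all three words, but not in the order given, so only the OR search retrieves it.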

Proximity Searching

  • Some search engines, most notably AltaVista, support proximity searching. The NEAR operator allows you to look for words within 10 words of each other.
    • Example: "college students" NEAR "binge drinking" would look for those two phrases within 10 words of each other, in any order.
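A proximity check like NEAR can be sketched in Python as follows. This example assumes that distance is measured between the starting positions of the two phrases; how a given search engine counts the distance is not specified here, and the function names are hypothetical.

```python
def phrase_positions(phrase, words):
    """Return every index at which the phrase starts in the list of words."""
    target = phrase.lower().split()
    return [i for i in range(len(words) - len(target) + 1)
            if words[i:i + len(target)] == target]

def near(phrase_a, phrase_b, text, window=10):
    """True if the two phrases start within `window` words of each other, in any order."""
    words = text.lower().split()
    return any(abs(a - b) <= window
               for a in phrase_positions(phrase_a, words)
               for b in phrase_positions(phrase_b, words))

close = "a survey of college students found that binge drinking is common"
far = "college students " + "filler " * 15 + "binge drinking"
print(near("college students", "binge drinking", close))
# True
print(near("college students", "binge drinking", far))
# False
```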

Field Searching

  • Some search engines allow you to limit your search to specified fields, such as the title of the document, a word from the URL, the domain name, and the availability of such features as images, sound, and video.
    • Example: title:"affirmative action" searches for the phrase within titles of documents. Limiting a search to the title field can be one of the most effective ways to narrow search results to only the most relevant sites.
    • Example: +domain:gov +title:"health care reform" searches for the phrase within titles of documents produced by a government agency.
    • Example: url:fccj searches for documents with FCCJ as part of the Internet address.
    • Example: link:http://www.fccj.org searches for web sites which have linked to FCCJ.
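Field searching amounts to testing one part of a document record instead of its full text. The Python sketch below illustrates the title, url, and domain fields (the link field is left out for brevity). The three document records, their addresses, and the function names are invented for this example.

```python
from urllib.parse import urlparse

# Hypothetical document records; the addresses are made up.
DOCS = [
    {"url": "http://www.agency.gov/reform", "title": "Health Care Reform: An Overview"},
    {"url": "http://www.college.edu/news", "title": "Students Debate Health Care Reform"},
    {"url": "http://www.agency.gov/parks", "title": "National Parks Guide"},
]

def field_match(doc, field, value):
    """Test a single field of a document against a value."""
    value = value.lower()
    if field == "title":
        return value in doc["title"].lower()
    if field == "url":
        return value in doc["url"].lower()
    if field == "domain":
        host = urlparse(doc["url"]).hostname or ""
        return host.endswith("." + value)
    raise ValueError(f"unsupported field: {field}")

def search(docs, **fields):
    """Return the URLs of documents matching every field restriction (an AND search)."""
    return [d["url"] for d in docs
            if all(field_match(d, f, v) for f, v in fields.items())]

print(search(DOCS, domain="gov", title="health care reform"))
# ['http://www.agency.gov/reform']
```

Without the domain restriction the same title search would also return the .edu page, which shows how combining fields narrows a result list.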

Truncation

  • Some search engines automatically look for singular and plural forms of terms as well as "-ing" or "-ed" endings. Others use the asterisk (*) to specify that all endings of the root term be searched. This is called "truncation."
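As a small illustration, truncation with an asterisk can be modeled as a "starts with" test on each word. The function name truncation_match is hypothetical, and the sketch handles only an asterisk at the end of the term.

```python
def truncation_match(pattern, text):
    """Return the words in text that match pattern; a trailing * matches any ending."""
    words = text.lower().split()
    pattern = pattern.lower()
    if pattern.endswith("*"):
        root = pattern[:-1]
        return [w for w in words if w.startswith(root)]
    return [w for w in words if w == pattern]

print(truncation_match("drink*", "He drinks while drinking drunk"))
# ['drinks', 'drinking']
```

Notice that "drunk" is not retrieved, because truncation only varies the ending of the root term.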

Case Sensitivity

  • Some search engines are case sensitive, requiring that proper names and place names be capitalized.
  • In general, when a search statement is entered in all lower case, both lower case and upper case will be retrieved. The reverse is not true. When upper case is used the search engine will only retrieve the exact match. For example, "AIDS" will not retrieve the common word "aids."
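The case rule described above can be expressed in a few lines of Python. This is a sketch of the general rule only; the function name case_match is made up, and each search engine's HELP file should be checked for its actual behavior.

```python
def case_match(term, text):
    """Lower-case terms match any capitalization; other terms must match exactly."""
    words = text.split()
    if term.islower():
        return [w for w in words if w.lower() == term]
    return [w for w in words if w == term]

doc = "AIDS research aids patients"
print(case_match("aids", doc))
# ['AIDS', 'aids']
print(case_match("AIDS", doc))
# ['AIDS']
```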

Keyword vs. Concept Searching

  • Most search engines use keyword searching. They look for documents containing the exact words entered. This necessitates a careful selection of keywords to describe a topic. For example, a search for the word cancer would not retrieve documents containing the word neoplasm or carcinoma unless the word "cancer" was also present in the document, although all three words express the same concept.
  • A search engine which utilizes concept searching looks for documents related to the idea of the search as well as those documents containing the exact word(s) of the search. Concept searching takes into account that a topic can be described in a wide variety of ways with different words and expressions (for example, cancer, neoplasms, carcinoma). Excite is one search engine which utilizes concept searching.
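One simple way to picture the difference is to expand the search term with a list of related words before matching. The Python sketch below does this with a tiny hand-made thesaurus. It is only a stand-in for illustration: the CONCEPTS table and function names are invented, and this lesson does not describe how Excite actually identifies related concepts.

```python
# Hypothetical thesaurus: each concept maps to words that express it.
CONCEPTS = {"cancer": {"cancer", "neoplasm", "neoplasms", "carcinoma"}}

DOCS = [
    "new cancer treatment trial",
    "carcinoma staging guidelines",
    "gardening tips for spring",
]

def keyword_search(term, docs):
    """Retrieve only documents containing the exact word."""
    return [d for d in docs if term in d.lower().split()]

def concept_search(term, docs):
    """Retrieve documents containing the word or any related word."""
    related = CONCEPTS.get(term, {term})
    return [d for d in docs if related & set(d.lower().split())]

print(keyword_search("cancer", DOCS))
# ['new cancer treatment trial']
print(concept_search("cancer", DOCS))
# ['new cancer treatment trial', 'carcinoma staging guidelines']
```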

Related Sites

  • Some search engines (AltaVista, Go, Google and Raging Search) provide links to related or similar sites along with the sites retrieved. In this way, if you like the content of a particular site you may be able to find similar or comparable sites which were not retrieved in your initial search.

Miscellaneous Hints

  • Searching can be confusing! Remember, each search engine works a little differently. To make it easier, be sure to read the HELP files for each search engine on a regular basis!
  • Make sure you try your search in several search engines. Each search engine's database includes unique documents that will not be included in other databases.
  • For the latest developments in search engines bookmark the following sites.

    • Search Engine Showdown

 


 

Links to Major Search Engines

Below are links to some of the largest and most popular search engines along with links to their basic help files.


The following comparison charts provide quick reference guides to most of the major search engines:

  • Comparison of Search Engine User Interface Capabilities


 

Meta-Search Engines

A special kind of search engine called a meta-search engine (or parallel search engine) allows you to query several search engines at once. Instead of doing a search itself, a meta-search engine sends your request to other search engines, compiles the results, and displays them for you. This process is much faster than querying several search engines separately.

Meta-search engines do not own any database of web pages--they use and deliver results from the databases and search programs of each of the individual search engines they query. Meta-search engines act as an intelligent middle-man to pass your search through, gather the responses and then give you a report from several engines at once. As well as saving time, this kind of search engine can give you an overview of the kind of document you may find using your search terms, and may even result in giving you exactly what you need if you are searching for a unique term or phrase.

There are some disadvantages in relying exclusively on meta-search engines. None of the meta-search engines query all of the largest search engines. At this writing, none queries Northern Light; several do not query HotBot. If a connection or search takes too long, one or more of the search engines may time out and produce no results. If you submit a complicated search to a meta-search engine that one of the queried tools does not "understand" you may get no hits at all from that engine. However, you will usually get results from another tool that supports your search strategy.

Meta-search engines retrieve only the first 10-50 hits from each search engine; the total number of hits may be less than you would retrieve with a direct search on a single search engine. Thus, meta-search engines do not eliminate the need to learn how to search at least one general web search engine intelligently (such as AltaVista, FAST, Google, HotBot, or Northern Light).
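The behavior described in this section (passing a query along, keeping only the first few hits from each engine, and coping with an engine that times out) can be sketched in Python. The three "engines" here are stand-in functions returning made-up addresses; a real meta-search engine would send the query over the network to each search engine.

```python
# Stand-in search engines for illustration; the result URLs are invented.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]

def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]

def engine_slow(query):
    raise TimeoutError("no response")

def meta_search(query, engines, per_engine=10):
    """Query each engine, keep its first few hits, and merge without duplicates."""
    merged, seen = [], set()
    for name, engine in engines.items():
        try:
            hits = engine(query)[:per_engine]
        except TimeoutError:
            continue  # an engine that times out simply contributes no results
        for url in hits:
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged

ENGINES = {"A": engine_a, "B": engine_b, "Slow": engine_slow}
results = meta_search("binge drinking", ENGINES, per_engine=2)
for url, source in results:
    print(source, url)
```

With per_engine set to 2, the third hit from engine A is never seen, which mirrors the limitation noted above: a direct search on a single engine may return hits that the meta-search engine leaves out.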

Each meta-search engine has its own interface and method for letting you choose engines to search. Below are links to four popular meta-search engines with links to their basic help screens.


 

Sample Searches

Both AltaVista and Google were used to search for information on the following topic, first introduced in Lesson 3: Does binge drinking by college students lead to risk-taking behavior?

AltaVista Search (July 2000)

Search #1: college students binge drinking risk taking behavior
Results: 1,432,278 Web Pages Found
Explanation: In AltaVista, the OR Boolean Operator is the default operator. All web sites containing any of the seven terms were retrieved, causing such a large number of returns. Because AltaVista considers the location of the terms, and the frequency of the terms, some of the first sites retrieved were very relevant.

Search #2: +"college students" +"binge drinking" +"risk taking behavior"
Results: 15 Web Pages Found
Explanation: This search used the Boolean AND Operator as well as Phrase Searching to limit the results. All three phrases had to be mentioned in a web document to be retrieved. Some of the best web sites from Search #1 were not retrieved in this second, more precise search.

Search #3: +"college students" +"binge drinking"
Results: 3,249 Web Pages Found
Explanation: This search eliminated the phrase "risk taking behavior" from the search. Some of the better sites retrieved in search #1 were also retrieved in this search.

Search #4: +title:"binge drinking" +domain:edu
Results: 79 Web Pages Found
Explanation: This search asked for web sites produced by educational institutions which contained the phrase "binge drinking" in the title of the site. Many of the sites retrieved from this search were unique and very relevant.

Search #5: +"binge drinking" +domain:gov
Results: 78 Web Pages Found
Explanation: This search asked for web sites produced by government agencies which contained the phrase "binge drinking" in the text of the sites.

Search #6: +"binge drinking" +"college students" +domain:org
Results: 815 Web Pages Found
Search #7: +"binge drinking" +"college students" +domain:org +(risk risky)
Results: 155 Web Pages Found
Explanation: Initially, this search limited the results to web sites produced by organizations. The secondary search further limited the sites to those containing either the word risk or risky in the text.

Search #8: +"binge drinking" +(rape risk sex crime)
Results: 1,343 Web Pages Found
Search #9: +"binge drinking" +(rape risk sex crime) +domain:edu
Results: 439 Web Pages Found
Explanation: Initially, this search asked for the phrase "binge drinking" in the text of web pages and any one of the words within the parentheses in the text of the site (the Boolean OR Operator). Rather than using the phrase "risk taking behavior" the search specified words which implied risk taking behavior. This search retrieved sites which were unique and which linked binge drinking to criminal activity. By adding the domain:edu to the secondary search, the search was limited to educational web sites. This could help to eliminate any X-rated sites which might be retrieved because of the word "sex".

Google Search (July 2000)

Search #1: college students binge drinking risk taking behavior
Results: 1,060 Web Pages Found
Explanation: Google automatically combines all terms entered with the AND Boolean Operator. Google uses link analysis to rank results; therefore the web sites listed first on the list will be those which other web sites have linked to. The first site listed is one which is unique to Google.

Search #2: "college students" "binge drinking" "risk taking behavior"
Results: 15 Web Pages Found
Explanation: This search looked for three phrases and automatically combined them with the Boolean AND Operator.

The following can be concluded from these searches in AltaVista and Google:

  • Search for a topic in more than one search engine. The results from AltaVista and Google were different.
  • Do more than one or even two searches on a topic within a search engine. Each of the searches retrieved unique web sites within the first 20 or so web sites retrieved.
  • Try combining search terms in different ways. Leave out one concept to try and enlarge search results.
  • If available as a search option, use field searching to limit search results to web sites with important terms in the title and to limit to educational, governmental and organizational sites.
  • Google does not have as many search options as AltaVista. It does not support the Boolean OR Operator, field searching or domain searching. What sets it apart, though, is its ranking by link analysis.

Complete Exercise Four after reading this lesson. The exercise is worth a total of 17 points.

Copyright © 1997-1999 Florida Community College
Learning Resources Standing Committee
Internet Course Task Force