Lesson Four: World Wide Web Search Engines
Introduction
The ever-changing nature
of the Web provides access to vast numbers of information resources. Web
sites and documents appear, are deleted, or are moved to a different location
each day. In this dynamic environment, search engines can be the most
efficient way of locating information on a specific topic, since they provide
access to immense, continuously updated databases of Internet resources.
There are hundreds of search engines designed to help you find information,
whether you are looking for a topic of personal interest or material for a
scholarly research project.
Using search engines
effectively may seem intimidating since new search engines appear frequently
and existing engines constantly change their search interfaces and formats. Though
there is at present no consistent standard which governs search engines, they
do share many basic features which allow the searcher to retrieve relevant
information. This lesson will help you to effectively use search engines by:
- Understanding how search engines work,
including the role of META tags in search retrieval.
- Understanding the basic search features
available with most search engines.
- Realizing that the HELP screens available for
each search engine must be reviewed on a continuous basis.
- Knowing what you are looking for prior to
conducting a search. Refer back to Lesson 3 for help in analyzing your
topic so that you can take advantage of the most advanced search engine
features.
How Do Search Engines Work?
Most search engines use a
computer program called a "spider" to collect information and index
web resources. Sometimes called "webcrawlers" or
"robots", these computer programs crawl through web sites on the
Internet, gathering information from all the pages of a web site. The spider
returns the information to a central database and then indexes the
information it has gathered. When you perform a search in a search engine,
you are searching the database compiled and indexed by the spider.
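The collect, index, and search steps described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the way any particular search engine is built: the page addresses and text are made up, and `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical pages a spider might have collected (URL -> page text).
crawled_pages = {
    "http://example.org/a": "binge drinking among college students",
    "http://example.org/b": "college football scores",
    "http://example.org/c": "drinking water safety",
}

def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, word):
    """Look the word up in the prebuilt index; the pages are not read again."""
    return sorted(index.get(word.lower(), set()))

index = build_index(crawled_pages)
print(search(index, "drinking"))
print(search(index, "college"))
```

Note that `search` never touches the pages themselves. It only consults the index, which is why a search engine can report a page that has since moved or been deleted.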
Spiders are part of a
larger category of Internet computer programs called "agents."
Agents are computer programs that perform specific functions for their users,
usually gathering, comparing and organizing information. There are agents
that will gather stock prices for you and shopping agents that will locate
merchandise on the Internet. As agents become more sophisticated, they will
be able to provide up-to-the-minute research results, gathering the latest
information as soon as it is posted on the Internet.
While all search engines
rely on spiders to collect and index information, each performs its tasks in
a slightly different way. Each search engine has its own search interface and
uses different criteria for matching searches with documents. Each may also
differ in terms of search speed and how it ranks results in order of
relevance.
Searching would be easier
if the search engines used a common standard. However, since each search
engine operates a little differently, it is a good idea to search more than
one to be sure you have retrieved most of the relevant information available
on your topic.
Meta Tags
An additional element
which affects the way search results are retrieved is the META tag. A META tag
is a hidden HTML tag used by web developers to help specify how their web
pages are indexed by search engine "spiders" and
"robots." Keywords describing the content of a web site are added
to the site within a META HTML tag. This is done by the web page author or
developer. Many search engines, including AltaVista, Go, and HotBot index all
the keywords in the META tag in addition to the words in the text of the
document. (Excite, FAST, Google and Northern Light index page text but not
META keywords.) In many cases, the keywords displayed in a META tag will
accurately describe the web document and search retrieval will be enhanced.
In cases where META tags are improperly formatted, poorly composed, or
purposefully misleading, irrelevant information may be retrieved.
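To make the idea of a keywords META tag concrete, the sketch below reads the keywords out of a small sample page using Python's standard `html.parser` module. The sample page and the `MetaKeywordParser` class name are invented for illustration; real spiders handle many more cases.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )

# A made-up page whose author has supplied META keywords.
sample_page = """
<html><head>
<title>Alcohol Awareness</title>
<meta name="keywords" content="binge drinking, college students, alcohol">
</head><body>Campus health resources.</body></html>
"""

parser = MetaKeywordParser()
parser.feed(sample_page)
print(parser.keywords)
```

None of the three keywords appears in the visible body text of the sample page, so only an engine that indexes META keywords could retrieve it for a search on "binge drinking".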
Location and Frequency
Search engines look at
both the location of search terms and the frequency of occurrence of
search terms to help determine relevancy. The higher up on a web site that a
search term appears, the higher the ranking of that web site. A web site
which contains a search term in the title or in the first few paragraphs of
text will be determined to be more relevant than one in which the search term
appears toward the end of the document.
Search engines also look
at the number of times search terms appear in the text of the web site. Sites
with a higher frequency of a search term are determined to be more relevant.
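A rough sketch of how location and frequency might be combined into one relevance score follows. The weights (5 for a title match, up to 3 for an early first occurrence, 1 per occurrence) are arbitrary assumptions chosen for the example, and `relevance_score` is a hypothetical function, not the formula of any actual search engine.

```python
def relevance_score(term, title, body):
    """Score a page for one search term using location and frequency."""
    term = term.lower()
    words = body.lower().split()
    score = 0.0
    # Location: a term in the title counts heavily.
    if term in title.lower().split():
        score += 5.0
    # Location: the earlier the first occurrence in the body, the larger the bonus.
    if term in words:
        first = words.index(term)
        score += 3.0 * (1 - first / len(words))
    # Frequency: every occurrence in the body adds one point.
    score += words.count(term)
    return score

early = relevance_score("ozone", "Weather", "ozone levels are rising in the upper atmosphere")
late = relevance_score("ozone", "Weather", "levels are rising in the upper atmosphere ozone")
print(early > late)
```

With these assumed weights, a page that mentions the term at the start of its text outranks one that mentions it only at the end, and repeating the term raises the score further.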
Ranking and Popularity
In addition to
text-matching techniques, an increasing number of search engines are using
popularity and link analysis as a means of ranking search results.
Direct Hit uses technology that measures what
people are selecting from the results of a number of search engines. In
addition to measuring the number of hits a site receives, Direct Hit also
measures how much time people spend at a site. The longer a user stays at a
site the higher that site is ranked. Through a combination of these two
measurements Direct Hit can show what it considers to be the most relevant
sites for a search topic. Direct Hit stands alone as a search engine, but it
is also integrated into the search results of other search engines such as
HotBot and Lycos.
Google uses link analysis to rank the usefulness
of a web site. Google interprets a link from web site A to web site B
as a "vote" by site A for site B. The more votes or
links a site receives the more relevant that site is. In addition to looking
at the number of links a site receives, Google also analyzes the sites
casting the votes. Votes cast by sites which are themselves major sites (i.e.,
receiving many votes themselves) are weighted more heavily than votes from
other less popular sites.
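The idea that a vote from a major site counts for more can be illustrated with a simplified, PageRank-style calculation. This is a sketch of the general technique under stated assumptions, not a description of Google's actual system: the tiny web of pages A through G is invented, `link_scores` is a hypothetical name, and the damping value of 0.85 is simply a commonly quoted choice.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its vote equally among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores

# B receives three votes. E and F receive one vote each,
# but E's vote comes from the popular page B.
web = {
    "A": ["B"], "C": ["B"], "D": ["B"],
    "B": ["E"],
    "G": ["F"],
    "E": [], "F": [],
}
scores = link_scores(web)
print(scores["E"] > scores["F"])
```

E and F each have exactly one incoming link, yet E ends up with the higher score because its single vote is cast by a page that is itself heavily linked.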
General Search Features
Most of the major search
engines support the following search techniques, although each search engine
operates a little differently. Be sure to read the specific instructions
provided for each search engine in the HELP files. There is usually a link to
a HELP screen near the search box or near the top of the search engine's home
page. These HELP screens should be consulted on a regular basis as the
searching features of the search engine may change.
Size and Coverage
- When search engine producers refer to their
size, they are usually counting unique URLs as opposed to unique sites,
which may contain a number of URLs. There are 10 to 20 search engines
which can be considered "large" with Google, FAST, AltaVista and Northern Light at the top of the
scale. The search engine with the largest collection of sites is not
necessarily the best search engine. However, the larger the search
engine, the greater the chance that you will find something,
especially if what you are looking for is obscure or unusual.
- Most search engines search the entire text of
web pages.
- Most search engines allow Usenet searches.
- Many search engines will also search for
images, audio files, and video files. (Lesson 6 discusses "format
searching".)
- Some of the search engines provide both simple
and advanced or custom search modes. Search techniques may vary between
modes.
Boolean Searching
- Most search engines support Boolean searching,
allowing AND, OR, and NOT searches. Some engines
only allow AND. In some search engines, the exclusionary NOT
operator is expressed as AND NOT.
- If a list of terms is entered and no Boolean
operator is specified, many search engines use the OR operator as
the default, while others use the AND operator.
- Some search engines require that the Boolean
operator be capitalized; others do not, though those not requiring
capitalization accept it. Therefore, it is a good idea to capitalize any
Boolean operator.
- Many search engines use a simplified form of
Boolean operators, replacing the operator with a symbol:
- the + sign for an AND search
Example:
+drinking +driving searches for the words drinking AND driving,
in no specific order in the text of the web page.
- the - sign for a NOT search
Example:
+dolphins -football will search for documents which contain the word
dolphins but NOT the word football.
- Search statements combining more than one type
of Boolean operator must also use nesting or parentheses around synonymous
terms. The parentheses tell the search engine to perform that search
first.
Example:
+suicide +(teen youth adolescent) will search for documents containing
any or all of the terms within the parentheses before combining that result
with the word suicide. This assumes that the default operator for the search
engine is OR.
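The + and - notation and the use of parentheses can be sketched as a small matching function. This is a simplified illustration only: `parse_query` and `matches` are hypothetical names, words inside parentheses are treated as OR alternatives as in the example above, and quoted phrases are not handled.

```python
import re

def parse_query(query):
    """Split a +/- style query into required groups and excluded terms.

    Each required group is a set of alternatives (OR within the group);
    every group must be satisfied (AND between groups).
    """
    required, excluded = [], set()
    # A token is a sign followed by either (several words) or a single word.
    pattern = r"(?<!\S)([+-])(?:\(([^)]*)\)|(\S+))"
    for sign, group, word in re.findall(pattern, query):
        terms = set((group or word).lower().split())
        if sign == "+":
            required.append(terms)
        else:
            excluded |= terms
    return required, excluded

def matches(query, text):
    """True if the text satisfies every + group and contains no - term."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    return all(group & words for group in required) and not (excluded & words)

print(matches("+dolphins -football", "dolphins are marine mammals"))
print(matches("+suicide +(teen youth adolescent)", "youth suicide prevention"))
```

Because the parenthesized group is resolved first, a page needs only one of teen, youth, or adolescent, but it must always contain suicide.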
Phrase Searching
- Most search engines support the use of
quotation marks around words, terms or names you want searched as a
phrase, i.e., appearing in exactly the order you enter them:
- Example: "ozone layer depletion"
searches for the phrase, with the words in the order given.
- Example: "Martin Luther King"
searches for the name as a phrase.
- Example: "Society for Creative
Anachronism" searches for the organization.
- In some search engines, if a phrase is not
specified in the search statement, the default search is an OR Boolean
search in which just one of the terms in the search need be present to
retrieve a document. This can lead to thousands of irrelevant hits.
- Some search engines use pull-down menus to
allow the searcher to select "exact phrase" as the search
option.
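The difference between a phrase search and a default OR search can be shown with two small functions. Both are hypothetical helpers written for this illustration (`contains_phrase` and `contains_any`), and the sample sentences are made up.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(phrase, text):
    """True if the words of the phrase appear consecutively and in order."""
    wanted = tokenize(phrase)
    words = tokenize(text)
    if not wanted:
        return False
    last_start = len(words) - len(wanted)
    return any(words[i:i + len(wanted)] == wanted for i in range(last_start + 1))

def contains_any(terms, text):
    """Default OR search: a single matching word is enough."""
    words = set(tokenize(text))
    return any(word in words for word in tokenize(terms))

print(contains_phrase('"ozone layer depletion"', "Causes of ozone layer depletion."))
print(contains_phrase('"ozone layer depletion"', "Depletion of the ozone layer"))
print(contains_any("ozone layer depletion", "A layer cake recipe"))
```

The last call shows why an OR default produces so many irrelevant hits: a cake recipe qualifies simply because it contains the word layer.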
Proximity Searching
- Some search engines, most notably AltaVista,
support proximity searching. The NEAR operator will allow you to
look for words within 10 words of each other.
- Example: "college students" NEAR
"binge drinking" would look for those two phrases within 10
words of each other in any order.
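A NEAR search can be sketched by recording where each phrase starts and comparing the positions. The function names are hypothetical, and measuring the distance between the starting words of the two phrases is an assumption made for this example; an actual search engine may count differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_positions(phrase, words):
    """Return every word position at which the phrase begins."""
    wanted = tokenize(phrase)
    if not wanted:
        return []
    return [
        i for i in range(len(words) - len(wanted) + 1)
        if words[i:i + len(wanted)] == wanted
    ]

def near(phrase_a, phrase_b, text, max_distance=10):
    """True if the two phrases start within max_distance words, in any order."""
    words = tokenize(text)
    starts_a = phrase_positions(phrase_a, words)
    starts_b = phrase_positions(phrase_b, words)
    return any(abs(a - b) <= max_distance for a in starts_a for b in starts_b)

close = "A survey of college students found that binge drinking is common."
far = "college students " + " ".join(["filler"] * 20) + " binge drinking"
print(near("college students", "binge drinking", close))
print(near("college students", "binge drinking", far))
```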
Field Searching
- Some search engines allow you to limit your
search to specified fields, such as the title of the document, a word
from the URL, the domain name, and the availability of such features as
images, sound, and video.
- Example: title:"affirmative
action" searches for the phrase within titles of documents.
Limiting a search to the title field can be one of the most effective
ways to narrow search results to only the most relevant sites.
- Example: +domain:gov +title:"health
care reform" searches for the phrase within titles of
documents produced by a government agency.
- Example: url:fccj searches for
documents with FCCJ as part of the Internet address.
- Example: link:http://www.fccj.org
searches for web sites which have linked to FCCJ.
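Field searching amounts to testing one part of each indexed document rather than its whole text. The sketch below assumes each document is stored with separate `url` and `title` fields; the three sample records and the `field_search` function are invented for illustration, and the title test is a simple substring check.

```python
from urllib.parse import urlparse

# Made-up index records with separate fields.
documents = [
    {"url": "http://www.example.gov/health", "title": "Health Care Reform Overview"},
    {"url": "http://www.example.com/news", "title": "Health Care Reform Debate"},
    {"url": "http://www.example.gov/parks", "title": "National Parks Guide"},
]

def field_search(docs, title_phrase=None, domain=None, url_word=None):
    """Return URLs of documents that pass every field limit supplied."""
    results = []
    for doc in docs:
        host = urlparse(doc["url"]).hostname or ""
        # title: the phrase must occur in the title field.
        if title_phrase and title_phrase.lower() not in doc["title"].lower():
            continue
        # domain: the host name must end with the given top-level domain.
        if domain and not host.endswith("." + domain.lower()):
            continue
        # url: the word must occur somewhere in the address.
        if url_word and url_word.lower() not in doc["url"].lower():
            continue
        results.append(doc["url"])
    return results

print(field_search(documents, title_phrase="health care reform", domain="gov"))
```

Combining the title and domain limits leaves a single document, which mirrors how the +domain:gov +title:"health care reform" example narrows a search.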
Truncation
- Some search engines automatically look for
singular and plural forms of terms as well as "-ing" or
"-ed" endings. Others use the asterisk (*) to specify that all
endings of the root term be searched. This is called
"truncation."
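Truncation with an asterisk behaves much like a simple pattern match on the root of a word. The following sketch uses Python's `re` module to show the effect; `truncation_pattern` and `find_matches` are hypothetical names, and the sample sentence is made up.

```python
import re

def truncation_pattern(term):
    """Build a pattern for a term; a trailing * allows any ending after the root."""
    root = term.rstrip("*")
    ending = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + re.escape(root) + ending + r"\b", re.IGNORECASE)

def find_matches(term, text):
    """Return every word in the text that the term retrieves."""
    return truncation_pattern(term).findall(text)

sentence = "Drinking and drinks; a drinker drank"
print(find_matches("drink*", sentence))
print(find_matches("drink", sentence))
```

The truncated term retrieves Drinking, drinks, and drinker, while the untruncated term retrieves nothing from this sentence because the exact word drink never appears. Neither form finds drank, since it does not share the root.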
Case Sensitivity
- Some search engines are case sensitive,
requiring that proper names and place names be capitalized.
- In general, when a search statement is entered
in all lower case, both lower case and upper case will be retrieved. The
reverse is not true. When upper case is used the search engine will only
retrieve the exact match. For example, "AIDS" will not
retrieve the common word "aids."
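The case rule described above can be expressed as a short comparison. This is a sketch of the general rule only; `case_match` is a hypothetical function, and individual search engines may behave differently.

```python
def case_match(query_word, document_word):
    """Lowercase queries match any capitalization; other queries match exactly."""
    if query_word.islower():
        return query_word == document_word.lower()
    return query_word == document_word

print(case_match("aids", "AIDS"))   # lowercase query finds the upper case form
print(case_match("AIDS", "aids"))   # upper case query demands an exact match
```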
Keyword vs. Concept Searching
- Most search engines use keyword searching.
They look for documents containing the exact words entered. This
necessitates a careful selection of keywords to describe a topic. For
example, a search for the word cancer would not retrieve
documents containing the word neoplasm or carcinoma unless
the word "cancer" was also present in the document, although
all three words express the same concept.
- A search engine which utilizes concept
searching looks for documents related to the idea of the search as
well as those documents containing the exact word(s) of the search.
Concept searching takes into account that a topic can be described in a
wide variety of ways with different words and expressions (for example,
cancer, neoplasms, carcinoma). Excite
is one search engine which utilizes concept searching.
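One simple way to picture concept searching is as a keyword search that first expands the search term with related words. The sketch below takes that approach with a hand-made list of related terms. It is an assumption made for illustration and is not a description of how Excite or any other engine implements concept searching; `CONCEPT_GROUPS`, `keyword_search`, and `concept_search` are hypothetical names.

```python
# A hand-made table of words assumed to express the same concept.
CONCEPT_GROUPS = [
    {"cancer", "neoplasm", "neoplasms", "carcinoma"},
]

documents = [
    "new treatments for cancer patients",
    "carcinoma detected in early screening",
    "benign neoplasm case study",
    "gardening tips for spring",
]

def keyword_search(term, docs):
    """Retrieve only documents containing the exact word."""
    return [d for d in docs if term.lower() in d.lower().split()]

def concept_search(term, docs):
    """Retrieve documents containing the word or any related word."""
    term = term.lower()
    related = {term}
    for group in CONCEPT_GROUPS:
        if term in group:
            related |= group
    return [d for d in docs if related & set(d.lower().split())]

print(len(keyword_search("cancer", documents)))
print(len(concept_search("cancer", documents)))
```

The keyword search finds one document, while the concept search also retrieves the documents that mention carcinoma and neoplasm.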
Related Sites
- Some search engines (AltaVista, Go, Google
and Raging Search)
provide links to related or similar sites along with the sites
retrieved. In this way, if you like the content of a particular site you
may be able to find similar or comparable sites which were not retrieved
in your initial search.
Miscellaneous Hints
- Searching can be confusing! Remember, each
search engine works a little differently. To make it easier, be sure to
read the HELP files for each search engine on a regular basis!
- Make sure you try your search in several
search engines. Each search engine's database includes unique documents
that will not be included in other databases.
- For the latest developments in search engines
bookmark the following sites.
  - Search Engine Showdown
Links to Major Search Engines
Below are links to some of
the largest and most popular search engines along with links to their basic
help files.
The following comparison charts provide quick reference guides to most of the
major search engines:
- Comparison of Search Engine User Interface Capabilities
Meta-Search Engines
A special kind of search
engine called a meta-search engine (or parallel search
engine) allows you to query several search engines at once. Instead of doing
a search itself, a meta-search engine sends your request to other search
engines, compiles the results, and displays them for you. This process is
much faster than querying several search engines separately.
Meta-search engines do
not own any database of web pages--they use and deliver results from the
databases and search programs of each of the individual search engines they
query. Meta-search engines act as an intelligent middle-man to pass your
search through, gather the responses and then give you a report from several
engines at once. As well as saving time, this kind of search engine can give
you an overview of the kind of document you may find using your search terms,
and may even result in giving you exactly what you need if you are searching
for a unique term or phrase.
There are some disadvantages
in relying exclusively on meta-search engines. None of the meta-search
engines query all of the largest search engines. At this writing, none
queries Northern Light; several do not query HotBot. If a connection or
search takes too long, one or more of the search engines may time out and
produce no results. If you submit a complicated search to a meta-search
engine that one of the queried tools does not "understand" you may
get no hits at all from that engine. However, you will usually get results
from another tool that supports your search strategy.
Meta-search engines
retrieve only the first 10-50 hits from each search engine; the total number
of hits may be less than you would retrieve with a direct search on a single
search engine. Thus, meta-search engines do not eliminate the need to learn
how to search intelligently in at least one general web search engine
(such as AltaVista, Fast, Google, HotBot, or Northern Light).
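The behavior described in this section (passing one query to several engines, keeping only the first few hits from each, and coping with an engine that times out) can be sketched as follows. The three "engines" are made-up stand-in functions that return fixed lists, `meta_search` is a hypothetical name, and the engines are queried one after another for simplicity, whereas a real meta-search engine queries them in parallel.

```python
def meta_search(query, engines, hits_per_engine=10):
    """Send one query to several engines and merge the top hits from each."""
    merged = []
    seen = set()
    for name, engine in engines.items():
        try:
            hits = engine(query)[:hits_per_engine]
        except TimeoutError:
            # An engine that times out contributes no results.
            continue
        for url in hits:
            # Report each address once, credited to the first engine that found it.
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged

# Hypothetical stand-ins for real search engines.
def engine_one(query):
    return ["http://example.org/1", "http://example.org/2", "http://example.org/3"]

def engine_two(query):
    return ["http://example.org/2", "http://example.org/4"]

def slow_engine(query):
    raise TimeoutError("engine did not respond in time")

engines = {"one": engine_one, "two": engine_two, "slow": slow_engine}
results = meta_search("binge drinking", engines, hits_per_engine=2)
print(results)
```

With a limit of two hits per engine, the third result from the first engine is never seen, which illustrates why a meta-search may return fewer hits than a direct search on a single engine.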
Each meta-search engine
has its own interface and method for letting you choose engines to search.
Below are links to four popular meta-search engines with links to their basic
help screens.
Sample Searches
Both AltaVista and Google
were used to search for information on the following topic first introduced
in Lesson 3: Does binge drinking by college students lead to risk-taking
behavior?
AltaVista Search (July 2000)

| Search | Results | Explanation |
|---|---|---|
| Search #1: college students binge drinking risk taking behavior | 1,432,278 Web Pages Found | In AltaVista, the OR Boolean Operator is the default operator. All web sites containing any of the seven terms were retrieved, causing such a large number of returns. Because AltaVista considers the location of the terms and the frequency of the terms, some of the first sites retrieved were very relevant. |
| Search #2: +"college students" +"binge drinking" +"risk taking behavior" | 15 Web Pages Found | This search used the Boolean AND Operator as well as Phrase Searching to limit the results. All three phrases had to be mentioned in a web document for it to be retrieved. Some of the best web sites from Search #1 were not retrieved in this second, more precise search. |
| Search #3: +"college students" +"binge drinking" | 3,249 Web Pages Found | This search eliminated the phrase "risk taking behavior" from the search. Some of the better sites retrieved in Search #1 were also retrieved in this search. |
| Search #4: +title:"binge drinking" +domain:edu | 79 Web Pages Found | This search asked for web sites produced by educational institutions which contained the phrase "binge drinking" in the title of the site. Many of the sites retrieved from this search were unique and very relevant. |
| Search #5: +"binge drinking" +domain:gov | 78 Web Pages Found | This search asked for web sites produced by government agencies which contained the phrase "binge drinking" in the text of the sites. |
| Search #6: +"binge drinking" +"college students" +domain:org | 815 Web Pages Found | This search limited the results to web sites produced by organizations. |
| Search #7: +"binge drinking" +"college students" +domain:org +(risk risky) | 155 Web Pages Found | This secondary search further limited the sites from Search #6 to those containing either the word risk or risky in the text. |
| Search #8: +"binge drinking" +(rape risk sex crime) | 1,343 Web Pages Found | This search asked for the phrase "binge drinking" in the text of web pages and any one of the words within the parentheses in the text of the site (the Boolean OR Operator). Rather than using the phrase "risk taking behavior", the search specified words which implied risk taking behavior. This search retrieved sites which were unique and which linked binge drinking to criminal activity. |
| Search #9: +"binge drinking" +(rape risk sex crime) +domain:edu | 439 Web Pages Found | Adding domain:edu to Search #8 limited the search to educational web sites. This could help to eliminate any X-rated sites which might be retrieved because of the word "sex". |
Google Search (July 2000)

| Search | Results | Explanation |
|---|---|---|
| Search #1: college students binge drinking risk taking behavior | 1,060 Web Pages Found | Google automatically combines all terms entered with the AND Boolean Operator. Google uses link analysis to rank results; therefore the web sites listed first on the list will be those which other web sites have linked to. The first site listed is one which is unique to Google. |
| Search #2: "college students" "binge drinking" "risk taking behavior" | 15 Web Pages Found | This search looked for three phrases and automatically combined them with the Boolean AND Operator. |
The following can be
concluded from these searches in AltaVista and Google:
- Search for a topic in more than one search engine.
The results from AltaVista and Google were different.
- Do more than one or even two searches on a
topic within a search engine. Each of the searches retrieved unique web
sites within the first 20 or so web sites retrieved.
- Try combining search terms in different ways.
Leave out one concept to try to enlarge search results.
- If available as a search option, use field
searching to limit search results to web sites with important terms in
the title and to limit to educational, governmental and organizational
sites.
- Google does not have
as many search options as AltaVista. It does not support the Boolean OR
Operator, field searching or domain searching. What does set it apart,
though, is ranking by link analysis.
Complete Exercise Four after reading this lesson. The exercise is worth a
total of 17 points.
Copyright © 1997-1999
Florida Community College
Learning Resources Standing Committee
Internet Course Task Force