Lesson 5: Specialized Search Engines and Subject Directories


Introduction

For many research needs, the general purpose search engines discussed in Lesson 4 may be a good place to start gathering information. However, many of the major search engines index hundreds of millions of web pages, and the most carefully phrased search statement may sometimes produce an unmanageable number of results, or a list of results that are of poor quality. Another problem is that there are vast numbers of web resources, including the contents of searchable databases, that are completely "invisible" to the general purpose search engines. 

Actually, there are two distinct types of search tools that provide access to web resources: search engines and subject directories. As Lesson 4 explained, search engine indexes are created with automated programs and allow you to search for Web sites by keyword, using Boolean search parameters. Search engines usually have minimal human oversight and do not apply selection or evaluation criteria to the web pages they index. 
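The keyword matching described above can be pictured with a small sketch. It is only an illustration, not how any particular search engine works: the page addresses, the page text, and the function names `build_index` and `search` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical mini-collection: page address -> page text.
PAGES = {
    "art.example/medieval": "medieval art and architecture resources",
    "law.example/cases": "case law and legal resources",
    "art.example/modern": "modern art galleries and museums",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    results = set().union(*index.values())  # start with every indexed page
    for word in all_of:
        results &= index.get(word, set())
    if any_of:
        results &= set().union(*(index.get(w, set()) for w in any_of))
    for word in none_of:
        results -= index.get(word, set())
    return sorted(results)

INDEX = build_index(PAGES)
# "art NOT modern"
print(search(INDEX, all_of=["art"], none_of=["modern"]))
```

Notice that nothing in this sketch judges the quality of a page. A page is returned simply because it contains the words, which is why an automated index can return many poor results.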

Subject directories (sometimes referred to as subject trees) are created by a person or group of people who usually select sites according to specific selection or evaluation criteria. Subject directories are hierarchically organized indexes of subjects and subheadings that allow the searcher to browse through lists of subjects for relevant information. Directory entries are often annotated with descriptions.

Today, the line between search engines and subject directories has blurred. Search engines are partnering with subject directories, or creating their own directories, and returning results gathered from their spiders, as well as from a variety of other databases, guides and services. Some specialized search engines are actually searchable, hand-picked directories of web resources focusing on a particular subject. Many search engines and directory sites have expanded to become web portals: sites that offer a wide range of services and resources, such as web subject directories, white and yellow pages, online shopping malls, e-mail services, discussion forums, etc. Web portals emulate the services first offered by online service providers such as CompuServe and America Online. 

This lesson will help you to:

  • Understand the variety and scope of resources available on the "invisible web" and how to access those resources.
  • Locate and use specialized search engines and subject directories to supplement or provide a more specific alternative to the major search engines. 
  • Search for specific types of files and multimedia programs.


 

The Invisible Web

The "invisible web" consists of searchable databases, password-protected sites, and documents hidden behind firewalls, all of which are inaccessible to the spiders and webcrawlers that compile indexes for the general purpose search engines. As web technology advances, web developers are creating more dynamic site interfaces and are providing more resources in searchable databases. When a search engine spider encounters a database, it can index only the location of the database, not the resources contained within it. These invisible resources are rapidly increasing. Many of the databases are maintained by educational institutions and government agencies and contain a great deal of scholarly information.
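A toy model may help show why a database stays invisible to a spider. This is a simplified sketch, not a real webcrawler: the page addresses, the catalog records, and the functions `crawl` and `query_catalog` are invented for illustration. The spider follows links between pages, but it never types a query into the search form, so it never sees the records.

```python
# Hypothetical site: each address maps to its static text and outgoing links.
STATIC_PAGES = {
    "/home": {"text": "welcome to the library", "links": ["/about", "/catalog-search"]},
    "/about": {"text": "about the library staff", "links": ["/home"]},
    "/catalog-search": {"text": "enter a query to search the catalog", "links": []},
}

# Records that exist only as answers to a query typed into /catalog-search.
CATALOG_RECORDS = {
    "medieval maps": "record 17: atlas of medieval maps",
    "oral histories": "record 42: recorded oral histories",
}

def query_catalog(term):
    """What a human searcher gets by typing a term into the search form."""
    return CATALOG_RECORDS.get(term, "no match")

def crawl(start):
    """Follow links from start and return {address: text} for every page reached."""
    indexed, to_visit = {}, [start]
    while to_visit:
        url = to_visit.pop()
        if url in indexed or url not in STATIC_PAGES:
            continue
        page = STATIC_PAGES[url]
        indexed[url] = page["text"]
        to_visit.extend(page["links"])
    return indexed

INDEXED = crawl("/home")
print(sorted(INDEXED))                                              # the search page itself is found
print(any("medieval maps" in text for text in INDEXED.values()))   # but no record is indexed
print(query_catalog("medieval maps"))                               # a person using the form finds it
```

The spider records the location of `/catalog-search`, but the two catalog records appear nowhere in its index.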

There are a number of subject directories providing access to the invisible web, including the following examples:

  • The Big Hub maintains an index of subject specific searchable databases in over 300 categories.
  • Direct Search, maintained by a librarian at George Washington University, includes subject categories in the humanities, sciences, business & economics, ready reference, government resources, archives & library catalogs, news sources & serials (periodicals), as well as other subject areas.
  • InfoMine Scholarly Internet Resource Collections, maintained by the University of California libraries, includes "databases, electronic journals, electronic books, bulletin boards, listservs, online library card catalogs, articles and directories of researchers, among many other types of information."
  • The Invisible Web: The Search Engine of Search Engines is a searchable directory of over 10,000 databases, archives, and search engines that contain information that traditional search engines have been unable to access. Major academic subject areas include arts & humanities, business, computers, education, finance, government, health, legal, news, reference and sciences.
  • Lycos' Searchable Databases page provides a subject directory of links to searchable databases in many subject areas. 
  • AllSearchEngines.Com includes a lengthy list of topical search engines in addition to traditional search engines.

Specialized Search Engines

The invisible web directories listed above provide access to many specialized search engines. Specialized search tools may have interfaces that look identical to those of general search engines, but they function very differently, since there is usually a human editor who selects and sometimes annotates (or describes) the resources. Since they usually focus on a specific subject, a geographic region, or a certain type of computer file format, specialized search sites can dramatically reduce irrelevant results and may help you to quickly pinpoint the information you need.

Specialized search engines provide access to invisible web resources, but also index high-quality web sites available via general purpose search engines. These search engines usually index fewer Web pages, but focus only on information relevant to the topic. Another difference between large search engines and specialized search engines is human interaction. Many specialized search engines utilize subject specialists who evaluate and annotate each link, ensuring that only the most relevant and best quality resources are included. 

You can compare specialized search engines with specialized reference books in a library. You wouldn't go to a general encyclopedia to find an address or telephone number for an organization or association, but would need to consult a specialized encyclopedia of associations.

Examples of specialized search engines include:

Arts & Humanities

  • ADAM, Art, Design, Architecture & Media Information Gateway is a collection of carefully selected Internet resources focusing on fine art, design, architecture, applied arts, media, theory, museum studies, and conservation.
  • Internet ArtResources is a comprehensive resource for art-related information. Includes galleries, artists, fairs and expositions and schools. 
  • Labyrinth provides resources in medieval studies. 
  • Voice of the Shuttle is a comprehensive search tool for humanities resources. Includes anthropology, archaeology, art, history, linguistics, literature, music & dance, philosophy, religion and other subject areas. 

Education

  • ERIC provides a huge database of education research and practices. 
  • FinAid! offers information about financial aid loans, scholarships, grants, grad school funds, financial aid applications, calculators and more. 
  • Study Abroad Programs provides a searchable database of study abroad programs.

Health & Medical

  • Combined Health Information Database, produced by health-related agencies of the federal government, is a large database of titles and abstracts for health information and health education resources.
  • Healthwise: Go Ask Alice, from Columbia University's Health Education and Wellness program, provides a question & answer service about health, including a searchable database of previously asked questions. 
  • MedSeek: Directory of Physicians lists physicians across the U.S. You can search by specialty, name, or geographic area.

Law & Legal

  • FindLaw provides a search tool called LawCrawler, a specialized search engine which "uses intelligent agents combined with the AltaVista search engine and database and other legal code and case law databases, enabling you to focus your search on legal information and on particular domains."
  • LawGuru offers several ways to search for legal information: the Legal Research page, which allows you to choose a legal search engine; the Multiple Search Tool, which allows you to search various legal resources from one search window; and the LawGuru BBS, which provides legal questions and answers.

Science

  • BioTech, from the University of Texas at Austin, covers biology, chemistry, microbiology, and human genetics.
  • MathSearch, from the University of Sydney, Australia, searches a collection of over 200,000 documents on English language mathematics and statistics servers across the Web.
  • USDA Plants Database focuses on vascular plants, mosses, liverworts, hornworts, and lichens of the U.S. and its territories.

Social Science


Software

A number of software libraries provide searchable databases for freeware and shareware programs. Most of the sites listed below also offer software reviews: 

 


Subject Directories

Subject directories are usually compiled and maintained by people. Those that are built by a computer program rely on some type of automated selection criteria. Because they are selective, subject directory databases are smaller than those of the general purpose search engines. Like the specialized search engines, directories usually produce more relevant results than general search engines, both because of their smaller size and because they usually index only a web site's first page. 

Although web subject directories catalog a small segment of the Web's millions of documents, they provide a quick and easy search by subject, and often by keyword. Directories may be extremely useful if you have no idea where to start searching. They are more useful for searching general subjects rather than for more specific information. Beginning an information search in a subject directory can give you some idea of the types of information files available on the Internet for that particular subject. 

When beginning a search, you will notice that the top-level subject headings usually consist of very broad subjects, such as "Arts and Humanities," "Education," and "Health." After choosing a subject at the top level, you can move through lists of submenus to narrow your search. Under "Health" you might find "Diseases," "Drugs," and "Fitness." Continue following the subheadings and eventually you will reach a page that lists web documents. Click on the links that look interesting and use your browser's Back button to return to the subject directory. 
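If it helps to picture the structure, a subject directory can be thought of as a tree of headings with lists of pages at the bottom. The sketch below is only an illustration: the headings come from the example above, while the page addresses and the functions `browse` and `subheadings` are hypothetical.

```python
# Hypothetical directory: nested dicts are subject headings, lists are page listings.
DIRECTORY = {
    "Arts and Humanities": {"Art History": ["art.example/history"]},
    "Education": {"Financial Aid": ["aid.example/loans"]},
    "Health": {
        "Diseases": ["health.example/diseases"],
        "Drugs": ["health.example/drugs"],
        "Fitness": ["health.example/fitness", "health.example/walking"],
    },
}

def browse(directory, path):
    """Follow a list of headings down the tree and return what is found there."""
    node = directory
    for heading in path:
        node = node[heading]  # raises KeyError if the heading does not exist
    return node

def subheadings(node):
    """Return the headings one level down, or an empty list at a page listing."""
    return sorted(node) if isinstance(node, dict) else []

print(subheadings(DIRECTORY))                       # top-level subjects
print(subheadings(browse(DIRECTORY, ["Health"])))   # one level down
print(browse(DIRECTORY, ["Health", "Fitness"]))     # the page listing
```

Each step down the path narrows the search, just as each click on a subheading does in a real directory.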

Some of the subject directories provide an alternative to moving down their hierarchical lists of menus by providing a search engine for their database. You can use keywords to search these directories, but you will be limited to resources in the directory's database. 
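A keyword search inside a directory can be sketched as a scan of the directory's own reviewed entries. The entries, the `Entry` fields, and the `search_directory` function below are made up for illustration, and real directories may match keywords differently.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    title: str
    subject: str
    annotation: str  # the reviewer's description of the site

# Hypothetical reviewed entries; a real directory would hold many more.
ENTRIES = [
    Entry("Medieval Manuscripts Online", "Arts and Humanities",
          "Digitized medieval manuscripts with commentary."),
    Entry("Student Loan Guide", "Education",
          "Explains loans, grants, and scholarships."),
    Entry("Walking for Fitness", "Health",
          "Exercise plans for beginners."),
]

def search_directory(entries, keyword):
    """Return titles of entries whose title or annotation contains the keyword."""
    kw = keyword.lower()
    return [e.title for e in entries
            if kw in e.title.lower() or kw in e.annotation.lower()]

print(search_directory(ENTRIES, "loan"))
print(search_directory(ENTRIES, "volcano"))  # nothing in the directory matches
```

The second search comes back empty even though pages about volcanoes certainly exist on the Web. The search can only find what the directory's editors have already added.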

Examples of general subject guides include: 

Remember that some of the sites above also offer a web search engine in addition to their subject directory of reviewed sites. The search engine may search a database which includes non-reviewed sites compiled by an automated spider. 

There are several specialized directories of subject guides compiled by specialists who are experts in their fields. These directories, called distributed subject trees, distribute the responsibility of maintaining lists of the best, most relevant Internet documents in various subject areas to volunteers. Each volunteer is responsible for maintaining a list of documents in his or her area of subject expertise. These guides are likely to produce highly relevant information sources.

Examples of distributed subject trees include: 

There are also many specialized directories that organize and provide links to resources in specific subject areas, such as 

  • Browser Watch, which provides a subject directory of download sites for available web browser plug-ins. A browser plug-in is a program which handles file types not supported by the browser. Plug-ins must be downloaded separately from the browser and installed on your PC before you can view the file formats they support. Examples of commonly used plug-ins include Adobe's Acrobat Reader, Macromedia's Shockwave Player, and the RealAudio/Video RealPlayer.

 



 

File Format or Multimedia Searching

One area that presents a special problem for web researchers is locating resources in specific file formats, including multimedia files such as Shockwave, VRML, Java, pictures, video files, or audio files, or particular file extensions such as .pdf. You can search for these file formats by using one of the general purpose search engines that provide multimedia searching, or you can use a specialized search engine devoted to multimedia files or a particular type of file format. These search engines can be useful when you need to locate multimedia information, such as speeches, oral histories, or recorded news events, or when you want to find a music video or live radio broadcast.

The Web is a rich source for pictures and photographs, but be aware that most images on the web have cryptic filenames that may not correspond to the subject of the image, such as libimg.gif or comp.jpg, so a standard keyword search is not likely to be successful. General searches usually do not produce audio or video files, since the content of these files is not visible to the search engines.
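The filename problem can be shown with a short sketch. The two cryptic file names come from the paragraph above. Everything else is an assumption made for illustration, including the third file, the "nearby text" stored with each image, and the idea that a search tool might match against that text instead of the file name.

```python
# Hypothetical image records: file name plus text on the page near the image.
IMAGES = [
    {"file": "libimg.gif", "nearby_text": "Photograph of the main library reading room"},
    {"file": "comp.jpg", "nearby_text": "Students using computers in the lab"},
    {"file": "library-entrance.jpg", "nearby_text": "Front doors"},
]

def match_by_filename(images, keyword):
    """Keyword search against file names only."""
    kw = keyword.lower()
    return [img["file"] for img in images if kw in img["file"].lower()]

def match_by_nearby_text(images, keyword):
    """Keyword search against the text that appears near each image."""
    kw = keyword.lower()
    return [img["file"] for img in images if kw in img["nearby_text"].lower()]

print(match_by_filename(IMAGES, "computers"))      # the cryptic name hides the subject
print(match_by_nearby_text(IMAGES, "computers"))   # the surrounding text reveals it
```

A search on file names finds an image only when the person who named the file happened to use your keyword.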

The following chart lists media file types commonly found on the Web, along with some of their extensions and the player or software required to view or listen to them. Some file formats are not supported by operating systems or web browsers and require a browser plug-in.

File Extension                      File Type                             Media Format    Software Required
.au                                 Audio                                 Audio           Windows Media
.avi                                Audio Video Interleave                Video           Windows Media
.bmp                                Bitmap                                Graphic         Browser
.jpg or .jpeg (pronounced jay-peg)  Joint Photographic Experts Group      Graphic         Browser
.gif (pronounced jiff or giff)      Graphics Interchange Format           Graphic         Browser
.midi                               Musical Instrument Digital Interface  Audio           Windows Media
.mpeg                               Moving Picture Experts Group          Video           Media player or plug-in
.mp3                                MPEG Audio Layer 3                    Audio           Windows Media, various players
.pdf                                Portable Document Format              Text, Graphics  Acrobat Reader
.qt                                 QuickTime                             Video           QuickTime Player
.ra                                 RealAudio                             Audio, Video    RealPlayer
.wav                                Wave Form Audio                       Audio           Windows Media
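To see how a file's extension identifies its media format, consider the following sketch. It uses the extensions from the chart above and treats .avi as a video format. The file names and the functions `media_format` and `filter_by_format` are hypothetical.

```python
import os.path

# Extension -> media format, based on the extensions listed in the chart.
MEDIA_FORMATS = {
    ".au": "Audio", ".avi": "Video", ".bmp": "Graphic",
    ".jpg": "Graphic", ".jpeg": "Graphic", ".gif": "Graphic",
    ".midi": "Audio", ".mpeg": "Video", ".mp3": "Audio",
    ".pdf": "Text, Graphics", ".qt": "Video",
    ".ra": "Audio, Video", ".wav": "Audio",
}

def media_format(filename):
    """Look up the media format from the file's extension (case-insensitive)."""
    extension = os.path.splitext(filename)[1].lower()
    return MEDIA_FORMATS.get(extension, "Unknown")

def filter_by_format(filenames, wanted):
    """Keep only the files whose media format includes the wanted format."""
    return [name for name in filenames if wanted in media_format(name)]

# Hypothetical file names, as they might appear in a list of search results.
FILES = ["speech.wav", "comp.jpg", "clip.MPEG", "report.pdf", "song.mp3"]
print(filter_by_format(FILES, "Audio"))
print(media_format("clip.MPEG"))
```

This is the same idea behind search options that let you specify a particular file extension: the extension, not the content of the file, is what the search tool checks.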

The following general search engines provide special functionality for searching various types of multimedia formats:

  • AltaVista Multimedia Search allows you to search for specific types of images, including photos and graphics, in color or black and white. You can also search for audio files, such as MP3, WAV, Windows Media, and RealAudio, and for video files in formats such as AVI, MPEG, QuickTime, Windows Media, and RealVideo. You can even specify the length or duration of audio and video files.
  • Excite provides an Audio/Video search for AVI, MIDI, MPEG, Real, Quicktime, and WAV files.
  • Go.com offers images and Audio/Video options.
  • HotBot's Advanced Search has options for image, audio, MP3, video, Shockwave, Java, JavaScript, ActiveX, VRML, Acrobat, VBScript, Windows Media, RealAudio/Video, and allows you to specify a particular file extension.
  • Lycos offers a multimedia search engine, which allows you to search for pictures, sounds, movies, and streams (streaming audio and video).

There are also specialized, searchable directories devoted to multimedia resources:

  • Scour indexes millions of multimedia files, including music, video, and images, with a focus on entertainment.
  • Streambox focuses on streaming audio and video, in RealAudio, RealVideo, and Windows Media formats. Within these formats, you can find anything from radio stations and music videos to audiobooks, live television segments or lecture series.
  • WebSeek, from Columbia University, has indexed and cataloged hundreds of thousands of images and video files.
  • Yahoo's Broadcast.com offers links to audio and video files.

 


Complete Exercise 5 after reading Lesson 5. It is worth 7 points.

Copyright © 1997-2000 Florida Community College
Learning Resources Standing Committee
Internet Course Task Force