
 

Lesson One:
Internet History & Protocols


Introduction

Throughout history, libraries have provided information resources to support the needs of researchers. Although recorded information has been disseminated in many ways, from clay tablets to computers, libraries have collected, organized, and made information available to their users. Libraries have evolved over thousands of years from their origins as places to store the archives of a particular city or culture, to twentieth-century information gateways leading to vast amounts of virtual information resources.

Libraries today provide information in many formats, including physical collections of books, periodicals, newspapers, pamphlets, government documents, audiovisual materials, and electronic resources. While physical collections still serve as essential sources of information, the development of a virtual world of online databases, reference services, and the Internet has created a valuable extension to the library. "Virtual libraries", "digital libraries", or "online libraries" are terms used to describe libraries in the Information Age that organize and provide access to a huge number of digital information resources scattered throughout cyberspace.

Many of those resources have been made available via the Internet. Since the creation of the World Wide Web in 1989, the phenomenal growth of the Internet has provided a global connection of information and communication networks that is astounding in scope. At the click of a computer mouse, the Internet seems to instantly produce an infinite number of resources on any imaginable research topic. However, if you jump into searching the Internet without any preparation, you may find yourself overwhelmed by the variety and complexity of search tools and the retrieval of thousands of documents, many of which are irrelevant or superficial.

If you truly want to plumb the depths of the sea of information offered by the Internet, you need some in-depth navigation skills. Planning your research project, deciding when it is appropriate to supplement traditional library resources with Internet resources, focusing on the appropriate search tools, using specific search commands and operators, and evaluating the resources you retrieve can save time and produce high-quality results.

The Internet can help you with a number of research tasks, including:

  • Browsing or participating in worldwide discussion groups on thousands of topics;
  • Using electronic mail to get information from experts in response to your questions;
  • Connecting to libraries all over the world to locate materials in their collections, which may be borrowed through your local library via interlibrary loan;
  • Accessing information from local, state, and federal government agencies, organizations, and professional associations;
  • Accessing online reference materials, electronic journals and newspapers to find the latest news reports and articles;
  • Participating in an interactive online conference or training tutorial dealing with a timely topic.

The Internet has actually redefined the term "resource" and the process of conducting research. Today, the term "resource" encompasses not only traditional library resources such as books, periodicals, and audiovisual materials, but also broadcast media and any resource you can access from an Internet-capable computer or device in your library, at work, or at home. Although computers are now the primary means of accessing the Internet, other devices are being used, such as cell phones, personal digital assistants, and pagers, which can send and receive e-mail and access the Web. Soon, a variety of appliances, including your car or your TV set, may be connected to the network, communicating with each other, and providing access to information.

An Internet resource can be an e-mail message stored in a discussion group archive; an online magazine, journal, or encyclopedia article from an Internet-accessible database or research service; an archive of daily newspaper articles; a statistical database compiled by the U.S. Government; a personal, organizational or corporate home page; a sound file, map, digital photograph, streaming audio or video file that can be downloaded to your home computer; an interactive tutorial offering a variety of multimedia features; or a new type of resource that may not have existed prior to the advent of digital networks.


 

Internet History

In order to understand why the Internet provides such a wealth of resources and why those resources are distributed on computers all over the world, you should know a little about its history and structure. The Internet had very humble beginnings. Its founders had no intention for it to develop into a universally accessible, global network of information.

The Internet began in 1969 as a network called ARPANET, designed for the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense. ARPA was established in 1958 as the result of the Soviet launching of the Sputnik satellites, which ignited fears of Russian aggression from space.

ARPA sponsored research on linking geographically remote computers to allow remote logon access and sharing of data and resources. ARPA's goal was to connect military and defense contractors and universities involved with defense research. In the fall of 1969 ARPANET linked its first four sites through special-purpose computers known as Interface Message Processors (IMPs). The sites were the University of California at Los Angeles, Stanford Research Institute, the University of California at Santa Barbara, and the University of Utah.

According to most published accounts of Internet history, the original ARPANET was designed as an experiment in developing a network which would withstand a nuclear attack--if a section of the network disappeared, the entire network would not be destroyed. To that end, the network was decentralized, data was distributed among all the network computers, and data was transferred in small packets.

A somewhat revisionist history of the early days of the Internet, Where Wizards Stay Up Late: The Origins of the Internet, was published in 1996, and claims the original purpose for the development of ARPANET was to share data and distribute the cost of computing, rather than to develop a network that would withstand a nuclear attack.

In fact, the concept of a decentralized network dates to the early 1960s, when the RAND Corporation, a think tank which studies national security and public welfare issues for the U. S. government, was asked by the U. S. Air Force to design a communications network that would be able to survive and function during and after a nuclear attack. The idea of a centralized network was unacceptable, since any central network computer or network control center would likely be the first target in an attack.

Paul Baran of RAND published a paper in 1964 entitled On Distributed Communications, which provided the theoretical design for data transfer on an unreliable network, a network which was designed from the beginning to operate while in tatters. Baran's design proposed many of the features which were eventually incorporated into the network we have today, including decentralized data storage, digital packets and different routes for packets in the same data transfer.

Baran proposed a network of special computers or nodes whose sole purpose was to route messages like "hot potatoes." As soon as a message entered the router, it would be tossed out again by the most efficient route, or if the best path was destroyed or busy, the message was sent over the next best route (Hafner, 61-62).

Baran based his network design on observations of human brain function, after noticing that brain functions don't rely on a centralized set of cells. Brain circuitry can be rerouted around damaged cells and neural nets can be re-created over new pathways (Hafner, 57). Baran's ideas were not immediately accepted by ARPA, but were improved upon by an ARPA director, Lawrence Roberts, who in 1967 proposed a packet-switched network, based on the efficiency and reliability of Baran's original idea.

The early 1970s were spent developing basic standards called protocols for data transfer (see How the Internet Works for more information on protocols). The first protocol developed was known as the Network Control Protocol or NCP. This protocol supported computers running on the same network.

By 1972, there were 37 host computers connected to ARPANET, and ARPA's name was changed to DARPA (Defense Advanced Research Projects Agency). ARPANET researchers realized the need for protocols that supported not only data sharing among computers on the same network, but also the interconnection of different computer networks, now known as internetworking. Stanford Research Institute was assigned the task of designing a set of protocols that would allow multiple computer networks to be interconnected.

During 1973-1978 a team of researchers led by Vinton Cerf of Stanford University and Robert Kahn of ARPA developed a suite of protocols called TCP/IP (Transmission Control Protocol and Internet Protocol), which supported the interconnection of a number of different computer networks. In 1983 TCP/IP replaced NCP as the core Internet protocol.

Although the ARPANET's founders originally allowed only defense scientists and military researchers to logon and run programs from remote computers, by the early 1980s, educators discovered the value of interconnected computers, especially supercomputers that were expensive to develop, to share research information and computing resources.

The universities needed a worldwide network like ARPANET. Because military agencies are less willing to share information or to allow access, the U.S. academic community began developing several networks of its own. The academics created BITNET (Because It's Time Network), an academic and research network that linked IBM computer centers around the world, and CSNET (Computer Science Network), which linked university computer science departments.

In 1986 the NSFNet was created and named for the National Science Foundation, which provided most of the funding. NSFNet linked academic researchers across the country with five supercomputer centers. This soon expanded to include regional and statewide academic networks that connected universities and research organizations, and the NSFNet began to replace the ARPANET for research networking. The academic networks were developed with the same network structure as ARPANET, as independent, interconnected sites scattered randomly around the world. The NSFNet continuously linked more powerful supercomputers through faster connections, upgrading the network in 1986, 1988, and 1990.

As these government and educational networks established connections, the concept of the Internet, a worldwide connection of networks, was born. However, it wasn't until the World Wide Web became available in the mid-1990s that the Internet became ubiquitous and easily available to the casual computer user.

The World Wide Web

The World Wide Web is a branch or subsection of the Internet that provides access to hypertext documents. Hypertext resources are documents which provide links or connections to other documents. Selecting a hypertext link allows you to jump to the information the link represents. You can also return to a previous link and then go off in another direction. Hypertext lets you move through a text in a nonlinear manner and allows you to explore a vast worldwide "web" of information.

The World Wide Web began in 1989 as a communications project in Switzerland at CERN, the European Laboratory for Particle Physics (the acronym comes from its original French name, Conseil Européen pour la Recherche Nucléaire). Tim Berners-Lee, a graduate of Oxford University with a background in computer communications, proposed a global hypertext information system to be used as a means of transporting research and ideas throughout CERN. Berners-Lee was proposing a solution to two problems: information storage and retrieval, and communication on a global scale, since the members of CERN were located in a number of countries.

Berners-Lee created an information system using hypertext, combined with the global connections provided by the Internet, to produce a "web" of connected documents that can be located anywhere in the world and accessed by anyone with a computer and a hypertext browser.

Hypertext is a concept that has been discussed since 1945, when Vannevar Bush, science advisor to President Roosevelt during World War II, proposed a machine that would be capable of producing hypertext links between documents. Bush's proposal was outlined in an article entitled As We May Think, published in the July 1945 issue of The Atlantic Monthly.

In 1965, Ted Nelson coined the term "hypertext" and proposed a worldwide hypertext system called "Xanadu," to which individuals could contribute resources. Other hypertext programs were developed during the intervening years, but it wasn't until Berners-Lee developed a hypertext browser that functioned with existing Internet technology that a global hypertext information system was created.

The original CERN project outlined a simple system using networked hypertext links to transmit documents and communicate among physics researchers. The links appeared as highlighted words in the document. Later on as more people became interested in the possibilities of hypertext, highlighted, colored or underlined text, pictures, icons, or graphics were used as links, and links were made to sound and video files. The term hypermedia is sometimes used to describe a hypertext system which can display multimedia, including graphics, sounds, animation, and video.

In 1992, there were 50 web servers worldwide. Today, the Internet, including a vast number of World Wide Web sites, has become a collection of millions of independent networks, each owned by organizations independent of each other, all interconnected by high-speed data lines, satellites, cable modems, radio signals, and wireless connections.

Because the Internet grew out of the decentralized ARPANET, it has no central computer or central data storage. Information files are scattered around the Net, around the world, virtually hidden in far-away places, waiting for discovery.

Internet developers and users have the freedom to publish anything on the Internet. The openness of the Internet and the availability of information on almost any topic reflect the values of those who built it. Although it started with the federal government, it was built by people in the worlds of education and scientific research, and it reflects their values of individual participation, equality, and information sharing. Although commercial interests are now proliferating and seeking to change the concept of free information sharing, a wealth of information, some free, some fee-based, is available to researchers who have the skills to locate, select, and evaluate resources.

Some links on the history of the Internet include:
A Little History of the World Wide Web
History of the Internet and WWW
Hobbes' Internet Timeline


 

How the Internet Works: Protocols

You might take for granted that when you retrieve a file of information or send an e-mail message across the Internet it will always reach its destination. However, sometimes it doesn't because the process for sending information is extremely complex.

For the Internet to connect many different types of computers, software, and files, it must use standardized rules called protocols, which define how computers communicate. A good example of an early communications protocol is Morse Code. The protocol for Morse Code used standardized dots and dashes to communicate over telegraph lines by transmitting electrical impulses.
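As a rough illustration of what "standardized rules" means, the sketch below encodes and decodes text with the letters of International Morse Code. The function names and the use of " / " as a word separator are choices made for this example only; the point is that both sides can communicate because they share the same table.

```python
# Letters of International Morse Code (digits and punctuation omitted).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def encode(message: str) -> str:
    """Encode letters as dots and dashes; symbols are separated by
    spaces and words by ' / '. Non-letters raise KeyError."""
    words = message.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)


def decode(signal: str) -> str:
    """Reverse of encode(): the receiver uses the same shared table."""
    reverse = {code: letter for letter, code in MORSE.items()}
    return " ".join(
        "".join(reverse[symbol] for symbol in word.split())
        for word in signal.split(" / ")
    )


print(encode("SOS"))
print(decode(encode("hello world")))
```

If the sender and receiver used different tables, the dots and dashes would still arrive but could not be understood, which is the problem Internet protocols solve for computers.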

Internet connections are made with a suite of protocols called TCP/IP (Transmission Control Protocol/Internet Protocol). The TCP/IP protocols define the Internet as a packet-switched network. With a packet-switched connection there is no single, unbroken connection between sender and receiver, as there is with the telephone system.

The telephone system is a connection-oriented, circuit-switched network. When you make a telephone call, the switches at the telephone company set up a dedicated line between you and the person you call, for the duration of the call. While you are using the line, no one else can; and if there is a problem on the network, you lose your connection.

A packet-switched network does not require two computers to establish a dedicated, unbroken connection for data transfer. It instead breaks the data into small units or packets and transfers the packets over any phone or data lines that are currently available.

When you ask your browser to go to a specific Internet address, or when you click on a hyperlink, the sending computer breaks the data you have asked for into packets. Each packet contains a piece (up to 1500 bytes) of the data. Each packet is labeled with the addresses of the sending and receiving computers along with some instructions on how to put the data back together again once it has reached its destination.

The data in these small packets is transferred over phone lines or data lines. The packets take different routes through a complex series of routers. Each router examines the destination address and decides the best way to get the packets to their destination.
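The idea of finding a route, and of finding another one when a router is unavailable, can be sketched with a small search over a map of routers. This is only an illustration: the network below is hypothetical, and real routers rely on routing tables and routing protocols rather than a search over a complete map.

```python
from collections import deque

# Hypothetical network: each router lists its directly connected neighbors.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}


def find_route(links, source, destination, failed=frozenset()):
    """Breadth-first search for a fewest-hops route that avoids
    any routers listed in `failed`. Returns None if no route exists."""
    if source in failed or destination in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for neighbor in links[node]:
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_route(LINKS, "A", "F"))                # normal route
print(find_route(LINKS, "A", "F", failed={"D"}))  # route around router D
```

When router D is marked as failed, the search still reaches F by way of C and E, which mirrors how a packet-switched network can keep working when part of it is unavailable.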

The packets eventually all reach their destination -- your computer -- and are put back together again, using the instructions they have been labeled with. This is why it sometimes takes a while to load the data before information appears on your screen. Of course, the speed of the data transfer depends on the type of network connection or modem you use.
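The splitting, labeling, and reassembly described above can be sketched in a few lines of Python. The Packet fields, the function names, and the two addresses are assumptions made for this example; real TCP/IP headers carry more information and are laid out differently.

```python
import random
from dataclasses import dataclass

MAX_PAYLOAD = 1500  # bytes of data per packet, the figure quoted in the text


@dataclass
class Packet:
    source: str       # address of the sending computer
    destination: str  # address of the receiving computer
    sequence: int     # position of this piece in the original data
    total: int        # how many packets make up the whole transfer
    payload: bytes    # the piece of data itself


def packetize(data: bytes, source: str, destination: str,
              size: int = MAX_PAYLOAD) -> list:
    """Break data into labeled packets of at most `size` bytes each."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(source, destination, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks)]


def reassemble(packets: list) -> bytes:
    """Put packets back in order, whatever order they arrived in."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("some packets are missing")
    return b"".join(p.payload for p in ordered)


message = b"x" * 4000
packets = packetize(message, "192.0.2.1", "192.0.2.2")  # example addresses
random.shuffle(packets)  # simulate packets arriving out of order
print(len(packets), reassemble(packets) == message)
```

The sequence number plays the role of the "instructions on how to put the data back together": without it, the receiver could not restore the original order after the packets take different routes.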

All Internet functions depend on protocols which standardize how the data for those functions are transferred. Such protocols include:

  • HTTP (Hypertext Transfer Protocol): protocol for accessing World Wide Web documents
  • FTP (File Transfer Protocol): protocol for transferring files from one computer to another
  • Gopher: protocol for accessing documents via Gopher menus (archaic; no longer widely used)
  • Telnet: protocol that allows users to logon to a remote computer

The World Wide Web uses HTTP (Hypertext Transfer Protocol) to transfer data. The HTTP protocol contains commands that allow you to jump to another hypertext document and retrieve the information in that document. When you enter a URL in your browser window or click on a link, this sends an HTTP command to the web server described in the URL, and directs the server to send the requested file.
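To make the idea of an "HTTP command" concrete, the sketch below builds the text of a minimal GET request from a URL without sending it anywhere. The function name and the example.com address are placeholders chosen for this illustration, and real browsers send additional header lines.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.0 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"      # an empty path means the site's top page
    if parts.query:
        path += "?" + parts.query
    # Request line, one header naming the server, then a blank line.
    return f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"


# Hypothetical address used only for illustration.
print(build_get_request("http://www.example.com/lessons/one.html"))
```

The first line names the command (GET) and the file wanted; the server answers with a status line, its own headers, and then the requested document.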

The computer language used to create hypertext documents is referred to as HTML (HyperText Markup Language). HTML uses tags (characters enclosed in angle brackets, such as <p>) to format documents so that a web browser can read and display them. Tags denote such features as headings, paragraphs, fonts, images, and hypertext links. The HTML code behind any web document may be displayed in a browser window by selecting "Page Source" on the "View" menu, or by right-clicking the mouse and choosing "View Source".
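As a small illustration of how software reads tags, the sketch below uses the HTML parser from Python's standard library to pick the hypertext links out of a short sample document. The sample page and the LinkCollector class are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs from inside the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A made-up page with a heading, a paragraph, and two links.
SAMPLE = """<html><body>
<h1>Lesson One</h1>
<p>See the <a href="exercise1.html">exercise</a> and the
<a href="http://www.example.com/">example site</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(SAMPLE)
print(collector.links)
```

A browser does much the same thing on a larger scale: it reads each tag, decides what the tag means, and displays the text or link accordingly.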

FTP (File Transfer Protocol), first specified in 1971 and standardized in its current form in 1985, is a standard method of moving files from one computer to another on the Internet. The transfer of files using FTP can work in either direction. You may retrieve files from a remote server, or transfer files to a remote server, if you have been granted access to that server.

FTP was the principal means of file transfer on the Internet prior to the creation of HTTP (HyperText Transfer Protocol) and the World Wide Web. Although many of its functions are now handled by the HTTP protocol, FTP is still used for file transfer on the Internet.

The World Wide Web has made several other early Internet protocols nearly obsolete. Two such protocols, Telnet and Gopher, were once widely used to connect to remote sites and search for information. Many Telnet and Gopher sites have migrated to the World Wide Web, which offers simpler interfaces, multimedia effects, and user-friendly interactivity.

However, Telnet connections still provide access to some library catalogs and some government databases. Telnet, or remote logon, is a tool that allows you to access the programs and applications available on another computer system, whether it is located next door or on another continent. The Telnet protocol allows you to sit at the keyboard of one computer and use that keyboard and monitor as though they were connected to another computer at a remote location.

Telnet is supported by World Wide Web browsers, but requires Telnet client software. Netscape and Internet Explorer allow you to use a Telnet client with the browser, which provides an instant interface with the Telnet program.

Gopher, created in 1991 at the University of Minnesota (whose mascot is the Gopher), is an outdated Internet protocol that is rarely seen today. Popular for several years, especially in universities, Gopher predates the widespread use of the World Wide Web. Gopher files are primarily text, with no hypertext links, very few graphics, and virtually no audio or video effects. With hypertext links, the Hypertext Markup Language (HTML), and the development of graphical browsers, the Web quickly overtook Gopher.

There are many other Internet protocols which will not be covered in this course. Yahoo!'s Protocols page provides links to additional information.


 

Client/Server Concept

Another concept that is important in understanding how the Internet functions is the client/server concept. Most Internet services rely on the client/server model. The Internet user is the client and has client software installed on his computer to access various Internet services. When a user wants to connect to a particular information tool, he uses his client software to connect to server programs, which provide the service or information needed. The web browser is an example of client software needed to access World Wide Web servers. Most browsers function as client programs for World Wide Web, FTP, and Gopher access. For access to Telnet sites, a Telnet client is needed. Your computer also requires specific client software for e-mail and for viewing certain types of information files (such as audio, video, or PDF files). Each piece of client software on your computer recognizes certain protocols and processes data according to those protocols.
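The client/server exchange can be sketched with a toy example in which a "server" answers one request from a "client". The request wording ("GET greeting") and the replies are invented for this illustration, and the two ends are joined by a local socket pair instead of a real network connection.

```python
import socket
import threading


def serve_once(connection: socket.socket) -> None:
    """Toy server: read one request and send back a matching reply."""
    with connection:
        request = connection.recv(1024).decode("ascii").strip()
        if request == "GET greeting":
            reply = "OK Hello from the server"
        else:
            reply = "ERROR unknown request"
        connection.sendall(reply.encode("ascii"))


def ask(request: str) -> str:
    """Toy client: send one request to a fresh server, return its reply."""
    client_end, server_end = socket.socketpair()  # two connected sockets
    server = threading.Thread(target=serve_once, args=(server_end,))
    server.start()
    with client_end:
        client_end.sendall(request.encode("ascii") + b"\n")
        reply = client_end.recv(1024).decode("ascii")
    server.join()
    return reply


print(ask("GET greeting"))
print(ask("something else"))
```

The client only gets a useful answer when it phrases its request the way the server expects, which is why each piece of client software must recognize the protocol of the service it connects to.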


 

Internet Addresses: IP Addresses and Domain Names

Each computer connected to the Internet is called a host computer. Each host computer has a unique address called an IP address, which the TCP/IP protocols use to identify the hosts sending and receiving data. An IP address is a 32-bit numeric address written as four numbers separated by periods. Each number can range from 0 to 255. For example, 230.160.25.240 could be an IP address.
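The relationship between the four numbers and the single 32-bit value can be shown with a short sketch. Each of the four numbers fills 8 bits (which is why it cannot exceed 255), and 4 x 8 = 32. The function names are chosen for this example; Python's standard ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Combine the four numbers of a dotted address into one 32-bit value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left 8 bits, add the next number
    return value


def int_to_dotted(value: int) -> str:
    """Split a 32-bit value back into four 8-bit numbers."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


number = dotted_to_int("230.160.25.240")  # the example address from the text
print(number, format(number, "032b"))
print(int_to_dotted(number))

# Cross-check against the standard library.
assert number == int(ipaddress.IPv4Address("230.160.25.240"))
```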

Since IP addresses are difficult for people to remember, host names or domain names such as ccla.cc.fl.us are generally used to identify the address of any computer connected to the Internet. Because computers on the Internet route data using numeric IP addresses, not domain names, Domain Name System (DNS) servers are needed to translate domain names into IP addresses. A domain name may identify one or more IP addresses. The domain name system organizes domain names into top-level categories, such as:

  • .edu: educational institutions
  • .com: commercial hosts
  • .net: network hosts
  • .gov: government agencies and organizations
  • .mil: U.S. military
  • .org: non-profit organizations
  • .us: hosts in the U.S.
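A domain name is read from right to left, with the top-level category last. The sketch below splits a name into its parts and looks up the category in a table copied from the list above. The function name is invented for this example, and the code does not contact any DNS server.

```python
# Top-level categories taken from the list above.
TOP_LEVEL_CATEGORIES = {
    "edu": "educational institutions",
    "com": "commercial hosts",
    "net": "network hosts",
    "gov": "government agencies and organizations",
    "mil": "U.S. military",
    "org": "non-profit organizations",
    "us": "hosts in the U.S.",
}


def describe_domain(name: str):
    """Return (parts of the name, top-level domain, category)."""
    labels = name.lower().rstrip(".").split(".")
    top_level = labels[-1]  # the rightmost part is the top-level domain
    category = TOP_LEVEL_CATEGORIES.get(top_level, "unknown")
    return labels, top_level, category


print(describe_domain("ccla.cc.fl.us"))
```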

The U.S. and other countries use two-letter country codes, with more than 200 two-letter codes for countries, as well as codes for states, such as fl.us. Due to a shortage of top-level domain names, several new domain name extensions have been proposed, including:

  • .arts: for cultural and entertainment entities
  • .firm: for businesses, or firms
  • .info: for information services
  • .nom: for those wishing personal nomenclature
  • .rec: for recreation and entertainment entities
  • .store: for businesses offering goods to purchase
  • .web: for entities related to the WWW

The proposal for new domain names has been controversial. Information about the current state of the domain name system is available in Management of Internet Names and Addresses, a document published by the U. S. Department of Commerce. Domain names in .com, .net or .org can be registered through competing registrars. The International Internet Address and Domain Name System provides a detailed overview of the current domain name system.

Domain names are used in URLs (addresses for Internet documents or files) and e-mail addresses.


 

URLs (Uniform Resource Locators)

Every data file or document on the Internet also has a unique address called a URL (Uniform Resource Locator). The URL consists of three parts: the protocol, the domain name and the path.

The protocol, as discussed above, is the set of rules the computer follows in order to communicate with another computer. It lets the computer know how to process the information it receives. If the protocol is http://, for example, the computer knows it will be processing a World Wide Web document.

The domain name is the Internet address of the computer (server) which is hosting the site and storing the documents. This domain name may be expressed as an IP address.

The path is the directory and file specification; it lets the computer know which directory and file to access after connecting to the server. The path is not a required element, but if you know the path it will take you directly to the desired file or document. The path is also the part of the URL which changes most frequently. If you type in a URL and an error or "File not found" message is returned, retype the URL, omit the path, and try to locate the file by searching the site or following links.

Let's break down the URL for a web page from LINCCWeb. The LINCCWeb home page allows you to link to Florida community college library catalogs and many other resources for community college students. This particular LINCCWeb page provides links to electronic databases containing articles from encyclopedias, periodicals, and newspapers, access to worldwide library catalogs and more:

http://www.ccla.lib.fl.us/www/dblist.html

http:// is the protocol. This lets you know you are retrieving a World Wide Web document and lets the computer know how to process the hypertext file it is receiving.

www.ccla.lib.fl.us/ is the domain name, the address of the computer which is hosting the web page. If you were to stop here and not type the path, which consists of the directory and/or file name, you would access the LINCCWeb home page rather than the database page.

www/dblist.html provides the path to the specific page you want; in this case, the directory (www) and the name of the HTML file (dblist.html) which provides links to electronic databases.
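The same breakdown can be done with the URL tools in Python's standard library, as in the sketch below. The site_root function is a name invented for this example; it shows the tip given earlier of omitting the path when a page cannot be found.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://www.ccla.lib.fl.us/www/dblist.html"
parts = urlsplit(url)

print(parts.scheme)  # the protocol
print(parts.netloc)  # the domain name
print(parts.path)    # the path (directory and file name)


def site_root(address: str) -> str:
    """Drop the path so the URL points at the site's home page."""
    pieces = urlsplit(address)
    return urlunsplit((pieces.scheme, pieces.netloc, "/", "", ""))


print(site_root(url))
```

Note that the library reports the path with its leading slash (/www/dblist.html) and the domain name without a trailing one; the slash is simply the separator between the two parts.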


 

Complete Exercise One after reading this lesson. It is worth 4 points.

Copyright © 1997-2000 Florida Community College
Learning Resources Standing Committee
Internet Course Task Force