Stephen E. Lamm, Daniel A. Reed, University of Illinois; Will H. Scullin, Netscape Communications Corporation

Given the current use of the World Wide Web (WWW) for scientific and educational information sharing and its emerging use for electronic commerce, studying access patterns is an important first step in understanding network implications and in designing future generations of WWW servers that can accommodate new media types and interaction modes. However, the large number of requesting sites, the diversity of WWW data types (text, data, images, audio, and video), and the multiplicity of server performance metrics make data correlation and comprehension difficult. Proposed HTTP extensions will add demographic data, further heightening the need for sophisticated analysis techniques.

To support WWW performance analysis, CRPC researcher Dan Reed and collaborators Stephen Lamm and Will Scullin have expanded Avatar, a virtual reality system designed to analyze and display real-time performance data, and applied it to the analysis of WWW traffic using the National Center for Supercomputing Applications (NCSA) WWW server as a high-load testbed. One variant of Avatar supports real-time display of WWW server accesses by mapping them to their geographic points of origin on various projections of the earth. By allowing users to interactively change the displayed performance metrics and observe the real-time evolution of WWW traffic patterns in a familiar geographic context, Avatar provides insights not readily apparent with more traditional statistical analysis. Moreover, it can be extended to accommodate demographic and point-of-sale information for correlation of electronic commerce patterns.

In one study of traffic conducted last August, the group found that sites that act like firewalls, typically large corporations and commercial Internet service providers, appear as the originating point for the largest number of accesses. Smaller sites, such as universities, government laboratories, and small companies, constitute a large fraction of all accesses, but are geographically distributed more uniformly.

The distribution of the sites also follows population lines. In the United States, the most heavily populated areas are the coasts and the regions east of the Mississippi River. Because inexpensive Internet access is limited outside universities and larger urban areas, sites in those locations originate the largest number of requests. Access to the NCSA WWW server from outside the United States is common, though far less frequent than from sites within the country. There is little traffic from South America, Africa, or countries of the former Soviet Union, but Europe and the Pacific Rim show thriving WWW communities.

The periods of heaviest activity and the distribution of requests by Internet domain track the normal business day. In the early morning hours (EST), Europe is a major source of activity at the NCSA WWW server. As the morning progresses, the east coast of the United States becomes active. Near the middle of the day, the activity in Europe fades, while the United States requests peak. In the evening, the United States west coast has the highest level of activity.
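As a rough illustration of the kind of tally behind this observation, the sketch below counts requests per hour of day (EST) and per top-level Internet domain. It is not Avatar's implementation: it assumes the server log is in Common Log Format with hostnames already resolved, and the function name `hourly_counts_by_domain` and the "unresolved" bucket are hypothetical names chosen for this example.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumed input: Common Log Format lines, i.e.
#   host ident user [timestamp] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+$')

# Fixed UTC-5 offset, used here as a stand-in for EST.
EST = timezone(timedelta(hours=-5))


def hourly_counts_by_domain(lines):
    """Count requests keyed by (hour of day in EST, top-level domain)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        host, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").astimezone(EST)
        last_label = host.rsplit(".", 1)[-1]
        # A numeric final label means the host is a bare IP address.
        domain = "unresolved" if last_label.isdigit() else last_label.lower()
        counts[(when.hour, domain)] += 1
    return counts
```

Sorting the resulting counts by hour gives a per-domain activity profile over the day, which is one simple way to see the shift from European to United States requests described above.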

The characteristics of the requested documents also change with time of day. Requests for audio and video files are much more common during the normal business day than during the evening hours. During the evening, text and image files predominate. The group conjectures that this reflects both lower bandwidth links to Europe and Asia and low-speed modem-based access via commercial service providers.

The group concludes that while this geographic display metaphor provides new insights into the dynamics of WWW traffic patterns and serves as a model for development of a WWW server control center, some issues need to be resolved. Future research will focus on displaying data from multiple WWW servers, avoiding variable resolution clustering of sites, and developing a richer set of statistics and query mechanisms, including demographic data.

The group presented an article about the project at the Fifth International World Wide Web Conference, May 6-10, 1996, excerpts of which are included here. The complete article is available at http://www5conf.inria.fr/fich_html/papers/P49/Overview.html
