This is only a rough draft - Megan 04/10/92

Summary of IETF BOF on Network Statistics and Analysis

1. Introduction

The purpose of this BOF is to instigate discussion and information
exchange within the community concerning research in wide-area network
traffic measurements. Five brief presentations of related research
were made, each followed by discussion.

One theme of the BOF was to discuss exactly what kind of network
instrumentation, measurement facilities, and types of measurements
should be recommended to the Internet community. Many of us would like
to encourage the managers of stub networks and routers to collect and
make available information similar in spirit to the statistics that
NSFNET makes available through Merit/NSFNET Information Services
(NIS.NSF.NET). We hope this effort evolves into an RFC and eventually
leads to a widespread cooperative effort. We freely admit that the
road to success will be an iterative process, fraught with plenty of
challenging technical details.

The amount of space consumed by this data depends entirely on the type
of measurement. For example, collecting TCP SYN/FIN/RST packets could
lead to hundreds of megabytes a day, depending on the collection site.
Other methods, like sampling or recording the quantity of bytes sent
to particular destination networks, might require less than a hundred
kilobytes a month. In the first case, the volume of trace data can be
on the order of one to two percent of the traffic itself, and the
resulting data may have to be sent by tape rather than by electronic
means to the location where the network analysis will happen.

The Internet Activities Board (IAB) recently announced guidelines for
measurement activities. RFC 1262 lists bounds that should be commonly
acceptable. However, RFC 1262 directly addresses invasive measurement
activities, and is only marginally applicable to passive data
collection. We believe we will have to face many new issues hitherto
unaddressed.
What we propose must honor the concerns and restrictions that
individual networks may impose, yet be thorough enough to capture the
data that we need to accomplish the research goals, and should allow
for flexibility. An example of a difficult issue to resolve is privacy
when using network addresses, in particular because workstations with
their own IP addresses frequently map to individual users. Our efforts
should include privacy measures that still allow professional research
to be conducted.

Most likely, each of us has a different idea as to the data we need to
have measured to achieve our various objectives. Below, we summarize
these motivations and give a preliminary list of the measurements and
trace data that we believe should be collected or capturable. We
encourage you all to add to both the motivation list and the chart of
traces and measurements, and mail them back to wanchar@usc.edu for
inclusion in this document.

2. Motivations

2.1 Artificial workload models (Danzig and Jamin)

Good artificial workload models are needed to drive simulations of new
resource management algorithms, flow control algorithms, and routing
algorithms. The artificial workload models that we are developing
consist of an application-specific model (ftp, telnet, nntp, etc.) and
an application arrival rate model that is stub network dependent. So
far we have been able to identify applications from their port
numbers. As new transport protocols emerge, we may need other
mechanisms.

Creating the application-specific model requires full traces of TCP/IP
packet headers. Creating the stub-network-specific model requires
traces of TCP SYN/FIN/RST packets only. Most of our data has been
collected with statspy or tcpdump from a machine on the same Ethernet
segment as the stub network's gateway to the backbone. We would like
to collect SYN/FIN/RST traces from hundreds of stub networks. Given
current network bandwidth and usage, these traces can range up to
200MB/day.
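
The two ideas in 2.1, keeping only SYN/FIN/RST packets and identifying
the application from its port number, can be sketched in a few lines
of Python. This is an illustrative sketch only, not the statspy or
tcpdump tooling mentioned above. The function names and the small port
table are assumptions made for the example; the flag bits and header
layout follow the TCP specification (RFC 793).

```python
import struct

# TCP control bits as defined in the TCP header (RFC 793).
FIN, SYN, RST = 0x01, 0x02, 0x04

# Illustrative subset of well-known ports (an assumption for this
# sketch; a real study would use a fuller table).
WELL_KNOWN_PORTS = {
    20: "ftp-data",
    21: "ftp",
    23: "telnet",
    25: "smtp",
    119: "nntp",
}


def parse_tcp_header(header: bytes):
    """Return (src_port, dst_port, flags) from the first 14 header bytes."""
    src_port, dst_port, _seq, _ack, offset_flags = struct.unpack(
        "!HHIIH", header[:14]
    )
    # The low 6 bits of this 16-bit field hold URG/ACK/PSH/RST/SYN/FIN.
    return src_port, dst_port, offset_flags & 0x3F


def is_connection_event(flags: int) -> bool:
    """True when the packet has SYN, FIN, or RST set."""
    return bool(flags & (SYN | FIN | RST))


def application_of(src_port: int, dst_port: int) -> str:
    """Guess the application from whichever port is well known."""
    return (
        WELL_KNOWN_PORTS.get(dst_port)
        or WELL_KNOWN_PORTS.get(src_port)
        or "unknown"
    )


def connection_events(headers):
    """Yield (application, flags) for SYN/FIN/RST packets only."""
    for header in headers:
        src_port, dst_port, flags = parse_tcp_header(header)
        if is_connection_event(flags):
            yield application_of(src_port, dst_port), flags
```

Discarding every packet that fails is_connection_event is what keeps a
SYN/FIN/RST trace far smaller than a full header trace, while still
recording when each conversation starts and ends.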

2.2 Network planning (Braun and Claffy)

SDSC and UCSD are undertaking a network analysis effort with multiple
goals of immediate applicability and interest to the Internet
environment, with respect to both performance and ubiquity. Areas of
current investigation include: measurements and analysis of resource
consumption and latencies, network performance degradation under
resource starvation, and end-to-end performance testing. We have
determined, for selected data sets, characteristics of network usage
by application, bandwidth requirements, and geographic distribution.
We are also exploring the role that granularity plays in traffic
analysis, both in statistical sampling of traffic on an operational
basis, and in the level of detail at which one presents data to
optimize the information/noise ratio.

We are currently analyzing data from a variety of sources, including
national networks as well as federal network interconnection points of
multiple agencies. Statistical examination and manipulation of data
reveals significant traffic correlations, trends, and dependencies.

We are also undertaking collaborative efforts with Toshiya Asaba and
the WIDE statistics working group in Japan. In particular, Asaba is
largely responsible for the analysis scripts which facilitated
statistical examination and data presentation. We first intended the
scripts for use in a study of international traffic between Japan and
other nations, and were able to adapt them for use in subsequent
studies. Building a public library of usable scripts for different
analysis tasks requires agreement on data formats in multiple phases
of collection and analysis. We would like to see a collaborative
effort within the community toward accomplishing such a task.

Further information and slides are available by sending requests to
the SDSC Applied Network Research Group, via hwb@sdsc.edu or
kc@sdsc.edu.

2.3 Stateful router studies (Estrin and Mitzel)

[Related information, though the authors did not participate in the
BOF.]

The current Internet is based on a stateless (datagram) architecture.
However, many recent proposals rely on the maintenance of state
information within network routers, leading to our interest in the
implications of a ``stateful'' network layer. We wish to collect
internetwork traffic traces at the border routers of stub and transit
networks, and use this data to evaluate, or predict, the effects of
design alternatives for stateful architectures.

An important design decision is the level at which conversations are
defined. This determines the granularity of control over the network
traffic, and affects the scalability of the system. We are interested
in several granularities of conversations, ranging from a single TCP
application association up to the aggregation of all traffic between
two communicating networks. We will use the data to estimate the
number of active conversations at a router, and derive the storage
requirements for the associated conversation state table. We will
analyze the feasibility of fine-grained control at the network
periphery and deeper within the network.

In conventional IP, the only lookup function normally required for
packet forwarding is a routing table lookup. This has been recognized
as a bottleneck in the forwarding process [Feldmeier, Jain]. It has
been shown that the introduction of an LRU cache can substantially
improve the efficiency of the packet forwarding process. Route caching
is used in many existing routers. However, unlike the stateful schemes
investigated here, which require lookup based on source--destination
pairs, current route caches are based only on destination host or
network. It is not intuitively obvious whether the solutions developed
for routing table caches can be applied here. We will use our network
traffic traces to perform trace-driven simulations of an LRU cache,
for different conversation granularities, and thereby assess traffic
locality and the benefits of caching.
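
A trace-driven LRU simulation of the kind described in 2.3 might look
like the following Python sketch. It is a minimal illustration under
stated assumptions, not the simulator used in the study: the trace is
assumed to be a sequence of (source, destination) address pairs, the
function names are hypothetical, and the "network" granularity uses a
fixed /16 prefix as a simple stand-in for the network number.

```python
from collections import OrderedDict
import ipaddress


def conversation_key(src: str, dst: str, granularity: str):
    """Map a packet's addresses to a conversation identifier.

    "host" keys on the source-destination host pair. "network"
    aggregates both ends by an assumed /16 prefix.
    """
    if granularity == "host":
        return (src, dst)
    if granularity == "network":
        def prefix(addr):
            net = ipaddress.ip_network(f"{addr}/16", strict=False)
            return str(net.network_address)
        return (prefix(src), prefix(dst))
    raise ValueError(f"unknown granularity: {granularity}")


def lru_hit_ratio(trace, cache_size: int, granularity: str = "host") -> float:
    """Replay a trace through an LRU cache and return the hit ratio."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    total = 0
    for src, dst in trace:
        total += 1
        key = conversation_key(src, dst, granularity)
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0
```

Running the same trace at both granularities and over a range of
cache sizes gives a rough picture of how much traffic locality each
definition of a conversation exposes, and how large the conversation
state table would need to be to exploit it.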

2.4 Network monitoring (Schwartz and Pu)

Schwartz proposed that a group of a dozen or so of us agree to
collaborate to collect traces and measurements. He also described his
recent study of FTP traffic, which showed that tools to locate copies
of large, replicated files may reduce wide-area network traffic due to
FTP. The unique aspect of Schwartz's traces was that they examined
application-level data in a way that preserved privacy.

2.5 Host reliability and availability (Long)

Long summarized his study of Internet host reliability and
availability. This was the only active form of tracing discussed
during the BOF.

3. Measurements and traces

Here is a first pass at the type of data we would like to see
collected, and what studies would use this data. These categories need
to be detailed, and new categories probably need to be filled in.

The table identifies four types of data to collect. These include
captured packets and packet headers (excluding data), headers of
selected packets, summary data, and routing and congestion data. The
first three types of data are fairly well defined, while the last is
much less so. Although we can collect such data from anywhere in the
Internet, we classify the measurement points into three classes:
entrances to stub networks, regional and backbone routers, and
international gateways.

                           TYPE OF DATA
MEASUREMENT |Captured   |TCPDUMP     |NIS.NSF.NET |Router      |
POINT       |Packets &  |Conversation|LIKE DATA   |Timing and  |
            |Packet     |SYN/FIN/RST |            |Queue length|
            |Headers    |Traces      |            |(MIB)       |
----------------------------------------------------------------
            |Workload   |Workload    |            |Congestion  |
STUB        |models     |models      |            |studies     |
NETWORKS    |           |            |            |            |
            |Workload   |            |Workload    |            |
            |Planning   |            |Planning    |            |
----------------------------------------------------------------
REGIONAL    |           |            |            |            |
AND         |Stateful   |            |            |Congestion  |
BACKBONE    |Routers    |            |            |studies     |
NETWORKS    |           |            |            |            |
            |Workload   |            |Workload    |            |
            |Planning   |            |Planning    |            |
----------------------------------------------------------------
            |           |            |            |Congestion  |
INTER-      |           |            |            |studies     |
NATIONAL    |           |            |            |            |
GATEWAYS    |           |            |            |            |
            |Workload   |            |Workload    |            |
            |Planning   |            |Planning    |            |
----------------------------------------------------------------
                             Table 1.

4. Trace formats and tools

We need to define the storage format for trace and statistical data.
For some tools, like tcpdump or statspy, the format is already
defined. Almost certainly we should adopt NSFNET's current format for
the type of data they collect. We also need to define ``sanitizer''
programs that implement the security concerns of particular networks.

The IETF operations area has been defining some standard transport and
storage formats for various kinds of operational data.

Dealing with gigabytes of data has a serious resource impact. An
effort has to be undertaken to identify schemes to make such large
quantities of data useful, possibly via multiple levels of data
reduction.

5. Mailing list

The current composition of wanchar@usc.edu is listed below.
Change requests can be sent to wanchar-request@usc.edu.

afs@germany.eu.net
ala@merit.edu
amr@nri.reston.va.us
asaba@isr.recruit.co.jp
bac@sdsc.edu
becker@ans.net
boss@sunet.se
brunner@practic.com
calton@cs.columbia.edu
carson@utcs.utoronto.ca
cbagwell@gateway.mitre.org
chris@wugate.wustl.edu
cjw@nersc.gov
cward@westnet.net
dan@merlin.dev.cdx.mot.com
danzig@usc.edu
darrell@cse.ucsc.edu
estrin@usc.edu
fair@apple.com
golding@cis.ucsc.edu
goodwin@psc.edu
gruth@bbn.com
henry@oar.net
hwb@sdsc.edu
jamin@usc.edu
jfl@nersc.gov
jgodsil@ncsa.uiuc.edu
jkay@cs.ucsd.edu
jonchy@dxcoms.cern.ch
jrc@uswest.com
jun@wide.ad.jp
kc@sdsc.edu
kfall@cs.ucsd.edu
korz@bach.cs.columbia.edu
kr@concord.com
lear@sgi.com
lindahl@violet.berkeley.edu
lwinkler@anl.gov
mak@cnd.hp.com
mak@merit.edu
mankin@gateway.mitre.org
martin@cearn.cern.ch
medin@nsipo.nasa.gov
morris@ucar.edu
mws@sparta.com
nevil@aukuni.ac.nz
nitzan@ws1013.nersc.gov
ogud@cs.umd.edu
peter@usc.edu
peterd@cc.mcgill.ca
polyzos@cs.ucsd.edu
probins@bubba.wpd.sgi.com
pushp@cerf.net
rama@erlang.enet.dec.com
rbutler@ncsa.uiuc.edu
rcollet@icm1.icp.net
reschly@brl.mil
rgc@qsun.att.com
rin@qsun.att.com
rj@sgi.com
schwartz@cs.colorado.edu
sherk@sura.net
stats@nic.near.net
suelin@ibm.com
tmwalden@saturn.sys.acc.com
tom@cic.net
topolcic@nri.reston.va.us
van@horse.ee.lbl.gov
vcerf@nri.reston.va.us
vern@horse.ee.lbl.gov
vikas@jvnc.net
vu@polaris.dca.mil
whaley@ncsc.org