Editor's note: These minutes have not been edited.

Distributed Network Management Working Group (DISMAN)

Reported by David Harrington (dbh@ctron.com) and Steve Waldbusser
(stevew@ins.com). Combined and edited by Maria Greene (chair).

The DISMAN Working Group met twice in Montreal, first on Monday from
9:30 to 11:00 and then on Tuesday from 1:00 to 3:00.

---------------
Monday, June 24
---------------

Maria Greene presented the agenda, which was accepted as proposed with
the addition of another prepared presentation.

   o Agenda bashing
   o Welcome and Introduction, Maria Greene, Ascom Nexion
   o Open Discussion on the Charter and Goals
   o Overview of the Framework Document, Steve Waldbusser, INS
   o Overview of the Script MIB, David Partain, SNMP Research
   o Overview of the Alarm MIB, Steve Waldbusser, INS
   o Other Prepared Presentations
      - Enhanced Script MIB and Group Membership MIB, Juergen
        Schoenwaelder, UTwente
      - Distributed Management Experience, Richard Buckman, IBM
   o Wrap-up Discussion on the Charter and Goals

---
Welcome and Introduction - Greene

Maria presented slides giving pointers to the mailing list and ftp
site. The archives of the mailing list and the slides mentioned in
these minutes will be available on ftp://ftp.nexen.com:/pub/disman
(which should be set up shortly).

---
Charter and Goals - Greene

The charter is posted at http://www.ietf.org (follow the link for
"working groups"). Maria recommended that people take a look at the
latest Simple Times issue, especially the article by Karl Auerbach and
Chris Wellens, which offers good background material and ideas.

Points that were brought up during the discussion:

   o supplement bottom-up orientation (MIBs) with top-down analysis of
     requirements
   o we need more proactive management nearer devices to handle large
     networks
   o filtering, summaries, notifications of error conditions to
     decrease information overload
   o add new functionality over time using basic framework:
      - historical data collection, e.g., accounting/performance
      - event monitoring/logging/forwarding/correlation
      - meta-variables (based on scripting?)
      - distributed discovery
   o need to pick our objectives carefully so that we deliver
     something in a timely manner
   o will discuss additional proposals on the mailing list (for a
     limited time)

---
Framework - Waldbusser

These are very rough transcriptions of Steve's slides. Points that
were brought up during the discussion are listed in []'s.

Steve presented definitions, goals and some example mechanisms. This
information will be the basis for the Framework document that he and
Bob Stewart will co-author.

What is distributed mgmt? (verb)

Delegation of control from mgr to mgr in pursuit of:

   o scalability through hierarchy: protects central NMS resources
   o protect the network bandwidth, disconnection, improve efficiency
     of data movement
   o deal with human org boundaries
   o promote better system architecture; modular building blocks.

     [This mixes goals and means; need to separate in the document.
     Goals: scalability, replication, robustness, modularity]

   o robustness: fault tolerance of NMS vs. of network
   o mediation: abstraction of entities via mediation layer

     [Mediation/abstraction will only be valid within MIBs to the
     degree that things are the same. Trying to include this could
     slow the process. Need to handle the differences as well as
     handle the abstraction.]

What is a distributed manager? (noun)

Mgmt app that rcvs requests from another mgr and executes requests by
performing ops on agents or other mgrs.

   o not necessarily hierarchical.
   o may take a long time to execute and may be registered
     indefinitely.

[Question: Should "Other mgr" include itself?

Not necessarily SNMP; other techniques as well: ping, traceroute, etc.

Does the wording imply proxy? Proxy can mean many things...

We don't want to discuss how to distribute apps, but how to distribute
functions; other standards bodies (e.g., OMG) concentrate on
distributed apps more than distributed functionality. We want to
develop mgmt functions with SNMP interfaces.

Pictures might clarify what we're discussing.

We need to put our docs where our mouths are. We need to make
implementable solutions, not just designs.

Confused about the model - is this mgr2mgr, or client/server model?

   o 1st model lets mgrs work independently
   o 2nd requires server for nms client to work.

Def. not consistent with other standards bodies? We should understand
other groups' docs: 3 ITU docs mentioned by Randy will be ftp-posted:

   o scripting
   o policy-mgmt
   o ODMA - Open Distributed Management Architecture

What type of storage infrastructure will we assume? Hopefully none;
specific applications might, framework shouldn't.

The proposed model is presented as a solution; we should back up and
identify the requirements rather than the solutions.

   o We shouldn't be working bottom-up; we need to think in terms of
     management functionality needs.
   o For Framework document, we need to identify the goals, why they
     aren't being met, and how we can meet them without getting into
     the use of SNMP as the solution.

Why haven't these goals been met yet?

   o because we've made it too complicated.
   o MIBs can be used to initiate functionality in a remote mgr.
   o For example, Ping MIB has been implemented; could be threshold
     MIB, or RMON MIB, etc. Can be extended to "ping every 5 min, and
     let me know if a problem occurs." Control is done through SNMP
     MIBs.

Standards are not the place for innovative ideas; we need to identify
and standardize existing practice.

Need to include negotiation between mgrs, not just hierarchy.

   o Non-hierarchical approaches include rmon, thresholding, worms.

Does config mgmt get included?

   o Config mgmt is a nebulous concept and needs discussion/definition
     on mailing list.
   o Config is important for knowing the responsibilities of apps vs
     agents/dist-agents, etc. The standards need to define this.
]

--------------
Script MIB - David Partain, for David Levi and Jeff Case who weren't
present.

Script MIB is presented as an implementation of one solution:
ftp://ftp.snmp.com/pub/script.txt
Actually, this is known as the Mid-Level Mgr MIB.

This is a rough transcription of David's slides:

History

Goals
   1) ease burden of mgmt stations
   2) reduce/localize traffic (reduce telecomm costs, etc.)
   3) expand domain of manageable devices
   4) automate corrective behaviors
   5) ease of configuration

Architecture
   BRASS server consolidates traffic (see slide text and pictures)
   MIB to control which apps are available and to download
   MIB controls frequency of operation etc
   Script engine interprets/runs the scripts

MLM MIB
   scriptTable
      used to upload/download scripts MLM <--> NMS
      script lines stored as octet strings
   mlmCompileTable
      config and run scripts
      script filename when appropriate
      can specify arguments
      frequency
   mlmResultTable
      varbinds that are result of script execution
      available as MIB variables

Script Language - mgmt oriented meta language
   structured
   local and remote variables
   basic control structures (if, while, ...)
   operators for logical math ops
   sync/async messages

Script Capabilities
   SNMP ops (get, set, etc.)
   send traps or informs
   log data to file
   fork, call, jump to another script
   launch another application (may have security implications)

MLM actions
   if something interesting is discovered or detected,
      . perform logging
      . send a notification
      . ...

MLM sample apps
   intruder detection script
   audio counter-attack script
   M2M-like scripts (summarize agent data)

[Is this available public domain? or other PD stuff?
Maria emphasizes - this is a proposal

   o concerns about defining new languages, etc.
   o please post implementation experiences with scripting to list so
     we can compare and contrast some options.

Does this need to be limited to one language? The framework could
support multiple languages, but that will impact interoperability. The
MIB may be able to specify what languages the agent can understand.]

David Levi and Juergen Schoenwaelder have agreed to be co-authors of
the Script MIB document. We need to decide on the mailing list what
the requirements are and if we are going to use the SNMP Research
Informational RFC as a starting point.

----------------
Tuesday, June 25
----------------

We picked up the discussion on scripting where we left off on Monday:

   o Will agents be expected to have multiple interpreters embedded?
   o Will some be required and others optional, etc.?
   o If the cost of entry is too high, it will be difficult to
     encourage deployment. It would be useful to understand the
     resource requirements of various options.
   o Glenn Waters suggested some requirements: will need to be
     lightweight; possible to run thousands of scripts in a device;
     easy to learn; determine our audience (developers or customers);
     is this something we embed into a router? fast, PD interpreter,
     etc. Maria took an action item to post a comparative analysis of
     scripting languages that her company did.
   o Is there overlap between agentX and DISMAN, since agentX allows
     distributed subagents?
   o Jeff Case explained the history that led to the SNMP Research
     script language: We got to the script MIB because at every step
     of development, customers want more and more over a number of
     years. Some will want expressions, looping, a user interface,
     etc. If I started from scratch, it might look very similar to the
     current script MIB.
   o We need to define the management of the environment in which
     these scripts will run, plus one or more languages that will be
     likely to be available on most platforms.
   o It was pointed out that scripting is not a mgmt app; it is a way
     to implement a mgmt app. What is the particular management
     problem we are solving using scripting as the means?
   o Benefits include delegation, scalability.
   o Scripting is a way to do these things in a flexible way. SNMP
     MIBs are a possibly less costly way to achieve a limited solution
     to the problem. Should we concentrate on the flexible solution or
     a limited SNMP solution?
   o We should consider the new paradigms available; the product of
     this WG need not be just MIBs.

Maria pointed out that we need concrete proposals posted to the list
for discussion. The Threshold MIB (M2M) and Script MIB are published
proposals. Need equally concrete proposals for the alternatives. A
July 15 deadline for posting proposals was suggested, but we agreed to
discuss this on the mailing list.

--------------------
Threshold MIB - Waldbusser

Steve presented an outline of the Manager-to-Manager MIB (last posted
as RFC1451).

M2M MIB layout
   snmpAlarmTable
      derived from RMON
      specifies variable to monitor and thresholds to apply
      each row contains data for thresholding to create event
   snmpEventTable
      defines a specific event type
   snmpEventNotifyTable
      defines list of destinations for each event type
      one or many endstations receive events

ToDos:

   o functional enhancements to threshold monitoring (64 bit ranges,
     simpler test for equality, expressions, etc.)
   o abstract out access control so that it is independent of
     security.
   o abstract out transport destinations; do in a way that is reusable
     for other MIBs, e.g., Ping MIB.

Discussion:

   o Why is there a requirement to use an inform? There isn't - either
     a trap or inform will work.
   o Is delivery guaranteed? No, but using Informs will at least add
     acknowledgement.
   o How will uptimes affect this? (General SNMP mgmt problem.)
   o Should extend to an expression that can reflect table row
     appearance/disappearance etc.?
   o Tying together arbitrary things in AND/OR expressions is
     difficult to do using MIBs. (Brian O'Keefe has tried to make a
     version of M2M with expressions and may have some ideas.) Script
     may be a better idea.

-----------------
Script MIB Enhancements, Group Membership MIB - Schoenwaelder

Juergen presented his experience with an implementation done at
UTwente (the Netherlands) that started with SNMP Research's MLM MIB.

This is a rough transcription of Juergen's slides. [slides are
available immediately from UTwente web site including pointers to
papers]

Enhancements:
   1. make the SNMP Research MLM MIB language independent
   2. make MIB independent of execution environment
   3. use HTTP for transport rather than SNMP sets
      o efficiency
      o security mechanisms differ between HTTP and SNMP
   4. history of results stored in MIB
   5. automatic time stamp
   6. split up script MIB

Problems:
   1. control of resources
   2. user profiles to privileges in execution environment
   3. protect the script from resource problems
   4. how to transfer scripts efficiently
   5. how to verify scripts (digital signatures?)

The UTwente project also developed a "Group Membership MIB":

Group Membership MIB:
   1. define and delegate mgmt functions dynamically
   2. allows one to define arbitrary site specific policies
   3. structure can change dynamically

Mechanism:

   o Used mcAliveTrap SNMP traps to a well known IP address. Allows
     new agents to be recognized.
   o Master election - allocates resources on the group of agents.
   o Easy to implement; works with usec
   o Requires IP multicast, which isn't supported in all environments.
   o Voting algorithm may need discussion.

Problems:
   1. distributed nature introduces new mgmt problem
   2. necessary to discover current structure of the system
   3. necessary to coordinate activities in the distributed mgmt
      system.

Discussion:

   o also get script versioning interoperability problems
   o these would therefore be the problem with the worm approach?
      - a replicating agent may have a mutation
      - CM versioning/control will be needed.
   o MIB development has built-in version control. This will be needed
     for and should be considered a requirement of this WG's output.
   o Do you have problems with looping caused by having the same
     script running on multiple nodes? That's why the master
     election - to coordinate the scripts.
   o From v2 WG history, there was a proposal to use a similar
     technique for discovery. (apparently, a woman from DEC?)

-----
MLM experience - Richard Buckman, IBM.

Richard Buckman (IBM) discussed IBM's experiences with building and
deploying a distributed management solution. He described how when IBM
implemented mid-level management, they replaced the scalability
problem with an administrative problem.

The functions provided in the MLM were:
   Thresholding
   Data collection
   Status monitoring (ping mib)
   Discovery
   Local trap processing
   Local system management
      file monitoring
      command table

Problem #1: realms of responsibilities are not static
Problem #2: SNMP is not good for moving data files
Problem #3: SNMP traps are not guaranteed to arrive
Problem #4: The management system doesn't stay up

A number of administrative problems arose:

Simple admin problems:
   Remote installation
   Community names file
   Trap destination table
   Remote config tool (only does 1 MLM at a time)
   Replication of compiled MIB file
      needed for name to oid translation
      needed for counter's uptime
      needed for trap messages
   Retrieval of collected data
   Retrieval of log files

Complex admin problems:
   Fault tolerance
      Discovery of MLMs
      Status check of MLM
      Backup MLM on failure
      Hierarchy helps solve this problem
      (P.S.: customers want state maintained) (i.e.
      correlate status events before and after failover)
   Synchronization of states
      Guarantee inform delivery
      Cold start/warm start currently reset threshold states
      Traps generated for other reset conditions (disable or deletion
      of policy)
   Domaining
      Used to allow one configuration tool to configure all MLMs
      Policy Domain - determines which nodes to apply policy to
      Management Domain - defines realm of responsibility

Implications
   Usage of MIB var names makes MIB data necessary at MLM
   Community name file needs to be moved around unless you specify
   community name in each row
   Need trap destination capability
   Dependency on index in grouping table begs for sub-agent
   notification/query mechanism
   Architected traps
      Agent addr is no longer the address of the target system
   Name space collisions between top-level managers sharing the same
   MLM
   Abstraction was difficult
   Group-specific policies generated

Their solution:
   Group table at the MLM
   Threshold checks have specific times to check
   Status checks have specific times to check
   When a mgr finds a new node, it assigns it to a group
   MLMs must be aware of changes to the group table since their
   responsibility may change as a result.

[Is IBM going to make the MIBs public? Don't know, but most of the
info is available to customers and should be open to others.]

Conclusion: it is doable, but not "simple".

----------------------------
Wrap-Up
-------

We discussed where we should go from here and what's next.

   o This was a useful discussion, but future work should be based on
     published documents.
   o Authors/editors will try to post versions of Framework,
     Thresholding, and Script documents as soon as possible, keeping
     in mind schedule from Charter.
   o Neither email nor IETF meetings of this size are an effective way
     to do this.
   o Would people like to see an interim meeting to get this moving
     forward? Conclusion: only if necessary (try to avoid), but many
     would attend.