IEN #31
                 On Names, Addresses and Routings (II)

Dan Cohen
ISI
28 April 1978


            2.3.3.11  On Names, Addresses and Routings (II)


SUMMARY

This note deals with internetwork addressing and routing.

It suggests the following:

     (1)  Some systems have a tree-like  hierarchical  Universal-Address
          (UA) space.

     (2)  The communication connectivity is a superset of this tree.

     (3)  The postal system, the telephone  and  the  ARPANET  are  such
          systems.

     (4)  The address of  a  process  tells  where  it  is  located  (or
          connected  to)  by specifying the route to it from the root of
          the universal addressing tree.

     (5)  The default routing to any address (unless  a  better  one  is
          specifically  known)  is  up the UA-tree, from the source, and
          down the tree to the destination.

          In the case of networks like the ARPANET, the set of  all  the
          IMPs  (the subnet) is considered as a single process, known as
          {ARPANET}.

     (6)  Since the set of all networks is not connected, it  cannot  be
          tree  structured.   However, for ease of name-management it is
          possible to introduce (e.g., administratively)  any  arbitrary
          hierarchy   to  the  address  space  of  all  networks.   This
          hierarchy is "artificial" because it is not a  subset  of  the
          communication connectivity.

     (7)  Since there is no hierarchical structure to the space  of  all
          networks,  there  is  no  tree-like  hierarchical internetwork
          Universal-Addressing scheme.

          In particular, the notion of extending addresses like

            {NET}-[HOST]-(PORT) or {NET}-/IMP/-[HOST]-(PORT)

          upward to include METANETs,  SUPERNETs  and  GIGANETs  suffers
          from   the  lack  of  corresponding  underlying  communication
          structure.

     (8)  Since the set of all networks is too big  to  be  captured  in
          local  tables,  and since routing cannot be derived in general
          from the addresses, it should always either be a priori known,
          or  supplied  by the source.  This a priori knowledge does not
          include every step (e.g., the sequence of intermediate IMPs or
          PRUs).   It  has  to include only a sequence of addresses such
          that the routing between them is locally known.

     (9)  The corollary of this note is

          OPTIONAL  SOURCE  ROUTING  SHOULD  BE  IMPLEMENTED.

This note explains in great detail several  aspects  of  the  addressing
schemes  used by the postal and telephone systems.  It also mentions the
ARPANET addressing.

These examples are used to support the arguments which are summarized in
paragraphs (1)-(9) above.

Unless one is very interested in the details of these systems and  their
relevance  to  the  internetwork  environment,  or in the details of the
arguments, there is no need to continue reading this  note  beyond  this
point.


INTRODUCTION

Before  discussing  the  internet  addressing/routing  issues   let   us
summarize the basic concepts:

     -    Processes (not hardware pieces) have names and addresses.

     -    A process may have more than one name and more  than  one
          address,  but each name and each address corresponds to a
          single process.

     -    The name tells WHAT the process is.

     -    The address tells WHERE the process is.

     -    The route tells HOW-TO-GET-THERE.

The last three beautiful definitions are  borrowed  (with  appreciation)
from John Shoch.

More detailed  discussion  of  names  and  addresses  may  be  found  in
"Internetwork   Naming,  Addressing,  and  Routing"  [Internet  Notebook
Section 2.3.3.5, IEN 19] by John Shoch, and in "On Names, Addresses  and
Routings" [Internet Notebook Section 2.3.3.7, IEN 23] by Danny Cohen.

Generally the concept of address is well understood, but the concept  of
routing  is  much  more  complex.   Who  performs the address-to-routing
mapping? How is it performed? These and similar problems are  the  topic
of this note.

Most of us are familiar with the postal, the telephone, and the  ARPANET
addressing  schemes.   We  also  have  a  very good understanding of the
routing processes performed for these networks.  In this note we discuss
the   similarity  of  these  addressing  schemes,  and  argue  that  the
internetwork environment violates the basic axiom  which  is  common  to
them,  and  therefore  the internetwork environment requires a different
addressing and routing philosophy.

This is why we have so much trouble with internet addressing  --  it  is
different.  Our experience cannot be used as a model.


ON OTHER ADDRESSING/ROUTING SCHEMES

Three  addressing/routing  schemes  are  discussed:  the   postal,   the
telephone, and the ARPANET schemes.

The postal addressing scheme is a UA-scheme.  It is  defined  for  human
processing, and therefore may tolerate a fair amount of redundancy which
improves the robustness of the scheme with  respect  to  errors  and  to
partial losses (like stains on envelopes).

At the top level of  the  hierarchy  there  is  the  country  name,  and
underneath  it  there  are  as  many  addressing  schemes  as  there are
countries.  Some countries use  ZIP  codes  which  identify  major  post
offices,   and   require  more  information  to  identify  the  terminal
addressee.  Other countries use the ZIP  code  (or  its  equivalent)  to
identify  the  individual  letter-carriers,  and require less additional
information to identify the terminal addressee.  In some cases, a street
address is sufficient.  In others, suite number and names are required.

In summary, the postal addressing tree is of variable depth (or:  postal
addresses  are  of  variable length).  Its top level (the country level)
has a complete connectivity, since every country knows how to route mail
to any other one.

Letters are routed, either directly to the destination or to one of  its
ancestors.   This  is  performed  by  POs  either  directly or via their
ancestors.

Since the address and routing processing is not fully  automated,  human
intelligence  is used for resolving ambiguities, for coping with unknown
addresses, for redundancy handling and the like.  Missing information is
usually  supplied  by  using  common sense and defaults.  As a result, a
great amount of variability in addresses can be tolerated.

For example, while at Tech, I received letters addressed to

          Danny          (I was the only Danny there)
          256-80         (The Computer Science Mail-stop)
          91125          (The zip code for Caltech)

I also received letters addressed to:

          Dr. Dan Cohen
          Computer Science, Mail-Stop 256-80
          California Institute of Technology
          Pasadena, California 91125

The first address contains all  the  needed  information,  but  requires
delicate  handling.   If  any  of  the digits is mistyped, there is very
little  chance  that  the  letter  would  be  delivered  to  the   right
destination.   The  second  address,  which  has  about  six  times more
characters, is more robust, and can cope with the multi-Danny situation,
which  (even  though  undesirable)  is  still  very probable.  The terse
address has to be modified if another Danny joins this Mail-Stop.

The phone addressing is considered next.  Like the postal  system,  each
telephone station has a UA.  It always starts with the country code, and
usually continues with a Numbering-Plan-Area (NPA, which is the familiar
Area-Code,  AC),  continues  through  a  Central  Office (CO) number and
terminates with the station number.  Even though the above fields are of
variable  length,  the  total  length never exceeds 12 digits.  However,
when private networks are connected to the universal phone  system,  the
entire address length may exceed this limit.

For example, my current phone number  is  12138221511105,  which  is  14
digits  long.   The  first "1" is the USA code, and the last "105" is my
station (extension) number.

At the top level of  the  hierarchy  there  is  the  country  code,  and
underneath  it  there  are  as  many  addressing  schemes  as  there are
countries.  All  countries  know  how  to  communicate  with  any  other
country.   In  both  systems  addresses  are  of variable length, and by
looking at an arbitrary address one cannot parse it into fields  without
knowing the specifics of each national addressing scheme.

Here are some examples of telephone addresses and their correct  parsing
as COUNTRY-NPA-CO-station:

          1-213-822-1511 (USA, LA)
          44-1-387-3400  (UK, London)
          44-31-332-2424 (UK, Edinburgh)
          44-745-58-3301 (UK, Wales)
          972-4-25-2690  (Israel, Haifa)
          972-67-4-0777  (Israel, Kiryat Shmona)

Obviously, the number of fields to be dialed depends on "distance"  from
the destination.
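
The parsing of the examples above can be sketched in a few  lines.   This
is  an illustration only, not a description of any real numbering plan.
The tables are hypothetical and hold nothing but the six numbers  listed
above,  and  the  sketch  assumes  (as  happens to hold for all six) that
the station number is the last four digits.  The  names  NATIONAL_PLANS,
STATION_DIGITS,  match_prefix  and parse_phone_address are made up for
this sketch.

```python
# Hypothetical national tables, built only from the six examples above.
# Country code -> the NPA codes known for that country.
NATIONAL_PLANS = {
    "1": ["213"],             # USA
    "44": ["1", "31", "745"],  # UK
    "972": ["4", "67"],        # Israel
}

# Assumption: the station number is the last four digits.
STATION_DIGITS = 4


def match_prefix(digits, candidates):
    """Return the one candidate code that `digits` starts with."""
    for code in candidates:
        if digits.startswith(code):
            return code
    raise ValueError("no known code at the start of " + digits)


def parse_phone_address(digits):
    """Split a full number into (COUNTRY, NPA, CO, station), left to right."""
    country = match_prefix(digits, NATIONAL_PLANS)
    rest = digits[len(country):]
    # The NPA field can only be found with the national table in hand.
    npa = match_prefix(rest, NATIONAL_PLANS[country])
    rest = rest[len(npa):]
    if len(rest) <= STATION_DIGITS:
        raise ValueError("number too short: " + digits)
    return (country, npa, rest[:-STATION_DIGITS], rest[-STATION_DIGITS:])
```

Without the per-country table the same digit string  cannot  be  split,
which is the point made above about national addressing schemes.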

Since the telephone routing is performed by (relatively)  simple-minded
automated  equipment,  no  variability  in  dialing  a number is allowed
(except in very unusual situations).

Like the postal addresses,  phone  dialing  sequences  are  of  variable
length,  as a function of the distance to the destination.  While adding
the country code ("USA") would not hurt either of the  postal  addresses
shown  above,  adding  the  local AC and country code is not allowed for
local phone calls.

The reason is that one does not dial the  address  (telephone  numbers),
but dials the routing! The purpose of the telephone number is to be used
for accounting and for deriving routing, but not for verbatim-dialing.
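
The difference between the address and what is  actually  dialed  can  be
sketched as follows.  This is a deliberately simplified model: it assumes
that only the leading fields not shared by  source  and  destination  are
dialed,   and  it  ignores  access  prefixes  and  private  extensions.
The function name dialing_sequence is made up for this sketch.

```python
def dialing_sequence(source, destination):
    """Derive what is dialed from two (COUNTRY, NPA, CO, station) addresses.

    Simplifying assumption: a leading field is dialed only when the
    source does not share it with the destination.
    """
    s_country, s_npa, _s_co, _s_station = source
    d_country, d_npa, d_co, d_station = destination
    if s_country != d_country:
        fields = [d_country, d_npa, d_co, d_station]  # international call
    elif s_npa != d_npa:
        fields = [d_npa, d_co, d_station]             # other area, same country
    else:
        fields = [d_co, d_station]                    # local call
    return "-".join(fields)
```

The same destination address therefore yields different dialed strings
from different sources, while the address itself never changes.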

The actual telephone routing (i.e., hop-by-hop) is  very  efficient  and
deserves   attention.    The   key  to  it  is  the  existence  of  more
communication lines than branches in the UA tree.

The USA is divided into 10 regions, which subsequently are divided  into
sections  and  areas.   The  grouping of areas into sections and regions
cannot be simply deduced from the ACs but has to be  found  by  a  table
lookup  operation.   This is because the ACs were most cleverly assigned
to areas by population size and not geographically like the ZIP codes.

The routing is performed by each center assigning each call  to  a  line
known  to  be  connected (or enroute) to the destination central-office.
This is performed by looking at the first  six  digits  of  the  address
(AC+CO).

If such a line does not exist, then the call is assigned to a line known
to be connected to the principal city of the destination NPA.  If such a
line does not exist either, the call is forwarded to  the  center  above
this one.  Since at the top all regions are interconnected, this process
is guaranteed to terminate.
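
The three-step rule just described can be sketched as follows.  This  is
a minimal model, not the actual switching logic: the Center record, its
field names and the next_hop function are assumptions of  this  sketch,
and an address is taken to be the national digits (AC+CO+station).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Center:
    """A switching center (hypothetical record for this sketch)."""
    name: str
    parent: Optional[str] = None  # the center above this one, if any
    # Six-digit AC+CO -> line known to lead to that central office.
    co_lines: Dict[str, str] = field(default_factory=dict)
    # Three-digit AC -> line to the principal city of that NPA.
    npa_lines: Dict[str, str] = field(default_factory=dict)


def next_hop(center, address):
    """Pick the outgoing line for `address` at `center`."""
    ac_co, ac = address[:6], address[:3]
    if ac_co in center.co_lines:     # 1. a line toward the destination CO
        return center.co_lines[ac_co]
    if ac in center.npa_lines:       # 2. a line toward the NPA's principal city
        return center.npa_lines[ac]
    if center.parent is not None:    # 3. give the call to the center above
        return center.parent
    # Not expected at the top level, where all regions are interconnected.
    raise LookupError(center.name + " cannot route " + address)
```

Step 3 is the default "up the tree" routing; steps 1 and 2 use the extra
lines which are not branches of the UA tree.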

In addition to the six-digit recognition, the system is  designed  such
that  in  many  cases  CO  numbers  do  not  conflict across NPAs.  This
eliminates the need to dial the AC of a neighboring  town  which  lies
across  a  state  line and is therefore necessarily in a different NPA.
This allows, for example, dialing from Washington, D.C.,  (AC=202)  to
Alexandria,  Va., (AC=703) and to Potomac, Md., (AC=301) without dialing
the AC.

The third addressing scheme is the one used in the ARPANET.

ARPANET addresses are addresses of processes: either NCP-like processes
in  actual  hosts,  or processes of other types in "fake" hosts.  It is
logical to extend the address notion "down" to include the port, too.

Conceptually, one can consider the ARPANET as a single process or  as  a
star  network.   The  fact  that this single process is implemented in a
very clever way by a multitude of IMPs is irrelevant, from a  functional
point  of  view.   This  allows treating this entire network as a single
addressable process, the {ARPANET}, if so required.

Each IMP knows how to forward messages to any other, and  therefore  all
IMPs constitute the top-level (and the only level) for routing.

In the ARPANET all the connections are between centers  (nodes)  of  the
same  (and  only)  level.  This is in contrast to the telephone network,
which has several levels of centers, partially connected at  all  levels
(except  the  top,  where  they  are fully connected) and also partially
connected between levels.

An ARPANET address is of the form {ARPANET}-[HOST]-(PORT), which one may
consider  as  {ARPANET}-/IMP/-[HOST]-(PORT).  Adding the /IMP/ field may
help in the ARPANET situation, though some generality will be lacking.

This is a valid address, since the {ARPANET}-process (which is  the  set
of  all  IMPs)  can  forward  messages  to all hosts, and hosts can give
messages to PORTs.  Therefore, routing every message up to the {ARPANET}
and then down through the host to the destination port is a good default
routing strategy.
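
The default "up and then down" routing can be sketched  as  follows.   An
address  is  modeled  as  a  tuple of fields read from the root, and the
route climbs from the source only as  far  as  the  first  ancestor  it
shares  with  the  destination  (an  assumption  of this sketch; for two
ARPANET hosts that ancestor is {ARPANET} itself).   The  function  name
default_route is made up for this sketch.

```python
def default_route(source, destination):
    """Default route in a UA tree: up from the source, down to the destination.

    Both arguments are tuples of fields starting at the root.
    Returns the list of tree nodes visited, each given by its address.
    """
    common = 0
    for s, d in zip(source, destination):
        if s != d:
            break
        common += 1
    if common == 0:
        raise ValueError("the two addresses do not share a root")
    # Climb from the source to the shared ancestor ...
    up = [tuple(source[:i]) for i in range(len(source), common - 1, -1)]
    # ... then descend along the destination address.
    down = [tuple(destination[:i])
            for i in range(common + 1, len(destination) + 1)]
    return up + down
```

For  example,  from  ("ARPANET", "ISI", "MAILBOX")  to  a  port  on
("ARPANET", "PARC-MAXC") the route passes through  ("ARPANET", "ISI"),
("ARPANET",) and ("ARPANET", "PARC-MAXC") in that order.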


INTERNETWORK ADDRESSING

After this (very)  long  introduction,  let's  return  to  the  Internet
Addressing situation.

But first, let's introduce some more formality:

     *    An address is a string (i.e., an ordered sequence) of symbols
          taken   from   a  given  alphabet  (e.g.,  ASCII,  {0,1},
          {1,2,...,9,0}).

     *    In a UA tree the level of a given address is its depth in
          the tree.  The level of the address A is denoted by L(A).

     *    Address concatenation (extended to  the  right)  will  be
          used.  The concatenation is denoted by a "-".

     *    If both A1 and A2 are addresses of the  same  process  P,
          then

               (1)  L(A1) and L(A2) may differ, and

               (2)  The addresses (A1)-(X)  and  (A2)-(X)  are
                    necessarily addresses of the same process.

     *    Addresses  should  always  be  decodable  in  a  strictly
          left-to-right sequential manner (prefix coding).
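
These definitions can be made concrete with a small sketch.  It  models
an  address  as  a  tuple  of symbols and resolves it strictly from left
to right.  The Process class, the functions level and resolve, and  the
labels used in the usage example are all made up for this sketch.

```python
class Process:
    """A node of the addressing structure (hypothetical class)."""

    def __init__(self, label):
        self.label = label
        self.children = {}  # symbol -> Process reached by that symbol


def level(address):
    """L(A): the depth of address A, i.e. its number of symbols."""
    return len(address)


def resolve(root, address):
    """Decode an address left to right, one symbol at a time."""
    node = root
    for symbol in address:
        node = node.children[symbol]  # KeyError means an unknown address
    return node
```

As a usage example, let a host be reachable both as  ("NET", "HOST")  and
through a shorter alias ("ALIAS",):

```python
root, net, host, port = (Process(x) for x in ("root", "net", "host", "port"))
root.children["NET"] = net
net.children["HOST"] = host
root.children["ALIAS"] = host  # second address of the same process
host.children["PORT"] = port

a1, a2, x = ("NET", "HOST"), ("ALIAS",), ("PORT",)
assert level(a1) != level(a2)                        # property (1)
assert resolve(root, a1 + x) is resolve(root, a2 + x)  # property (2)
```

Here tuple concatenation plays the role of the "-" operator.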

What is the ARPANET address of my mailbox?

In the TENEX environment it is [ISI]-(MAILBOX)-<DANNY>.  But in  another
environment,  in  the  host  [X], it could be [X]-<DANNY>-(MAILBOX).  Or
maybe in the form [X]-/TCP/-(MAIL.DEPO)-<DANNY>.

Obviously we want to expand it upwards, to allow for other networks.  We
could simply add in front of these addresses a network field and get

                   {ARPANET}-[ISI]-(MAILBOX)-<DANNY>

and similar addresses.  In other networks the value of the NETWORK-field
may be {PRNET-SF}, {PRNET-BOS}, {SATNET}, etc.

What is the relation between these nets? Do they all belong to the  same
parent like USA or USA-DoD?

This could be a solution, and if adopted my mail address may  be  upward
expanded in the UA scheme to be like:

{GALAXY-573}-[SOLAR.SYS]-(EARTH)-<USA>-{ARPANET}-[ISI]-(MAILBOX)-<DANNY>

With  a  clever use of defaults, the first several fields may be omitted
from most of the intraglobal communication.

The extension of this address upward, to include METANETs, SUPERNETs and
GIGANETs,  is  very  elegant.   This  is the prevailing popular approach
expressed in a series of notes and papers,  such  as  Ken  Harrenstien's
note,   and   various   other  communications  between  the  members  of
[SRI-KL]<NETINFO>FIELD-ADDRESS.LIST.  The  intelligent  reader  probably
has  noticed  by  now  that  this  note  does  not subscribe to the same
philosophy.

This solution suffers from several problems.   What  is  a  network?  Is
every  bus  to  which several processes are connected a network? If not,
why? What is the relation between networks?

In order to stress the difference between the structure of the  internet
address  and  of  the other universal addressing schemes, let's consider
the following example.  One of the ARPANET hosts, [PARC-MAXC],  is  tied
to a private network.  This network is actually a very rich internetwork
environment, with about  14  networks  and  about  400  hosts,  but  for
simplicity let us consider it functionally as a single {XEROX-net}.  One
of the  hosts  on  this  net  is  [RIG],  the  University  of  Rochester
Intelligent  Gateway  to the internal {U-of-R-net}.  This description is
not  accurate,  since  [RIG]  is  actually  connected  directly  to  the
{ARPANET},  not to {XEROX-net}.  But for the sake of the argument let us
assume this connectivity.  We preferred not to use other actual examples
for several reasons.  A poet's-license is nice to have....

One of the hosts on the {U-of-R-net} is, say,  [NOVA-3].   What  is  its
address? Obviously it is

     {ARPANET}-[PARC-MAXC]-{XEROX-Net}-[RIG]-{U-of-R-Net}-[NOVA-3].

By the way, whenever an intermediate host in such a specification (i.e.,
a gateway) is between two networks only, there is no need to specify the
destination network, since it is uniquely defined by context.   However,
this  is  not  a  good practice, since addresses have to be changed when
this host is connected to more networks.

What is [ISI]'s address? Obviously {ARPANET}-[ISI].

"Not so!" scream the U-of-R people.  The addresses of  [NOVA-3]  and  of
[ISI] are quite different from the point of view of [NOVA-2].  According
to it, the address of [NOVA-3] is simply  {U-of-R-Net}-[NOVA-3]  but  of
[ISI] is {U-of-R-Net}-[RIG]-{XEROX-Net}-[PARC-MAXC]-{ARPANET}-[ISI].

Who is right? Neither!! Both approaches are equally wrong.  Neither  of
these  addresses  is above the other in the UA scheme.  All the networks
involved in the interconnection of these networks are  of  equal  level,
unless  we  decide  otherwise  for administrative reasons.  The internet
communication environment does not have up-and-down relations, except in
the eyes of some users, which may be very subjective.
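
The dependence of such "addresses" on the point  of  view  can  be  shown
with  a  short sketch.  It uses the connectivity assumed in the example
above (which, as noted, is not the actual one) and derives  the  string
of  networks  and  gateways  leading  to  a host from a chosen home net.
The names LINKS, neighbors and address_as_seen_from  are  made  up  for
this sketch.

```python
from collections import deque

# Connectivity assumed in the example above (not the actual one).
LINKS = [
    ("{ARPANET}", "[ISI]"),
    ("{ARPANET}", "[PARC-MAXC]"),
    ("[PARC-MAXC]", "{XEROX-Net}"),
    ("{XEROX-Net}", "[RIG]"),
    ("[RIG]", "{U-of-R-Net}"),
    ("{U-of-R-Net}", "[NOVA-2]"),
    ("{U-of-R-Net}", "[NOVA-3]"),
]


def neighbors(node):
    """Yield everything directly attached to `node`."""
    for a, b in LINKS:
        if a == node:
            yield b
        elif b == node:
            yield a


def address_as_seen_from(home_net, target):
    """Breadth-first search from `home_net`; the path found is the 'address'."""
    paths = {home_net: [home_net]}
    queue = deque([home_net])
    while queue:
        node = queue.popleft()
        if node == target:
            return "-".join(paths[node])
        for nxt in neighbors(node):
            if nxt not in paths:
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    raise LookupError(target + " is not reachable from " + home_net)
```

Called with {ARPANET} as the home net, the sketch reproduces  the  long
string for [NOVA-3] and the short one for [ISI]; called with {U-of-R-Net}
it reproduces the opposite pair.  No choice of home net is privileged.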


INTERNETWORK ADDRESSES ARE DIFFERENT

Telephone stations are always connected to the system network, and their
position in it dictates their addresses.  Not so with computer networks,
which  spring  into  existence  independently and stay independent until
they  are  interconnected  with  others,  if  ever.   Therefore,   their
addresses  cannot  be  deduced  from their positions (geographically or
connection-wise) and vice versa.

Therefore, the network ID is an arbitrary string.  Who  assigns  it  and
makes sure that no conflicts occur? Is it NBS? Jon Postel? Another Czar?

At this point we suggest that:

     *    There is no universal hierarchy of networks, in  contrast
          to the telephone and the postal systems.

     *    There are too many networks to be named and/or  addressed
          in a single flat name/address space.

     *    Therefore, some naming/addressing  hierarchy  has  to  be
          introduced.  However, this addressing tree does not serve
          as  a   basis   (or   underlying   structure)   for   the
          communication connectivity.

     *    Routing between two arbitrary points cannot  be  computed
          from the addresses alone.

How should  internetwork  routing  be  performed?  There  are  obviously
several  possibilities.   It could be performed entirely by the networks
involved (i.e., the communication system), by  the  source,  or  by  any
combination of the communication system and the source.

It is always  desirable  to  distribute  the  knowledge  about  possible
destinations to the various centers (gateways?), such that their "sphere
of knowledge" is as large as possible, though uniformity should  not  be
required.   More  knowledge  should  be kept about frequent destinations
than about less frequent ones.

Since this information must be limited due to  practicalities  (such  as
finite  storage  and  updating  procedures),  it  is impossible that all
sources always know about all possible destinations.

What should be done about unknown destinations?

Several possibilities may be considered.

     o    Having a `supernet' of default subdestinations  with  the
          hope  that  they  know  how to find a way to the terminal
          destination (like the phone and the postal systems),

     o    providing internetwork-wide directory services, or

     o    refusal of service.

Let's consider each of these three possibilities.

Having both <USA>-{ARPANET} and <USA>-{XEROX-Net-3} does  not  guarantee
the  existence  of the path {ARPANET}==<USA>=={XEROX-Net-3}.  Therefore,
the supernet (metanet?) is not a part of  the  underlying  communication
structure as in the phone and postal situations.

In order for this approach to work, one has to create this supernet  and
keep   it   updated   about  all  changes  of  the  entire  internetwork
environment.  Both the storage and the updating procedures  seem  to  be
impractical.

The  second  approach,  the  network-wide-directory-help,  is   a   very
reasonable  one.   It  has a major drawback in the necessity to maintain
centers with indefinite  knowledge  radius.   Note  that  the  telephone
directory  services are structured according to the addressing hierarchy
of NPA-CO, and are not flat as may be  suggested  for  the  internetwork
environment.   In  essence, this second approach has all the problems of
the first one.

The third approach, namely, refusal of service to unknown  destinations,
is  consistent,  to  say  the  least.   Admittedly,  it  seems  like  an
inconvenience to users.  It should be supplemented by ways of "learning"
about the unknown destinations, such that refusal should never occur, or
at most, only in very rare  situations.   Directory  services  could  be
helpful.

Hence, this approach supports serving only  destinations  to  which  the
routing  is either a priori known or supplied.  Note how similar this is
to the telephone system philosophy.

In summary: The internet routing problem is different from  the  routing
problems of other systems because the internet environment does not have
a communication connectivity which supports a UA scheme,  and  therefore
the  addresses  cannot  support  a  direct address-to-routing mapping by
using only a definite amount of knowledge.


I HATE TO ADMIT IT, BUT ...

At the beginning of this note, and in an earlier note, I  used  a  great
line  saying that "names tell what the processes are, and addresses tell
where they are." It continues with "routings tell how to get there."

I hate to admit  that  by  now  I  have  some  reservations  about  this
definition.   My  name  is  "Danny."  My address is "ISI." When I was at
Tech, my name was  the  same,  but  the  address  was  different.   This
supports  the  definition.   How  about  the addresses in a broadcasting
media network? When a host changes its position (location) on  the  same
Ethernet,  its address does not change.  Well, maybe these addresses are
not real addresses, according to the definition.  Admittedly, this is an
uncomfortable thought.

I believe that there is a better explanation.  I suggest that an address
is "the canonic routing from the root of the addressing-tree." It sounds
recursive, doesn't it?

To be more precise, an addressing scheme is a hierarchical  organization
of  elements,  with  code assignment such that each element has a unique
set of codes, corresponding to its position in the hierarchy.

The notion that the address tells how-to-get-there from the root of  the
tree  is very similar to the notion that absolute coordinates are really
relative, with respect to the origin.

Since we know (by default) how to get from the source to  the  UA  root,
and since the address tells how to get to the destination from the root,
the address tells how to get from the source to the destination.

Hence, by definition, addresses are routings.

This leaves us only with names and routings.  This should  not  surprise
us  now,  since we already discovered that the telephone system has only
names and routings.

Since the general internet environment does not have  a  hierarchy,  the
notion  of addresses suffers.  Since the addresses are "routing from the
root," and since there is no root of the entire system,  our  conclusion
is that there are no addresses, only names and routing.  In other words,
what we are used to calling an address is actually a routing (even though
it  is  of higher level than the hop-by-hop routing).  In a well defined
(and tightly  controlled)  environment,  such  as  the  {ARPANET},  this
address/routing  is  a  well  defined  string.  In general, it may be of
indefinite length and structure.

If the destination is in the neighborhood, such as the same network, the
same nets-cluster, the same agency (even on a different net), the system
may have built-in knowledge of how to get to it.  Otherwise the  routing
information needed should be supplied by the sender.


PROPOSAL FOR ADDRESSING AND ROUTING

Our proposal for addressing and routing is as follows:

     *    Establish a UA scheme, of variable level structure.

     *    Disseminate as much knowledge to each participating  node
          as deemed practical.

     *    Allow the option of routing to be included in the headers
          of the messages.

     *    Refuse delivery of messages to a destination with unknown
          routing.

     *    Establish an internet-directory-assistance service.

The  proper  use  of  the  optional  routing  is  to  supply  a  set  of
subdestinations  which  may  be  as far apart as the networks can handle
without help.  This is very  much  like  source  routing  for  telephone
connections,  where  a  sequence of switching-centers is designated by a
user, but the communication subsystem is free to optimize the hop-by-hop
routing between these centers.
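
A minimal sketch of delivery with optional  source  routing  is  given
below.   Each  node holds only a limited table of the destinations it
knows; the sender may supply subdestinations, and delivery  is  refused
when a node has no routing to the next one.  The names UnknownRouting
and deliver, the table layout and  the  hop  limit  are  assumptions  of
this sketch, not part of any existing protocol.

```python
class UnknownRouting(Exception):
    """Raised when a node has no routing to the next (sub)destination."""


def deliver(tables, start, destination, source_route=(), max_hops=32):
    """Forward a message hop by hop and return the list of nodes visited.

    tables[node][target] is the next node toward `target`, if `node`
    knows one.  `source_route` is the optional list of subdestinations.
    """
    pending = list(source_route) + [destination]
    here, path = start, [start]
    while pending:
        target = pending[0]
        if here == target:        # subdestination reached, take the next one
            pending.pop(0)
            continue
        if len(path) > max_hops:  # guard against inconsistent tables
            raise UnknownRouting("hop limit exceeded")
        known = tables.get(here, {})
        if target not in known:   # refusal of service
            raise UnknownRouting(here + " has no routing to " + target)
        here = known[target]
        path.append(here)
    return path
```

With the simplified connectivity used earlier in  this  note,  suppose
that  [ISI]  knows  only  how  to  reach [PARC-MAXC], that [PARC-MAXC]
knows [RIG], and that [RIG] knows [NOVA-3].  A message from  [ISI]  to
[NOVA-3]   is   then   refused   unless   the   sender   supplies  the
subdestinations [PARC-MAXC] and [RIG].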

As long as the number of  participating  networks  in  the  internetwork
environment  is  small,  it  is possible to have each of them know about
routing to all the others.  However, we already have a  large  community
of  networks,  including several PRNETs, networks in universities, about
15 networks at Xerox, the commercial and the national ones, many in DoD,
all  the  DEC-Nets,  and  many  others.   In  addition,  each big modern
substantial computing system is a network.

One may be advised to expect the number of networks  to  grow,  and  the
internet connectivity to get more and more obscure.

Therefore, optional source routing seems to be the  most  sensible,  and
the  only,  alternative.  This does not exclude the notion that internet
clusters of any size optimize their internal routing by any scheme,  for
example by an ARPANET-like dynamic routing scheme.

We recommend that the  optional  source  routing  be  composed  of
self-identifying  fields,  i.e.,  fields  which identify their own level.
Note that this is like the telephone dialing sequence.

The   self-identifying   fields   could   be   implemented   either   by
codes-exclusion   (like  the  telephone  system)  or  by  identification
subfields.  Obviously these two schemes are equivalent, and  the  choice
between them is just a matter of convenience.
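
The equivalence of the two schemes can be illustrated  by  a  sketch  in
which  both  decode  to the same list of (level, value) pairs.  The code
sets, the one-letter tags and the function names  are  hypothetical  and
chosen only for this illustration.

```python
# Scheme 1, codes-exclusion: each level draws its codes from a set that
# no other level uses, so a code identifies its own level.
EXCLUSIVE_CODES = {
    "net": {"ARPANET", "XEROX-Net"},
    "host": {"ISI", "PARC-MAXC", "RIG"},
}

# Scheme 2, identification subfields: each field carries a level tag.
TAGS = {"N": "net", "H": "host"}


def decode_by_exclusion(fields):
    """Recover the level of every field from the code alone."""
    decoded = []
    for code in fields:
        levels = [lvl for lvl, codes in EXCLUSIVE_CODES.items() if code in codes]
        if len(levels) != 1:
            raise ValueError("code does not identify one level: " + code)
        decoded.append((levels[0], code))
    return decoded


def decode_by_subfield(fields):
    """Recover the level of every field from its 'TAG:value' subfield."""
    decoded = []
    for item in fields:
        tag, _, value = item.partition(":")
        decoded.append((TAGS[tag], value))
    return decoded
```

For  instance,  ["ARPANET", "PARC-MAXC", "XEROX-Net", "RIG"]  under  the
first scheme and ["N:ARPANET", "H:PARC-MAXC", "N:XEROX-Net", "H:RIG"]
under  the  second  decode  to  the same sequence.  The first needs code
assignments to be coordinated  across  levels;  the  second  spends  extra
symbols on every field.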