
From: Herbert Poetzl (herbert_at_13thfloor.at)
Date: Fri 21 Nov 2003 - 04:53:27 GMT


Hi Folks!

yesterday somebody (maja) asked me about the ipv4root
configuration of vservers and how they should be set
up correctly ...

as I could not answer this satisfactorily, I went back
to testing the tools, and looking at the code 8-)

I'll try now to explain what happens, and give some
hints what could be improved and what you should do
to get a specific setup ...

If I missed something, or got it terribly wrong, please
feel free to amend or rectify as appropriate ...

(can be found at)
http://vserver.13thfloor.at/Stuff/VServer-IP-Setup-0.1.txt

enjoy,
Herbert

---------------------

VServer IP Setup (A Journey with Bash)

0. The Players/Setup

0.1 Variables

    the vserver script knows and uses the following shell
    variables when it comes to deciding how to set up
    network interfaces and ipv4root:

    IPROOT, IPROOTDEV, IPROOTMASK, IPROOTBCAST, and NODEV

    while you might already know the first four, the last
    probably will sound unfamiliar, as it can't be specified
    in the <vserver>.conf file yet[1].

    NODEV is set by the --nodev option passed to the script
    and basically disables the network alias setup, but not
    the chbind, which might give unexpected results.

0.2 The Pieces

    a look at the script shows that the actual work is
    broken down into smaller pieces, designed according to
    'divide et impera' (divide and conquer)

    - ifconfig_iproot() will set up the aliases
    - setipopt() will generate the --ip list

    as those are different parts of this script, they will
    be described independently of each other.

1. The Network Aliases

1.1 Basic Requirements

   basically the following is required to execute the code
   creating an alias of an existing interface:

    - NODEV is not set (--nodev option not given)
    - IPROOT is not empty and does not have the values
      IPROOT="0.0.0.0" or IPROOT="ALL"
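
    the two conditions can be written down as a small shell
    test. this is only a sketch of the rule as described
    here; should_alias is a hypothetical helper name and not
    taken from the vserver script:

```shell
# Sketch of the two requirements from 1.1.
# should_alias is a hypothetical name, not part of the vserver script.
# $1 = value of NODEV (empty if --nodev was not given)
# $2 = value of IPROOT
should_alias() {
    [ -z "$1" ] &&
    [ -n "$2" ] &&
    [ "$2" != "0.0.0.0" ] &&
    [ "$2" != "ALL" ]
}

if should_alias "" "eth0:192.168.0.2"; then
    echo "aliases would be set up"
fi
```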

1.2 For each entry ...

   then for each entry in IPROOT (entries are separated by
   spaces, as in every good shell script) the following
   assignments and checks are done ...

    - is there a ':' in this entry, if
      o YES, then left part is the <device>
      o NO, then the <device> is IPROOTDEV
 
    - is there a '/' in this entry, if
      o YES, then the right part is the <netmask>
      o NO, then the <netmask> is IPROOTMASK
   
    after those checks, we can assume that <device>, <ip>
    (what is left of the entry) and <netmask> were either
    specified or assigned the 'default' values (which might
    be empty).
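
    the splitting described above can be sketched with plain
    parameter expansion. parse_entry is a hypothetical helper
    written for illustration; the real script may do this
    differently:

```shell
# Sketch: split one IPROOT entry of the form
#   [<device>:]<ipaddr>[/<netmask>]
# parse_entry is a hypothetical helper, not the script's own code.
# $1 = entry, $2 = default device (IPROOTDEV), $3 = default mask (IPROOTMASK)
parse_entry() (
    entry=$1
    dev=$2
    mask=$3

    # a ':' means the left part is the device
    case $entry in
        *:*) dev=${entry%%:*}; entry=${entry#*:} ;;
    esac

    # a '/' means the right part is the netmask
    case $entry in
        */*) mask=${entry#*/}; entry=${entry%%/*} ;;
    esac

    # whatever is left is the ip address
    printf '%s %s %s\n' "dev=$dev" "ip=$entry" "mask=$mask"
)

parse_entry "eth0:172.16.0.1/255.255.240.0" "" ""
# prints: dev=eth0 ip=172.16.0.1 mask=255.255.240.0
parse_entry "192.168.0.1" "eth0" "255.255.255.0"
# prints: dev=eth0 ip=192.168.0.1 mask=255.255.255.0
```

    note that with empty defaults and no ':' in the entry,
    dev stays empty, which is the case where no alias is
    created.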

1.3 Is there a Device?

    now the script checks whether <device> is non empty
    (which means either IPROOTDEV or the device part for
    this entry wasn't empty), and if found so, does the
    following (if not, continue with the next entry):
    
    - if a vlan device was specified (<name>.<vlan>)
      some vlan setup (vconfig add ...) is done and
      a fake base address (127.0.0.1) is assigned.
      
    anyway, the next step is very interesting, as it
    introduces some nondeterministic behaviour, due to
    an uninitialized pair of variables[2] ...
    
1.4 Determinism? Sometimes!

    the values collected so far for <device>, <ip> and
    <netmask> are completed by a <broadcast>, which is
    set to IPROOTBCAST (which can be empty), and fed
    into a (C++/C) tool called 'ifspec', which aims to
    give the 'device specification' in a 'shell usable'
    way ...
    
    called like this
    
    # ifspec <device> <ipaddr> <netmask> <broadcast>
    
    it basically does the following:
    
    - if <ipaddr> is non empty, output ADDR=<ipaddr>
    - otherwise try to get the ipaddr from the
      interface <device> (via SIOCGIFADDR), and if
      successful, output that in the same format.
      
    now the same is done for <netmask> with
    NETMASK=<netmask>/SIOCGIFNETMASK and for
    <broadcast> with BCAST=<broadcast>/SIOCGIFBRDADDR
    except for the detail, that if the <broadcast>
    isn't specified, and can't be retrieved from the
    kernel, the tool tries to compute that value in
    the following manner:
    
    <broadcast> = (<ipaddr> & <netmask>) | ~<netmask>
    
    which is perfectly right if both <ipaddr> and
    <netmask> have either been specified or returned
    by the kernel, and interesting[2] if not ...
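
    to check what a correct result should look like, the
    formula can be reproduced in shell arithmetic. this is
    a sketch for verification only, not the ifspec code;
    ip_to_int, int_to_ip and calc_bcast are hypothetical
    helper names:

```shell
# Sketch: <broadcast> = (<ipaddr> & <netmask>) | ~<netmask>
# All helper names are hypothetical; this is not how ifspec is written.

# dotted quad -> 32 bit integer
ip_to_int() (
    IFS=.
    set -- $1
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# 32 bit integer -> dotted quad
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# $1 = ip address, $2 = netmask (both dotted quad)
calc_bcast() (
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    # shell integers are wider than 32 bits, so ~mask is cut back to 32 bits
    int_to_ip "$(( ($ip & $mask) | (~$mask & 4294967295) ))"
)

calc_bcast 10.0.1.1 255.255.0.0
# prints: 10.0.255.255
calc_bcast 172.16.0.1 255.255.240.0
# prints: 172.16.15.255
```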
                          
1.5 Information Feedback

    the next step is simple, the generated output of
    ifspec is evaluated and IPROOTMASK and IPROOTBCAST
    are updated to the reported values ...
    
    the actual interface alias is created with
    
    ifconfig <device>:<name><suffix> <ipaddr> \
            netmask $IPROOTMASK broadcast $IPROOTBCAST
    
    after that, the next entry is processed.
    
1.6 Summary so far

    - a specified device/mask in an entry has priority
      over the 'defaults' IPROOTDEV/IPROOTMASK.
      
    - the mask/bcast is 'calculated' or 'retrieved'
      from the kernel if not specified via entry or
      IPROOTMASK/IPROOTBCAST

    - an entry has this format:
              [<device>:]<ipaddr>[/<netmask>]
      where <ipaddr> and <netmask> have to be in
      dotted quad notation (aaa.bbb.ccc.ddd)
    

2. The IPV4Root
    
2.1 The List of IPs

    when it comes to the ipv4root setup, only entries
    in IPROOT matter, while IPROOTMASK is silently
    ignored (IPROOTDEV and IPROOTBCAST are not relevant)
    
    - if IPROOT is empty, then "0.0.0.0" is used instead
    - if IPROOT="ALL", then a tool called 'listdevip'
      generates a list of all configured IPs (including
      lo/127.0.0.1 and whatever is configured)
    - otherwise for each entry the optional <device>
      part is removed and --ip is prepended ...
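
    a rough sketch of how such a list could be built is
    shown here. build_ipopt is a hypothetical name, the
    IPROOT="ALL" case (which needs listdevip) is left out,
    and it is an assumption that the "0.0.0.0" fallback
    goes through the same --ip path:

```shell
# Sketch: turn IPROOT into a list of --ip arguments.
# build_ipopt is a hypothetical helper; the "ALL" case is not handled,
# and treating the empty case as "--ip 0.0.0.0" is an assumption.
build_ipopt() (
    iproot=$1
    if [ -z "$iproot" ]; then
        iproot="0.0.0.0"
    fi

    opts=""
    for entry in $iproot; do
        # drop the optional <device>: part, keep <ipaddr>[/<netmask>]
        opts="$opts --ip ${entry#*:}"
    done
    echo "${opts# }"
)

build_ipopt "xyz0:10.0.1.1 xyz0:10.0.1.2"
# prints: --ip 10.0.1.1 --ip 10.0.1.2
build_ipopt "eth0:172.16.0.1/255.255.240.0"
# prints: --ip 172.16.0.1/255.255.240.0
```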
      
2.2 The chbind tool

    the sequence of --ip <ipaddr>[/<netmask>] pairs
    generated in the previous step is passed to the
    chbind tool as arguments, which restricts the
    environment to those addresses (max 16 for now)[3]
    
    the funny part here is, that the tool is capable
    of understanding /XX netmasks, where the shell
    script and ifconfig are not, which can give some
    funny results like 'Invalid IP number or netmask:'
      
2.3 Conclusions here

    - don't rely on IPROOTMASK fallback
    - don't use /XX masks, unless you know what you
      are doing (see Dirty Tricks)
    - do not specify more than the allowed IPs
    - be careful with empty IPROOT or IPROOT="ALL"

3. Dirty Tricks (Examples)

3.1 The existing Interface
     
    eth0 Link encap:Ethernet HWaddr 52:54:00:12:34:56
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

    lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1

 a) you want to allow one IP, for example 192.168.0.1
    for a vserver, but don't want the script to set up
    an alias ...
        
    IPROOT="192.168.0.1"
    
    - because IPROOTDEV isn't specified, and the
      entry doesn't contain an interface, <device> is
      empty and no alias is created.

 b) you want to allow more than one IP, but no aliases
     
    IPROOT="192.168.0.1 127.0.0.1"

3.2 For a specific device

    xyz0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
          inet addr:10.0.0.1 Bcast:10.0.255.255 Mask:255.255.0.0
          UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
    
 a) you want to add one alias for this interface
 
    IPROOT="xyz0:10.0.1.1"
    
    (the alias will be xyz0:<name>)
    
 b) you want to add more than one alias on the same
    interface/network
     
    IPROOT="xyz0:10.0.1.1 xyz0:10.0.1.2"
    
    (the aliases will be xyz0:<name> and xyz0:<name>1)
    
3.3 For a specific network
    
 a) an additional alias for eth0, and ip 172.16.0.1/20

    IPROOT="eth0:172.16.0.1/255.255.240.0"

4. Error Messages

4.1 From ifconfig (Aliasing)

 a) broadcast: Unknown host
    SIOCSIFADDR: Invalid argument

    - those are usually caused by an uninitialized
      interface, which results in funny values for
      the ifconfig statement
    
      + make sure that the <device> you want to
        create an alias for, is up

 b) SIOCSIFADDR: No such device
    eth2:ZZZZ: ERROR while getting interface flags: No such device

    - this is a good sign that you specified an
      interface which doesn't exist ...
      
 c) SIOCSIFNETMASK: Invalid argument
    Invalid IP number or netmask: 24

    - a netmask was specified in the /XX format,
      which cannot be handled correctly by the
      script (for now)
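
      until then, a /XX prefix can be converted to a
      dotted quad netmask by hand before it goes into
      IPROOT. a minimal sketch of that conversion
      (prefix_to_netmask is a hypothetical helper, not
      part of the vserver tools):

```shell
# Sketch: convert a prefix length (0..32) into a dotted quad netmask.
# prefix_to_netmask is a hypothetical helper name.
prefix_to_netmask() (
    bits=$1
    # set all 32 bits, then clear the (32 - bits) host bits
    mask=$(( 4294967295 ^ ((1 << (32 - $bits)) - 1) ))
    echo "$(( ($mask >> 24) & 255 )).$(( ($mask >> 16) & 255 )).$(( ($mask >> 8) & 255 )).$(( $mask & 255 ))"
)

prefix_to_netmask 24
# prints: 255.255.255.0
prefix_to_netmask 20
# prints: 255.255.240.0
```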

4.2 From chbind

 a) Segmentation fault
 
    - you managed to call 'chbind --ip'; please share
      the knowledge of how you did it

 b) Invalid netmask: 256.1.1.2
 
    - obviously the netmask is wrong

 c) Invalid IP number or host name: 256.0.0.1
 
    - obviously the ip/hostname is wrong

5. And the Future?

5.1 Useful Enhancements

    - extend the script to actually understand
      /XX netmasks, and convert them for ifconfig

    - add an option to display the actual ifconfig
      statements and IPOPT lists
      (would avoid a lot of questions)

    - fix the ifspec bug, and do some sanity checks
      regarding netmasks and interfaces ...

    - add a 'cleanup' option to 'remove' the aliases
      and mounts done by an 'enter' on a stopped
      vserver.

5.2 Internal Changes
      
    - use iproute2 instead of ifconfig
     
    - check for interface name length, to avoid
      collisions[4]
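
    such a length check could look roughly like this. it
    is only a sketch; alias_fits is a hypothetical helper,
    and the limit of 16 is the IFNAMSIZ value mentioned
    in [4]:

```shell
# Sketch: check whether an alias name fits into an interface name.
# alias_fits is a hypothetical helper; IFNAMSIZ mirrors the kernel define.
IFNAMSIZ=16

# $1 = full alias name, e.g. <device>:<name><suffix>
alias_fits() {
    # one byte is needed for the terminating zero
    [ "${#1}" -le "$(( $IFNAMSIZ - 1 ))" ]
}

for alias in "eth0:abcdefghij" "eth0:abcdefghijk"; do
    if alias_fits "$alias"; then
        echo "$alias: ok"
    else
        echo "$alias: too long"
    fi
done
# prints: eth0:abcdefghij: ok
# prints: eth0:abcdefghijk: too long
```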
    
6. The End
      

[1] it could be useful to specify this on a 'profile'
    basis, this way, a test profile could leave out
    the network stuff ...
    
[2] struct {
        unsigned long addr;
        unsigned long mask;
    } solved;
    
    (jack, enrico, please fix this in both branches)
    
[3] this can be changed in the kernel, but the ip
    comparison is linear, so each packet will be
    checked against all addresses ...

[4] actually the max length of a network interface
    name is determined by #define IFNAMSIZ 16,
    which means 15 chars and one zero byte, so the
    usual eth0 alias 'eth0:abcdefghij' will have 10
    chars left for the vserver name and the suffix;
    if the name is longer than 9 chars, the suffix
    will be ignored (which gives nice misconfigurations)

(C) 2003 Herbert Pötzl
-------------------------------------------
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.

_______________________________________________
Vserver mailing list
Vserver_at_list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver

