Right. After a long conversation on this topic with Björn, this is what
I got out of it:

First, my idea of what namespaces are, where they sit, etc., was a bit
out of whack. C'est la vie.

There are two ways we can approach this.

One is to have the namespace start its life with a partial copy of the
namespace tree, i.e. all the mounts for the vserver (that need external
VFS access) will need to be set up before the first CLONE_NEWNS into it.
This is apparently similar to what Alexy does with FreeVPS.

However, this means that if you enter the namespace, then there is no
way to access host tools.

Two is to start off with the new namespace, as before, and do a
cleanup later. We then have the choice of cleaning up
*everything inaccessible* or just *everything irrelevant*.

Personally, I don't have a problem with the vserver holding open a
vfsmount for the host root filesystem (and any parents of its own
mount). It can't escape it, as its own root is a mount --bind, and it
means that you do have the option of entering the context and the
namespace, and still having access to tools running on the host system.

That would mean that cleanup could happen in userspace - like the patch
I have provided - and all that would be needed would be to make
/proc/mounts only display mounts from the last "/" inside a namespace.
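
To make the /proc/mounts idea concrete, here is a rough userspace
sketch of the filtering rule, not the kernel change itself. It assumes
the listing is in the usual "device mountpoint fstype options dump pass"
format; the function name visible_mounts and the sample data are made up
for illustration.

```python
def visible_mounts(mounts_text):
    """Return the lines of a /proc/mounts-style listing, starting at the
    last entry whose mount point is "/".

    Hypothetical helper: it only illustrates the display rule, it does
    not touch any real mounts.
    """
    lines = [ln for ln in mounts_text.splitlines() if ln.strip()]

    last_root = None
    for i, ln in enumerate(lines):
        fields = ln.split()
        # Field 1 is the mount point in /proc/mounts.
        if len(fields) >= 2 and fields[1] == "/":
            last_root = i

    if last_root is None:
        # No "/" entry at all: nothing to hide, show everything.
        return lines
    return lines[last_root:]


# Made-up listing: host mounts first, then the vserver's own root
# (a bind mount) and what was mounted after it.
SAMPLE = """\
/dev/hda1 / ext3 rw 0 0
proc /proc proc rw 0 0
/dev/hda3 /vservers ext3 rw 0 0
/dev/hda3 / ext3 rw 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for line in visible_mounts(SAMPLE):
        print(line)
```

With the sample above, only the last two lines are shown; the host's
own "/" and anything mounted before the vserver root stay hidden, even
though the namespace still holds them open.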


Sam Vilain wrote:
>> well, with the help of the 'great kernel' we can actually do a lot of
>> things ... we just need to
>> design a concept, then test and implement it ...
> yep. especially since we're still in `alpha' tools status, and so
> Enrico doesn't need to hurt his head worrying about each new 0.30.19x
> release supporting every 1.9.x release :-)

