Re: [vserver] Issue with OverlayFS in 3.18

From: Corey Wright <undefined_at_pobox.com>
Date: Sun 01 Mar 2015 - 08:02:21 GMT
Message-Id: <20150301020221.e1533886b43c6544a11433e5@pobox.com>

On Sun, 22 Feb 2015 16:50:00 +0100
Oliver Welter <mail@oliwel.de> wrote:

> On 22.02.2015 at 15:50, Oliver Welter wrote:
> > Hi Bertl,
> >
> > On 02.02.2015 at 11:11, Herbert Poetzl wrote:
> >> On Mon, Feb 02, 2015 at 08:10:15AM +0100, Oliver Welter wrote:
> >>> On 02.02.2015 at 01:58, Herbert Poetzl wrote:
> >>>> On Sun, Feb 01, 2015 at 03:28:22PM +0100, Oliver Welter wrote:
> >>
> >>>>> Trying the same without starting the vserver, using a simple
> >>>>> "chroot" works, so it looks like some limits are not working
> >>>>> correctly.
> >>
> >>>> Please try without Linux-VServer isolation but with the same
> >>>> namespace setup, I don't think that you are hitting a limit
> >>>> here, it looks more like a namespace problem of overlayfs.
> >>
> >>> Can you be a bit more verbose - I never got in touch with the
> >>> individual components ;)
> >>
> >> Either use vnamespace or the unshare tool to create a namespace
> >> setup identical to your guest and check if the issue remains.
> >
> > I am unable to reproduce it with vnamespace or unshare, I used attached
> > test script and ran it as:
> >
> > vnamespace -n /mnt/test.sh
> > vcontext --create --xid 42 /mnt/test.sh
> > unshare -i -m -u test.sh
> >
> > with the correct result (file breakme is replaced by a directory on the
> > common mount).
> >
> > When you do the same inside a vserver, it does not work (use overlay to
> > assemble rootfs of vserver, start vserver, try to change entity which
> > exists on the lower but not on the upper fs).
> >
> I think I got the root cause - overlayfs uses character files to mark
> whiteouts and the vserver env seems to refuse access to this whiteout
> file. Adding SYS_ADMIN "makes it work", so I guess the right way would
> be to skip the access check for overlayfs whiteouts.

thanks for debugging it to the point of identifying CAP_SYS_ADMIN, which
was the pertinent piece of information, but the character devices were a
red herring [1] (partially because of my confusion in how and when they are
used in overlayfs).

vattribute --set --xid ${VS_XID} --ccap ^3
OR
echo ^3 >>/etc/vservers/${VS_NAME}/ccapabilities

(or "fs_trusted", instead of "^3", once daniel merges this [2].)

DISCLAIMER: this is just a workaround due to linux-vserver *not*
supporting the use of overlayfs from within a vserver.

another pertinent piece of information that i identified from my testing
is that the error happens to directories, but not regular files. so
reading through the overlayfs source code [3] i noticed that in
ovl_create_over_whiteout() a directory is treated differently than
everything else by calling ovl_set_opaque() on it which (eventually)
sets an xattr of "trusted.overlay.opaque" on the directory. of course
after reading through the source code i realize this is all documented
[4] (but it's more memorable this way ;).

before setting the xattr, the kernel checks for the necessary
capabilities, which is normally just "capable(CAP_SYS_ADMIN)", but is
patched in linux-vserver kernels to "vx_capable(CAP_SYS_ADMIN,
VXC_FS_TRUSTED)". this isn't a problem in the non-vserver overlayfs
use-case because overlayfs raises CAP_SYS_ADMIN on the potentially
unprivileged process when it's creating an opaque directory (ie a
directory in the upperdir that overrides a file in the lowerdir). with
a vserver-patched kernel, a vserver needs both CAP_SYS_ADMIN and
VXC_FS_TRUSTED to create the "trusted.overlay.opaque" xattr, and as
overlayfs only raises CAP_SYS_ADMIN, and not VXC_FS_TRUSTED, the
operation fails.

vx_capable(CAP_SYS_ADMIN, VXC_FS_TRUSTED) =
1. capable(CAP_SYS_ADMIN)
OR
2a. cap_raised(current_cap(), CAP_SYS_ADMIN) (which is always true
    because of overlayfs' cap_raise(CAP_SYS_ADMIN))
&&
2b. vx_ccaps(VXC_FS_TRUSTED)

to have overlayfs "just work" in a vserver, we need one of two things:
a vx_cap_raise() for setting capabilities in the current vserver
context (just as we have vx_cap_raised(), which tests capabilities in
the current vserver context), so that (1) returns true, or a
vx_ccap_raise() for raising context capabilities, so that (2b) returns
true.

the trusted.* xattr namespace appears to only be used within the kernel
by the overlayfs and lustre filesystems, so i believe it's relatively
safe to give a vserver FS_TRUSTED (especially as CAP_SYS_ADMIN would
also be needed), though i don't know what in user-land uses the
trusted.* xattr namespace, specifically from the host, and therefore
would not expect a guest to be able to set/clear/modify those
attributes.
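
one way to get a feel for that on a given host is to count the
trusted.* xattrs actually present on files under a guest's root. a
minimal sketch, assuming linux with listxattr(2); the function names
are made up, and it needs to run as root on the host because trusted.*
names are typically hidden from callers without CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * count the names starting with "trusted." in a listxattr()-style buffer,
 * ie a sequence of NUL-terminated names that is len bytes long in total.
 */
size_t count_trusted_names(const char *names, size_t len)
{
    size_t count = 0, off = 0;

    assert(names != NULL || len == 0);
    while (off < len) {
        const char *name = names + off;
        const char *end = memchr(name, '\0', len - off);
        size_t n = end ? (size_t)(end - name) : len - off;

        if (n >= 8 && strncmp(name, "trusted.", 8) == 0)
            count++;
        off += n + 1;
    }
    return count;
}

/* number of trusted.* xattrs on path, or -1 if listing them failed */
long count_trusted_xattrs(const char *path)
{
    ssize_t size, got;
    char *buf;
    long result;

    assert(path != NULL);
    size = listxattr(path, NULL, 0);   /* first call: ask for the size */
    if (size < 0)
        return -1;
    if (size == 0)
        return 0;
    buf = malloc((size_t)size);
    if (buf == NULL)
        return -1;
    got = listxattr(path, buf, (size_t)size);
    result = got < 0 ? -1 : (long)count_trusted_names(buf, (size_t)got);
    free(buf);
    return result;
}
```

anything this finds outside of an overlayfs upperdir would be worth a
look before handing FS_TRUSTED to a guest.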

[1] https://en.wiktionary.org/wiki/red_herring, second definition
[2] https://github.com/linux-vserver/util-vserver/pull/17
[3] https://git.kernel.org/cgit/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs.v24&id=07d3c379edef433bba39ac14abdbbe1a1f6bb47a
[4] https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt

corey

--
undefined@pobox.com
> Oli
> 
> 
> 
> -- 
> Protect your environment -  close windows and adopt a penguin!
Received on Sun Mar 1 08:02:43 2015