Re: [vserver] Copy-on-write Hard Links, Shared Libraries, Prelink and Memory

From: Gordan Bobic <gordan_at_bobich.net>
Date: Tue 08 Jun 2010 - 23:39:51 BST
Message-ID: <4C0EC6B7.7060804@bobich.net>

On 06/08/2010 11:11 PM, Herbert Poetzl wrote:
> On Tue, Jun 08, 2010 at 06:13:50PM +0100, Gordan Bobic wrote:
>> I apologize in advance if this is a silly question, but I am
>> not familiar enough with the low level workings of mmap() on
>> Linux to know the answer, so I'll ask.
>
>> I understand that VServer has a feature to de-dupe identical
>> files into copy-on-write marked hard-links. My questions are:
>
>> 1) How does this approach co-exist with prelink (daily cron job
>> on most distributions)?
>
> when prelink modifies the library, the CoW link will be
> broken, and if it runs on every guest, there will be
> different library files which cannot be unified again
>
>> This modifies the binaries, and different VMs, unless they are
>> all identical, are likely to end up with files getting unshared
>> very quickly.
>
> correct ....
>
>> Is the only available solution to un-prelink everything and
>> disable prelink?
>
> well, you can live with the fact that the libraries are
> not unified; the executables, as well as data files,
> will still benefit from unification
>
>> Or is there a way to get both the performance advantage of
>> perlink and the storage space saving (and caching efficiency
>> savings)?
>
> IMHO the performance advantage of prelink is at least
> debatable (not just because of the fact that often
> relocation is still required) but you have the following
> option ...
>
> - disable prelink inside the guests
> - prelink the 'template' guest when created/updated
> - copy over the prelinked libraries to the guests

Thanks.

Is it also possible to hard-link files between the host and the guests?
e.g. if I am running CentOS VMs on a CentOS host, is there any reason
why I couldn't hard-link the files common to the host and all the guests?

If there is no reason why this couldn't be done, are there any security
implications? I'm guessing not, since changes in a guest trigger
copy-on-write, but I thought I'd ask anyway.

Finally, is there a utility for re-merging files that got unshared, e.g.
after a file is replaced with a version that is once again identical to
the template's? Can somebody point me at the relevant bit of
documentation on this?

>> 2) I've been pondering how something like KSM could be used
>> for all memory on a physical host rather than having to
>> patch every package, almost to the point of renaming malloc()
>> and wrapping it so that all malloc()-ed memory gets marked
>> by madvise().
>
> I presume with KSM you are referring to the Kernel
> Samepage Merging recently merged in mainline, and not
> the Kernel Security Modules or other things shortened
> to KSM :)

Indeed, I am referring to Kernel Samepage Merging. :)

>> So, what happens when multiple files that are hard-linked
>> get mmap()-ed? If, say, glibc is merged between two VServer
>> VMs into a single file with two hard-links, will its memory
>> be allocated once per VM that accesses it, or will it all
>> be mapped by the same physical block of the shared memory
>> across all the VMs?
>
> seems you are already stumbling over your nomenclature
> here (multiple files vs single file with two hard links :)
> so let's put that right first, and the answer will present
> itself:
>
> unix uses inodes to store a file's data (content) and
> metadata (ownership, permissions, timestamps); names
> live in directory entries.
>
> files are special directory entries which point to a
> specific inode containing the actual data (the content).
>
> two (or many :) hardlinks are entries in directory inodes
> pointing to the same data (content), while having different
> names (paths to be precise).
>
> mapping and reading of content is done based on inodes
> while lookup and naming is done based on directory entries.
>
> conclusion:
>
> hard links to the same inode will use the very same
> inode cache and thus end up providing the same physical
> page (once mapped) to all 'users' (with the appropriate
> virtual mapping of course). this is what makes unification
> a huge benefit not only disk but also memory wise when
> done properly :)

Right. So if I have N VServer (or OpenVZ, or LXC) VMs that have glibc
hard-linked to the same file, glibc's pages will be held in memory once
rather than N times?

>> All this assuming there isn't something clever going on
>> (e.g. LD_PRELOAD, of some sort) that somehow marks the VM's
>> individual process memory allocation with madvise() so KSM
>> can operate on it.
>
> KSM is basically a way to find 'shared' data where an
> explicit 'sharing' can not be expressed, e.g. with VMs
> like KVM (it would be hard to tell the kernel that a
> file inside a loopback/partition should be shared between
> two different machines, possibly across different loop
> devices :)

Sure, but even in container-style VMs like VServer (or OpenVZ or LXC),
where the sharing can be expressed by hard-linking the shared libraries,
some things get missed. One thing I have found from using compcache is
that there are often thousands of allocated but zero-filled memory pages
on the system at any time. These could safely be merged and the memory
put to more productive use. There are, of course, also other cases
besides shared libraries that result in identical memory pages which
could be merged.

Hence I'm pondering whether all memory allocation could be wrapped by
intercepting allocation calls via LD_PRELOAD, so that the memory gets
marked with madvise() for KSM to operate on.

Gordan
Received on Wed Jun 9 00:51:58 2010
