Re: [vserver] opteron server dies with vserver patch.

From: Pawel Sikora <pluto_at_pld-linux.org>
Date: Tue 30 Aug 2011 - 15:53:38 BST
Message-ID: <1453618.GxPOk3viax@pawels>

On Tuesday 30 of August 2011 14:17:58 Herbert Poetzl wrote:
> On Mon, Aug 29, 2011 at 02:16:16PM +0200, Pawel Sikora wrote:
> >> Herbert Poetzl wrote:
>
> >>> heh, the observed opteron's machine lock also occurs on a
> >>> loaded desktop machine with single disk and quad-core i3 cpu.
>
> >>> I'm wondering if the ext4 filesystem could be a faulty element?
>
> >> what kernel/patch version was that again?
>
> > it was on the 2.6.38.8 with vs2.3.0.37-rc17.
>
> >>> all these locking machines have ext4. i'll check ext3 in the
> >>> next week...
>
> >> good idea ...
>
> > switching from ext4 to ext2 (ext3 not tested yet) looks like a
> > working hack. at least the 3.0.3 kernel with vs2.3.1-pre10 is
> > working on ext2 with full load so far...
>
> I still cannot reproduce the issue and I'm still confused
> why your system only logs the last part of a kernel stack
> trace ...

After ~24h of stressing ext2, it crashed again :)
Now I'm testing ext4 again on 3.0.3-vs2.3.1-pre10 with CONFIG_TAGGING_NONE=y.
If it crashes, I'll test without the VSERVER_COWBL option.
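For reference, the two test runs would look roughly like this in the kernel .config. This is a sketch that assumes the rest of the config stays unchanged and that VSERVER_COWBL (COW immutable link breaking) is presumably enabled in the current run:

```
# Current run: ext4 on 3.0.3-vs2.3.1-pre10, with tagging disabled
CONFIG_TAGGING_NONE=y
CONFIG_VSERVER_COWBL=y      # assumption: still enabled in this run

# Next run, only if the above still crashes: also disable COWBL
CONFIG_TAGGING_NONE=y
# CONFIG_VSERVER_COWBL is not set
```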

> > on the ext4 it dies in few minutes. recent dmesg and decoded
> > patterns at:
> > http://pluto.agmk.net/kernel/vs-crash-3.0.3-vs2.3.1-pre10/
>
> running kernel builds inside a guest for a day now (on ext4)
> without any apparent issues or kernel messages ... maybe you
> could elaborate on the setup, especially the ext4 mount, the
> options, number of mounts, etc
>
> maybe also the load which seems to trigger that issue would
> be helpful/interesting ...

On the dual Opteron server, the ext4 partition sits on mdX (software RAID0 across 4 disks)
and is mounted at /home/atest with the rw,noatime options. The machine has 64GB of ECC RAM,
128GB of unused swap, and an average load of ~15 during testing.
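As a sketch, that mount corresponds to an /etc/fstab entry like the one below. mdX is the placeholder device name from the report, and the dump/pass fields (0 2) are assumptions:

```
# <device>  <mount point>  <type>  <options>   <dump> <pass>
/dev/mdX    /home/atest    ext4    rw,noatime  0      2
```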

On the i3-540 desktop, the ext4 partition sits on a single linear LVM volume (1 disk)
with the same mount options. The machine has 8GB of non-ECC RAM and 4GB of unused swap.

These are totally different hardware configurations hitting the same lockup, so it looks like
a more generic problem related to SMP and/or average load. Moreover, today I hit a nice lockup
on the vanilla 3.0.2 kernel with a better stack trace -> https://lkml.org/lkml/2011/8/30/112

BR,
Paweł.
Received on Tue Aug 30 15:54:23 2011

Generated on Tue 30 Aug 2011 - 15:54:24 BST by hypermail 2.1.8