vserver development mailing list: Re: [Vserver] Hard freezes on SMP?

From: Herbert Poetzl <herbert_at_13thfloor.at>
Date: Fri 28 Oct 2005 - 14:45:02 BST
Message-ID: <20051028134502.GE11552@MAIL.13thfloor.at>

On Thu, Oct 27, 2005 at 10:56:58PM +0200, Grzegorz Nosek wrote:
> Hello all,
>
> I have noticed a disturbing pattern on my smp systems with vserver
> patches (2.6.13.4 with vserver patches from gentoo). If the load
> average is quite high (in my situation it was about 70 in one context
> and about 30 in context 1), I experience random hard freezes. These
> are not (IMO) from thrashing etc., as the machine easily survives
> higher loads. At the console I can switch between VTs and that's about
> all I can do. No networking or anything.
>
> To keep it as clear as possible (too much blood in my caffeine
> stream): I have two SMP machines (a dual 1.8GHz Xeon and an AMD64 x2
> 3800+ in 32-bit mode). While experimenting with vserver guests on the
> Xeons I have often encountered oopses when the vserver was not shut
> down properly (due to issues with my initscripts). It looked like
> this:
>
> - vserver vXXX start
> (some errors from my scripts)
> ...
> - vserver vXXX stop
> (shutdown messages, hanging after 'Deconfiguring network interfaces')
>
> vwait waits and waits forever (I haven't patched it yet)
>
> after killing vwait I can no longer access the context (chcontext
> segfaults with a kernel oops - I should still have logs somewhere if
> you are interested)

yes, definitely, any _oops_ or _stack_ _trace_ issued while a
linux-vserver kernel is running _is_ interesting and should
be reported back to the linux-vserver kernel developers ...

> The stack traces apparently have null dereferences in an impossible
> place. The oops seems to happen in __create_vx_info, just after
> returning from __dealloc_vx_info. That line contains an instruction
> like mov %eax,%esi or something to this effect (not accessing memory
> at all).

this is something we fixed in devel recently (two weeks ago?).
as you said, you are using the gentoo (devel) branch, so this
might be related, with the gentoo patches not updated yet ...

> I also experienced occasional lockups under high load (a make -j100
> kernel build inside one vserver :) )

could be related too, we had a thread and discussion about
similar effects. once again, this only affects the devel
branch and is already fixed ...

> I have compiled the kernel again with vserver debugging and history
> logging (whatever it is called) and yesterday when I was shutting down
> a vserver vwait didn't exit either. So I killed it and wanted to
> chcontext into that vserver to invoke the kernel oops and have some
> more debugging info. The machine locked hard (under zero load). I was
> unable to recover any debugging info as it didn't hit syslog (will
> build something with network console soon probably).
>
> The AMD64 box was experiencing random lockups too, not related to
> shutting down vservers or anything like that, just when the load was a
> bit higher. I booted a uni-processor kernel and it seems to work OK so
> far.
>
> Has anybody experienced similar problems? I can run the boxes UP for
> now but I'd really need SMP before going into production.
>
> OK, enough of this babbling ;) I suspect that some part of vserver
> support is not SMP-safe in some way. Although I have no real debugging
> data, my gut feeling says it's some spinlock deadlock (and some deep
> bowels add that it might be inside the scheduler). I'll try to gather
> some more information (with a kernel with all possible debugging on
> and a network console).
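
For the network console mentioned above: netconsole sends kernel
messages as UDP datagrams, so a small listener on a second machine is
enough to catch an oops that never reaches syslog. A minimal sketch in
Python, assuming the target port is 6666 (netconsole's documented
default; adjust to whatever the sending box is configured with). The
function names are illustrative and not part of any vserver tooling.

```python
import socket


def open_listener(host, port):
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, max_messages, timeout=5.0):
    """Return up to max_messages decoded datagrams, stopping on timeout."""
    sock.settimeout(timeout)
    messages = []
    while len(messages) < max_messages:
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing arrived within the timeout
        # Kernel output may contain odd bytes; never fail on decoding.
        messages.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return messages
```

To use it, bind with open_listener("0.0.0.0", 6666) on the logging
machine and call read_messages in a loop, printing each message or
appending it to a file.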

finally, here are the fix(es) we did :)

(they are in 2.1.0-rc4)

http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3-rc3.1.diff.bz2
http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3.1-rc3.2.diff.bz2
http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3.2-rc3.3.diff.bz2
http://vserver.13thfloor.at/Experimental/delta-2.6.13.3-vs2.1.0-rc3.3-rc3.4.diff.bz2
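
The four deltas above are incremental, each one going from one rc3.x
step to the next, so they have to be applied in order. A small sketch
in Python of how the order could be derived from the file names. It
assumes the naming pattern delta-<kernel>-vs<version>-<from>-<to>.diff.bz2
as read off the URLs above; the helper names and the use of "patch -p1"
are assumptions, not something stated in this thread.

```python
import re

# Assumed layout: delta-<kernel>-vs<version>-<from>-<to>.diff.bz2
DELTA_RE = re.compile(
    r"^delta-(?P<kernel>\d+(?:\.\d+)*)-vs(?P<base>\d+(?:\.\d+)*)"
    r"-(?P<src>rc\d+(?:\.\d+)*)-(?P<dst>rc\d+(?:\.\d+)*)\.diff\.bz2$"
)


def parse_delta(url):
    """Return the (from, to) release candidates encoded in a delta name."""
    name = url.rsplit("/", 1)[-1]
    match = DELTA_RE.match(name)
    if match is None:
        raise ValueError("unexpected delta name: " + name)
    return match.group("src"), match.group("dst")


def order_deltas(urls, start):
    """Chain the deltas from `start`; return (ordered urls, final version)."""
    by_src = {parse_delta(u)[0]: u for u in urls}
    ordered = []
    current = start
    while current in by_src:
        url = by_src.pop(current)
        ordered.append(url)
        current = parse_delta(url)[1]
    if by_src:
        raise ValueError("deltas do not form a single chain from " + start)
    return ordered, current


def patch_commands(ordered_urls):
    """Shell commands for already downloaded files (-p1 is an assumption)."""
    names = [u.rsplit("/", 1)[-1] for u in ordered_urls]
    return ["bzcat %s | patch -p1" % n for n in names]
```

Called with the four URLs and start="rc3", order_deltas returns them in
the rc3 -> rc3.1 -> rc3.2 -> rc3.3 -> rc3.4 order and reports rc3.4 as
the resulting version.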

> If you need more information about my setup, feel free to ask.

as usual, output of testme.sh would be helpful ...
(as written on the testing page)

thanks,
Herbert

> Best regards,
> Grzegorz Nosek
> _______________________________________________
> Vserver mailing list
> Vserver@list.linux-vserver.org
> http://list.linux-vserver.org/mailman/listinfo/vserver
Received on Fri Oct 28 14:45:37 2005