On Thu, Oct 27, 2005 at 10:56:58PM +0200, Grzegorz Nosek wrote:
> Hello all,
> I have noticed a disturbing pattern on my smp systems with vserver
> patches (184.108.40.206 with vserver patches from gentoo). If the load
> average is quite high (in my situation it was about 70 in one context
> and about 30 in context 1), I experience random hard freezes. These
> are not (IMO) from thrashing etc., as the machine easily survives
> higher loads. At the console I can switch between VTs and that's about
> all I can do. No networking or anything.
> To keep it as clear as possible (too much blood in my caffeine
> stream): I have two SMP machines (a dual 1.8GHz Xeon and an AMD64 x2
> 3800+ in 32-bit mode). While experimenting with vserver guests on the
> Xeons I have often encountered oopses when the vserver was not shut
> down properly (due to issues with my initscripts). It looked like
> - vserver vXXX start
> (some errors from my scripts)
> - vserver vXXX stop
> (shutdown messages, hanging after 'Deconfiguring network interfaces')
> vwait waits and waits forever (I haven't patched it yet)
> after killing vwait I can no longer access the context (chcontext
> segfaults with a kernel oops - I should still have logs somewhere if
> you are interested)
yes, definitely, any _oops_ or _stack_ _trace_ issued while a
linux-vserver kernel is running _is_ interesting and should
be reported back to the linux-vserver kernel developers ...
> The stack traces apparently have null dereferences in an impossible
> place. The oops seems to happen in __create_vx_info, just after
> returning from __dealloc_vx_info. That line contains an instruction
> like mov %eax,%esi or something to this effect (not accessing memory
> at all).
this is something we fixed in devel recently (two weeks ago?)
as you said, you are using the gentoo (devel) branch, so this might
be related, but not yet updated there ...
> I also experienced occasional lockups under high load (a make -j100
> kernel build inside one vserver :) )
could be related too; we had a thread and discussion about
similar effects. once again, this only affects the devel
branch and is already fixed ...
> I have compiled the kernel again with vserver debugging and history
> logging (whatever it is called) and yesterday when I was shutting down
> a vserver, vwait didn't exit either. So I killed it and wanted to
> chcontext into that vserver to invoke the kernel oops and have some
> more debugging info. The machine locked hard (under zero load). I was
> unable to recover any debugging info as it didn't hit syslog (will
> build something with network console soon, probably).
> The AMD64 box was experiencing random lockups too, not related to
> shutting down vservers or anything like that, just when the load was a
> bit higher. I booted a uni-processor kernel and it seems to work OK so far.
> Has anybody experienced similar problems? I can run the boxes UP for
> now but I'd really need SMP before going into production.
> OK, enough of this babbling ;) I suspect that some part of vserver
> support is not SMP-safe in some way. Although I have no real debugging
> data, my gut feeling says it's some spinlock deadlock (and some deep
> bowels add that it might be inside the scheduler). I'll try to gather
> some more information (with a kernel with all possible debugging on
> and a network console).
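for the network console part: a hard lockup never reaches local syslog, so the
receiving side has to be ready on another box. A minimal sketch of such a
receiver, assuming the kernel's netconsole sends plain-text log lines as UDP
datagrams (port 6666 is an assumption here, and `open_listener` and
`capture_netconsole` are illustrative names, not part of any vserver or kernel
tooling):

```python
import socket


def open_listener(bind_addr="0.0.0.0", port=6666):
    """Open a UDP socket for incoming kernel log datagrams.

    The port is an assumption; it must match whatever target port the
    sending kernel's netconsole is configured with.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def capture_netconsole(sock, out_path, max_packets):
    """Append up to max_packets datagrams from sock to out_path.

    Each datagram is written as one line, prefixed with the sender's
    address so several test boxes can share one log file.
    """
    with open(out_path, "a", encoding="utf-8") as log:
        for _ in range(max_packets):
            data, (host, _port) = sock.recvfrom(65535)
            # Kernel messages are normally ASCII; replace anything odd
            # instead of failing in the middle of an oops.
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            log.write(f"{host} {text}\n")
            # Flush per line so the log can be followed live.
            log.flush()
```

on the logging host this could be run as
`capture_netconsole(open_listener(), "netconsole.log", 10000)` before
reproducing the lockup on the SMP box.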
finally here the fix(es) we did :)
(they are in 2.1.0-rc4)
> If you need more information about my setup, feel free to ask.
as usual, output of testme.sh would be helpful ...
(as written on the testing page)
> Best regards,
> Grzegorz Nosek
> Vserver mailing list
Received on Fri Oct 28 14:45:37 2005