[Vserver] Hard freezes on SMP?

From: Grzegorz Nosek <grzegorz.nosek_at_gmail.com>
Date: Thu 27 Oct 2005 - 21:56:58 BST
Message-ID: <121a28810510271356s52de640ay@mail.gmail.com>

Hello all,

I have noticed a disturbing pattern on my SMP systems running kernels
with the vserver patches (from Gentoo). If the load
average is quite high (in my situation it was about 70 in one context
and about 30 in context 1), I experience random hard freezes. These
are not (IMO) from thrashing etc., as the machine easily survives
higher loads. At the console I can switch between VTs and that's about
all I can do. No networking or anything.

To keep it as clear as possible (too much blood in my caffeine
stream): I have two SMP machines (a dual 1.8GHz Xeon and an AMD64 x2
3800+ in 32-bit mode). While experimenting with vserver guests on the
Xeons I have often encountered oopses when the vserver was not shut
down properly (due to issues with my initscripts). It looked like

- vserver vXXX start
(some errors from my scripts)
- vserver vXXX stop
(shutdown messages, hanging after 'Deconfiguring network interfaces')

vwait waits and waits forever (I haven't patched it yet)

after killing vwait I can no longer access the context (chcontext
segfaults with a kernel oops - I should still have logs somewhere if
you are interested)

The stack traces apparently have null dereferences in an impossible
place. The oops seems to happen in __create_vx_info, just after
returning from __dealloc_vx_info. That line contains an instruction
like mov %eax,%esi or something to that effect (one that does not
access memory at all).

I also experienced occasional lockups under high load (a make -j100
kernel build inside one vserver :) )

I have compiled the kernel again with vserver debugging and history
logging (whatever it is called) and yesterday when I was shutting down
a vserver, vwait didn't exit either. So I killed it and wanted to
chcontext into that vserver to trigger the kernel oops and get some
more debugging info. The machine locked hard (under zero load). I was
unable to recover any debugging info as it didn't hit syslog (I will
probably build something with network console soon).
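
For the network console idea, the receiving side on another box can be
very small. The following is only a sketch under assumptions: that the
kernel's netconsole sends log lines as plain UDP datagrams, and that
6666 is the target port in use (check both against the netconsole
documentation for your kernel). The function names are made up for
illustration.

```python
import socket

# Assumed netconsole target port; change it to match your configuration.
NETCONSOLE_PORT = 6666


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket on which kernel log datagrams can arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock, max_packets, timeout=5.0):
    """Read up to max_packets datagrams; stop early if nothing arrives in time."""
    sock.settimeout(timeout)
    lines = []
    try:
        for _ in range(max_packets):
            data, (addr, _port) = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace").rstrip()
            lines.append(f"{addr}: {text}")
    except socket.timeout:
        pass  # quiet period: return whatever has been received so far
    return lines


def serve_forever(logfile="netconsole.log"):
    """Append every received message to a file until interrupted."""
    sock = open_listener()
    with open(logfile, "a") as log:
        while True:
            for line in collect(sock, max_packets=1, timeout=60.0):
                log.write(line + "\n")
                log.flush()  # keep the file current in case this host dies too
```

Calling serve_forever() on the log host before provoking the freeze
should leave the last kernel messages in netconsole.log, provided the
frozen machine still manages to send them.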

The AMD64 box was experiencing random lockups too, not related to
shutting down vservers or anything like that, just when the load was a
bit higher. I booted a uni-processor kernel and it seems to work OK so far.

Has anybody experienced similar problems? I can run the boxes UP for
now but I'd really need SMP before going into production.

OK, enough of this babbling ;) I suspect that some part of vserver
support is not SMP-safe in some way. Although I have no real debugging
data, my gut feeling says it's some spinlock deadlock (and some deep
bowels add that it might be inside the scheduler). I'll try to gather
some more information (with a kernel with all possible debugging on
and a network console).
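
To illustrate the kind of bug I have in mind (this is a userspace
analogy, not vserver code): two CPUs take the same two locks in
opposite order. On a uniprocessor kernel spinlocks do no actual
spinning, which would fit the UP kernel surviving. In the sketch below
the lock names and the "cpu0"/"cpu1" labels are invented, and a timeout
stands in for the hang so that the example terminates.

```python
import threading


def take_both(first, second, barrier, results, name, timeout=0.2):
    """Take `first`, then try to take `second` while still holding `first`."""
    with first:
        barrier.wait()  # both threads now hold their first lock
        got = second.acquire(timeout=timeout)
        results[name] = got
        barrier.wait()  # hold on until both attempts have finished
        if got:
            second.release()


def demo():
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    # "cpu0" locks A then B, "cpu1" locks B then A: the classic AB-BA order.
    t0 = threading.Thread(target=take_both,
                          args=(lock_a, lock_b, barrier, results, "cpu0"))
    t1 = threading.Thread(target=take_both,
                          args=(lock_b, lock_a, barrier, results, "cpu1"))
    t0.start()
    t1.start()
    t0.join()
    t1.join()
    return results
```

Here demo() reports that neither side obtained its second lock. With
real spinlocks there is no timeout, so both CPUs would spin forever,
which from the outside looks like a hard freeze.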

If you need more information about my setup, feel free to ask.

Best regards,
 Grzegorz Nosek
Received on Thu Oct 27 22:03:44 2005
