Re: [vserver] [resend] vs2.2.0.x and scheduler getting stuck

From: Daniel Hokka Zakrisson <daniel_at_hozac.com>
Date: Sun 18 May 2008 - 15:24:51 BST
Message-ID: <34270.192.168.102.6.1211120691.squirrel@intranet>

Grzegorz Nosek wrote:
> 2008/5/17 Daniel Hokka Zakrisson <daniel@hozac.com>:
>> Grzegorz Nosek wrote:
>>> Hi all,
>>>
>>> (sorry if you receive this message twice)
>>>
>>> I've been experiencing some weird hangs (boom and it's dead, no panic,
>>> not even a softlockup or lockdep warning). It happens about once a
>>> month (of course, in production only), usually under some load (I
>>> suspected I/O but it might be simple CPU usage too). It happened on
>>> several very different machines (2-way pentium3, 4-way opteron 270).
>>>
>>> After enabling the nmi watchdog I could trace it back to schedule(), or
>>> at least the nmi watchdog felt the need to kill the machine right in
>>> the middle of a schedule() call.
>>
>> So I guess it can happen on vanilla kernels as well... This was noticed
>> and fixed on PlanetLab, and the patches are already fed upstream (see
>> http://vserver.13thfloor.at/Experimental/delta-sched-fix0{4,5,6}.diff),
>> to be included in the next release.
>
> The fix06.diff patch looks like a proper fix, although (guessing) it
> could still lock the scheduler when all contexts are paused (is this
> even possible? probably not).

Paused processes are placed on the hold queue.
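
As a rough illustration of why that matters for a retry loop, here is a toy
model in C. It is not the vserver patch code: the types, the queue helpers
and pick_next() are all made-up names, and the real scheduler is far more
involved. The only point it tries to show is that if every paused task
popped from the run queue is parked on a hold queue before the goto, each
retry shrinks the run queue, so the loop ends even when all contexts are
paused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical toy types; none of these names come from the vserver patch. */
struct task {
    int id;
    bool ctx_paused;            /* true if the task's context is paused */
};

struct queue {
    struct task *items[MAX_TASKS];
    size_t len;
};

/* Append a task to the tail of the queue. */
static void queue_push(struct queue *q, struct task *t)
{
    assert(q->len < MAX_TASKS);
    q->items[q->len++] = t;
}

/* Remove and return the oldest task, or NULL if the queue is empty. */
static struct task *queue_pop(struct queue *q)
{
    struct task *t;
    size_t i;

    if (q->len == 0)
        return NULL;
    t = q->items[0];
    for (i = 1; i < q->len; i++)
        q->items[i - 1] = q->items[i];
    q->len--;
    return t;
}

/*
 * Pick the next runnable task. A task whose context is paused is moved
 * to the hold queue and the selection is retried. Every retry removes
 * one task from the run queue, so the loop runs at most runq->len + 1
 * times and returns NULL (idle) if nothing runnable is left.
 */
static struct task *pick_next(struct queue *runq, struct queue *holdq)
{
    struct task *t;

retry:
    t = queue_pop(runq);
    if (t == NULL)
        return NULL;            /* nothing runnable: caller would idle */
    if (t->ctx_paused) {
        queue_push(holdq, t);   /* park it; it no longer sits on runq */
        goto retry;
    }
    return t;
}
```

With only paused tasks on the run queue, pick_next() in this sketch returns
NULL after moving all of them to the hold queue, rather than spinning.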

> I'm not really sure how it can happen on vanilla kernels, as it's a
> loop added by the vserver patch (both the label and the goto). Care to
> enlighten me? :)

Vanilla referred to vanilla Linux-VServer, i.e. without the PlanetLab
patches.

-- 
Daniel Hokka Zakrisson
Received on Sun May 18 15:25:08 2008