[vserver] Kernel bug 2.6.29.6 grsec+vserver

From: Ed W <lists_at_wildgooses.com>
Date: Wed 08 Jul 2009 - 10:16:47 BST
Message-ID: <4A5463FF.4010807@wildgooses.com>

Hi, hoping someone can help. I now have a functioning 2.6.29.5-vs
kernel which boots fine and dandy. I then transferred that .config
over to a freshly patched 2.6.29.6 with the recent grsec+vs patch
linked from the vserver website, and that also boots up fine... until...

...as soon as I run "vserver xyz start", I get a near-immediate kernel
bug, as below:

Jul 8 00:26:28 quad [ 35.808607] ------------[ cut here ]------------
Jul 8 00:26:28 quad [ 35.808776] kernel BUG at mm/slab.c:604!
Jul 8 00:26:28 quad [ 35.808932] invalid opcode: 0000 [#1] SMP
Jul 8 00:26:28 quad [ 35.809130] last sysfs file:
/sys/devices/platform/coretemp.3/name
Jul 8 00:26:28 quad [ 35.809288] CPU 1
Jul 8 00:26:28 quad [ 35.809457] Modules linked in: i2c_i801
Jul 8 00:26:28 quad [ 35.809647] Pid: 16, comm: events/1 Not tainted
2.6.29.6-grsec2.1.14-vs2.3.0.36.14 #1 X48-DQ6
Jul 8 00:26:28 quad [ 35.809943] RIP: 0010:[<ffffffff802b0c13>]
[<ffffffff802b0c13>] 0xffffffff802b0c13
Jul 8 00:26:28 quad [ 35.810255] RSP: 0018:ffff88012ed0ddf0 EFLAGS:
00010046
Jul 8 00:26:28 quad [ 35.810412] RAX: ffffe20000023af0 RBX:
ffff88012fc1c080 RCX: ffff88012fc29dc0
Jul 8 00:26:28 quad [ 35.810571] RDX: ffffe20000023af0 RSI:
ffff88012c129000 RDI: 0000000000a32e80
Jul 8 00:26:28 quad [ 35.810730] RBP: ffff88012ec354c0 R08:
0000000000000000 R09: 00000000fffefde6
Jul 8 00:26:28 quad [ 35.810889] R10: 0000000000000000 R11:
0000000000000001 R12: ffffffff80a32e80
Jul 8 00:26:28 quad [ 35.811048] R13: 0000000000000015 R14:
0000000000000018 R15: 0000000000000000
Jul 8 00:26:28 quad [ 35.811207] FS: 0000000000000000(0000)
GS:ffff88012fa55340(0000) knlGS:0000000000000000
Jul 8 00:26:28 quad [ 35.811503] CS: 0010 DS: 0018 ES: 0018 CR0:
000000008005003b
Jul 8 00:26:28 quad [ 35.811660] CR2: ffffffffff600400 CR3:
000000012a06d000 CR4: 00000000000026e0
Jul 8 00:26:28 quad [ 35.811819] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jul 8 00:26:28 quad [ 35.811978] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jul 8 00:26:28 quad [ 35.812147] Process events/1 (pid: 16,
threadinfo ffff88012ed0c000, task ffff88012ecdc680)
Jul 8 00:26:28 quad [ 35.812443] Stack:
Jul 8 00:26:28 quad [ 35.812522] 0000000000000000 ffff88012ec35418
0000000000000018 ffff88012ec35400
Jul 8 00:26:28 quad [ 35.812522] ffff88012fc29dc0 0000000000000000
ffff88012fc1c080 ffffffff802b0e79
Jul 8 00:26:28 quad [ 35.812522] ffff88012fc1c080 ffff88012ec453c0
ffff88012fc29dc0 ffff88012fc1c080
Jul 8 00:26:28 quad [ 35.812522] Call Trace:
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff802b0e79>] ?
0xffffffff802b0e79
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff802b2ba9>] ?
0xffffffff802b2ba9
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025c310>] ?
0xffffffff8025c310
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025c2a5>] ?
0xffffffff8025c2a5
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025c3f2>] ?
0xffffffff8025c3f2
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025faa5>] ?
0xffffffff8025faa5
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025faa5>] ?
0xffffffff8025faa5
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025f420>] ?
0xffffffff8025f420
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8021e67a>] ?
0xffffffff8021e67a
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8025f3e3>] ?
0xffffffff8025f3e3
Jul 8 00:26:28 quad [ 35.812522] [<ffffffff8021e670>] ?
0xffffffff8021e670
Jul 8 00:26:28 quad [ 35.812522] Code: e0 59 f8 ff 48 c1 e8 0c 48 ba
00 00 00 00 00 e2 ff ff 48 6b c0 38 48 01 d0 f6 40 01 40 48 89 c2 74 04
48 8b 50 10 f6 02 80 75 04 <0f> 0b eb fe 48 8b 72 30 49 63 c7 48 8b 8c
c3 68 01 00 00 48 8b
Jul 8 00:26:28 quad [ 35.812522] RIP [<ffffffff802b0c13>]
0xffffffff802b0c13
Jul 8 00:26:28 quad [ 35.812522] RSP <ffff88012ed0ddf0>
Jul 8 00:26:28 quad [ 35.812522] ---[ end trace 34d4c4ebc6cdd1ce ]---

Just for reference, in my source tree the mm/slab.c:604 line is the
BUG_ON below:

static inline struct slab *page_get_slab(struct page *page)
{
        BUG_ON(!PageSlab(page));
        return (struct slab *)page->lru.prev;
}
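
For anyone trying to follow how that BUG_ON gets reached: my rough,
unverified reading of the mainline 2.6.29 slab code (the grsec/vserver
patches in this tree may shuffle things slightly) is sketched below.
If it's right, the BUG means the allocator was handed an object whose
backing page no longer has PG_slab set, i.e. an earlier bogus free or
page-flag corruption that only blows up later when the per-CPU caches
are drained from a workqueue thread (hence events/1 above):

/*
 * Hedged sketch based on mainline 2.6.29 mm/slab.c -- not this exact
 * patched tree.  virt_to_slab() is how the free path gets from an
 * object pointer back to its slab descriptor.
 */
static inline struct slab *virt_to_slab(const void *obj)
{
        struct page *page = virt_to_head_page(obj); /* object -> backing page */
        return page_get_slab(page);  /* BUG_ON(!PageSlab(page)) fires here */
}

/*
 * free_block() calls virt_to_slab() on every object in a per-CPU
 * array cache, and the periodic cache_reap() work item runs that path
 * from the events/N workqueue thread -- so a single object whose page
 * has lost PG_slab takes down events/1 long after the original bad
 * free or corruption actually happened.
 */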

Being a glutton for punishment, I tried again under a more controlled
test (and broke my RAID...). Very similar kernel bug again:

Jul 8 00:33:17 quad [ 197.808559] ------------[ cut here ]------------
Jul 8 00:33:17 quad [ 197.808727] kernel BUG at mm/slab.c:604!
Jul 8 00:33:17 quad [ 197.808883] invalid opcode: 0000 [#1] SMP
Jul 8 00:33:17 quad [ 197.809081] last sysfs file:
/sys/devices/platform/coretemp.3/temp1_input
Jul 8 00:33:17 quad [ 197.809240] CPU 1
Jul 8 00:33:17 quad [ 197.809409] Modules linked in: i2c_i801
Jul 8 00:33:17 quad [ 197.809598] Pid: 16, comm: events/1 Not tainted
2.6.29.6-grsec2.1.14-vs2.3.0.36.14 #1 X48-DQ6
Jul 8 00:33:17 quad [ 197.809895] RIP: 0010:[<ffffffff802b0c13>]
[<ffffffff802b0c13>] 0xffffffff802b0c13
Jul 8 00:33:17 quad [ 197.810206] RSP: 0018:ffff88012ed0ddf0 EFLAGS:
00010046
Jul 8 00:33:17 quad [ 197.810363] RAX: ffffe20000023af0 RBX:
ffff88012fc1c080 RCX: 0000000000000000
Jul 8 00:33:17 quad [ 197.810522] RDX: ffffe20000023af0 RSI:
ffff88012ec35418 RDI: 0000000000a32e80
Jul 8 00:33:17 quad [ 197.810681] RBP: ffff88012ec35418 R08:
0000000000000000 R09: 0000000000000286
Jul 8 00:33:17 quad [ 197.810840] R10: 0000000000000000 R11:
0000000000000001 R12: ffffffff80a32e80
Jul 8 00:33:17 quad [ 197.811000] R13: 0000000000000000 R14:
0000000000000001 R15: 0000000000000000
Jul 8 00:33:17 quad [ 197.811159] FS: 0000000000000000(0000)
GS:ffff88012fa55340(0000) knlGS:0000000000000000
Jul 8 00:33:17 quad [ 197.811455] CS: 0010 DS: 0018 ES: 0018 CR0:
000000008005003b
Jul 8 00:33:17 quad [ 197.811612] CR2: 000075dfcdd59000 CR3:
0000000129c7f000 CR4: 00000000000026e0
Jul 8 00:33:17 quad [ 197.811771] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jul 8 00:33:17 quad [ 197.811930] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jul 8 00:33:17 quad [ 197.812099] Process events/1 (pid: 16,
threadinfo ffff88012ed0c000, task ffff88012ecdc680)
Jul 8 00:33:17 quad [ 197.812395] Stack:
Jul 8 00:33:17 quad [ 197.812522] 0000000000000000 ffff88012ec35418
0000000000000001 ffff88012ec35400
Jul 8 00:33:17 quad [ 197.812522] ffff88012fc29dc0 0000000000000000
ffff88012fc1c080 ffffffff802b0e79
Jul 8 00:33:17 quad [ 197.812522] ffff88012fc1c080 ffff88012ec453c0
ffff88012fc29dc0 ffff88012fc1c080
Jul 8 00:33:17 quad [ 197.812522] Call Trace:
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff802b0e79>] ?
0xffffffff802b0e79
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff802b2ba9>] ?
0xffffffff802b2ba9
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025c310>] ?
0xffffffff8025c310
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025c2a5>] ?
0xffffffff8025c2a5
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025c3f2>] ?
0xffffffff8025c3f2
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025faa5>] ?
0xffffffff8025faa5
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025faa5>] ?
0xffffffff8025faa5
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025f420>] ?
0xffffffff8025f420
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8021e67a>] ?
0xffffffff8021e67a
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8025f3e3>] ?
0xffffffff8025f3e3
Jul 8 00:33:17 quad [ 197.812522] [<ffffffff8021e670>] ?
0xffffffff8021e670
Jul 8 00:33:17 quad [ 197.812522] Code: e0 59 f8 ff 48 c1 e8 0c 48 ba
00 00 00 00 00 e2 ff ff 48 6b c0 38 48 01 d0 f6 40 01 40 48 89 c2 74 04
48 8b 50 10 f6 02 80 75 04 <0f> 0b eb fe 48 8b 72 30 49 63 c7 48 8b 8c
c3 68 01 00 00 48 8b
Jul 8 00:33:17 quad [ 197.812522] RIP [<ffffffff802b0c13>]
0xffffffff802b0c13
Jul 8 00:33:17 quad [ 197.812522] RSP <ffff88012ed0ddf0>
Jul 8 00:33:17 quad [ 197.812522] ---[ end trace e48c562a0eb9460c ]---

Obviously this is fairly unhelpful without a usable call trace, but
given that this is a live server and it's trashing the RAID every time,
I'm more than a little nervous about provoking it again at this stage.

For reference: 2.6.29.2+vs+hardened works fine, 2.6.29.5+vs works
fine, and 2.6.29.6+vs+hardened gives a kernel bug. The machine is an
Intel X48 board with a quad-core Core 2, and the kernel is compiled as
64-bit. Clearly it's either a regression in 2.6.29.6 vanilla or, more
likely, a conflict with the grsec integration in this patch?

Can anyone suggest a way to proceed?

Cheers

Ed W
Received on Wed Jul 8 10:17:15 2009
