Re: [vserver] Kernel 3.18.7 hangs when inserting netconsole module on DELL M620 VRTX Blade

From: Corey Wright <undefined_at_pobox.com>
Date: Tue 07 Apr 2015 - 15:11:00 BST
Message-Id: <20150407091100.9a66b64d33bb5b4c576a534a@pobox.com>

On Tue, 07 Apr 2015 14:31:51 +0200
Urban Loesch <bind@enas.net> wrote:

> Hi,
>
> I have installed a new DELL VRTX M620 blade with a self-compiled kernel
> 3.18.7 and the most recent linux-vserver patch.
>
> After system startup I tried to activate the kernel netconsole with remote
> logging enabled.
>
> I executed the following command, and the shell I issued it from became
> unresponsive and hung:
> # modprobe netconsole netconsole="@/eth0,514@10.1.10.197/00:10:db:fc:60:0c"
>
> The system load increases slowly and CPU #3 spends 100% of its time in
> softirq. Only a soft reset, without loading the netconsole module again,
> solves the issue.
>
> I found the following error in the kernel log:
>
> ...
> Apr 7 12:33:23 server2 kernel: [ 180.582285] ------------[ cut here ]------------
> Apr 7 12:33:23 server2 kernel: [ 180.582292] WARNING: CPU: 3 PID: 2978 at kernel/softirq.c:147 __local_bh_enable_ip+0x72/0xa0()
> Apr 7 12:33:23 server2 kernel: [ 180.582303] CPU: 3 PID: 2978 Comm: modprobe Not tainted 3.18.7-vs2.3.7.4-em64t-efigpt #2
> Apr 7 12:33:23 server2 kernel: [ 180.582304] Hardware name: Dell Inc. PowerEdge M620/0NJVT7, BIOS 2.4.3 07/02/2014
> Apr 7 12:33:23 server2 kernel: [ 180.582306] 0000000000000009 ffff881fc4a239e8 ffffffff817438ac 000000001b361b36
> Apr 7 12:33:23 server2 kernel: [ 180.582307] 0000000000000000 ffff881fc4a23a28 ffffffff81051f8c ffffffff81f49040
> Apr 7 12:33:23 server2 kernel: [ 180.582308] 0000000000000200 ffff881fcf0f0dd4 ffff881fcf0f0d58 0000000000000000
> Apr 7 12:33:23 server2 kernel: [ 180.582308] Call Trace:
> Apr 7 12:33:23 server2 kernel: [ 180.582313] [<ffffffff817438ac>] dump_stack+0x46/0x58
> Apr 7 12:33:23 server2 kernel: [ 180.582315] [<ffffffff81051f8c>] warn_slowpath_common+0x8c/0xc0
> Apr 7 12:33:23 server2 kernel: [ 180.582316] [<ffffffff81051fda>] warn_slowpath_null+0x1a/0x20
> Apr 7 12:33:23 server2 kernel: [ 180.582318] [<ffffffff81055fa2>] __local_bh_enable_ip+0x72/0xa0
> Apr 7 12:33:23 server2 kernel: [ 180.582322] [<ffffffff8174992b>] _raw_spin_unlock_bh+0x1b/0x20
> Apr 7 12:33:23 server2 kernel: [ 180.582335] [<ffffffffa00b7f33>] bnx2x_poll+0x83/0x3c0 [bnx2x]
> Apr 7 12:33:23 server2 kernel: [ 180.582338] [<ffffffff81667360>] netpoll_poll_dev+0x110/0x1b0
> Apr 7 12:33:23 server2 kernel: [ 180.582340] [<ffffffff81667567>] netpoll_send_skb_on_dev+0x167/0x240
> Apr 7 12:33:23 server2 kernel: [ 180.582341] [<ffffffff81667912>] netpoll_send_udp+0x2d2/0x400
> Apr 7 12:33:23 server2 kernel: [ 180.582343] [<ffffffffa017985f>] write_msg+0xcf/0x110 [netconsole]
> Apr 7 12:33:23 server2 kernel: [ 180.582347] [<ffffffff8109e22b>] call_console_drivers.constprop.27+0x9b/0x100
> Apr 7 12:33:23 server2 kernel: [ 180.582348] [<ffffffff8109f28a>] console_unlock+0x3ba/0x440
> Apr 7 12:33:23 server2 kernel: [ 180.582350] [<ffffffff810a062a>] register_console+0x29a/0x360
> Apr 7 12:33:23 server2 kernel: [ 180.582351] [<ffffffffa0190000>] ? 0xffffffffa0190000
> Apr 7 12:33:23 server2 kernel: [ 180.582353] [<ffffffffa01901c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> Apr 7 12:33:23 server2 kernel: [ 180.582356] [<ffffffff810002dc>] do_one_initcall+0x8c/0x1c0
> Apr 7 12:33:23 server2 kernel: [ 180.582359] [<ffffffff81180ca2>] ? __vunmap+0xc2/0x110
> Apr 7 12:33:23 server2 kernel: [ 180.582361] [<ffffffff810d7e7d>] load_module+0x1dbd/0x25b0
> Apr 7 12:33:23 server2 kernel: [ 180.582363] [<ffffffff810d4660>] ? show_initstate+0x60/0x60
> Apr 7 12:33:23 server2 kernel: [ 180.582365] [<ffffffff8174ba1f>] ? page_fault+0x1f/0x30
> Apr 7 12:33:23 server2 kernel: [ 180.582366] [<ffffffff810d870a>] SyS_init_module+0x9a/0xc0
> Apr 7 12:33:23 server2 kernel: [ 180.582368] [<ffffffff8174a112>] system_call_fastpath+0x12/0x17
> Apr 7 12:33:23 server2 kernel: [ 180.582369] ---[ end trace f22e6c017282076b ]---
> ...
>
> The system has 256 GB of RAM:
> # free -m
>              total       used       free     shared    buffers     cached
> Mem:        257918       1834     256084          0         19         44
> -/+ buffers/cache:       1770     256148
> Swap:         7627          0       7627
>
> It also has two six-core CPUs:
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 24
> On-line CPU(s) list: 0-23
> Thread(s) per core: 2
> Core(s) per socket: 6
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 62
> Stepping: 4
> CPU MHz: 2599.966
> BogoMIPS: 5200.39
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 15360K
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23
>
>
> A memtest is currently running to rule out a RAM problem, but it takes
> a while.
>
> Strangely, the second M620 blade installed in the same VRTX has no
> problems loading the netconsole module, at least as of today.
>
> Do you have any idea how I can solve this?
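
For reference, the netconsole= value in the modprobe command above follows the format given in the kernel's netconsole documentation, [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr], where empty fields fall back to defaults (per that documentation: source port 6665, the interface's own address, target port 6666, broadcast MAC). The POSIX shell sketch below uses a hypothetical helper, parse_netconsole (not part of netconsole itself), to split such a value into its fields. It assumes the ',' and both '@' separators are present and does no validation.

```shell
#!/bin/sh
# Hypothetical helper (not part of netconsole): split a netconsole= value of
# the form [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# into shell variables. Assumes the ',' and both '@' separators are present.
parse_netconsole() {
    spec=$1
    local_side=${spec%%,*}      # "[src-port]@[src-ip]/[<dev>]"
    remote_side=${spec#*,}      # "[tgt-port]@<tgt-ip>/[tgt-macaddr]"

    src_port=${local_side%%@*}
    rest=${local_side#*@}
    src_ip=${rest%%/*}
    case $rest in
        */*) dev=${rest#*/} ;;
        *)   dev= ;;            # no '/': device field omitted
    esac

    tgt_port=${remote_side%%@*}
    rest=${remote_side#*@}
    tgt_ip=${rest%%/*}
    case $rest in
        */*) tgt_mac=${rest#*/} ;;
        *)   tgt_mac= ;;        # no '/': MAC omitted (broadcast is used)
    esac
}

# Usage: the value from the modprobe command in the report above.
parse_netconsole "@/eth0,514@10.1.10.197/00:10:db:fc:60:0c"
printf 'src_port=%s src_ip=%s dev=%s\n' "$src_port" "$src_ip" "$dev"
printf 'tgt_port=%s tgt_ip=%s tgt_mac=%s\n' "$tgt_port" "$tgt_ip" "$tgt_mac"
```

For the value from the report this yields an empty src_port and src_ip (so the defaults apply), dev=eth0, tgt_port=514, tgt_ip=10.1.10.197 and tgt_mac=00:10:db:fc:60:0c.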

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=e3b175f60e9c79f559d14f76590a701b62c583c4

that commit is a likely fix for your problem: it was committed to 3.18.8,
while you are running 3.18.7, and it changes polling in the bnx2x driver,
while bnx2x_poll is in your stack trace. also, netconsole works fine for me
on 3.18.11-vs2.3.7.4 (in my linux-vserver test virtualbox vm; i don't
otherwise use netconsole).
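
The two observations above, that the running release predates 3.18.8 and that bnx2x_poll appears in the call trace, can be checked mechanically. The sketch below is a minimal illustration with hypothetical helper names (version_lt, base_version, trace_has_frame); it assumes GNU coreutils sort -V and only compares version strings and searches log text, so it cannot tell whether that particular commit actually fixes the hang.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU coreutils `sort -V` is available.

# Succeeds if version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Drops local suffixes, e.g. "3.18.7-vs2.3.7.4-em64t-efigpt" -> "3.18.7".
base_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds if the kernel log on stdin has a call-trace frame in function $1,
# i.e. a line containing "] <function>+0x".
trace_has_frame() {
    grep -F -q "] $1+0x"
}

# Example with values from the report (on a live system: `uname -r`, `dmesg`).
release=$(base_version "3.18.7-vs2.3.7.4-em64t-efigpt")
if version_lt "$release" 3.18.8; then
    echo "$release predates 3.18.8"
fi
if printf '%s\n' '[<ffffffffa00b7f33>] bnx2x_poll+0x83/0x3c0 [bnx2x]' |
    trace_has_frame bnx2x_poll; then
    echo "bnx2x_poll appears in the trace"
fi
```

On a live system the inputs would come from uname -r and dmesg, e.g. `dmesg | trace_has_frame bnx2x_poll`.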

if you are running 3.18.7 because that's the latest 3.18 version a
linux-vserver patch was specifically released for, then i can assure you
that the patch works just fine applied to later 3.18 releases (for others
and myself), though some hunks may apply with offsets (and maybe a little
fuzz; can't remember). if it doesn't apply cleanly, i usually email a
fix-up patch to the mailing list.

for future reference, please try to reproduce the problem on a
non-vserver-patched kernel (or, if you already did, say so) to help
determine whether it is specific to linux-vserver.

corey

--
undefined@pobox.com
> Many thanks
> Urban Loesch
Received on Tue Apr 7 15:11:19 2015
Generated on Tue 07 Apr 2015 - 15:11:19 BST by hypermail 2.1.8