vserver development mailing list: [vserver] State D secure mount process in Kernel 3.4.4

From: Linux VServer <uranus.lv_at_gmail.com>
Date: Tue 10 Jul 2012 - 12:43:27 BST
Message-ID: <CADR5UH62GOPdBr8ukm-aYdQW7zHvcFMmEWOronH+Dmai5WVqHA@mail.gmail.com>

Hi List,

I have state D Processes in task list and at this point no vserver restarts
are possible.

Patch: 3.4.4-vs2.3.3.5-nc
UV: 0.30.216-pre3034

I tracked that problem down and find the following traces in dmesg:

[588133.416078] INFO: task secure-mount:32496 blocked for more than 120
seconds.
[588133.423208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[588133.431398] secure-mount D ffffffff8180c9a0 0 32496 19953
0x00000000
[588133.431404] ffff880938245c08 0000000000000082 ffff88110ec52de0
0000000000013880
[588133.431411] ffff880938245fd8 ffff880938244010 0000000000013880
0000000000013880
[588133.431418] ffff880938245fd8 0000000000013880 ffff8811fbf12de0
ffff88110ec52de0
[588133.431426] Call Trace:
[588133.431435] [<ffffffff8166b449>] schedule+0x29/0x70
[588133.431440] [<ffffffff8166c06d>] rwsem_down_failed_common+0xcd/0x170
[588133.431446] [<ffffffff811ee280>] ? init_once+0x20/0x20
[588133.431450] [<ffffffff8166c123>] rwsem_down_write_failed+0x13/0x20
[588133.431458] [<ffffffff8132c053>] call_rwsem_down_write_failed+0x13/0x20
[588133.431462] [<ffffffff8166a302>] ? down_write+0x32/0x40
[588133.431468] [<ffffffff8118d90a>] sget+0x36a/0x500
[588133.431474] [<ffffffff811ee6f0>] ? proc_mount+0x150/0x150
[588133.431480] [<ffffffff811ee5fa>] proc_mount+0x5a/0x150
[588133.431484] [<ffffffff8118d3d3>] mount_fs+0x43/0x210
[588133.431488] [<ffffffff811aa495>] vfs_kern_mount+0xc5/0x160
[588133.431491] [<ffffffff811aa5b2>] do_kern_mount+0x52/0x110
[588133.431499] [<ffffffff812b10b8>] ? security_capable+0x18/0x20
[588133.431503] [<ffffffff811abd01>] do_mount+0x231/0x960
[588133.431509] [<ffffffff811ac4c0>] sys_mount+0x90/0xe0
[588133.431516] [<ffffffff81674429>] system_call_fastpath+0x16/0x1b

shortly before this trace I saw the following trace:

[588013.617254] INFO: task apache2:7696 blocked for more than 120 seconds.
[588013.631799] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[588013.641116] apache2 D 000000000000000f 0 7696 9371
0x00000000
[588013.641123] ffff8800c8c294d8 0000000000000082 ffff880ee91e5bc0
0000000000013880
[588013.641129] ffff8800c8c29fd8 ffff8800c8c28010 0000000000013880
0000000000013880
[588013.641134] ffff8800c8c29fd8 0000000000013880 ffff8810eb132de0
ffff880ee91e5bc0
[588013.641139] Call Trace:
[588013.641148] [<ffffffff8166b449>] schedule+0x29/0x70
[588013.641153] [<ffffffff8166c06d>] rwsem_down_failed_common+0xcd/0x170
[588013.641157] [<ffffffff8166c123>] rwsem_down_write_failed+0x13/0x20
[588013.641163] [<ffffffff8132c053>] call_rwsem_down_write_failed+0x13/0x20
[588013.641167] [<ffffffff8166a302>] ? down_write+0x32/0x40
[588013.641173] [<ffffffff81139b1d>] unregister_shrinker+0x1d/0x60
[588013.641178] [<ffffffff8118d159>] deactivate_locked_super+0x49/0x90
[588013.641182] [<ffffffff8118df8a>] deactivate_super+0x4a/0x70
[588013.641187] [<ffffffff811a984a>] mntput_no_expire+0x11a/0x190
[588013.641191] [<ffffffff811a98e3>] mntput+0x23/0x40
[588013.641195] [<ffffffff811a992c>] kern_unmount+0x2c/0x40
[588013.641201] [<ffffffff812a3405>] mq_put_mnt+0x15/0x20
[588013.641204] [<ffffffff812a51af>] put_ipc_ns+0x3f/0x80
[588013.641207] [<ffffffff812a524e>] ipcns_put+0xe/0x10
[588013.641213] [<ffffffff811ed74c>] proc_evict_inode+0x6c/0x80
[588013.641216] [<ffffffff811a5813>] evict+0xb3/0x1c0
[588013.641220] [<ffffffff811a5a3c>] iput+0x11c/0x230
[588013.641223] [<ffffffff811a3a8c>] shrink_dentry_list+0x52c/0x570
[588013.641227] [<ffffffff811a40fc>] prune_dcache_sb+0x18c/0x1c0
[588013.641231] [<ffffffff8118d098>] prune_super+0x158/0x1d0
[588013.641235] [<ffffffff81139d5f>] shrink_slab+0x19f/0x2e0
[588013.641239] [<ffffffff8113c590>] do_try_to_free_pages+0x460/0x590
[588013.641243] [<ffffffff8113c94b>] try_to_free_pages+0x9b/0x120
[588013.641266] [<ffffffffa003d54d>] ?
megasas_build_and_issue_cmd+0x1cd/0x5c0 [megaraid_sas]
[588013.641272] [<ffffffff81130888>] __alloc_pages_nodemask+0x4e8/0x930
[588013.641278] [<ffffffff8108ee7b>] ? dequeue_entity+0x12b/0x310
[588013.641283] [<ffffffff810abc6d>] ? ktime_get_ts+0xad/0xe0
[588013.641290] [<ffffffff8116bbf6>] alloc_pages_vma+0xc6/0x1a0
[588013.641294] [<ffffffff811277b0>] ? sleep_on_page+0x20/0x20
[588013.641298] [<ffffffff8115ff02>] read_swap_cache_async+0xf2/0x150
[588013.641302] [<ffffffff8115ffee>] swapin_readahead+0x8e/0xd0
[588013.641307] [<ffffffff81150610>] handle_pte_fault+0x6a0/0xaf0
[588013.641312] [<ffffffff811a836e>] ? vfsmount_lock_local_unlock+0x1e/0x30
[588013.641317] [<ffffffff8119552d>] ? complete_walk+0x9d/0x120
[588013.641322] [<ffffffff81150c35>] handle_mm_fault+0x1d5/0x330
[588013.641327] [<ffffffff8166fdfa>] do_page_fault+0x14a/0x480
[588013.641331] [<ffffffff8108abf0>] ? __dequeue_entity+0x30/0x50
[588013.641336] [<ffffffff810127a9>] ? __switch_to+0x1a9/0x450
[588013.641342] [<ffffffff81085153>] ? finish_task_switch+0x53/0x100
[588013.641346] [<ffffffff8166ad49>] ? __schedule+0x3f9/0x7d0
[588013.641350] [<ffffffff8166c8e5>] page_fault+0x25/0x30

This apache process remain in state d and so i think the state d from
secure-mount is coming from the hung apache process.

The trace itself looks like that there is a deadlock in the sem handling of
vanilla or linux vserver.

I got these state D processes in Kernel versions since 3.2.

For a test i disabled CONFIG_CFS_BANDWIDTH because this new function was
enabled in kernel since 3.2. Maybe that's the missing clue.

CONFIG_BLK_CGROUP was not set, so therefore this seems not be related to
each other.

Are there any other hinds to get behind this error?

Best regadrs

uranus
Received on Tue Jul 10 12:43:39 2012