Re: [Vserver] stuck file on xfs (x86_64)

From: Pallai Roland <dap_at_mail.index.hu>
Date: Sat 11 Mar 2006 - 12:43:26 GMT
Message-Id: <1142081007.4958.49.camel@localhost.localdomain>

On Wed, 2006-03-08 at 14:29 +0100, Herbert Poetzl wrote:
> On Tue, Mar 07, 2006 at 07:30:26PM +0100, Pallai Roland wrote:
> >
> > I have a weird problem: sometimes a random file gets "stuck" after
> > 1-2 weeks of uptime on an XFS partition within a vserver. The XFS is
> > lying on LVM2 &
> >
> > SysRq+T - very long; I copied only the stuck processes, but not all
> > of them are here, because the 'dmesg' buffer is too small and I don't
> > have a serial console :(
 Today I got another stuck file, and this time I have a full SysRq+T
dump of the system; ask if that could help.
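
 (In case it's useful to someone: SysRq+T easily overflows the default
kernel log buffer, so without a serial console one workaround - the
buffer size here is only an example, and the kernel needs
CONFIG_MAGIC_SYSRQ - is to boot with a bigger log buffer via the
log_buf_len=1M kernel parameter, trigger the dump with
'echo t > /proc/sysrq-trigger', and read it back with
'dmesg -s 1048576 > sysrq-t.txt'.)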

 In the last mail I mentioned that only 'pdflush' was in state D on the
host, but I was wrong: 'xfssyncd' is also stuck in D. Both of them are
in the new SysRq+T dump.

 I'm not a kernel hacker, but I've noticed that each time exactly one
of the stuck processes is doing a dm_request(), while pdflush is stuck
on a dm_* thing too. So I think maybe it's not a dangling inode lock in
the XFS code, but some kind of deadlock in the dm_* code..? Don't
laugh, I said INAKH (I'm Not A Kernel Hacker)! :)
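
 To show the shape I mean, here is a minimal userspace sketch of two
tasks taking two locks in opposite order - the lock names, the roles,
and the pthread framing are all made up for illustration; this is not
the real kernel code, just the ABBA pattern I suspect between the
unlink path and the writeback path:

   /* build with: gcc -pthread abba.c -o abba */
   #include <pthread.h>
   #include <stdio.h>
   #include <unistd.h>

   /* Stand-ins for the two locks: "buf_lock" playing the XFS buffer
    * semaphore, "page_lock" playing the page lock in the dm path. */
   static pthread_mutex_t buf_lock  = PTHREAD_MUTEX_INITIALIZER;
   static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

   /* Role of glftpd above: takes the buffer lock first, then needs
    * the page lock to finish its I/O. */
   static void *unlink_path(void *arg)
   {
       (void)arg;
       pthread_mutex_lock(&buf_lock);
       sleep(1);                        /* widen the race window */
       pthread_mutex_lock(&page_lock);  /* blocks: other task has it */
       pthread_mutex_unlock(&page_lock);
       pthread_mutex_unlock(&buf_lock);
       return NULL;
   }

   /* Role of pdflush above: takes the page lock first, then needs
    * the buffer lock already held by the unlink path -> deadlock. */
   static void *writeback_path(void *arg)
   {
       (void)arg;
       pthread_mutex_lock(&page_lock);
       sleep(1);
       pthread_mutex_lock(&buf_lock);   /* blocks: other task has it */
       pthread_mutex_unlock(&buf_lock);
       pthread_mutex_unlock(&page_lock);
       return NULL;
   }

   int main(void)
   {
       pthread_t a, b;
       pthread_create(&a, NULL, unlink_path, NULL);
       pthread_create(&b, NULL, writeback_path, NULL);
       puts("threads started; if the theory holds, neither one returns");
       pthread_join(a, NULL);           /* never completes */
       pthread_join(b, NULL);
       return 0;
   }

 Both threads end up sleeping forever, which at least has the same
shape as the two D-state tasks in the traces below.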

pdflush D 0000000000000000 0 9262 11 9326 9367 (L-TLB)
ffff81005b743aa8 0000000000000046 0000000300000000 ffff8100761f0ed0
       0000000000000000 0000000000000000 0000000000000096 0000000000000003
       ffff81005b743a18 ffffffff8012de13
Call Trace:<ffffffff8012de13>{__wake_up+67} <ffffffff803581d3>{dm_table_unplug_all+51}
       <ffffffff8035605d>{dm_unplug_all+29} <ffffffff8015f290>{sync_page+0}
       <ffffffff803de614>{io_schedule+52} <ffffffff8015f2d8>{sync_page+72}
       <ffffffff803de9e1>{__wait_on_bit_lock+65} <ffffffff8015fa34>{__lock_page+164}
       <ffffffff8014a820>{wake_bit_function+0} <ffffffff8014a820>{wake_bit_function+0}
       <ffffffff8016c56a>{pagevec_lookup_tag+26} <ffffffff801aceaf>{mpage_writepages+351}
       <ffffffff88106250>{:xfs:linvfs_writepage+0} <ffffffff801ab580>{__sync_single_inode+112}
       <ffffffff801ab8c1>{__writeback_single_inode+417} <ffffffff80358145>{dm_table_any_congest
       <ffffffff803560b8>{dm_any_congested+72} <ffffffff80358177>{dm_table_any_congested+71}
       <ffffffff801abab2>{sync_sb_inodes+482} <ffffffff8014a1c0>{keventd_create_kthread+0}
       <ffffffff801abc15>{writeback_inodes+133} <ffffffff80166a7e>{wb_kupdate+206}
       <ffffffff80167510>{pdflush+0} <ffffffff80167464>{__pdflush+292}
       <ffffffff8016754a>{pdflush+58} <ffffffff801669b0>{wb_kupdate+0}
       <ffffffff8014a182>{kthread+146} <ffffffff8010ea5a>{child_rip+8}
       <ffffffff8014a1c0>{keventd_create_kthread+0} <ffffffff8014a0f0>{kthread+0}
       <ffffffff8010ea52>{child_rip+0}

glftpd D ffff81006f666000 0 18518 1 18526 24134 (NOTLB)
ffff81006ddfba28 0000000000000086 0000000000000292 ffffffff80355fda
       ffff81007ff82e00 0000000000000001 ffff8100422b2140 ffffffff80227256
       0000000000000001 ffffc200000d9040
Call Trace:<ffffffff80355fda>{dm_request+122} <ffffffff80227256>{generic_make_request+262}
       <ffffffff803dee98>{__down+152} <ffffffff8012dd50>{default_wake_function+0}
       <ffffffff803dec8a>{__down_failed+53} <ffffffff88108951>{:xfs:.text.lock.xfs_buf+25}
       <ffffffff88106d94>{:xfs:_pagebuf_find+372} <ffffffff88106e82>{:xfs:xfs_buf_get_flags+82}
       <ffffffff88106f8a>{:xfs:xfs_buf_read_flags+26} <ffffffff880f9feb>{:xfs:xfs_trans_read_bu
       <ffffffff880b601c>{:xfs:xfs_alloc_read_agf+108} <ffffffff880f9075>{:xfs:_xfs_trans_commi
       <ffffffff880b5a93>{:xfs:xfs_alloc_fix_freelist+291}
       <ffffffff880fa917>{:xfs:xfs_trans_log_inode+39} <ffffffff803deaf2>{__down_read+18}
       <ffffffff880b65b8>{:xfs:xfs_free_extent+152} <ffffffff880df434>{:xfs:xfs_efd_init+68}
       <ffffffff880fa65b>{:xfs:xfs_trans_get_efd+43} <ffffffff880c52d8>{:xfs:xfs_bmap_finish+23
       <ffffffff880e73f4>{:xfs:xfs_itruncate_finish+420} <ffffffff880ed3a1>{:xfs:xfs_log_reserv
       <ffffffff8810015e>{:xfs:xfs_inactive+558} <ffffffff8810dfb1>{:xfs:linvfs_clear_inode+161
       <ffffffff801a07d0>{clear_inode+224} <ffffffff801a18bd>{generic_delete_inode+205}
       <ffffffff801a1b2b>{iput+123} <ffffffff80197223>{sys_unlink+259}
       <ffffffff8011fd01>{ia32_sysret+0}

> looks like some xfs inode lock is not released properly.
> there can be various reasons for this; updating to the
> latest kernel and vserver patches might help here ...
>
> anyway, I will have a more detailed look at it later.
 Thanks in advance. Meanwhile I'm trying different kernels; I've now
rebooted into 2.6.15.6-vs2.1.1-rc10.

--
 d