vserver development mailing list: Re: [vserver] Kernel panic with 2.4.20ctx-16


From: Herbert Poetzl (herbert_at_13thfloor.at)
Date: Mon 10 Feb 2003 - 13:29:18 GMT

Previous message: Christoph Kuhles: "Re: [vserver] IP routing"
In reply to: Sam Vilain: "Re: [vserver] Kernel panic with 2.4.20ctx-16"
Next in thread: Sam Vilain: "Re: [vserver] Kernel panic with 2.4.20ctx-16"
Next in thread: John Goerzen: "[vserver] Re: Kernel panic with 2.4.20ctx-16"
Reply: Sam Vilain: "Re: [vserver] Kernel panic with 2.4.20ctx-16"

On Tue, Feb 11, 2003 at 08:07:14AM +1300, Sam Vilain wrote:
> On Thu, 06 Feb 2003 23:03, you wrote:
> > On Wed, Feb 05, 2003 at 07:55:19PM +0100, Herbert Poetzl wrote:
> > > On Wed, Feb 05, 2003 at 04:46:21PM +0000, Paul Sladen wrote:
> > > > On Mon, 3 Feb 2003, John Goerzen wrote:
> > >
> > > Justin M Kuntz reported a kernel oops in
> > > sched.c 570 on a 2.4.20 ctx16 with reiserfs
> > > on January 1, 2003, so this seems to be
> > > the same race ...
> >
> > Hmm, I'm also using reiserfs on the server which crashed, so it might
> > be related.
> >
> > John, are you using reiserfs?
>
> Hmm, funny you should suspect reiserfs so quickly. You have good reason.
>
> As I've recently become painfully aware, reiserfs can easily break under
> not so unusual circumstances. Though I used to swear by it, in 2 years
> or so of using it I have had five unexplained data corruption incidents
> running so-called `stable' versions since early 2.4 days, which is five
> more than all other UNIX filesystems I've used combined. Three of these
> followed a system crash, when reiserfs's journalling failed. One of
> these resulted in a complete loss of the filesystem structure, due to the
> inadequacy of the `reiserfsck' tool.
>
> In addition to data corruption, it's not all that hard to create a
> directory structure that even root cannot read; I've just managed to
> create one, and all I was doing was duplicating ~25% of the directory
> structure using an analogue of `cp -al'. Reiserfs really cracks under
> pressure, and that's the last thing you want a filesystem to do!
>
> With these problems under high load, it's hard to think of a truly useful
> application for reiserfs. It really is still experimental as hell; the
> version in 2.4.20 seems particularly bad. Best to stick with ext3/ext2

Hans Reiser will hate you for that *G* ...

> (with the directory hashing patch if you need it). Or try your luck with

hmm, hmm, should I mention that the change to the
ext3 htree extension (which is part of the latest
ext3 versions) easily wiped several partitions,
because the e2fsck tools weren't up to date ...

> xfs/jfs if you really need the speed.

it seems you have detailed information comparing
the speed/load issues of xfs, jfs, reiserfs and
ext3? please share it with us!

> Check out this e-mail seen on the reiserfs list:
> ----
> [... talking about a crash ...]
> And now I can reliably reproduce it. It has nothing to do with MD,
> linear, raid, SMP, or unclean shutdowns.
>
> I can reproduce this bug on a plain IDE disk partition in about three
> hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config
> and system details available on request). My test system has about 4 gigs
> under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap.
>
>
>
>
> BEGIN cut-and-paste-into-a-root-shell
>
> # Create an empty filesystem:
>
> mkreiserfs -f -f /dev/hdc2
> mount /dev/hdc2 /test
> cd /test
>
> # Script used to control the load average. Note that as written the loops
> # below will keep spawning new processes, so we need some way to throttle
> # them. Change the '-lt 10' to another number to change the number
> # of processes.
>
> cat <<'LC' > loadcheck && chmod 755 loadcheck
> #!/bin/sh
> read av1 av5 av15 rest < /proc/loadavg
> echo -n "Load Average: $av1 ... "
> av1=${av1%.*}
> if [ "$av1" -lt 10 ]; then
>     echo OK
>     exit 0
> else
>     echo "Whoa, Nellie!"
>     exit 1
> fi
> LC
>
> # Create directories used by test
> mkdir foo bar
>
> # Start up some rsyncs. I use /etc, /usr, and /var because there's a
> # good mixture of files with some hardlinks between them, and on a normal
> # Linux system some of them change from time to time.
>
> while sleep 1m; do
>     ./loadcheck || continue;
>     for x in usr etc var; do
>         rsync -avxHS --delete /$x/. foo/$x/. &
>     done;
> done &
>
> # Start up some cp -al's and rm -rf's. Note there are two concurrent
> # sets of 'cp's and two concurrent sets of 'rm's, and each of those
> # has different instances of 'cp' and 'rm' running at different times.
> for x in 1 2; do
>     while sleep 1m; do
>         ./loadcheck || continue;
>         cp -al foo bar/`date +%s` &
>     done &
>     while sleep 1m; do
>         ./loadcheck || continue;
>         for x in bar/*; do
>             rm -rf "$x";
>             sleep 1m;
>         done &
>     done &
> done &
>
> END cut-and-paste-into-a-root-shell
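
The loadcheck throttle in the quoted script can also be sketched as a shell function, which avoids writing a helper file into the test filesystem. This is only a sketch: the name `load_below` and the optional file argument are assumptions added for illustration and are not part of the original script. The logic is the same as loadcheck: drop the fractional part of the 1-minute load average and compare it with `-lt`.

```shell
#!/bin/sh
# load_below LIMIT [FILE]
# Succeeds (exit 0) when the integer part of the 1-minute load average
# is below LIMIT. FILE defaults to /proc/loadavg; the optional argument
# is an assumption added here so the function can be tried on sample data.
load_below() {
    limit=$1
    file=${2:-/proc/loadavg}
    # The first field of /proc/loadavg is the 1-minute average.
    read -r av1 _rest < "$file" || return 2
    # Strip the fractional part, as the original loadcheck does.
    [ "${av1%.*}" -lt "$limit" ]
}
```

In a loop it would be used like the original, for example `while sleep 60; do load_below 10 || continue; ...; done`, where `sleep 60` is the portable spelling of `sleep 1m`.
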
>
>
>
>
> rm and occasionally cp will frequently complain about "No such file
> or directory". This is normal. After about 3 hours, the following
> non-normal messages appear:
>
> readlink lib/R/library/base/help/contrasts: Permission denied
> readlink lib/R/library/base/html/hsv.html: Permission denied
> rm: cannot remove `bar/1042550428/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/appletalk/ltpc.o': Permission denied
> rm: cannot remove `bar/1042550428/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/aironet4500_proc.c': Permission denied
> cp: cannot stat `foo/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/e1000/.e1000_ethtool.o.flags': Permission denied
> cp: cannot stat `foo/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/.eepro.o.flags': Permission denied
>
> This needs a 'reiserfsck --fix-fixable' to fix.
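
A rough way to check whether a tree already shows the symptom quoted above (entries that even root cannot stat) is to walk it with find and keep only the error output. This is a sketch under assumptions: `scan_errors` is a hypothetical name, and it has not been run against a damaged reiserfs volume. The `-links +0` test is true for every entry and is used only because it needs stat data, so find should have to stat each file rather than rely on the directory listing alone.

```shell
#!/bin/sh
# scan_errors DIR
# Walks DIR on one filesystem and prints only the errors find reports
# (for example "Permission denied"). Normal output is discarded.
scan_errors() {
    # 2>&1 first, then >/dev/null: stderr goes to our stdout,
    # the list of file names goes to /dev/null.
    find "$1" -xdev -links +0 -print 2>&1 >/dev/null
}
```

Run as root, `scan_errors /test` printing nothing would suggest the tree is still readable; any output is a reason to unmount and run reiserfsck.
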
>
> It looks to me like there may be some sort of locking bug triggered by
> concurrent link/unlink/rename calls, but I'm not even a filesystem expert,
> much less a reiserfs expert. ;-)
>
> --
> Sam Vilain, sam_at_vilain.net
>
> To be sure of hitting the target, shoot first, and call whatever you
> hit the target.
> ASHLEIGH BRILLIANT

Generated on Mon 10 Feb 2003 - 13:51:14 GMT by hypermail 2.1.3