
From: Herbert Pötzl (herbert_at_13thfloor.at)
Date: Wed 20 Aug 2003 - 00:26:10 BST


On Tue, Aug 19, 2003 at 02:35:03PM -0700, Roderick A. Anderson wrote:
> As a user of vservers I do not get into the code and barely follow some of
> the threads I find it difficult to mention when I have kernel crashes as I
> can not _really_ explain what was going on when they happened.

about 90% of the kernel 'crashes' do not need any
further explanation of what the user was doing or
what was going on (a crash report is enough) ...

> I have turned on kernel logging into /var/log/kernel but do not know what
> else I need to or should do to get better information for when I do
> have a crash. Case in point this morning or rather last night. Suddenly
> the system froze up. Would not respond to the keyboard and I had to press

freezes and lockups are actually not kernel crashes,
but if you want to get something useful in such a case
you have to make some preparations, namely

 - set up nmi_watchdog (this will cause a kernel oops
   when the kernel is not responding ...)
 - configure magic SysRq (you'll be able to trigger
   a dump of kernel task/process/memory info in such
   a case)

 - use lkcd, or a serial line to capture the kernel
   oops (handwritten oopses are as much fun as
   screenshots done with your webcam :( )
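
to check whether a box is prepared along these lines,
something like the following could be used. this is only a
rough sketch, not an official tool: the function names are
made up for illustration, and it assumes the watchdog is
enabled with nmi_watchdog=1 or nmi_watchdog=2 on the kernel
command line, the serial console with console=ttyS..., and
that magic SysRq is exposed via /proc/sys/kernel/sysrq.

```python
import re


def check_cmdline(cmdline):
    """Report which crash-capture options appear on a kernel command line."""
    params = cmdline.split()
    return {
        # assumption: nmi_watchdog=1 (IO-APIC) or =2 (local APIC) enables it
        "nmi_watchdog": any(re.fullmatch(r"nmi_watchdog=[12]", p) for p in params),
        # console=ttyS0,... sends kernel messages (and oopses) to a serial line
        "serial_console": any(p.startswith("console=ttyS") for p in params),
    }


def sysrq_enabled(path="/proc/sys/kernel/sysrq"):
    """Return True if the sysrq control file exists and is not set to 0."""
    try:
        with open(path) as f:
            return f.read().strip() != "0"
    except OSError:
        # file missing, e.g. kernel built without magic SysRq support
        return False


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(check_cmdline(f.read()))
    print("sysrq enabled:", sysrq_enabled())
```

run it on the machine in question; anything reported as
False is a preparation step that is still missing.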

> the reset/power switch to get it to come back up. The last messages in
> /var/log/messages before the hang/crash were of the form
>
> kernel: smb_retry: successful ...
>
> then the reboot messages. Nothing identifiable as a cause.
>
> I'm running 2.4.21ctx-17a because I thought the problem was with the
> eepro100 NIC driver and a thread on this list indicated the e100/e1000
> would be a better choice. Hardware seems solid and the crashes are _too_

huh? what does e100/e1000 have to do with 2.4.21?
as far as I remember those were in 2.4.14 and for
sure are in 2.4.22-rc2 ...

> random for my liking to make me think it is a hardware issue. I did run
> memtest86 on the system before I put it online.

for how long?

> Where I am trying to take this is; what information is needed to help
> determine the cause of the crashes so I (we) can point a finger in the
> right direction - hardware, software, wetware, or the ctx kernel and
> friends. Not to point fingers as much as to lend a hand to the
> developers.

first step for any further investigation will be
some kind of kernel oops, parsed by ksymoops with
the correct kernel System.map ...
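
to illustrate why the System.map has to match the running
kernel: decoding an oops essentially means mapping raw
addresses to the nearest preceding symbol. the sketch below
is not ksymoops, just a simplified illustration of that
lookup. it assumes the usual "address type name" line
format, and the helper names are hypothetical.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            addr, _type, name = parts
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'symbol+0xoffset' for the nearest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, address - base)


if __name__ == "__main__":
    # invented addresses and symbol names, for illustration only
    sample = ["c0105000 T example_func_a", "c0106000 T example_func_b"]
    print(resolve(load_system_map(sample), 0xC0105010))
```

with a System.map from a different kernel build the
addresses shift, so the same lookup returns wrong symbol
names, and the decoded oops becomes misleading.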

further useful information (after a captured oops)
will be a detailed system description, and some
hardware tests ...

HTH,
Herbert

> Cheers,
> Rod
> --
> "Open Source Software - Sometimes you get more than you paid for..."


Generated on Wed 20 Aug 2003 - 00:56:27 BST by hypermail 2.1.3