From: Herbert Pötzl (herbert_at_13thfloor.at)
Date: Wed 20 Aug 2003 - 13:44:47 BST
On Wed, Aug 20, 2003 at 03:45:54AM -0700, Roderick A. Anderson wrote:
> On Wed, 20 Aug 2003, Herbert Pötzl wrote:
> > > In this case nothing I can tell. The message file was pretty slim.
> > this doesn't necessarily mean that your case is
> > with the 10%, does it?
> No but it irritates me to no end when I can't find _some_ indication of
> the problem. BTW, I just checked the system and see it is down again. A
> cron job that was scheduled for 1:00 AM did not run, so it went down before then.
> When I get into the office I'll have to see if anything shows up.
> > ahh, sorry, but I guess you'll get used to it,
> > probably very soon ...
> Sooner and sooner. With last night's crash I'm going to have to do
> something. This system runs our tracking system. The real bummer is I
> now am wondering about the hardware. We 'acquired' several systems and
> almost every one of them has had problems. Unfortunately since Linux has
> a reputation of running well on almost any hardware I get these systems and
> the Windows machines are new hardware.
hmm, usually the windoof users get the hardware linux
refuses to run stable on, not the other way round ...
this is mostly because windoof crashes anyway, so adding
some crash probability almost always goes unnoticed ;)
linux tends to push hardware much further than windoof, so
often bad hw designs will result in malfunction ...
> I have other vserver systems running older kernels that stay up for
> months (usually they only get rebooted when new hardware is added or a
> cabinet gets reorganized.) So I can't say at this time if it is the newer
> kernels or the hardware. It could also be the programs/processes I'm
> running but my impression was a vserver process is pretty much
> isolated/protected from taking the whole system down if there was a memory
> leak or some such. Is this (mostly) true?
to take a system down, one kernel line is enough,
but as it seems relatively stable (I'm speaking of 2.4.22
and ctx17a) for others, I tend to believe it's something
in your hw/config/setup which triggers something in the
kernel (or hardware), unless proven otherwise ...
> > well, a day is a good start, usually people tend
> > to start memtest, sit around for 5 to 10 minutes,
> > until the fascination of increasing numbers and
> > changing patterns subsides, then abort the test,
> > only to claim "I did a memtest, everything was fine!"
> Well, I have sometimes run it for as little as 3-4 hours.
> > you should look at it from a more optimistic
> > perspective: if you are prepared, and fate decides
> > not to strike your system down, isn't that a win too?
> I'm a firm believer in the 'being prepared' talisman. If you have the
> tools/supplies to survive you almost never need them.
> Well I didn't get them in place soon enough this time. As mentioned above
> the server is down again.
> Thanks again,
> "Open Source Software - Sometimes you get more than you paid for..."