From: Mark Lawrence (nomad_at_null.net)
Date: Mon 17 Nov 2003 - 18:00:57 GMT

Hi Matt,

On Mon, 17 Nov 2003, Matt Ayres wrote:
> I have 2 servers who keep crashing, sometimes every few hours... I am
> still working on a console solution so I don't know the exact place
> where the problem is.

I can perhaps provide a little bit of information with regard to this,
since I am hosted on one of those machines :)

You will see from the attached graph that a symptom (although probably
not the cause) of each hang is a continual decrease in free and
unbuffered memory. Swap seems to be untouched.

I have also seen an increase in the CPU time (up to 60-70%) used for
"system" tasks right before each hang. I bet if you extrapolate the
current usage you could reasonably predict the next hang... which would
at least make the problem reproducible :)
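
As a rough sketch of that extrapolation: fit a straight line through
(time, free memory) samples and see where it crosses zero. The function
name and the sample figures below are made up for illustration, and it
assumes the decline is roughly linear, which the graph suggests but does
not prove.

```python
def predict_exhaustion(samples):
    """Estimate when free memory reaches zero.

    samples: list of (seconds, free_kb) pairs (hypothetical input format).
    Returns the time in seconds at which a least-squares line through the
    samples crosses zero, or None if memory is not declining.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n

    # Least-squares slope: covariance(t, f) / variance(t)
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t

    if slope >= 0:
        return None  # free memory is flat or growing: no predicted hang

    intercept = mean_f - slope * mean_t
    return -intercept / slope  # time at which the fitted line hits zero


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the real machines.
    history = [(0, 1000), (60, 900), (120, 800)]
    print(predict_exhaustion(history))
```

If the predicted time lines up with the actual hangs, that would support
the idea that memory exhaustion is at least closely tied to them.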

So it seems as if something in the virtual memory system is killing the
box, although how the vserver patch affects this I don't know, because I
didn't think we actually touched the VM code.

I would theorize that TCP connections are established OK, but that the
first thing to happen in most cases is an attempt to grab some free
memory which doesn't exist. Perhaps there is then some thrashing before
things come to a complete halt.

Does anyone know what the normal kernel behaviour is with regard to
releasing cached/buffered memory in order to keep some free? What else
could cause this type of behaviour?
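
To watch this from userspace, something like the following could log
free, buffered and cached memory separately. It is only a sketch: it
assumes the "Key: value kB" lines (MemFree, Buffers, Cached) are present
in /proc/meminfo on the kernel in question, and the helper names are my
own.

```python
def parse_meminfo(text):
    """Parse 'Key:   12345 kB' style lines into a dict of integers.

    Lines that do not have a number right after the first colon
    (e.g. a column-header line) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue
        info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Split out free memory from memory held as buffers or page cache."""
    free = info.get("MemFree", 0)
    buffers = info.get("Buffers", 0)
    cached = info.get("Cached", 0)
    return {
        "free": free,
        "buffers": buffers,
        "cached": cached,
        # Upper bound on what could be available if all cache were dropped
        "free_plus_cache": free + buffers + cached,
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    print(memory_summary(read_meminfo()))
```

If "free" drops while "buffers" and "cached" stay large, the memory is
going to cache that ought to be reclaimable; if all three shrink
together, something else is holding it.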

Cheers, Mark.

Mark Lawrence (nomad_at_null.net)

