On Mon, Sep 04, 2006 at 03:08:43PM +0100, GarconDuMonde wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> i am involved in running two servers, both of which become
> unresponsive after periods of time for reasons that are unclear.
> here is some background:
> server 1.
> P4 2.8GHz, 1GB RAM, 2x120GB IDE hard drives.
> kernel - 184.108.40.206-vs2.0.2-rc18 installed 30 april 2006 (the box has
> mostly been down since this point, and we have not had an opportunity
> to update it)
relatively old kernel
> this box ran three vservers: one was in production use, one was for
> development of the same software (twiki) and the third was for backups
> of a completely different site/software.
> server 2.
> dual xeon 2.8GHz, 5GB RAM, 2x160GB SATA hard drives
> kernel - was 2.6.8 when problems started, but then upgraded to
> linux-image-2.6.16-2-vserver-686 from backports.org. however,
> continued to experience the same problems.
very old kernel
> this box ran approx 15 vservers but neither the cpu nor the memory was ever
> maxed out. there was no indication in any logfiles on the host of what
> the problem possibly was.
> we have now done extensive testing on the hardware using memtest86,
> smartmontools and cpuburn without finding any problems. the server now
> has uptime of ~40 days using RIP and a 2.6.17 kernel
well, how does it behave with the latest stable release?
(vs2.02 for 220.127.116.11)
> * * *
> server 1 had an extensive amount of work done on it by someone
> extremely knowledgeable in linux security, but the problem could not
> be found. attempting to recreate the situation (on server 1) with
> apache bench did lead to the situation where the box would complete
> the 3-way TCP handshake or respond to ICMP echo requests but not
> handle TCP connections any further. however, adjusting limits ('as'
> 'rss' and 'nproc') did not prevent the box from becoming unresponsive
> again as soon as it was put back into production use.
I'd suggest testing a mainline kernel (not a debian specific one)
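
To pin down the state described above (handshake completes, but nothing
further happens on the connection), a small probe run from another machine
can help. This is only a rough sketch, not part of any linux-vserver tooling:
the function name `probe`, the result labels and the default HTTP `HEAD`
payload are assumptions made for illustration.

```python
import socket


def probe(host, port, timeout=5.0, payload=b"HEAD / HTTP/1.0\r\n\r\n"):
    """Classify a TCP service (labels are hypothetical, chosen for this sketch).

    'no-connect'   : the TCP handshake failed (refused, unreachable, timed out)
    'connect-only' : the handshake completed but no data came back in time
    'responding'   : the service sent at least one byte in reply
    """
    try:
        # The kernel on the remote side completes the handshake for a
        # listening socket, even if the application never calls accept().
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "no-connect"
    with sock:
        try:
            sock.settimeout(timeout)
            sock.sendall(payload)
            data = sock.recv(1)  # wait for the first byte of any reply
        except OSError:
            # Covers timeouts and connection resets after the handshake.
            return "connect-only"
    return "responding" if data else "connect-only"
```

Calling e.g. `probe("192.0.2.10", 80)` (a placeholder address) every minute
and logging the result with a timestamp would show when the box flips from
'responding' to 'connect-only', which can then be lined up with the graphs
and logs from the host.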
> * * *
> we do not know how to proceed from here in terms of diagnosing and
> fixing the problem and making the machines once again suitable for
> production use. we are now nearing the stage where we will have to
> give up using linux-vserver unless we can solve the problem. this is
> a shame as quite a few of us have invested time in learning about
> linux-vserver. does anyone have any ideas on how to diagnose and fix
> the problem?
well, many companies use linux-vserver in production just fine
(I'm using it too :) and there are no reports of such issues
with the stable branch (2.01 or 2.02), but it should be no
problem to track your specific issues down .. best would be
to pay a visit to the irc channel (#vserver @ irc.oftc.net)
> incidentally, i have heard of other machines that have experienced
> similar problems.
well, did not get reported back to us (at least not that I
> if i can help with diagnosis in any way by providing more information,
> please ask. i also have some munin graphs available that demonstrate
> some of the variables (cpu, memory usage, uptime, individual vserver
> memory usage, etc), taken shortly after the hosts became unreachable -
> i.e. with the most detail of
sure, please contact me on the irc channel and we will have
a closer look at your issues ... but please also use the
latest (stable) release ...
> the hours leading up to them "stopping"
> any help is gratefully received.
> - --
> gpg --keyserver pgp.mit.edu --recv-keys 594B97C2
> Key fingerprint = 7B70 F22D F275 D111 3A04 F9EE 0E25 4944 594B 97C2
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v18.104.22.168 (Darwin)
> -----END PGP SIGNATURE-----
> Vserver mailing list
Received on Mon Sep 4 21:37:55 2006