From: Mario Lorenz (ml-vserv_at_vdazone.org)
Date: Wed 04 Aug 2004 - 17:32:55 BST
On 04 Aug 2004, at 11:16:25, Ehab Heikal wrote:
> I know this is not the core of this list but could you elaborate on how
> hardware is bad these days? What kinds of tests do you run to reduce
> this? I see that you have very, very valuable know-how and would really
> appreciate it :)
The quick version goes like this: Any hardware nowadays, no matter the
brand, is made with one objective in mind: If you can save half a
cent somewhere, do it. (and then it's just economies of scale...).
This results in electrical and mechanical specs being stretched to
the limit and beyond.
This in turn results in unstable systems, i.e. faults that are hard to
pin down. Random bit errors in the RAM, but only every
other day. Mechanical fan failures, especially partial ones (the fan spins, but
slowly), with the CPU getting too hot, but only if you put some load on it.
In the last couple of years, increasing numbers of mainboards have had their
electrolytic capacitors leak (there are several rumours: one is that it is plain
overstretching and quick wear-out, others involve industrial espionage
with a formula that was incomplete).
Any important system I put in production therefore gets:
- ECC RAM
- Dual hard drives (RAID), if possible hotswap
- Dual CPU Fans if possible
- Hardware monitoring (Temp, fan RPM) if possible
- a 24 hour load test using cpuburn
- a 24 hour run of memtest86 (www.memtest86.com)
- a 24 hour general load test (infinite loop compiling kernels).
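The general load test in the last item could be sketched roughly as below. This is only an illustration, not the script actually used: the function name burn_in, its parameters and the build command in the usage note are all assumptions. The idea is that the same build should succeed every time on healthy hardware, so any non-zero exit status is counted as a failure.

```python
import subprocess
import time


def burn_in(command, duration_s, run=subprocess.run, clock=time.monotonic):
    """Re-run `command` until `duration_s` seconds have passed.

    Returns (passes, failures). A non-zero exit status counts as a failure.
    `run` and `clock` are injectable so the loop can be tried out without
    actually building anything. (All names here are hypothetical.)
    """
    deadline = clock() + duration_s
    passes = failures = 0
    while clock() < deadline:
        # shell=True so that "a && b" style build commands work as typed.
        result = run(command, shell=True)
        if result.returncode == 0:
            passes += 1
        else:
            failures += 1
    return passes, failures
```

Run from inside a kernel source tree, a 24 hour test might then be started with something like burn_in("make clean && make", 24 * 60 * 60); the exact build command is a placeholder.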
A noticeable percentage of machines do not pass these tests, and they
had better fail now rather than at 2:30 a.m. on some weekend.
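For the hardware monitoring item in the list, the point is to catch partial failures such as a fan that still spins, but too slowly. A minimal sketch of such a check follows; the function name, the shape of the readings and both limits are assumptions, and real limits depend on the particular fan and CPU. How the readings are collected (for example from lm-sensors) is left out.

```python
def check_sensors(readings, min_fan_rpm=2000, max_temp_c=70.0):
    """Return a list of warnings for one snapshot of sensor readings.

    `readings` maps a sensor name to a (kind, value) pair, where kind is
    "fan" (value in RPM) or "temp" (value in degrees Celsius).
    The default limits are placeholders, not recommendations.
    """
    warnings = []
    for name, (kind, value) in sorted(readings.items()):
        if kind == "fan" and value < min_fan_rpm:
            # Covers both a dead fan (0 RPM) and one that merely runs slow.
            warnings.append(f"{name}: fan at {value} RPM, below {min_fan_rpm}")
        elif kind == "temp" and value > max_temp_c:
            warnings.append(f"{name}: {value} C, above {max_temp_c}")
    return warnings
```

Called periodically while the load tests run, a non-empty result is the signal to look at the machine before it goes into production.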
-- 
Mario Lorenz    Internet: <ml_at_vdazone.org>
                Ham Radio: DL5MLO_at_OK0PKL.#BOH.CZE.EU
Remember: In god we trust -- all others we polygraph.
  -- Jim Christy, Assistant for law enforcement, US Air Force
_______________________________________________
Vserver mailing list
Vserver_at_list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver