Re: [vserver] Exploding Load in v2.3

From: Cryptronic <mail_at_cryptronic.de>
Date: Fri 22 Jan 2010 - 09:32:42 GMT
Message-ID: <4B5970BA.6000603@cryptronic.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/21/10 19:43, Kyle Bader wrote:
>> This is the context when everything is fine, because until this thread
>> we didn't realize that the server swaps. Shortly before the crash
>> our monitoring reported 0 bytes of swap used, and at the time of the
>> explosion 4019584k of swap was in use. How can this happen when each
>> vserver has a maximum RSS setting of 3 GB? In the dmesg output there was
>> only one process in the oom-killer. Is there maybe a bug in the
>> oom-killer that allocates all memory and swap on the host?
>
> Do you have kernel logs from when oomkiller was triggered?
>
> I would be very interested in seeing the events and actual activity of
> the oomkiller because a co-worker and I have been chasing something
> very similar.
>
Today I found another oom-killer event in dmesg:

Jan 22 10:26:40 host kernel: [313237.887684] apache2 invoked oom-killer: gfp_mask=0x0, order=0, oomkilladj=0
Jan 22 10:26:40 host kernel: [313237.887688] apache2 cpuset=/ mems_allowed=0-1
Jan 22 10:26:40 host kernel: [313237.887691] Pid: 15934, comm: apache2 Not tainted 2.6.31.7-vs2.3.0.36.27 #1
Jan 22 10:26:40 host kernel: [313237.887693] Call Trace:
Jan 22 10:26:40 host kernel: [313237.887702] [<ffffffff810ae4e4>] ? oom_kill_process+0x8e/0x265
Jan 22 10:26:40 host kernel: [313237.887705] [<ffffffff810ae9bf>] ? select_bad_process+0xd2/0x12f
Jan 22 10:26:40 host kernel: [313237.887708] [<ffffffff810aea9d>] ? __out_of_memory+0x81/0x8e
Jan 22 10:26:40 host kernel: [313237.887710] [<ffffffff810aec80>] ? pagefault_out_of_memory+0x64/0x8c
Jan 22 10:26:40 host kernel: [313237.887715] [<ffffffff8102b0ac>] ? mm_fault_error+0x39/0xdf
Jan 22 10:26:40 host kernel: [313237.887720] [<ffffffff812c14af>] ? thread_return+0x3e/0xc7
Jan 22 10:26:40 host kernel: [313237.887723] [<ffffffff8102b3b1>] ? do_page_fault+0x25f/0x27b
Jan 22 10:26:40 host kernel: [313237.887727] [<ffffffff812c2915>] ? page_fault+0x25/0x30
Jan 22 10:26:40 host kernel: [313237.887729] Mem-Info:
Jan 22 10:26:40 host kernel: [313237.887730] Node 0 Normal per-cpu:
Jan 22 10:26:40 host kernel: [313237.887733] CPU 0: hi: 186, btch: 31 usd: 107
Jan 22 10:26:40 host kernel: [313237.887735] CPU 1: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887736] CPU 2: hi: 186, btch: 31 usd: 120
Jan 22 10:26:40 host kernel: [313237.887738] CPU 3: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887740] CPU 4: hi: 186, btch: 31 usd: 28
Jan 22 10:26:40 host kernel: [313237.887742] CPU 5: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887743] CPU 6: hi: 186, btch: 31 usd: 51
Jan 22 10:26:40 host kernel: [313237.887745] CPU 7: hi: 186, btch: 31 usd: 158
Jan 22 10:26:40 host kernel: [313237.887747] CPU 8: hi: 186, btch: 31 usd: 43
Jan 22 10:26:40 host kernel: [313237.887748] CPU 9: hi: 186, btch: 31 usd: 102
Jan 22 10:26:40 host kernel: [313237.887750] CPU 10: hi: 186, btch: 31 usd: 32
Jan 22 10:26:40 host kernel: [313237.887752] CPU 11: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887753] CPU 12: hi: 186, btch: 31 usd: 93
Jan 22 10:26:40 host kernel: [313237.887755] CPU 13: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887757] CPU 14: hi: 186, btch: 31 usd: 65
Jan 22 10:26:40 host kernel: [313237.887758] CPU 15: hi: 186, btch: 31 usd: 156
Jan 22 10:26:40 host kernel: [313237.887760] Node 1 DMA per-cpu:
Jan 22 10:26:40 host kernel: [313237.887762] CPU 0: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887764] CPU 1: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887766] CPU 2: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887767] CPU 3: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887769] CPU 4: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887770] CPU 5: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887772] CPU 6: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887774] CPU 7: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887775] CPU 8: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887777] CPU 9: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887779] CPU 10: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887780] CPU 11: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887782] CPU 12: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887783] CPU 13: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887785] CPU 14: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887787] CPU 15: hi: 0, btch: 1 usd: 0
Jan 22 10:26:40 host kernel: [313237.887788] Node 1 DMA32 per-cpu:
Jan 22 10:26:40 host kernel: [313237.887790] CPU 0: hi: 186, btch: 31 usd: 6
Jan 22 10:26:40 host kernel: [313237.887792] CPU 1: hi: 186, btch: 31 usd: 178
Jan 22 10:26:40 host kernel: [313237.887793] CPU 2: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887795] CPU 3: hi: 186, btch: 31 usd: 176
Jan 22 10:26:40 host kernel: [313237.887797] CPU 4: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887799] CPU 5: hi: 186, btch: 31 usd: 44
Jan 22 10:26:40 host kernel: [313237.887800] CPU 6: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887802] CPU 7: hi: 186, btch: 31 usd: 159
Jan 22 10:26:40 host kernel: [313237.887804] CPU 8: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887805] CPU 9: hi: 186, btch: 31 usd: 169
Jan 22 10:26:40 host kernel: [313237.887807] CPU 10: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887809] CPU 11: hi: 186, btch: 31 usd: 149
Jan 22 10:26:40 host kernel: [313237.887810] CPU 12: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887812] CPU 13: hi: 186, btch: 31 usd: 169
Jan 22 10:26:40 host kernel: [313237.887814] CPU 14: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887815] CPU 15: hi: 186, btch: 31 usd: 158
Jan 22 10:26:40 host kernel: [313237.887817] Node 1 Normal per-cpu:
Jan 22 10:26:40 host kernel: [313237.887819] CPU 0: hi: 186, btch: 31 usd: 165
Jan 22 10:26:40 host kernel: [313237.887820] CPU 1: hi: 186, btch: 31 usd: 14
Jan 22 10:26:40 host kernel: [313237.887822] CPU 2: hi: 186, btch: 31 usd: 0
Jan 22 10:26:40 host kernel: [313237.887824] CPU 3: hi: 186, btch: 31 usd: 25
Jan 22 10:26:40 host kernel: [313237.887825] CPU 4: hi: 186, btch: 31 usd: 62
Jan 22 10:26:40 host kernel: [313237.887827] CPU 5: hi: 186, btch: 31 usd: 134
Jan 22 10:26:40 host kernel: [313237.887829] CPU 6: hi: 186, btch: 31 usd: 130
Jan 22 10:26:40 host kernel: [313237.887830] CPU 7: hi: 186, btch: 31 usd: 71
Jan 22 10:26:40 host kernel: [313237.887832] CPU 8: hi: 186, btch: 31 usd: 67
Jan 22 10:26:40 host kernel: [313237.887834] CPU 9: hi: 186, btch: 31 usd: 156
Jan 22 10:26:40 host kernel: [313237.887835] CPU 10: hi: 186, btch: 31 usd: 5
Jan 22 10:26:40 host kernel: [313237.887837] CPU 11: hi: 186, btch: 31 usd: 39
Jan 22 10:26:40 host kernel: [313237.887838] CPU 12: hi: 186, btch: 31 usd: 72
Jan 22 10:26:40 host kernel: [313237.887840] CPU 13: hi: 186, btch: 31 usd: 40
Jan 22 10:26:40 host kernel: [313237.887842] CPU 14: hi: 186, btch: 31 usd: 34
Jan 22 10:26:40 host kernel: [313237.887843] CPU 15: hi: 186, btch: 31 usd: 170
Jan 22 10:26:40 host kernel: [313237.887847] Active_anon:3619871 active_file:5826474 inactive_anon:232634
Jan 22 10:26:40 host kernel: [313237.887848] inactive_file:7193754 unevictable:2086 dirty:5508 writeback:0 unstable:0
Jan 22 10:26:40 host kernel: [313237.887849] free:127275 slab:1380361 mapped:408195 pagetables:95354 bounce:0
Jan 22 10:26:40 host kernel: [313237.887851] Node 0 Normal free:80724kB min:17260kB low:21572kB high:25888kB active_anon:8610340kB inactive_anon:487940kB active_file:10031720kB inactive_file:14376252kB unevictable:6308kB present:37232640kB pages_scanned:0 all_unreclaimable? no
Jan 22 10:26:40 host kernel: [313237.887857] lowmem_reserve[]: 0 0 0 0
Jan 22 10:26:40 host kernel: [313237.887860] Node 1 DMA free:15648kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB present:15084kB pages_scanned:0 all_unreclaimable? yes
Jan 22 10:26:40 host kernel: [313237.887864] lowmem_reserve[]: 0 3246 36324 36324
Jan 22 10:26:40 host kernel: [313237.887867] Node 1 DMA32 free:224184kB min:1540kB low:1924kB high:2308kB active_anon:229184kB inactive_anon:101872kB active_file:757164kB inactive_file:1256576kB unevictable:1204kB present:3324740kB pages_scanned:0 all_unreclaimable? no
Jan 22 10:26:40 host kernel: [313237.887872] lowmem_reserve[]: 0 0 33077 33077
Jan 22 10:26:40 host kernel: [313237.887875] Node 1 Normal free:188544kB min:15700kB low:19624kB high:23548kB active_anon:5639960kB inactive_anon:340724kB active_file:12517012kB inactive_file:13142188kB unevictable:832kB present:33871360kB pages_scanned:0 all_unreclaimable? no
Jan 22 10:26:40 host kernel: [313237.887880] lowmem_reserve[]: 0 0 0 0
Jan 22 10:26:40 host kernel: [313237.887883] Node 0 Normal: 8306*4kB 4309*8kB 431*16kB 36*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 79840kB
Jan 22 10:26:40 host kernel: [313237.887890] Node 1 DMA: 0*4kB 2*8kB 1*16kB 2*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15648kB
Jan 22 10:26:40 host kernel: [313237.887896] Node 1 DMA32: 15048*4kB 16319*8kB 1804*16kB 15*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 224184kB
Jan 22 10:26:40 host kernel: [313237.887903] Node 1 Normal: 31271*4kB 6963*8kB 205*16kB 12*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 188548kB
Jan 22 10:26:40 host kernel: [313237.887910] 13180489 total pagecache pages
Jan 22 10:26:40 host kernel: [313237.887911] 43164 pages in swap cache
Jan 22 10:26:40 host kernel: [313237.887913] Swap cache stats: add 510763, delete 467599, find 16375590/16392114
Jan 22 10:26:40 host kernel: [313237.887915] Free swap = 139269360kB
Jan 22 10:26:40 host kernel: [313237.887916] Total swap = 140624968kB
Jan 22 10:26:40 host kernel: [313238.176346] 18874368 pages RAM
Jan 22 10:26:40 host kernel: [313238.176348] 280153 pages reserved
Jan 22 10:26:40 host kernel: [313238.176350] 8324253 pages shared
Jan 22 10:26:40 host kernel: [313238.176351] 11735562 pages non-shared
Jan 22 10:26:40 host kernel: [313238.176354] Out of memory: kill process apache2(23962:#9046) score 201733 or a child
Jan 22 10:26:40 host kernel: [313238.192825] Killed process apache2(24250:#9046)
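
To put the figures from this dump into perspective, here is a rough Python sketch that converts the page counts into sizes and works out the swap in use. It assumes 4 kB pages (which fits the 4kB unit in the buddy lists of the dump); the variable and function names are only illustrative, the numbers are copied from the log.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as the "*4kB" buddy entries suggest

# Values copied from the Mem-Info dump
free_pages = 127275
pagecache_pages = 13180489
total_swap_kb = 140624968
free_swap_kb = 139269360


def pages_to_mib(pages, page_kb=PAGE_KB):
    """Convert a page count to MiB."""
    return pages * page_kb / 1024


# Swap in use is simply total minus free
swap_used_kb = total_swap_kb - free_swap_kb

print(f"free memory : {pages_to_mib(free_pages):.0f} MiB")
print(f"page cache  : {pages_to_mib(pagecache_pages) / 1024:.1f} GiB")
print(f"swap in use : {swap_used_kb} kB ({swap_used_kb / 1024:.0f} MiB)")
```

Under that assumption the host had roughly 1.3 GB of swap in use and about 50 GiB of page cache at the moment the oom-killer ran.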

But after that:

# vps aux | grep 15934
www-data 15934 9046 guest 0.0 0.0 277148 58968 ? S Jan21 0:00 /usr/sbin/apache2 -k start

The process is still there. Could this be related to the problem?
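
The dump mentions three different PIDs, so it may help to separate their roles. The following Python sketch pulls them out of the relevant log lines. The regular expressions and the function name `oom_pids` are my own and only cover the message shapes seen in this dump. Reading the number after `#` as the context id is an assumption, based on it matching the 9046 in the vps output.

```python
import re

# Relevant lines from the dump, rejoined where the mail client wrapped them
log = """\
[313237.887684] apache2 invoked oom-killer: gfp_mask=0x0, order=0, oomkilladj=0
[313237.887691] Pid: 15934, comm: apache2 Not tainted 2.6.31.7-vs2.3.0.36.27 #1
[313238.176354] Out of memory: kill process apache2(23962:#9046) score 201733 or a child
[313238.192825] Killed process apache2(24250:#9046)
"""


def oom_pids(text):
    """Return the PIDs by role: who triggered, who was selected, who was killed."""
    roles = {}
    m = re.search(r"Pid: (\d+), comm: \S+", text)
    if m:
        roles["invoked_by"] = int(m.group(1))
    m = re.search(r"Out of memory: kill process \S+?\((\d+):#(\d+)\)", text)
    if m:
        roles["selected"] = int(m.group(1))
        roles["xid"] = int(m.group(2))  # assumed to be the context id
    m = re.search(r"Killed process \S+?\((\d+):#\d+\)", text)
    if m:
        roles["killed"] = int(m.group(1))
    return roles


print(oom_pids(log))
```

If I read the dump correctly, PID 15934 is the process whose page fault invoked the oom-killer, while 24250 is the one that was actually killed, so 15934 showing up in vps afterwards would not be surprising by itself.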

# cat /proc/virtual/9046/limit
Limit current min/max soft/hard hits
PROC: 29 0/ 45 -1/ - -1 0
VM: 789300 0/ 956157 -1/ - -1 0
VML: 0 0/ 0 -1/ - -1 0
RSS: 136595 0/ 153600 51200/ 153600 7
ANON: 69651 0/ 77840 -1/ - -1 0
RMAP: 66944 0/ 80463 -1/ - -1 0
FILES: 389 0/ 611 -1/ - -1 0
OFD: 381 0/ 424 -1/ - -1 0
LOCKS: 4 0/ 9 -1/ - -1 0
SOCK: 11 0/ 14 -1/ - -1 0
MSGQ: 0 0/ 0 -1/ - -1 0
SHM: 0 0/ 0 -1/ - -1 0
SEMA: 2 0/ 2 -1/ - -1 0
SEMS: 2 0/ 2 -1/ - -1 0
DENT: 25806 0/ 25823 -1/ - -1 0
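
For reference, the RSS row can be converted into sizes with a few lines of Python. This is only a sketch: it assumes the RSS values are counted in pages of 4 kB, and it reads the row as current, min/max, soft/hard, hits. The names used are illustrative.

```python
PAGE_KB = 4  # assumption: limits are counted in 4 kB pages

# RSS row from /proc/virtual/9046/limit, in pages
rss = {"current": 136595, "max": 153600, "soft": 51200, "hard": 153600}


def pages_to_mib(pages, page_kb=PAGE_KB):
    """Convert a page count to MiB."""
    return pages * page_kb / 1024


rss_mib = {name: pages_to_mib(pages) for name, pages in rss.items()}
for name, mib in rss_mib.items():
    print(f"RSS {name:8s} {mib:8.1f} MiB")
```

Under those assumptions the soft limit of this guest works out to 200 MiB and the hard limit to 600 MiB, with current usage at about 534 MiB.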

Best regards

Oliver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAktZcLoACgkQOBdlVlcPuhxTpQCgvXq314KWIB55qaRxgrEhOgoE
uYoAnRleme2XzUxDyUcZoY4FEkWDCgEo
=5pw6
-----END PGP SIGNATURE-----
Received on Fri Jan 22 09:33:13 2010
