[vserver] Linux vServer: general protection fault with apache2 and kernel 2.6.38.6

From: Urban Loesch <bind_at_enas.net>
Date: Tue 19 Jul 2011 - 10:12:17 BST
Message-ID: <4E254A71.6000408@enas.net>

Hi,

I'm new to the list.

We have been using Linux-VServer since 2005, and until last month it worked very well for us.
About a month ago we upgraded our self-compiled 64-bit kernel from "2.6.32.22-vs2.3.0.36.29.6" (Debian lenny) to
"2.6.38.6-vs2.3.0.37-rc15" (Debian squeeze) because of the problem described here:

https://bugzilla.kernel.org/show_bug.cgi?id=16991
http://lkml.org/lkml/2011/1/5/376

Now, about 30 days later, all of our host servers are running fine except those that run a vserver with apache2 installed.
From time to time such a vserver hangs completely. It is not possible to enter it via "vserver SERVER enter", nor
is it possible to restart the vserver or the apache inside it. The only thing that gets it working again is a hard reset of the host server via sysrq-trigger.
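
For readers unfamiliar with sysrq-trigger, here is a minimal sketch of what such a hard reset can look like. The exact keys used are not stated above; the sequence "s", "u", "b" (sync, remount read-only, reboot) is an assumption, and the helper names are hypothetical. The sketch defaults to a dry run because writing "b" reboots the machine immediately.

```python
from pathlib import Path

# Kernel interface for triggering SysRq commands without a keyboard.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")

# Assumed sequence: sync disks, remount filesystems read-only, reboot.
EMERGENCY_SEQUENCE = ("s", "u", "b")


def send_sysrq(key, trigger=SYSRQ_TRIGGER):
    """Write a single SysRq command key to the trigger file (needs root)."""
    if len(key) != 1:
        raise ValueError("a SysRq command is exactly one character")
    with open(trigger, "w") as f:
        f.write(key)


def emergency_reboot(trigger=SYSRQ_TRIGGER, dry_run=True):
    """Send the assumed s-u-b sequence; with dry_run=True nothing is written."""
    sent = []
    for key in EMERGENCY_SEQUENCE:
        if not dry_run:
            send_sysrq(key, trigger)
        sent.append(key)
    return sent


if __name__ == "__main__":
    # Dry run only: shows which keys would be sent, touches nothing.
    print(emergency_reboot())
```

On a host that is already half hung, the sync step may itself block, so this is only an illustration of the mechanism, not a recommended procedure.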

Before this happens, /var/log/syslog on the host shows the following error:

...
Jul 18 15:45:26 vhost01 kernel: [4638798.444832] general protection fault: 0000 [#1] SMP
Jul 18 15:45:26 vhost01 kernel: [4638798.455138] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/scsi_host/host2/proc_name
Jul 18 15:45:26 vhost01 kernel: [4638798.475515] CPU 4
Jul 18 15:45:26 vhost01 kernel: [4638798.479512] Modules linked in: netconsole configfs drbd lru_cache sch_hfsc ip6_queue act_police cls_flow cls_fw cls_u32 sch_htb sch_ingress sch_sfq xt_realm iptable_raw ip6t_LOG xt_connlimit ip6table_raw ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE xt_comment ipt_ECN ipt_ecn ipt_CLUSTERIP ip6t_REJECT ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp xt_recent nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ip6table_mangle xt_NFLOG nfnetlink_log nf_conntrack_ipv6 nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda xt_TPROXY nf_tproxy_core nf_defrag_ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_ha
Jul 18 15:45:26 vhost01 kernel: shlimit xt_DSCP xt_dscp xt_dccp ipt_LOG xt_connmark xt_CLASSIFY xt_tcpudp xt_conntrack xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipmi_devintf ipmi_si ipmi_msghandler psmouse i7core_edac edac_core tpm_tis tpm pcspkr tpm_bios processor serio_raw power_meter dcdbas button ses enclosure bnx2 igb megaraid_sas dca [last unloaded: scsi_wait_scan]
Jul 18 15:45:26 vhost01 kernel: [4638798.734507]
Jul 18 15:45:26 vhost01 kernel: [4638798.737861] Pid: 646, comm: apache2 Not tainted 2.6.38.6-vs2.3.0.37-rc15-rol-em64t #7 Dell Inc. PowerEdge R610/086HF8
Jul 18 15:45:26 vhost01 kernel: [4638798.759501] RIP: 0010:[<ffffffff8103695a>] [<ffffffff8103695a>] task_rq_lock+0x4a/0xa0
Jul 18 15:45:26 vhost01 kernel: [4638798.775920] RSP: 0018:ffff88081da17dc8 EFLAGS: 00010082
Jul 18 15:45:26 vhost01 kernel: [4638798.786892] RAX: 9066669066666605 RBX: 0000000000011d00 RCX: ffffffff814f3720
Jul 18 15:45:26 vhost01 kernel: [4638798.801550] RDX: 0000000000000282 RSI: ffff88081da17e20 RDI: 00007f8f2f645410
Jul 18 15:45:26 vhost01 kernel: [4638798.816205] RBP: ffff88081da17de8 R08: 0000000000000000 R09: 0000000000000001
Jul 18 15:45:26 vhost01 kernel: [4638798.830858] R10: 0000000000002b28 R11: 0000000000000400 R12: 00007f8f2f645410
Jul 18 15:45:26 vhost01 kernel: [4638798.845520] R13: ffff88081da17e20 R14: 0000000000011d00 R15: 0000000000000000
Jul 18 15:45:26 vhost01 kernel: [4638798.860188] FS: 00007f8f308736d0(0000) GS:ffff88083fc40000(0000) knlGS:0000000000000000
Jul 18 15:45:26 vhost01 kernel: [4638798.876748] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 15:45:26 vhost01 kernel: [4638798.888581] CR2: 0000000006b65290 CR3: 000000079dd22000 CR4: 00000000000006e0
Jul 18 15:45:26 vhost01 kernel: [4638798.903236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 18 15:45:26 vhost01 kernel: [4638798.917890] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 18 15:45:26 vhost01 kernel: [4638798.932545] Process apache2 (pid: 646, threadinfo ffff88081da16000, task ffff880441ae3e70)
Jul 18 15:45:26 vhost01 kernel: [4638798.949457] Stack:
Jul 18 15:45:26 vhost01 kernel: [4638798.953847] 00007f8f2f645410 ffff8807f9221eb8 000000000000000f 0000000000000000
Jul 18 15:45:26 vhost01 kernel: [4638798.969073] ffff88081da17e58 ffffffff81040bd6 ffff88081da17e48 ffffffff81269f47
Jul 18 15:45:26 vhost01 kernel: [4638798.984290] 0000000000000000 0000000000000003 ffff88081da17e28 0000000000000282
Jul 18 15:45:26 vhost01 kernel: [4638798.999509] Call Trace:
Jul 18 15:45:26 vhost01 kernel: [4638799.004772] [<ffffffff81040bd6>] try_to_wake_up+0x36/0x310
Jul 18 15:45:26 vhost01 kernel: [4638799.016262] [<ffffffff81269f47>] ? idr_remove+0x187/0x1f0
Jul 18 15:45:26 vhost01 kernel: [4638799.027576] [<ffffffff81040f05>] wake_up_process+0x15/0x20
Jul 18 15:45:26 vhost01 kernel: [4638799.039069] [<ffffffff81231580>] freeary+0x1e0/0x260
Jul 18 15:45:26 vhost01 kernel: [4638799.049523] [<ffffffff81232631>] T.616+0x71/0xf0
Jul 18 15:45:26 vhost01 kernel: [4638799.059286] [<ffffffff81131de5>] ? vfs_write+0x125/0x190
Jul 18 15:45:26 vhost01 kernel: [4638799.070426] [<ffffffff81232719>] sys_semctl+0x69/0xa0
Jul 18 15:45:26 vhost01 kernel: [4638799.081050] [<ffffffff81002882>] system_call_fastpath+0x16/0x1b
Jul 18 15:45:26 vhost01 kernel: [4638799.093402] Code: 00 48 c7 c3 00 1d 01 00 49 89 fc 49 89 f5 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89 de <8b> 40 18 4c 03 34 c5 e0 b2 70 81 4c 89 f7 e8 73 93 4a 00 49 8b
Jul 18 15:45:26 vhost01 kernel: [4638799.132205] RIP [<ffffffff8103695a>] task_rq_lock+0x4a/0xa0
Jul 18 15:45:26 vhost01 kernel: [4638799.143884] RSP <ffff88081da17dc8>
Jul 18 15:45:26 vhost01 kernel: [4638799.151543] ---[ end trace f2cfa0bdeab24e4d ]---
...
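
To make the trace easier to read, here is a small sketch that pulls the key facts out of an oops like the one above: the call chain (sys_semctl, freeary, wake_up_process, try_to_wake_up, then the fault in task_rq_lock), the value of RAX, and the position of the byte marked with <..> in the "Code:" line. The helper names are hypothetical and the regular expressions are tailored to this excerpt only. The marked bytes appear to be a load through RAX, and RAX holds a non-canonical address, which on x86-64 raises a general protection fault when dereferenced; that reading is an inference from the dump, not a confirmed diagnosis.

```python
import re

# Excerpt of the oops above with the syslog prefix stripped and the
# mail-wrapped "Code:" line joined back into a single line.
OOPS = """\
RIP: 0010:[<ffffffff8103695a>] [<ffffffff8103695a>] task_rq_lock+0x4a/0xa0
RAX: 9066669066666605 RBX: 0000000000011d00 RCX: ffffffff814f3720
Call Trace:
[<ffffffff81040bd6>] try_to_wake_up+0x36/0x310
[<ffffffff81269f47>] ? idr_remove+0x187/0x1f0
[<ffffffff81040f05>] wake_up_process+0x15/0x20
[<ffffffff81231580>] freeary+0x1e0/0x260
[<ffffffff81232631>] T.616+0x71/0xf0
[<ffffffff81131de5>] ? vfs_write+0x125/0x190
[<ffffffff81232719>] sys_semctl+0x69/0xa0
[<ffffffff81002882>] system_call_fastpath+0x16/0x1b
Code: 00 48 c7 c3 00 1d 01 00 49 89 fc 49 89 f5 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89 de <8b> 40 18 4c 03 34 c5 e0 b2 70 81 4c 89 f7 e8 73 93 4a 00 49 8b
"""

# One stack frame: "[<address>] [? ]symbol+0xoffset/0xsize"
FRAME_RE = re.compile(
    r"^\[<[0-9a-f]{16}>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+$",
    re.MULTILINE,
)


def call_trace(text):
    """Return (symbol, reliable) pairs; a '?' frame is reported as unreliable."""
    return [(m.group(2), m.group(1) is None) for m in FRAME_RE.finditer(text)]


def register(text, name):
    """Return the 64-bit value of a register printed as 'NAME: <16 hex digits>'."""
    m = re.search(rf"\b{name}: ([0-9a-f]{{16}})\b", text)
    return int(m.group(1), 16)


def is_canonical(addr):
    """With 48-bit virtual addresses, bits 63..47 must all be equal."""
    return (addr >> 47) in (0, 0x1FFFF)


def faulting_byte(text):
    """Return (offset, byte) of the <..>-marked byte in the Code: line."""
    tokens = re.search(r"^Code: (.*)$", text, re.MULTILINE).group(1).split()
    for offset, tok in enumerate(tokens):
        if tok.startswith("<"):
            return offset, tok.strip("<>")
    raise ValueError("no marked byte in Code: line")


if __name__ == "__main__":
    print("reliable frames:", [s for s, ok in call_trace(OOPS) if ok])
    rax = register(OOPS, "RAX")
    print("RAX = %#x, canonical: %s" % (rax, is_canonical(rax)))
    print("marked byte (offset, value):", faulting_byte(OOPS))
```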

Details of our configuration:
Hardware:
Dell PowerEdge R610
2x Intel(R) Xeon(R) CPU X5550
32GB RAM
RAID 10 with 6 disks

Kernel:
# uname -r
2.6.38.6-vs2.3.0.37-rc15-rol-em64t

# util-vserver:
ii libvserver0 0.30.216-pre2955-1 dynamic libraries for util-vserver
ii util-vserver 0.30.216-pre2955-1 utilities for managing Linux-VServer guests
ii util-vserver-core 0.30.216-pre2955-1 core utilities of util-vserver
ii util-vserver-sysv 0.30.216-pre2955-1 initscripts for util-vserver

# vserver-info
Versions:
                    Kernel: 2.6.38.6-vs2.3.0.37-rc15-rol-em64t
                    VS-API: 0x00020308
                       VCI: 0x0000000013001f11
              util-vserver: 0.30.216-pre2955; Mar 26 2011, 10:28:24

Features:
                        CC: gcc, gcc (Debian 4.4.5-8) 4.4.5
                       CXX: g++, g++ (Debian 4.4.5-8) 4.4.5
                  CPPFLAGS: ''
                    CFLAGS: '-g -O2 -std=c99 -Wall -pedantic -W -funit-at-a-time'
                  CXXFLAGS: '-g -O2 -ansi -Wall -pedantic -W -fmessage-length=0 -funit-at-a-time'
                build/host: x86_64-pc-linux-gnu/x86_64-pc-linux-gnu
              Use dietlibc: yes
        Build C++ programs: yes
        Build C99 programs: yes
            Available APIs: compat,v11,fscompat,v13,net,v21,v22,v23,netv2
             ext2fs Source: e2fsprogs
     syscall(2) invocation: alternative
       vserver(2) syscall#: 236/glibc
                crypto api: nss
           python bindings: yes
    use library versioning: yes

Paths:
                    prefix: /usr
         sysconf-Directory: /etc
             cfg-Directory: /etc/vservers
          initrd-Directory: /etc/init.d
        pkgstate-Directory: /var/run/vservers
           vserver-Rootdir: /vservers

Assumed 'SYSINFO' as no other option given; try '--help' for more information.

# ./testme.sh
Linux-VServer Test [V0.17] Copyright (C) 2003-2006 H.Poetzl
chcontext is working.
chbind is working.
Linux 2.6.38.6-vs2.3.0.37-rc15-rol-em64t #7 SMP Mon May 16 19:31:48 CEST 2011 x86_64
Ea 0.30.216 236/glibc (DSa) <compat,v11,fscompat,v13,net,v21,v22,v23,netv2>
VCI: 0002:0308 236 13001f11 (TbsPW)

---
[000]# succeeded.
[001]# succeeded.
[011]# succeeded.
[031]# succeeded.
[101]# succeeded.
[102]# succeeded.
[201]# succeeded.
[202]# succeeded.

# vserver-stat
CTX   PROC    VSZ    RSS  userTIME   sysTIME    UPTIME NAME
72     923  62.1G   8.3G   4h49m33  31m14s10  11h05m48 VSERVERNAME

Only one vserver is running on that machine, but I have seen the same error on another server with the same configuration. After downgrading that second
server to kernel "2.6.32.41-vs2.3.0.36.29.7" the problem seems to be solved: no problems for about 20 days now.

Do you have any idea how I could solve this, or what could be causing it?
If you need more information, please let me know.

Thanks for your help,
Urban Loesch
Received on Tue Jul 19 10:12:40 2011