
From: vherva_at_turing.netspan.fi
Date: Wed 15 Jan 2003 - 21:10:55 GMT


[Please Cc, although I will poll the archive]

We recently upgraded from the 2.4.20-pre7-ctx13-owl kernel to
2.4.21-pre2-ctx16-owl (owl stands for the Openwall Linux security
patchset). After that, certain processes run inside vservers began to
hang. For example, doing this after a fresh boot of the host system:

host> vserver www start
host> vserver www enter
(...)
www> ls -l /
<hangs after showing a few lines>
www> id
<hangs>
www> uname -a
<works>
www> ls /
<works>

I first reverted the owl patch and verified that the hang also happens
with 2.4.21-pre2-ctx16 and then with 2.4.20-ctx16. After a few
iterations, it seems the problem happens with 2.4.19ctx-15, but does not
happen with 2.4.19ctx-14 (both clean 2.4.19 with only ctx applied).

Details:

[root_at_vserver:www /]ls -l
total 64
drwxr-xr-x 2 root root 4096 Jan 6 15:45 bin
drwxr-xr-x 2 root root 4096 Feb 6 1996 boot
<hangs>
[root_at_vserver:www /]id
<hangs>

Strace shows that "id" (and "ls -l", for that matter) hangs, looping
over these syscalls:

--------------------------------------------------------------
open("/var/yp/binding/test-host.2", O_RDONLY) = -1 ENOENT
(No such file or directory)
gettimeofday({1042665515, 575639}, NULL) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
getpid() = 18043
bind(3, {sin_family=AF_INET,
sin_port=htons(835), sin_addr=inet_addr("0.0.0.0")}}, 16) = 0
ioctl(3, 0x5421, [1]) = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
sendto(3,
"\200;\351t\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\3"...,
56, 0, {sin_family=AF_INET, sin_port=htons(111),
sin_addr=inet_addr("127.0.0.1")}}, 16) = 56
poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 5000) = 1
recvmsg(3, {msg_name(16)={sin_family=AF_INET,
sin_port=htons(111), sin_addr=inet_addr("10.0.0.1")}},
msg_iov(1)=[{"\200;\351t\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\3"...,
56}], msg_controllen=44, msg_control=0xbfffcd70, ,
msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 56
recvfrom(3, 0x804d140, 400, 0, 0xbffff230, 0xbfffcf14) = -1
EAGAIN (Resource temporarily unavailable)
poll([{fd=3,events=POLLIN}], 1, 5000) = 0
ioctl(3, 0x8912, 0xbfffcf20) = 0
ioctl(3, 0x8913, 0xbfffcf60) = 0
sendto(3, "\200;\351t\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\3"..., 56,
0, {sin_family=AF_INET, sin_port=htons(111),
sin_addr=inet_addr("127.0.0.1")}}, 16) = 56
poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 5000) = 1
recvmsg(3, {msg_name(16)={sin_family=AF_INET, sin_port=htons(111),
sin_addr=inet_addr("10.0.0.1")}},
msg_iov(1)=[{"\200;\351t\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\3"...,
56}], msg_controllen=44, msg_control=0xbfffcc30, ,
msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 56
recvfrom(3, 0x804d140, 400, 0, 0xbffff230, 0xbfffcf14) = -1 EAGAIN
(Resource temporarily unavailable)
poll( <unfinished ...>
--------------------------------------------------------------

I have no clue why id and ls would want to poll a network socket. On
ctx14 they do that too, but do not hang.
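
One way to get a hint about what is being sent: the 24 bytes strace
prints for the sendto() payload look like the start of an ONC RPC call
header. Below is a minimal Python sketch that unpacks them, assuming the
usual call layout of six big-endian 32-bit words (xid, message type, RPC
version, program, version, procedure). The function name
decode_rpc_call_header is made up for illustration, and only the bytes
visible in the trace are decoded; the rest of the 56-byte packet is
elided by strace and is not guessed at here.

```python
import struct

# First 24 bytes of the sendto() payload, copied from the strace output
# above (strace shows non-printable bytes as octal escapes).
PAYLOAD = b"\200;\351t\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\3"


def decode_rpc_call_header(data):
    """Unpack the six leading 32-bit big-endian words of an RPC call.

    Assumption: the packet follows the standard ONC RPC call layout.
    """
    xid, msg_type, rpc_version, program, version, procedure = struct.unpack(
        ">6I", data[:24]
    )
    return {
        "xid": xid,                  # transaction id chosen by the client
        "msg_type": msg_type,        # 0 = CALL, 1 = REPLY
        "rpc_version": rpc_version,  # RPC protocol version
        "program": program,          # 100000 is the portmapper
        "version": version,          # program version
        "procedure": procedure,      # 3 is PMAPPROC_GETPORT for portmapper v2
    }


header = decode_rpc_call_header(PAYLOAD)
for name, value in header.items():
    print(f"{name:12} = {value}")
```

If that reading is right, the loop is a portmapper GETPORT query sent
to 127.0.0.1:111. Together with the open() of /var/yp/binding/... just
before it, that would be consistent with an NIS lookup, but this is
only a guess from the trace.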

Full straces available at

http://www.netspan.fi/tmp/strace-id-ctx14.txt
http://www.netspan.fi/tmp/strace-id-ctx15.txt

I see there are a number of network changes between ctx-14 and ctx-15
(http://www.netspan.fi/tmp/patch-ctx-14-15). I can't spot anything
obvious, though.

Any ideas?

-- v --

--
Ville Herva   Ville.Herva_at_netspan.fi   +358-50-5164500
Netspan Oy    netspan_at_netspan.fi       PL 65  FIN-02151 Espoo    
              http://www.netspan.fi
For my PGP key, see http://www.netspan.fi/pgp-vherva.html

