[00:00] hmm not sure how to hack this into the e1000 driver
[00:01] IIRC there are patches somewhere ... but I'm not able to find them, atm.
[00:01] found a patch :)
[00:01] http://www.ussg.iu.edu/hypermail/linux/kernel/0311.1/0290.html
[00:01] 2.6.0/test9
[00:01] oh? what is wrong with the e1000 driver? i plan to run that driver myself
[00:01] err
[00:01] 2.6
[00:01] yea
[00:02] JonB, nothing, trying to get netconsole to work with it.
[00:02] netconsole ?
[00:03] im going to try to hack this 2.6 patch into the e1000 driver
[00:03] worth a shot
[00:03] hmm, some time ago, I had the netdrv patches in 2.4 patchset ...
[00:03] that might solve some issues ...
[00:06] hmm seems straight forward to get this working with the 2.4 driver
[00:06] okay, go ahead, if it works, we'll include it in all devel releases ;)
[00:09] networking is back up
[00:09] thats a good sign
[00:21] netconsole: network logging started up successfully!
[00:21] lets see if it works
[00:22] netconsole: using source IP 66.98.-108.92
[00:22] netconsole: using target IP -49.44.-54.92
[00:22] sketchy :)
[00:23] but it works
[00:23] hehe
[00:23] yay
[00:23] lets crash this bad boy
[00:23] Unable to handle kernel NULL pointer dereference at virtual address 00000018
[00:23] printing eip:
[00:23] c01741c6
[00:23] *pde = 00000000
[00:23] Oops: 0002
[00:23] CPU: 0
[00:23] EIP: 0010:[] Tainted: P
[00:23] EFLAGS: 00010286
[00:23] eax: 00000000 ebx: f6ea5f50 ecx: c1000020 edx: 000000ee
[00:23] esi: f7992000 edi: f7992000 ebp: f6ea5f60 esp: f6ea5f40
[00:23] ds: 0018 es: 0018 ss: 0018
[00:23] Process killer (pid: 2956, stackpage=f6ea5000)
[00:23] Stack: f6ea5f50 00004049 c1c15504 0000c708 35393035 00000032 f7992400 00000000
[00:23] f6ea5f7c c012e637 f7992000 0000c707 ffffffff f6ea5f94 bffff7a4 f6ea5fa8
[00:23] c012e99e ffffffff 00000000 00000008 f6e36884 00000000 00000000 ffffffff
[00:23] Call Trace: [] [] [] []
[00:23] Code: c7 40 18 13 00 00 00 89 c3 8b 46 08 c7 43 20 c0 8c 3b c0 89
[00:23] post copy: 0
[00:23] post copy: 0
[00:23] bingo :)
[00:23] Bertl, good idea :)
[00:23] now if only i had been smart enough to compile with -g
[00:24] hmm i wonder if anyone has implemented this to do a remote sysrq?
[00:25] ipt_sysrq is a new iptables target that allows you to do the same as the magic sysrq key on a keyboard does, but over the network.
[00:25] well isnt that spiffy
[00:27] JonB (~Jon@0x503e0319.kjnxx7.adsl.tele.dk) left irc: Quit: ChatZilla 0.9.35 [Mozilla rv:1.5.1/20031120]
[00:29] hey cool I was cooking ... hmm, I'm still cooking but it seems you got it working, right?
[00:30] yep netconsole is working, got oops coming across
[00:30] gonna try to get sysrq working over the network as well
[00:30] compiling with -g right now
[00:32] great, could you make a small howto on linux-vserver.org ... I'm volunteering to make the adaptations for other network cards, if required ...
[00:32] Bertl, do you think this will really be useful to the average vserver user though?
[00:33] ill do one in general terms, nothing really vserver specific
[00:34] testing out the sysrq
[00:34] well, lot of people asked about the serial console ...
[00:34] netconsole+remote sysrq is slick
[00:34] Bertl, in the context of debugging or management?
[00:35] debugging ... and oops capturing ...
[00:36] ok sure ill hack something up
[00:37] sysrq is going to be prone to a nice replay attack but ahwell
[00:39] tanjix (ViRu_@pD9049FE1.dip.t-dialin.net) left irc:
[00:44] nathan_: after the last lkcd discussion, I was also thinking about a solution/enhancement in this direction, and it sounds useful to me to setup some circular buffer in memory, which keeps the last 64k or 128k of printk messages ... over a reboot and logs them into the dmesg buffer or something like that ...
[00:45] Bertl, isnt it a shot in the dark as to what machines will keep memory intact between boots?
[00:48] hmm, well, yes, but I don't know any 2GB server, which checks all 2GB ;)
[00:48] and with proper checksums, etc ...
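Bertl's idea of a log that survives a reboot can be sketched as a fixed-size ring buffer carrying a magic number and a checksum, so that after boot the contents are trusted only if they still validate. This is a minimal userspace sketch under stated assumptions: the struct layout, `PLOG_SIZE`, `PLOG_MAGIC` and every function name are hypothetical, the checksum is an Adler-32-style sum chosen for brevity, and a kernel version would also need memory at a fixed address that neither the firmware nor the boot path clears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for a log area meant to survive a warm reboot.
 * PLOG_SIZE, PLOG_MAGIC and all names below are illustrative assumptions. */
#define PLOG_SIZE  (64u * 1024u)   /* "the last 64k" of printk output */
#define PLOG_MAGIC 0x504c4f47u     /* arbitrary "area is initialised" marker */

struct plog {
    uint32_t magic;
    uint32_t head;                 /* next write position in buf */
    uint32_t len;                  /* valid bytes, at most PLOG_SIZE */
    uint32_t checksum;             /* Adler-32-style sum over head, len and buf */
    char buf[PLOG_SIZE];
};

static void adler_step(uint32_t *a, uint32_t *b, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *a = (*a + p[i]) % 65521u;
        *b = (*b + *a) % 65521u;
    }
}

static uint32_t plog_checksum(const struct plog *p)
{
    uint32_t a = 1, b = 0;
    adler_step(&a, &b, (const unsigned char *)&p->head, sizeof p->head);
    adler_step(&a, &b, (const unsigned char *)&p->len, sizeof p->len);
    adler_step(&a, &b, (const unsigned char *)p->buf, sizeof p->buf);
    return (b << 16) | a;
}

void plog_init(struct plog *p)
{
    memset(p, 0, sizeof *p);
    p->magic = PLOG_MAGIC;
    p->checksum = plog_checksum(p);
}

/* Append n bytes, overwriting the oldest data once the ring is full.
 * The checksum is recomputed once per call: simple, but slow. */
void plog_write(struct plog *p, const char *msg, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p->buf[p->head] = msg[i];
        p->head = (p->head + 1) % PLOG_SIZE;
        if (p->len < PLOG_SIZE)
            p->len++;
    }
    p->checksum = plog_checksum(p);
}

/* After boot: copy out up to cap bytes, oldest first, or return 0 if the
 * area fails validation (never initialised, cleared or scribbled on). */
size_t plog_recover(const struct plog *p, char *out, size_t cap)
{
    if (p->magic != PLOG_MAGIC || p->head >= PLOG_SIZE ||
        p->len > PLOG_SIZE || p->checksum != plog_checksum(p))
        return 0;
    size_t start = (p->head + PLOG_SIZE - p->len) % PLOG_SIZE;
    size_t n = p->len < cap ? p->len : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = p->buf[(start + i) % PLOG_SIZE];
    return n;
}
```

The checksum is what answers the "shot in the dark" objection: on a machine that does clear or scramble RAM across the reboot, `plog_recover()` returns 0 instead of handing back garbage.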
[00:54] well this is damn slick
[00:54] remote sys rescue works as expected
[01:07] /usr/src/linux-2.4.23/fs/proc/virtual.c:111
[01:14] wow, that is a service ;)
[01:17] hmm so now i have access to pretty much everything i would locally
[01:17] sounds great ...
[01:17] entry = create_proc_entry(name,
[01:17] this isn't supposed to return NULL, but obviously does ...
[01:18] try to add a check before entry->vx_flags = VX_ADMIN|VX_WATCH|VX_IDENT;
[01:18] panic on null?
[01:18] if (!entry) return 0;
[01:18] ok
[01:20] kgdb over networking using netpoll =o
[01:24] yep
[01:24] hundreds of nulls
[01:25] is this a good sign?
[01:25] entry was null! 57675
[01:25] post migrate: 0, 0
[01:25] post copy: 0
[01:25] entry was null! 57676
[01:25] post migrate: 0, 0
[01:25] post copy: 0
[01:25] entry was null! 57677
[01:25] post migrate: 0, 0
[01:25] post copy: 0
[01:25] entry was null! 57678
[01:25] the procfs inode pool seems depleted ...
[01:25] is it just because of all the forks?
[01:25] funny ... maybe we are leaking inodes?
[01:26] on a side note, NMI Watchdog detected LOCKUP on CPU3, eip c010acbe, registers:
[01:26] box still up though and nmi coming across on all cpus
[01:27] hmm ... anything useful there, anything to put into a ksymoops?
[01:27] foo_file = create_proc_entry("foo", 0644, example_dir);
[01:27] if(foo_file == NULL) {
[01:27] rv = -ENOMEM;
[01:27] goto no_foo;
[01:27] }
[01:27] okay, so the NULL on create_proc_entry, means no memory ...
[01:28] ok so am i just depleting all the memory with all the forks?
[01:28] could be ...
[01:28] http://0x00.org/hidden/bert.txt
[01:29] hmm, hmm, looks like a 'bug' in the netconsole code ;)
[01:30] ahh, no, actually it's the vx_proc_destroy, now called with an empty entry ...
[01:32] hmm, I'll cleanup that mess, and we'll see ...
[01:33] at any given time that i can detect, there are around 200 entries in /proc/
[01:33] while entry is null
[01:37] the "post migrate" is interesting ...
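The `create_proc_entry()` example quoted above uses the common kernel idiom of returning `-ENOMEM` and jumping to a cleanup label when an allocation fails. A userspace sketch of the same goto-unwind pattern for a directory plus two children follows; `struct ctx_entries`, `dup_name()` and the `fail_at` failure-injection hook are hypothetical, and `malloc()` stands in for the procfs allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a context's /proc directory and its
 * "status" and "info" children; names and layout are assumptions. */
struct ctx_entries {
    char *dir;
    char *status;
    char *info;
};

/* Test hook: make the fail_at-th allocation fail (0 = never fail). */
static int fail_at;
static int alloc_count;

static char *dup_name(const char *s)
{
    if (fail_at != 0 && ++alloc_count == fail_at)
        return NULL;                       /* simulated out-of-memory */
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Create all three entries or none: on failure, undo in reverse order
 * and report -ENOMEM, mirroring the create_proc_entry() example. */
int ctx_entries_create(struct ctx_entries *e, const char *name)
{
    int rv;

    e->dir = dup_name(name);
    if (e->dir == NULL) { rv = -ENOMEM; goto no_dir; }
    e->status = dup_name("status");
    if (e->status == NULL) { rv = -ENOMEM; goto no_status; }
    e->info = dup_name("info");
    if (e->info == NULL) { rv = -ENOMEM; goto no_info; }
    return 0;

no_info:
    free(e->status);
no_status:
    free(e->dir);
no_dir:
    e->dir = e->status = e->info = NULL;
    return rv;
}

void ctx_entries_destroy(struct ctx_entries *e)
{
    free(e->info);
    free(e->status);
    free(e->dir);
    e->dir = e->status = e->info = NULL;
}
```

Unlike `if (!entry) return 0;`, this reports the failure to the caller and leaves nothing half-created behind.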
[01:37] I thought I removed that one ...
[01:39] am i running old code like a dope?
[01:39] doesn't matter, I'll upload a 1.3.2.3 in a few minutes ...
[01:40] k
[01:57] Tamama (~Tamama@a62-216-20-152.adsl.cistron.nl) joined #vserver.
[01:57] oy
[01:57] oyoy!
[01:58] what was that option again to let a vserver change uid/gid ? :)
[01:58] moreso, setgid32()
[01:58] you mean for context tagging (xid info in uid/gid)?
[01:59] no, i have this apache suexec program that cant change itself to another vserver user :)
[02:00] hmm, I'm lost, I don't understand what you are up to?!
[02:00] ok
[02:00] suexec runs as an apache user, then changes itself to a user/group you specify to run a command
[02:00] yup
[02:00] in this vserver it fails :)
[02:01] suexec as apache plugin or as external script?
[02:01] probably the setuid/setgid flags are missing on that one
[02:02] currently im running it as an external script to see if it would work
[02:02] and i call it with correct arguments :)
[02:02] it requires to have the correct ownership and suid flags ... (for the script)
[02:03] suexec needs to run as www
[02:03] i run it as www
[02:03] suexec is owned by www
[02:03] :)
[02:03] that won't work ;)
[02:03] well i cant run it as root lol
[02:03] and how do you suppose it should be able to chown to somebody else?
[02:04] [2004-01-02 23:49:10]: user mismatch (root instead of www)
[02:04] beats me
[02:05] hehe
[02:05] I have the feeling you are currently trying to accomplish the impossible ...
[02:05] probably
[02:05] there are usually two ways to do suexec stuff ...
[02:05] well httpd runs as user www...
[02:05] so how could it _ever_ work?
[02:05] a) specifying the user/group within apache ...
[02:06] b) using a suid root binary/script which changes the user/gid and drops root privileges before executing the user script/binary
[02:06] Tamama, that has nothing to do with vserver, that is an error in your configuration
[02:06] i'm sure it is nathan lol
[02:07] Tamama, who is your webserver running as? root?
[02:08] Tamama, you need your User directive in apache to match what you compiled suexec with, based on your output that should be www. not root.
[02:08] webserver is running as www, but 1 process remains root
[02:08] hmmm
[02:10] the root process shouldnt be running any cgis
[02:10] http://vserver.13thfloor.at/Experimental/delta-2.4.23-vs1.3.2-vs1.3.2.3.diff
[02:10] Bertl, is that what i want to test with?
[02:10] Tamama: that is okay, if you use the apache built in suexec ...
[02:11] nathan_: yup, that is the latest version, ontop of 1.3.2 ...
[02:11] repatch, if you get any 'post migrate' you did something wrong ...
[02:11] k
[02:13] http://httpd.apache.org/docs/suexec.html
[02:15] Please note that you need root privileges for the installation step. In order for the wrapper to set the user ID, it must be installed as owner root and must have the setuserid execution bit set for file modes.
[02:15] chmod 4750 /usr/local/apache2/bin/suexec
[02:15] ;)
[02:16] oh well i'll figure it out :D
[02:17] oops ...
[02:17] nathan_: just discovered a small mistake ...
[02:17] doh
[02:17] k gimme a new one :)
[02:17] I'll fix it immediately ... sec
[02:24] nathan_: so what was required to get the netconsole stuff working?
[02:25] Bertl, standard patch from the redhat guy + a polling hack for the ethernet driver
[02:25] Ingos patch for 2.4.10-C1, I assume?
[02:25] C2 is what i used
[02:25] ah okay ...
[02:25] and the polling stuff was your own patch? the adaptation of the 2.6 stuff, right?
[02:26] Bertl, yep, just verbatim from the 2.6 patch.
[02:26] very straight forward
[02:26] and does this interfere with the 'usual' networking?
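The two quoted install requirements (owned by root, set-user-ID bit set) and Bertl's option b), a setuid-root wrapper that drops privileges before running the user's program, can be sketched in C. This is an illustration, not Apache's suexec source: `wrapper_install_ok()` and `drop_privileges()` are hypothetical helpers, and requiring no access for "other" is an assumption taken from the `chmod 4750` line.

```c
#define _DEFAULT_SOURCE            /* exposes setgroups() in glibc's <grp.h> */
#include <assert.h>
#include <grp.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Is the wrapper installed as the suexec docs require: owned by root,
 * set-user-ID bit set? Insisting on no access for "other" follows the
 * chmod 4750 above and is an assumption, not a documented suexec check. */
bool wrapper_install_ok(const struct stat *st)
{
    return st->st_uid == 0
        && (st->st_mode & S_ISUID) != 0
        && (st->st_mode & S_IRWXO) == 0;
}

/* Permanently drop root before exec'ing the user's program. Groups go
 * first: once setuid() has run we are no longer allowed to change them. */
int drop_privileges(uid_t uid, gid_t gid)
{
    if (setgroups(0, NULL) != 0)
        return -1;
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    if (uid != 0 && setuid(0) == 0)    /* regaining root must now fail */
        return -1;
    return 0;
}
```

A wrapper owned by `www`, as in Tamama's setup, fails the first check: without root ownership and the setuid bit, the process has no right to switch to another uid at all.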
[02:26] nope
[02:26] http://0x00.org/hidden/netconsole-withe1000.diff
[02:27] hmm interesting, you use the eepro driver, not the e100(0) ?
[02:28] no i use the e1000, that has both in it. the original netconsole C2 had the eepro driver.
[02:28] ah, okay, my fault ... I should split up your patch first ;)
[02:34] http://vserver.13thfloor.at/Experimental/delta-2.4.23-vs1.3.2-vs1.3.2.4.diff
[02:34] (no netconsole included yet ;)
[02:35] k
[02:36] is that against 1.3.2?
[02:36] yes,
[02:36] hmm
[02:36] patching file fs/devpts/root.c
[02:36] Reversed (or previously applied) patch detected! Assume -R? [n]
[02:36] after doing patch-2.4.23-vs1.3.2.diff
[02:36] after doing patch-2.4.23-vs1.3.2.diff then delta-2.4.23-vs1.3.2-vs1.3.2.4.diff i mean
[02:37] that should not be so ... one second ...
[02:37] got a -#include on root.c
[02:37] only change
[02:39] hmm, okay obviously my sources got screwed up ...
[02:40] some of the correction done in 1.3.2 are missing in my sources .. I'll correct that ...
[02:41] k
[02:43] http://vserver.13thfloor.at/Experimental/delta-2.4.23-vs1.3.2-vs1.3.2.4.diff
[02:43] okay, now against the 'correct' sources ...
[02:43] (hopefully, it seems it's a bad day for my patching ;)
[02:43] yep clean
[02:44] micah (micah@micha.hampshire.edu) joined #vserver.
[02:44] hi micah!
[02:44] Hi Bertl!
[02:45] I've got a system with:
[02:45] /dev/rootvg/homelv 495844 14380 455864 4% /home
[02:45] /dev/rootvg/usrlv 1032088 728032 251628 75% /usr
[02:45] /dev/rootvg/varlv 1032088 119252 860408 13% /var
[02:46] the volume group is composed of the two disk partitions: /dev/sda3 and /dev/sdb3
[02:46] created lv in linear mode
[02:46] but I want to turn those into raid + lvm
[02:46] how should I go about doing that?
[02:46] hmm, you know that this isn't a lvm channel, right?
[02:47] but I'm sure you can explain the direct relation to linux-vserver, right?
[02:49] err, right
[02:49] I'd spoken with someone from here in the past about LVM issues (they told me to come here)
[02:49] I think it was JohnB or JonB
[02:51] ah okay, well, you want to convert the entire vg (rootvg) to a raid setup, right?
[02:52] well, I want to use software raid and I want to use LVM
[02:52] so I am not sure the best way to do it
[02:52] I've done both individually
[02:52] in the past
[02:52] but you also want to 'save' the data on that partition?
[02:53] right
[02:53] I have space that I can work with
[02:53] okay, first you have to move everything off from one disk
[02:53] either sda3 or sdb3 has to be removed from the vg
[02:54] this should not be such an issue ... if there is enough unused space ...
[02:54] ok, thats easy
[02:54] then I remove that disk from the VG?
[02:54] yup
[02:55] I can do that
[02:55] then you configure a mirror raid between that 'removed' partition and the 'still' used partition, marked as faulty volume ...
[02:55] so I set the LVM partition as faulty?
[02:55] the raid will be created in degraded mode, and your lvm data will remain
[02:56] then you create a new vg, I'd suggest /dev/vgs
[02:56] from the degraded raid1
[02:56] split it into /dev/vgs/slash /dev/vgs/usr /dev/vgs/var ...
[02:57] i guess what I dont understand is, would my md device be composed of LVM volumes?
[02:57] nope, you do it the other way around ...
[02:57] ooh
[02:57] oooooooh
[02:57] so the PV is created out of the md devices
[02:57] then you just dump/restore the entire partitions ...
[02:58] (simpler than copying)
[02:58] remove the 'original' lvm/vg stuff and 'rebuild' the faulty raid ...
[02:58] after that, you have your lvm volumes on a raid1 md ...
[02:59] sounds like you need to do some funky stuff to resize things
[02:59] in the future
[02:59] hmm, nope, you 'just' add another raid1 md ...
[02:59] to the volume group ...
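The plan above relies on RAID1 working with one member missing: writes go only to the members that are present, and adding the second member later copies everything across. A toy two-way mirror in C walks through that sequence (create degraded, restore data onto it, then add the old partition and rebuild). It is a sketch with arbitrary sizes and hypothetical names, and it ignores everything md really does (superblocks, resync bitmaps, error handling).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy two-way mirror; sizes and names are arbitrary, not md internals. */
#define NBLOCKS 8
#define BLKSZ   16

struct member {
    bool present;
    char data[NBLOCKS][BLKSZ];
};

struct mirror {
    struct member m[2];
};

/* Writes go to every member that is present; a degraded mirror has one. */
void mirror_write(struct mirror *r, int blk, const char *src)
{
    for (int i = 0; i < 2; i++)
        if (r->m[i].present)
            memcpy(r->m[i].data[blk], src, BLKSZ);
}

/* Reads are served by the first present member; false if none is left. */
bool mirror_read(const struct mirror *r, int blk, char *dst)
{
    for (int i = 0; i < 2; i++)
        if (r->m[i].present) {
            memcpy(dst, r->m[i].data[blk], BLKSZ);
            return true;
        }
    return false;
}

/* "Rebuild": add the missing member and copy every block from the survivor. */
bool mirror_add_and_resync(struct mirror *r, int slot)
{
    if (slot < 0 || slot > 1)
        return false;
    int src = 1 - slot;
    if (r->m[slot].present || !r->m[src].present)
        return false;
    memcpy(r->m[slot].data, r->m[src].data, sizeof r->m[slot].data);
    r->m[slot].present = true;
    return true;
}
```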
[03:00] ahh, that makes sense
[03:00] the problem is wrapping my head around the multiple layers of abstraction :)
[03:00] by the way, if you have more disks available (more than 2) it's much better to consider raid5
[03:01] Bertl, dead real quick
[03:01] and any messages?
[03:01] yeah, although it is slower... and this fits perfectly with the disks we have
[03:01] yep, got 3 oops will ksymoops in a sec
[03:01] micah: actually raid5 is much faster ...
[03:02] argh i forgot i rebuilt from scratch, forgot -g again
[03:02] Action: nathan_ kicks himself
[03:02] sigh
[03:03] Bertl: I'll go screw around, thanks for the brain dump
[03:04] ur welcome ...
[03:06] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 480 seconds
[03:06] >>EIP; c017f291 <=====
[03:07] will get addr2line when -g is done
[03:09] /usr/src/linux-2.4.23/fs/proc/generic.c:575
[03:11] argh! self inflicted ...
[03:11] I'm sure this is _not_ my day ;)
[03:12] :)
[03:12] at least it's friday :)
[03:12] least we aren't working blind anymore
[03:14] Tamama (~Tamama@a62-216-20-152.adsl.cistron.nl) left irc: Quit: one little two little three little piggies OINK! OINK! OINK!
[03:16] nathan_: could you put the entire ksymoops trace somewhere?
[03:17] http://0x00.org/hidden/bert.txt
[03:19] is it better to make LVM volumes in linear or striped mode?
[03:20] depends ... if you want to add/remove them, linear is the better solution, if you are looking for performance, striped might be better ...
[03:23] Bertl: hmm, these are on top of raid-1 devices, so would that matter?
[03:24] what happens when you put a linear mode LVM ontop of a striped software raid?
[03:24] it always matters ;), but there might be unexpected (or hard to understand) results at this level ...
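The linear-versus-striped trade-off comes down to how a logical chunk is mapped onto physical volumes. In this sketch a "chunk" is one stripe-size unit, the function names are hypothetical, and the PVs in the striped case are assumed to be equal in size. LVM's real allocator has more options than this.

```c
#include <assert.h>
#include <stddef.h>

/* Where a logical chunk lands on the physical volumes. */
struct pv_loc {
    size_t pv;      /* index of the PV; equals npv if out of range */
    size_t chunk;   /* chunk offset within that PV */
};

/* Linear: fill PV 0, then PV 1, and so on. sizes[] are PV sizes in chunks. */
struct pv_loc map_linear(size_t lchunk, const size_t *sizes, size_t npv)
{
    for (size_t i = 0; i < npv; i++) {
        if (lchunk < sizes[i]) {
            struct pv_loc loc = { i, lchunk };
            return loc;
        }
        lchunk -= sizes[i];
    }
    struct pv_loc past_end = { npv, 0 };
    return past_end;
}

/* Striped: consecutive chunks rotate round-robin over npv equal-sized PVs. */
struct pv_loc map_striped(size_t lchunk, size_t npv)
{
    struct pv_loc loc = { lchunk % npv, lchunk / npv };
    return loc;
}
```

Linear appends a new PV at the end of the address space, which is why growing is easy. Striped rotates consecutive chunks over every PV, which spreads I/O but ties the layout to the number of PVs. With a single PV, as after building one raid1 md, striping degenerates into the linear layout.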
[03:24] Action: micah scrunches his brow
[03:25] seems like I should just keep it linear in LVM and let the software raid have the striping taken care of
[03:25] otherwise it might be a total mess :)
[03:26] hmm, wait, you are now talking of striped raid?
[03:26] err raid 1 is striped no?
[03:27] nono
[03:27] raid1 is mirror :p
[03:27] in which case... it doesn't make sense to make a LVM stripe
[03:27] since the data is being written to one md device anyways
[03:27] http://www.sqlmag.com/Articles/Index.cfm?ArticleID=9697 ;)
[03:28] yeah, I know, I just had a brain fart
[03:28] after doing raid1, you do not have more than 'one' physical volume for your lvm vg
[03:29] if you add 'another' raid1 md to your 'existing' lvm vg, then a striped setup could give better performance ...
[03:29] nathan_: could you do some addr2line queries for me?
[03:30] or put the vmlinux somewhere on the web?
[03:30] Bertl, sure
[03:30] i can do either, whatever is easier for you
[03:30] want addr2line on the backtrace?
[03:30] hmm, lets start with the former ...
[03:30] c017f291
[03:31] try to reduce that address by values of 16 (0x10)
[03:31] multiples of 0x10 and tell me the result ...
[03:31] so c017f281, c017f271 ...
[03:31] /usr/src/linux-2.4.23/fs/proc/generic.c:575
[03:31] /usr/src/linux-2.4.23/fs/proc/generic.c:573
[03:31] /usr/src/linux-2.4.23/fs/proc/generic.c:569
[03:32] /usr/src/linux-2.4.23/fs/proc/generic.c:570
[03:32] /usr/src/linux-2.4.23/fs/proc/generic.c:569
[03:32] keep going ...
[03:32] that last one was actually 41, not 51, skipped one
[03:32] /usr/src/linux-2.4.23/include/asm/string.h:190
[03:33] okay, np
[03:33] /usr/src/linux-2.4.23/fs/proc/generic.c:563
[03:33] /usr/src/linux-2.4.23/fs/proc/generic.c:560
[03:33] /usr/src/linux-2.4.23/fs/proc/generic.c:553
[03:33] okay ...
[03:33] c0182dee
[03:33] /usr/src/linux-2.4.23/fs/proc/virtual.c:146
[03:35] hmm, it seems to me, like we are doing remove_proc_entry, with proc_virtual = NULL ... but I don't see how/when this can happen ...
[03:54] so do you want me to test anything else or put some trace in there or something?
[03:55] hmm, not sure ... maybe another test run, to verify if it's the same location ...
[03:55] ok
[03:56] does the network console work in both directions?
[03:56] so is attaching a remote debugger possible?
[03:56] nope
[03:56] simply writes printks over the network
[03:56] i saw another patch for network kgdb
[03:56] havent tried it
[03:56] okay, but you have a tool to send magic sysreqs right?
[03:57] yep
[04:38] k just finished crashing it again
[04:38] doing a ksymoops
[04:38] sounds like it lasted longer this time?
[04:38] no, i just got distracted
[04:38] its instantaneous
[04:38] i run killer, its dead.
[04:38] ah okay ...
[04:39] in this case, maybe you could enable debugging next time ... ;)
[04:39] -g?
[04:39] nope, the VX_DEBUG option ...
[04:39] ah
[04:39] but lets see where it crashed this time ...
[04:40] by the way, thanks for helping here ...
[04:40] >>EIP; c017f281 <=====
[04:40] >>edx; f79c5e00 <_end+3750693c/38377b9c>
[04:40] >>edi; f729328c <_end+36dd3dc8/38377b9c>
[04:40] >>ebp; f70abe68 <_end+36bec9a4/38377b9c>
[04:40] >>esp; f70abe48 <_end+36bec984/38377b9c>
[04:40] Trace; c0182dde
[04:40] Trace; c0132b74
[04:40] Trace; c01244ef
[04:40] Trace; c0125983
[04:40] Trace; c012cee2
[04:40] Trace; c01096ad
[04:40] Trace; c011c067
[04:40] Trace; c0107ba8
[04:40] Trace; c0109998
[04:41] compiling with VX_DEBUG
[04:42] hmm, you still have the vmlinux which crashed this time?
[04:42] err yea hasnt linked it yet hang
[04:43] ack clean killed it
[04:43] sorry :/
[04:43] just because the crash is different ...
[04:43] >>EIP; c017f291 <=====
[04:43] 02:41 < nathan_> >>EIP; c017f281 <=====
[04:43] ah
[04:44] same ballpark but a little earlier ...
[04:44] should i take another crash with VX_DEBUG or go back to old vmlinux and get info on that last crash?
[04:44] go for the VX_DEBUG I'd say ...
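The ksymoops register lines above annotate each value as `<symbol+offset/size>`. A small C parser for that form (format taken from this log; `struct ksym_ref` and the function names are hypothetical) also shows that the huge `_end+...` offsets are not separate symbols. Subtracting the offset from each value at 04:40 gives the same base, 0xc04bf4c4, so those registers simply point past the end of the kernel image, presumably into dynamically allocated memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One annotated value from ksymoops output such as
 *   ">>edx; f79c5e00 <_end+3750693c/38377b9c>"
 * i.e. address <symbol+offset/size>, as pasted in this log. */
struct ksym_ref {
    unsigned long addr, off, size;
    char sym[64];
};

/* Returns 0 on success, -1 if the line has no symbol+offset/size part
 * (for example ">>EIP; c017f291 <=====" or "Trace; c0182dde"). */
int parse_ksymoops_line(const char *line, struct ksym_ref *r)
{
    const char *p = strchr(line, ';');     /* skip the ">>reg" prefix */
    if (p == NULL)
        return -1;
    if (sscanf(p + 1, " %lx <%63[^+>]+%lx/%lx>",
               &r->addr, r->sym, &r->off, &r->size) != 4)
        return -1;
    return 0;
}

/* Address of the symbol itself: addr - offset. */
unsigned long ksym_base(const struct ksym_ref *r)
{
    return r->addr - r->off;
}
```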
[04:45] this might shed some additional light on the issue ...
[04:45] ok
[04:47] personally, I suspect that we release some half allocated vx_info, somewhere ... in heavy fork action ;)
[04:53] hmm should have logged that a little better
[04:56] i wish i hadnt forgotten to setup the remote sysrq this boot
[04:56] waiting for softdog
[04:56] wuff, wuff ...
[04:57] ah I mean, bark, bark ...
[05:02] haha
[05:02] just rebooted
[05:20] had to do a manual reboot, didnt come back after the last crash
[05:20] waiting for kernel to boot
[05:24] ok got a dump
[05:24] isnt crashing the same as without debugging though
[05:24] actually yes it is i think
[05:24] just mixed in the debugging
[05:25] okay, let's see ...
[05:25] http://0x00.org/hidden/netconsole.txt.tgz
[05:27] HTTP request sent, awaiting response... 404 Not Found
[05:27] http://0x00.org/hidden/netconsole.txt.gz
[05:27] works ;)
[05:27] ah sorry
[05:31] hm, okay the (new) debug message actually has a bug *sigh*
[05:31] hmm the box isnt booting after these crashes :/
[05:32] that's bad ...
[05:32] hmm
[05:33] so the debugging info in that one isnt telling much?
[05:34] well, it tells a lot, but I guess I wont get a ksymoops/addr2line postprocessing if your machine doesn't come up again, right?
[05:34] it will come back up again after i request a manual reboot
[05:34] why is your kernel tainted, by the way?
[05:35] Bertl, netconsole
[05:35] huh? that isn't GPL, or just too old?
[05:35] too old
[05:35] i believe, or maybe it isnt GPL :)
[05:35] but thats why its tained
[05:35] tainted even
[05:37] seems there are a few issues with smp emulation and bochs with 2.4
[05:43] hmm 2.6.1-rc1 ... guess I have to work on the 2.6 stuff ...
[05:49] Action: nathan_ very happy with 2.4
[05:52] serving (~serving@213.186.191.23) left irc: Ping timeout: 480 seconds
[06:02] k box is booting to kernel
[06:02] gonna get ksymoops
[06:02] great, by the way, what's your name?
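Bertl's suspicion that a half-allocated vx_info gets released under heavy forking is the classic reference-counting hazard. A minimal C11 sketch of the usual discipline follows: initialise fully before sharing, take one reference per user, and free only on the last put. `struct ctx` and its functions are hypothetical stand-ins, not the vserver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-context structure like vx_info;
 * only the reference-counting discipline is the point. */
struct ctx {
    atomic_int refcnt;
    int id;
};

static int ctx_frees;    /* test hook: how many objects were destroyed */

/* Fully initialise before anyone else can see the object;
 * the creator owns the first reference. */
struct ctx *ctx_alloc(int id)
{
    struct ctx *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->id = id;
    atomic_init(&c->refcnt, 1);
    return c;
}

/* Every new user (e.g. a forked task) takes its own reference. */
struct ctx *ctx_get(struct ctx *c)
{
    atomic_fetch_add(&c->refcnt, 1);
    return c;
}

/* Only the caller that drops the count from 1 to 0 tears the object down,
 * so no other holder is left with a half-freed context. */
void ctx_put(struct ctx *c)
{
    if (atomic_fetch_sub(&c->refcnt, 1) == 1) {
        ctx_frees++;
        free(c);
    }
}
```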
[06:02] nathan :)
[06:02] hehe
[06:03] hmm, okay, just because I would like to add you to my list of vserver contributors ...
[06:03] oh sure, nathan faber if thats what you were asking
[06:05] ugh the oops is sloppy mixed with vxd
[06:06] i imagine you saw the kernel BUG at /usr/src/linux-2.4.23/include/asm/spinlock.h:86!
[06:06] serving (~serving@213.186.191.23) joined #vserver.
[06:07] hm, you have spinlock debugging enabled?
[06:07] yea
[06:07] this oops arent complete :/
[06:07] okay, so we grabbed a lock ...
[06:07] lost some of it on the network
[06:07] which was locked ...
[06:08] nope, the other way round, we unlocked an un-locked lock ...
[06:08] yep
[06:08] what's up the stack ...
[06:09] on the oops surrounding that?
[06:09] yup ...
[06:10] Trace; c0132cdc
[06:10] Trace; c01245dd
[06:10] Trace; c0168380
[06:10] Trace; c0125a93
[06:10] Trace; c010995f
[06:10] thats the first one that ksymoops shows with the exact input i gave you, havent tried to correct it
[06:11] hmm ..
[06:11] im gonna try to patch this oops back together
[06:17] just isnt much here
[06:17] Bertl, maybe i should turn off VX_DEBUG and keep spinlock debugging on and see if i can get a clean oops?
[06:17] yup, maybe worth a try ...
[06:19] do people just not use vserver with smp boxes or are people just not triggering these deadlocks?
[06:19] im surprised there arent more people around with reports
[06:20] I guess both applies, and also many do not report bugs at all ...
[06:21] im going to email a professor here and see if he will loan me a dual
[06:21] I had the vs1.00 with all the races running on an SMP machine with 40 vservers for over 1 month, without any issues ...
[06:22] no proc monitoring or such things ...
[06:22] seems if the proc status information was killed that 1.22 would have most likely stayed up
[06:23] yup, except for proc, all issues seem to be resolved ...
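Bertl reads the `spinlock.h:86` BUG as unlocking a lock that was not held. The toy model below shows the kind of state check spinlock debugging adds. It is single-threaded and hypothetical (real spinlocks are atomic and spin), and the recursive-lock check is an extra in this sketch rather than something claimed for the 2.4 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of the state checks a debugging spinlock adds.
 * It only tracks whether the lock is held and counts misuse that a
 * debug kernel would turn into BUG(). */
struct dbg_spinlock {
    bool held;
    unsigned bugs;
};

void dbg_spin_lock(struct dbg_spinlock *l)
{
    if (l->held)
        l->bugs++;          /* extra in this toy: a real spinlock would deadlock */
    l->held = true;
}

void dbg_spin_unlock(struct dbg_spinlock *l)
{
    if (!l->held)
        l->bugs++;          /* unlocking a lock that is not held */
    l->held = false;
}
```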
[06:26] Nick change: Doener_zZz -> Doener
[06:26] is the proc information used by the userspace utils or is it simply there for random statistics but not actively used?
[06:26] good morning
[06:27] everybody had a good start into the new year?
[06:27] hi Doener, morning and yes!
[06:27] fine :)
[06:27] i had a great new year :)
[06:27] nathan_: well, actually the proc interface is the source for many userspace tools ...
[06:30] Action: nathan_ wonders if he could live without the proc interface
[06:30] for now at least
[06:30] hey, we'll fix it .. I want to clean this mess up, once and for all ...
[06:30] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[06:31] ok here we go, spinlock debugging no VX_DEBUG
[06:33] joy nothing caught with the spinlocking
[06:33] EIP: 0010:[] Tainted: P
[06:34] hmm, it could be that we trigger a case where the OOM is the reason for the panic ...
[06:34] hmm
[06:35] I'm thinking about a killer which doesn't reach OOM ...
[06:35] http://0x00.org/hidden/n.txt.gz
[06:36] hmm there are quite a few killers running
[06:36] maybe this is more trivial like you just said
[06:37] hmm, that looks interesting ... I'm curious about the ksymoops ...
[06:37] only 961 killers, not huge.
[06:37] got that coming in a sec
[06:40] Doener: how was your start this year?
[06:40] >>EIP; c017f268 <=====
[06:40] >>ebx; c1c1358c <_end+17540c8/38377b9c>
[06:40] >>edx; f69ab701 <_end+364ec23d/38377b9c>
[06:40] >>edi; f69ab5bc <_end+364ec0f8/38377b9c>
[06:40] >>ebp; f69b3e68 <_end+364f49a4/38377b9c>
[06:40] >>esp; f69b3e48 <_end+364f4984/38377b9c>
[06:40] Trace; c0182dde
[06:40] Trace; c0132b74
[06:40] Trace; c01244ef
[06:40] Trace; c0125983
[06:40] Trace; c012cee2
[06:40] Trace; c01096ad
[06:40] Trace; c011c067
[06:40] Trace; c0107ba8
[06:40] Trace; c0109998
[06:41] okay addr2line for c0107ba8, c011c067, c01244ef, c0132b74, c0182dde and c017f268 please ;)
[06:42] yep
[06:43] Bertl: not that nice... my girl-friend became ill on dec. 30th, has been a lonely party :\ at least she's almost cured now :)
[06:44] root@plain [~]# for i in c017f268 c0182dde c0132b74 c01244ef c0125983 c012cee2 c01096ad c011c067 c0107ba8 c0109998; do echo -n "$i: "; addr2line -e /usr/src/linux/vmlinux $i; done
[06:44] c017f268: /usr/src/linux-2.4.23/fs/proc/generic.c:569
[06:44] c0182dde: /usr/src/linux-2.4.23/fs/proc/virtual.c:146
[06:44] c0132b74: /usr/src/linux-2.4.23/kernel/vcontext.c:161
[06:44] c01244ef: /usr/src/linux-2.4.23/kernel/exit.c:81
[06:44] c0125983: /usr/src/linux-2.4.23/kernel/exit.c:591
[06:44] c012cee2: /usr/src/linux-2.4.23/include/asm/atomic.h:122
[06:44] c01096ad: /usr/src/linux-2.4.23/arch/i386/kernel/signal.c:648
[06:44] c011c067: /usr/src/linux-2.4.23/include/asm/current.h:9
[06:44] c0107ba8: /usr/src/linux-2.4.23/arch/i386/kernel/process.c:756
[06:44] c0109998: ??:0
[06:44] root@plain [~]#
[06:48] hmm, please could you list me the lines kernel/exit.c:81, kernel/vcontext.c:161 and fs/proc/virtual.c:146 in your source ...
[06:49] yep
[06:49] 79 put_vx_info(vxi);
[06:49] 80 }
[06:49] 81 if (ipi) {
[06:49] 82 task_lock(p);
[06:49] 159 vxdprintk("free_vx_info(%p)\n", vxi);
[06:49] 160 vx_proc_destroy(vxi);
[06:49] 161 kfree(vxi);
[06:49] 162 }
[06:50] 141 info->vx_procent = NULL;
[06:50] 142 remove_proc_entry("status", entry);
[06:50] 143 remove_proc_entry("info", entry);
[06:50] 144 remove_proc_entry(entry->name, proc_virtual);
[06:50] 145 return 0;
[06:50] 146 }
[06:50] hmm, that's the same I have here, but that doesn't make much sense ...
[06:51] hmm
[06:51] the stack sequence itself makes sense to me ...
[06:52] the addr2line doesnt?
[06:56] release_task -> [79:put_vx_info] [vinline.h:49]free_vx_info:160
[06:56] -> [virtual.c:142,143,144]vx_proc_destroy -> remove_proc_entry
[06:56] but we get 81, 161, 146 ...
[06:57] could it be, that you are using different sources, or different compile options?
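One explanation for the mismatch above, not raised in the channel and offered only as something worth checking: the entries in a Call Trace are normally return addresses, i.e. the instruction after each call, so addr2line tends to report the line following the call. That fits 81, 161 and 146 sitting just past `put_vx_info()` at exit.c:79, `vx_proc_destroy()` at vcontext.c:160 and `remove_proc_entry()` at virtual.c:144. Subtracting one byte before the lookup points into the call instruction itself. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Call Trace entries are (normally) return addresses: the instruction
 * after a call. Stepping back one byte lands inside the call instruction,
 * so the lookup reports the calling line. Do not adjust the EIP, which
 * already points at the faulting instruction. */
unsigned long trace_callsite(unsigned long ret_addr)
{
    return ret_addr - 1;
}

/* Format an addr2line query for one Call Trace entry. */
int addr2line_query(char *buf, size_t n, const char *vmlinux, unsigned long ret_addr)
{
    return snprintf(buf, n, "addr2line -e %s 0x%lx", vmlinux, trace_callsite(ret_addr));
}
```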
[06:58] sure could be, but afaik its all in sync
[06:58] hmm
[06:58] Action: nathan_ ponders
[06:59] im taking bzImage from arch/boot/i386?
[06:59] arch/i386/boot/
[06:59] but yes ...
[06:59] yea sorry
[06:59] and line2addr -e vmlinuz
[06:59] and line2addr -e vmlinux
[06:59] k hang, just did another crash for good measure
[07:00] addr2line -e vmlinux I mean ...
[07:00] yep
[07:01] Process killer (pid: 2235, stackpage=f69b3000)
[07:01] Stack: 00000005 f6bdf268 f69ab724 00000246 f6bdf268 f6bdf214 f69ac6ac f6cca400
[07:01] f69b3e7c c0182dde f6bdf268 c1c1358c f6cca400 f69b3e8c c0132b74 f6cca400
[07:01] f69ac000 f69b3ebc c01244ef f6cca400 c045b9e0 00000000 f69b3ee4 00000046
[07:01] Call Trace: [] [] [] [] []
[07:01] [] [] [] []
[07:01] thats the one i just did
[07:04] >>EIP; c017f268 <=====
[07:04] >>ebx; c1c1358c <_end+17540c8/38377b9c>
[07:04] >>edx; f7e8fd01 <_end+379d083d/38377b9c>
[07:04] >>edi; f69f7754 <_end+36538290/38377b9c>
[07:04] >>ebp; c1c39f20 <_end+177aa5c/38377b9c>
[07:04] >>esp; c1c39f00 <_end+177aa3c/38377b9c>
[07:04] Trace; c0182dde
[07:04] Trace; c0132b74
[07:04] Trace; c01244ef
[07:04] Trace; c0167fe0
[07:04] Trace; c0125983
[07:04] Trace; c010995f
[07:05] looks pretty much like the same issue each time ...
[07:05] so where is c0132b74 for example ...
[07:05] it's in the middle of free_vx_info
[07:06] root@plain [~]# addr2line -e /usr/src/linux/vmlinux 0xc0132b74
[07:06] void free_vx_info(struct vx_info *vxi)
[07:06] {
[07:06] vxdprintk("free_vx_info(%p)\n", vxi);
[07:06] vx_proc_destroy(vxi);
[07:06] kfree(vxi);
[07:06] /usr/src/linux-2.4.23/kernel/vcontext.c:161
[07:06] }
[07:06] wait, does /usr/src/linux point to /usr/src/linux-2.4.23/ ?
[07:07] try addr2line -e /usr/src/linux-2.4.23/vmlinux 0xc0132b74
[07:07] lrwxrwxrwx 1 root root 13 Dec 15 02:09 /usr/src/linux -> linux-2.4.23//
[07:07] root@plain [~]# addr2line -e /usr/src/linux-2.4.23/vmlinux 0xc0132b74
[07:07] /usr/src/linux-2.4.23/kernel/vcontext.c:161
[07:08] okay, and do the kernel build times match?
[07:08] on boot (via the netconsole) and the timestamp of /usr/src/linux-2.4.23/vmlinux ?
[07:08] root@plain [~]# uname -a
[07:08] Linux plain.rackshack.net 2.4.23-vs1.3.2 #7 SMP Fri Jan 2 22:23:04 EST 2004 i686 i686 i386 GNU/Linux
[07:08] root@plain [~]# strings /usr/src/linux-2.4.23/vmlinux | grep 'Linux'
[07:08] Linux version 2.4.23-vs1.3.2 (root@plain.rackshack.net) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #7 SMP Fri Jan 2 22:23:04 EST 2004
[07:11] hmm, and this matches the boot messages?
[07:12] yep
[07:12] one im running now is the same one ive been running and crashing
[07:13] hmm, would it be a big deal for you to use an other compiler (2.95/2.96)?
[07:14] gcc296 is hashed (/usr/bin/gcc296)
[07:14] thanks redhat :)
[07:14] lets see if its working
[07:14] works
[07:15] rebuilding
[07:15] just because I have a strange idea ...
[07:15] remove_proc_entry(entry->name, proc_virtual);
[07:15] in virtual.c (vx_proc_destroy)
[07:15] do you want me to make code changes or just build with 296?
[07:16] best would be, get a clean kernel, patch with 1.3.2, then 1.3.2.4, make mrproper ... copy your config and rebuild with -g
[07:16] ok
[07:17] dont forget the debug -g ;)
[07:17] yep :)
[07:31] ok trying to boot it
[07:34] Linux version 2.4.23-vs1.3.2 (root@plain.rackshack.net) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-118)) #1 SMP Fri Jan 2 23:21:08 EST 2004
[07:34] okie lets give it a spin
[07:34] yeah, kill it ;)
[07:35] shit, got cut off
[07:35] SOFTDOG: WDT device closed unexpectedly. WDT will not stop!
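The check done above by eye, comparing `uname -a` with the banner found by `strings ... | grep 'Linux'`, can be sketched in C with `uname(2)`, whose `version` field holds the `#7 SMP Fri Jan 2 22:23:04 EST 2004` part. The helper names are hypothetical, and a matching stamp only shows that the same build was booted. It does not prove the sources on disk are unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/* Does a "Linux version ..." banner (e.g. from `strings vmlinux | grep Linux`)
 * carry the same build stamp as a running kernel? The stamp is the
 * "#7 SMP Fri Jan 2 22:23:04 EST 2004" part that uname reports as the version. */
bool banner_matches(const char *banner, const char *version)
{
    return version[0] != '\0' && strstr(banner, version) != NULL;
}

/* Same check against the kernel this program is running on. */
bool banner_matches_running(const char *banner)
{
    struct utsname u;
    if (uname(&u) != 0)
        return false;
    return banner_matches(banner, u.version);
}
```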
[07:35] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[07:35] printing eip:
[07:35] c0175959
[07:35] *pde = 00000000
[07:35] Oops: 0000
[07:35] CPU: 1
[07:35] EIP: 0010:[] Tainted: P
[07:35] EFLAGS: 00010246
[07:35] eax: 00000000 ebx: 00000005 ecx: 00000000 edx: f7692201
[07:35] esi: f79f1c00 edi: f72a9514 ebp: c1c39f08 esp: c1c39efc
[07:35] ds: 0018 es: 0018 ss: 0018
[07:35] Process init (pid: 1, stackpage=c1c39000)
[07:35] Stack: f72a9208 f72a9208 00000000 c1c39f2c c0178baa f72a9208 5d <0>Rebooting in 5 seconds..
[07:36] well, we have eip and two stack addresses c1c39f2c c0178baa
[07:37] eip: c0175959
[07:39] hmm, bochs seems to be in really bad shape ...
[07:40] >>EIP; c0175959 <=====
[07:40] >>edx; f7692201 <_end+37206d3d/383abb9c>
[07:40] >>esi; f79f1c00 <_end+3756673c/383abb9c>
[07:40] >>edi; f72a9514 <_end+36e1e050/383abb9c>
[07:40] >>ebp; c1c39f08 <_end+17aea44/383abb9c>
[07:40] >>esp; c1c39efc <_end+17aea38/383abb9c>
[07:40] /usr/src/linux-2.4.23/fs/proc/generic.c:569
[07:41] okay, let's check for the impossible ... ;)
[07:42] damnit killed it again and i still cant get beyond that stack line
[07:42] we modify the code, I'm pretty sure I know _what_ happens, but I don't see why!
[07:43] ok
[07:44] in int vx_proc_destroy(struct vx_info *info)
[07:44] remove_proc_entry("status", entry);
[07:44] remove_proc_entry("info", entry);
[07:44] after that we add ...
[07:45] if (!proc_virtual) {
[07:45] printk("verry verry bad!");
[07:45] return 0;
[07:45] }
[07:46] then the rest goes ...
[07:46] remove_proc_entry(entry->name, proc_virtual);
[07:46] return 0;
[07:46] hmm, make it printk("verry verry bad: %p,%p\n", info, entry);
[07:47] k
[07:53] [...network console shutdown...]
[07:53] [...network console startup...]
[07:53] Unable to handle kernel NULL pointer dereference at virtual address 00000030 [07:53] printing eip: [07:53] c0175959 [07:53] nathan_ (~nathan@209-6-130-26.c3-0.sbo-ubr1.sbo-ubr.ma.cable.rcn.com) left irc: Excess Flood [07:53] nathan_ (~nathan@209-6-130-26.c3-0.sbo-ubr1.sbo-ubr.ma.cable.rcn.com) joined #vserver. [07:53] oops :) [07:53] silly irc [07:54] the printk didnt make it [07:54] got a complete oops though [07:54] hmm, then maybe it isn't what I thought ... [07:55] as a matter of fact, that would grant me a better sleep ... [07:58] >>EIP; c0175959 <===== [07:58] >>edx; f6f60401 <_end+36ad4f3d/383abb9c> [07:58] >>esi; c1c1358c <_end+17880c8/383abb9c> [07:58] >>edi; f79ead4c <_end+3755f888/383abb9c> [07:58] >>ebp; c1c39f14 <_end+17aea50/383abb9c> [07:58] >>esp; c1c39f08 <_end+17aea44/383abb9c> [07:58] Trace; c0178bc7 [07:58] Trace; c012e74d [07:58] Trace; c0120ec0 [07:58] Trace; c015c827 [07:58] Trace; c012229a [07:58] Trace; c0109283 [07:59] okay c0178bc7 is the interesting one ... [07:59] /usr/src/linux-2.4.23/fs/proc/virtual.c:148 [08:00] remove_proc_entry("info", entry); [08:00] if (!proc_virtual) { [08:00] printk("XXXXXXXXXX verry verry bad: %p,%p\n", info, entry); [08:00] } [08:00] remove_proc_entry(entry->name, proc_virtual); [08:00] okay, that seems to mach .. where is c0175959 (EIP) [08:01] /usr/src/linux-2.4.23/fs/proc/generic.c:569 [08:02] hmm, it looks like we found a procfs issue, unrelated to vserver ;) [08:03] hmm [08:04] do you see the bug? [08:05] hmm, guess that's not too easy, it seems that the linked list in &parent->subdir gets corrupted somewhere ... [08:05] or requires some special locking, which isn't done by some vserver code ... [08:10] hm ... [08:11] please try the following: [08:11] kernel/vcontext.c:160 [08:11] remove the line vx_proc_destroy(vxi); [08:11] done [08:12] and add to ... [08:12] include/linux/vinline.h:47 [08:12] vx_proc_destroy(vxi); [08:13] before ... 
list_del(&vxi->vx_list); [08:14] done [08:14] this should ensure, that the 'same' proc entry isn't created, while the old is still there .. which could be the reason for that failure ... [08:19] nice one bert [08:19] box isnt dead [08:19] counting seconds as we speak [08:20] lets give it another run [08:20] box is solid as far as this test goes [08:21] sounds good .. but doesn't really explain the issue ... [08:21] I still suspect some 'race' in the proc create/destroy ... [08:21] but this way, we ensure that an overlap should not happen ;) [08:22] which should be sufficient, by the way ... [08:22] oops [08:22] just blew it away [08:23] CPU 0: Machine Check Exception: 0000000000000004 [08:23] 0>Kernel panic: CPU context corrupt [08:23] CPU 2: Machine Check Exception: 0000000000000004 [08:23] kernel BUG at /usr/src/linux-2.4.23/include/asm/spinlock.h:86! [08:23] Kernel panic: Unable to continue [08:23] invalid operand: 0000 [08:23] at 0000000000000000 [08:23] hmm, that is probably the 'other' issue (spinlock) we had ... [08:28] any stack backtrace there? [08:29] yea ive got a good oops [08:29] waiting for box [08:29] might need a manual reboot [08:30] any time the NMI watchdog seems to kick in, the box never comes back [08:30] maybe i should kill the NMI [08:30] could be ... what nmi options do you use? [08:30] =1 [08:31] just the nmi watchdog, nmi_watchdog=1 i beliee [08:31] should be okay for SMP ... [08:32] looks like im gonna need a manual reboot [08:50] ok box is back up [08:51] gonna do a ksymoops [08:51] great ... 
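The reordering suggested at 08:11 (call `vx_proc_destroy(vxi)` before `list_del(&vxi->vx_list)`) closes a window in which a context id is already free for reuse while its /proc entry still exists. Below is a deliberately simplified, single-threaded C model of that window. Everything in it is hypothetical: real procfs does not refuse duplicate names (the toy just flags the clash), the real allocator need not reuse the same id, and locking is not modelled. As noted in the log at 08:21, this does not fully explain the crash either.

```c
#include <stdbool.h>

#define MAX_IDS 4

/* Toy model: a context table plus one /proc-like entry per live id. */
struct toy_state {
    bool listed[MAX_IDS];     /* stands in for membership in the vx_list */
    bool has_entry[MAX_IDS];  /* stands in for /proc/virtual/<id> */
};

/* Allocate the lowest unlisted id and create its entry; report a clash
 * (-1) if an entry for that id is still present. */
int toy_new_context(struct toy_state *s)
{
    for (int id = 0; id < MAX_IDS; id++) {
        if (s->listed[id])
            continue;
        if (s->has_entry[id])
            return -1;  /* stale entry with the same name */
        s->listed[id] = true;
        s->has_entry[id] = true;
        return id;
    }
    return -1;
}

/* Old order: unlink first, remove the entry afterwards.  An allocation that
 * happens in between (modelled inline) sees the id free but the entry still
 * present. */
int release_unlink_first(struct toy_state *s, int id)
{
    s->listed[id] = false;           /* list_del(&vxi->vx_list) */
    int other = toy_new_context(s);  /* concurrent allocation */
    s->has_entry[id] = false;        /* vx_proc_destroy(vxi) */
    return other;
}

/* New order: the entry is gone before the id becomes reusable. */
int release_destroy_first(struct toy_state *s, int id)
{
    s->has_entry[id] = false;        /* vx_proc_destroy(vxi) */
    s->listed[id] = false;           /* list_del(&vxi->vx_list) */
    return toy_new_context(s);       /* allocation after the id is free */
}
```

In the old order the interleaved allocation hits the stale entry. In the new order the same id is handed out cleanly.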
[08:52] >>EIP; c011dfbb <===== [08:52] >>ebx; c04488a5 [08:52] >>ecx; c03aa1ac [08:52] >>ebp; f67d9e4c <_end+3634e988/383abb9c> [08:52] >>esp; f67d9e40 <_end+3634e97c/383abb9c> [08:52] Trace; c011d2ab [08:52] Trace; c0110578 [08:52] Trace; c0110650 [08:52] Trace; c0110661 [08:52] Trace; c0109374 [08:52] Trace; c0119081 [08:52] Trace; c01218c4 [08:52] Trace; c0121ebd [08:52] hmm [08:53] thats from a machine check exception [08:53] hmm, do you see any vserver code paths here? [08:53] no i dont see any paths in either of the oops i captured [08:54] nmi is disabled this time? [08:55] i am rebooting with nmi disabled right now [08:58] trying to kill it again [09:00] 01:01:12 up 3 min, 1 user, load average: 488.19, 188.60, 69.46 [09:00] having trouble getting it to die [09:00] probably the nmi was about to kill the box, those forks add some pressure ... [09:01] you think it might be an interaction with the nmi? [09:01] im hitting it quite hard right now [09:01] well, it might be, that the nmi 'thought' that the box was hung and activated some stuff ... [09:02] cant get it to go down [09:02] hmm makes sense [09:02] let it run for some hours, if possible, maybe over night ... or so ... [09:02] yea sure easy enough [09:02] should i try to tax /proc/ with the cat or should i just while true; killer? [09:03] whatever you consider appropriate to get the box down ;) [09:03] k ill give it my best [09:04] you can't lose, if you fail it's a sucess too ;) [09:04] true :) [09:04] ok ill stop by tomorrow and catch up with you [09:04] long day of rebooting, im gonna call it a night [09:05] perfect, thanks again for helping out ... [09:05] taxing the box for the night [09:05] ill talk to you tomorrow [09:05] have a good night [09:05] cu [09:05] I will ... [09:05] good night everyone ... 
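The ksymoops lines above translate raw addresses into `symbol+offset/size` by finding the last System.map symbol at or below each address. That also explains the `_end+...` annotations in the earlier dumps. `_end` is the last symbol of the kernel image, so any address outside the image (heap pointers in edx/esi/edi, stack addresses in ebp/esp) is simply reported relative to it and says nothing about code location. Below is a minimal C sketch of that lookup in the ksymoops style. The table contents in any test are made up, and treating "size" as the distance to the next symbol is an assumption about the format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One System.map-style entry; the table must be sorted by address. */
struct sym {
    uint32_t addr;
    const char *name;
};

/* Index of the last symbol with addr <= target, or -1 if target lies below
 * the first symbol.  Binary search over the half-open range [lo, hi). */
long find_symbol(const struct sym *tab, size_t n, uint32_t target)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}

/* Format as name+offset/size, size being the distance to the next symbol.
 * The last symbol has no successor, so only name+offset is printed. */
int format_address(char *buf, size_t len, const struct sym *tab, size_t n,
                   uint32_t target)
{
    long i = find_symbol(tab, n, target);

    if (i < 0)
        return snprintf(buf, len, "%08x", (unsigned)target);
    if ((size_t)i + 1 < n)
        return snprintf(buf, len, "%s+%x/%x", tab[i].name,
                        (unsigned)(target - tab[i].addr),
                        (unsigned)(tab[i + 1].addr - tab[i].addr));
    return snprintf(buf, len, "%s+%x", tab[i].name,
                    (unsigned)(target - tab[i].addr));
}
```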
[09:05] Nick change: Bertl -> Bertl_zZ [09:06] Nick change: nathan_ -> nathanaway [09:40] nathanaway (~nathan@209-6-130-26.c3-0.sbo-ubr1.sbo-ubr.ma.cable.rcn.com) left irc: Ping timeout: 480 seconds [10:33] noel- (~noel@p50859D8A.dip.t-dialin.net) joined #vserver. [10:41] noel (~noel@pD9E0934E.dip.t-dialin.net) left irc: Ping timeout: 504 seconds [13:21] serving (~serving@213.186.191.23) left irc: Ping timeout: 512 seconds [14:12] Nick change: noel- -> noel [15:12] serving (~serving@213.186.190.33) joined #vserver. [15:33] tanjix (ViRu_@pD904A184.dip.t-dialin.net) joined #vserver. [15:34] tanjix (ViRu_@pD904A184.dip.t-dialin.net) left irc: Client Quit [16:25] Doener_zZz (~doener@pD9588DAC.dip.t-dialin.net) joined #vserver. [16:33] Doener (~doener@pD95881A3.dip.t-dialin.net) left irc: Ping timeout: 480 seconds [16:39] Nick change: Doener_zZz -> Doener [18:11] tanjix (ViRu_@pD9049DA3.dip.t-dialin.net) joined #vserver. [18:12] hi... could some explain me how to get the reboot userspace to work... is there a howto ? [19:16] Bertl_zZ, you've been asleep for 10 hours already :-p [19:16] lol [19:17] tanjix, still here? [19:18] Bertl, when you get around to it, could i have a compatibility output for you an drdb [19:19] compatibility with what? vserver? [19:19] kestrel, yah... [19:20] meaning, would there be a way to "virtualize" a drbd block device and make it a vroot [19:20] ah [19:21] you treat drbd devices exactly as normal block devices, so... [19:21] ... [19:22] the question is... would i have to format my current machine? [19:22] hehe :)...so whatever you do without drbd, you can do with drbd [19:22] infowolfe: yes i'm there [19:22] i do not believe so, i think you can attach a drbd mirror to an existing partition [19:22] but i'd be backing up before doing it, nonetheless [19:23] kestrel, that's definitely a good way to put it... but i'm still wondering if i can drdb simply the folder that the vroot is feeding itself from... 
and use THAT as a drdb device [19:23] drbd does not work at the filesystem layer [19:23] tanjix, do you have the userspace reboot manager running? [19:23] didn't think so... [19:24] but if that folder is on its own block device, you can mirror that [19:24] so i'd have to drdb my whole vservers partition to allow someone else access to it via drdb (or slap another hard drive in there) [19:24] infowolfe: what script is that ? [19:24] tanjix, what userspace tools are you running? [19:24] infowolfe, that's a good question :) [19:24] yep, that's correct [19:25] @tanjix, that's an important question [19:25] you could try just rsyncing if you just want to copy it...? [19:25] kestrel, thanks for the help on that one... i'm thinking of just "giving" this guy the 27GB partition my vservers are on [19:25] infowolfe, yes but i dont know what exactly want to hear [19:25] tanjix, util-vserver... what version [19:26] 0.26 [19:27] no problem [19:27] tanjix, are you asking how the user manages it from inside the vserver? [19:28] yes - how can he/she reboot or halt their vserver, typing reboot or halt gives an error [19:29] tanjix, of course it does :-p [19:29] what distribution are they running? [19:29] rh9 [19:30] alrightythen... [19:30] do you have an /etc/init.d/rebootmgr? [19:30] yes [19:31] read it first... then chkconfig it to the proper runlevel at the proper spot [19:31] once it's running, your client can use vreboot to reboot herself if she *really* needs to [19:32] i find that it's pretty hard to screw up a vserver enough to *need* a reboot [19:32] (of that vserver) [19:32] Action: infowolfe is back to the local mrtg stuff for his gateway box [19:34] hm when i shoudl activat that with "rebootmgr start" [19:34] ? [19:34] now [19:36] Usage: rebootmgr {start|stop|restart|reload|force-reload|status} [19:36] start it [19:36] how [19:36] /etc/init.d/rebootmgr start [19:37] Starting the reboot manager: rebootmgr. 
[19:37] ok, to test, vserver enter [19:37] and then type vreboot [19:38] bash: vreboot: command not found [19:39] type exit [19:39] then updatedb && locate vreboot [19:40] brings up a lot of results [19:40] hold on [19:41] for i in /vservers/*; do cp `locate vreboot | grep util-vserver | grep lib` $i/sbin/;done [19:41] that'll put it in every folder under your /vservers directory that has an sbin folder [19:41] it'll FAIL if there isn't an /sbin [19:42] ok done [19:42] now test it like i said above [19:43] ok it did s.th. [19:43] s.th? [19:43] [root@vserver:mmb /]vreboot [19:43] [root@vserver:mmb /]/usr/local/sbin/vserver: line 782: 14383 Killed $CHBIND_CMD $SILENT $IPOPT --bcast $IPROOTBCAST $CHCONTEXT_CMD $SILENT $FLAGS $CAPS --secure --ctx $S_CONTEXT $CAPCHROOT_CMD --suid $USERID . "$@" [19:43] whoa [19:43] something :) [19:43] yah... it just killed your vserver enter :-D [19:43] lol [19:44] i did what you said me .) [19:44] good [19:44] it works [19:45] does the rebootmgr also startup all vservers when the host is rebooted ? [19:45] no [19:45] chkconfig vserver [19:45] i think [19:46] nope [19:46] chkconfig vservers [19:46] no output on that [19:47] # chkconfig: 345 98 10 [19:47] # description: The vservers service is used to start and stop all [19:47] # the virtual servers. [19:47] Action: infowolfe blinks... [19:48] [root@plain root]# chkconfig vservers [19:48] [root@plain root]# [19:48] i don't need to see that... [19:48] i'm just telling you that the file is there on my box [19:48] for i in `locate vservers`; do clear && less vservers;done [19:48] hit q to exit less [19:49] oops [19:49] for i in `locate vservers`; do clear && less $i;done [19:50] ok found it and have it [19:50] did that find something for you? [19:50] put it in /etc/rc.d/init.d/ [19:50] THEN chkconfig it [19:50] Action: infowolfe wonders at how amazing it is that he remembers this stuff without access to a redhat box :-p [19:51] ... 
referring to chkconfig existing :-p [19:51] hm it brings up lots of files [19:51] i haven't *used* redhat in well over a year now :-D [19:51] i have to exit them all with q [19:51] what does [19:51] or hit ctrl-c [19:51] that should break the operation [19:51] ok out of it.. but what file is it now hehe to put in init.d [19:53] for i in `locate vservers`; do grep "chkconfig" $i && echo $i;done [19:53] i love for loops :-D [19:54] /usr/local/etc/init.d/vservers that's it :) [20:01] hm ok the script is there but no vservers were started [20:04] did you do /etc/init.d/vservers start ? [20:05] and do your /etc/vservers/.conf files include ONBOOT=yes [20:05] i checked the .conf files, on_boot=yes is present [20:05] i really forgot to /etc/init.d/vservers start [20:06] hm when using that my own S_CONTEXT is not being used can that be ? [20:06] meaning? [20:07] i have S_CONTEXT = 1001 (and up) in my configs - now the have 41596 and up [20:07] including the spaces? [20:08] no without - when i type vserver start they have their context set in the cnf [20:08] conf [20:08] ok, they start under the correct context when you type vserver start? [20:08] yes [20:09] i have no clue what the problem is then :-p [20:09] because things work as advertised for me :-\ [20:09] 7h10e7h10e yes that's not the prob so far as the vservers are working :) [20:10] good to hear [20:10] vserver-stat says all vservers are up [20:14] Action: infowolfe wanders off to go read some perl code and find a way to hack cfgstoragemaker to work in a way that is less ugly [20:29] LiT (ghost@AC9AA691.ipt.aol.com) joined #vserver. [20:31] LiT (ghost@AC9AA691.ipt.aol.com) left #vserver. [21:02] nathan_ (~nathan@209-6-130-26.c3-0.sbo-ubr1.sbo-ubr.ma.cable.rcn.com) joined #vserver. [21:20] Nick change: Bertl_zZ -> Bertl [21:23] hi everyone! [21:25] hi bertl [21:31] hwy bert [21:31] Bertl, box has been solid all night [21:31] great! 
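Both questions raised here, whether the config says exactly `ONBOOT=yes` and whether `S_CONTEXT` was written with spaces, matter if the /etc/vservers/*.conf files are read as shell variable assignments. That is an assumption, since the log does not show how the init script parses them. In sh, `S_CONTEXT = 1001` is not an assignment at all, and variable names are case-sensitive, so `on_boot` is a different variable from `ONBOOT`. Below is a minimal C sketch of those two rules. The function names and the ONBOOT filter are hypothetical, and quoting and comments are not handled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Split a shell-style assignment NAME=VALUE.  Mirrors one sh rule: the name
 * must be followed directly by '=', so "S_CONTEXT = 1001" is rejected (sh
 * would try to run a command named S_CONTEXT).  No quote handling. */
bool parse_assignment(const char *line, char *name, size_t name_len,
                      char *value, size_t value_len)
{
    size_t i = 0;

    if (!(isalpha((unsigned char)line[0]) || line[0] == '_'))
        return false;
    while (isalnum((unsigned char)line[i]) || line[i] == '_')
        i++;
    if (line[i] != '=' || i >= name_len)
        return false;
    memcpy(name, line, i);
    name[i] = '\0';

    const char *v = line + i + 1;
    size_t vlen = strcspn(v, "\n");
    if (vlen >= value_len)
        return false;
    memcpy(value, v, vlen);
    value[vlen] = '\0';
    return true;
}

/* Hypothetical boot filter: only an exact ONBOOT=yes line counts, so a key
 * spelled differently (e.g. on_boot) is silently ignored. */
bool wants_onboot(const char *line)
{
    char name[64], value[64];

    return parse_assignment(line, name, sizeof name, value, sizeof value) &&
           strcmp(name, "ONBOOT") == 0 && strcmp(value, "yes") == 0;
}
```

The sketch does not settle why the contexts came up as 41596 and up under the init script. It only rules the two suspected config mistakes in or out.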
[21:31] that's what I wanted to hear ;) [21:32] nathan_: could you test on/two more things on that version? [21:32] yep [21:32] I made two comments starting with // IMHO ... [21:33] hey Bertl [21:33] how's 1.3.2? [21:33] don't touch it ;) [21:33] VCMD_new_s_context is returning -1 in the last few runs bert [21:34] I released it _very_ quietly, because I knew it's flawed ... (not complete) [21:34] i know it's flawed :-p [21:34] but 1.3.3 will be great1 [21:34] lol, the nproc bug from 1.3.1 is there we know at least :-p [21:34] but 1.3.3 will be great! [21:34] yay! [21:34] fixed that one too ;) [21:35] en passant, so to speak ... [21:36] hmm dont think im hitting MAX_S_CONTEXT [21:36] bert any idea why it would return -1? [21:37] well, if the context could not be allocated maybe ... [21:38] let me have a look at the code ... [21:40] -1 == EPERM [21:40] hmm [21:42] printk(KERN_ERR "no dynamic context available.\n"); [21:42] you would get that log message, if out of contexts (dynamic) [21:43] not getting any printks [21:44] eperm is returned if you are calling it from ctx != 0 and don't hold CAP_SYS_ADMIN ... [21:45] i removed the timing loop and just exit when the forking is done in killer.c [21:45] and run it in a while true loop [21:45] each one does to 5k [21:45] i see random -1s in there [21:47] okay, I assume you never call the vc_new_s_context() with -2 as argument, -1 for all invocations right? [21:48] i do whatever killer.c does, lets see [21:49] put the code somewhere please ... [21:49] ahh, I guess I found it ... [21:50] ugh need to put a SCHED_RR shell on this box when im stressing it [21:50] two more checks on that code, and you can test the ck1- with preemtion and O(1) 8-) [21:50] k [21:50] once i can get the box under my control again [21:54] Action: nathan_ ponders issue sysrq b [21:55] sysreq sync, umount, boot ... 
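The two failure causes discussed at 21:42 to 21:44 (no dynamic context left vs. a caller outside context 0 without CAP_SYS_ADMIN) only become distinguishable if they return different error codes. On Linux, EPERM is 1 and ENOMEM is 12, which is where the "-1 to -12" change comes from. Also, if the call goes through a libc-style wrapper, any failure shows up as -1 with the real code in `errno`, so printing `errno` is the reliable check. Below is a hypothetical C sketch of a dynamic id allocator following that convention. The id range is invented, since the real vserver range is not given in the log.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical dynamic range (inclusive); not the real vserver values. */
#define DYN_MIN 100
#define DYN_MAX 104
#define DYN_SPAN (DYN_MAX - DYN_MIN + 1)

struct ctx_pool {
    bool used[DYN_SPAN];
    int next;  /* where the next search starts, wrapping around */
};

/* Kernel-style return: new id on success, -EPERM if the caller is outside
 * context 0 without CAP_SYS_ADMIN, -ENOMEM when the range is exhausted. */
int alloc_dynamic_ctx(struct ctx_pool *p, bool caller_in_ctx0,
                      bool has_cap_sys_admin)
{
    if (!caller_in_ctx0 && !has_cap_sys_admin)
        return -EPERM;
    for (int k = 0; k < DYN_SPAN; k++) {
        int idx = (p->next + k) % DYN_SPAN;
        if (!p->used[idx]) {
            p->used[idx] = true;
            p->next = (idx + 1) % DYN_SPAN;
            return DYN_MIN + idx;
        }
    }
    return -ENOMEM;
}

void free_dynamic_ctx(struct ctx_pool *p, int id)
{
    if (id >= DYN_MIN && id <= DYN_MAX)
        p->used[id - DYN_MIN] = false;
}

/* libc-style wrapper convention: negative kernel result becomes -1 + errno. */
int wrap_result(int kret)
{
    if (kret < 0) {
        errno = -kret;
        return -1;
    }
    return kret;
}
```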
[21:55] yep just did a sub :) [21:55] SysRq : Resetting [21:55] weee [21:56] netconsole you are the awesome [21:56] +ipt_sysrq of course [21:58] Bertl, am i changing code or just hacking up some tests? [21:58] so finally we had some good ideas yesterday, yes ... [21:58] first we try two more combos ... maybe only one ... to crash ... [21:59] and we change the -1 to a -12 ;) [22:00] fs/proc/array.c:153 struct vx_info *vxi = task_get_vx_info(current); // IMHO not required [22:00] becomes struct vx_info *vxi = current->vx_info; [22:00] fs/proc/array.c:159 put_vx_info(vxi); [22:00] is removed ... [22:01] you want me to do this in stages or just do it all now? [22:01] first, all at once ... we know the good state, now we try to make it bad ... [22:01] fs/proc/array.c:399 vxi = task_get_vx_info(current); // IMHO not required [22:02] and we change the -1 to a -12 ;) [22:02] becomes vxi = current->vx_info; [22:02] killer.c? second param? [22:02] ctx = sys_vserver (VCMD_new_s_context, -1, &data); [22:02] ctx = sys_vserver (VCMD_new_s_context, -12, &data); [22:02] nope, the return you get ... [22:02] is that what you are saying? [22:02] fs/proc/array.c:402 put_vx_info(vxi); [22:02] is removed ... [22:03] kernel/vcontext.c:329 int ret = -EPERM; [22:03] is replaced by int ret = - ENOMEM; [22:03] I mean ... int ret = -ENOMEM; [22:04] Topic changed on #vserver by noel!~noel@p50859D8A.dip.t-dialin.net: http://linux-vserver.org/ || latest stable 1.22, devel 1.3.2 [22:05] hi noel, hrmpf .. couln't you wait until 1.3.3 is out %-) [22:06] ok [22:06] made all those changes [22:06] Bertl, the -1 to -12 change was the EPERM->ENOMEM right? [22:06] so 2 times task_get..() replaced and the enomem, okay ... 
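The fs/proc/array.c changes replace a `task_get_vx_info(current)` / `put_vx_info(vxi)` pair with a plain read of `current->vx_info`. The reasoning, as flagged by the "IMHO not required" comments, is presumably that the current task already holds a reference to its own vx_info that cannot be dropped while it is busy in this code path. Below is a minimal C sketch of the two patterns using a toy refcounted object. The names are hypothetical and only the reference count is modelled.

```c
#include <stdlib.h>

/* Toy stand-in for struct vx_info: only the reference count is modelled. */
struct toy_info {
    int refcnt;
};

int toy_frees;  /* how many objects were released, for the demo */

struct toy_info *toy_get(struct toy_info *info)
{
    if (info)
        info->refcnt++;
    return info;
}

void toy_put(struct toy_info *info)
{
    if (info && --info->refcnt == 0) {
        toy_frees++;
        free(info);
    }
}

/* A task keeps one reference to its own info while it points at it. */
struct toy_task {
    struct toy_info *info;
};

/* Pattern being removed: take and drop an extra reference around the read. */
int read_with_get_put(struct toy_task *t)
{
    struct toy_info *info = toy_get(t->info);
    int seen = info ? info->refcnt : -1;

    toy_put(info);
    return seen;
}

/* Replacement: borrow the pointer, relying on the task's own reference. */
int read_borrowed(const struct toy_task *t)
{
    return t->info ? t->info->refcnt : -1;
}
```

Borrowing is only safe while nothing can drop the task's reference during the read. For a task reading its own info that holds, but for another task's info the get/put pair would still be needed.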
[22:06] Bertl: then i will update it again.:) [22:06] webpage is updated.;) [22:06] noel: yes, but I tried to hide the 1.3.2 fact ;) [22:07] http://www.13thfloor.at/vserver/d_release/overview http://www.13thfloor.at/vserver/d_release/v1.3.2/ real good hidden;) [22:09] yeah, but it took you one and a half day to find it, right? [22:09] hey, noel, was a joke, thanks for updating ... [22:11] nope, i was unsure if it was desired. [22:13] now I realized it is and only nobody did it.;) [22:22] pflanze (~chris@ethlife-a.ethz.ch) joined #vserver. [22:23] Hello [22:23] hi pflanze! [22:25] tanjix: please grab and execute 'testme.sh -v' on your system ... [22:25] http://vserver.13thfloor.at/Stuff/testme.sh [22:27] done [22:28] I've noticed this mail: http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&selm=1035296135.1089.35.camel%40zaphod.lucky.linux.kernel&rnum=5 [22:28] tanjix: could I have a look at the output? [22:28] on linux-kernel, Shaya Potter says he doesn't think the vserver approach with the null barrier works. [22:28] bertl sure, need everything ? [22:31] does 000 prevent from doing chdir("/") if "/" is outside the barrier? [22:31] pflanze: hmm, okay, I would be glad to see a script which escapes ... [22:31] or C program or whatever ... [22:32] I'm not convinced that it can't be done, but atm I don't see a way to escape ... [22:33] Action: pflanze writing a C prog [22:35] Bertl, seems my netconsole may have triggered some DoS detectors at my provider [22:35] hmm, and this means? [22:35] Bertl, one of my boxes seems to be firewalled off now [22:36] netconsole not functioning for the time being [22:36] it was weird, i did a sysrq t and then i got disconnected from both boxes [22:36] now one i cant contact on the ip i was sending the netconsole on (but other ips i can on the same box) [22:36] they probably have a rather trivial dos detector [22:36] and most likely it will go back to normal in a few ... 
[22:37] yep [22:37] just waiting [22:38] tanjix (ViRu_@pD9049DA3.dip.t-dialin.net) left irc: Ping timeout: 480 seconds [22:39] nathan, in the meantime, yould you give the http://vserver.13thfloor.at/Stuff/testme.sh a spin? [22:39] I'm interested in the output of testme.sh -v only the md5 sums ... [22:40] noel: could you please test this too? [22:40] Linux-VServer Test [V0.04] (C) 2003-2004 H.Poetzl [22:40] chcontext failed! [22:40] chbind is working. [22:40] ipv4root: 0100007f/00ffffff ipv4root_bcast: ffffffff ipv4root_refcnt: 2 [22:40] Linux 2.4.23-vs1.3.2 i686/chcontext 0.29/chbind 0.29 [J] [22:41] hmm, and the md5 sums? [22:42] hmm thats all that was output with no args? [22:42] ah ... testme.sh -v [22:42] oh -v ;) [22:42] a70c46d10887fbb706df94ed2a30baa1 /usr/sbin/chbind [22:42] b200b379db85d9ace63a8e019d1bff82 /usr/sbin/chcontext [22:42] 89760cc6f2ec4cab24e485b9f23e48b6 /etc/init.d/vservers [22:43] Bertl: well, fchdir to the previously opened fd gives: Bad file descriptor [22:45] hmm, so no vserver breaker yet? [22:47] Bertl, me? [22:47] hmm, no actually pflanze ... [22:47] ah [22:48] you are still on hold by your provider? [22:48] ahyep [22:48] wish i had bound another ip to the root server [22:48] and you're sure that there is no other routing issue involved? [22:48] oh wait actually [22:48] ah damnit nope wont work [22:49] Bertl, every other ip i can contact on the box [22:49] and i was curiously forced a FIN just as i sent the t [22:49] i just cant get into the root vserver :( [22:51] sending commands works from any ip, right? [22:52] ya [22:52] i can change boxes i use [22:52] this one is just easiest and was all setup [22:53] I would try to send a reboot to that box ... maybe the network stack was trashed ... 
[22:53] Bertl, no no the one that isnt responding is my stable rock solid box [22:53] the testing box is still up and good [22:53] im positive its a dynamic firewall [22:53] all other networking is working to the box on other ip addresses [22:53] just gotta wait it out [22:54] hmm, ah okay ... now I understand ... [22:54] i think they detected an incoming DoS [22:54] "incoming DoS" [22:54] and they figure in general cases if they make the box go away, people will stop [22:54] so thats what they have done, automagically i guess. [22:54] horrible if you ask me [22:54] but in 99% of the cases its probably fine [22:54] yeah, probably right ... [22:54] why am i always in the 1% :( [22:55] that's called elite ;) [22:55] its back [22:55] certainly was a dynamic firewall [22:55] now im scared to do it again until they give me the nitty gritty on what will trigger it [22:55] i put in a trouble ticket [22:55] whatever ill just test without doing a sysrq t [22:56] Bertl, still getting -1 [22:56] i wonder if its a resource issue [22:56] maybe ill just spawning too many [22:56] i honestly have no idea how many are forking off [22:56] hmm from where do you get -1? [22:56] the new_s_context [22:57] hmm, sure this is the new kernel? [22:57] sorry that I ask ;) [22:57] Bertl: sure. one moment... [22:57] Bertl, ya but i only did a make bzImage and figured make would sort it out [22:57] it is the one i just built though [22:58] okay, where is the 'killer' code you currently use ... [22:58] hang ill upload [22:59] http://0x00.org/hidden/killer.c [23:02] ah, okay, I obviously modified the wrong part ... but it doesn't hurt ... [23:02] is the system stable with those modifications? [23:03] havent stressed it too much [23:03] will in a sec [23:09] okay, I guess I found the -1 issue but it's tricky ... we'll have to test ;) [23:10] what a bunch of dipshits [23:11] they just rebooted my live box now [23:11] argh [23:11] because they couldnt "contact it" [23:11] haha .. sorry ... 
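To answer "i honestly have no idea how many are forking off", a stress loop can count its own successes and failures. `fork()` returns -1 and sets `errno` (typically EAGAIN) when the process limit is reached, so resource exhaustion shows up in the counts instead of being guessed at. Below is a minimal, bounded C sketch. It is not killer.c, whose source is only linked in the log, and the struct and function names are made up.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct fork_stats {
    int started;     /* fork() returned a child pid */
    int failed;      /* fork() returned -1 */
    int last_errno;  /* errno of the last failure, e.g. EAGAIN */
    int reaped;      /* children collected with waitpid() */
};

/* Fork up to n children that exit immediately, counting what happened. */
struct fork_stats fork_burst(int n)
{
    struct fork_stats s = {0, 0, 0, 0};

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);  /* child: leave at once, skipping stdio flushing */
        if (pid < 0) {
            s.failed++;
            s.last_errno = errno;
            continue;
        }
        s.started++;
    }
    for (;;) {  /* reap until waitpid() reports no children left */
        pid_t w = waitpid(-1, NULL, 0);
        if (w > 0)
            s.reaped++;
        else if (errno != EINTR)
            break;
    }
    return s;
}
```

Logging `failed` and `last_errno` next to the -1 results from `VCMD_new_s_context` would show whether the failures line up with fork pressure or with context allocation.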
[23:11] sokay :/ [23:11] god they are dopes [23:12] sadly they treat all their customers like they just got their first install of redhat 9 and think redhat IS linux. [23:27] tanjix (ViRu_@pD904A06C.dip.t-dialin.net) joined #vserver. [23:28] shuri (~shushushu@vserver.electronicbox.net) joined #vserver. [23:28] hi shuri! [23:29] hi Bertl [23:29] happy new years [23:29] to you too ... seems you made it into 2004 ;) [23:29] :P [23:34] Bertl: well the above error message was just a program error of my part. [23:35] I'm now (quite) confident that it's secure. [23:35] hmm, good ;) [23:35] could you post your tests and findings on the ml ? [23:35] My thinking error was that I didn't realize that symlinks don't help, their value is based on rtd too, or they would have to traverse the barrier. [23:44] well this is a huge headache [23:44] now they are booting me into single user mode [23:44] and there is nothing wrong [23:44] its amazing that people can work with computers that dont know how to read [23:45] i think ill be running netconsole from my house now [00:00] --- Sun Jan 4 2004