[00:00] hmm not sure how to hack this into the e1000 driver
[00:01] IIRC there are patches somewhere ... but I'm not able to find them, atm.
[00:01] found a patch :)
[00:01] http://www.ussg.iu.edu/hypermail/linux/kernel/0311.1/0290.html
[00:01] 2.6.0/test9
[00:01] oh? what is wrong with the e1000 driver? i plan to run that driver myself
[00:01] err
[00:01] 2.6
[00:01] yea
[00:02] JonB, nothing, trying to get netconsole to work with it.
[00:02] netconsole ?
[00:03] im going to try to hack this 2.6 patch into the e1000 driver
[00:03] worth a shot
[00:03] hmm, some time ago, I had the netdrv patches in 2.4 patchset ...
[00:03] that might solve some issues ...
[00:06] hmm seems straight forward to get this working with the 2.4 driver
[00:06] okay, go ahead, if it works, we'll include it in all devel releases ;)
[00:09] networking is back up
[00:09] thats a good sign
[00:21] netconsole: network logging started up successfully!
[00:21] lets see if it works
[00:22] netconsole: using source IP 66.98.-108.92
[00:22] netconsole: using target IP -49.44.-54.92
[00:22] sketchy :)
[00:23] but it works
[00:23] hehe
[00:23] yay
[00:23] lets crash this bad boy
[00:23] Unable to handle kernel NULL pointer dereference at virtual address 00000018
[00:23] printing eip:
[00:23] c01741c6
[00:23] *pde = 00000000
[00:23] Oops: 0002
[00:23] CPU: 0
[00:23] EIP: 0010:[] Tainted: P
[00:23] EFLAGS: 00010286
[00:23] eax: 00000000 ebx: f6ea5f50 ecx: c1000020 edx: 000000ee
[00:23] esi: f7992000 edi: f7992000 ebp: f6ea5f60 esp: f6ea5f40
[00:23] ds: 0018 es: 0018 ss: 0018
[00:23] Process killer (pid: 2956, stackpage=f6ea5000)
[00:23] Stack: f6ea5f50 00004049 c1c15504 0000c708 35393035 00000032 f7992400 00000000
[00:23] f6ea5f7c c012e637 f7992000 0000c707 ffffffff f6ea5f94 bffff7a4 f6ea5fa8
[00:23] c012e99e ffffffff 00000000 00000008 f6e36884 00000000 00000000 ffffffff
[00:23] Call Trace: [] [] [] []
[00:23] Code: c7 40 18 13 00 00 00 89 c3 8b 46 08 c7 43 20 c0 8c 3b c0 89
[00:23] post copy: 0
[00:23]
post copy: 0
[00:23] bingo :)
[00:23] Bertl, good idea :)
[00:23] now if only i had been smart enough to compile with -g
[00:24] hmm i wonder if anyone has implemented this to do a remote sysrq?
[00:25] ipt_sysrq is a new iptables target that allows you to do the same as the magic sysrq key on a keyboard does, but over the network.
[00:25] well isnt that spiffy
[00:27] JonB (~Jon@0x503e0319.kjnxx7.adsl.tele.dk) left irc: Quit: ChatZilla 0.9.35 [Mozilla rv:1.5.1/20031120]
[00:29] hey cool I was cooking ... hmm, I'm still cooking but it seems you got it working, right?
[00:30] yep netconsole is working, got oops coming across
[00:30] gonna try to get sysrq working over the network as well
[00:30] compiling with -g right now
[00:32] great, could you make a small howto on linux-vserver.org ... I'm volunteering to make the adaptations for other network cards, if required ...
[00:32] Bertl, do you think this will really be useful to the average vserver user though?
[00:33] ill do one in general terms, nothing really vserver specific
[00:34] testing out the sysrq
[00:34] well, lot of people asked about the serial console ...
[00:34] netconsole+remote sysrq is slick
[00:34] Bertl, in the context of debugging or management?
[00:35] debugging ... and oops capturing ...
[00:36] ok sure ill hack something up
[00:37] sysrq is going to be prone to a nice replay attack but ahwell
[00:39] tanjix (ViRu_@pD9049FE1.dip.t-dialin.net) left irc:
[00:44] nathan_: after the last lkcd discussion, I was also thinking about a solution/enhancement in this direction, and it sounds useful to me to setup some circular buffer in memory, which keeps the last 64k or 128k of printk messages ... over a reboot and logs them into the dmesg buffer or something like that ...
[00:45] Bertl, isnt it a shot in the dark as to what machines will keep memory intact between boots?
[00:48] hmm, well, yes, but I don't know any 2GB server, which checks all 2GB ;)
[00:48] and with proper checksums, etc ...
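Editor's note: the idea floated at 00:44 to 00:48 (a circular buffer of recent printk output that is only trusted after a reboot if its checksum still matches) can be illustrated with a small userspace sketch. Nothing here comes from an actual kernel patch: the struct layout, the names persist_log, log_init, log_append and log_valid, the LOG_MAGIC value, and the use of FNV-1a as a stand-in checksum are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_MAGIC 0x4b4c4f47u   /* hypothetical marker value */
#define LOG_SIZE  (64u * 1024u) /* "the last 64k" of messages */

/* Hypothetical layout of a log region meant to survive a warm reboot. */
struct persist_log {
    uint32_t magic;     /* identifies an initialised buffer */
    uint32_t head;      /* next write offset into data[] */
    uint32_t wrapped;   /* non-zero once old data has been overwritten */
    uint32_t checksum;  /* covers head, wrapped and data[] */
    char     data[LOG_SIZE];
};

/* FNV-1a, used here only as a simple stand-in for a real CRC. */
static uint32_t fnv1a(uint32_t h, const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

static uint32_t log_checksum(const struct persist_log *l)
{
    uint32_t h = 2166136261u;
    h = fnv1a(h, &l->head, sizeof l->head);
    h = fnv1a(h, &l->wrapped, sizeof l->wrapped);
    return fnv1a(h, l->data, sizeof l->data);
}

static void log_init(struct persist_log *l)
{
    assert(l != NULL);
    memset(l, 0, sizeof *l);
    l->magic = LOG_MAGIC;
    l->checksum = log_checksum(l);
}

/* Append bytes, wrapping around; re-seal the checksum afterwards.
 * Re-hashing the whole buffer per append is slow, but keeps the sketch short. */
static void log_append(struct persist_log *l, const char *msg, size_t len)
{
    assert(l != NULL && msg != NULL);
    for (size_t i = 0; i < len; i++) {
        l->data[l->head] = msg[i];
        l->head = (l->head + 1u) % LOG_SIZE;
        if (l->head == 0)
            l->wrapped = 1;
    }
    l->checksum = log_checksum(l);
}

/* After a reboot: trust the old contents only if magic and checksum match. */
static int log_valid(const struct persist_log *l)
{
    return l->magic == LOG_MAGIC && l->checksum == log_checksum(l);
}
```

On boot, code along these lines would call log_valid() on the preserved region first and replay data[] only when it returns non-zero; otherwise it would fall back to log_init(). Whether the memory actually survives the reboot is the open question raised at 00:45, and the checksum only detects corruption, it does not prevent it.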
[00:54] well this is damn slick
[00:54] remote sys rescue works as expected
[01:07] /usr/src/linux-2.4.23/fs/proc/virtual.c:111
[01:14] wow, that is a service ;)
[01:17] hmm so now i have access to pretty much everything i would locally
[01:17] sounds great ...
[01:17] entry = create_proc_entry(name,
[01:17] this isn't supposed to return NULL, but obviously does ...
[01:18] try to add a check before entry->vx_flags = VX_ADMIN|VX_WATCH|VX_IDENT;
[01:18] panic on null?
[01:18] if (!entry) return 0;
[01:18] ok
[01:20] kgdb over networking using netpoll =o
[01:24] yep
[01:24] hundreds of nulls
[01:25] is this a good sign?
[01:25] entry was null! 57675
[01:25] post migrate: 0, 0
[01:25] post copy: 0
[01:25] entry was null! 57676
[01:25] post migrate: 0, 0
[01:25] post copy: 0
[01:25] entry was null! 57677
[01:25] post migrate: 0, 0
[01:25] post copy: 0
[01:25] entry was null! 57678
[01:25] the procfs inode pool seems depleted ...
[01:25] is it just because of all the forks?
[01:25] funny ... maybe we are leaking inodes?
[01:26] on a side note, NMI Watchdog detected LOCKUP on CPU3, eip c010acbe, registers:
[01:26] box still up though and nmi coming across on all cpus
[01:27] hmm ... anything useful there, anything to put into a ksymoops?
[01:27] foo_file = create_proc_entry("foo", 0644, example_dir);
[01:27] if(foo_file == NULL) {
[01:27] rv = -ENOMEM;
[01:27] goto no_foo;
[01:27] }
[01:27] okay, so the NULL on create_proc_entry, means no memory ...
[01:28] ok so am i just depleting all the memory with all the forks?
[01:28] could be ...
[01:28] http://0x00.org/hidden/bert.txt
[01:29] hmm, hmm, looks like a 'bug' in the netconsole code ;)
[01:30] ahh, no, actually it's the vx_proc_destroy, now called with an empty entry ...
[01:32] hmm, I'll cleanup that mess, and we'll see ...
[01:33] at any given time that i can detect, there are around 200 entries in /proc/
[01:33] while entry is null
[01:37] the "post migrate" is interesting ...
[01:37] I thought I removed that one ...
[01:39] am i running old code like a dope?
[01:39] doesn't matter, I'll upload a 1.3.2.3 in a few minutes ...
[01:40] k
[01:57] Tamama (~Tamama@a62-216-20-152.adsl.cistron.nl) joined #vserver.
[01:57] oy
[01:57] oyoy!
[01:58] what was that option again to let a vserver change uid/gid ? :)
[01:58] moreso, setgid32()
[01:58] you mean for context tagging (xid info in uid/gid)?
[01:59] no, i have this apache suexec program that cant change itself to another vserver user :)
[02:00] hmm, I'm lost, I don't understand what you are up to?!
[02:00] ok
[02:00] suexec runs as an apache user, then changes itself to a user/group you specify to run a command
[02:00] yup
[02:00] in this vserver it fails :)
[02:01] suexec as apache plugin or as external script?
[02:01] probably the setuid/setgid flags are missing on that one
[02:02] currently im running it as an external script to see if it would work
[02:02] and i call it with correct arguments :)
[02:02] it requires to have the correct ownership and suid flags ... (for the script)
[02:03] suexec needs to run as www
[02:03] i run it as www
[02:03] suexec is owned by www
[02:03] :)
[02:03] that won't work ;)
[02:03] well i cant run it as root lol
[02:03] and how do you suppose it should be able to chown to somebody else?
[02:04] [2004-01-02 23:49:10]: user mismatch (root instead of www)
[02:04] beats me
[02:05] hehe
[02:05] I have the feeling you are currently trying to accomplish the impossible ...
[02:05] probably
[02:05] there are usually two ways to do suexec stuff ...
[02:05] well httpd runs as user www...
[02:05] so how could it _ever_ work?
[02:05] a) specifying the user/group within apache ...
[02:06] b) using a suid root binary/script which changes the user/gid and drops root privileges before executing the user script/binary
[02:06] Tamama, that has nothing to do with vserver, that is an error in your configuration
[02:06] i'm sure it is nathan lol
[02:07] Tamama, who is your webserver running as? root?
[02:08] Tamama, you need your User directive in apache to match what you compiled suexec with, based on your output that should be www. not root.
[02:08] webserver is running as www, but 1 process remains root
[02:08] hmmm
[02:10] the root process shouldnt be running any cgis
[02:10] http://vserver.13thfloor.at/Experimental/delta-2.4.23-vs1.3.2-vs1.3.2.3.diff
[02:10] Bertl, is that what i want to test with?
[02:10] Tamama: that is okay, if you use the apache built in suexec ...
[02:11] nathan_: yup, that is the latest version, ontop of 1.3.2 ...
[02:11] repatch, if you get any 'post migrate' you did something wrong ...
[02:11] k
[02:13] http://httpd.apache.org/docs/suexec.html
[02:15] Please note that you need root privileges for the installation step. In order for the wrapper to set the user ID, it must be installed as owner root and must have the setuserid execution bit set for file modes.
[02:15] chmod 4750 /usr/local/apache2/bin/suexec
[02:15] ;)
[02:16] oh well i'll figure it out :D
[02:17] oops ...
[02:17] nathan_: just discovered a small mistake ...
[02:17] doh
[02:17] k gimme a new one :)
[02:17] I'll fix it immediately ... sec
[02:24] nathan_: so what was required to get the netconsole stuff working?
[02:25] Bertl, standard patch from the redhat guy + a polling hack for the ethernet driver
[02:25] Ingos patch for 2.4.10-C1, I assume?
[02:25] C2 is what i used
[02:25] ah okay ...
[02:25] and the polling stuff was your own patch? the adaptation of the 2.6 stuff, right?
[02:26] Bertl, yep, just verbatim from the 2.6 patch.
[02:26] very straight forward
[02:26] and does this interfere with the 'usual' networking?
[02:26] nope
[02:26] http://0x00.org/hidden/netconsole-withe1000.diff
[02:27] hmm interesting, you use the eepro driver, not the e100(0) ?
[02:28] no i use the e1000, that has both in it. the original netconsole C2 had the eepro driver.
[02:28] ah, okay, my fault ... I should split up your patch first ;)
[02:34] http://vserver.13thfloor.at/Experimental/delta-2.4.23-vs1.3.2-vs1.3.2.4.diff
[02:34] (no netconsole included yet ;)
[02:35] k
[02:36] is that against 1.3.2?
[02:36] yes,
[02:36] hmm
[02:36] patching file fs/devpts/root.c
[02:36] Reversed (or previously applied) patch detected! Assume -R? [n]
[02:36] after doing patch-2.4.23-vs1.3.2.diff
[02:36] after doing patch-2.4.23-vs1.3.2.diff then delta-2.4.23-vs1.3.2-vs1.3.2.4.diff i mean
[02:37] that should not be so ... one second ...
[02:37] got a -#include on root.c
[02:37] only change
[02:39] hmm, okay obviously my sources got screwed up ...
[02:40] some of the correction done in 1.3.2 are missing in my sources .. I'll correct that ...
[02:41] k
[02:43] http://vserver.13thfloor.at/Experimental/delta-2.4.23-vs1.3.2-vs1.3.2.4.diff
[02:43] okay, now against the 'correct' sources ...
[02:43] (hopefully, it seems it's a bad day for my patching ;)
[02:43] yep clean
[02:44] micah (micah@micha.hampshire.edu) joined #vserver.
[02:44] hi micah!
[02:44] Hi Bertl!
[02:45] I've got a system with:
[02:45] /dev/rootvg/homelv 495844 14380 455864 4% /home
[02:45] /dev/rootvg/usrlv 1032088 728032 251628 75% /usr
[02:45] /dev/rootvg/varlv 1032088 119252 860408 13% /var
[02:46] the volume group is composed of the two disk partitions: /dev/sda3 and /dev/sdb3
[02:46] created lv in linear mode
[02:46] but I want to turn those into raid + lvm
[02:46] how should I go about doing that?
[02:46] hmm, you know that this isn't a lvm channel, right?
[02:47] but I'm sure you can explain the direct relation to linux-vserver, right?
[02:49] err, right
[02:49] I'd spoken with someone from here in the past about LVM issues (they told me to come here)
[02:49] I think it was JohnB or JonB
[02:51] ah okay, well, you want to convert the entire vg (rootvg) to a raid setup, right?
[02:52] well, I want to use software raid and I want to use LVM
[02:52] so I am not sure the best way to do it
[02:52] I've done both individually
[02:52] in the past
[02:52] but you also want to 'save' the data on that partition?
[02:53] right
[02:53] I have space that I can work with
[02:53] okay, first you have to move everything off from one disk
[02:53] either sda3 or sdb3 has to be removed from the vg
[02:54] this should not be such an issue ... if there is enough unused space ...
[02:54] ok, thats easy
[02:54] then I remove that disk from the VG?
[02:54] yup
[02:55] I can do that
[02:55] then you configure a mirror raid between that 'removed' partition and the 'still' used partition, marked as faulty volume ...
[02:55] so I set the LVM partition as faulty?
[02:55] the raid will be created in degraded mode, and your lvm data will remain
[02:56] then you create a new vg, I'd suggest /dev/vgs
[02:56] from the degraded raid1
[02:56] split it into /dev/vgs/slash /dev/vgs/usr /dev/vgs/var ...
[02:57] i guess what I dont understand is, would my md device be composed of LVM volumes?
[02:57] nope, you do it the other way around ...
[02:57] ooh
[02:57] oooooooh
[02:57] so the PV is created out of the md devices
[02:57] then you just dump/restore the entire partitions ...
[02:58] (simpler than copying)
[02:58] remove the 'original' lvm/vg stuff and 'rebuild' the faulty raid ...
[02:58] after that, you have your lvm volumes on a raid1 md ...
[02:59] sounds like you need to do some funky stuff to resize things
[02:59] in the future
[02:59] hmm, nope, you 'just' add another raid1 md ...
[02:59] to the volume group ...
[03:00] ahh, that makes sense
[03:00] the problem is wrapping my head around the multiple layers of abstraction :)
[03:00] by the way, if you have more disks available (more than 2) it's much better to consider raid5
[03:01] Bertl, dead real quick
[03:01] and any messages?
[03:01] yeah, although it is slower... and this fits perfectly with the disks we have
[03:01] yep, got 3 oops will ksymoops in a sec
[03:01] micah: actually raid5 is much faster ...
[03:02] argh i forgot i rebuilt from scratch, forgot -g again
[03:02] Action: nathan_ kicks himself
[03:02] sigh
[03:03] Bertl: I'll go screw around, thanks for the brain dump
[03:04] ur welcome ...
[03:06] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 480 seconds
[03:06] >>EIP; c017f291 <=====
[03:07] will get addr2line when -g is done
[03:09] /usr/src/linux-2.4.23/fs/proc/generic.c:575
[03:11] argh! self inflicted ...
[03:11] I'm sure this is _not_ my day ;)
[03:12] :)
[03:12] at least it's friday :)
[03:12] least we aren't working blind anymore
[03:14] Tamama (~Tamama@a62-216-20-152.adsl.cistron.nl) left irc: Quit: one little two little three little piggies OINK! OINK! OINK!
[03:16] nathan_: could you put the entire ksymoops trace somewhere?
[03:17] http://0x00.org/hidden/bert.txt
[03:19] is it better to make LVM volumes in linear or striped mode?
[03:20] depends ... if you want to add/remove them, linear is the better solution, if you are looking for performance, striped might be better ...
[03:23] Bertl: hmm, these are on top of raid-1 devices, so would that matter?
[03:24] what happens when you put a linear mode LVM ontop of a striped software raid?
[03:24] it always matters ;), but there might be unexpected (or hard to understand) results at this level ...
[03:24] Action: micah scrunches his brow
[03:25] seems like I should just keep it linear in LVM and let the software raid have the striping taken care of
[03:25] otherwise it might be a total mess :)
[03:26] hmm, wait, you are now talking of striped raid?
[03:26] err raid 1 is striped no?
[03:27] nono
[03:27] raid1 is mirror :p
[03:27] in which case... it doesn't make sense to make a LVM stripe
[03:27] since the data is being written to one md device anyways
[03:27] http://www.sqlmag.com/Articles/Index.cfm?ArticleID=9697 ;)
[03:28] yeah, I know, I just had a brain fart
[03:28] after doing raid1, you do not have more than 'one' physical volume for your lvm vg
[03:29] if you add 'another' raid1 md to your 'existing' lvm vg, then a striped setup could give better performance ...
[03:29] nathan_: could you do some addr2line queries for me?
[03:30] or put the vmlinux somewhere on the web?
[03:30] Bertl, sure
[03:30] i can do either, whatever is easier for you
[03:30] want addr2line on the backtrace?
[03:30] hmm, lets start with the former ...
[03:30] c017f291
[03:31] try to reduce that address by values of 16 (0x10)
[03:31] multiples of 0x10 and tell me the result ...
[03:31] so c017f281, c017f271 ...
[03:31] /usr/src/linux-2.4.23/fs/proc/generic.c:575
[03:31] /usr/src/linux-2.4.23/fs/proc/generic.c:573
[03:31] /usr/src/linux-2.4.23/fs/proc/generic.c:569
[03:32] /usr/src/linux-2.4.23/fs/proc/generic.c:570
[03:32] /usr/src/linux-2.4.23/fs/proc/generic.c:569
[03:32] keep going ...
[03:32] that last one was actually 41, not 51, skipped one
[03:32] /usr/src/linux-2.4.23/include/asm/string.h:190
[03:33] okay, np
[03:33] /usr/src/linux-2.4.23/fs/proc/generic.c:563
[03:33] /usr/src/linux-2.4.23/fs/proc/generic.c:560
[03:33] /usr/src/linux-2.4.23/fs/proc/generic.c:553
[03:33] okay ...
[03:33] c0182dee
[03:33] /usr/src/linux-2.4.23/fs/proc/virtual.c:146
[03:35] hmm, it seems to me, like we are doing remove_proc_entry, with proc_virtual = NULL ...
but I don't see how/when this can happen ...
[03:54] so do you want me to test anything else or put some trace in there or something?
[03:55] hmm, not sure ... maybe another test run, to verify if it's the same location ...
[03:55] ok
[03:56] does the network console work in both directions?
[03:56] so is attaching a remote debugger possible?
[03:56] nope
[03:56] simply writes printks over the network
[03:56] i saw another patch for network kgdb
[03:56] havent tried it
[03:56] okay, but you have a tool to send magic sysrqs right?
[03:57] yep
[04:38] k just finished crashing it again
[04:38] doing a ksymoops
[04:38] sounds like it lasted longer this time?
[04:38] no, i just got distracted
[04:38] its instantaneous
[04:38] i run killer, its dead.
[04:38] ah okay ...
[04:39] in this case, maybe you could enable debugging next time ... ;)
[04:39] -g?
[04:39] nope, the VX_DEBUG option ...
[04:39] ah
[04:39] but lets see where it crashed this time ...
[04:40] by the way, thanks for helping here ...
[04:40] >>EIP; c017f281 <=====
[04:40] >>edx; f79c5e00 <_end+3750693c/38377b9c>
[04:40] >>edi; f729328c <_end+36dd3dc8/38377b9c>
[04:40] >>ebp; f70abe68 <_end+36bec9a4/38377b9c>
[04:40] >>esp; f70abe48 <_end+36bec984/38377b9c>
[04:40] Trace; c0182dde
[04:40] Trace; c0132b74
[04:40] Trace; c01244ef
[04:40] Trace; c0125983
[04:40] Trace; c012cee2
[04:40] Trace; c01096ad
[04:40] Trace; c011c067
[04:40] Trace; c0107ba8
[04:40] Trace; c0109998
[04:41] compiling with VX_DEBUG
[04:42] hmm, you still have the vmlinux which crashed this time?
[04:42] err yea hasnt linked it yet hang
[04:43] ack clean killed it
[04:43] sorry :/
[04:43] just because the crash is different ...
[04:43] >>EIP; c017f291 <=====
[04:43] 02:41 < nathan_> >>EIP; c017f281 <=====
[04:43] ah
[04:44] same ballpark but a little earlier ...
[04:44] should i take another crash with VX_DEBUG or go back to old vmlinux and get info on that last crash?
[04:44] go for the VX_DEBUG I'd say ...
[04:45] this might shed some additional light on the issue ...
[04:45] ok
[04:47] personally, I suspect that we release some half allocated vx_info, somewhere ... in heavy fork action ;)
[04:53] hmm should have logged that a little better
[04:56] i wish i hadnt forgotten to setup the remote sysrq this boot
[04:56]
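
Editor's note: the address stepping requested at 03:31 (start at the faulting EIP and subtract multiples of 0x10 to get the list of addresses to look up) can be sketched as below. The helper names step_back and print_addresses are hypothetical and only illustrate the arithmetic; they are not part of any tool used in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fill out[] with start, start - step, start - 2*step, ...
 * (hypothetical helper; 32-bit addresses as on the i386 box in the log). */
static void step_back(uint32_t start, uint32_t step, uint32_t *out, size_t count)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = start - (uint32_t)i * step;
}

/* Print one hex address per line, in the form used in the log. */
static void print_addresses(const uint32_t *addrs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        printf("%08lx\n", (unsigned long)addrs[i]);
}
```

Calling step_back(0xc017f291, 0x10, addrs, 3) yields c017f291, c017f281 and c017f271, the same sequence quoted at 03:31. Each printed address could then be passed to addr2line -e vmlinux, assuming the vmlinux in question was built with -g as discussed above.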