[00:09] Doener (~doener@pD9588778.dip.t-dialin.net) joined #vserver.
[00:23] Madkiss_ (madkiss@madkiss.org) joined #vserver.
[00:44] Hello
[00:44] What does "split-2.4.23-vs1.22.tar.bz2" contain?
[01:15] loger joined #vserver.
[01:22] Madkiss_ (madkiss@madkiss.org) got netsplit.
[01:23] Madkiss_ (madkiss@madkiss.org) returned to #vserver.
[01:52] Nick change: Bertl_oO -> Bertl
[01:52] hi all!
[01:52] Madkiss? still there?
[01:57] hi Bertl
[02:00] Bertl: yes
[02:00] hi Doener!
[02:00] Madkiss_: still want to know what the split-* contains?
[02:00] do you remember the kernel oops i showed you?
[02:00] Bertl: Please
[02:01] Doener: not at the moment, please refresh my memory ...
[02:02] Madkiss_: the split-* packages contain the same as the vserver patch, but separated in logical parts called 'split-outs' that is the secret ...
[02:02] it's a kernel bug at page_alloc.c you suggested highmem, swapping or high process count as a possible source of the problem
[02:03] Madkiss_: this isn't of much interest for the user, but a great help for developers, who try to adapt or understand those patches ...
[02:03] Doener: is there an url, where I could have a short peek?
[02:04] i'm just looking up where i've stored it
[02:04] Madkiss_: does this answer your question?
[02:05] ah, there it is: http://vserver4.isp4p.net/bertl/
[02:05] ahh yeah, I remember ...
[02:06] today we had the following stuff in the syslog just after the dying of kswapd: http://vserver4.isp4p.net/bertl/dma-error
[02:06] could this be telling me that it is a plain hardware issue?
[02:07] is this UP or SMP?
[02:07] UP
[02:08] well, it could be hardware related, but I can't guarantee that this _isn't_ a software issue ...
[02:09] the DMA timeout/reset issue, can resukt from: bad cabling, bad chipset support, bad harddisk, electromagnetic interference, overheating, etc ...
[02:10] s/resukt/result/
[02:11] if you get it on a regular basis, you should first update the bios (if possible) and check the interface configuration hdparm et al ...
[02:12] ok, i guess for now we'll try what happens when swapping is disabled
[02:14] maybe updating to vs1.22 would be a good idea too ...
[02:16] heya bertl
[02:16] hi nathan!
[02:16] Bertl, what did you mean by replacing the accesses with get/put?
[02:17] would the get just obtain and put release a lock?
[02:17] or am i overlooking something?
[02:17] did you have a look at the vs1.3.x branch?
[02:17] nope i didnt, i went back to 1.2 after you found all those other locking issues.
[02:18] does it use the get/put method you described?
[02:18] yes, but the locking issues are still presen there, I guess ...
[02:19] but you can see how the get/put works there ...
[02:19] and probably in a few days .. there will be a new release, which should fix this ...
[02:19] taking a look at 1.3 code right now
[02:20] alloc/free_vx_info() is simplified ...
[02:20] get/put_vx_info does the ref counting ...
[02:21] what's not in that version is a task_get_vx_info() which would acquire the task lock, get the vx_info and release the lock, same lock around the free_task_struct part ...
[02:22] ah ok i see now, the get/put simply does ref counting and frees on put when refcount==0?
[02:22] was that what you were suggesting?
[02:22] yup, any reference outside of current context is required to get/put the struct ...
[02:23] vx_info disappears automagically when not used anymore ...
[02:24] the restructuring also allows a better handling of the dynamic context cases ...
[02:27] Bertl, does the kernel prevent deadlocking if a spinlock is attempted to be acquired again on the same cpu?
[02:27] Bertl: i'm sorry, that was the old ksymoops output, i've just updated it (looks pretty much the same to me), we're now using vs1.21 on this machine
[02:28] Bertl, nevermind, that was dumb.
[02:28] i was reading the code and i thought i saw that happening, but it was calling __find_vx_info not find_vx_info
[02:28] Action: Bertl was ignoring it anyway ... ;)
[02:28] :)
[02:28] Action: nathan_ pretend kernel hacker
[02:29] Doener: hmm, okay vs1.21 should be okay for UP ...
[02:30] Action: nathan_ notes that doeners problem looks like a hardware problem
[02:31] nathan_: thought so, but it's a little strange that this is happening quite random on different machines
[02:32] about what interval are we talking here? hours? days?
[02:32] Doener, oh didnt realize it wasnt an isolated case
[02:34] at this machine it didn't happen for some days, on another machine it's been more than 2 weeks since the last issue
[02:34] hmm, you have more than the one panic I'm looking at?
[02:34] I mean ksymoops trace to look at?
[02:36] Bertl, in 1.3, isnt it possible for vx_release_ip_info to release while vx_assign_ip_info is about to atomic_inc?
[02:37] where refcount is currently 1 of course
[02:38] yes, that is why it was modified a little in my upcoming 1.3.1 release ...
[02:39] hmm...
[02:39] ah
[02:40] nathan_: there are two solutions and I'm currently medidating about what solution is better ...
[02:40] maybe you want to medidate about it too?
[02:40] sure ill give it a think
[02:40] a) do a atomic_inc_return, and check for 1, in which case we return NULL ;)
[02:41] b) add the/a spinlock around the atomic inc
[02:42] (and the test of course)
[02:42] a would be a function returning ip_info and or vx_info and that would just be tested for existence of the infos?
[02:44] basically get_vx_info() returns the value 'requested' with the refcount incremented ... so that then, just would allow to return NULL, which would be okay, as you have to do vxi = get_vx_info(a->vx_info) anyway, and check against NULL ...
[02:45] b) would require a spinlock around every vxi=get_vx_info(), call
[02:45] a) still requires synchronizing of some sort between the get/put methods though wouldnt it?
[02:46] nope, not required ...
[02:46] assume rc=1
[02:47] hmm, now that you mention it, it's a problem on more than 2 cpus ...
[02:47] yea
[02:47] the if (ip_info) inc_of_somesort() needs to be atomic
[02:47] and we cant just inc without the test
[02:48] so there must be a synchronization of some sort as far as i can see
[02:48] well, that's not the problem, as atomic_inc_return() is atomic ...
[02:48] sure but the if (ip_info) inc() needs to be atomic, the test and inc
[02:49] okay, forget it, a) is crap ... thanks for making it clear to me ...
[02:50] could there be a less general lock though?
[02:50] actually the issue is not the testing/incrementing, it's the fact that it doesn't protect the structures memory from _being_ released under the dec/inc ;)
[02:51] right but once you synchronize the testing/incrementing with the testing/decrementing then the issue of memory being released from under the dec/inc goes away i think
[02:52] I would say, the best choice for a lock would be the task_lock, and it's only required for a very short period of time ... and only if such a race is possible ...
[02:53] basicall doing ...
[02:53] task_lock(p);
[02:53] vxi = get_vx_info(p->vx_info);
[02:53] task_unlock(p);
[02:54] and in the deallocation of the task ...
[02:54] task_lock(p);
[02:54] vxi = p->vx_info;
[02:54] p->vx_info = NULL;
[02:54] task_unlock(p);
[02:54] put_vx_info(vxi);
[02:54] yea i like that better than a single global lock
[02:55] in the 'current' cases this isn't required at all ...
[02:55] so we probably end up with 4-5 cases which need the task lock, and I guess this _is_ acceptable ...
[02:57] Doener: any 'other' ksymoops?
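The get/put reference counting and the task_lock pattern discussed between 02:20 and 02:55 can be modelled in user-space C. What follows is a minimal sketch under stated assumptions, not the vserver patch itself: the struct layouts, the xid field, task_init(), task_release_vx_info(), the vx_freed counter, and the atomic_flag spinlock standing in for the kernel's task lock are all hypothetical. Only the names alloc_vx_info, get_vx_info, put_vx_info, task_get_vx_info, task_lock and task_unlock come from the conversation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; NOT the vserver kernel structures. */
struct vx_info {
    atomic_int refcount;      /* number of holders of this context */
    int xid;                  /* context id, for illustration only */
};

struct task {
    atomic_flag lock;         /* stands in for the kernel's task lock */
    struct vx_info *vx_info;  /* the task's own reference, may be NULL */
};

atomic_int vx_freed;          /* counts frees so they can be observed (assumption) */

void task_lock(struct task *p)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;
}

void task_unlock(struct task *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Allocate a context holding one reference, owned by the caller. */
struct vx_info *alloc_vx_info(int xid)
{
    struct vx_info *vxi = malloc(sizeof(*vxi));
    if (!vxi)
        return NULL;
    atomic_init(&vxi->refcount, 1);
    vxi->xid = xid;
    return vxi;
}

/* Hand the caller's reference over to the task. */
void task_init(struct task *p, struct vx_info *vxi)
{
    atomic_flag_clear(&p->lock);
    p->vx_info = vxi;
}

/* Take a reference. Safe only if the caller already knows vxi is alive. */
struct vx_info *get_vx_info(struct vx_info *vxi)
{
    if (vxi)
        atomic_fetch_add(&vxi->refcount, 1);
    return vxi;
}

/* Drop a reference; the holder that drops the last one frees the struct. */
void put_vx_info(struct vx_info *vxi)
{
    if (!vxi)
        return;
    int old = atomic_fetch_sub(&vxi->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        free(vxi);
        atomic_fetch_add(&vx_freed, 1);
    }
}

/* Reading another task's context: the lock keeps the pointer (and so the
 * task's own reference) in place while the count is incremented. */
struct vx_info *task_get_vx_info(struct task *p)
{
    struct vx_info *vxi;

    task_lock(p);
    vxi = get_vx_info(p->vx_info);
    task_unlock(p);
    return vxi;
}

/* Task teardown: detach under the lock, drop the reference outside it. */
void task_release_vx_info(struct task *p)
{
    struct vx_info *vxi;

    task_lock(p);
    vxi = p->vx_info;
    p->vx_info = NULL;
    task_unlock(p);
    put_vx_info(vxi);
}
```

A caller outside the task's own context would call task_get_vx_info(p), check the result against NULL, use it, and finish with put_vx_info(). Because the pointer is only cleared under the same lock, a reader either sees NULL or increments a count that is still at least 1, which is the gap option a) left open.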
[02:59] on that machine i've got about 70 others ;) i'm still looking at the other machines, my connection is damn slow today
[03:00] okay, so you have plenty of panics, and loggings, right? could you 'arrange' them somehow, so I could have a look at them?
[03:01] aka (~aka@h062040166017.gun.cm.kabsi.at) left irc: Quit: Leaving
[03:01] that amount of information should be able to draw a pattern, which should allow to pinpoint the cause ...
[03:01] i could grep the bugs out of the logs
[03:02] you should try to find the 'first' panic after a reboot ... this contains the most information ...
[03:02] okay, guys, I'm tired, I'll go to bed now ...
[03:03] nathan, thanks for chatting about 1.3.x ...
[03:03] good night Bertl
[03:03] good night Doener, and if you have something by tomorrow morning I'll ahve a look at it ...
[03:04] ok, thanks
[03:04] Nick change: Bertl -> Bertl_zZ
[03:04] night
[03:13] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 499 seconds
[04:00] does chbind rely on a kernel patch?
[04:00] s/a/the vserver/
[04:03] maharaja, yes, it sets the ipv4root
[04:27] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[04:38] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 499 seconds
[04:39] Nick change: Doener -> doener_zZz
[04:45] Nick change: doener_zZz -> Doener
[05:53] Nick change: Doener -> Doener_zZz
[06:30] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[07:35] nathan_ (~nathan@209-6-130-26.c3-0.sbo-ubr1.sbo-ubr.ma.cable.rcn.com) left irc: Quit: BitchX-1.0c20cvs -- just do it.
[08:15] Simon (~sgarner@apollo.quattro.net.nz) joined #vserver.
[08:40] caligula (~junior@adsli217.cofs.net) left irc: Ping timeout: 485 seconds
[08:55] Simon (~sgarner@apollo.quattro.net.nz) left irc: Quit: so long, and thanks for all the fish
[09:38] Doener` (~doener@pD958883F.dip.t-dialin.net) joined #vserver.
[09:46] Doener_zZz (~doener@pD9588778.dip.t-dialin.net) left irc: Ping timeout: 499 seconds
[10:06] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 499 seconds
[11:57] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[12:22] apw (~apw@212.104.150.41) left irc: Ping timeout: 499 seconds
[13:34] zyong (cat@bb220-255-107-245.singnet.com.sg) joined #vserver.
[14:00] Madkiss_ (madkiss@madkiss.org) got netsplit.
[14:00] Madkiss_ (madkiss@madkiss.org) returned to #vserver.
[14:20] Nick change: Bertl_zZ -> Bertl
[14:44] morning
[14:44] hi
[15:08] morning!
[15:32] serving (~serving@213.186.191.2) left irc: Read error: Connection reset by peer
[15:36] morning
[15:36] hi!
[15:37] Bertl
[15:37] look query
[15:38] 13:38 °~=[ {force_} ]=~° (01:19 PM) :
[15:38] ye
[15:38] hmm, funny stuff ...
[15:38] force = nihal
[15:39] okay, so the mobo is replaced, and the errors are gone?
[15:39] yep
[15:39] fine, another problem solved without divine intervention ;)
[15:41] <@Nihal> Cpu(s): 49.3% user, 50.7% system, 0.0% nice, 0.0% idle
[15:41] <@Nihal> Mem: 1551856k total, 1521776k used, 30080k free, 13280k buffers
[15:41] <@Nihal> Swap: 3004144k total, 233612k used, 2770532k free, 107800k cached
[15:41] <@Nihal> Cpu(s): 49.3% user, 50.7% system, 0.0% nice, 0.0% idle
[15:41] <@Nihal> Mem: 1551856k total, 1521776k used, 30080k free, 13280k buffers
[15:41] <@Nihal> Swap: 3004144k total, 233612k used, 2770532k free, 107800k cached
[15:41] |01:42| »» <@Nihal> dd if=/dev/urandom bs=1024M count=24 | gzip -c |zcat >/dev/null
[15:41] |01:42| »» <@Nihal> dd if=/dev/hda2 bs=512M count=24 | gzip -c |zcat >/dev/null
[16:12] Bertl
[16:12] you know http://vserver4.isp4p.net/ that ?
[16:20] sylvio (sylvio@imk32.mb.uni-magdeburg.de) joined #vserver.
[16:33] hi sylvio!
[16:34] Bertl
[16:34] you know http://vserver4.isp4p.net/ what software so use for that
[16:34] LL0rd (~dr@pD9507EFA.dip0.t-ipconnect.de) joined #vserver.
[16:34] hi
[16:34] xsbyme: I beg you pardon?
[16:34] hi LL0rd!
[16:35] seeking software to controler vserver in a webinterface
[16:35] control *
[16:36] ahh, okay, well there where some projects writing frontends ...
[16:36] does someone knows howto prewent a high load on a masterserver?
[16:39] Bertl
[16:39] u know site
[16:39] about that
[16:42] LL0rd: what is a 'masterserver'?
[16:42] xsbyme: no sorry I don't know a site ...
[16:42] Nick change: unriel -> riel
[16:42] darn
[16:42] the hostserver, which runs the vservers
[16:43] well a high load there is probably the result from your vservers?
[16:43] compare it to the load in ctx 1
[16:44] yes, and how can I 'limit' the load can be done by the vservers?
[16:49] ahh, okay, well there where some projects writing frontends ...
[16:49] u know a site about that
[17:27] serving (~serving@213.186.189.83) joined #vserver.
[17:46] LL0rd: well, you can do that with the 'new' scheduler stuff ...
[17:47] xsbyme: no, as I said, I don't have any urls, I just know that there 'where' some projects ...
[17:47] yes but cant find them :/
[17:48] http://nutexvserver.sourceforge.net/ (quick search with google ;)
[17:50] witch string dit you used
[17:50] vserver admin frontend
[17:53] thnx
[17:53] nm
[17:57] and how can I use the scheduler stuff?
[17:58] Bertl, is there a site, where it is explained?
[17:58] well, the O(1) stuff is experimental, and not updated yet ...
[17:59] what is your intention? what do you want to accomplish?
[18:00] do you know the program "stress" ?
[18:01] hmm, not yet ..
[18:01] but it sounds familiar ...
[18:02] http://freshmeat.net/projects/stress/?topic_id=861
[18:02] when I run this programm on a vserver, then the hostserver gets a high load
[18:02] sounds reasonable ...
[18:03] what can I do to prevent this?
[18:03] that is the 'fair share' component ;)
[18:03] you want to block DoS attacks from inside a vserver?
[18:03] yes
[18:04] the first step is specifying NPROC (to limit the number of processes in a vserver) and resonable ulimits ...
[18:05] done, the limit of processes ist set to 40
[18:05] then you can use the SCHED flag to account scheduler fairness for the entire vserver (it will become a single process)
[18:06] then you can 'renice' the vserver ...
[18:06] but that's it for the stable branch ...
[18:07] the experimental stuff allows you to impose memory limits and an adapted tunable O(1) scheduler ...
[18:09] what do you mean by 'renice' the vserver?
[18:09] S_NICE=
[18:10] ah.... ok
[18:19] hmm..... Unknown flag SCHED
[18:19] S_FLAGS="lock sched nproc"
[18:22] and S_NICE=19 ?
[18:23] for example ...
[18:23] hmm...
[18:24] even with that the server gets to busy
[18:24] what exactly do you mean by, 'too busy'?
[18:25] whey i type a letter it appears after about 30 seconds
[18:27] what does vmstat return?
[18:28] vmstat
[18:28] procs memory swap io system cpu
[18:28] r b w swpd free buff cache si so bi bo in cs us sy id
[18:28] 0 1 0 16340 2012 592 2884 0 5 1 25 28 11 0 1 99
[18:28] hmm, seems the tool is paging in/out ...
[18:29] probably it allocates more memory than available ...
[18:29] could be
[18:30] if you are interested in the experimental stuff ...
[18:30] http://vserver.13thfloor.at/Experimental/
[18:30] probably the combination of rmap+O1.3+ml would do the trick ...
[18:31] but as I said, experimental, and not updated yet, if you volunteer to do extensive (some ;) testing, I could update the one or the other ...
[18:33] yes of course, i'm interested in testing the experimental stuff
[18:40] okay, first get Con Kolivas latest? patch set ... for 2.4.23
[18:43] where can i get this patch from?
[18:44] google: 'Con Kolivas kernel patches'
[18:44] http://www.plumlocosoft.com/kernel/patches/2.4/2.4.23/2.4.23-ck1/patch-2.4.23-ck1.bz2
[18:45] this has almost everything we need, that is why I consider it a good base ;)
[18:45] ;)
[18:45] ok
[18:46] now we have to adapt the vserver patch ... *G*
[18:49] ok, patch applied
[18:49] the Con Kolivas kernel patch
[18:49] you might want to back out the desktop tuning, for a server ...
[18:50] Doener` (~doener@pD958883F.dip.t-dialin.net) left irc: Quit: Leaving
[18:52] on this server there is no desktop installed
[18:53] RL2 Desktop Tuning (leave this out or back it out if using on a server): patch-2.4.23-rl2-rl2dt-0312022027.bz2 and we have to cut out the vserver scheduler stuff, as this isn't for O(1) yet, but we add it alter on ...
[18:54] s/alter/later/
[18:54] and we get ... http://vserver.13thfloor.at/Experimental/patch-2.4.23-ck1-vs1.3.0.2.diff
[18:55] this is ontop of ck1 ...
[18:55] applied
[18:55] it will not do what you want right now, and out of the box, as the vserver scheduler isn't used ...
[18:55] sylvio (sylvio@imk32.mb.uni-magdeburg.de) left irc: Read error: Connection reset by peer
[18:56] but we have to make sure that this setup is working and stable, before we adapt the O(1) scheduler stuff ...
[18:57] say (~say@212.86.243.154) left irc: Read error: Connection reset by peer
[18:57] ok, so i have to compile the kernel now
[18:58] yes, and do some stress testing on that one ...
[18:58] in the meanwhile I'll prepare the scheduler modifications
[18:58] ok
[19:00] i will talk to you, when i'm ready
[19:00] perfect ...
[20:02] Doener (~doener@pD958883F.dip.t-dialin.net) joined #vserver.
[20:31] LL0rd: this is the patch for the scheduler, but please test without first ... http://vserver.13thfloor.at/Experimental/delta-vs1.3.0.2-vs1.3.0.4.diff
[20:58] okay .. cu l8er ...
[20:58] Nick change: Bertl -> Bertl_oO
[21:08] serving (~serving@213.186.189.83) left irc: Ping timeout: 485 seconds
[21:54] Linux_Lord (~dr@pD9507EF2.dip0.t-ipconnect.de) joined #vserver.
[22:02] LL0rd (~dr@pD9507EFA.dip0.t-ipconnect.de) left irc: Ping timeout: 499 seconds
[23:00] serving (~serving@213.186.189.29) joined #vserver.
[23:26] say (~say@212.86.243.154) joined #vserver.
[00:00] --- Wed Dec 24 2003