[00:09] Doener (~doener@pD9588778.dip.t-dialin.net) joined #vserver.
[00:23] Madkiss_ (madkiss@madkiss.org) joined #vserver.
[00:44] <Madkiss_> Hello
[00:44] <Madkiss_> What does "split-2.4.23-vs1.22.tar.bz2" contain?
[01:15] loger joined #vserver.
[01:22] Madkiss_ (madkiss@madkiss.org) got netsplit.
[01:23] Madkiss_ (madkiss@madkiss.org) returned to #vserver.
[01:52] Nick change: Bertl_oO -> Bertl
[01:52] <Bertl> hi all!
[01:52] <Bertl> Madkiss? still there?
[01:57] <Doener> hi Bertl
[02:00] <Madkiss_> Bertl: yes
[02:00] <Bertl> hi Doener!
[02:00] <Bertl> Madkiss_: still want to know what the split-* contains?
[02:00] <Doener> do you remember the kernel oops i showed you?
[02:00] <Madkiss_> Bertl: Please
[02:01] <Bertl> Doener: not at the moment, please refresh my memory ...
[02:02] <Bertl> Madkiss_:  the split-* packages contain the same as the vserver patch, but separated in logical parts called 'split-outs' that is the secret ...
[02:02] <Doener> it's a kernel bug at page_alloc.c; you suggested highmem, swapping or high process count as a possible source of the problem
[02:03] <Bertl> Madkiss_: this isn't of much interest for the user, but a great help for developers, who try to adapt or understand those patches ...
[02:03] <Bertl> Doener: is there an url, where I could have a short peek?
[02:04] <Doener> i'm just looking up where i've stored it
[02:04] <Bertl> Madkiss_: does this answer your question?
[02:05] <Doener> ah, there it is: http://vserver4.isp4p.net/bertl/
[02:05] <Bertl> ahh yeah, I remember ...
[02:06] <Doener> today we had the following stuff in the syslog just after the dying of kswapd: http://vserver4.isp4p.net/bertl/dma-error
[02:06] <Doener> could this be telling me that it is a plain hardware issue?
[02:07] <Bertl> is this UP or SMP?
[02:07] <Doener> UP
[02:08] <Bertl> well, it could be hardware related, but I can't guarantee that this _isn't_ a software issue ...
[02:09] <Bertl> the DMA timeout/reset issue, can resukt from: bad cabling, bad chipset support, bad harddisk, electromagnetic interference, overheating, etc ...
[02:10] <Bertl> s/resukt/result/
[02:11] <Bertl> if you get it on a regular basis, you should first update the bios (if possible) and check the interface configuration hdparm et al ...
[02:12] <Doener> ok, i guess for now we'll try what happens when swapping is disabled
[02:14] <Bertl> maybe updating to vs1.22 would be a good idea too ...
[02:16] <nathan_> heya bertl
[02:16] <Bertl> hi nathan!
[02:16] <nathan_> Bertl, what did you mean by replacing the accesses with get/put?
[02:17] <nathan_> would the get just obtain and put release a lock?
[02:17] <nathan_> or am i overlooking something?
[02:17] <Bertl> did you have a look at the vs1.3.x branch?
[02:17] <nathan_> nope i didnt, i went back to 1.2 after you found all those other locking issues.
[02:18] <nathan_> does it use the get/put method you described?
[02:18] <Bertl> yes, but the locking issues are still present there, I guess ...
[02:19] <Bertl> but you can see how the get/put works there ...
[02:19] <Bertl> and probably in a few days .. there will be a new release, which should fix this ...
[02:19] <nathan_> taking a look at 1.3 code right now
[02:20] <Bertl> alloc/free_vx_info() is simplified ...
[02:20] <Bertl> get/put_vx_info does the ref counting ...
[02:21] <Bertl> what's not in that version is a task_get_vx_info() which would acquire the task lock, get the vx_info and release the lock, same lock around the free_task_struct part ...
[02:22] <nathan_> ah ok i see now, the get/put simply does ref counting and frees on put when refcount==0?
[02:22] <nathan_> was that what you were suggesting?
[02:22] <Bertl> yup, any reference outside of current context is required to get/put the struct ...
[02:23] <Bertl> vx_info disappears automagically when not used anymore ...
[02:24] <Bertl> the restructuring also allows a better handling of the dynamic context cases ...
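The get/put scheme described above can be sketched in user-space C. This is a minimal illustration, not the vs1.3.x code: `struct vx_info_sketch`, its `xid` field, and the `*_sketch` function names are hypothetical, and C11 atomics stand in for the kernel's atomic helpers. It is only safe while the caller already holds a valid reference, which is the limitation the conversation goes on to discuss.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-context structure; not the real vx_info. */
struct vx_info_sketch {
    atomic_int refcount;  /* number of outstanding references */
    int xid;              /* illustrative payload only */
};

/* Allocate with one reference owned by the caller. */
static struct vx_info_sketch *alloc_vx_info_sketch(int xid)
{
    struct vx_info_sketch *vxi = malloc(sizeof(*vxi));
    if (vxi == NULL)
        return NULL;
    atomic_init(&vxi->refcount, 1);
    vxi->xid = xid;
    return vxi;
}

/* "get": take an additional reference and hand the pointer back. */
static struct vx_info_sketch *get_vx_info_sketch(struct vx_info_sketch *vxi)
{
    if (vxi == NULL)
        return NULL;
    atomic_fetch_add(&vxi->refcount, 1);
    return vxi;
}

/* "put": drop a reference; returns 1 if this call freed the object. */
static int put_vx_info_sketch(struct vx_info_sketch *vxi)
{
    int old;

    if (vxi == NULL)
        return 0;
    old = atomic_fetch_sub(&vxi->refcount, 1);
    assert(old > 0);      /* a put without a matching get is a bug */
    if (old == 1) {
        free(vxi);        /* last reference gone: the object disappears */
        return 1;
    }
    return 0;
}
```

Every reference taken with the get helper is paired with exactly one put; whichever put drops the count to zero frees the structure.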
[02:27] <nathan_> Bertl, does the kernel prevent deadlocking if a spinlock is attempted to be acquired again on the same cpu?
[02:27] <Doener> Bertl: i'm sorry, that was the old ksymoops output, i've just updated it (looks pretty much the same to me), we're now using vs1.21 on this machine
[02:28] <nathan_> Bertl, nevermind, that was dumb.
[02:28] <nathan_> i was reading the code and i thought i saw that happening, but it was calling __find_vx_info not find_vx_info
[02:28] Action: Bertl was ignoring it anyway ... ;)
[02:28] <nathan_> :)
[02:28] Action: nathan_ pretend kernel hacker
[02:29] <Bertl> Doener: hmm, okay vs1.21 should be okay for UP ...
[02:30] Action: nathan_ notes that doeners problem looks like a hardware problem
[02:31] <Doener> nathan_: thought so, but it's a little strange that this is happening quite random on different machines
[02:32] <Bertl> about what interval are we talking here? hours? days?
[02:32] <nathan_> Doener, oh didnt realize it wasnt an isolated case
[02:34] <Doener> at this machine it didn't happen for some days, on another machine it's been more than 2 weeks since the last issue
[02:34] <Bertl> hmm, you have more than the one panic I'm looking at?
[02:34] <Bertl> I mean ksymoops trace to look at?
[02:36] <nathan_> Bertl, in 1.3, isnt it possible for vx_release_ip_info to release while vx_assign_ip_info is about to atomic_inc?
[02:37] <nathan_> where refcount is currently 1 of course
[02:38] <Bertl> yes, that is why it was modified a little in my upcoming 1.3.1 release ...
[02:39] <Doener> hmm...
[02:39] <nathan_> ah
[02:40] <Bertl> nathan_: there are two solutions and I'm currently meditating about which solution is better ...
[02:40] <Bertl> maybe you want to meditate about it too?
[02:40] <nathan_> sure ill give it a think
[02:40] <Bertl> a) do an atomic_inc_return, and check for 1, in which case we return NULL ;)
[02:41] <Bertl> b) add the/a spinlock around the atomic inc
[02:42] <Bertl> (and the test of course)
[02:42] <nathan_> a) would be a function returning ip_info and/or vx_info, and that would just be tested for existence of the infos?
[02:44] <Bertl> basically get_vx_info() returns the value 'requested' with the refcount incremented ... so it could then simply return NULL, which would be okay, as you have to do vxi = get_vx_info(a->vx_info) anyway, and check against NULL ...
[02:45] <Bertl> b) would require a spinlock around every vxi=get_vx_info() call
[02:45] <nathan_> a) still requires synchronizing of some sort between the get/put methods though wouldnt it?
[02:46] <Bertl> nope, not required ...
[02:46] <Bertl> assume rc=1
[02:47] <Bertl> hmm, now that you mention it, it's a problem on more than 2 cpus ... 
[02:47] <nathan_> yea
[02:47] <nathan_> the if (ip_info) inc_of_somesort() needs to be atomic
[02:47] <nathan_> and we cant just inc without the test
[02:48] <nathan_> so there must be a synchronization of some sort as far as i can see
[02:48] <Bertl> well, that's not the problem, as atomic_inc_return() is atomic ...
[02:48] <nathan_> sure but the if (ip_info) inc() needs to be atomic, the test and inc
[02:49] <Bertl> okay, forget it, a) is crap ... thanks for making it clear to me ...
[02:50] <nathan_> could there be a less general lock though?
[02:50] <Bertl> actually the issue is not the testing/incrementing, it's the fact that it doesn't protect the structure's memory from _being_ released under the dec/inc ;)
[02:51] <nathan_> right but once you synchronize the testing/incrementing with the testing/decrementing then the issue of memory being released from under the dec/inc goes away i think
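nathan_'s point that the test and the increment must form one atomic step can be illustrated with a compare-and-swap loop. This is neither option a) nor option b) from the discussion; it is a common general technique, shown here with C11 atomics and a hypothetical helper name. As Bertl notes, it does not by itself stop the structure's memory from being freed before the counter is touched, so something must still protect the pointer that leads to the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Take a reference only if the count is still non-zero.
 * Hypothetical helper: the name and signature are illustrative.
 */
static bool ref_get_unless_zero_sketch(atomic_int *refcount)
{
    int old;

    assert(refcount != NULL);
    old = atomic_load(refcount);
    while (old != 0) {
        /*
         * Succeeds only if the counter still holds 'old'. On failure,
         * 'old' is refreshed with the current value and we retry.
         */
        if (atomic_compare_exchange_weak(refcount, &old, old + 1))
            return true;
    }
    return false;  /* count already hit zero: object is on its way out */
}
```

The check and the increment cannot be separated by another CPU here, but the caller must still guarantee that the memory holding the counter is valid for the duration of the call.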
[02:52] <Bertl> I would say, the best choice for a lock would be the task_lock, and it's only required for a very short period of time ... and only if such a race is possible ...
[02:53] <Bertl> basically doing ...
[02:53] <Bertl> task_lock(p);
[02:53] <Bertl> vxi = get_vx_info(p->vx_info);
[02:53] <Bertl> task_unlock(p);
[02:54] <Bertl> and in the deallocation of the task ...
[02:54] <Bertl> task_lock(p);
[02:54] <Bertl> vxi = p->vx_info;
[02:54] <Bertl> p->vx_info = NULL;
[02:54] <Bertl> task_unlock(p);
[02:54] <Bertl> put_vx_info(vxi);
[02:54] <nathan_> yea i like that better than a single global lock
[02:55] <Bertl> in the 'current' cases this isn't required at all ...
[02:55] <Bertl> so we probably end up with 4-5 cases which need the task lock, and I guess this _is_ acceptable ...
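The two snippets Bertl typed above can be put together as a self-contained user-space sketch. Everything here is an assumption made for illustration: the `*_sketch` types and functions are hypothetical, a C11 `atomic_flag` spinlock stands in for the kernel's task lock, and `malloc`/`free` stand in for kernel allocation. The point it shows is that the lookup takes its reference while holding the lock, and the teardown clears the pointer under the same lock before dropping the task's reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct vxi_sketch {
    atomic_int refcount;
};

struct task_sketch {
    atomic_flag lock;            /* stand-in for the per-task lock */
    struct vxi_sketch *vx_info;  /* reference owned by the task */
};

static void task_lock_sketch(struct task_sketch *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
}

static void task_unlock_sketch(struct task_sketch *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Drop one reference; returns 1 if this call freed the object. */
static int put_vxi_sketch(struct vxi_sketch *vxi)
{
    if (vxi == NULL)
        return 0;
    if (atomic_fetch_sub(&vxi->refcount, 1) == 1) {
        free(vxi);
        return 1;
    }
    return 0;
}

/* Lookup from outside the task's own context: get under the lock. */
static struct vxi_sketch *task_get_vxi_sketch(struct task_sketch *p)
{
    struct vxi_sketch *vxi;

    task_lock_sketch(p);
    vxi = p->vx_info;
    if (vxi != NULL)
        atomic_fetch_add(&vxi->refcount, 1);
    task_unlock_sketch(p);
    return vxi;
}

/* Task teardown: detach under the lock, put after releasing it. */
static int task_release_vxi_sketch(struct task_sketch *p)
{
    struct vxi_sketch *vxi;

    task_lock_sketch(p);
    vxi = p->vx_info;
    p->vx_info = NULL;
    task_unlock_sketch(p);
    return put_vxi_sketch(vxi);
}
```

Because the pointer is cleared while the lock is held, a concurrent lookup either sees the pointer and takes its reference before the task's own reference is dropped, or sees NULL. The put itself happens outside the lock, which keeps the critical section short.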
[02:57] <Bertl> Doener: any 'other' ksymoops?
[02:59] <Doener> on that machine i've got about 70 others ;) i'm still looking at the other machines, my connection is damn slow today
[03:00] <Bertl> okay, so you have plenty of panics, and loggings, right? could you 'arrange' them somehow, so I could have a look at them?
[03:01] aka (~aka@h062040166017.gun.cm.kabsi.at) left irc: Quit: Leaving
[03:01] <Bertl> with that amount of information a pattern should emerge, which should allow us to pinpoint the cause ...
[03:01] <Doener> i could grep the bugs out of the logs
[03:02] <Bertl> you should try to find the 'first' panic after a reboot ... this contains the most information ...
[03:02] <Bertl> okay, guys, I'm tired, I'll go to bed now ... 
[03:03] <Bertl> nathan, thanks for chatting about 1.3.x ...
[03:03] <Doener> good night Bertl
[03:03] <Bertl> good night Doener, and if you have something by tomorrow morning I'll have a look at it ...
[03:04] <Doener> ok, thanks
[03:04] Nick change: Bertl -> Bertl_zZ
[03:04] <nathan_> night
[03:13] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 499 seconds
[04:00] <maharaja> does chbind rely on a kernel patch?
[04:00] <maharaja> s/a/the vserver/
[04:03] <nathan_> maharaja, yes, it sets the ipv4root
[04:27] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[04:38] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 499 seconds
[04:39] Nick change: Doener -> doener_zZz
[04:45] Nick change: doener_zZz -> Doener
[05:53] Nick change: Doener -> Doener_zZz
[06:30] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[07:35] nathan_ (~nathan@209-6-130-26.c3-0.sbo-ubr1.sbo-ubr.ma.cable.rcn.com) left irc: Quit: BitchX-1.0c20cvs -- just do it.
[08:15] Simon (~sgarner@apollo.quattro.net.nz) joined #vserver.
[08:40] caligula (~junior@adsli217.cofs.net) left irc: Ping timeout: 485 seconds
[08:55] Simon (~sgarner@apollo.quattro.net.nz) left irc: Quit: so long, and thanks for all the fish
[09:38] Doener` (~doener@pD958883F.dip.t-dialin.net) joined #vserver.
[09:46] Doener_zZz (~doener@pD9588778.dip.t-dialin.net) left irc: Ping timeout: 499 seconds
[10:06] kestrel (~athomas@dialup51.optus.net.au) left irc: Ping timeout: 499 seconds
[11:57] kestrel (~athomas@dialup51.optus.net.au) joined #vserver.
[12:22] apw (~apw@212.104.150.41) left irc: Ping timeout: 499 seconds
[13:34] zyong (cat@bb220-255-107-245.singnet.com.sg) joined #vserver.
[14:00] Madkiss_ (madkiss@madkiss.org) got netsplit.
[14:00] Madkiss_ (madkiss@madkiss.org) returned to #vserver.
[14:20] Nick change: Bertl_zZ -> Bertl
[14:44] <ccooke> morning
[14:44] <maharaja> hi
[15:08] <Bertl> morning!
[15:32] serving (~serving@213.186.191.2) left irc: Read error: Connection reset by peer
[15:36] <xsbyme> morning
[15:36] <Bertl> hi!
[15:37] <xsbyme> Bertl
[15:37] <xsbyme> look query
[15:38] <Bertl> 13:38 <xsbyme> А~=[ {force_} ]=~А (01:19 PM) : 
[15:38] <xsbyme> ye
[15:38] <Bertl> hmm, funny stuff ...
[15:38] <xsbyme> force = nihal
[15:39] <Bertl> okay, so the mobo is replaced, and the errors are gone?
[15:39] <xsbyme> yep
[15:39] <Bertl> fine, another problem solved without divine intervention ;)
[15:41] <xsbyme> <@Nihal> Cpu(s):  49.3% user,  50.7% system,   0.0% nice,   0.0% idle
[15:41] <xsbyme> <@Nihal> Mem:   1551856k total,  1521776k used,    30080k free,    13280k buffers
[15:41] <xsbyme> <@Nihal> Swap:  3004144k total,   233612k used,  2770532k free,   107800k cached
[15:41] <xsbyme> <@Nihal> Cpu(s):  49.3% user,  50.7% system,   0.0% nice,   0.0% idle
[15:41] <xsbyme> <@Nihal> Mem:   1551856k total,  1521776k used,    30080k free,    13280k buffers
[15:41] <xsbyme> <@Nihal> Swap:  3004144k total,   233612k used,  2770532k free,   107800k cached
[15:41] <xsbyme> |01:42|  ЛЛ <@Nihal> dd if=/dev/urandom bs=1024M count=24 | gzip -c |zcat >/dev/null 
[15:41] <xsbyme> |01:42|  ЛЛ <@Nihal> dd if=/dev/hda2 bs=512M count=24 | gzip -c |zcat >/dev/null
[16:12] <xsbyme> Bertl
[16:12] <xsbyme> you know http://vserver4.isp4p.net/ that ?
[16:20] sylvio (sylvio@imk32.mb.uni-magdeburg.de) joined #vserver.
[16:33] <Bertl> hi sylvio!
[16:34] <xsbyme> Bertl
[16:34] <xsbyme> you know  http://vserver4.isp4p.net/ what software so use for that
[16:34] LL0rd (~dr@pD9507EFA.dip0.t-ipconnect.de) joined #vserver.
[16:34] <LL0rd> hi
[16:34] <Bertl> xsbyme: I beg your pardon?
[16:34] <Bertl> hi LL0rd!
[16:35] <xsbyme> seeking software to controler vserver in a webinterface
[16:35] <xsbyme> control *
[16:36] <Bertl> ahh, okay, well there were some projects writing frontends ...
[16:36] <LL0rd> does someone know how to prevent a high load on a masterserver?
[16:39] <xsbyme> Bertl
[16:39] <xsbyme> u know site
[16:39] <xsbyme> about that
[16:42] <Bertl> LL0rd: what is a 'masterserver'?
[16:42] <Bertl> xsbyme: no sorry I don't know a site ...
[16:42] Nick change: unriel -> riel
[16:42] <xsbyme> darn
[16:42] <LL0rd> the hostserver, which runs the vservers
[16:43] <Bertl> well a high load there is probably the result from your vservers?
[16:43] <Bertl> compare it to the load in ctx 1
[16:44] <LL0rd> yes, and how can I 'limit' the load that the vservers can cause?
[16:49] <xsbyme> <Bertl> ahh, okay, well there were some projects writing frontends ...
[16:49] <xsbyme> u know a site about that
[17:27] serving (~serving@213.186.189.83) joined #vserver.
[17:46] <Bertl> LL0rd: well, you can do that with the 'new' scheduler stuff ...
[17:47] <Bertl> xsbyme: no, as I said, I don't have any urls, I just know that there 'were' some projects ...
[17:47] <xsbyme> yes but cant find them :/
[17:48] <Bertl> http://nutexvserver.sourceforge.net/ (quick search with google ;)
[17:50] <xsbyme> which string did you use
[17:50] <Bertl> vserver admin frontend
[17:53] <xsbyme> thnx
[17:53] <Bertl> nm
[17:57] <LL0rd> and how can I use the scheduler stuff?
[17:58] <LL0rd> Bertl, is there a site, where it is explained?
[17:58] <Bertl> well, the O(1) stuff is experimental, and not updated yet ...
[17:59] <Bertl> what is your intention? what do you want to accomplish?
[18:00] <LL0rd> do you know the program "stress" ?
[18:01] <Bertl> hmm, not yet ..
[18:01] <Bertl> but it sounds familiar ...
[18:02] <LL0rd> http://freshmeat.net/projects/stress/?topic_id=861
[18:02] <LL0rd> when I run this program on a vserver, then the hostserver gets a high load
[18:02] <Bertl> sounds reasonable ...
[18:03] <LL0rd> what can I do to prevent this?
[18:03] <Bertl> that is the 'fair share' component ;)
[18:03] <Bertl> you want to block DoS attacks from inside a vserver?
[18:03] <LL0rd> yes
[18:04] <Bertl> the first step is specifying NPROC (to limit the number of processes in a vserver) and reasonable ulimits ...
[18:05] <LL0rd> done, the limit of processes is set to 40
[18:05] <Bertl> then you can use the SCHED flag to account scheduler fairness for the entire vserver (it will become a single process)
[18:06] <Bertl> then you can 'renice' the vserver ...
[18:06] <Bertl> but that's it for the stable branch ...
[18:07] <Bertl> the experimental stuff allows you to impose memory limits and an adapted tunable O(1) scheduler ...
[18:09] <LL0rd> what do you mean by 'renice' the vserver?
[18:09] <Bertl> S_NICE=
[18:10] <LL0rd> ah.... ok
[18:19] <LL0rd> hmm..... Unknown flag SCHED
[18:19] <Bertl> S_FLAGS="lock sched nproc"
[18:22] <LL0rd> and S_NICE=19 ?
[18:23] <Bertl> for example ...
[18:23] <LL0rd> hmm...
[18:24] <LL0rd> even with that the server gets too busy
[18:24] <Bertl> what exactly do you mean by, 'too busy'?
[18:25] <LL0rd> when i type a letter it appears after about 30 seconds
[18:27] <Bertl> what does vmstat return?
[18:28] <LL0rd> vmstat
[18:28] <LL0rd> procs memory swap io system cpu
[18:28] <LL0rd> r b w swpd free buff cache si so bi bo in cs us sy id
[18:28] <LL0rd> 0 1 0 16340 2012 592 2884 0 5 1 25 28 11 0 1 99
[18:28] <Bertl> hmm, seems the tool is paging in/out ...
[18:29] <Bertl> probably it allocates more memory than available ...
[18:29] <LL0rd> could be
[18:30] <Bertl> if you are interested in the experimental stuff ...
[18:30] <Bertl> http://vserver.13thfloor.at/Experimental/
[18:30] <Bertl> probably the combination of rmap+O1.3+ml would do the trick ...
[18:31] <Bertl> but as I said, experimental, and not updated yet, if you volunteer to do extensive (some ;) testing, I could update the one or the other ...
[18:33] <LL0rd> yes of course, i'm interested in testing the experimental stuff
[18:40] <Bertl> okay, first get Con Kolivas latest? patch set ... for 2.4.23
[18:43] <LL0rd> where can i get this patch from?
[18:44] <Bertl> google: 'Con Kolivas kernel patches' 
[18:44] <Bertl> http://www.plumlocosoft.com/kernel/patches/2.4/2.4.23/2.4.23-ck1/patch-2.4.23-ck1.bz2
[18:45] <Bertl> this has almost everything we need, that is why I consider it a good base ;)
[18:45] <LL0rd> ;)
[18:45] <LL0rd> ok
[18:46] <Bertl> now we have to adapt the vserver patch ... *G*
[18:49] <LL0rd> ok, patch applied
[18:49] <LL0rd> the Con Kolivas kernel patch
[18:49] <Bertl> you might want to back out the desktop tuning, for a server ...
[18:50] Doener` (~doener@pD958883F.dip.t-dialin.net) left irc: Quit: Leaving
[18:52] <LL0rd> on this server there is no desktop installed
[18:53] <Bertl> RL2 Desktop Tuning (leave this out or back it out if using on a server): patch-2.4.23-rl2-rl2dt-0312022027.bz2       and we have to cut out the vserver scheduler stuff, as this isn't for O(1) yet, but we add it alter on ...
[18:54] <Bertl> s/alter/later/
[18:54] <Bertl> and we get ... http://vserver.13thfloor.at/Experimental/patch-2.4.23-ck1-vs1.3.0.2.diff
[18:55] <Bertl> this is ontop of ck1 ...
[18:55] <LL0rd> applied
[18:55] <Bertl> it will not do what you want right now, and out of the box, as the vserver scheduler isn't used ...
[18:55] sylvio (sylvio@imk32.mb.uni-magdeburg.de) left irc: Read error: Connection reset by peer
[18:56] <Bertl> but we have to make sure that this setup is working and stable, before we adapt the O(1) scheduler stuff ...
[18:57] say (~say@212.86.243.154) left irc: Read error: Connection reset by peer
[18:57] <LL0rd> ok, so i have to compile the kernel now
[18:58] <Bertl> yes, and do some stress testing on that one ...
[18:58] <Bertl> in the meanwhile I'll prepare the scheduler modifications 
[18:58] <LL0rd> ok
[19:00] <LL0rd> i will talk to you, when i'm ready
[19:00] <Bertl> perfect ...
[20:02] Doener (~doener@pD958883F.dip.t-dialin.net) joined #vserver.
[20:31] <Bertl> LL0rd: this is the patch for the scheduler, but please test without first ... http://vserver.13thfloor.at/Experimental/delta-vs1.3.0.2-vs1.3.0.4.diff
[20:58] <Bertl> okay .. cu l8er ...
[20:58] Nick change: Bertl -> Bertl_oO
[21:08] serving (~serving@213.186.189.83) left irc: Ping timeout: 485 seconds
[21:54] Linux_Lord (~dr@pD9507EF2.dip0.t-ipconnect.de) joined #vserver.
[22:02] LL0rd (~dr@pD9507EFA.dip0.t-ipconnect.de) left irc: Ping timeout: 499 seconds
[23:00] serving (~serving@213.186.189.29) joined #vserver.
[23:26] say (~say@212.86.243.154) joined #vserver.
[00:00] --- Wed Dec 24 2003