vserver development mailing list: Re: AW: [vserver] Guest hourly/daily/... cron job parallel execution

From: Phil Parris <phil.networkadmin_at_gmail.com>
Date: Thu 17 Jan 2013 - 14:25:40 GMT
Message-ID: <CAE_G9ytsiuQZaDj5HOcxo_9fnZLdFHPh9d-Sctrt4XCwtPeKdw@mail.gmail.com>

I'm running into similar issues. Please provide the script. One thing I
thought about was changing the cron source to add a random sleep.
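Instead of patching cron's source, the random-sleep idea can also be tried as a small wrapper that cron calls in front of the real job. This is a minimal sketch, assuming Python 3 in the guest. The script name `random_delay.py` and the 30-minute bound are hypothetical:

```python
import os
import random
import sys
import time

# Hypothetical upper bound for the random delay (30 minutes); tune per host.
MAX_SPLAY_SECONDS = 1800


def pick_delay(max_seconds, rng=random):
    """Return a uniformly random delay in [0, max_seconds] seconds."""
    return rng.uniform(0, max_seconds)


def run_with_splay(argv, max_seconds=MAX_SPLAY_SECONDS):
    """Sleep a random amount, then replace this process with argv."""
    time.sleep(pick_delay(max_seconds))
    # execvp searches PATH and does not return on success.
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} COMMAND [ARGS...]", file=sys.stderr)
    else:
        run_with_splay(sys.argv[1:])
```

In /etc/crontab the wrapper would go in front of the existing command, e.g. `... root /usr/local/bin/random_delay.py run-parts /etc/cron.daily` (hypothetical path). Unlike a cron patch, this spreads only the jobs you wrap.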
On Jan 17, 2013 5:10 AM, "Fiedler Roman" <Roman.Fiedler@ait.ac.at> wrote:

> > -----Original Message-----
> > From: Bendtsen, Jon [mailto:Jon.Bendtsen@laerdal.dk]
> >
> > On 17/01/2013, at 11.05, Fiedler Roman <Roman.Fiedler@ait.ac.at> wrote:
> >
> > >> -----Original Message-----
> > >> From: Bendtsen, Jon [mailto:Jon.Bendtsen@laerdal.dk]
> > >>
> > >> On 17/01/2013, at 10.07, Fiedler Roman <Roman.Fiedler@ait.ac.at> wrote:
> > >>
> > >>>> -----Original Message-----
> > >>>> From: Bendtsen, Jon [mailto:Jon.Bendtsen@laerdal.dk]
> > >>>>
> > >>>> On 17/01/2013, at 09.43, Fiedler Roman <Roman.Fiedler@ait.ac.at> wrote:
> > ....
> > >> I see your point. If so, maybe we should split up the daily/weekly/hourly
> > >> cron jobs even further, so that identical programs that are hard linked
> > >> together can share the same cache and RAM for their instructions. But
> > >> having every guest run locate to update its database at the same time
> > >> might not be a good choice.
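Finding which guest jobs are hard links of the same program is the easy part: hard links share a device and inode number. A minimal sketch of that grouping step only; the function names are hypothetical, and how the groups are then scheduled is left open, as in the mail:

```python
import os
from collections import defaultdict


def inode_key(path):
    """Identify a file by (device, inode); hard links share both values."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def group_hardlinked(paths, key=inode_key):
    """Group job paths that are hard links to the same file.

    Returns a list of groups (lists of paths) in first-seen order, so jobs
    that execute the same on-disk program could be scheduled back to back.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    return list(groups.values())
```

Fed with paths such as each guest's `etc/cron.daily/*` (as seen from the host), every returned group is one physical program.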
> > >
> > > I would expect that caching of the application code itself is not the
> > > main performance boost. It is more about getting rid of the bottlenecks.
> > > As you noted, too many locate updates in parallel will kill disk
> > > performance. But losing the usual disk-cache benefit might also be
> > > problematic. If, e.g., just a few databases run a job, the relevant DB
> > > content from disk is likely to end up completely in the OS RAM cache, and
> > > all disk reads from the DB return immediately. When too many DBs execute
> > > in parallel, the disk content of one process is put in the cache and
> > > evicts pages that another job will soon need again.
> >
> > This is not really any different from running many other kinds of virtual
> > machines in parallel. Your storage system needs to perform.
>
> Yes, but it would be nice if moderately priced equipment were sufficient to
> fulfill customer requirements, instead of very expensive equipment that is
> needed only to handle non-interactive load bursts caused by suboptimal
> scheduling.
>
> > >> Maybe we need a kind of scheduler in the kernel that notices which
> > >> processes in the different guests are hard linked and then prioritizes
> > >> running those?
> > >>
> > >> Maybe we need some new kind of scheduler system designed with
> > >> virtualization in mind, so that it waits for periods of low load and
> > >> then runs the maintenance scripts, meaning that the scripts sometimes run
> > >> only 20 hours apart and other times 36 hours apart.
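The 20-to-36-hour idea can be expressed as a small decision rule that a frequent cron entry (e.g. hourly) could evaluate. This is a minimal sketch, assuming the 1-minute load average of the system running the check is a usable load signal (whether a guest can see the host's load is a separate question). The `LOW_LOAD` threshold and function names are hypothetical:

```python
import os

# Bounds from the mail: never closer than 20 h, never further apart than 36 h.
MIN_GAP_HOURS = 20
MAX_GAP_HOURS = 36
# Hypothetical "low load" threshold for the 1-minute load average.
LOW_LOAD = 1.0


def should_run(hours_since_last, load, low_load=LOW_LOAD):
    """Decide whether a daily maintenance job should start now."""
    if hours_since_last < MIN_GAP_HOURS:
        return False  # ran recently enough; wait
    if hours_since_last >= MAX_GAP_HOURS:
        return True  # overdue; run regardless of load
    return load <= low_load  # inside the window: run only when load is low


def should_run_now(hours_since_last):
    """Same decision using this system's current 1-minute load average."""
    return should_run(hours_since_last, os.getloadavg()[0])
```

The hard upper bound keeps a permanently busy host from postponing maintenance forever.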
> > >
> > > In my opinion, such an intelligent/learning scheduler would allow a
> > > significant increase in execution performance, but given the simplicity
> > > of current cron, I think such a program is years away, if it is ever
> > > written at all.
> >
> > Probably. But I can imagine other ways to do it. I already use the at
> > system for some tasks, like my backup scripts. If the load is too high or
> > the system is otherwise not ready, my backup scripts call 'at now + 1 hour'
> > to run themselves again 1 hour later. If it is still not ready, I use
> > +2 hours, then +4, and finally +8 hours.
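The at-based back-off can be sketched as a script that re-queues itself. This is a minimal sketch, assuming the 1-minute load average is a reasonable "not ready" signal and that the script is executable with a shebang (at(1) runs the queued command through /bin/sh). `MAX_LOAD` and `run_backup()` are hypothetical placeholders:

```python
import os
import shlex
import subprocess
import sys

# Retry delays from the mail: +1, +2, +4 and finally +8 hours.
BACKOFF_HOURS = [1, 2, 4, 8]
# Hypothetical threshold: treat the system as busy above this 1-minute load.
MAX_LOAD = 4.0


def next_delay(attempt):
    """Hours to wait before retry `attempt` (0-based), or None when exhausted."""
    if 0 <= attempt < len(BACKOFF_HOURS):
        return BACKOFF_HOURS[attempt]
    return None


def at_timespec(hours):
    """Build the argument list for at(1), e.g. ['now', '+', '2', 'hours']."""
    unit = "hour" if hours == 1 else "hours"
    return ["now", "+", str(hours), unit]


def system_ready(max_load=MAX_LOAD):
    return os.getloadavg()[0] <= max_load


def reschedule(script, attempt):
    """Queue `script <attempt+1>` via at(1); False if out of retries or at fails."""
    delay = next_delay(attempt)
    if delay is None:
        return False
    job = f"{shlex.quote(script)} {attempt + 1}\n"  # at reads the job from stdin
    try:
        subprocess.run(["at", *at_timespec(delay)], input=job, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True


def run_backup():
    print("running backup (placeholder)")  # hypothetical: the real work goes here


def main():
    script = os.path.abspath(sys.argv[0])
    attempt = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    if system_ready():
        run_backup()
    elif not reschedule(script, attempt):
        print("system still busy; giving up", file=sys.stderr)


if __name__ == "__main__":
    main()
```

The attempt counter travels as the script's only argument, so each queued job knows which delay comes next.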
>
> This is an acceptable approach if you have a few known resource hogs and can
> make them behave more nicely. It might be problematic if you cannot change
> the behavior of resource-consuming applications in this way, or not with
> suitable granularity.
>
> > But it all hinges on the guest knowing the overall system load of the
> > virtualization host, just as a process on a normal standalone system can
> > see the system load. Maybe we need a new field in the load data? Or a new
> > syscall to provide that data?
>
> For well-defined setups, this would allow better scheduling. But if a guest
> application does not know how many resources it will need (no good
> estimate), and does not know what the others need (or whether their
> estimates are correct), it may be hard for the application to make the
> right decision. I fear that most applications won't be well-coded enough to
> integrate into such a scheme.
>
> > >> I am not starting the cron daemon with nice. I put nice inside the
> > >> crontab file, like in these examples:
> > >>
> > >> 37 0 * * * root nice -n 15 /usr/local/sbin/AD_integration/find_disabled_users_from_AD_in_groups.sh
> > >> 0,15,30,45 * * * * root nice -n 5 /usr/local/sbin/AD_integration/merge_AD_groups_with_unix.sh
> > >
> > > Ah, I see.
> > >
> > > My current solution is:
> > > * Check for each guest whether a cron scheduler is installed inside it.
> > > * Check whether the guest's cron would run the hourly/daily jobs by
> > >   itself; if so, let it do so. A misconfigured/malicious guest can always
> > >   run any process at any time; this has to be addressed by other means.
> > > * If guest cron is installed and the cron.daily/hourly ... directories
> > >   exist, but the guest scheduler does NOT run those jobs from etc/crontab
> > >   by itself (cooperative guest), then start them via vserver-exec.
> > > * Make sure to run only a given number of those guest processes in
> > >   parallel.
> > >
> > > So all you need to do to opt in is install cron in the guest but remove
> > > the run-parts directives for hourly/daily from /etc/crontab.
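The opt-in logic described above can be sketched as follows. This is not the script referred to in the thread, which is not shown. It assumes guest roots under /vservers, uses the presence of the guest's /etc/crontab as a stand-in for "cron is installed", and assumes util-vserver's `vserver NAME exec CMD` to run commands inside a guest. `MAX_PARALLEL` and all function names are hypothetical:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumptions: guest root filesystems live below VSERVER_ROOT, and
# "vserver NAME exec CMD..." runs CMD inside guest NAME (util-vserver).
VSERVER_ROOT = "/vservers"
MAX_PARALLEL = 2  # hypothetical cap on concurrently running guest jobs


def crontab_runs_period(crontab_text, period):
    """True if an uncommented crontab line still run-parts /etc/cron.<period>."""
    target = f"/etc/cron.{period}"
    for line in crontab_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "run-parts" in line and target in line:
            return True
    return False


def guest_opted_in(guest_root, period):
    """Opt-in: crontab present, cron.<period> exists, but crontab does not run it."""
    crontab = os.path.join(guest_root, "etc", "crontab")
    job_dir = os.path.join(guest_root, "etc", f"cron.{period}")
    if not (os.path.isfile(crontab) and os.path.isdir(job_dir)):
        return False
    with open(crontab, encoding="utf-8", errors="replace") as fh:
        return not crontab_runs_period(fh.read(), period)


def run_guest_jobs(guest, period):
    """Run the guest's cron.<period> scripts inside it; return (guest, exit code)."""
    cmd = ["vserver", guest, "exec", "run-parts", f"/etc/cron.{period}"]
    return guest, subprocess.run(cmd).returncode


def run_limited(guests, period, max_parallel=MAX_PARALLEL, runner=run_guest_jobs):
    """Call runner(guest, period) for every guest, at most max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(lambda guest: runner(guest, period), guests))


def main(period="daily"):
    if not os.path.isdir(VSERVER_ROOT):
        print(f"{VSERVER_ROOT} not found; nothing to do")
        return
    guests = sorted(
        name for name in os.listdir(VSERVER_ROOT)
        if guest_opted_in(os.path.join(VSERVER_ROOT, name), period)
    )
    for guest, rc in run_limited(guests, period).items():
        print(f"{guest}: cron.{period} exited with {rc}")


if __name__ == "__main__":
    main()
```

The thread pool size is what enforces "only a given number in parallel"; a guest that still has its run-parts line is left to run its own jobs.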
> >
> > Yes, that is a method too, but I would not really like outsiders, like
> > hosting providers, to run tasks inside my guest, at least not without my
> > prior knowledge.
>
> Since it is a per-guest opt-in, you know that you activated it. (By setting
> this flag in the guest's etc/crontab, I acknowledge that I want my cron jobs
> to run more resource-efficiently, knowing that their start time can vary.)
> Perhaps it would make sense to have a more "prominent" way to set the flag,
> so that the guest owner knows what is activated, e.g. an
> "IWantExternalRoutineCronjobSerialization" marker file. If you are
> interested, I could provide you the current Python script, so you can see
> the current implementation.
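A more prominent opt-in could be a plain marker file that the host checks before touching a guest. This is a minimal sketch. The marker name is taken from the mail, but its location under the guest's etc/ directory is an assumption:

```python
import os

# The marker name is from the mail; placing it in the guest's etc/ is an assumption.
MARKER = os.path.join("etc", "IWantExternalRoutineCronjobSerialization")


def guest_requested_serialization(guest_root):
    """True if the guest owner created the explicit opt-in marker file."""
    return os.path.isfile(os.path.join(guest_root, MARKER))
```

Because the file has to be created deliberately, it documents the guest owner's consent more visibly than a removed crontab line.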
>
> Apart from that, if you change the start time of all guest jobs as host
> administrator, or use nice as you suggested, this is also a kind of
> interference that not every guest owner might want.
>
> > I guess it works if one administers both the host and the guests.
>
> With a guest opt-in, this might be simpler to handle in an SLA.
>
Received on Thu Jan 17 14:25:49 2013