Re: [RFC] How to handle the rules engine for cgroups

From: Vivek Goyal
Date: Thu Jul 10 2008 - 10:33:07 EST


On Thu, Jul 10, 2008 at 02:23:52AM -0700, Paul Menage wrote:
> On Thu, Jul 3, 2008 at 8:54 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> >
> > As of today that is what happens, because a newly exec'd process runs in
> > the same cgroup as its parent. But that's probably what we need to avoid.
> > For example, if an admin has created three cgroups "database", "browser"
> > and "others", and a user launches "firefox" from a shell (assuming the
> > shell is originally running in the "others" cgroup), then any memory
> > allocation for firefox should come from the "browser" cgroup and not
> > from "others".
>
> I think that I'm a little skeptical that anyone would ever want to do that.
>
> Wouldn't it be a simpler mechanism for the admin to simply have
> wrappers around the "firefox" and "oracle" binaries that move the
> process into the "browser" or "database" cgroup before running the
> real binaries?
>

Well, that would mean wrappers first need to be created around all the
applications that need to be controlled. Then each wrapper needs to
synchronize with the classification daemon to confirm it has been placed
in the right cgroup before it can go ahead and launch the real binary, etc.
This sounds ugly, and putting wrappers around all the applications does
not seem very practical.
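
Just to make concrete what each such wrapper would have to do, here is a
minimal sketch. The /dev/cgroup mount point, the "browser" group and the
firefox paths are made-up examples, not an existing setup:

/* Hypothetical wrapper: move ourselves into the "browser" cgroup,
 * then exec the real firefox binary.  All paths are assumptions. */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	FILE *f;

	/* Attach this process to the target cgroup by writing our pid
	 * into its tasks file (assuming the hierarchy is mounted at
	 * /dev/cgroup). */
	f = fopen("/dev/cgroup/browser/tasks", "w");
	if (!f) {
		perror("/dev/cgroup/browser/tasks");
		return 1;
	}
	fprintf(f, "%d\n", (int)getpid());
	fclose(f);

	/* The exec'd image inherits the cgroup membership set up above. */
	argv[0] = "firefox";
	execv("/usr/lib/firefox/firefox-real", argv);
	perror("execv");
	return 1;
}

And you would need one of these (or a generic one driven by configuration)
for every application you want controlled, which is exactly the part that
feels impractical to me.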

> >
> > I am assuming that this will be a requirement for enterprise-class
> > systems. It would be good to know the experiences of people who are
> > already doing some kind of workload management.
>
> I can help there. :-) At Google we have two approaches:
>
> - grid jobs, which are moved into the appropriate cgroup (actually,
> currently cpuset) by the grid daemon when it starts the job
>

So the grid daemon probably first forks off, determines the right cpuset,
moves the job there and then does the exec?
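
Something like this minimal sketch is what I imagine; the cpuset path, the
job binary and the pipe-based handshake are all made-up details for
illustration, not a description of your actual daemon:

/* Sketch of a fork -> classify -> exec sequence a launcher might use.
 * The cpuset path and job binary below are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

static void attach_pid(const char *tasks_file, pid_t pid)
{
	FILE *f = fopen(tasks_file, "w");

	if (!f) {
		perror(tasks_file);
		exit(1);
	}
	fprintf(f, "%d\n", (int)pid);
	fclose(f);
}

int main(void)
{
	int pipefd[2];
	char go;
	pid_t pid;

	if (pipe(pipefd) < 0) {
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid == 0) {
		/* Child: wait until the parent has classified us,
		 * then exec the real job. */
		close(pipefd[1]);
		read(pipefd[0], &go, 1);
		execl("/usr/bin/some-grid-job", "some-grid-job", (char *)NULL);
		perror("execl");
		_exit(1);
	}

	/* Parent (the daemon): move the child into the right cpuset
	 * before letting it exec. */
	close(pipefd[0]);
	attach_pid("/dev/cpuset/grid-jobs/tasks", pid);
	write(pipefd[1], "x", 1);	/* release the child */
	close(pipefd[1]);
	return 0;
}

The question is whether every launcher in the system can reasonably be
taught to do this.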

> - ssh logins, which are moved into the appropriate cpuset by a
> forced-command script specified in the sshd config.
>
> I don't see the rule-based approach being all that useful for our needs.
>
> It's all very well coming up with theoretical cases that a fancy new
> mechanism solves. But it carries more weight if someone can stand up
> and say "Yes, I want to use this on my real cluster of machines". (Or
> even "Yes, if this is implemented I *will* use it on my desktop" would
> be a start)
>

So it boils down to:

1) Can we bear the delay in task classification (especially across exec)?
If yes, then all the classification work can take place in userspace
(a rough sketch of what that userspace piece might look like follows
this list).

2) If no, then either
	a) we need to implement a rule-based engine to let the kernel
	   do the classification,

	b) or we need to do various things in user space as you suggested:
		- Put a wrapper around each application.
		- Modify the job launcher (e.g. the grid daemon) to
		  determine the right cgroup and place the application
		  there before actually launching the job.
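
For reference, the userspace half of option 1 could be as small as the
sketch below. The rule table, the /dev/cgroup paths and the way the pid is
obtained (from the launcher, or from an exec notification such as the proc
events connector) are all assumptions made for illustration:

/* Rough sketch of the userspace classification step: given a pid, look
 * at what the task is running and move it into a matching cgroup.
 * The rule table and mount points are made-up examples. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libgen.h>
#include <sys/types.h>

struct rule {
	const char *name;	/* executable name to match, NULL = default */
	const char *cgroup;	/* cgroup directory to move the task into   */
};

static const struct rule rules[] = {
	{ "firefox", "/dev/cgroup/browser"  },
	{ "oracle",  "/dev/cgroup/database" },
	{ NULL,      "/dev/cgroup/others"   },
};

static int classify(pid_t pid)
{
	char path[256], cmd[256] = "";
	const struct rule *r;
	char *base;
	FILE *f;

	/* argv[0] of the task, read from /proc/<pid>/cmdline */
	snprintf(path, sizeof(path), "/proc/%d/cmdline", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(cmd, sizeof(cmd), f))
		cmd[0] = '\0';
	fclose(f);

	base = basename(cmd);
	for (r = rules; r->name && strcmp(r->name, base); r++)
		;

	/* Attach the task by writing its pid into the chosen tasks file. */
	snprintf(path, sizeof(path), "%s/tasks", r->cgroup);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", (int)pid);
	fclose(f);
	return 0;
}

int main(int argc, char *argv[])
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	return classify(atoi(argv[1])) ? 1 : 0;
}

The open question remains the window between the exec and the daemon
getting around to reclassifying the task, which is the delay in point 1.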

Balbir and other people, any more thoughts on this? How exactly would this
need to be used in your work environment?

I am a little skeptical of option 2b working in most of the scenarios.

Thanks
Vivek