Re: Tiny cpusets -- cpusets for small systems?

From: Paul Menage
Date: Sat Feb 23 2008 - 10:10:53 EST


On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson <pj@xxxxxxx> wrote:
> A couple of proposals have been made recently by people working Linux
> on smaller systems, for improving realtime isolation and memory
> pressure handling:
>
> (1) cpu isolation for hard(er) realtime
> http://lkml.org/lkml/2008/2/21/517
> Max Krasnyanskiy <maxk@xxxxxxxxxxxx>
> [PATCH sched-devel 0/7] CPU isolation extensions
>
> (2) notify user space of tight memory
> http://lkml.org/lkml/2008/2/9/144
> KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> [PATCH 0/8][for -mm] mem_notify v6
>
> In both cases, some of us have responded "why not use cpusets", and the
> original submitters have replied "cpusets are too fat" (well, they
> were more diplomatic than that, but I guess I can say that ;)

Having read those threads, it looks to me as though:

- the parts of Max's problem that would be solved by cpusets can be
mostly accomplished just via sched_setaffinity()

- Motohiro wants to add a new system-wide API that you would also like
to have available on a per-cpuset basis. (Why not just add two access
points for the same feature?)

I'm don't think that either of these would be enough to justify big
changes to cpusets or cgroups, although eliminating bloat is always a
good thing.

> The primary semantic limit I'd suggest would be supporting exactly
> one layer depth of cpusets, not a full hierarchy. So one could still
> successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
> to do 'mkdir /dev/cpuset/foo/bar' would fail. This reminds me of
> very early FAT file systems, which had just a single, fixed size
> root directory ;). There might even be a configurable fixed upper
> limit on how many /dev/cpuset/* directories were allowed, further
> simplifying the locking and dynamic memory behavior of this apparatus.

I'm not sure that either of these would make much difference to the
overall footprint.

A single layer of cpusets would allow you to simplify
validate_change() but not much else.

I don't see how a fixed upper limit on the number of cpusets makes the
locking sufficiently simpler to save much code.

>
> How this extends to cgroups I don't know; for now I suspect that most
> cgroup module development is motivated by the needs of larger systems,
> not smaller systems. However, cpusets is now a module client of
> cgroups, and it is cgroups that now provides cpusets with its interface
> to the vfs infrastructure. It would seem unfortunate if this relation
> was not continued with tiny cpusets. Perhaps someone can imagine a tiny
> cgroups? This might be the most difficult part of this proposal.

If we wanted to go this way, I can imagine a cgroups config option
that forces just a single hierarchy, which would allow a bunch of
simplifications that would save plenty of text.

>
> Looking at some IA64 sn2 config builds I have laying about, I see the
> following text sizes for a couple of versions, showing the growth of
> the cpuset/cgroup apparatus over time:
>
> 25933 2.6.18-rc3-mm1/kernel/cpuset.o (Aug 2006)
> vs.
> 37823 2.6.25-rc2-mm1/kernel/cgroup.o (Feb 2008)
> 19558 2.6.25-rc2-mm1/kernel/cpuset.o
>
> So the total has grown from 25933 to 57381 text bytes (note that
> this is IA64 arch; most arch's will have proportionately smaller
> text sizes.)

On x86_64 they're:

cgroup.o: 17348
cpuset.o: 8533

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/