Re: [PATCH v2] Make transparent hugepages cpuset aware

From: David Rientjes
Date: Wed Jun 19 2013 - 17:24:34 EST

On Wed, 19 Jun 2013, Robin Holt wrote:

> The convenience being that many batch schedulers have added cpuset
> support. They create the cpusets and configure them as appropriate
> for the job as determined by a mixture of input from the submitting
> user but still under the control of the administrator. That seems like
> a fairly significant convenience given that it took years to get the
> batch schedulers to adopt cpusets in the first place. At this point,
> expanding their use of cpusets is under the control of the system
> administrator and would not require any additional development on
> the batch scheduler developers' part.

You can't say the same for memcg?

> Here are the entries in the cpuset:
> cgroup.event_control mem_exclusive memory_pressure_enabled notify_on_release tasks
> cgroup.procs mem_hardwall memory_spread_page release_agent
> cpu_exclusive memory_migrate memory_spread_slab sched_load_balance
> cpus memory_pressure mems sched_relax_domain_level
> There are scheduler, slab allocator, page_cache layout, etc controls.

I think this is mostly for historical reasons since cpusets were
introduced before cgroups.

> Why _NOT_ add a thp control to that nicely contained central location?
> It is a concise set of controls for the job.

All of the above seem to be for cpusets' primary purpose, i.e. NUMA
optimizations. It has nothing to do with transparent hugepages. (I'm not
saying thp has anything to do with memcg either, but a "memory controller"
seems more appropriate for controlling thp behavior.)

> Maybe I am misunderstanding. Are you saying you want to put memcg
> information into the cpuset or something like that?

I'm saying there's absolutely no reason to have thp controlled by a
cpuset, or ANY cgroup for that matter, since you chose not to respond to
the question I asked: why do you want to control thp behavior for certain
static binaries and not others? Where is the performance regression or
the downside? Is it because of max_ptes_none for certain jobs blowing up
the rss? We need information, and even if it were justifiable then it
wouldn't have anything to do with ANY cgroup but rather a per-process
control. It has nothing to do with cpusets whatsoever.

(And I'm very curious why you didn't even cc the cpusets maintainer on
this patch in the first place who would probably say the same thing.)