Re: [PATCH v2] Make transparent hugepages cpuset aware

From: David Rientjes
Date: Wed Jun 19 2013 - 22:43:32 EST


On Wed, 19 Jun 2013, Robin Holt wrote:

> cpusets was not for NUMA. It has no preference for "nodes" or anything like
> that. It was for splitting a machine into layered smaller groups. Usually,
> we see one cpuset which contains the batch scheduler. The batch scheduler then
> creates cpusets for jobs it starts. Has nothing to do with nodes. That is
> more an administrator issue. They set the minimum grouping of resources
> for scheduled jobs.
>

I disagree with all of the above: it's not what Paul Jackson developed
cpusets for, it's not what he wrote in Documentation/cgroups/cpusets.txt,
and it's not why libnuma immediately supported it. Cpusets is for NUMA,
like it or not.

> > I'm saying there's absolutely no reason to have thp controlled by a
> > cpuset, or ANY cgroup for that matter, since you chose not to respond to
> > the question I asked: why do you want to control thp behavior for certain
> > static binaries and not others? Where is the performance regression or
> > the downside? Is it because of max_ptes_none for certain jobs blowing up
> > the rss? We need information, and even if it were justifiable then it
> > wouldn't have anything to do with ANY cgroup but rather a per-process
> > control. It has nothing to do with cpusets whatsoever.
>
> It was a request from our benchmarking group that has found some jobs
> benefit from thp, while others are harmed. Let me ask them for more
> details.
>

Yes, please, because if some jobs are harmed by thp then we need to fix
that regression and not paper over it with some cpuset-based
solution. People should be able to run with CONFIG_TRANSPARENT_HUGEPAGE
enabled and not be required to enable CONFIG_CPUSETS for optimal behavior.
I suspect that you're referring to enlarged rss because of
khugepaged's max_ptes_none and because you're abusing the purpose of
cpusets for containerization.