Re: [Patch v4 00/22] Cache aware scheduling

From: Qais Yousef

Date: Fri Apr 24 2026 - 20:05:57 EST

On 04/24/26 00:48, Chen, Yu C wrote:
> On 4/23/2026 11:06 PM, Qais Yousef wrote:
> > On 04/21/26 13:57, Tim Chen wrote:
> > > On Tue, 2026-04-21 at 01:34 +0100, Qais Yousef wrote:
> > > > On 04/20/26 17:01, Chen, Yu C wrote:
> > > > > On 4/16/2026 8:27 AM, Qais Yousef wrote:
> > > > > > On 04/01/26 14:52, Tim Chen wrote:
> > > > >
>
> [ ... ]
>
> > > > > > I suppose there are two scenarios. The first is enabling/disabling
> > > > > > aggregation for a group of tasks, and the second is task tagging.
> > > > > > For the first scenario, this can be applied either process-wide or
> > > > > > cgroup-wide by providing a flag,
> > > >
> > > > Cgroup-wide tagging doesn't make sense IMO. Process-wide yes.
> > > >
> > >
> > > I think this depends on the usage scenario. In private discussion with
> > > Vern from Tencent, he mentioned that such a cgroup based tagging is useful for them.
> >
> > We all want ponies :)
> >
> > I think this needs a why. It doesn't make sense to group processes in general.
> > It seems this requirement is tied to elaborate setup to force the kernel to
> > deal with this elaborate setup in a generic manner.
>
> Using cgroup tagging appears to be a trade-off intended to reduce users'
> overhead from interface migration. Based on feedback from several cloud
> providers (Vern, please correct me if my understanding is wrong),
> cgroups are the basic unit for infrastructure construction:
> there could be a large process with N threads: N/2 of them
> are assigned to cgroup1, and the remaining N/2 to cgroup2.

Ah, so the problem is a single large process. Shouldn't the kernel manage that
and split it across LLCs automatically as part of its best effort?

I don't think there's any migration cost for tagging a whole process. It just
wastes resources. Those who want to squeeze out more have the option to do
better, but they don't have to.

> Updating the attributes associated with a cgroup is
> straightforward - using process-based aggregation might not work

Why not? Whether at wake up or in load balance (LB), we should be able to see
that a process has loaded an LLC and spill to another LLC, no?
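
To make the spill idea concrete, here is a rough user-space sketch. It is not
kernel code and not what the patch set does; struct llc_stat, pick_llc() and
the "spare CPU" threshold are all hypothetical and only illustrate the
decision: stay on the preferred LLC while it has room, otherwise spill to the
least loaded one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-LLC snapshot; not a kernel structure. */
struct llc_stat {
	unsigned int nr_running;	/* runnable tasks currently on this LLC */
	unsigned int nr_cpus;		/* CPUs sharing this LLC, assumed > 0 */
};

/*
 * Pick an LLC for a task of a process whose preferred LLC is @pref.
 * Stay on @pref while it has a spare CPU; otherwise spill to the LLC with
 * the lowest nr_running / nr_cpus ratio (ties keep the earlier choice).
 */
static size_t pick_llc(const struct llc_stat *llc, size_t nr_llcs, size_t pref)
{
	size_t i, best = pref;

	assert(nr_llcs > 0 && pref < nr_llcs);

	if (llc[pref].nr_running < llc[pref].nr_cpus)
		return pref;

	for (i = 0; i < nr_llcs; i++) {
		/* a/b < c/d  <=>  a*d < c*b for positive b and d */
		if ((unsigned long)llc[i].nr_running * llc[best].nr_cpus <
		    (unsigned long)llc[best].nr_running * llc[i].nr_cpus)
			best = i;
	}
	return best;
}
```

The same check could in principle sit in either the wake up path or LB; the
sketch deliberately ignores migration cost and cache footprint.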

> for this scenario - using task-based aggregation would require to
> re-write the mid-layer.

As mentioned above, this is the smartest opportunity to squeeze more out of the
system. It will require more work, yes, but it is totally optional too.

>
> >
> > Anyway with the tagging approach we can easily allow process level LLC sharing
> > via simple description like
> >
> > // shared cookie definition
> > {
> > "WEB_SERVICE_COOKIE": [ "nginx", "postgresql"],
> > "TRANSCODING_COOKIE": [ "decoder", "encoder"]
> > }
> >
> > Which simply tells the utility to reuse the cookie for these processes using the
> > key as a unique identifier.
> >
> > By the way, cookie generation might need kernel help to create a unique id.
> >
> > Still, if someone wants such elaborate setup the first thing to suggest is
> > static partitioning via cpuset. Do you know why this is not sufficient?
>
> I suppose it is because the admin wants a high system utilization by
> mixing latency-sensitive tasks and batch tasks together to share the
> CPU resource. Using cpuset to isolate them might bring lower utilization.

Hmm, yes, I get that and advocate it. I just don't get it if the request is to
tag multiple processes to share an LLC. This is not cache aware scheduling;
this is a partitioning problem.
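
For reference, the shared cookie description quoted earlier boils down to
a name-to-key lookup in the utility. A minimal sketch, with a hard-coded table
standing in for the parsed JSON; cookie_map, cookie_key_for() and
share_cookie() are hypothetical names, and how a key becomes a unique kernel
cookie is left open:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring the quoted JSON: process name -> cookie key. */
struct cookie_entry {
	const char *comm;
	const char *cookie_key;
};

static const struct cookie_entry cookie_map[] = {
	{ "nginx",      "WEB_SERVICE_COOKIE" },
	{ "postgresql", "WEB_SERVICE_COOKIE" },
	{ "decoder",    "TRANSCODING_COOKIE" },
	{ "encoder",    "TRANSCODING_COOKIE" },
};

/* Return the cookie key for @comm, or NULL if the process is untagged. */
static const char *cookie_key_for(const char *comm)
{
	size_t i;

	for (i = 0; i < sizeof(cookie_map) / sizeof(cookie_map[0]); i++)
		if (strcmp(cookie_map[i].comm, comm) == 0)
			return cookie_map[i].cookie_key;
	return NULL;
}

/* Two processes share a cookie only if both are tagged with the same key. */
static int share_cookie(const char *a, const char *b)
{
	const char *ka = cookie_key_for(a);
	const char *kb = cookie_key_for(b);

	return ka && kb && strcmp(ka, kb) == 0;
}
```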

Note we want to use the concept to help with co-locating tasks in a single LLC
to take advantage of bigger L2. Currently it seems the approach is hardwired to
a particular need.