cat cgroup interface proposal (non hierarchical) was Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

From: Marcelo Tosatti
Date: Mon Nov 02 2015 - 17:21:56 EST


On Fri, Oct 16, 2015 at 11:50:22AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 15, 2015 at 11:28:52PM -0300, Marcelo Tosatti wrote:
> > On Thu, Oct 15, 2015 at 01:36:14PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 13, 2015 at 06:31:27PM -0300, Marcelo Tosatti wrote:
> > > > I am rewriting the interface with ioctls, with commands similar to the
> > > > syscall interface proposed.
> > >
> > > Which is horrible for other use cases. I really don't see the problem
> > > with the cgroup stuff.
> >
> > Can you detail what "horrible" means?
>
> Say an RT scenario; you set up your machine with cgroups. You create a
> cpuset with is disjoint from the others, you frob around with the cpu
> cgroup, etc..
>
> So once you're all done, you start your RT app into a cgroup.
>
> But oh, fail, now you have to go muck about with ioctl()s to get the
> cache allocation cruft to work.

Peter, what follows is your cgroup proposal (extended). At the end,
though, there is a point about this cgroup interface being unable to
share cache between tasks, which IMO renders it unusable
(as it blocks any threads from sharing reserved cache).

If you have any ideas on how to circumvent this, they are appreciated.

The non-hierarchical cgroup CAT interface proposal follows. Thanks to some
of the CC'ed Red Hat folks for early comments.

cgroup CAT interface (non hierarchical):
---------------------------------------

0) Main directory files:

cat_hw_info
-----------
CAT HW information: CBM length, CDP supported, etc.
Information is reported per socket, as sockets can have
different configurations. Perhaps this should live in
sysfs instead.

1) Sub-directories represent cache reservations (size,type).

mkdir cache_reservation_for_forwarding_app
cd cache_reservation_for_forwarding_app
echo "80KB" > size
echo "data_and_code" > type
echo "socketmask=0xfffff" > socketmask    # optional
echo "1" > enable
echo "pid-of-forwarding-main-thread pid-of-helper-thread ..." > tasks

Files:

type
----
{data_and_code, only_code, only_data}. Type of
L3 CAT cache allocation to use. only_code and only_data are
supported only on CDP-capable processors.

size
----
Size of the L3 cache reservation.

rounding
--------
{round_down, round_up}. Whether to round the allocation size
(in KB) down or up to a multiple of the cache-way size.

Default: round_up
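
A minimal sketch of the rounding rule in shell arithmetic. The
cache-way size is an assumption passed in by the caller (the real value
would come from cat_hw_info), and round_size is a hypothetical helper,
not part of the proposed interface.

```shell
# Hypothetical helper: round a requested size (KB) to a multiple of the
# cache-way size (KB). The way size is an assumption supplied by the caller.
# usage: round_size <size_kb> <way_kb> <round_up|round_down>
round_size() {
    size_kb=$1
    way_kb=$2
    mode=$3
    case "$mode" in
        round_up)   ways=$(( (size_kb + way_kb - 1) / way_kb )) ;;
        round_down) ways=$(( size_kb / way_kb )) ;;
        *)          echo "unknown rounding mode: $mode" >&2; return 1 ;;
    esac
    echo $(( ways * way_kb ))
}

round_size 80 64 round_up      # 80 KB request, assumed 64 KB ways
round_size 80 64 round_down
```

With an assumed 64 KB way, an 80 KB request becomes 128 KB under
round_up and 64 KB under round_down. Note that round_down yields 0 for
any request smaller than one way.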

socketmask
----------
Mask of sockets where the reservation is in effect.
A zero bit means the tasks in the cgroup will not have the
cgroup's L3 cache portion reserved on that socket.
Default: all sockets set.
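
To illustrate how the mask would be read, a small sketch. It assumes
bit N of the mask corresponds to socket N, which the proposal does not
spell out; socket_in_mask is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the bit for <socket> is set in <mask>.
# Assumes bit N of the mask stands for socket N.
# usage: socket_in_mask <mask> <socket>
socket_in_mask() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

mask=0x5    # sockets 0 and 2 selected, socket 1 left out
for s in 0 1 2; do
    if socket_in_mask "$mask" "$s"; then
        echo "socket $s: reserved"
    else
        echo "socket $s: not reserved"
    fi
done
```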

enable
------
Allocate reservation with parameters set above.

When a reservation is enabled, it reserves L3 cache
space on every socket that is set in "socketmask".

After the cgroup has been enabled by a write of "1" to the
"enable" file, only the "tasks" file can be modified.
To change the size of a cgroup reservation, recreate the directory.

tasks
-----

Contains the list of tasks which use this cache reservation.

Error reporting
---------------

Errors are reported in response to writes as appropriate.
For example, writing 1 to "enable" when there is not enough space
on the sockets in "socketmask" would return -ENOSPC, and writing
to "enable" without "size" being set would return -EINVAL.
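
The two cases above can be modelled as a small decision function. This
is only a sketch: check_enable is a hypothetical name, and free space is
collapsed into a single number instead of being tracked per socket.

```shell
# Hypothetical model of the checks behind a write of "1" to "enable".
# Prints the errno name the write would fail with, or OK.
# usage: check_enable <size_kb or empty string> <free_kb>
check_enable() {
    size_kb=$1
    free_kb=$2
    if [ -z "$size_kb" ]; then
        echo EINVAL      # "size" was never written
    elif [ "$size_kb" -gt "$free_kb" ]; then
        echo ENOSPC      # not enough free cache for the reservation
    else
        echo OK
    fi
}

check_enable ""   512    # size unset
check_enable 1024 512    # request larger than the free space
check_enable 80   512    # request fits
```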

Listing
-------
To list which reservations are in place, search for subdirectories
where the "enable" file has value 1.
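
A sketch of such a search. The flag file is assumed to be called
"enable" (the name can be overridden), and a scratch directory stands in
for the cgroup mount; list_enabled is a hypothetical helper.

```shell
# Print the reservation directories under <root> whose flag file holds "1".
# The flag file name defaults to "enable", which is an assumption.
# usage: list_enabled <root> [flag-file-name]
list_enabled() {
    root=$1
    flag=${2:-enable}
    for dir in "$root"/*/; do
        [ -f "$dir$flag" ] || continue
        if [ "$(cat "$dir$flag")" = "1" ]; then
            basename "$dir"
        fi
    done
}

# Demo against a scratch directory standing in for the cgroup mount.
demo=$(mktemp -d)
mkdir "$demo/res-a" "$demo/res-b"
echo 1 > "$demo/res-a/enable"
echo 0 > "$demo/res-b/enable"
list_enabled "$demo"
rm -rf "$demo"
```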

Semantics: A task has a guaranteed cache reservation on any CPU where it
is scheduled, for the lifetime of the cgroup, as long as that task is
not attached to further cgroups.

That is, a task belonging to cgroup-A can have its cache reservation
invalidated when attached to cgroup-B (reasoning: it might be necessary
to reallocate the CBMs to respect the contiguous-bits restriction
of the CAT HW interface).
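
The contiguity restriction can be sketched as a check on a capacity
bitmask. cbm_is_contiguous is a hypothetical helper and the masks are
made-up values; the point is that free ways which are not adjacent
cannot form one reservation, hence the possible reallocation.

```shell
# Succeed if <cbm> is non-zero and its set bits form one contiguous run.
# usage: cbm_is_contiguous <cbm>
cbm_is_contiguous() {
    x=$(( $1 ))
    [ "$x" -ne 0 ] || return 1
    # Drop trailing zero bits so the run of ones starts at bit 0.
    while [ $(( x & 1 )) -eq 0 ]; do
        x=$(( x >> 1 ))
    done
    # A run of ones starting at bit 0 equals 2^n - 1, so x & (x + 1) is 0.
    [ $(( x & (x + 1) )) -eq 0 ]
}

for cbm in 0x3c 0x0f 0x33; do
    if cbm_is_contiguous "$cbm"; then
        echo "$cbm contiguous"
    else
        echo "$cbm not contiguous"
    fi
done
```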


-------
BLOCKER
-------

Can't use cgroups for CAT because:

"Control Groups extends the kernel as follows:

- Each task in the system has a reference-counted pointer to a
css_set.

- A css_set contains a set of reference-counted pointers to
cgroup_subsys_state objects, one for each cgroup subsystem
registered in the system."

That is, a task is in exactly one cgroup per subsystem. But a task
would need to be part of two CAT cgroups at the same time
to support the following configuration:

Task-A: 70% of cache reserved exclusively (reservation-0).
        20% of cache reserved (reservation-1).

Task-B: 20% of cache reserved (reservation-1).

Unless reservations are created separately, then added to cgroups:

mount -t cgroup ... /../catcgroup/
cd /../catcgroup/
# create reservations
cd reservations
mkdir reservation-1
cd reservation-1
echo "80K" > size
echo "socketmask" > ...
echo "1" > enable
cd ..
mkdir reservation-2
cd reservation-2
echo "160K" > size
echo "socketmask" > ...
echo "1" > enable
# attach reservations to cgroups
cd /../catcgroup/
mkdir cgroup-for-threaded-app
cd cgroup-for-threaded-app
echo reservation-1 reservation-2 > reservations
echo "mainthread" > tasks
cd ..
mkdir cgroup-for-helper-thread
cd cgroup-for-helper-thread
echo reservation-1 > reservations
echo "helperthread" > tasks
cd ..

This way mainthread and helperthread can share "reservation-1".
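
To make the sharing concrete, a sketch of the way masks each cgroup
would end up with. The mask values are assumptions chosen for
illustration, effective_cbm is a hypothetical helper, and the
exclusive-versus-shared distinction is not modelled.

```shell
# Hypothetical way masks for the two reservations (values are assumptions).
res1=0x003    # reservation-1: ways 0-1
res2=0x03c    # reservation-2: ways 2-5

# Effective CBM of a cgroup: OR of the masks of its attached reservations.
# usage: effective_cbm <mask>...
effective_cbm() {
    cbm=0
    for m in "$@"; do
        cbm=$(( $cbm | $m ))
    done
    printf '0x%03x\n' "$cbm"
}

main_cbm=$(effective_cbm "$res1" "$res2")    # cgroup-for-threaded-app
helper_cbm=$(effective_cbm "$res1")          # cgroup-for-helper-thread
shared=$(( $main_cbm & $helper_cbm ))        # ways both cgroups may fill

echo "mainthread   CBM: $main_cbm"
echo "helperthread CBM: $helper_cbm"
printf 'shared ways:      0x%03x\n' "$shared"
```

The non-zero overlap between the two masks is exactly reservation-1.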

But this is abusing cgroups in a way they were not designed for.
Who is going to maintain the linkage between reservations and cgroups?

