Re: [patch 1/4] add basic task isolation prctl interface

From: Marcelo Tosatti
Date: Wed Jul 28 2021 - 05:56:36 EST


On Wed, Jul 28, 2021 at 01:45:39AM +0200, Frederic Weisbecker wrote:
> On Tue, Jul 27, 2021 at 11:52:09AM -0300, Marcelo Tosatti wrote:
> > The meaning of isolated is specified as follows:
> >
> > Isolation features
> > ==================
> >
> > - prctl(PR_ISOL_GET, ISOL_SUP_FEATURES, 0, 0, 0) returns the supported
> > features as a return value.
> >
> > - prctl(PR_ISOL_SET, ISOL_FEATURES, bitmask, 0, 0) enables the features in
> > the bitmask.
> >
> > - prctl(PR_ISOL_GET, ISOL_FEATURES, 0, 0, 0) returns the currently
> > enabled features.
>
> So what are the ISOL_FEATURES here? A mode that we enter such as flush
> vmstat _everytime_ we resume to userpace after (and including) this prctl() ?

ISOL_FEATURES is just the "command" type (which you can get and set).

The bitmask would include ISOL_F_QUIESCE_ON_URET, so:

- bitmask = ISOL_F_QUIESCE_ON_URET;
- prctl(PR_ISOL_SET, ISOL_FEATURES, bitmask, 0, 0) enables the features in
the bitmask.

- quiesce_bitmask = prctl(PR_ISOL_GET, PR_ISOL_SUP_QUIESCE_CFG, 0, 0, 0)
returns the supported actions to be quiesced (1)

- prctl(PR_ISOL_SET, PR_ISOL_QUIESCE_CFG, quiesce_bitmask, 0, 0) _sets_
the actions to be quiesced (2)

If an application does not modify quiesce_bitmask between
points (1) and (2) above, it will enable quiescing of all
actions the kernel supports.

The application can, however, modify quiesce_bitmask to its preference.

Flushing vmstat _every time_ you resume to userspace is enabled only
_after_ prctl(PR_ISOL_ENTER, 0, 0, 0, 0) is performed (which happens
only once isolation is fully configured with the PR_ISOL_SET calls).

OK, I will document that better.
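
To make the ordering concrete, here is a minimal user-space model of the
configuration sequence, points (1) and (2) followed by PR_ISOL_ENTER. It is
only a sketch: model_prctl() stands in for the real prctl(), the command
numbers and the value of ISOL_F_QUIESCE_ON_URET are placeholders, and
rejecting unsupported bits is an assumption, not something the patchset
promises. The ISOL_F_QUIESCE_* values are the ones from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder command numbers: assumptions for this sketch only. */
enum { PR_ISOL_GET, PR_ISOL_SET, PR_ISOL_ENTER };
enum { ISOL_SUP_FEATURES, ISOL_FEATURES,
       PR_ISOL_SUP_QUIESCE_CFG, PR_ISOL_QUIESCE_CFG };

#define ISOL_F_QUIESCE_ON_URET         (1UL << 0)  /* assumed value */
#define ISOL_F_QUIESCE_VMSTAT_SYNC     (1UL << 0)
#define ISOL_F_QUIESCE_NOHZ_FULL       (1UL << 1)
#define ISOL_F_QUIESCE_DEFER_TLB_FLUSH (1UL << 2)

#define MODEL_SUP_FEATURES ISOL_F_QUIESCE_ON_URET
#define MODEL_SUP_QUIESCE  (ISOL_F_QUIESCE_VMSTAT_SYNC | \
                            ISOL_F_QUIESCE_NOHZ_FULL | \
                            ISOL_F_QUIESCE_DEFER_TLB_FLUSH)

/* Per-task state the model keeps in place of the kernel. */
struct isol_model {
    unsigned long features;    /* set via ISOL_FEATURES */
    unsigned long quiesce_cfg; /* set via PR_ISOL_QUIESCE_CFG */
    int entered;               /* set by PR_ISOL_ENTER */
};

/* Stand-in for prctl(op, cmd, arg, 0, 0); returns -1 on error. */
static long model_prctl(struct isol_model *m, int op, int cmd,
                        unsigned long arg)
{
    assert(m != NULL);
    if (op == PR_ISOL_ENTER) {
        m->entered = 1;
        return 0;
    }
    if (op == PR_ISOL_GET) {
        switch (cmd) {
        case ISOL_SUP_FEATURES:       return (long)MODEL_SUP_FEATURES;
        case ISOL_FEATURES:           return (long)m->features;
        case PR_ISOL_SUP_QUIESCE_CFG: return (long)MODEL_SUP_QUIESCE;
        case PR_ISOL_QUIESCE_CFG:     return (long)m->quiesce_cfg;
        default:                      return -1;
        }
    }
    if (op == PR_ISOL_SET) {
        /* Assumption: bits outside the supported set are rejected. */
        if (cmd == ISOL_FEATURES && !(arg & ~MODEL_SUP_FEATURES)) {
            m->features = arg;
            return 0;
        }
        if (cmd == PR_ISOL_QUIESCE_CFG && !(arg & ~MODEL_SUP_QUIESCE)) {
            m->quiesce_cfg = arg;
            return 0;
        }
    }
    return -1;
}

/* Configure and enter isolation; 'drop' lists actions to opt out of. */
static long configure(struct isol_model *m, unsigned long drop)
{
    unsigned long quiesce_bitmask;
    long ret;

    if (model_prctl(m, PR_ISOL_SET, ISOL_FEATURES,
                    ISOL_F_QUIESCE_ON_URET) < 0)
        return -1;

    ret = model_prctl(m, PR_ISOL_GET, PR_ISOL_SUP_QUIESCE_CFG, 0); /* (1) */
    if (ret < 0)
        return -1;

    /* Unmodified, the mask enables everything the "kernel" supports. */
    quiesce_bitmask = (unsigned long)ret & ~drop;

    if (model_prctl(m, PR_ISOL_SET, PR_ISOL_QUIESCE_CFG,
                    quiesce_bitmask) < 0)                          /* (2) */
        return -1;

    return model_prctl(m, PR_ISOL_ENTER, 0, 0);
}
```

Calling configure(&m, 0) on a zeroed struct isol_model leaves all three
quiesce actions enabled, while configure(&m, ISOL_F_QUIESCE_NOHZ_FULL)
leaves only the vmstat sync and deferred TLB flush bits set.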

> If so I'd rather call that ISOL_MODE because feature is too general.

Well, in the first patchset, there was one "mode" implemented (but
it was possible to implement different modes in the future).

This would allow, for example, easier integration of functionality like
that of the "full task isolation" patchset, which disallows syscalls.

I think we'd like to keep that, so I'll keep the previous distinct modes
(but allow configuration of individual features in the bitmap).

> >
> > The supported features are:
> >
> > ISOL_F_QUIESCE_ON_URET: quiesce deferred actions on return to userspace.
> > ----------------------
> >
> > Quiescing of different actions can be performed on return to userspace.
> >
> > - prctl(PR_ISOL_GET, PR_ISOL_SUP_QUIESCE_CFG, 0, 0, 0) returns
> > the supported actions to be quiesced.
> >
> > - prctl(PR_ISOL_SET, PR_ISOL_QUIESCE_CFG, quiesce_bitmask, 0, 0) returns

s/returns/sets/

> > the currently supported actions to be quiesced.
> >
> > - prctl(PR_ISOL_GET, PR_ISOL_QUIESCE_CFG, 0, 0, 0) returns
> > the currently enabled actions to be quiesced.
> >
> > #define ISOL_F_QUIESCE_VMSTAT_SYNC (1<<0)
> > #define ISOL_F_QUIESCE_NOHZ_FULL (1<<1)
> > #define ISOL_F_QUIESCE_DEFER_TLB_FLUSH (1<<2)
>
> And then PR_ISOL_QUIESCE_CFG is a oneshot operation that applies only upon
> return to this ctrl, right? If so perhaps this should be called just
> ISOL_QUIESCE or ISOL_QUIESCE_ONCE or ISOL_REQ ?

There was no one-shot operation implemented in the first patchset. What
an application would do to achieve that is:

1. Configure isolation with PR_ISOL_SET (say, configure a mode which
allows system calls and, when a system call happens, flushes all deferred
actions on return to userspace).

2. prctl(PR_ISOL_ENTER, 0, 0, 0, 0) (this actually enables the flushing,
and tags the task_struct as isolated). Here we can transfer this information
from per-task to per-CPU data, for example, to be able to implement
other features such as deferred TLB flushing.

On return from this prctl(), deferrable actions are flushed.

3. latency sensitive loop, with no system calls.

4. some event which requires system calls is noticed:
prctl(PR_ISOL_EXIT, 0, 0, 0, 0)
(this would untag task_struct as isolated).

5. perform system calls A, B, C, D (with no flushing of vmstat,
for example).

6. jmp to 2.

So there is a problem with this logic, which is that one would like
certain isolation functionality to remain enabled between points 4
and 6 (for example, blocking CPU hotplug or other blockable activities
that would cause interruptions).

One way to achieve this would be to replace PR_ISOL_ENTER/PR_ISOL_EXIT
with PR_ISOL_ENABLE, which accepts a bitmask:

1. Configure isolation with PR_ISOL_SET (say, configure a mode which
allows system calls and, when a system call happens, flushes all deferred
actions on return to userspace).

2. enabled_bitmask = ISOL_F_QUIESCE_ON_URET|ISOL_F_BLOCK_INTERRUPTORS;
prctl(PR_ISOL_ENABLE, enabled_bitmask, 0, 0, 0)

On return from this prctl(), deferrable actions are flushed.

3. latency sensitive loop, with no system calls.

4. some event which requires system calls is noticed:

prctl(PR_ISOL_ENABLE, ISOL_F_BLOCK_INTERRUPTORS, 0, 0, 0)
(this would clear ISOL_F_QUIESCE_ON_URET, so no flushing
is performed on return from system calls).

5. perform system calls A, B, C, D (with no flushing of vmstat).

6. jmp to 2.

...

On exit: prctl(PR_ISOL_ENABLE, 0, 0, 0, 0)
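
The PR_ISOL_ENABLE semantics above can be sketched as a small user-space
model: the bitmask passed in replaces the enabled set, and every return to
userspace flushes deferred actions only while ISOL_F_QUIESCE_ON_URET is
set. Nothing here is kernel code; the bit values, struct isol_task and the
helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values: the proposal does not assign them yet. */
#define ISOL_F_QUIESCE_ON_URET    (1UL << 0)
#define ISOL_F_BLOCK_INTERRUPTORS (1UL << 1)

/* Hypothetical per-task state for the model. */
struct isol_task {
    unsigned long enabled; /* currently enabled isolation features */
    unsigned long flushes; /* number of flushes performed so far */
};

/* Every syscall ends here; flush only if quiescing is enabled. */
static void return_to_user(struct isol_task *t)
{
    if (t->enabled & ISOL_F_QUIESCE_ON_URET)
        t->flushes++;
}

/* Models prctl(PR_ISOL_ENABLE, mask, 0, 0, 0): mask replaces the set. */
static void isol_enable(struct isol_task *t, unsigned long mask)
{
    assert(t != NULL);
    t->enabled = mask;
    return_to_user(t); /* the prctl() itself returns to userspace */
}

/* Models any other system call (A, B, C, D in step 5). */
static void other_syscall(struct isol_task *t)
{
    return_to_user(t);
}

/* True while hotplug and similar interruptors would stay blocked. */
static bool interruptors_blocked(const struct isol_task *t)
{
    return (t->enabled & ISOL_F_BLOCK_INTERRUPTORS) != 0;
}
```

In this model, step 2 causes exactly one flush, steps 4 and 5 cause none,
and interruptors_blocked() stays true until the final
isol_enable(&t, 0).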

IOW: the one-shot operation does not allow the application
to inform the kernel when the latency sensitive loop has
begun or has ended.

>
> But that's just naming debate because otherwise that prctl layout looks good
> to me.
>
> Thanks!

Thank you for the input!