Re: [PATCH RFC 2/5] x86,fpu: delay FPU register loading until switch to userspace

From: Andy Lutomirski
Date: Mon Oct 03 2016 - 22:10:19 EST


On Mon, Oct 3, 2016 at 6:29 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
> On Mon, 2016-10-03 at 14:36 -0700, Andy Lutomirski wrote:
>> On Mon, Oct 3, 2016 at 2:21 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
>> >
>> > >
>> > > One of my objections to the current code structure is that I find
>> > > it
>> > > quite hard to keep track of all of the possible states of the fpu
>> > > that
>> > > we have. Off the top of my head, we have:
>> > >
>> > > 1. In-cpu state is valid and in-memory state is invalid. (This
>> > > is
>> > > the
>> > > case when we're running user code, for example.)
>> > > 2. In-memory state is valid.
>> > > 2a. In-cpu state is *also* valid for some particular task. In
>> > > this
>> > > state, no one may write to either FPU state copy for that task
>> > > lest
>> > > they get out of sync.
>> > > 2b. In-cpu state is not valid.
>> >
>> > It gets more fun with ptrace. Look at xfpregs_get
>> > and xfpregs_set, which call fpu__activate_fpstate_read
>> > and fpu__activate_fpstate_write respectively.
>> >
>> > It looks like those functions already deal with the
>> > cases you outline above.
>>
>> Hmm, maybe they could be used. The names are IMO atrocious. I had
>> to
>> read the docs and the code to figure out WTF "activating" the
>> "fpstate" even meant.
>
> If you have good naming ideas, I'd be willing to submit
> patches to rename them :)
>
>> > >
>> > > Rik, your patches further complicate this by making it possible
>> > > to
>> > > have the in-cpu state be valid for a non-current task even when
>> > > non-idle.
>> >
>> > However, we never need to examine the in-cpu state for
>> > a task that is not currently running, or being scheduled
>> > to in the context switch code.
>> >
>> > We always save the FPU register state when a task is
>> > context switched out, so for a non-current task, we
>> > still only need to consider whether the in-memory state
>> > is valid or not.
>> >
>> > The in-cpu state does not matter.
>> >
>> > fpu__activate_fpstate_write will disable lazy restore,
>> > and ensure a full restore from memory.
>>
>> Something has to fix up owner_ctx so that the previous thread doesn't
>> think its CPU state is still valid. I don't think
>> activate_fpstate_write does it, because nothing should (I think) be
>> calling it as a mattor of course on context switch.
>
> These functions are called on tasks that are ptraced,
> and currently stopped. fpu__activate_fpstate_write
> sets fpu->last_cpu to -1, which will cause the next
> context switch to the task to load the fpstate into
> the registers.
>
> The status in the struct fpu keeps track of the status
> of things, though maybe not as explicitly as you would
> want.
>
> Having two separate status booleans for "registers valid"
> and "memory valid" may make more sense.

I have no problem with the concept of "owner_ctx", and I think it's a
perfectly reasonable data structure. My problem with it is that it's
subtle and knowledge of it is spread all over the place. Just going
with "registers valid" in a variable won't work, I think, because
there's nowhere to put it. We need to be able to delete a struct fpu
while that struct fpu might have a valid copy in a different cpu's
registers.

Anyway, feel free to tell me that I'm making this too difficult :)

>
> Running the task in user space, would have to ensure
> "registers valid" is true, and make "memory valid"
> false, because userspace could write to the registers.
>
> Doing a ptrace fpstate_read would make "memory valid"
> true, but does not need to invalidate "registers valid".
>
> Doing a ptrace fpstate_write would make "memory valid"
> true, but would invalidate "registers valid".
>
> Context switching away from a task would make "memory
> valid" true, by virtue of copying the fpstate to
> memory.
>
> Ingo, would you have any objection to patches tracking
> the validity of both register and memory states
> independently?
>
> We can get rid of fpu.counter, since nobody uses it
> any more.

We should definitely do this.

Maybe getting in some cleanups first (my lazy fpu deletion,
fpu.counter removal, etc) first is the way to go.

>
>> > > This is why I'm thinking that we should maybe revisit the
>> > > API a bit to make this clearer and to avoid scattering around all
>> > > the
>> > > state manipulation quite so much. I propose, as a straw-man:
>> > >
>> > > user_fpregs_copy_to_cpu() -- Makes the CPU state be valid for
>> > > current.
>> > > After calling this, we're in state 1 or 2a.
>> > >
>> > > user_fpregs_move_to_cpu() -- Makes the CPU state be valid and
>> > > writable
>> > > for current. Invalidates any in-memory copy. After calling
>> > > this,
>> > > we're in state 1.
>> >
>> > How can we distinguish between these two?
>> >
>> > If the task is running, it can change its FPU
>> > registers at any time, without any obvious thing
>> > to mark the transition from state 2a to state 1.
>>
>> That's true now, but I don't think it's really true any more with
>> your
>> patches. If a task is running in *kernel* mode, then I think that
>> pretty much all of the combinations are possible. When the task
>> actually enters user mode, we have to make sure we're in state 1 so
>> the CPU regs belong to the task and are writable.
>
> That is fair.
>
>> > It sounds like the first function you name would
>> > only be useful if we prevented a switch to user
>> > space, which could immediately do whatever it wants
>> > with the registers.
>> >
>> > Is there a use case for user_fpregs_copy_to_cpu()?
>>
>> With PKRU, any user memory access would need
>> user_fpregs_copy_to_cpu(). I think we should get rid of this and
>> just
>> switch to using RDPKRU and WRPKRU to eagerly update PKRU in every
>> context switch to avoid this. Other than that, I can't think of any
>> use cases off the top of my head unless the signal code wanted to
>> play
>> games with this stuff.
>
> You are right, read_pkru() and write_pkru() can only deal with
> the pkru state being present in registers. Is this because of an
> assumption in the code, or because of a hardware requirement?
>
>> > > user_fpregs_copy_to_memory() -- Makes the in-memory state be
>> > > valid
>> > > for
>> > > current. After calling this, we're in state 2a or 2b.
>> > >
>> > > user_fpregs_move_to_memory() -- Makes the in-memory state be
>> > > valid
>> > > and
>> > > writable for current. After calling this, we're in state 2b.
>> >
>> > We can only ensure that the in memory copy stays
>> > valid, by stopping the task from writing to the
>> > in-CPU copy.
>> >
>> > This sounds a lot like fpu__activate_fpstate_read
>> >
>> > What we need after copying the contents to memory is
>> > a way to invalidate the in-CPU copy, if changes were
>> > made to the in-memory copy.
>>
>> My proposal is that we invalidate the CPU state *before* writing the
>> in-memory state, not after.
>
> Modifying the in-memory FPU state can only be done on a task
> that is not currently running in user space. As long as the
> modification is done before task runs again, that is safe.
>
> I am probably missing some special case you are concerned about.
>
>> > This is already done by fpu__activate_fpstate_write
>> >
>> > >
>> > > Scheduling out calls user_fpregs_copy_to_memory(), as do all the
>> > > MPX
>> > > and PKRU memory state readers. MPX and PKRU will do
>> > > user_fpregs_move_to_memory() before writing to memory.
>> > > kernel_fpu_begin() does user_fpregs_move_to_memory().
>> >
>> > This is done by switch_fpu_prepare.
>> >
>> > Can you think of any other place in the kernel where we
>> > need a 1 -> 2a transition, besides context switching and
>> > ptrace/xfpregs_get?
>>
>> Anything else that tries to read task xstate from memory, i.e. MPX
>> and
>> PKRU. (Although if we switch to eager-switched PKRU, then PKRU stops
>> mattering for this purpose.)
>>
>> Actually, I don't see any way your patches can be compatible with
>> PKRU
>> without switching to eager-switched PKRU.
>
> My patches preserve eager saving of the FPU state, they only
> make restore lazy. That is 1 -> 2a transitions continue to
> be eager.
>
> --
> All Rights Reversed.



--
Andy Lutomirski
AMA Capital Management, LLC