Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch

From: Linus Torvalds
Date: Thu Feb 21 2019 - 17:08:36 EST


On Thu, Feb 21, 2019 at 1:35 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> IMNSHO any call inside a AC region is a bug lurking round the corner. The
> only thing which is tolerable is an exception of some sort.
>
> Enforce that with objtool. End of story.

Not quite that simple.

There are some use cases where you really do want a call - albeit to
special functions - inside the AC region.

The prime example of this is likely "filldir()" in fs/readdir.c, which
is actually somewhat important for some loads (eg samba).

And the whole AC thing made it horribly horribly slow because readdir
is one of those things that doesn't just copy one value (or one
structure) to user space, but writes several different things.

Kind of like signal frame setup does.

You may not realize that right now signal frame setup *does* actually
do a call with AC set. See ia32_setup_rt_frame() that does

put_user_try {
...
compat_save_altstack_ex(&frame->uc.uc_stack, regs->sp);
...
} put_user_catch(err);

is a macro, but inside that macro is a call to sas_ss_reset().

Now, that is an inline function, so it hopefully doesn't actually
cause a call. But it's "only" an inline function, not "always_inline".
We did already have (and I rejected, for now) patches that made those
inlines not very strong...

[ Side note: currently signal frame setup uses the absolutely
*disgusting* put_user_ex() thing, but that's actually what
unsafe_put_user() was designed for. I just never got around to getting
rid of the nasty hot mess of put_user_ex() ]

Anyway, while signal frame setup just writes individual values,
"filldir()" does *both* several individual values _and_ does a
copy_to_user().

And it's that "copy_to_user()" that I want to eventually change to a
"unsafe_copy_to_user()", along with making the individual values be
written with unsafe_put_user().

Right now "filldir()" messes with AC a total of *six* times. It sets
four field values, and then does a "copy_to_user()", all of which
set/clear AC right now. It's wasting hundreds of cycles on this,
because AC is so slow to set/clear.

If AC was cheap, this wouldn't be much of an issue. But AC is really
really expensive. I think it's microcode and serializes the pipeline
or something.

Anyway. We already have a possible call site, and there are good
reasons for future ones too. But they are hopefully all very
controlled.

Linus