Re: [PATCH 4/5] x86/asm/entry/32: Replace RESTORE_RSI_RDI[_RDX] with open-coded 32-bit reads

From: Ingo Molnar
Date: Thu Jun 18 2015 - 05:31:47 EST



* Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:

> On 06/15/2015 10:20 PM, Ingo Molnar wrote:
> >> Actually, ecx and r11 need to be loaded first. They are not so much "restored"
> >> as "prepared for SYSRET insn". Every cycle lost in loading these delays SYSRET.
> >> [...]
> >
> > So in the typical case they will still be cached, and so their max latency should
> > be around 3 cycles.
>
> If the syscall flushes caches (say, a large read), or sleeps
> and the CPU schedules away, then pt_regs->ip,flags are evicted
> and need to be reloaded.
>
> > In fact because they are memory loads, they don't really have dependencies,
> > they should be available to SYSRET almost immediately,
>
> They depend on the memory data.
>
> > i.e. within a cycle - and
> > there's no reason to believe these loads wouldn't pipeline properly and
> > parallelize with the many other things SYSRET has to do to organize a return to
> > user-space, before it can actually use the target RIP and RFLAGS.
>
> This does not sound right.
>
> If it takes, say, 20 cycles to pull data from e.g. L3 cache to ECX,
> then SYSRET can't possibly complete sooner than in 20 cycles.

Yeah, that's true, but my point is that SYSRET has to do a lot of other work
(permission checks, loading the user-mode state, most of which is unrelated to
R11/RCX). That work takes dozens of cycles and is probably overlapped with any
cache misses on SYSRET's inputs such as R11/RCX.

It's not impossible that reordering helps, for example if SYSRET has internal
dependencies that make its parallelism worse than ideal, but I'd complicate this
code only if doing so gives a measurable improvement in cache-cold syscall performance.

Thanks,

Ingo