Re: [PATCH] x86: entry: flush the cache if syscall error

From: Samuel Neves
Date: Fri Oct 12 2018 - 10:29:28 EST


On Fri, Oct 12, 2018 at 2:26 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Fri, Oct 12, 2018 at 11:41 AM Samuel Neves <sneves@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 11, 2018 at 8:25 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > What exactly is this trying to protect against? And how many cycles
> > > should we expect L1D_FLUSH to take?
> >
> > As far as I could measure, I got 1660 cycles per wrmsr 0x10b, 0x1 on a
> > Skylake chip, and 1220 cycles on a Skylake-SP.
>
> Is that with L1D mostly empty, with L1D mostly full with clean lines,
> or with L1D full of dirty lines that need to be written back?

Mostly empty: the benchmark issues the flush back to back, without
refilling L1D with anything in between.

On Skylake, the (averaged) uop breakdown by port is roughly:

port 0: 255
port 1: 143
port 2: 176
port 3: 177
port 4: 524
port 5: 273
port 6: 616
port 7: 182

The number of port 4 dispatches is very close to the number of L1D
cache lines (32 KiB / 64 B = 512), suggesting one write per line (with
the corresponding 176+177+182 = 535 address generations on ports
{2, 3, 7}).

Furthermore, I suspect it also clears the L1i cache. For 2^20 wrmsr
executions, we see around 2^20 frontend_retired_l1i_miss events, but
a negligible number of frontend_retired_l2_miss ones.