Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64

From: Josh Poimboeuf
Date: Wed Dec 12 2018 - 13:29:41 EST

Next message: Tomasz Duszynski: "[PATCH v3 0/3] add support for Sensirion SPS30 PM sensor"
Previous message: Jarkko Sakkinen: "Re: [PATCH v6 4/7] tpm: modify tpm_pcr_read() definition to pass a TPM hash algorithm"
In reply to: Josh Poimboeuf: "Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Nov 29, 2018 at 03:04:20PM -0800, Linus Torvalds wrote:
> On Thu, Nov 29, 2018 at 12:25 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 29, 2018 at 11:27:00AM -0800, Andy Lutomirski wrote:
> > >
> > > I propose a different solution:
> > >
> > > As in this patch set, we have a direct and an indirect version. The
> > > indirect version remains exactly the same as in this patch set. The
> > > direct version just only does the patching when all seems well: the
> > > call instruction needs to be 0xe8, and we only do it when the thing
> > > doesn't cross a cache line. Does that work? In the rare case where
> > > the compiler generates something other than 0xe8 or crosses a cache
> > > line, then the thing just remains as a call to the out of line jmp
> > > trampoline. Does that seem reasonable? It's a very minor change to
> > > the patch set.
> >
> > Maybe that would be ok. If my math is right, we would use the
> > out-of-line version almost 5% of the time due to cache misalignment of
> > the address.
>
> Note that I don't think cache-line alignment is necessarily sufficient.
>
> The I$ fetch from the cacheline can happen in smaller chunks, because
> the bus between the I$ and the instruction decode isn't a full
> cacheline (well, it is _now_ in modern big cores, but it hasn't always
> been).
>
> So even if the cacheline is updated atomically, I could imagine seeing
> a partial fetch from the I$ (old values) and then a second partial
> fetch (new values).
>
> It would be interesting to know what the exact fetch rules are.

So I fixed my test case to do 32-bit writes, and now the results are
making a lot more sense. Now I only get crashes when writing across
cache lines. So maybe we should just go with Andy's suggestion above.

It would be great if some CPU people could confirm that it's safe (for
x86-64 only), since it's not in the SDM. Who can help answer that?

--
Josh

Next message: Tomasz Duszynski: "[PATCH v3 0/3] add support for Sensirion SPS30 PM sensor"
Previous message: Jarkko Sakkinen: "Re: [PATCH v6 4/7] tpm: modify tpm_pcr_read() definition to pass a TPM hash algorithm"
In reply to: Josh Poimboeuf: "Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]