Re: [PATCH v2 0/6] Rework REFCOUNT_FULL using atomic_fetch_* operations
From: Ard Biesheuvel
Date: Sat Aug 31 2019 - 13:49:33 EST
On Thu, 29 Aug 2019 at 00:03, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Wed, Aug 28, 2019 at 03:14:40PM +0100, Will Deacon wrote:
> > On Wed, Aug 28, 2019 at 09:30:52AM +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 27, 2019 at 05:31:58PM +0100, Will Deacon wrote:
> > > > Will Deacon (6):
> > > > lib/refcount: Define constants for saturation and max refcount values
> > > > lib/refcount: Ensure integer operands are treated as signed
> > > > lib/refcount: Remove unused refcount_*_checked() variants
> > > > lib/refcount: Move bulk of REFCOUNT_FULL implementation into header
> > > > lib/refcount: Improve performance of generic REFCOUNT_FULL code
> > > > lib/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
>
> BTW, can you repeat the timing details into the "Improve performance of
> generic REFCOUNT_FULL code" patch?
>
> > > So I'm not a fan; I itch at the whole racy nature of this thing and I
> > > find the code less than obvious. Yet, I have to agree it is exceedingly
> > > unlikely the race will ever actually happen, I just don't want to be the
> > > one having to debug it.
> >
> > FWIW, I think much the same about the version under arch/x86 ;)
> >
> > > I've not looked at the implementation much; does it do all the same
> > > checks the FULL one does? The x86-asm one misses a few iirc, so if this
> > > is similarly fast but has all the checks, it is in fact better.
> >
> > Yes, it passes all of the REFCOUNT_* tests in lkdtm [1] so I agree that
> > it's an improvement over the asm version.
> >
> > > Can't we make this a default !FULL implementation?
> >
> > My concern with doing that is I think it would make the FULL implementation
> > entirely pointless. I can't see anybody using it, and it would only exist
> > as an academic exercise in handling the theoretical races. That's a change
> > from the current situation where it genuinely handles cases which the
> > x86-specific code does not and, judging by the Kconfig text, that's the
> > only reason for its existence.
>
> Looking at timing details, the new implementation is close enough to the
> x86 asm version that I would be fine to drop the x86-specific case
> entirely as long as we could drop "FULL" entirely too -- we'd have _one_
> refcount_t implementation: it would be both complete and fast.
>
+1
> However, I do think a defconfig image size comparison should be done as
> part of that too. I think this implementation will be larger than the
> x86 asm one (but not by any amount that I think is a problem).
>
It's been ~2 years since I looked at this code in detail, but IIRC, it
looked like the inc-from-zero check was missing from the x86
implementation because it requires a load/compare/increment/store
sequence instead of a single increment instruction taking a memory
operand. Was there more rationale at the time for omitting this
particular case, and if so, was it based on a benchmark? Can we run it
against this implementation as well?
> I'd also note that the saturation speed is likely faster in this
> implementation (i.e. the number of instructions between noticing the
> wrap and setting the saturation value), as it is on the other side of
> a branch instead of across a trap, trap handler lookup, and call. So
> the race window should even be smaller (though I continue to think it
> remains hard enough to hit as to make it a non-issue in all cases: if
> you can schedule INT_MAX / 2 increments before a handful of instructions
> resets it to INT_MAX / 2, I suspect there are much larger problems. :)
>
> --
> Kees Cook